# PYTHON

Python is a high-level, general-purpose programming language that is widely used for web development, scientific computing, data analysis, and a wide range of other applications. It is known for its simplicity, readability, and flexibility, and is a popular choice for beginners and experienced programmers alike.

Python is usually described as an interpreted language: the reference implementation, CPython, compiles source code to bytecode and runs it on a virtual machine at runtime, rather than compiling it ahead of time into machine code that runs directly on the hardware. This makes Python programs easy to write and debug, but they are often slower than equivalent programs written in compiled languages like C or C++.

Python has a large standard library that provides a wide range of built-in functionality, as well as a rich ecosystem of third-party libraries and frameworks that allow developers to do everything from building web applications to working with data and machine learning.

## 002. Regular Expressions

Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes). However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a byte pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.
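The rule above can be checked with a short sketch. The sample string `"order 123"` and the variable names are made up for illustration; only `re.search` and the `TypeError` behaviour come from the `re` module itself.

```python
import re

# A str pattern on a str and a bytes pattern on bytes both work.
str_match = re.search(r"\d+", "order 123")
bytes_match = re.search(rb"\d+", b"order 123")
print(str_match.group())    # 123
print(bytes_match.group())  # b'123'

# Mixing the two types raises TypeError.
try:
    re.search(rb"\d+", "order 123")
except TypeError as exc:
    mixed_error = exc
    print(type(exc).__name__)  # TypeError
```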

In [1]:
import sys
from pathlib import Path

from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

In [2]:
import re


### 002.001 Verbose Flag

Flags change how a pattern is parsed or how it matches.

1. Compile the regular expression `pattern` into the variable `regex` and match it against `text`
1. Repeat, but use the flag that lets you add comments to each part of the regular expression. Match `text` again
1. Do the same with an inline flag (NOTE: the inline flag for verbose is NOT `v`, it's extremely different)
1. Do the same without flags, using only comments written as extension notation


In [3]:
text = "00.12"
pattern = r"\d+\.\d+"

# The comments:
# \d+  # the integral part
# \.   # the decimal point
# \d+  # some fractional digits

# solution
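
One possible solution, as a sketch rather than the only answer. The names `regex_verbose`, `regex_inline` and `regex_notes` are invented here, and the fractional part is written as `\d+` so that every variant matches exactly what `pattern` matches.

```python
import re

text = "00.12"
pattern = r"\d+\.\d+"

# 1. Plain compile and match.
regex = re.compile(pattern)
print(regex.match(text).group())  # 00.12

# 2. re.VERBOSE (re.X) ignores whitespace and allows "#" comments.
regex_verbose = re.compile(
    r"""
    \d+  # the integral part
    \.   # the decimal point
    \d+  # some fractional digits
    """,
    re.VERBOSE,
)
print(regex_verbose.match(text).group())  # 00.12

# 3. The same with the inline flag (?x), which must open the pattern.
regex_inline = re.compile(
    r"""(?x)
    \d+  # the integral part
    \.   # the decimal point
    \d+  # some fractional digits
    """
)
print(regex_inline.match(text).group())  # 00.12

# 4. No flags: comments written with the (?#...) extension notation.
regex_notes = re.compile(
    r"\d+(?#the integral part)\.(?#the decimal point)\d+(?#some fractional digits)"
)
print(regex_notes.match(text).group())  # 00.12
```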


### 002.002 New line flag

1. Compile `pattern` into `regex`. Substitute `needle` for every match in `haystack` and notice that nothing changes
1. Repeat, but use the flag that makes `.` also match "\n"
1. Do the same, but this time use the inline flag and `re.sub` instead of compiling

In [4]:
pattern = r",.and"
needle = ", and"
haystack = """It was a bright cold day in April,
and the clocks were striking thirteen."""
# solution
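
One possible solution, sketched with `re.DOTALL` and its inline form `(?s)`. The result names (`unchanged`, `joined`, `joined_inline`) are made up for illustration.

```python
import re

pattern = r",.and"
needle = ", and"
haystack = """It was a bright cold day in April,
and the clocks were striking thirteen."""

# 1. By default "." does not match "\n", so nothing is replaced.
regex = re.compile(pattern)
unchanged = regex.sub(needle, haystack)
print(unchanged == haystack)  # True

# 2. re.DOTALL (re.S) lets "." match the newline as well.
regex_dotall = re.compile(pattern, re.DOTALL)
joined = regex_dotall.sub(needle, haystack)
print(joined)  # It was a bright cold day in April, and the clocks were striking thirteen.

# 3. The same with the inline flag (?s) and re.sub, no compiling.
joined_inline = re.sub(r"(?s)" + pattern, needle, haystack)
print(joined_inline == joined)  # True
```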


### 002.003 Case insensitive

Flags change how a pattern is matched; several can be combined, and most also have an inline form.

1. Compile `pattern` into `regex`, substitute `needle` for every match in `haystack`, and notice that nothing is replaced
1. Repeat, but use the case-insensitive flag. Note that it replaces the first instance, but no more
1. Repeat, but change the pattern to use an inline flag and call `re.sub` instead of compiling
1. Repeat again, this time compiling and using two flags to change all occurrences
1. And again, back to `re.sub` with inline flags to replicate the previous step

In [5]:
haystack = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""

pattern = r"^it was"
needle = "it will be"
# solution
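
One possible solution, as a sketch. The second flag is assumed to be `re.MULTILINE`, which makes `^` match at the start of every line; the `step1` to `step5` names are invented for illustration.

```python
import re

haystack = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""

pattern = r"^it was"
needle = "it will be"

# 1. "^" only matches at the start of the string, where the text is "It was".
step1 = re.compile(pattern).sub(needle, haystack)
print(step1 == haystack)  # True

# 2. Case-insensitive: the first line now matches, but only that one.
step2 = re.compile(pattern, re.IGNORECASE).sub(needle, haystack)
print(step2.count(needle))  # 1

# 3. The same with the inline flag (?i).
step3 = re.sub(r"(?i)" + pattern, needle, haystack)

# 4. Adding re.MULTILINE makes "^" match at the start of each line.
step4 = re.compile(pattern, re.IGNORECASE | re.MULTILINE).sub(needle, haystack)
print(step4.count(needle))  # 5

# 5. Both flags inline.
step5 = re.sub(r"(?im)" + pattern, needle, haystack)
print(step5 == step4)  # True
```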


### 002.004 String replace, greedy matches

1. Use `text.replace` to save a version of `text` without newlines in `one_line`
1. Prove that you cannot use `pattern` with string replace
1. Prove that you can do the replacement only with the appropriate `re` method. Note that the output is now "It was the [something] of [something], ..."
1. Change the pattern to be non-greedy to obtain "It was the [something] of [something], it was the [something] of [something], ..."

In [6]:
text = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""
pattern = r"the .+ of .+,"
replacement = "the [something] of [something],"
# solution
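
One possible solution, as a sketch. It assumes the newlines are replaced with an empty string (the same `one_line` definition the next exercise uses), so the results have no space after each comma. The names `literal`, `greedy` and `lazy` are made up for illustration.

```python
import re

text = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""
pattern = r"the .+ of .+,"
replacement = "the [something] of [something],"

# 1. Plain string replace handles the newlines.
one_line = text.replace("\n", "")

# 2. str.replace treats the pattern as literal text, so nothing changes.
literal = one_line.replace(pattern, replacement)
print(literal == one_line)  # True

# 3. re.sub interprets the pattern; the greedy ".+" swallows everything
#    up to the last " of " and the last comma.
greedy = re.sub(pattern, replacement, one_line)
print(greedy)  # It was the [something] of [something],...

# 4. Non-greedy ".+?" stops at the first possible " of " and comma.
lazy = re.sub(r"the .+? of .+?,", replacement, one_line)
print(lazy.count(replacement))  # 5
```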


### 002.005 Repetitions

1. Try to run `pattern1` on `one_line` with the `re` function that only matches at the beginning of the string, and see it fail
1. Fix it by using the one that matches anywhere in the string
1. Another way of fixing it is to make the pattern case insensitive
1. Run `pattern2` with the repetition quantifier `{1,5}` and prove that when searching (and storing the result), it only returns one group, and it contains the last of the "it was..." phrases (it was the epoch of belief)
1. Change it so that it finds the first group instead

In [7]:
text = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""
one_line = text.replace("\n", "")
pattern1 = r"it was"
pattern2 = r"(it was the .+? of .+?, ?)"
# solution
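
One possible solution, as a sketch. For the last step it assumes that making the quantifier lazy (`{1,5}?`) is an acceptable way to stop after the first repetition; because the search stays case sensitive, that first repetition is the first lowercase "it was" phrase. The result names are invented for illustration.

```python
import re

text = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""
one_line = text.replace("\n", "")
pattern1 = r"it was"
pattern2 = r"(it was the .+? of .+?, ?)"

# 1. re.match only looks at the start of the string, which is "It was".
failed = re.match(pattern1, one_line)
print(failed)  # None

# 2. re.search scans the whole string.
found = re.search(pattern1, one_line)
print(found.group())  # it was

# 3. Alternatively, keep re.match and ignore case.
insensitive = re.match(pattern1, one_line, re.IGNORECASE)
print(insensitive.group())  # It was

# 4. A repeated group keeps only the text of its last repetition.
repeated = re.search(pattern2 + "{1,5}", one_line)
print(repeated.groups())  # ('it was the epoch of belief,',)

# 5. A lazy quantifier stops after the first repetition.
first = re.search(pattern2 + "{1,5}?", one_line)
print(first.group(1))  # it was the worst of times,
```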


### 002.006 Sets of characters

`[]` is used to indicate a set of characters.

1. String-replace `______` in `text1` with a string that makes `pattern` match
1. Find two regular expressions consisting of character sets that, when passed to `re.search`, match `1-2` in `text2`. Print the group each one finds
1. Find two regular expressions consisting of character sets that, when passed to `re.search`, match `](` in `text3`. Print the group each one finds

In [8]:
text1 = "12f256z won't be found, ______ will"
needle = "______"
pattern = r"[\d().]{4,}"

text2 = "1-2=-1"
text3 = "There are two ways to match ](I can confirm)"
# solution
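
One possible solution, as a sketch. The filler string `"(3.14)"` is an arbitrary choice; any run of four or more digits, parentheses or dots would do. For the other two steps, the two variants rely on the two ways of putting `-` or `]` literally into a set: placing it where it cannot be read as syntax, or escaping it.

```python
import re

text1 = "12f256z won't be found, ______ will"
needle = "______"
pattern = r"[\d().]{4,}"

text2 = "1-2=-1"
text3 = "There are two ways to match ](I can confirm)"

# 1. "12" and "256" are too short; "(3.14)" is six characters from the set.
filled = text1.replace(needle, "(3.14)")
print(re.search(pattern, filled).group())  # (3.14)

# 2. A literal "-" in a set: put it first (or last), or escape it.
dash_first = re.search(r"[-\d]+", text2)
dash_escaped = re.search(r"[\d\-]+", text2)
print(dash_first.group(), dash_escaped.group())  # 1-2 1-2

# 3. A literal "]" in a set: put it first, or escape it.
bracket_first = re.search(r"[](]+", text3)
bracket_escaped = re.search(r"[(\]]+", text3)
print(bracket_first.group(), bracket_escaped.group())  # ]( ](
```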


### 002.007 Non capturing group

1. Q: Non-capturing groups and positive lookaheads can be confusing because they look similar. Can you tell the difference?
1. Use `re.findall` to find matches for `pattern` in `text`
1. Add a non-capturing group to match only those occurrences that precede "age". Note that "age" is part of the match
1. Change it so that "age" is not part of the match

In [9]:
text = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""

pattern = r"it was the"
# solution
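
One possible solution, as a sketch. In short, a non-capturing group `(?:...)` consumes the text it matches, so that text ends up in the match, while a positive lookahead `(?=...)` only checks that the text follows and consumes nothing. The result names are made up for illustration.

```python
import re

text = """It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness,
it was the epoch of belief,
..."""

pattern = r"it was the"

# 2. Four lowercase occurrences; the capitalised first line is skipped.
all_matches = re.findall(pattern, text)
print(len(all_matches))  # 4

# 3. Non-capturing group: "age" is consumed and is part of each match.
with_age = re.findall(r"it was the (?:age)", text)
print(with_age)  # ['it was the age', 'it was the age']

# 4. Positive lookahead: " age" must follow but is not part of the match.
without_age = re.findall(r"it was the(?= age)", text)
print(without_age)  # ['it was the', 'it was the']
```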
