# PYTHON

Python is a high-level, general-purpose programming language that is widely used for web development, scientific computing, data analysis, and a wide range of other applications. It is known for its simplicity, readability, and flexibility, and is a popular choice for beginners and experienced programmers alike.

Python is usually described as an interpreted language. Its reference implementation, CPython, compiles source code to bytecode and runs that bytecode on a virtual machine at runtime, rather than compiling it ahead of time into machine code that runs directly on a computer's hardware. This makes Python programs easy to write and debug, but it also means that they are typically slower than equivalent programs written in compiled languages like C or C++.

Python has a large standard library that provides a wide range of built-in functionality, as well as a rich ecosystem of third-party libraries and frameworks that allow developers to do everything from building web applications to working with data and machine learning.

## 002. Regular Expressions

Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes). However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a byte pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.
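
A minimal sketch of the type rule above, using only the standard `re` module. The sample strings are made up for illustration.

```python
import re

# str pattern against a str: fine
str_hit = re.search(r"\d+", "room 101").group()

# bytes pattern against bytes: also fine
bytes_hit = re.search(rb"\d+", b"room 101").group()

# Mixing a bytes pattern with a str raises TypeError
try:
    re.search(rb"\d+", "room 101")
except TypeError as exc:
    mixed_error = exc
else:
    mixed_error = None
```

Note that the match result has the same type as the input: `str_hit` is a `str`, while `bytes_hit` is a `bytes` object.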

## 00X.000 Assets

Some assets to avoid too much typing

| Maybe       | Data |
|-------------|------|
| Show me     | 1233 |

In [4]:
import sys
from pathlib import Path

current_dir = Path().resolve()
while current_dir != current_dir.parent and current_dir.name != "katas":
    current_dir = current_dir.parent
if current_dir != current_dir.parent:
    sys.path.append(current_dir.as_posix())

In [5]:
import re
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

### 002.001 Verbose Flag

Flags change how a pattern is compiled or how the search is carried out.

1. Compile the re `"\d+\.\d*"` into `pattern1`
1. Compile the same re into `pattern2`, but use the correct flag to add comments to each part of the re
1. Do the same but with an inline flag (`pattern3`)
1. Do the same but without flags, just comments as extension notations (`pattern4`)
1. Assert that all of them can match `"00.12"`


In [6]:
text = "00.12"
# solution
pattern1 = re.compile(r"\d+\.\d*")
pattern2 = re.compile(
    r"""\d +  # the integral part
        \.    # the decimal point
        \d *  # some fractional digits""", 
    re.VERBOSE)
pattern3 = re.compile(
    r"""(?x)  # enable these comments
        \d +  # the integral part
        \.    # the decimal point
        \d *  # some fractional digits"""
    )
pattern4 = re.compile(r"\d+(?# the integral part)\.(?# the decimal point)\d*(?# some fractional digits)")
assert pattern1.match(text)
assert pattern2.match(text)
assert pattern3.match(text)
assert pattern4.match(text)
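
One side effect of the verbose flag is worth a quick sketch: whitespace in the pattern is ignored unless it is escaped or placed inside a character class. The patterns and strings below are hypothetical and only illustrate that rule.

```python
import re

# Without re.VERBOSE, the space in the pattern is a literal space.
plain = re.compile(r"hello world")
plain_hit = plain.match("hello world")        # matches

# With re.VERBOSE the space is dropped, so the pattern becomes "helloworld".
verbose = re.compile(r"hello world", re.VERBOSE)
verbose_miss = verbose.match("hello world")   # None
verbose_hit = verbose.match("helloworld")     # matches

# Escaping the space, or wrapping it in a character class, keeps it literal.
escaped = re.compile(r"hello\ world", re.VERBOSE)
in_class = re.compile(r"hello[ ]world", re.VERBOSE)
```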

### 002.002 New line flag

1. Compile `re1` into `pattern`. Run its `sub` method against `text` and save the result in `changed`; notice that nothing changes. Assert that `text` and `changed` are the same
1. Repeat, but use the correct flag so that `.` also matches "\n"
1. Do the same, but this time use an inline flag and `re.sub` instead of compiling. Save the result to `last_changed`, and assert it is the same as `changed` from step (2)

In [7]:
text = """It was a bright cold day in April,
and the clocks were striking thirteen."""
re1 = r",.and"
# solution
pattern = re.compile(re1)
changed = pattern.sub(", and", text)
changed
assert(changed == text)

pattern = re.compile(r",.and", re.DOTALL)
changed = pattern.sub(", and", text)
changed
assert(changed != text)

last_changed = re.sub(r"(?s),.and", ", and", text)
last_changed
assert(changed == last_changed)

'It was a bright cold day in April,\nand the clocks were striking thirteen.'

'It was a bright cold day in April, and the clocks were striking thirteen.'

'It was a bright cold day in April, and the clocks were striking thirteen.'

### 002.003 Case insensitive

Flags change how a pattern is compiled or how the search is carried out.

1. Compile `re1` into `pattern` without any flags, and run it on `text`. Display the result as `f"1: {changed}"`
1. Note that it doesn't replace anything
1. Repeat, but use the correct case insensitive flag. Note that it replaces the first instance, but no more
1. Repeat, but change re to use an inline flag and just use `re.sub` instead of compiling
1. Repeat again, this time compile again and use two flags to change all occurrences
1. And again, back to using `re.sub` and inline flag to replicate previous step

In [8]:
text = """It was the best of times, 
it was the worst of times, 
it was the age of wisdom, 
it was the age of foolishness, 
it was the epoch of belief, 
..."""

re1 = r"^it was"
sub1 = "it will be"
# solution
pattern = re.compile(re1)
changed = pattern.sub(sub1, text)
f"1: {changed}"
pattern = re.compile(re1, re.IGNORECASE)
changed = pattern.sub(sub1, text)
f"2: {changed}"
changed = re.sub(r"(?i)^it was", "it will be", text)
f"3: {changed}"

pattern = re.compile(re1, re.MULTILINE | re.IGNORECASE)
changed = pattern.sub("it will be", text)
f"4: {changed}"
changed = re.sub(r"(?im)^it was", "it will be", text)
f"5: {changed}"



'1: It was the best of times, \nit was the worst of times, \nit was the age of wisdom, \nit was the age of foolishness, \nit was the epoch of belief, \n...'

'2: it will be the best of times, \nit was the worst of times, \nit was the age of wisdom, \nit was the age of foolishness, \nit was the epoch of belief, \n...'

'3: it will be the best of times, \nit was the worst of times, \nit was the age of wisdom, \nit was the age of foolishness, \nit was the epoch of belief, \n...'

'4: it will be the best of times, \nit will be the worst of times, \nit will be the age of wisdom, \nit will be the age of foolishness, \nit will be the epoch of belief, \n...'

'5: it will be the best of times, \nit will be the worst of times, \nit will be the age of wisdom, \nit will be the age of foolishness, \nit will be the epoch of belief, \n...'

### 002.004 String replace, greedy matches

1. Use `text.replace` to save a version of `text` without newlines in `one_line`. Display as `f"1: {one_line}"`
1. Prove that you cannot use the re `the .+ of .+,` with string replace by saving the result to variable `changed`
1. Prove that you can do a replacement with a re only with the appropriate re method, and save to `changed`
1. Note that `changed` is now 'It was the [something] of [something], ...', and change the re to be non-greedy to obtain "It was the [something] of [something], it was the [something] of [something], ..."

In [9]:
text = """It was the best of times, 
it was the worst of times, 
it was the age of wisdom, 
it was the age of foolishness, 
it was the epoch of belief, 
...""" 
# solution
one_line = text.replace("\n", "")
f"1: {one_line}"
changed = one_line.replace(r"the .+ of .+,", "the [something] of [something]")
f"2: {changed}"
changed = re.sub(r"the .+ of .+,", "the [something] of [something],", one_line)
f"3: {changed}"
changed = re.sub(r"the .+? of .+?,", "the [something] of [something],", one_line)
f"4: {changed}"


'1: It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, ...'

'2: It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, ...'

'3: It was the [something] of [something], ...'

'4: It was the [something] of [something], it was the [something] of [something], it was the [something] of [something], it was the [something] of [something], it was the [something] of [something], ...'

### 002.005 Repetitions

1. Try to run the re "it was" on `one_line`, with the re method that only matches at the beginning of the string, and see it fail
1. Fix it by using the one that searches anywhere in the string
1. Another way of fixing it is to make the re case insensitive
1. Run the re `(it was the .+? of .+?, ?)` with the repetition quantifier `{1,5}` and prove that when searching, it only returns one group, and it contains the last of the "it was..." clauses (it was the epoch of belief)
1. Change it so that it finds the 1st group instead

In [10]:
text = """It was the best of times, 
it was the worst of times, 
it was the age of wisdom, 
it was the age of foolishness, 
it was the epoch of belief, 
...""" 
one_line = text.replace("\n", "")
# solution
found = re.match("it was", one_line)
assert found is None
found = re.search("it was", one_line)
assert(found)
found = re.match("it was", one_line, flags=re.IGNORECASE)
assert(found)

one_line = text.replace("\n", "")
matches = re.search(r"(it was the .+? of .+?, ?){1,5}", one_line)
matches.group(1)
matches = re.search(r"(it was the .+? of .+?, ?){1,5}?", one_line)
matches.group(1)


'it was the epoch of belief, '

'it was the worst of times, '
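
Since a quantified group only keeps its last repetition, one way to collect every clause is to drop the quantifier and let `re.findall` walk the string. This is a sketch of that alternative, not part of the original kata. `one_line` is rebuilt here so the block runs on its own.

```python
import re

one_line = (
    "It was the best of times, it was the worst of times, "
    "it was the age of wisdom, it was the age of foolishness, "
    "it was the epoch of belief, ..."
)

# A quantified group reports only the last repetition it matched.
m = re.search(r"(it was the .+? of .+?, ?){1,5}", one_line)
last_only = m.groups()

# Without the quantifier, findall returns each clause as its own item.
all_clauses = re.findall(r"it was the .+? of .+?, ?", one_line)
```

The capitalised "It was" at the start is skipped in both cases because the pattern is case sensitive.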

### 002.006 Sets of characters

`[]` is used to indicate a set of characters

1. Use string replace on `text1` to make `re1` pass
1. There are two ways to handle the "-" in `text2`; use them both. Should match '1-2'
1. There are two ways to handle the "]" in `text3`; use them both. Should match ']('

In [11]:
text1 = "12f256z won't be found, ______ will"
re1 = r"[\d().]{4,}"

text2 = "1-2=-1"
text3 = "There are two ways to match ](I can confirm)"
# solution

text1 = text1.replace("______", "18(.)2.122.")
re.search(re1, text1).group()
re.search(r"[-0-9]+", text2).group()
re.search(r"[1\-2]+", text2).group()
re.search(r"[](]+", text3).group()
re.search(r"[(\]]+", text3).group()


'18(.)2.122.'

'1-2'

'1-2'

']('

']('

### 002.007 Non capturing group

1. Use `re.findall` to find matches for `re1`
1. Add a non capturing group to only match those which precede "age". Note that "age" is part of the match
1. Change it so that "age" is not part of what `re.findall` returns

In [12]:
text = """It was the best of times, 
it was the worst of times, 
it was the age of wisdom, 
it was the age of foolishness, 
it was the epoch of belief, 
..."""

re1 = r"it was the"
# solution
re.findall(re1, text)
re.findall(r"it was the (?:age)", text)
re.findall(r"(it was the) (?:age)", text)

['it was the', 'it was the', 'it was the', 'it was the']

['it was the age', 'it was the age']

['it was the', 'it was the']
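
In the last step "age" is still consumed by the match; it is only left out of the `findall` result because of the capturing group. A lookahead is one possible alternative that keeps "age" out of the match itself. The shortened `sample` string below is an assumption made to keep the sketch self-contained.

```python
import re

sample = "it was the worst of times, it was the age of wisdom, it was the age of foolishness"

# With a non capturing group, the full match still includes " age".
full_match = re.search(r"(it was the) (?:age)", sample).group(0)

# (?= age) only checks that " age" follows; it is not consumed.
lookahead_hits = [m.group() for m in re.finditer(r"it was the(?= age)", sample)]
```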

### 002.008 Named groups

1. Create `re1` so that the remaining statements work
1. Now make a dict of the two named entries
1. Create a named group 'quote' for the re `['"]` and use it with findall and a backreference to match the pairs of quotes in `text`
1. Apply it to `text2` and note how it breaks because the quotes do not match
1. Use the named groups as replacements in a sub to get the string `'1: I am an \'xxxx\', a "xxxx", and a \'xxxx\'.'`

In [13]:
# re1 = ...
# m = re.match(re1, "Malcolm Reynolds")
# assert ... == 'Malcolm'
# assert ... == 'Reynolds'

text = "1: I am an 'architect', a \"stonemason\", and a 'killer'."
text2 = "2: I am an 'architect\", a 'stonemason\", and a 'killer'."
# solution
re1 = r"(?P<first_name>\w+) (?P<last_name>\w+)"
m = re.match(re1, "Malcolm Reynolds")
assert m.group('first_name') == 'Malcolm'
assert m.group('last_name') == 'Reynolds'

m.groupdict()

re2 = r"(?P<quote>['\"])(.+?)(?P=quote)"
re.findall(re2, text)
re.findall(re2, text2)
# raw string, so that \g is not treated as an (invalid) string escape
re.sub(re2, r"\g<quote>xxxx\g<quote>", text)


{'first_name': 'Malcolm', 'last_name': 'Reynolds'}

[("'", 'architect'), ('"', 'stonemason'), ("'", 'killer')]

[("'", 'architect", a '), ("'", 'killer')]

'1: I am an \'xxxx\', a "xxxx", and a \'xxxx\'.'

### 002.009 Lookahead and lookbehind

1. Use lookahead to match "I am a sick man" and print it with `m.group()`
1. Change it to negative lookahead to match "I am a sick man... I am an angry man" and print it with `m.group()`
1. Use positive lookbehind to match "man... " and print it with `m.group(1)`
1. Use negative lookbehind to match "man." and print it with `m.group(1)`


In [14]:
text = "I am a sick man... I am an angry man."
# solution
# the dots are escaped so the lookahead checks for a literal "..."
the_re = r"^.+man(?=\.\.\.)"
m = re.search(the_re, text)
m.group()

the_re = r"^.+man(?!\.\.\.)"
m = re.search(the_re, text)
m.group()

the_re = r"(?<=sick) (man\W+)"
m = re.search(the_re, text)
m.group(1)

the_re = r"(?<!sick) (man\W+)"
m = re.search(the_re, text)
m.group(1)

'I am a sick man'

'I am a sick man... I am an angry man'

'man... '

'man.'
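
Two details of lookarounds are easy to miss, so here is a small sketch. Inside a lookahead an unescaped `.` still means "any character", and a lookaround never consumes what it checks. The `sample` string is hypothetical.

```python
import re

sample = "a sick man, so he says"  # hypothetical text without an ellipsis

# Unescaped: (?=...) only requires that any three characters follow "man".
loose = re.search(r"man(?=...)", sample)

# Escaped: (?=\.\.\.) requires a literal "..." after "man".
strict = re.search(r"man(?=\.\.\.)", sample)

# Zero-width: the ellipsis is checked but stays outside the match.
text = "I am a sick man... I am an angry man."
m = re.search(r"man(?=\.\.\.)", text)
matched, span = m.group(), m.span()
after_match = text[span[1]:span[1] + 3]
```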

### 002.010 Angle groups

1. Create a group 'angle' for the opening `<` of the optional angle brackets, then create `re1` so that it uses the yes-pattern|no-pattern extension notation to pass all the commented out assertions

In [15]:
pass1 = '<user@host.com>'
pass2 = 'user@host.com'
fail1 = '<user@host.com'
fail2 = 'user@host.com>'

# re1 = ...
# assert re1.match(pass1)
# assert re1.match(pass2)
# assert re1.match(fail1) is None
# assert re1.match(fail2) is None

# solution

re1 = re.compile(r"(?P<angle><)?\w+@\w+\.\w+(?(angle)>|$)")
assert re1.match(pass1)
assert re1.match(pass2)
assert re1.match(fail1) is None
assert re1.match(fail2) is None

### 002.011 Split

1. Extract `['words', 'words', 'words']` from `words` using only a list comprehension and re.split
1. The same, but also preserve the `!`
1. How do you use split, and only split, to get `['', '...', 'words', ', ', 'words', '...', '']` from `_words_`?
1. Use only split to get `` from `_words2_`
1. Split into 2 groups using the correct maxsplit argument `['words', 'words! words!']`
1. Use a flag to extract `['0', '3', '9']` from hex_color


In [33]:
words = "Words! Words! Words!"
hex_color = "0a3B9f"
_words_ = "...words, words..."
_words2_ = "...words..."

# solution
[word.lower() for word in re.split(r'\W+', words) if word]

# () preserve the match
[word.lower().strip() for word in re.split(r'(\W+)', words) if word]

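# when the captured separator matches at the start or end of the string, split adds an
# empty string there, so the separators always sit at indices 1, 3, 5...
re.split(r'(\W+)', _words_)

# \W* can also match the empty string, so split also cuts between adjacent characters
re.split(r'(\W*)', _words_)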


[word.lower() for word in re.split(r'\W+', words, maxsplit=1) if word]
[h for h in re.split('[a-f]+', hex_color, flags=re.IGNORECASE) if h]

['words', 'words', 'words']

['words', '!', 'words', '!', 'words', '!']

['', '...', 'words', ', ', 'words', '...', '']

['',
 '...',
 '',
 '',
 'w',
 '',
 'o',
 '',
 'r',
 '',
 'd',
 '',
 's',
 ', ',
 '',
 '',
 'w',
 '',
 'o',
 '',
 'r',
 '',
 'd',
 '',
 's',
 '...',
 '',
 '',
 '']

['words', 'words! words!']

['0', '3', '9']
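
Because captured separators always land at odd indices, the result of `re.split` can be taken apart with slicing. A minimal sketch, reusing the `_words_` string from above as a literal:

```python
import re

parts = re.split(r"(\W+)", "...words, words...")

tokens = parts[0::2]      # text between separators (even indices)
separators = parts[1::2]  # captured separators (odd indices)
```

The empty strings at both ends of `tokens` are what keep the two slices aligned.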

### 002.012 finditer

1. Use finditer and a loop to print all the words starting with w in text

In [59]:
text = """I am forced to the appalling conclusion that I would never have become a
    writer but for Joan's death, and to a realization of the extent to which this
    event has motivated and formulated my writing. I live with the constant threat
    of possession, and a constant need to escape from possession, from Control""".replace("\n", "")
text
# solution

matches = re.finditer(r"\b(w)\w+", text)
for match in matches:
    match.group()

"I am forced to the appalling conclusion that I would never have become a    writer but for Joan's death, and to a realization of the extent to which this    event has motivated and formulated my writing. I live with the constant threat    of possession, and a constant need to escape from possession, from Control"

'would'

'writer'

'which'

'writing'

'with'
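
`re.finditer` yields match objects rather than strings, so positions are available along with the matched text. A short sketch on a hypothetical, shortened sentence:

```python
import re

text = "I would never have become a writer"

# Pair each word starting with "w" with the index where it begins.
hits = [(m.group(), m.start()) for m in re.finditer(r"\bw\w+", text)]
```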

### 002.013 Replacement function

1. Use re.sub with a replacement function to generate `Think of an odd digit: [>>>7<<<]. Now think of an even one: [>>>6<<<]` from `text`. Call it three times to show that the digits are picked at random, so they can differ between calls

In [79]:
import random

text = "Think of an odd digit: [1]. Now think of an even one: [2]"
# solution
def replacement(matchobj):
    num = int(matchobj.group(0))
    if num % 2 == 1:
        return f">>>{random.choice([1, 3, 5, 7, 9])}<<<"
    else:
        return f">>>{random.choice([0, 2, 4, 6, 8])}<<<"
    
re.sub(r"\d", replacement, text)
re.sub(r"\d", replacement, text)
re.sub(r"\d", replacement, text)


'Think of an odd digit: [>>>5<<<]. Now think of an even one: [>>>0<<<]'

'Think of an odd digit: [>>>1<<<]. Now think of an even one: [>>>2<<<]'

'Think of an odd digit: [>>>5<<<]. Now think of an even one: [>>>0<<<]'
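
As the outputs show, two random runs can produce the same digits. If repeatable results are wanted, for example in a test, one option is a dedicated seeded generator. This is a sketch only: the seed value and the `rng` name are assumptions, not part of the kata.

```python
import random
import re

text = "Think of an odd digit: [1]. Now think of an even one: [2]"
rng = random.Random(42)  # hypothetical seed, chosen only to make runs repeatable

def replacement(matchobj):
    """Swap a digit for a random digit of the same parity."""
    num = int(matchobj.group(0))
    start = num % 2  # 1 for odd digits, 0 for even digits
    return f">>>{rng.choice(range(start, 10, 2))}<<<"

result = re.sub(r"\d", replacement, text)
```

`re.sub` calls `replacement` once per match, so each digit in `text` is replaced independently.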

### 002.014 Match