# Additional and Advanced
## More on data types

### Sets and frozen sets
* A set is an unordered collection without duplicates
* Its elements must be hashable, so it can't contain mutable objects like lists
* A set is mutable, a frozen set is immutable
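A minimal sketch of what these bullets mean in practice: duplicates disappear, unhashable elements such as lists are rejected with a `TypeError`, and a `frozenset` can itself be stored in a set because it is immutable. The variable names are only for illustration.

```python
# Duplicates are dropped automatically
nums = {3, 1, 3, 2, 1}
print(nums)                          # {1, 2, 3}

# Elements must be hashable: a list raises TypeError, a tuple is fine
try:
    bad = {[1, 2], 3}
    list_allowed = True
except TypeError as err:
    list_allowed = False
    print("TypeError:", err)

ok = {(1, 2), 3}

# A frozenset has no add/remove, but it is hashable,
# so it can be an element of another set (or a dictionary key)
fs = frozenset({1, 2})
nested = {fs, frozenset({3})}
print(frozenset({2, 1}) in nested)   # True
print(hasattr(fs, "add"))            # False
```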

In [None]:
myset = {1,2,3,1,2,3,4,5,1}
print(myset)

In [None]:
myset = set("I love Python")
print(myset)

In [None]:
myfset = frozenset("I love Python!")
print(myfset)

#### Operations

In [None]:
myset = {1,2,3,1,2,3,4,5,1}
myset.add(7)
print(myset)

In [None]:
myset.update([10,13,17])
print(myset)

In [None]:
myset.remove(1)
print(myset)

In [None]:
myset.remove(3)
print(myset)

In [None]:
v = myset.pop()   # removes and returns an arbitrary element
print(v)
print(myset)

In [None]:
myset.clear()
print(myset)
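The cells above use `remove()`, which raises a `KeyError` if the element is missing. A small sketch contrasting it with `discard()`, which silently does nothing in that case, and showing that `pop()` returns the element it takes out:

```python
s = {1, 2, 3}

s.discard(5)                 # element not present: no error
try:
    s.remove(5)              # element not present: KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

v = s.pop()                  # removes and returns an arbitrary element
print(v, s)
```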

In [None]:
s1 = {1,2,3,4,5}
s2 = {3,4,5,6,7}

In [None]:
s1.difference(s2)            # in place: s1.difference_update(s2)
s1-s2                        # in place: s1 -= s2

In [None]:
s1.symmetric_difference(s2)  # in place: s1.symmetric_difference_update(s2)
s1^s2                        # in place: s1 ^= s2

In [None]:
s1.intersection(s2)
s1&s2

In [None]:
s1.union(s2)
s1|s2

In [None]:
s1.isdisjoint(s2)

In [None]:
s1.issubset(s2)
s1<=s2

In [None]:
s1.issuperset(s2)
s1>=s2
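For reference, a sketch of the values these operations produce for `s1` and `s2`, plus the in-place operator forms, which modify the left set instead of returning a new one (copies are used here so `s1` stays unchanged):

```python
s1 = {1, 2, 3, 4, 5}
s2 = {3, 4, 5, 6, 7}

print(s1 - s2)    # {1, 2}                  only in s1
print(s1 ^ s2)    # {1, 2, 6, 7}            in exactly one of the sets
print(s1 & s2)    # {3, 4, 5}               in both
print(s1 | s2)    # {1, 2, 3, 4, 5, 6, 7}   in either

a = set(s1)
a -= s2           # same as a.difference_update(s2)
b = set(s1)
b ^= s2           # same as b.symmetric_difference_update(s2)

print(s1.isdisjoint(s2))   # False: 3, 4, 5 are shared
print({3, 4} <= s1)        # True: {3, 4} is a subset of s1
```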

### Dictionaries
* Also written with curly braces
* Key-value pairs
* Keys: must be hashable (immutable), normally numbers or strings
* Values: arbitrary
* Creation from sequences of key-value pairs (lists or tuples) possible
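A short sketch of the key rules above: any hashable object, including a tuple, can be a key, while a list cannot; and `dict.get()` returns a default instead of raising `KeyError` for a missing key. The dictionary contents are made up for illustration.

```python
pets = {"A": "cat", 1: "dog", ("x", 2): "fish"}   # str, int and tuple keys

try:
    pets[["x", 2]] = "hamster"   # a list is unhashable -> TypeError
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(pets["A"])               # cat
print(pets.get("Z"))           # None instead of a KeyError
print(pets.get("Z", "none"))   # none (custom default)

for key, value in pets.items():
    print(key, "->", value)
```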

In [None]:
dict1 = {"A":"cat", "B":"dog", "C":"fish", "D":"hamster"}

In [None]:
seq = [(1,"cat"),(2,"dog"),(3,"fish"),(4,"hamster")]
dict2 = dict(seq)

In [None]:
key_seq = [1,2,3,4]
val_seq = ("cat", "dog", "fish", "hamster")
dict3 = dict(zip(key_seq, val_seq))

In [None]:
print(dict1)
print(dict2)
print(dict3)

In [None]:
print(dict1.keys())

In [None]:
print(dict1.values())

In [None]:
print(dict1["C"])

In [None]:
dict1.update(dict2)

In [None]:
print(dict1)
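Note that `update()` overwrites the value of a key that already exists and keeps its position. A sketch with made-up values; the last lines build a merged dictionary with `**` unpacking (covered in the unpacking section), where later values also win:

```python
d = {"A": "cat", "B": "dog"}
d.update({"B": "wolf", "E": "mouse"})   # "B" is overwritten, "E" is added
print(d)                                # {'A': 'cat', 'B': 'wolf', 'E': 'mouse'}

# Building a new merged dict without changing the originals
merged = {**{"A": 1, "B": 2}, **{"B": 3}}
print(merged)                           # {'A': 1, 'B': 3}
```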

## Slice operator revisited
* **[start:end:step]**
* **end** is exclusive
* Every part is optional
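* Default for **start** is 0 (the last element if **step** is negative)
* Default for **end** is the length of the list (up to and including the first element if **step** is negative)
* Default for **step** is 1
* Parameters can be negative: negative indices count from the end, a negative **step** walks backwards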

       [El1, El2, El3, El4, El5]
         0    1    2    3    4
        -5   -4   -3   -2   -1

In [None]:
liste = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
print(liste[3:15:2])
print(liste[15:3:2])
print(liste[15:3:-2])
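The expected results of the three slices above, plus a few cases that rely on the defaults and on negative indices. `nums` holds the same values as `liste`:

```python
nums = list(range(21))          # 0, 1, ..., 20 (same as liste)

forward = nums[3:15:2]          # [3, 5, 7, 9, 11, 13]
empty = nums[15:3:2]            # []: a positive step cannot go from 15 down to 3
backward = nums[15:3:-2]        # [15, 13, 11, 9, 7, 5]

first_three = nums[:3]          # [0, 1, 2]: start defaults to 0
last_three = nums[-3:]          # [18, 19, 20]: negative start counts from the end
every_fifth = nums[::5]         # [0, 5, 10, 15, 20]
reversed_nums = nums[::-1]      # [20, 19, ..., 0]: with step -1 the defaults swap ends

print(forward, empty, backward)
```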

## Unpacking operator
* Unpacks the elements of a sequence into separate values
* **\*** for lists and tuples
* **\*\*** for dictionaries
* For example when passing arguments to a function

In [None]:
def func(a, b, c):
    return (a+b)*c

# don't name this "list": that would shadow the built-in list()
my_list = [1, 2, 3]      # or my_list = ['apple', 'strawberry', 3]
print(func(*my_list))

In [None]:
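def func2(*numbers):     # *numbers collects all positional arguments into a tuple
    total = 0            # avoid "sum", which would shadow the built-in sum()
    for item in numbers:
        total += item
    return total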

print(func2(1,2,3,4,5))

In [None]:
coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
print('Coordinates: {latitude}, {longitude}'.format(**coord))
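A sketch combining both operators with keyword arguments: `**` turns a dictionary into keyword arguments, `*` and explicit keywords can be mixed, and `**settings` in a definition collects keywords into a dictionary, just as the starred parameter of `func2` collects positional arguments. The functions `describe` and `show_settings` are hypothetical examples.

```python
def describe(name, age, city="unknown"):
    return f"{name} ({age}) from {city}"

person = {"name": "Ada", "age": 36}
print(describe(**person))                  # dict keys become keyword arguments

args = ("Alan", 41)
print(describe(*args, city="London"))      # tuple unpacked into name and age

def show_settings(**settings):
    # **settings collects all keyword arguments into a dict
    return sorted(settings.items())

print(show_settings(color="red", size=3))  # [('color', 'red'), ('size', 3)]
```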

## Lists of lists

In [None]:
listOfLists = [[0] * 4] * 5
print(listOfLists)

In [None]:
listOfLists[0][1] = 42
print(listOfLists)

In [None]:
innerList = [0] * 4
listOfLists = [innerList] * 5
innerList[1] = 42
print(listOfLists)

In [None]:
listOfLists = [[0] * 4 for i in range(5)]
listOfLists[0][1] = 42
print(listOfLists)
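Why the first version misbehaves: `[[0] * 4] * 5` repeats a reference to the *same* inner list five times, while the comprehension creates five separate lists. The `is` operator and `id()` make this visible:

```python
shared = [[0] * 4] * 5
independent = [[0] * 4 for _ in range(5)]

print(shared[0] is shared[1])                  # True: same list object
print(independent[0] is independent[1])        # False: different objects

print(len({id(row) for row in shared}))        # 1 distinct inner list
print(len({id(row) for row in independent}))   # 5 distinct inner lists
```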

### Similar issue when copying lists

In [None]:
colors = ['pink', 'orange']      
copycolors = colors 
print(colors)
print(copycolors)     

In [None]:
from IPython.display import Image
Image("pics/copy1.png", width=500)

In [None]:
copycolors = ['pink', 'red']       
print(colors)
print(copycolors)         

In [None]:
Image("pics/copy2.png", width=500)

In [None]:
copycolors = colors
copycolors[1] = 'red'
print(colors)
print(copycolors) 

In [None]:
Image("pics/copy3.png", width=500)

In [None]:
copycolors = colors[:]
copycolors[1] = 'red'
print(colors)
print(copycolors) 
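Slicing with `[:]` is one of several equivalent ways to make a (shallow) copy of a list; a sketch showing that each of them leaves the original unchanged when the copy is modified:

```python
import copy

colors = ['pink', 'orange']
copies = [colors[:], colors.copy(), copy.copy(colors)]

for c in copies:
    c[1] = 'red'           # modify each copy

print(colors)              # ['pink', 'orange']: the original is untouched
print(copies[0])           # ['pink', 'red']
```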

#### Same problem again when copying a list that contains lists

In [None]:
colors = ['pink', 'orange', ['green', 'blue']] 
copycolors = colors[:]  
print(colors)
print(copycolors)   

In [None]:
copycolors[1] = 'red'
print(colors)
print(copycolors)  

In [None]:
copycolors[2][1] = 'violet'
print(colors)
print(copycolors)  

Same problem again when changing sublists! `colors[:]` is only a *shallow* copy: the outer list is new, but the inner list is shared by both.

Solution: **deepcopy** from module **copy**

In [None]:
from copy import deepcopy

colors = ['pink', 'orange', ['green', 'blue']] 
copycolors = deepcopy(colors)   
print(colors)
print(copycolors)  

In [None]:
copycolors[2][1] = 'violet'
print(colors)
print(copycolors)  
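The difference between a shallow and a deep copy can be checked directly with `is`: a shallow copy (`copy.copy` or `[:]`) shares the inner list with the original, `deepcopy` copies it as well.

```python
import copy

colors = ['pink', 'orange', ['green', 'blue']]
shallow = copy.copy(colors)
deep = copy.deepcopy(colors)

print(shallow is colors)          # False: new outer list
print(shallow[2] is colors[2])    # True: inner list is shared
print(deep[2] is colors[2])       # False: inner list was copied too
```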

## Generators
* If **range(0,1000000000)** returned a list, it would need lots of memory
* But in Python 3 **range()** returns a lazy range object, not a list
* Each element is computed only when needed
* Generator functions produce their values with one or more **yield** statements instead of a single **return** value
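A quick check of the memory claim, as a sketch: a `range` object stores only start, stop and step, so its size does not depend on its length, yet it can still be indexed and tested for membership without building the list. (The exact byte count returned by `sys.getsizeof` is a CPython detail.)

```python
import sys

big = range(0, 1_000_000_000)
small = range(0, 10)

print(sys.getsizeof(big) == sys.getsizeof(small))   # True: size independent of length
print(big[123_456_789])                             # 123456789, computed on demand
print(999_999_999 in big)                           # True, checked arithmetically
print(len(big))                                     # 1000000000
```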

In [None]:
def pizza_generator():
    yield "Yeast"
    yield "Flour"
    yield "Water"
    yield "Salt"
    yield "Tomatoes"
    yield "Spices"
    yield "Mozzarella"

print(pizza_generator())     # a generator object: no value has been produced yet

for item in pizza_generator():
    print(item)

pizza = pizza_generator()
print(next(pizza))
print(next(pizza))
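Generators keep their state between `next()` calls and can be consumed only once; when no `yield` is left, `next()` raises `StopIteration` (which a `for` loop handles silently). A sketch with a hypothetical `countdown` generator:

```python
def countdown(n):
    # n is remembered between calls to next()
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
values = [x for x in gen]      # [3, 2, 1]: consumes the generator
again = [x for x in gen]       # []: a generator can be iterated only once

try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(values, again, exhausted)
```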

## Comprehensions
For lists using **[]**, sets using **{}**, dictionaries using **{key: value}** and generators (generator expressions) using **()**

In [None]:
colors = ["red", "green", "yellow", "blue"]
things = ["house", "car", "tree"]
colored_things = [(x,y) for x in colors for y in things]
print(colored_things)

In [None]:
# Set of all sums of two numbers between 0 and 2:
{x+y for x in range(0,3) for y in range(0,3)}
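A few more forms, as a sketch: a dictionary comprehension, a filter with `if`, and a generator expression, which computes its values lazily (so the huge range below is never materialised):

```python
squares = {n: n * n for n in range(5)}          # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
evens = [n for n in range(10) if n % 2 == 0]    # [0, 2, 4, 6, 8]

lazy = (n * n for n in range(10**9))            # nothing computed yet
first_three = [next(lazy) for _ in range(3)]    # [0, 1, 4]

total = sum(n * n for n in range(5))            # 30; extra parentheses not needed inside a call
print(squares, evens, first_three, total)
```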

## Function arguments
* Duck typing is heavily used!
* Function parameters don't have a fixed type: any object that provides the needed methods works
* "When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck."

In [None]:
class Duck:
    def quack(self):
        print("Quack, quack!")
    def fly(self):
        print("Flap, Flap!")

class Person:
    def quack(self):
        print("I'm Quackin'!")
    def fly(self):
        print("I'm Flyin'!")

def in_the_forest(mallard):
    mallard.quack()
    mallard.fly()

in_the_forest(Duck())
in_the_forest(Person())
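The same idea applies to built-in types: the hypothetical `count_lines` below only needs something it can iterate over, so a list of strings and an in-memory file from `io.StringIO` both work, while an integer fails only when the missing behaviour is actually used:

```python
import io

def count_lines(source):
    # Works with any object that can be iterated over
    return sum(1 for _ in source)

print(count_lines(["a\n", "b\n"]))            # 2
print(count_lines(io.StringIO("x\ny\nz\n")))  # 3

try:
    count_lines(42)                           # int is not iterable
    int_ok = True
except TypeError:
    int_ok = False
```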

## Closures
* Nested function (a function in a function)
* Inner function is returned by the outer one as a function object
* Returned function can still refer to variables defined by the outer function even after the outer function has returned

In [None]:
def gen_power_func(n):      #outer function
    def nth_power(x):       #inner function / closure
        return x**n         #n defined in outer function!!
    return nth_power

to_4 = gen_power_func(4)
print(to_4(2))              #it still knows n = 4!
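Two more details, as a sketch: the captured value is stored in the function's `__closure__` cells, and to *rebind* (not just read) a variable of the outer function, the inner one needs `nonlocal`. `make_counter` is a hypothetical example.

```python
def gen_power_func(n):
    def nth_power(x):
        return x ** n
    return nth_power

cube = gen_power_func(3)
print(cube(2))                            # 8
print(cube.__closure__[0].cell_contents)  # 3: the captured n

def make_counter():
    count = 0
    def increment():
        nonlocal count      # without this, count += 1 raises UnboundLocalError
        count += 1
        return count
    return increment

counter = make_counter()
counter()
counter()
print(counter())            # 3
```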

## Decorators
* Objects that modify a function, method or class definition
* Basically again a function that takes a function as argument and returns a (possibly new) function
* Built-in examples: **@property, @x.setter, @staticmethod**
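A sketch of the built-in decorators named above, using a hypothetical `Temperature` class: `@property` turns a method into an attribute, `@celsius.setter` adds validation on assignment, and `@staticmethod` defines a function that needs neither `self` nor the class.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius            # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @staticmethod
    def c_to_f(c):
        return c * 9 / 5 + 32

t = Temperature(20)
print(t.celsius)                 # 20
print(Temperature.c_to_f(100))   # 212.0

try:
    t.celsius = -300             # rejected by the setter
    accepted = True
except ValueError:
    accepted = False
```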

In [None]:
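def simple_decorator(function):
    print("doing decoration")    # runs once, when the decorated function is defined
    return function

@simple_decorator                # same as: function = simple_decorator(function)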
def function():
    print("inside function")

function()

In [None]:
import time
def timing_function(some_function):
    def wrapper():
        t1 = time.time()
        some_function()
        t2 = time.time()
        return "Time it took to run the function: " + str((t2 - t1)) + "\n"
    return wrapper

In [None]:
@timing_function
def my_function():
    num_list = []
    for num in range(0, 10000):
        num_list.append(num)
    print("\nSum of all the numbers: " + str(sum(num_list)))

In [None]:
print(my_function())
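The `wrapper` above only works for functions without arguments and replaces the result with a string. A more general sketch (the name `timed` is hypothetical) accepts any arguments via `*args, **kwargs`, returns the original result, and uses `functools.wraps` so the decorated function keeps its name and docstring:

```python
import functools
import time

def timed(func):
    @functools.wraps(func)                 # copy __name__, __doc__, ... from func
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result                      # pass the original result through
    return wrapper

@timed
def add(a, b):
    """Add two numbers."""
    return a + b

print(add(2, 3))        # timing line, then 5
print(add.__name__)     # add
```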

## Documentation
* Comments: Use them - but use them wisely
* Docstrings: a string literal in triple quotes as the first statement, by convention `"""docstring"""` (PEP 257)
    * Beginning of a module (file): Description of content
    * After class definition: Description of class
    * After function definition: Description of function (what it does, what parameters it takes...)
* Docstrings are shown when using `help()` or `pydoc`
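A minimal example of a function docstring in the PEP 257 style; `gc_content` is a hypothetical function chosen to match the DNA examples that follow. The docstring is stored in `__doc__`, which is what `help()` displays:

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence.

    seq -- a non-empty string of bases, e.g. "ACGT"
    """
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_content("ACGT"))                   # 0.5
print(gc_content.__doc__.splitlines()[0])   # first line of the docstring
# help(gc_content) would display the full docstring
```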

## Regular Expressions
Challenge: 
* Find sequence *ACGTG* or *GCGTG*  --> 2 possibilities
* Find a sequence starting with A or G, then C or T, then 1 to 4 Cs, then T, then any base, then optionally a G, and ending with any base but A --> $ 2 \cdot 2 \cdot 4 \cdot 1 \cdot 4 \cdot 2 \cdot 3 = 384$ possibilities
* Find any sequence starting with A and ending with T --> infinite possibilities
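The count of 384 for the second challenge can be checked by brute force: list the allowed options for each position, build every combination with `itertools.product`, and count the distinct sequences. A sketch, assuming the alphabet A, C, G, T:

```python
from itertools import product

# One entry of allowed options per position of the challenge pattern
choices = [
    "AG",                               # starts with A or G
    "CT",                               # then C or T
    ["C" * k for k in range(1, 5)],     # then 1 to 4 Cs
    "T",                                # then T
    "ACGT",                             # then any base
    ["", "G"],                          # then optionally a G
    "CGT",                              # ends with any base but A
]

sequences = {"".join(parts) for parts in product(*choices)}
print(len(sequences))   # 384 distinct sequences
```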

Solution: Regular Expressions:
* `.` any character
* `[ACG]` any (one!) of A, C, G
* `[B-O]` any (one!) of the characters from B to O (case sensitive)
* `[^A]` any (one!) but A
* `{n,m}` preceding item repeated n to m times
    * `?` is `{0,1}` optional
    * `*` is `{0,}` 0 or more
    * `+` is `{1,}` at least 1
* `^A` A at the beginning of the string (or of each line in multiline mode)
* `A$` A at the end of the string (or of each line in multiline mode)
* `(...)` subexpressions (groups)

Examples:
* Any sequence starting with A and ending with T: `A.*T`
* A sequence starting with A or G, then C or T, then 1 to 4 Cs, then T, then any base, then optionally a G, and ending with any base but A: `[AG][CT]C{1,4}T.G?[^A]`
* A G at the beginning of the sequence and a T at the end of the sequence: `^G.*T$`
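A sketch testing patterns for these three examples on made-up sequences with `re.search` (introduced in the next section). Without `^` and `$` a pattern may match anywhere inside the sequence; the anchors in the last pattern force the whole string to start with G and end with T:

```python
import re

seq = "TTGACCTAGCTT"

a_to_t = r"A.*T"                      # starts with A, ends with T
motif = r"[AG][CT]C{1,4}T.G?[^A]"     # the 384-possibility challenge
g_to_t = r"^G.*T$"                    # whole string: G first, T last

print(re.search(a_to_t, seq).group())   # ACCTAGCTT (.* is greedy)
print(re.search(motif, seq).group())    # ACCTAGC
print(re.search(g_to_t, seq))           # None: seq starts with T
print(re.search(g_to_t, "GATTACAT"))    # a match object for 'GATTACAT'
```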

### RegExp in Python
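* A regular expression is usually written as a raw string (a string with a prepended r), so backslashes reach the regex engine unchanged: `r"Some.Reg*[Ex]?"`
* Module `re`
* `matchobject = re.search(regex, string)` returns a match object, or `None` if nothing matches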
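Why the `r` prefix matters, and what `re.search` returns when nothing matches; a small sketch:

```python
import re

print(r"\d" == "\\d")         # True: in a raw string the backslash stays literal
print(len("\n"), len(r"\n"))  # 1 2: newline character vs. backslash + n

m = re.search(r"\d+", "Room 101")
if m:                         # re.search returns None if there is no match
    print(m.group())          # 101
print(re.search(r"\d+", "no digits here"))   # None
```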

In [None]:
import re
s1 = "Meier is a common name."
s2 = "But we have many ways to write Mayr."

In [None]:
o=re.search(r"M[ae][iy]e?r", s1) 
print(o)

In [None]:
o.group()

In [None]:
o.span()
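Parentheses create subexpressions (groups) whose parts can be read from the match object individually; `group(0)` is the whole match. A sketch on `s1` from above (redefined so the cell runs on its own):

```python
import re

s1 = "Meier is a common name."
m = re.search(r"(M[ae][iy]e?r) is a (\w+)", s1)

print(m.group(0))   # Meier is a common
print(m.group(1))   # Meier
print(m.group(2))   # common
print(m.groups())   # ('Meier', 'common')
print(m.span(1))    # (0, 5)
```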

In [None]:
s = s2 + "\n" + s1
print(s)

In [None]:
o=re.search(r"^M[ae][iy]e?r", s)       #only beginning of whole string
print(o)

In [None]:
o=re.search(r"^M[ae][iy]e?r", s, re.M) #M for MULTILINE; beginning of every line
print(o)

In [None]:
s = s1 + " " + s2
print(s)
re.findall(r"M[ae][iy]e?r", s)

In [None]:
re.sub(r"M[ae][iy]e?r", "Müller", s)

In [None]:
s = "A,B-C.D E"
re.split(r"[ ,.-]", s)

#### Character classes 
* `\d = [0-9]` decimal digits `\D` complement
* `\s = [ \t\n\r\f\v]` all whitespace `\S` complement
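* `\w = [a-zA-Z0-9_]` alphanumeric plus underscore (for str patterns Python 3 also matches Unicode letters and digits unless `re.ASCII` is set) `\W` complement
* `\b` empty string at the beginning or end of a word
* `\B` empty string not at the beginning or end of a word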
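A sketch of these classes in action; note how `\b` and `\B` distinguish a whole word from the same letters inside a longer word, and how `re.ASCII` changes what `\w` matches:

```python
import re

print(re.findall(r"\d+", "Call 555-1234 or 555-9876"))   # ['555', '1234', '555', '9876']
print(re.split(r"\s+", "a  b\tc\nd"))                    # ['a', 'b', 'c', 'd']

text = "cat catalog bobcat"
print(re.findall(r"\bcat\b", text))    # ['cat']: only the whole word
print(re.findall(r"\Bcat", text))      # ['cat']: the one inside 'bobcat'

print(re.findall(r"\w+", "Müller"))              # ['Müller']: Unicode letters count
print(re.findall(r"\w+", "Müller", re.ASCII))    # ['M', 'ller']
```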