# Day 3: Useful stuff

## Itertools



On day 1, we talked about iterables and iterators. As a reminder:
- An iterable is an object that can be iterated over (e.g. a list); it has an `__iter__` method that returns an iterator.
- An iterator is the object that actually produces the values one at a time (e.g. the result of calling `iter()` on a list); it has a `__next__` method and an `__iter__` method that returns the iterator itself.

A bit more information about them from the [Python documentation](https://docs.python.org/3/glossary.html#term-iterable):

- **iterable**
    An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an `__iter__()` method or with a `__getitem__()` method that implements sequence semantics.

    Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.

- **iterator**
    An object representing a stream of data. Repeated calls to the iterator’s `__next__()` method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its `__next__()` method just raise StopIteration again. Iterators are required to have an `__iter__()` method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
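
The difference between the two definitions is easiest to see in a small experiment. The following is a minimal sketch (the variable names are only for illustration): a list hands out a fresh iterator every time, while an iterator is used up after one pass.

```python
numbers = [10, 20, 30]            # a list is an iterable, but not an iterator

iterator = iter(numbers)          # iter() asks the iterable for a fresh iterator
first = next(iterator)            # 10
remaining = list(iterator)        # consumes the rest: [20, 30]
second_pass = list(iterator)      # the iterator is exhausted now: []

fresh_pass = list(iter(numbers))  # the list hands out a new iterator: [10, 20, 30]

print(first, remaining, second_pass, fresh_pass)
```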

Itertools is a module that provides a set of functions that operate on iterables. It is part of the standard library, so you don't need to install it. It creates more complex and powerful iterators from simpler ones. Because these iterators produce their values lazily, you can chain them without creating intermediate lists, which usually saves memory and often time.

Why would you want to do that? Well, think about some of the functions like `enumerate()` or `zip()` that we have already seen. They take iterables/iterators as an argument and return more powerful constructs. Itertools provides a lot of other functions that do similar things, but in a more general way.
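
To illustrate the point about avoiding intermediate lists, here is a small sketch of a lazy pipeline. It starts from an infinite stream, so building a list first would not even be possible; only the five values we ask for are ever computed. The variable names are made up for this example.

```python
import itertools

# count() is infinite; nothing is computed until a value is requested
even_numbers = (n for n in itertools.count() if n % 2 == 0)
squares = map(lambda n: n * n, even_numbers)

# islice() pulls exactly five values through the whole pipeline
first_five = list(itertools.islice(squares, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```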

First let's look at a short glossary of useful functions in itertools (from [here](https://jmduke.com/2013/11/29/itertools)) and then we'll look at how we can use them in the wild.

In [15]:
import itertools

letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [1, 0, 1, 0, 0, 1]
numbers = [23, 20, 44, 32, 7, 12]
decimals = [0.1, 0.7, 0.4, 0.4, 0.5]

In [16]:
#chain() - combines several iterables into one long one
print(itertools.chain(letters, booleans, decimals))
print(list(itertools.chain(letters, booleans, decimals)))

#chain.from_iterable() - flattens one iterable of iterables
print(list(itertools.chain.from_iterable([letters, booleans, decimals])))

<itertools.chain object at 0x107ca6e20>
['a', 'b', 'c', 'd', 'e', 'f', 1, 0, 1, 0, 0, 1, 0.1, 0.7, 0.4, 0.4, 0.5]
['a', 'b', 'c', 'd', 'e', 'f', 1, 0, 1, 0, 0, 1, 0.1, 0.7, 0.4, 0.4, 0.5]


In [17]:
#count() - infinite iterator, returns evenly spaced values starting with the number you pass in
counter = itertools.count(10, 0.1)
print(next(counter))
print(next(counter))

10
10.1


In [22]:
#repeat() - infinite iterator, returns the element you pass in over and over again (can be made finite with times argument)
repeater = itertools.repeat('On', 3)
print(next(repeater))
print(next(repeater))
print(next(repeater))
# print(next(repeater))

On
On
On


In [None]:
#cycle() - infinite iterator, returns elements from the iterable you pass in over and over again
cycle_counter = itertools.cycle(['On', 'Off'])
print(next(cycle_counter))
print(next(cycle_counter))
print(next(cycle_counter))

In [18]:
#compress() - filters one iterable with another
print(list(itertools.compress(letters, booleans)))


['a', 'c', 'f']


In [7]:
#dropwhile() - returns an iterator that returns elements of the iterable after a certain condition becomes false for the first time
print(list(itertools.dropwhile(lambda x: x<5, [1, 4, 6, 4, 1])))
print(list(itertools.dropwhile(lambda x: x<5, [1, 4, 0, 0, 1])))


[6, 4, 1]
[]


In [8]:
#filterfalse() - returns an iterator that returns elements of the iterable for which the passed in function returns false
print(list(itertools.filterfalse(lambda x: x%2, range(10))))

[0, 2, 4, 6, 8]


In [9]:
#zip_longest() - returns an iterator that aggregates elements from two or more iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted
print(list(itertools.zip_longest('abcdefg', range(3), fillvalue='?')))

[('a', 0), ('b', 1), ('c', 2), ('d', '?'), ('e', '?'), ('f', '?'), ('g', '?')]


In [12]:
#islice() - returns an iterator that returns selected elements from the iterable. Besides the iterable it takes start, stop, and step arguments, like a slice
print(list(itertools.islice('abcdefg', 0, None, 2)))

['a', 'c', 'e', 'g']


In [23]:
#starmap() - returns an iterator that computes the function using arguments obtained from the iterable
print(list(itertools.starmap(pow, [(2,5), (3,2), (10,3)])))

[32, 9, 1000]


In [29]:
#tee() - returns several independent iterators (defaults to 2) based on a single original input
it = itertools.tee(range(5), 3)
print(it)
print(list(it[0]))
print(list(it[1]))
print(list(it[2]))

(<itertools._tee object at 0x10d08b1c0>, <itertools._tee object at 0x10d087940>, <itertools._tee object at 0x107b9b9c0>)
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


It is also interesting to note that some functions that used to live in itertools are covered by built-in functions in Python 3. In Python 2, the built-ins `zip()`, `map()` and `filter()` returned lists, which made them slower and less memory efficient for large inputs, while itertools offered the lazy variants `izip()`, `imap()` and `ifilter()`. In Python 3 the built-ins themselves return iterators, so the itertools variants were removed. You can find more information about itertools [here](https://docs.python.org/3/library/itertools.html#itertools-recipes).

Now let's look at some [recipes](https://docs.python.org/3/library/itertools.html#itertools-recipes) from the itertools documentation. All of these and more are implemented in the `more_itertools` library, which you can [install](https://more-itertools.readthedocs.io/en/latest/index.html) with `pip install more_itertools`.

In [14]:
def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while batch := tuple(itertools.islice(it, n)):
        yield batch

print(list(batched('ABCDEFGHIJKLMNOP', 3)))

[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I'), ('J', 'K', 'L'), ('M', 'N', 'O'), ('P',)]


In [31]:
def repeatfunc(func, times=None, *args):
    """Repeat calls to func with specified arguments.

    Example:  repeatfunc(random.random)
    """
    if times is None:
        return itertools.starmap(func, itertools.repeat(args))
    return itertools.starmap(func, itertools.repeat(args, times))

print(list(repeatfunc(pow, 5, 2, 5)))

[32, 32, 32, 32, 32]


In [26]:
import collections

def sliding_window(iterable, n):
    # sliding_window('ABCDEFG', 4) --> ABCD BCDE CDEF DEFG
    it = iter(iterable)
    window = collections.deque(itertools.islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for x in it:
        window.append(x)
        yield tuple(window)

print(list(sliding_window('ABCDEFG', 4)))

[('A', 'B', 'C', 'D'), ('B', 'C', 'D', 'E'), ('C', 'D', 'E', 'F'), ('D', 'E', 'F', 'G')]


In [32]:
def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = itertools.tee(iterable)
    return itertools.filterfalse(pred, t1), filter(pred, t2)

result = partition(lambda x: x%2, range(10))
print(result)
print(list(result[0]))
print(list(result[1]))

(<itertools.filterfalse object at 0x107da8f40>, <filter object at 0x107da8c40>)
[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]


In [34]:
set([1,2,4,1,2,3])

{1, 2, 3, 4}

## Functools



Functools is another module in the standard library, so you don't need to install it either. It provides tools for working with functions and other callables, and mostly deals with higher-order functions, i.e. functions that take other functions as arguments or return functions as their result. We will look at some of them here. You can find more information about them [here](https://docs.python.org/3/library/functools.html) and in [this video](https://www.youtube.com/watch?v=gX9krgyGw1k) with [slides](https://sjirwin.github.io/hitchhikers-guide-to-functools/#/2), which I will follow.

### Simplifying function signatures: partial, partialmethod

`partial` is a function that takes a function and some of its arguments and returns a new function with the arguments already filled in. This is useful when you have a function that takes a lot of arguments, but you want to use it with some of the arguments already filled in. For example, let's say we have a function that takes 2 arguments:

In [1]:
from functools import partial

def add(a, b):
    return a + b

add_one = partial(add, 1)
print(add_one(4))

5


This might seem like a silly example, but it is useful when you have a function that takes a lot of arguments, but you want to use it with some of the arguments already filled in to make your code more readable. [This blog](https://chriskiehl.com/article/Cleaner-coding-through-partially-applied-functions) has some good examples of this. Here is one:

In [7]:
#more complex example where we use partial to create a function
import re 
#re.search(pattern, string, flags=0) 

#call re.search directly, without partial
#(raw strings keep the backslash in \s intact; '=' does not need escaping)

print(re.search(r'[a-zA-Z]\s=', 'a = 1'))
print(re.search(r'[a-zA-Z]\s=', 'a=1'))
print(re.search(r'[a-zA-Z]=', 'a=1'))

is_spaced_apart = partial(re.search, r'[a-zA-Z]\s=')
is_grouped_together = partial(re.search, r'[a-zA-Z]=')

print(is_spaced_apart('a = 1'))
print(is_spaced_apart('a=1'))

#have some long text to work with
text = 'x=5 \n y = 118\n Somelinesdonothavebreaks\n z = 17'

for line in text.splitlines():
    if is_spaced_apart(line):
        print(line)

<re.Match object; span=(0, 3), match='a ='>
None
<re.Match object; span=(0, 2), match='a='>
<re.Match object; span=(0, 3), match='a ='>
None
 y = 118
 z = 17
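
`partial` can also fix keyword arguments, not only positional ones. The following small sketch uses the built-in `int` (the name `parse_binary` is made up for this example) and also shows that the resulting object remembers what it was built from:

```python
from functools import partial

# Fix the keyword argument base=2, leaving the string to parse open
parse_binary = partial(int, base=2)

value = parse_binary('10010')
print(value)  # 18

# A partial object exposes the wrapped function and the frozen arguments
print(parse_binary.func, parse_binary.args, parse_binary.keywords)
```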


`partialmethod` is similar to `partial`, but it is meant to be used inside a class body to define new methods. The resulting method still receives the instance as its first argument (`self`), and the arguments you fixed are filled in after it. This is useful when you want to offer convenient shortcuts for a more general method. For example, let's say we have a class with a method `set_state` that takes one argument in addition to `self`:

In [None]:
from functools import partialmethod
class Cell:
    def __init__(self):
        self._alive = True

    @property
    def alive(self):
        return self._alive

    def set_state(self, state):
        self._alive = bool(state)

    set_alive = partialmethod(set_state, True)
    set_dead = partialmethod(set_state, False)

c = Cell()
print(c.alive)
c.set_dead()
print(c.alive)

### Function wrappers: wraps, update_wrapper

Sometimes you want to create a function that wraps another function, typically in a decorator. This lets you add functionality to a function without changing its signature (e.g. for compatibility reasons). The catch is that the wrapper replaces the original function, so metadata such as its name and docstring is lost. This is where `wraps` comes in: it is a decorator that you apply to the wrapper function, and it copies the metadata of the original function onto the wrapper. It is basically a convenience around `update_wrapper`. Here is an example of what happens without it (note that `@wraps` is commented out):

In [12]:
#example for wraps
from functools import wraps

def my_decorator(func):
    #@wraps(func)
    def wrapper(*args, **kwargs):
        print('Calling decorated function')
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def example():
    """Docstring"""
    print('Called example function')

example()
print(example.__name__)
print(example.__doc__)

Calling decorated function
Called example function
wrapper
None


This example is probably the most common use case for `wraps`. We have a decorator that takes a function and returns a new function that wraps the original function. We want the function modified by the decorator to have new functionality, but still keep its old name and docstring. This is where `wraps` comes in.

In [1]:
#example for wraps
from functools import wraps

def my_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print('Calling decorated function')
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def example():
    """Docstring"""
    print('Called example function')

example()
print(example.__name__)
print(example.__doc__)

Calling decorated function
Called example function
example
Docstring
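
Since `wraps` is a convenience around `update_wrapper`, the same decorator can be written by calling `update_wrapper` directly. This is only a sketch with made-up names (`logged`, `greet`); it also shows the `__wrapped__` attribute, which gives access to the undecorated function.

```python
from functools import update_wrapper

def logged(func):
    def wrapper(*args, **kwargs):
        print('Calling decorated function')
        return func(*args, **kwargs)
    # Copies __name__, __doc__ etc. onto wrapper and returns wrapper
    return update_wrapper(wrapper, func)

@logged
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}"

print(greet.__name__)           # greet
print(greet.__doc__)            # Return a greeting.
print(greet.__wrapped__("Ada")) # calls the original, without the extra print
```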


### Caching: lru_cache, cache, cached_property

`lru_cache` is a decorator that caches the results of a function, keyed by its arguments (which therefore have to be hashable). This is useful when you have a function that takes a long time to run and is called multiple times with the same arguments. With `maxsize=None` the cache can grow without bound; `cache` (added in Python 3.9) is a shorthand for exactly that. Here is an example:

In [None]:
from functools import lru_cache

@lru_cache(maxsize=None)
def factorial(n):
    return n * factorial(n-1) if n else 1

print(factorial(10))  # Output: 3628800
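
To see that the cache is actually being used, a cached function offers `cache_info()`. The following sketch uses a recursive Fibonacci function (not part of the example above), where caching turns an exponential number of calls into a linear one:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(30)
info = fib.cache_info()

print(result)  # 832040
print(info)    # each n from 0 to 30 is computed once; the other calls are cache hits
```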


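`cached_property` is similar to `lru_cache`, but it is used for properties: the value is computed on first access and then stored on the instance, so later accesses do not run the method again.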

In [None]:
from functools import cached_property

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @cached_property
    def area(self):
        print("Calculating area...")
        return 3.14 * self.radius ** 2

c = Circle(5)
print(c.area)  # Output: Calculating area... 78.5
print(c.area)  # Output: 78.5 (No calculation is performed this time)

### Ordered types and comparisons: total_ordering, cmp_to_key

`total_ordering` is a class decorator that fills in the missing rich comparison methods (`__lt__`, `__le__`, `__gt__`, `__ge__`). The class has to define at least one of them and should also define `__eq__`. This is useful when you have a class that implements some of the comparison methods, but not all of them. `cmp_to_key`, shown in the second cell, solves a different problem: it converts an old-style comparison function (one that returns a negative number, zero, or a positive number) into a key function that can be passed to `sorted()`. Here are examples of both:

In [None]:
from functools import total_ordering

@total_ordering
class Student:
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade

    def __eq__(self, other):
        return self.grade == other.grade

    def __lt__(self, other):
        return self.grade < other.grade

alice = Student('Alice', 90)
bob = Student('Bob', 85)

print(alice > bob)  # Output: True

In [None]:
from functools import cmp_to_key

def reverse_numeric(x, y):
    return y - x

sorted_array = sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
print(sorted_array)  # Output: [5, 4, 3, 2, 1]
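
A comparison function becomes more interesting when it combines several criteria. Here is a small sketch (the function name and word list are made up) that sorts strings by length first and alphabetically second:

```python
from functools import cmp_to_key

def by_length_then_alpha(a, b):
    # Negative: a sorts first; positive: b sorts first; zero: equal
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)

words = ['pear', 'fig', 'apple', 'kiwi']
ordered = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(ordered)  # ['fig', 'kiwi', 'pear', 'apple']
```

In this particular case `sorted(words, key=lambda w: (len(w), w))` gives the same result; `cmp_to_key` is mainly helpful when the comparison cannot easily be expressed as a key.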

### reduce

`reduce` takes a function of two arguments, an iterable and an optional initializer, and returns a single value. It applies the function cumulatively from left to right, e.g. `reduce(f, [a, b, c])` computes `f(f(a, b), c)`. Here is an example:

In [None]:
from functools import reduce

def multiply(x, y):
    return x * y

product = reduce(multiply, [1, 2, 3, 4, 5])
print(product)  # Output: 120
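
The optional initializer matters mostly for empty inputs. A short sketch, using the functions from the standard `operator` module instead of a hand-written `multiply`:

```python
from functools import reduce
import operator

# With an initializer, an empty iterable simply returns the initializer
total_of_nothing = reduce(operator.add, [], 0)
product = reduce(operator.mul, [1, 2, 3, 4, 5], 1)

# Without an initializer, an empty iterable raises a TypeError
try:
    reduce(operator.add, [])
    raised = False
except TypeError:
    raised = True

print(total_of_nothing, product, raised)  # 0 120 True
```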

### Function Overloading: singledispatch, singledispatchmethod

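Function overloading is a technique that allows you to define multiple functions with the same name, but different signatures. Python has no built-in overloading, but `singledispatch` comes close: it turns a function into a generic function whose implementation is chosen by the type of the first argument. The decorated function is the fallback, and further implementations are added with `register`. For example, let's say we have a function that prints its argument differently depending on its type: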

In [None]:
from functools import singledispatch

@singledispatch
def fun(arg, verbose=False):
    if verbose:
        print("Let me just say,", end=" ")
    print(arg)

@fun.register
def _(arg: int, verbose=False):
    if verbose:
        print("Strength in numbers, eh?", end=" ")
    print(arg)

@fun.register
def _(arg: list, verbose=False):
    if verbose:
        print("Enumerate this:")
    for i, elem in enumerate(arg):
        print(i, elem)

fun("Hello, world.")
fun(42, verbose=True)
fun(['apple', 'banana', 'cherry'], verbose=True)

`singledispatchmethod` is similar to `singledispatch`, but it is used for methods inside a class. Dispatch then happens on the type of the first argument after `self` (or `cls`). For example, let's say we have a class with a method that negates integers and reverses strings:

In [None]:
from functools import singledispatchmethod

class Negator:
    @singledispatchmethod
    def negate(self, arg):
        raise NotImplementedError("Cannot negate this type")

    @negate.register
    def _(self, arg: int):
        return -arg

    @negate.register
    def _(self, arg: str):
        return arg[::-1]

n = Negator()
print(n.negate(5))  # Output: -5
print(n.negate("Hello"))  # Output: olleH

## Dataclasses

Dataclasses is a module in the standard library (so you don't need to install it) that provides a decorator and functions for creating classes that are mainly used to store data. It generates functionality such as `__init__`, `__repr__` and `__eq__` that we would otherwise have to write ourselves over and over again, so it helps us follow the DRY principle (don't repeat yourself). Dataclasses also play nicely with type hints, which we discussed last time, to provide very readable code. The module was introduced fairly late, in Python 3.7, so there are several alternatives for achieving the same functionality, with different pros and cons:

- [attrs](https://www.attrs.org/en/stable/) is a third-party library that provides a decorator and functions for creating classes that are used to store data. It is very similar to dataclasses, but it has some additional features like validators and converters
- [namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple) is a factory function in the standard library's `collections` module that creates a tuple subclass with named fields. It is immutable, so you can't change the values of the fields after creation
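
The practical difference in mutability between a namedtuple and a (default) dataclass can be sketched as follows. The two point classes are hypothetical and only serve as an illustration:

```python
from collections import namedtuple
from dataclasses import dataclass

PointTuple = namedtuple('PointTuple', ['x', 'y'])

@dataclass
class PointData:
    x: float
    y: float

point_tuple = PointTuple(1.0, 2.0)
point_data = PointData(1.0, 2.0)

point_data.x = 5.0  # dataclass instances are mutable by default

try:
    point_tuple.x = 5.0  # namedtuple fields cannot be reassigned
    tuple_is_immutable = False
except AttributeError:
    tuple_is_immutable = True

print(point_data, point_tuple, tuple_is_immutable)
```

A namedtuple also still behaves like a tuple, so `point_tuple == (1.0, 2.0)` is true, while a dataclass instance only compares equal to instances of the same class.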


### Using dataclasses

Let's look at an example of how we can use dataclasses. We will create a class that represents a codon. It will have a triplet, an amino acid, a frequency and an optional info field. We will use type hints to make our code more readable.

In [9]:
from dataclasses import dataclass
from typing import Any

@dataclass
class Codon:
    triplet: str
    amino_acid: str
    frequency: float
    info: Any = None


In [11]:
#create a few codons and inspect one of them
#(arguments are matched by position: triplet, amino_acid, frequency, info)
att = Codon('ATT', 'Isoleucine', 0.49)
gtc = Codon('GTC', 'Valine', 0.20)
tag = Codon('TAG', 'Stop', 0.11, 'Stop codon')
gtc

Codon(triplet='GTC', amino_acid='Valine', frequency=0.2, info=None)

In [None]:
from typing import List

@dataclass
class CodonTable:
    cards: List[Codon]

In [13]:
def make_ecoli_codon_table():
    #construct a codon table from scratch using the Codon dataclass
    codons = ['UUU', 'UUC', 'UUA', 'UUG', 'UCU', 'UCC', 'UCA', 'UCG', 'UAU', 'UAC', 'UAA', 'UAG', 'UGU', 'UGC', 'UGA', 'UGG', 
          'CUU', 'CUC', 'CUA', 'CUG', 'CCU', 'CCC', 'CCA', 'CCG', 'CAU', 'CAC', 'CAA', 'CAG', 'CGU', 'CGC', 'CGA', 'CGG',
          'AUU', 'AUC', 'AUA', 'AUG', 'ACU', 'ACC', 'ACA', 'ACG', 'AAU', 'AAC', 'AAA', 'AAG', 'AGU', 'AGC', 'AGA', 'AGG',
          'GUU', 'GUC', 'GUA', 'GUG', 'GCU', 'GCC', 'GCA', 'GCG', 'GAU', 'GAC', 'GAA', 'GAG', 'GGU', 'GGC', 'GGA', 'GGG']

    amino_acids = ['Phe', 'Phe', 'Leu', 'Leu', 'Ser', 'Ser', 'Ser', 'Ser', 'Tyr', 'Tyr', 'Stop', 'Stop', 'Cys', 'Cys', 'Stop', 'Trp',
               'Leu', 'Leu', 'Leu', 'Leu', 'Pro', 'Pro', 'Pro', 'Pro', 'His', 'His', 'Gln', 'Gln', 'Arg', 'Arg', 'Arg', 'Arg',
               'Ile', 'Ile', 'Ile', 'Met', 'Thr', 'Thr', 'Thr', 'Thr', 'Asn', 'Asn', 'Lys', 'Lys', 'Ser', 'Ser', 'Arg', 'Arg',
               'Val', 'Val', 'Val', 'Val', 'Ala', 'Ala', 'Ala', 'Ala', 'Asp', 'Asp', 'Glu', 'Glu', 'Gly', 'Gly', 'Gly', 'Gly']

    # The frequencies are approximate and in terms of occurrences per thousand codons
    frequencies_in_E_coli = [17.6, 20.3, 14.3, 76.7, 24.7, 21.7, 15.1, 5.4, 13.3, 15.3, 0.8, 0.2, 7.1, 10.0, 1.0, 13.1,
                         12.2, 9.4, 7.3, 39.7, 17.5, 15.2, 17.0, 6.9, 15.1, 10.3, 12.3, 33.6, 6.7, 10.4, 5.9, 11.4,
                         45.7, 22.0, 16.6, 22.0, 27.3, 27.9, 20.1, 8.3, 21.4, 22.1, 24.4, 31.9, 12.0, 13.1, 12.1, 12.2,
                         32.7, 24.6, 11.4, 29.0, 27.8, 26.9, 18.7, 10.8, 21.5, 23.8, 28.7, 38.7, 24.7, 23.5, 14.9, 16.5]
    
    # Return a plain list so that it can serve directly as the default value of CodonTable.cards
    return [Codon(codon, amino_acid, frequency) for codon, amino_acid, frequency in zip(codons, amino_acids, frequencies_in_E_coli)]


In [14]:
from typing import List
from dataclasses import dataclass, field

@dataclass
class CodonTable:
    cards: List[Codon] = field(default_factory=make_ecoli_codon_table)



In [17]:
Ecoli_table = CodonTable()
Ecoli_table

CodonTable(cards=[Codon(triplet='UUU', amino_acid='Phe', frequency=17.6, info=None), Codon(triplet='UUC', amino_acid='Phe', frequency=20.3, info=None), Codon(triplet='UUA', amino_acid='Leu', frequency=14.3, info=None), Codon(triplet='UUG', amino_acid='Leu', frequency=76.7, info=None), Codon(triplet='UCU', amino_acid='Ser', frequency=24.7, info=None), Codon(triplet='UCC', amino_acid='Ser', frequency=21.7, info=None), Codon(triplet='UCA', amino_acid='Ser', frequency=15.1, info=None), Codon(triplet='UCG', amino_acid='Ser', frequency=5.4, info=None), Codon(triplet='UAU', amino_acid='Tyr', frequency=13.3, info=None), Codon(triplet='UAC', amino_acid='Tyr', frequency=15.3, info=None), Codon(triplet='UAA', amino_acid='Stop', frequency=0.8, info=None), Codon(triplet='UAG', amino_acid='Stop', frequency=0.2, info=None), Codon(triplet='UGU', amino_acid='Cys', frequency=7.1, info=None), Codon(triplet='UGC', amino_acid='Cys', frequency=10.0, info=None), Codon(triplet='UGA', amino_ac

In [18]:
@dataclass
class Codon:
    triplet: str
    amino_acid: str
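    # field() belongs on the right-hand side as the default, not in the annotation;
    # the metadata documents the unit of the frequency
    frequency: float = field(metadata={'unit': 'per thousand codons'})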
    info: Any = None
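
Metadata attached to a field is not used by the dataclass itself, but it can be read back with `dataclasses.fields()`. A minimal sketch, using a hypothetical `AnnotatedCodon` class so that the block is self-contained:

```python
from dataclasses import dataclass, field, fields

@dataclass
class AnnotatedCodon:  # hypothetical class, only for illustration
    triplet: str
    frequency: float = field(metadata={'unit': 'per thousand codons'})

# fields() returns one Field object per attribute; metadata is a read-only mapping
units = {f.name: f.metadata.get('unit') for f in fields(AnnotatedCodon)}
print(units)  # {'triplet': None, 'frequency': 'per thousand codons'}
```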

## Building a project: Namespaces, Packages etc

Often it is not enough to have a single Python file, and you want multiple files that are related to each other. This is where packages come in. A package is a directory that contains a special file called `__init__.py`. This file can be empty, but it can also contain code that runs when the package is imported, for example to initialize the package or import submodules. For example, if a package called `mypackage` contains a module called `mymodule`, we can then import that module in other files with `import mypackage.mymodule`. Let's build a simple project with which we can create QR codes to see how it works.

To follow along, you will need to install the [qrcode](https://pypi.org/project/qrcode/), [OpenCV](https://pypi.org/project/opencv-python/) and [Pillow](https://pypi.org/project/Pillow/) libraries. You can do this by running the following commands in your terminal:

```bash
pip install qrcode
pip install opencv-python
pip install Pillow
```

In [19]:
%load_ext autoreload
%autoreload 2

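How does Python know where to look for `qrcode_utils`? It searches the directories listed in `sys.path` in order: first the directory of the running script (or the current directory in an interactive session), then any directories from the `PYTHONPATH` environment variable, and finally the installation defaults such as `site-packages`. We can see where it looks by printing the `sys.path` variable: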

In [4]:
import pandas
import sys
print(sys.path)

print(pandas.__file__)

['/Users/kierandidi/advanced_python_for_scientists/notebooks', '/Users/kierandidi/.pyenv/versions/3.9.13/lib/python39.zip', '/Users/kierandidi/.pyenv/versions/3.9.13/lib/python3.9', '/Users/kierandidi/.pyenv/versions/3.9.13/lib/python3.9/lib-dynload', '', '/Users/kierandidi/.pyenv/versions/3.9.13/envs/standard_env/lib/python3.9/site-packages', '/Users/kierandidi/Documents/improved-diffusion', '/Users/kierandidi/poses_benchmark']
/Users/kierandidi/.pyenv/versions/3.9.13/envs/standard_env/lib/python3.9/site-packages/pandas/__init__.py


In [5]:
try:
    import qrcode_utils
except ImportError:
    print('Module not found')

Module not found
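
One way to make such an import succeed is to add the module's directory to `sys.path`. The following is a self-contained sketch: it writes a throwaway module called `demo_helper` (a made-up name, not part of this project) into a temporary directory, puts that directory on the search path, and imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny module file in a directory Python does not know about yet
    Path(tmp, 'demo_helper.py').write_text('def greet():\n    return "hello"\n')

    sys.path.insert(0, tmp)  # searched first from now on
    try:
        importlib.invalidate_caches()
        demo_helper = importlib.import_module('demo_helper')
        greeting = demo_helper.greet()
    finally:
        sys.path.remove(tmp)  # leave the search path as we found it

print(greeting)  # hello
```

For real projects, installing the package (or setting `PYTHONPATH`) is usually preferable to editing `sys.path` in code.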


## Modules vs scripts

Modules in Python are files that contain Python code. They can be imported by other modules or scripts. Scripts are files that are meant to be run directly. They are not meant to be imported by other modules or scripts. They are usually run from the command line. They can also be run from other scripts, but this is not recommended. Scripts are usually used to run a program, while modules are used to define functions and classes that can be used by other modules or scripts. You can read more about the different types of modules and scripts [here](https://tenthousandmeters.com/blog/python-behind-the-scenes-11-how-the-python-import-system-works/). 

Often we want to use the same Python file sometimes as a module and sometimes as a script. For example, we might import it in our program, but also run it directly from the command line. This is where the `__name__` variable comes in. It is a special variable that is set to `"__main__"` when the file is run directly and to the name of the module when it is imported by another module or script. This allows us to use the same file as a module and as a script (more on that [here](https://realpython.com/if-name-main-python/#:~:text=name%2Dmain%20idiom.-,In%20Short%3A%20It%20Allows%20You%20to%20Execute%20Code%20When%20the,is%20executed%20as%20a%20script.)). 
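
In practice this looks roughly like the sketch below. It is a made-up file, not the actual contents of `modules/script.py`: the functions can be imported from elsewhere, while `main()` only runs when the file is executed directly.

```python
def shout(text):
    return text.upper() + "!"

def main():
    # Only meant to run when the file is executed as a script
    print("Shout: ", shout("hello world"))

if __name__ == "__main__":
    main()
```

If the guard is left out and the code sits at the top level of the file, it also runs on import.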

In [23]:
from modules import script

Mean:  51.785714285714285
Median:  49.5
Lower Quartile:  26.0
Upper Quartile:  73.75
Shout:  HELLO WORLD!
Whisper:  hello world...


If we gather multiple modules into a directory, we call it a package. A package is a directory that contains a special file called `__init__.py`. This file can be empty, but it is often used to initialize the package. It can contain code that is run when the package is imported. This is useful for initializing the package and importing submodules. You can read a bit more about this special file [here](https://dev.to/methane/don-t-omit-init-py-3hga) and [here](https://pcarleton.com/2016/09/06/python-init/).

Here is another example of how to use this (based on [this video](https://youtube.com/watch?v=GxCXiSkm6no)):

In [5]:
from modules import arithmetics #works
arithmetics.add(1,2)

3

Traditionally, `__init__.py` files were required to make Python treat the directories containing them as packages. Python packages can be imported like modules, which makes organizing large codebases easier. This is especially true for projects that have multiple modules with interdependencies.

As of Python 3.3, this is no longer strictly necessary for the interpreter to recognize packages due to the introduction of [Implicit Namespace Packages](https://realpython.com/python-namespace-package/#in-short-python-namespace-packages-are-a-way-to-organize-multiple-packages). These allow for the creation of a package without an `__init__.py` file, and they are useful in large-scale projects where you want to separate different parts of the package without having a shared state.

However, using `__init__.py` files still has some advantages:

1. Explicit is better than implicit: Having an `__init__.py` file signifies that the directory is a Python package.

2. Backward compatibility: If you're working with older versions of Python, an `__init__.py` file is required for the directory to be recognized as a package.

3. Ability to execute initialization code: If you need to do any setup when the package is imported, like initializing package-level variables or running other initialization code, you can do this in the `__init__.py` file.

4. Defining `__all__` for package imports: If you want to define a list of modules to be imported when `from package import *` is encountered, you can specify this list in `__init__.py` by defining a list named `__all__`.

So, even though `__init__.py` is not always strictly necessary in newer versions of Python, it can still be a good idea to include it for these reasons.
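
Points 3 and 4 can be tried out without touching the project. The sketch below builds a throwaway package named `demo_pkg` (a made-up name) in a temporary directory; its `__init__.py` runs initialization code, re-exports a function from a submodule and defines `__all__`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, 'demo_pkg')
    pkg.mkdir()
    (pkg / 'arithmetics.py').write_text('def add(a, b):\n    return a + b\n')
    (pkg / '__init__.py').write_text(
        'from .arithmetics import add\n'  # re-export from a submodule
        '__all__ = ["add"]\n'             # what "from demo_pkg import *" provides
        'VERSION = "0.1"\n'               # package-level initialization
    )

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo_pkg = importlib.import_module('demo_pkg')
        result = demo_pkg.add(1, 2)
        exported = demo_pkg.__all__
        version = demo_pkg.VERSION
    finally:
        sys.path.remove(tmp)

print(result, exported, version)  # 3 ['add'] 0.1
```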

It is debated whether these implicit namespace packages are a good idea or not; for more background on that, see [this video](https://www.youtube.com/watch?v=2Xvb79hOUdM).

Let's build a module with which we can create QR codes to see how it works.

*see test.py in src folder*

Here we used the `qrcode_utils.py` module to get some added functionality to the basic packages we imported.

But we can also create our own package, which just means putting multiple modules in a folder with an `__init__.py` file to indicate to Python to treat this folder as a package.

*see test.py in src folder*

## Data Structures in Python (optional)

### Dictionaries

In [24]:
#create a dictionary
d = {'a': 1, 'b': 2, 'c': 3}
x = 5
print(x.__hash__())
print(hash("a"))
# print(d.__hash__)
# print(hash(d))

5
2258155850630925145
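
The two commented-out lines hint at why hashing matters here: dictionary keys must be hashable, and mutable containers such as dicts and lists are not. A small sketch with a hypothetical helper `is_hashable` makes this visible without crashing the cell:

```python
def is_hashable(obj):
    """Return True if hash(obj) works, i.e. obj could be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('a'))       # True
print(is_hashable((1, 2)))    # True: tuples of hashable items are hashable
print(is_hashable([1, 2]))    # False: lists are mutable
print(is_hashable({'a': 1}))  # False: dicts are mutable
```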


In [30]:
#ordered dictionary; maintains order of keys as they are added
from collections import OrderedDict
d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
print(d)

#default dictionary; returns a default value if the key is not found
from collections import defaultdict
d = defaultdict(lambda: 0)
d['a'] = 1
d['b'] = 2
print(d['c'])

#ChainMap - combines several dictionaries or mappings (updates are made to the first dictionary)
from collections import ChainMap
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'd': 4}
d3 = {'e': 5, 'f': 6}
d = ChainMap(d1, d2, d3)
print(d['a'])
print(d['c'])
d['e'] = 10
print(d)

OrderedDict([('a', 1), ('b', 2), ('c', 3)])
0
1
3
ChainMap({'a': 1, 'b': 2, 'e': 10}, {'c': 3, 'd': 4}, {'e': 5, 'f': 6})
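
A very common use of `defaultdict` is grouping, with `list` as the default factory instead of a constant. A short sketch with a few codon/amino-acid pairs picked for illustration:

```python
from collections import defaultdict

pairs = [('Phe', 'UUU'), ('Phe', 'UUC'), ('Leu', 'UUA'), ('Leu', 'UUG'), ('Met', 'AUG')]

codons_by_amino_acid = defaultdict(list)
for amino_acid, codon in pairs:
    # Missing keys start out as an empty list, so no "if key in dict" check is needed
    codons_by_amino_acid[amino_acid].append(codon)

print(dict(codons_by_amino_acid))
# {'Phe': ['UUU', 'UUC'], 'Leu': ['UUA', 'UUG'], 'Met': ['AUG']}
```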
