<h1>My Personal Python Bible</h1>

by Ing. Giovanni Frison

In [1]:
from datetime import date
print(f' last update: {date.today()}')

 last update: 2022-05-09


# TOC

1. [PEP8 naming convention](#PEP8-naming-convention)
2. [Everything is an Object](#Everything-is-an-Object)
3. [Modules](#Modules)
    1. [Some clarifications](#Some-clarifications)
    2. [Reload a module](#Reload-a-module)
4. [Packages](#Packages)
    1. [Package structure](#Package-structure)
    2. [Package Namespace PEP 420](#Package-Namespace-PEP-420)
4. [Variables and Memory](#Variables-and-Memory)
    1. [id function](#id-function)
    2. [Reference Counting](#Reference-Counting)
    3. [Shared Reference](#Shared-Reference)
    4. [Garbage Collection](#Garbage-Collection)
    5. [Object Mutability](#Object-Mutability)
    6. [Variable Equality](#Variable-Equality)
5. [Built-in methods](#Built-in-methods)
   1. [isinstance(object, class)](#isinstance(object,-class))
   2. [issubclass(subclass, class)](#issubclass(subclass,-class))  
16. [Numeric Types](#Numeric-Types)
    1.  [Integers](#Integers)
        1.  [Operations](#Operations)
        2.  [Base](#Base)
    2.  [Rational Numbers](#Rational-Numbers)
    3.  [Floats (Real Numbers)](#floats-real-numbers)
        1.  [equality](#equality)
    4.  [Booleans PEP 285](#Booleans-PEP-285)
        1.  [Booleans operators](#Booleans-operators)
        2.  [Short-Circuiting](#Short-Circuiting)
6. [Iterable and Iterators](#Iterable-and-Iterators)
    1. [Consume iterators manually](#Consume-iterators-manually)
    2. [Lazy Iterables](#Lazy-Iterables)
    3. [iter() method](#iter()-method)
        1. [iter() with callables](#iter()-with-callables)
    4. [Delegating Iterators](#Delegating-Iterators)
    5. [Reversed Iteration](#Reversed-Iteration)
    6. [Caveats: using iterators as function arguments](#Caveats:-using-iterators-as-function-arguments)
7. [Generators](#Generators)
   1. [Iterables from generators](#Iterables-from-generators)
   2. [Generator expressions](#Generator-expressions)
   3. [Yield from](#Yield-from)
8. [Sequence Type](#Sequence-Type)
   1. [Mutating sequence](#Mutating-sequence)
      1. [in-place concatenation and repetition](#in-place-concatenation-and-repetition)
      2. [Mutation by assignment](#Mutation-by-assignment)
      3. [Never return a mutated object](#Never-return-a-mutated-object)
   2. [Copying Sequences](#Copying-Sequences)
      1. [Shallow copies](#Shallow-copies)
      2. [Deep Copies](#Deep-Copies)
   3. [Slicing](#Slicing)
   4. [Custom Sequences](#Custom-Sequences)
      1. [Mutation in custom sequences](#Mutation-in-custom-sequences)
   5. [Sorting sequences](#sorting-sequences)
   6. [Zero-based Index](#Zero-based-Index)
   7. [Application - Polygon](#Application---Polygon)
   8. [Lists vs Tuples](#Lists-vs-Tuples)
       1. [Copying](#Copying)
       2. [Storing Efficiency](#Storing-Efficiency)
9.  [Iteration Tools - The itertools module](#Iteration-Tools---The-itertools-module)
    1. [Aggregators](#Aggregators)
    2. [iSlicing](#iSlicing)
    3. [Selecting and Filtering](#Selecting-and-Filtering)
    4. [Infinite iterators](#Infinite-iterators)
    5. [Chaining and Teeing](#Chaining-and-Teeing)
    6. [Mapping and Accumulation](#Mapping-and-Accumulation)
10. [Strings](#Strings)
    1. [Common methods](#Common-methods) 
11. [Lists](#Lists)
    1. [List comprehension](#List-comprehension)
12. [Tuples](#Tuples)
   9. [Named Tuples](#Named-Tuples)
      1. [Introspection](#Introspection)
      2. [Modify and Extending](#Modify-and-Extending)
      3. [Docstring](#Docstring)
      4. [Defaults values](#Defaults-values)
13. [Dictionaries](#Dictionaries)
14. [Unpacking iterables](#Unpacking-iterables)
   10. [Unpacking with *](#Unpacking-with-*)
   11. [Nested unpacking](#Nested-unpacking)
15. [Loops](#Loops)
   12. [While loop](#While-loop)
   13. [Try statement](#Try-statement)
16. [Functions](#Functions)
    1.  [Docstrings and annotations - PEP 257](#Docstrings-and-annotations---PEP-257)
    2.  [lambda expression](#lambda-expression)
    3.  [Function Introspection](#Function-Introspection)
    4.  [\*args and **kwargs](#\*args-and-**kwargs)
    5.  [Parameters default](#Parameters-default)
    6.  [Map, Filter and Zip functions](#Map-Filter-and-Zip-functions)
    7.  [Reducing Functions](#Reducing-Functions)
    8.  [Partial functions](#Partial-functions)
    9.  [The operator module](#The-operator-module)
17. [Classes](#Classes)
    1. [Getters and Setters](#Getters-and-Setters)
    2. [Overload methods](#Overload-methods)
    3. [\_\_str\_\_ method](#\_\_str\_\_-method)
    4. [\_\_repr\_\_ method](#\_\_repr\_\_-method)
    5. [\_\_eq\_\_ method](#\_\_eq\_\_-method)
18. [Scopes and Namespaces](#Scopes-and-Namespaces)
    1.  [Masking](#Masking)
    2.  [Nonlocal scope](#Nonlocal-scope)
19. [Closure](#Closure)
    1.  [Shared extend scope](#Shared-extend-scope)
    2.  [Nested Closure](#Nested-Closure)
    3.  [Application](#Application)
20. [Decorators](#Decorators)
    1.  [Multiple decorators](#Multiple-decorators)
    2.  [Memoization](#Memoization)
    3.  [Parametrized decorators](#Parametrized-decorators)
    4.  [Decorator class](#Decorator-class)
    5.  [Monkey Patching and Decorating classes](#Monkey-Patching-and-Decorating-classes)
    6.  [Single Dispatch Generic Functions](#Single-Dispatch-Generic-Functions)
        1.  [Application - Htmlizer](#Application---Htmlizer)
21. [Python optimizations](#Python-optimizations)
    1.  [Interning](#Interning)
    2.  [Peephole](#Peephole)
22. [Common Modules](#Common-Modules)
    1. [string](#string)
    2. [functools](#functools)
    3. [itertools](#itertools)
    4. [collections](#collections)
    5. [random](#random)
    6. [timeit](#timeit)
    7. [argparser](#argparser)
23. [Tips and tricks](#Tips-and-tricks)

# PEP8 naming convention
---

- `packages`: short, lowercase, preferably without underscores, e.g. `utilities`
- `modules`: short, lowercase, with underscores if they improve readability, e.g. `db_utils`
- `classes`: the first letter of each word is uppercase, with no spaces and no underscores, e.g. `MyClass`
- `functions`: lowercase, with words separated by underscores, e.g. `open_account`
- `variables`: lowercase, with words separated by underscores, e.g. `account_id`
- `constants`: all uppercase, with words separated by underscores, e.g. `MIN_VAL`

back to [TOC](#TOC)

# Everything is an Object
---
In Python everything is an object. Functions, for example, are instances of the built-in `function` class, and classes are themselves instances of the class `type`. This implies that every object has a memory address (yes, even functions and classes). In the same way, every object can be assigned to a variable, passed as an argument to a function, or returned by a function. We can look at the type of any object with the built-in `type` function.
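
A minimal sketch of the idea, using a hypothetical function `greet` and class `Dog` that are not part of the original notes: both have a type, and the function can be bound to another name or passed around like any other value.

```python
def greet(name):
    """Return a greeting for `name`."""
    return f'hello {name}'


class Dog:
    pass


# functions and classes are objects: each one has a type
func_type = type(greet)    # the built-in function class
class_type = type(Dog)     # classes are instances of `type`

# a function can be bound to another name ...
say_hello = greet


# ... and passed as an argument to another function
def apply(func, value):
    return func(value)


result = apply(say_hello, 'world')
print(func_type, class_type, result)
```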

back to [TOC](#TOC)

# Modules
---

First, let's define what a `namespace` is: essentially it is a dictionary that maps names to the objects they refer to in memory. It can be accessed with two built-in functions: `globals()` for the global (module) namespace and `locals()` for the local namespace of, say, a function.

Like everything in Python, modules are objects too, of type `module`. When a module is imported, it gets cached in memory (in `sys.modules`, not in the namespace), and a reference to it is added to the global namespace of the importing module. 
Because of this, if we import the same module from two different modules of the same program, the memory address is the same in both (we can think of it as a singleton object). We can inspect this cache through `sys.modules`, which is a dictionary as well.

Importing is done at runtime (i.e. while the Python interpreter is already running) and not at compile time as in C, for example. The way Python retrieves its modules is quite complex, but to help understand it we can use the `sys` module, which has some useful functionality. For example, we can look at where the current Python installation and the related C binaries are located with:

In [2]:
import sys
sys.prefix, sys.exec_prefix

('/usr', '/usr')

Here both values point to the system installation under `/usr`. Inside an active virtual environment both would point to the environment's directory instead, while `sys.base_prefix` would keep pointing to the base installation. As a matter of fact, changing the `prefix` is how Python switches to a virtual environment.
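
As a small sketch of how the prefix relates to virtual environments: the standard library also exposes `sys.base_prefix`, which always points to the base installation, so comparing it with `sys.prefix` tells us whether a virtual environment is active. The variable name `in_virtualenv` is only illustrative.

```python
import sys

# sys.base_prefix always points to the base installation;
# inside a virtual environment sys.prefix differs from it
in_virtualenv = sys.prefix != sys.base_prefix
print(sys.prefix, sys.base_prefix, in_virtualenv)
```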

But where does Python look for the modules to import? There is a list of directories that Python searches, and it can be inspected with `sys.path`. If a module import fails, we can check whether the module is actually stored in one of the directories listed in the path.

In [3]:
sys.path

['/home/pyfry/Desktop/Python_Projects/py_deep_dive',
 '/usr/lib/python38.zip',
 '/usr/lib/python3.8',
 '/usr/lib/python3.8/lib-dynload',
 '',
 '/home/pyfry/.local/lib/python3.8/site-packages',
 '/usr/local/lib/python3.8/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/home/pyfry/.local/lib/python3.8/site-packages/IPython/extensions',
 '/home/pyfry/.ipython']

The operations that Python performs when importing a module are:
* check whether it already exists in the cache -> `sys.modules`
* if not, create a new module object -> `types.ModuleType`
* load the source code from file
* add the entry to `sys.modules`
* compile and execute the source code -> N.B. the code in the imported module is executed!
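
The steps above can be sketched by hand as follows. This is a simplified illustration and not how `importlib` is actually implemented: the function `import_from_source`, the module name `demo_mod` and its source string are all hypothetical, and the source comes from a string instead of a file.

```python
import sys
import types


def import_from_source(name, source):
    """Simplified sketch of the import steps (hypothetical helper)."""
    # 1. return the cached module if it was already imported
    if name in sys.modules:
        return sys.modules[name]
    # 2. create a new, empty module object
    module = types.ModuleType(name)
    # 3. the source would normally be read from a file; here it is a string
    # 4. register the module in the cache
    sys.modules[name] = module
    # 5. compile and execute the source inside the module's namespace
    code = compile(source, f'<{name}>', 'exec')
    exec(code, module.__dict__)
    return module


demo_source = "print('running demo_mod')\nanswer = 42\n"
first = import_from_source('demo_mod', demo_source)
second = import_from_source('demo_mod', demo_source)  # served from the cache
print(first.answer, first is second)
```

Note that the `print` inside the module source runs only once: the second call finds the entry in `sys.modules` and skips the execution.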

So, importing a module seems quite straightforward; what is more complicated is how Python actually finds the module we want to load. Put simply, there are three kinds of objects at play:
* finders
* loaders
* finder + loader == importer

First the `finders` are asked whether or not they know anything about the module we are trying to import; the list of the available finders can be obtained like this:

In [4]:
sys.meta_path

[_frozen_importlib.BuiltinImporter,
 _frozen_importlib.FrozenImporter,
 _frozen_importlib_external.PathFinder,
 <six._SixMetaPathImporter at 0x7f62cf3bb550>,
 <pkg_resources.extern.VendorImporter at 0x7f62ce160ee0>,
 <pkg_resources._vendor.six._SixMetaPathImporter at 0x7f62ce17c2b0>]

If one of the finders knows the module, it builds a `ModuleSpec` and tells the `loader` to load it (e.g. the module `math`, being a built-in module, is found by the `BuiltinImporter` finder).

In [5]:
import math
math.__spec__

ModuleSpec(name='math', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in')

We can check whether a module can be found and look at its spec at the same time with the standard library module `importlib`:

In [6]:
import importlib.util
importlib.util.find_spec('math')

ModuleSpec(name='math', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in')

If the module can be found, its spec is returned; if not, `find_spec` returns `None`, and we can solve the issue by appending the directory that contains the module to the path list:

```py
import sys
sys.path.append('module_path')
```
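
A small sketch of how `find_spec` can be used to test whether a module is importable before importing it. The helper `is_importable` and the deliberately missing module name are hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if one of the finders can locate the module."""
    return importlib.util.find_spec(module_name) is not None


found = is_importable('math')
missing = is_importable('surely_not_an_installed_module_xyz')
print(found, missing)
```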

If this has to become systematic, for example in a project where many paths have to be added, the best way to proceed is to write a `.pth` file. For more information see [https://docs.python.org/3/library/site.html](https://docs.python.org/3/library/site.html).

## Some clarifications
We have seen that when a module is imported for the first time, if it is found, a reference to it is added to `sys.modules`. What goes inside the `globals()` namespace, instead, depends on how we import the module, whether with an alias or not. For example:

In [7]:
import math

# the module math is loaded into sys.modules and the name `math` is added to the globals() namespace
# both `math` entries point to the same object

import math as math_alias

# the module math is loaded into sys.modules, but this statement does not add the name `math` to globals()
# instead it adds the name `math_alias`, which points to the same object stored under 'math' in sys.modules

from math import sqrt

# the module math is loaded into sys.modules, but this statement adds only `sqrt` to globals()
# and that name points to the function `math.sqrt`

from math import *
# the module math is loaded into sys.modules and every public name of the math module is added to globals()


N.B. if a name is already present in the namespace and we import something with the same name, it gets overwritten. This is why `import *` is not recommended, unless we are fully aware of every name we are importing and sure that there is no conflict between the names of different modules.

N.B. Some think that explicitly importing a single function, as in `from math import sqrt`, is more lightweight. This is essentially not true, because the entire math module is still loaded into `sys.modules`; the only difference is that only the name `sqrt` is added to the namespace. There is a very small advantage when calling the function, because `sqrt(2)` needs one lookup fewer than `math.sqrt(2)`, but since a dictionary lookup is very fast the difference in efficiency is negligible. Therefore, in this case, choose for READABILITY and not EFFICIENCY.
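
To check which names each import form actually binds, one option is to run the statement in an empty dictionary used as a namespace. This is only a sketch: the helper `names_bound_by` is hypothetical and relies on `exec`, which is fine for exploration but not something to use in real code.

```python
import sys


def names_bound_by(statement):
    """Run an import statement in an empty namespace and return the names it binds."""
    namespace = {}
    exec(statement, namespace)
    # exec adds '__builtins__' on its own, so it is filtered out
    return {name for name in namespace if name != '__builtins__'}


plain = names_bound_by('import math')
alias = names_bound_by('import math as math_alias')
single = names_bound_by('from math import sqrt')
star = names_bound_by('from math import *')
print(plain, alias, single, len(star))
```

In every case the whole module ends up in `sys.modules`; only the names added to the namespace differ.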

## Reload a module
If, for some reason, we want to reload a module, it is not sufficient to repeat the `import` statement: the module name is already in `sys.modules`, so the import machinery finds it there and skips the reloading. Using `del module_name` is not enough either, because we are only deleting the name reference from the namespace. What we need to do is delete the reference in the `sys.modules` dictionary: `del sys.modules['module_name']`. Now if we re-import the module, we are creating a new object, with a new `id`.

As an alternative, if we want to reload the module without destroying and recreating the module object, we can use the `importlib.reload(module_name)` function, which reloads the module while keeping the same memory id both in `sys.modules` and in the namespace. However, care must be taken when reloading modules. For example, if we loaded only a specific function of a module into the namespace, say with `from math import sqrt`, we can't reload the module by name since `math` is not in the namespace; we have to go through the `sys.modules` reference: `importlib.reload(sys.modules['math'])`. However, even though `math` has now been reloaded, the name `sqrt` still points to the old function object. To update it we would need something like `sqrt = sys.modules['math'].sqrt`. So beware: reloading is not a safe process, and it is not something we want to do in a production environment!

For more references on modules and module imports, see:
* `PEP 302`
* [https://docs.python.org/3/tutorial/modules.html](https://docs.python.org/3/tutorial/modules.html)
* [https://docs.python.org/3/reference/import.html](https://docs.python.org/3/reference/import.html)
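
The following sketch illustrates both approaches on a throwaway module written to a temporary directory. The module name `reload_demo_mod` and its content are hypothetical; the two versions of the source deliberately differ in length, so that the cached bytecode is not mistaken for up to date.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# write a throwaway module to a temporary directory and make it importable
tmp_dir = tempfile.mkdtemp()
module_file = Path(tmp_dir, 'reload_demo_mod.py')
module_file.write_text('value = "old"\n')
sys.path.insert(0, tmp_dir)

import reload_demo_mod
from reload_demo_mod import value  # binds the current object to `value`

original_id = id(reload_demo_mod)

# change the source, then reload in place: same module object, new content
module_file.write_text('value = "reloaded"\n')
importlib.reload(reload_demo_mod)
same_object_after_reload = id(reload_demo_mod) == original_id

# removing the cache entry and importing again builds a brand-new module object
del sys.modules['reload_demo_mod']
import reload_demo_mod as fresh_mod
new_object = fresh_mod is not reload_demo_mod

print(same_object_after_reload, reload_demo_mod.value, value, new_object)
```

The name `value` imported with `from ... import` still refers to the old string after the reload, which is exactly the pitfall described above.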

back to [TOC](#TOC)

# Packages
---

Packages are collections of modules and possibly sub-packages that usually have some kind of specialized scope. What tells us that a module is a package is the presence of a `__path__` attribute.


The vast majority of packages are file-based, i.e. structured into directories in the file system. In particular, the directory name becomes the package name, and the directory needs to contain the code somewhere (since a package is also a module, but not vice versa). In fact, the code goes inside the `__init__.py` file in the directory, and the pair `directory + __init__.py` makes up the package. Essentially, when Python finds an `__init__.py` file inside a folder, it knows that it is looking at a `package` and not a standard directory. If the file is missing, Python creates an `implicit namespace package`, which still allows us to navigate through the folders inside it.

Now let's look at a typical app folder structure containing modules, packages and sub-packages:

```
app/

    module_1.py

    pack_1/
        __init__.py
        module_1_a.py
        
        pack_1_1/
            __init__.py
            module_1_1_a.py   
```

Imagine we are running the Python interpreter inside the `app` folder. We can now simply say:
* `import module_1`: imports what's inside module_1.py. If we try to look at `module_1.__path__` we get an `AttributeError`, since module_1 is a module and not a package;
* `import pack_1`: imports the package `pack_1` and executes what's inside its `__init__.py`. Now `pack_1.__path__` exists, since pack_1 is a package. Also, Python has added `pack_1` to both `sys.modules` and `globals()` (the namespace);
* `import pack_1_1`: results in an ERROR (`ModuleNotFoundError`), since the finder only looks for top-level names in the directories of `sys.path` and does not descend into `pack_1`; instead we need to:
  * `import pack_1.pack_1_1` to be able to access the nested package. N.B. now `pack_1.pack_1_1` is stored inside `sys.modules` but not in `globals()`, where only the root package (`pack_1`) is stored; this is because in our code we are always going to refer to it as `pack_1.pack_1_1` anyway.
  * `from pack_1 import pack_1_1`: this way `pack_1_1` is stored in `globals()`, but it is just another name for the same object as `pack_1.pack_1_1` (`id(pack_1_1) == id(sys.modules['pack_1.pack_1_1'])`)
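
The behaviour described in the list above can be reproduced with a throwaway copy of the structure built in a temporary directory. This is only a sketch: the names carry a hypothetical `demo_` prefix to avoid clashing with anything installed.

```python
import sys
import tempfile
from pathlib import Path

# recreate a small version of the app/ structure in a temporary directory
root = Path(tempfile.mkdtemp())
nested = root / 'demo_pack_1' / 'demo_pack_1_1'
nested.mkdir(parents=True)
(root / 'demo_module_1.py').write_text('x = 1\n')
(root / 'demo_pack_1' / '__init__.py').write_text('')
(nested / '__init__.py').write_text('')
sys.path.insert(0, str(root))

import demo_module_1
import demo_pack_1.demo_pack_1_1

module_has_path = hasattr(demo_module_1, '__path__')   # plain module: no __path__
package_has_path = hasattr(demo_pack_1, '__path__')    # package: has __path__
nested_cached = 'demo_pack_1.demo_pack_1_1' in sys.modules

# the nested package is not reachable as a top-level name
try:
    import demo_pack_1_1
    top_level_import_works = True
except ModuleNotFoundError:
    top_level_import_works = False

print(module_has_path, package_has_path, nested_cached, top_level_import_works)
```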

## Package structure

When creating a package we need to keep in mind two quite opposite points of view: that of the `developer` of the package, who needs a good breakdown of the package's functionality for better modularity, debugging, readability etc., and that of the `user`, who only needs to access the functionality coded into the package. Therefore, care is needed when structuring the package, and a good use of the `__init__.py` files plays a key role. 

Looking at the previous dummy example, if the user needs to access something inside module_1_1_a.py, the import would be something like `import pack_1.pack_1_1.module_1_1_a`, which is quite tedious. If instead the `__init__.py` inside `pack_1` contains something like `from .pack_1_1.module_1_1_a import *` (in Python 3, imports inside a package must be absolute or explicitly relative), the user can access everything with just `import pack_1`, because when the package is imported its `__init__.py` is executed and the names it defines are added to the package namespace. 

However, more often than not, a `*` import is something we want to avoid, since inside a package there will be many functions/classes that are needed only by the developer and not by the user (it might even be 'dangerous' if the user were able to access them). To avoid this there are essentially two methods:

* name the 'private' (developer side) functions with a leading underscore (e.g. `def _func_only_for_dev()`), since the `*` import skips those
* define in each module the list `__all__ = ['names_i_want_to_export_from_a_module']`, i.e. the list of all the names we want to be imported by a `*` import
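
A short sketch of the effect of `__all__` on a `*` import. To keep the example self-contained the module is built in memory and registered in `sys.modules` by hand; the module name `all_demo_mod` and its functions are hypothetical.

```python
import sys
import types

# source of a throwaway module that exports only one name
source = '''
__all__ = ['public_api']

def public_api():
    return _helper() + 1

def _helper():
    return 41

def not_exported():
    return 0
'''

# build the module in memory and register it so that it can be imported
demo = types.ModuleType('all_demo_mod')
exec(source, demo.__dict__)
sys.modules['all_demo_mod'] = demo

# run the star import in an empty namespace and look at what was bound
namespace = {}
exec('from all_demo_mod import *', namespace)
exported = sorted(name for name in namespace if name != '__builtins__')
print(exported, namespace['public_api']())
```

Only the names listed in `__all__` reach the importing namespace, even though `not_exported` has no leading underscore.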

## Package Namespace PEP 420
Namespace packages are essentially packages without an `__init__.py` file. Their advantage is that their content can be spread across several locations in the file system (even inside a zip file), but the import of sub-modules/packages can't be flattened (there is no `__init__.py` to leverage).

back to [TOC](#TOC)

# Variables and Memory
---
When a variable is created, what Python does under the hood is link the variable name to the memory slot (or slots) containing the object assigned to the variable. Therefore the name is nothing more than a reference to the memory slot.

## `id` function
`id` is the function that returns the identity of an object; in CPython this is its memory address as a base-10 integer (it can be converted with `hex` to see the hexadecimal representation).

## Reference Counting
Reference counting is a process carried out internally by the Python memory manager. Each time we create a new variable, we create a reference to a memory slot. If we create a new variable by assigning an existing one to it, we add a reference to the same memory slot (which now has a reference count equal to two).

In [8]:
import ctypes
import sys

my_var_1 = list() # my_var_1 is pointing to the memory slot id(my_var_1)

other_var = my_var_1  # other_var is pointing to the same memory slot as my_var_1
# at this point, the ref count of id(my_var_1) is equal to 2

print('sys.getrefcount returns:' , sys.getrefcount(my_var_1))
# returns the ref count of the object + 1, because passing it to getrefcount creates an extra temporary reference

print('ctypes returns:' , ctypes.c_long.from_address(id(my_var_1)).value)
# a lower-level way to read the ref count of a memory slot (CPython-specific)

sys.getrefcount returns: 3
ctypes returns: 2


## Shared Reference
When do Python variables share memory references?

In [9]:
a = 10
b = a
# b is not copying the content of a, it is pointing to the same memory address

a, b, a == b, a is b

(10, 10, True, True)

In [10]:

a = 10
b = 10
# a and b point to the same memory address because CPython caches small integers (see Interning); immutability is what makes this sharing safe

a, b, a == b, a is b

(10, 10, True, True)

In [11]:

a = [1,2,3]
b = a
b.append(4)
# now both b and a are equal to [1,2,3,4]: a list is a mutable object, and appending an element modifies only its internal state while the memory address stays the same

a, b, a == b, a is b

([1, 2, 3, 4], [1, 2, 3, 4], True, True)

In [12]:

a = [1,2,3]
b = [1,2,3]
# in this case python doesn't create a shared reference, so a and b point to different objects; this prevents a modification of b from also affecting a

a, b, a == b, a is b

([1, 2, 3], [1, 2, 3], True, False)

N.B. There will always be a shared reference to the `None` object, which is created automatically by Python.

## Garbage Collection
Garbage collection is how Python avoids memory leaks such as those generated by circular references (objects pointing to one another), which reference counting alone cannot free. It can be controlled through the `gc` module. It can be turned off to improve performance (only if we are absolutely sure that there are no circular references in the code). The gc runs periodically on its own, but it can also be called manually to trigger a cleanup at a specific point of the code.
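
A minimal sketch of a circular reference being reclaimed by the collector, assuming CPython. The class `Node` is hypothetical, and a weak reference is used as a probe because it does not keep the object alive.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


a = Node()
b = Node()
a.other = b
b.other = a                  # circular reference: a -> b -> a
probe = weakref.ref(a)       # lets us observe `a` without keeping it alive

gc.disable()                 # keep the automatic collector out of the way
del a, b                     # the ref counts never reach zero because of the cycle
alive_before = probe() is not None

gc.collect()                 # a manual run finds and frees the cycle
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```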

## Object Mutability
Changing the data inside an object is called `modifying the internal state`, since the memory address does not change but only its content (e.g. appending an element to a list). So we can distinguish between `Mutable` and `Immutable` objects depending on whether their internal state can be changed.

`Immutable objects:`

- Numbers
- Strings
- Tuples (if they contain mutable elements, e.g. lists, those remain mutable)
- Frozen Sets
- User-defined classes (if so defined)

`Mutable objects:`

- Lists
- Sets
- Dictionaries
- User-defined classes (if so defined)

Care must be taken when we talk about the immutability of an object that is passed to a function as an argument. We have to distinguish between the `module scope` and the `function scope`.
When we pass an object to a function we are in reality passing the `reference` to the object itself. So if we pass an immutable object, say a string, at the beginning both the module scope and the function scope point to the same memory reference, but as soon as the function "modifies" the string (e.g. by concatenating another string), a new object with a new reference is created. If the object is mutable, say a list, and the function modifies the list (e.g. by appending an element), then Python doesn't create a new object but simply modifies the internal state of the existing one.

In [13]:
# IMMUTABLE
def process_str(s):
  # s still has the same memory reference as my_string
  s = 'hello' + s
  # s has been rebound: since a string is immutable, the concatenation creates a new object with a new reference
  return s

my_string = 'world'

my_string, process_str(my_string), id(my_string), id(process_str(my_string))

('world', 'helloworld', 140062307458992, 140062307458736)

In [14]:
# MUTABLE
def process_lst(lst):
  # lst has the same memory reference as my_list
  lst.append(5)
  # since lst is a mutable object, only the internal state is changed but the memory reference is still the same.
  return lst

my_list = [1,2,3]

# process_lst is called twice while the tuple is built, so 5 is appended twice to the same list
my_list, process_lst(my_list), id(my_list), id(process_lst(my_list))

([1, 2, 3, 5, 5], [1, 2, 3, 5, 5], 140062307000000, 140062307000000)

## Variable Equality
There are two ways to compare two variables in Python: the `is` and the `==` operators, respectively the identity and the equality operator. While the identity operator compares the memory references of two objects, the equality operator compares their internal state (data). Their negations are `is not` and `!=`.

In [15]:
a = 10
b = a
print(a is b) # True, since b was assigned from a and both names point to the same object
print(a == b) # True

True
True


In [16]:
a = 500
b = 500

print(a is b) # False: only the integers in the range [-5, 256] are preloaded, see Interning
print(a == b) # True

False
True


In [17]:
a = [1,2,3]
b = [1,2,3]

print(a is b) # False, different memory address
print(a == b) # True

False
True


In [18]:
a = 'hello'
b = 'hello'

print(a is b) # True, but not always! Only if the strings have been interned (see Interning)
print(a == b) # True

True
True


In [19]:
a = 10.0
b = 10

print(a is b) # False, float and int are different objects
print(a == b) # True, python recognizes that they have the same value

False
True


back to [TOC](#TOC)

# Built-in methods
---

## `isinstance(object, class)`
Returns `True` if an object is an instance of a particular class (or of one of its subclasses), `False` otherwise.

## `issubclass(subclass, class)`
Returns `True` if a class inherits from another class (a class is also considered a subclass of itself), `False` otherwise.
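
A quick illustration with two hypothetical classes, `Animal` and `Dog`:

```python
class Animal:
    pass


class Dog(Animal):
    pass


rex = Dog()

checks = (
    isinstance(rex, Dog),      # rex is a Dog
    isinstance(rex, Animal),   # instances of a subclass count as well
    issubclass(Dog, Animal),   # Dog inherits from Animal
    issubclass(Animal, Dog),   # the opposite is not true
    isinstance(True, int),     # bool is a subclass of int
)
print(checks)
```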


back to [TOC](#TOC)

# Numeric Types
---

## Integers
Integers are represented internally using base-2 digits (binary representation).

```
# e.g. binary representation of 19

 0   0   0   1   0   0   1   1 
--- --- --- --- --- --- --- ---   -> number of bits = 8
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

1*2^4 + 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 16 + 2 + 1 = 19

(10011)base_2 = (19)base_10

Representing the number 19 requires 5 binary digits, hence 5 bits.
```

But what is the largest number we can store, given the number of bits we want to use? It depends on whether or not we care about negative values, since in order to store the sign we have to allocate one bit.
The general formula is:


```py
n = 8  # number of bits
max_unsigned = 2**n - 1  # unsigned range: [0, 2**n - 1]
signed_range = (-2**(n - 1), 2**(n - 1) - 1)  # one bit is used for the sign
```


Side note: a 32-bit operating system can address 2^32 distinct memory locations (roughly 4 GB of byte-addressable memory). This is why having more than 4 GB of RAM on a 32-bit OS is essentially useless: the machine can't address more than 4 GB at the same time.

### Operations
How do the div `//` (floor division) and mod `%` (modulo) operators work in Python? `//` returns the floor of the division (rounded down to the nearest integer, towards negative infinity), while `%` returns the remainder. They always satisfy the following equation:



```py
from math import floor

n, d = 7, 2  # n is the numerator and d is the denominator
assert n == d * (n // d) + (n % d)

# N.B. the `floor` of a real number `a` is the largest integer `<= a`
floor(3.14)  # 3
floor(-3.1)  # -4
```
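
Because `//` rounds towards negative infinity, the results with negative operands may be surprising. The sketch below (the helper `check_div_mod` is hypothetical) verifies the equation for every combination of signs:

```python
def check_div_mod(n, d):
    """Return (n // d, n % d) after verifying n == d * (n // d) + (n % d)."""
    quotient, remainder = n // d, n % d
    assert n == d * quotient + remainder
    return quotient, remainder


results = {
    (7, 2): check_div_mod(7, 2),      # (3, 1)
    (-7, 2): check_div_mod(-7, 2),    # (-4, 1)
    (7, -2): check_div_mod(7, -2),    # (-4, -1)
    (-7, -2): check_div_mod(-7, -2),  # (3, -1)
}
print(results)
```

Note that the remainder always takes the sign of the denominator.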


### Base
We can create an `int` object by calling the `int()` constructor; when the argument is a string, an optional second parameter specifies the base Python has to use to interpret it. The default value is base=10, since that is how we are used to reading numbers (while machines work in binary, i.e. base=2). The base can range from 2 to 36: for bases greater than 10 the digits beyond 9 are encoded with letters (the digits 0-9 keep their value, the letters A-Z stand for the values 10-35).

Python has built-in functions to convert an integer to the most common bases, like `bin()` for binary, `oct()` for octal and `hex()` for hexadecimal.



```py
bin(10)  # '0b1010', the 0b prefix tells us that the base is 2 (binary)
```
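
A few conversions in both directions, as a quick sketch (the variable names are illustrative):

```python
# from a string in a given base to an int
from_binary = int('1010', 2)
from_hex = int('ff', 16)
from_base36 = int('z', 36)    # 'z' is the largest digit in base 36

# from an int to its string representation in a common base
as_binary = bin(19)
as_octal = oct(8)
as_hex = hex(255)

print(from_binary, from_hex, from_base36, as_binary, as_octal, as_hex)
```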


## Rational Numbers
Rational numbers are those numbers that can be written as a fraction of two integers (integers included); in decimal form they have either a finite number of digits or a repeating pattern. The module `fractions` can be used to represent rational numbers exactly, since the float representation can be misleading due to machine precision.


```py
from fractions import Fraction

Fraction(1, 2)     # (numerator, denominator)
Fraction('0.125')  # from a string -> Fraction(1, 8)
Fraction(22, 7)    # numerator and denominator passed separately; Fraction(22/7) would convert the float approximation

# CAVEAT
'''
Some numbers can't have a finite representation due to machine precision. For example the float 0.3 is actually an approximation. The problem is that we have to look at something like the 17th decimal position in order to realize that this is the case. If we pass this number to Fraction() we might expect to receive 3/10 as output; instead we get a fraction of two huge integers that exactly reproduces the machine representation of 0.3 (0.29999999999999998889776975...)
'''
```
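
The caveat can be checked directly. In this sketch `Fraction.limit_denominator` is used as one possible way to recover the "intended" fraction from a float; the bound of 10 on the denominator is an arbitrary choice for the example.

```python
from fractions import Fraction

from_float = Fraction(0.3)      # exact value of the float approximation
from_string = Fraction('0.3')   # exact decimal value 3/10

# closest fraction to the float whose denominator is at most 10
recovered = from_float.limit_denominator(10)

print(from_float, from_string, recovered)
```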


## Floats (Real Numbers)
In CPython floats are implemented as C `double`, which follows the `binary64` format (IEEE 754).
Floats use a fixed size of 64 bits, divided as follows:

- sign -> 1 bit
- exponent (in the range [-1022, 1023]) -> 11 bits
- significant digits -> 52 bits (15-17 significant digits in base 10)

To have an exact representation of decimal numbers (since floats may be affected by machine precision), we can use the `decimal` module.

```
# decimal representation of a real number
123.45 = 1*10^2 + 2*10^1 + 3*10^0 + 4*10^-1 + 5*10^-2 
```
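
A brief sketch of the difference between `float` and `decimal.Decimal`. Note that the `Decimal` objects are built from strings: building them from a float would carry over the binary approximation.

```python
from decimal import Decimal

float_sum = 0.1 + 0.1 + 0.1
decimal_sum = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')

float_equal = float_sum == 0.3                   # affected by the binary approximation
decimal_equal = decimal_sum == Decimal('0.3')    # exact decimal arithmetic

# a Decimal built from a float inherits the float's approximation
inherits_error = Decimal(0.1) != Decimal('0.1')

print(float_equal, decimal_equal, inherits_error)
```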

### equality
Care must be taken when looking at the equality of floats since there are some decimal numbers that cannot be represented by a finite binary representation:

In [20]:
# base_10(0.1) = base_2(0.0 0011 0011 0011 ...) 
# therefore
x = 0.1 + 0.1 + 0.1 
y = 0.3 

print(f'{x:.20f}')
print(f'{y:.20f}') 
x == y 

0.30000000000000004441
0.29999999999999998890


False

One workaround is to set a tolerance (e.g. a percentage of the larger of the two numbers being compared) as the threshold that determines whether two numbers are considered equal:

```
|a - b| < epsilon
```

The pythonic way to approach this problem is to use the function `math.isclose`, taking care to specify appropriate relative and absolute tolerances:

In [21]:
import math

# math.isclose(x, y, rel_tol, abs_tol)
math.isclose(x, y)

True
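
The choice of tolerance matters especially near zero: with the defaults (`rel_tol=1e-09`, `abs_tol=0.0`) a relative tolerance alone can never declare a tiny number close to 0. A short sketch, where the value `1e-12` and the absolute tolerance `1e-9` are arbitrary choices for illustration:

```python
import math

x = 0.1 + 0.1 + 0.1
y = 0.3
default_check = math.isclose(x, y)    # relative tolerance is enough here

tiny = 1e-12
rel_only = math.isclose(tiny, 0.0)                 # relative tolerance scales with the inputs
with_abs = math.isclose(tiny, 0.0, abs_tol=1e-9)   # absolute tolerance handles values near zero

print(default_check, rel_only, with_abs)
```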

## Booleans PEP 285
Booleans are a subclass of the int class (i.e. `bool` inherits all its methods). Two constants are defined: `True` (int = 1) and `False` (int = 0). They are singleton objects, i.e. they point to a fixed address in memory and can be compared with both the identity and the equality operators. N.B. even though True and False evaluate to the ints 1 and 0 respectively, they don't point to the same memory address, since they are not the same type of object:


In [22]:
int(True) == 1 and int(False) == 0

True

In [23]:
id(True) != id(1)

True

In [24]:
id(False) != id(0)

True

Objects have an associated `truth value`, meaning that every object can be evaluated as true or false. In particular, every object evaluates to `True` by default, except for:

- None
- False
- 0 (in any numeric type: int, float, complex ...)
- empty sequences (list, tuple, string ...)
- empty mappings and sets (dictionary, set ...)
- instances of custom classes whose `__bool__` returns `False` or whose `__len__` returns 0

As a matter of fact, when we call `bool()` on an object, Python looks for the dunder method `__bool__`; if this is not defined, it looks for the `__len__` method, and if that is not defined either the object evaluates to `True`.

In [25]:
# es. __bool__ implementation for the int class
def __bool__(self):
  return self != 0
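
The lookup order can be seen on three hypothetical classes: `Basket` defines only `__len__`, `Account` defines both methods, and `Plain` defines neither.

```python
class Basket:
    """Truth value comes from __len__, since __bool__ is not defined."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Account:
    """__bool__ takes precedence over __len__."""

    def __init__(self, balance):
        self.balance = balance

    def __len__(self):
        return 1  # ignored by bool(), because __bool__ is defined

    def __bool__(self):
        return self.balance > 0


class Plain:
    """Neither method is defined: instances are always truthy."""


truth_values = (
    bool(Basket([])),
    bool(Basket([1])),
    bool(Account(0)),
    bool(Account(10)),
    bool(Plain()),
)
print(truth_values)
```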

### Booleans operators

```
# Truth Table
X Y  not X  X and Y  X or Y
0 0    1       0       0
0 1    1       0       1
1 0    0       0       1 
1 1    0       1       1
```

```
# De Morgan's Theorem
not(A or B) == (not A) and (not B)
not(A and B) == (not A) or (not B)

# Operations precedence (in descending order)
< > <= >= == != in is
not
and
or
```

### Short-Circuiting
Looking at the truth table, there are two cases in which the program can simplify its job by evaluating only part of a boolean expression. This is called `short-circuiting`:

In [26]:
True or Y # -> True 
# with an or expression, if the first operand is True it doesn't matter whether the second operand is True or False (Y is not even evaluated): the operation always evaluates to True

True

In [27]:
False and Y # -> False
# with an and expression, if the first operand is False it doesn't matter whether the second operand is True or False (Y is not even evaluated): the operation always evaluates to False

False

This is very useful when we have to chain two conditions together, one of which may raise an exception (and break the code) if evaluated. With short-circuiting we can put first a condition that guards against the exception that the second condition may raise.

In [28]:
my_string = 'ciao'
if 'a' == my_string[0]:
  pass
# we are checking if 'a' is the first letter of my_string. But what happens if my_string is empty? The code breaks. We can solve this with short-circuiting.


In [29]:
my_string = ''
if 'a' == my_string[0]:
    pass

IndexError: string index out of range

In [None]:
if my_string and 'a' == my_string[0]:
  pass
# Since an "and" expression evaluates to True only if both operands are truthy, we can safeguard our code by checking first whether my_string evaluates to False (i.e. whether it is an empty string);
# if it does, the second part of the "and" expression is not executed, which protects our code from breaking due to an IndexError exception.
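The same guard pattern works for any value that may be "missing". A minimal sketch follows, where `first_char_is` is a hypothetical helper name introduced only for this illustration.

```python
def first_char_is(text, char):
    """Return True if text is non-empty and its first character is char."""
    # `text and ...` short-circuits on None or '' so text[0] is never reached
    return bool(text and text[0] == char)

print(first_char_is('abc', 'a'))  # True
print(first_char_is('', 'a'))     # False, no IndexError
print(first_char_is(None, 'a'))   # False, no TypeError
```

The explicit `bool()` is there because `and` and `or` return one of their operands, not necessarily a boolean: without it the function would return `''` or `None` in the last two calls.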

back to [TOC](#TOC)

# Iterable and Iterators
---
Iterables are, in essence, containers that can be iterated over, nothing more. For example, sequences are particular iterables that we can iterate over by index, but this is not always the case for an iterable. An iterable only needs the concept of a `next` item, meaning that it should return another element from the container, without any ordering implied. Moreover, we need to keep track of the elements that have already been handed out, since we don't want the same element twice, and a stopping criterion is needed to tell us when the container is exhausted and no more elements are available. Other desirable features are: having a finite number of elements in the container, the possibility to "start from the beginning", i.e. reuse the iterable if needed, the use of list comprehensions etc..

Python has a built-in function called `next()`, which calls the special method `__next__` of the object it receives; it fits exactly the purpose of iterating, i.e. handing out a new element each time it is called. We also have a built-in exception that signals that there are no more elements: `StopIteration`.

Let's imagine we want to create our custom iterable class: how can we tell Python how to behave properly on the `__next__` method? How do we avoid exhausting our class instance once we have iterated over all the elements? 
To answer this we need to use a **`protocol`**, i.e. a way to tell the Python interpreter that our class implements certain functionality.

In particular, for an `iterator` we need the `iterator protocol`, which requires the class to have 2 methods:
* `__iter__`, a method that should only return the class instance itself, so that the iterator can be used wherever an iterable is expected (e.g. directly in a for loop)
* `__next__`, to handle the return of the elements and eventually raise `StopIteration`

If our class satisfies these requirements then we have an **`iterator`**, upon which we can use for loops, list comprehensions etc.. The only thing missing is reusability: once the iterator is consumed it has to be re-instantiated.

In the end, the solution to our problem is to have two distinct objects: the `iterable`, which is the collection of the elements, never gets exhausted and is just a container; and the `iterator`, a separate object that refers to the iterable's data (without copying it) and is responsible for iterating over the elements. The iterable is created once, while the iterator is created any time we want to restart the iteration. This is a good deal for many reasons: for example, we may have a huge set of data stored in our iterable and we don't want to reload it each time we need to restart the iteration!

As a formal distinction, an `iterable` is a Python object that implements the `iterable protocol`, which only requires the definition of the `__iter__` method; this method returns a new instance of the iterator each time it is called:

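```py
class Iterator:
    def __init__(self, iterable):
        # `items` is an illustrative attribute name; we keep a reference, not a copy
        self._items = iterable.items
        self._index = 0

    def __iter__(self): # return itself
        return self

    def __next__(self):
        # criterion for StopIteration
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item

class Iterable:
    def __init__(self, items):
        self.items = items

    def __iter__(self): # return the iterator
        return Iterator(self)
```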

Basically, we are not iterating over the instance of the iterable object but over the iterator generated by the iterable itself! 
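The same split can be observed on a built-in list, which is an iterable whose iterators are created with `iter()`. A small sketch (the variable names are arbitrary):

```python
numbers = [10, 20, 30]      # the iterable: just a container
first = iter(numbers)       # each call to iter() gives an independent iterator
second = iter(numbers)

print(first is second)           # False
print(next(first), next(first))  # 10 20
print(next(second))              # 10, `second` keeps its own position
print(list(first))               # [30]
print(list(first))               # [], `first` is now exhausted
print(numbers)                   # [10, 20, 30], the iterable is untouched
```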

N.B. when we ask python to iterate over an object it will first look for the `__iter__` method and only after for the `__getitem__` method. This means that if we implement both in our custom class, during iteration python will use the former.

Python has several built-in (lazy) iterables and iterators, and it is fundamental to know which is which now that we know the difference between the two constructs:

* `range()` -> iterable
* `dict.keys()` / `values()` / `items()` -> iterable
* `zip()` -> iterator
* `enumerate()` -> iterator
* `open()` -> iterator
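One way to check the classification above is to use the abstract base classes in `collections.abc`. The helper `kind` is a hypothetical name written for this sketch; note that `isinstance(obj, Iterable)` only detects objects that define `__iter__`, not those that rely on `__getitem__` alone.

```python
from collections.abc import Iterable, Iterator

def kind(obj):
    """Return 'iterator', 'iterable' or 'neither' for obj."""
    if isinstance(obj, Iterator):   # has both __iter__ and __next__
        return 'iterator'
    if isinstance(obj, Iterable):   # has __iter__ only
        return 'iterable'
    return 'neither'

samples = {
    'range(3)': range(3),
    "{'a': 1}.keys()": {'a': 1}.keys(),
    'zip([1], [2])': zip([1], [2]),
    'enumerate([1])': enumerate([1]),
}
for label, obj in samples.items():
    print(f'{label:18} -> {kind(obj)}')
```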

## Consume iterators manually
Imagine we need to parse a csv file where the first two lines are the header and the data types. We could write a for loop with nested if conditions to split the header and the data types from the actual data, or we could apply what we just learned about iterators. Essentially we want to create an iterator from the iterable (the file to be parsed), assign the header and the data types directly to variables, and run a for loop only on the data rows.

```py
with open('data_file.csv') as file:
    file_iter = iter(file)
    header = next(file_iter) # first row
    data_types = next(file_iter) # second row
    data = [row for row in file_iter] # from the third row and on
```

The result is a very clean and efficient way to parse the file.
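To try the pattern without a file on disk, an in-memory `io.StringIO` can stand in for the opened file. The content below is made up for the illustration, and splitting on commas is a simplification (a real parser would use the `csv` module).

```python
import io

# in-memory stand-in for a real file; the rows are invented sample data
fake_file = io.StringIO(
    'name,age\n'
    'str,int\n'
    'Anna,31\n'
    'Luca,27\n'
)

file_iter = iter(fake_file)
header = next(file_iter).strip().split(',')      # first row
data_types = next(file_iter).strip().split(',')  # second row
rows = [line.strip().split(',') for line in file_iter]  # remaining rows

print(header)      # ['name', 'age']
print(data_types)  # ['str', 'int']
print(rows)        # [['Anna', '31'], ['Luca', '27']]
```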

## Lazy Iterables
Let's first define what **Lazy Evaluation** is: it often refers to class properties that are not evaluated when the instance of the class is created but are computed, and become available, only when the property is requested. Once the property is requested for the first time, its value is cached in the instance state, therefore it doesn't need to be computed again.

In [None]:
from math import pi

class Circle:
    def __init__(self, r):
        self.radius = r # using the setter
        self._area = None # Lazy area
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r
        self._area = None # reset area when the radius is changed
        
    @property
    def area(self):
        if self._area is None:
            print('Calculating area')
            self._area = pi*self.radius**2
            #return self._area
    
        return self._area # if area was not None we already have the cached value   

In [None]:
c1 = Circle(2)

In [None]:
# first access: the area is computed and cached
c1.area

In [None]:
# Now the result is cached
c1.area

In [None]:
# if the radius is changed
c1.radius = 3
c1.area

This concept can also be applied to iterables, and in fact it is used very often in Python `generators`; essentially, an element is computed and returned only when `next()` is called. For example, we can create an infinite lazy iterable that computes a factorial only when requested.

In [None]:
from math import factorial

class Factorial:
    def __iter__(self):
        return self.FactIter()
    
    class FactIter:
        def __init__(self):
            self.i = 0 # initializer
            
        def __iter__(self):
            return self
        
        def __next__(self):
            f = factorial(self.i)
            self.i += 1
            return f

In [None]:
f = iter(Factorial())
for _ in range(10):
    print(next(f))

Looking at the code we can see that the iterable has almost nothing to do; all the computation is carried out in the iterator class. `enumerate()` and `zip()` are built-in lazy iterators of the Python standard library.

## iter() method
What happens when we call `iter(obj)`? Python first looks for an `__iter__` method defined in the object's class; if there is none, it looks for the `__getitem__` method, in which case an iterator is created that calls `__getitem__` with 0, 1, 2, ... on each `next`, as if it were a while loop, catching the `IndexError` exception and raising `StopIteration`. If neither method is defined, a `TypeError` is raised. The following is an example of a Sequence-Iterator class (something very similar to what Python does when we call `iter()` on an object that has only `__getitem__` defined).

In [None]:
class SeqIter:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        try:
            item = self.seq[self.index]
            self.index += 1
            return item
        except IndexError:
            raise StopIteration
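The fallback can be observed directly on a class that defines `__getitem__` only. `EvenNumbers` is a hypothetical class written for this sketch.

```python
class EvenNumbers:
    """Sequence-like class with __getitem__ only (no __iter__, no __len__)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)  # this is what ends the iteration
        return index * 2

evens = EvenNumbers(4)
print(hasattr(evens, '__iter__'))  # False
print(list(iter(evens)))           # [0, 2, 4, 6]
print([e for e in evens])          # [0, 2, 4, 6], for loops use the same fallback
```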

### iter() with callables
There is a second form of the `iter()` function that is useful to iterate over callables. In the form `iter(callable, sentinel)` the callable is invoked with no arguments on every `next`, and the sentinel is the value that, when returned by the callable, triggers `StopIteration` (the sentinel itself is not emitted). Of course, if the sentinel value is never returned, we get an infinite iterator.

In [None]:
def call():
    i = 0
    
    def inner():
        nonlocal i
        i += 1
        return i
    return inner

inner = call()
iter_call = iter(inner, 5) # set the sentinel value to 5

for c in iter_call:
    print(c)
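A common use of the two-argument form is reading a stream in fixed-size chunks until nothing is left. The sketch below uses an in-memory `io.StringIO` as a stand-in for a file, and `functools.partial` to obtain a callable that takes no arguments.

```python
import io
from functools import partial

stream = io.StringIO('abcdefghij')
read_chunk = partial(stream.read, 4)   # callable with no arguments: reads 4 chars

# read() returns '' at the end of the stream, so '' is the sentinel
chunks = list(iter(read_chunk, ''))
print(chunks)  # ['abcd', 'efgh', 'ij']
```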

## Delegating Iterators
Let's imagine we have a class that stores a list of elements in a pseudo-private variable (i.e. beginning with \_). Once an instance of the class is created, unless specifically implemented, we cannot iterate over this list (unless we are aware of the private variable, but this shouldn't be the case anyway). One option could be to give the class the capability to generate an iterator out of that list, but that is code we don't want to write ourselves. What we can do is `delegate` the generation of the iterator to the `iter()` function itself; in this way we only need to implement the `__iter__` method in our class. As a matter of fact, in real practice we will often not write a custom iterator, unless we need specific functionality, but we'll leverage `iter()`, delegating to it the duty of returning an iterator.

In [None]:
class Delegate:
    def __init__(self, some_list):
        self._somelist = some_list
        # do something to this list
        
    def __iter__(self):
        return iter(self._somelist)
    
# since some_list is already an iterable (a list) we can easily create an iterator with iter().
# In this way the class Delegate is simply transformed into an iterable!

## Reversed Iteration
It may happen that we need to iterate over an iterable in reverse order, and what we usually do is use a for loop and slicing like:
```py
for i in lst[::-1]:
    print(i)
```
In this way we are creating a copy of the list, maybe only to iterate over a few elements in a huge sequence! A rather ugly alternative is to iterate backwards using the length of the sequence:
```py
for i in range(len(lst)):
    print(lst[-i-1])
```
However, the best approach, both in terms of efficiency and readability, is to turn the list into a `reversed iterator`:

In [None]:
seq = [i for i in range(5)]
for s in reversed(seq):
    print(s)

The built-in `reversed()` creates an iterator over the iterable `seq`, therefore we are not creating a copy of the sequence. What Python does behind the scenes is call the `__reversed__` method, and that is it, at least for sequence types.

However, the same does not apply if we want to iterate in reverse over a general iterable that doesn't support indexing. In this case we need to implement the `__reversed__` method ourselves and make it return an iterator. As for `iter()`, Python will first search for the `__reversed__` method, and if it doesn't find it, it will look for both the `__getitem__` and `__len__` methods (basically to do what we have shown before with the for loop).
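A minimal sketch of a non-indexable iterable that supports `reversed()` through `__reversed__`; `Countdown` is a hypothetical class invented for this example.

```python
class Countdown:
    """Iterable without __getitem__/__len__ that still supports reversed()."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # forward direction: start, start - 1, ..., 1
        return iter(range(self.start, 0, -1))

    def __reversed__(self):
        # must return an iterator: 1, 2, ..., start
        return iter(range(1, self.start + 1))

print(list(Countdown(3)))            # [3, 2, 1]
print(list(reversed(Countdown(3))))  # [1, 2, 3]

# an iterable with neither __reversed__ nor __getitem__ + __len__ is rejected
try:
    reversed(x for x in range(3))
    reversible = True
except TypeError:
    reversible = False
print(reversible)  # False
```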

## Caveats: using iterators as function arguments
Since we can iterate only once over an iterator before it is exhausted, we need to be careful when using it as a function argument. 

Let's imagine that we created a class that returns an iterator over a list of numbers, and suppose we need to find the max and the min among them. If we call, for example, `min()` we get the minimum value, but to compute it Python had to iterate over the whole iterator, exhausting it! Therefore, if we now ask for `max()` we'll get a `ValueError`, since our iterator already reached the `StopIteration` condition during `min()` and now behaves like an empty sequence!
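The caveat is easy to reproduce with a plain list iterator. The sketch below also shows one possible way out, which is to materialise the elements once and reuse the resulting list.

```python
values = iter([3, 1, 2])
smallest = min(values)        # consumes the whole iterator
try:
    largest = max(values)     # nothing left to look at: ValueError
except ValueError:
    largest = None
print(smallest, largest)      # 1 None

# one possible way out: store the elements in a list and reuse it
values = list(iter([3, 1, 2]))
print(min(values), max(values))  # 1 3
```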

back to [TOC](#TOC)

# Generators
---
Generators are created by functions that contain at least one `yield` statement; they are essentially iterators (they implement the iterator protocol) and are inherently lazy.

Let's imagine we want to create a function that can be paused during execution (like a for loop that stops at each iteration), so that we can handle the data and do something else before the function is exhausted. In Python there is a particular keyword called `yield`, which emits a value and pauses the function execution; the function can then be resumed by calling `next`. Once the function execution is finished, a `StopIteration` exception is raised.

In [None]:
# e.g.
def using_yield():
    print('Name')
    yield 'Giovanni'
    print('Surname')
    yield 'Frison'   
    
test = using_yield()
test

As we can see, `test` is not a function but a `generator object` upon which we can call the `next()` method to execute the body until a `yield` statement is reached.

In [None]:
next(test)

In [None]:
next(test)

In [None]:
next(test)

As for an iterator, when the function body is exhausted, i.e. no `yield` statement remains, `StopIteration` is raised.

A function that uses the `yield` statement in its body is called a `generator function` and is essentially a `generator factory`. The `using_yield` function above is defined like a regular function, but since it contains a `yield` in its body, the Python compiler knows that it is a generator function/factory, and each time it is called a generator is created. In this way we are able to execute the function piece-wise, one `yield` statement at a time, exiting and re-entering the function in different parts of our code by calling `next` on the generator.

The generator behaves like an iterator because it is an iterator! It has the iterator protocol implemented (`__iter__` and `__next__` methods). As a matter of fact, generators are powerful tools to create iterators in a very effective way. E.g. looking at the Factorial implementation that can be found in the [Lazy Iterables](#Lazy-Iterables) section, we can easily re-implement it using a `yield` statement:

In [None]:
import math

def factorials(n):
    for i in range(n):
        yield math.factorial(i)

In [None]:
for f in factorials(5):
    print(f)
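The class-based `Factorial` iterable was infinite, while `factorials(n)` stops after `n` values. An infinite generator is just as easy to write; in the sketch below `all_factorials` is a hypothetical name, and `itertools.islice` is used to take only the first few values.

```python
from itertools import islice
from math import factorial

def all_factorials():
    """Infinite generator: 0!, 1!, 2!, ..."""
    i = 0
    while True:
        yield factorial(i)   # execution pauses here until the next request
        i += 1

first_six = list(islice(all_factorials(), 6))
print(first_six)  # [1, 1, 2, 6, 24, 120]
```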

## Iterables from generators
Being an iterator, a generator has the same caveat: once it is exhausted it is of no further use, and sometimes this can lead to unwanted bugs in our code. However, there is a solution: we can make an iterable from our generator simply by defining an iterable class whose `__iter__` method returns not `self` but a new generator. Let's look at an example:

In [None]:
def square_gen(n):
    for i in range(n):
        yield i ** 2

sq = square_gen(4)
for s in sq:
    print(s)

So, we created a generator factory (`square_gen`) and a generator from it (`sq`) that is now exhausted due to the for loop.

In [30]:
list(sq)

[]

However, we can define a custom iterable ourself (implementing the iterable protocol).

In [None]:
class Square:
    def __init__(self, n:int):
        self.n : int = n
            
    # iterable protocol
    def __iter__(self):
        # we need to return an iterator, in this case 
        # a generator created by our generator function
        return Square.square_gen(self.n) 
    
    @staticmethod # it doesn't use any of the class properties
    def square_gen(n):
        for i in range(n):
            yield i ** 2   
    

Now we can iterate over `sq` as many times as we want, since each time `__iter__` is called it automatically returns a new generator created by `square_gen`.

In [None]:
sq = Square(4)
for s in sq:
    print(s)

In [None]:
list(sq)
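A more compact variant of the same idea, shown here only as a sketch, is to write `__iter__` itself as a generator function, so no separate static method is needed. The class name `Squares` is chosen for this example.

```python
class Squares:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # __iter__ contains a yield, so every call returns a fresh generator
        for i in range(self.n):
            yield i ** 2

squares = Squares(4)
print(list(squares))  # [0, 1, 4, 9]
print(list(squares))  # [0, 1, 4, 9], the iterable is not exhausted
```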

## Generator expressions
Using the same syntax as list comprehensions we can create generator expressions; the only difference is that generator expressions use round brackets `()`. The advantage of generator expressions is that they are inherently `lazy`, meaning that the expressions are not evaluated until requested by the `next()` function.

In [31]:
lst = [i ** 2 for i in range(5)] # iterable
gen = (i **2 for i in range(5)) # iterator

lst, gen

([0, 1, 4, 9, 16], <generator object <genexpr> at 0x7f62cc076f90>)

Comparing list comprehensions and generator expressions we have:
* a list takes longer to create since every expression has to be evaluated up front, while a generator is returned immediately
* iterating over a list is faster since the elements have already been computed

Therefore, we can conclude that:
* if we need to iterate over all the elements, the total time is almost the same, but generators use less memory since the elements are produced one at a time and never stored all together
* if we need to iterate only over some elements then generators are the way to go, since we don't compute the unwanted ones
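The memory and laziness points can be illustrated with a rough sketch. Note that `sys.getsizeof` measures only the container object itself, not the elements it refers to, so the numbers are indicative only.

```python
import sys

n = 100_000
squares_list = [i ** 2 for i in range(n)]   # all n values computed now
squares_gen = (i ** 2 for i in range(n))    # nothing computed yet

# the list stores n references, the generator only its own small state
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# stopping early: only the first few squares are ever computed
first_big = next(s for s in squares_gen if s > 50)
print(first_big)  # 64
```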

## Yield from
In its simplest application, `yield from` is just a way to replace a for loop over an iterable inside a generator function. Let's take for example:

In [32]:
def loop_on_nested_generator(n):
    gen = ((i*j for i in range(1,n+1)) for j in range(1,n+1))
    for row in gen:
        for item in row:
            yield item

In [33]:
loop = loop_on_nested_generator(2)
for l in loop:
    print(l)

1
2
2
4


Instead, we can substitute the last for loop with a `yield from` and achieve the same result.

In [34]:
def yield_from_nested_generator(n):
    gen = ((i*j for i in range(1,n+1)) for j in range(1,n+1))
    for row in gen:
        yield from row

In [35]:
loop = yield_from_nested_generator(2)
for l in loop:
    print(l)

1
2
2
4
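`yield from` accepts any iterable, not only generators. As a further sketch, the hypothetical function `chain_all` below concatenates several iterables lazily (similar in spirit to `itertools.chain`).

```python
def chain_all(*iterables):
    """Yield the elements of each iterable, one iterable after the other."""
    for iterable in iterables:
        yield from iterable   # same as: for item in iterable: yield item

print(list(chain_all([1, 2], 'ab', range(2))))  # [1, 2, 'a', 'b', 0, 1]
```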


back to [TOC](#TOC)

# Sequence Type
---
In math terms a sequence is nothing more than a countable group of items that have a positional ordering, meaning that each element can be accessed by an index representing its position. In Python, a `list` is a sequence type while a `set` is not, since it doesn't have a positional order.
A sequence can be `mutable` (list, bytearray) or `immutable` (string, tuple, range, bytes). 
Again, a sequence can be `homogeneous`, if it holds elements of the same type (like strings), or `heterogeneous` (lists). 
A sequence is also an `iterable type`, since we can reach each element one by one, hence iterating over the sequence (n.b. an iterable is not always a sequence, set is an example).

Common operations on sequences are:
* `in` and `not in`
* `+` for concatenation (not for range type)
* `* int` for repetition (not for range type)
* `len()` to retrieve the length of the sequence
* `index(x)` to retrieve the index of the first occurrence of element x



## Mutating sequence

By mutation in Python we refer to a change in the internal state of a mutable object that does not modify its memory address. For example, list concatenation is not a mutation, since a new object is created, while using the method `append` is:

In [36]:
# concatenation
l = ['Giovanni']
l_id = id(l)
l = l + ['Frison']
id(l) == l_id

False

In [37]:
# appending
l = ['Giovanni']
l_id = id(l)
l.append('Frison')
id(l) == l_id

True

Other mutations can be achieved with:
* index or slice assignment -> `s[i] = x`
* deleting elements -> `del s[i]`
* removing all the objects in the container -> `s.clear()`
* inserting elements -> `s.insert(i, val)`
* extending with another iterable -> `s.extend(iterable)`
* pop (return and remove the element at index i) -> `s.pop(i)`
* removing the first occurrence of x -> `s.remove(x)`
* reversing in place -> `s.reverse()`
* and many more ..

N.B. not all sequence types must have these methods, in particular if custom made by us

### in-place concatenation and repetition
Sequences can be concatenated or repeated using `+` or `*`. If we do it in the standard way we obtain a new object:

In [38]:
l1 = [1,2,3]
l2 = [4,5,6]
l1_prev_id = id(l1) 
l1 = l1 + l2

id(l1) == l1_prev_id

False

However, if the sequence is mutable and we use in-place concatenation `+=` or repetition `*=`, we are mutating the object, therefore the `id` remains the same. If the sequence is immutable, like a tuple, a new object will be created anyway.

In [39]:
l1 = [1,2,3]
l2 = [4,5,6]
l1_prev_id = id(l1) 
l1 += l2

id(l1) == l1_prev_id

True
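The tuple case mentioned above can be checked in the same way. In this sketch a second name is kept on the original object, so that the comparison is made with `is` rather than with stored ids.

```python
t = (1, 2, 3)
original = t
t += (4, 5)              # tuples are immutable: a new tuple is built and rebound to t
print(t is original)     # False
print(original)          # (1, 2, 3)

nums = [1, 2, 3]
alias = nums
nums += [4, 5]           # lists are mutable: the same object is extended in place
print(nums is alias)     # True
print(alias)             # [1, 2, 3, 4, 5]
```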

### Mutation by assignment
Some mutable sequences, like lists, support assignment via index, meaning that we can replace elements by assigning new values to positions in the list. This also works with slices, provided that the value we assign is an iterable, and it doesn't even need to be of the same length as the slice we are replacing! We can also assign to stepwise (extended) slices, but in that case the length of the iterable must match the number of elements selected. In the same way we can delete elements just by replacing a slice with an empty list. Lastly, using a trick we are also able to insert an iterable inside a sequence: an empty slice such as `l[1:1]` selects no elements (so `l[1:1] = []` changes nothing), and assigning an iterable to that slice inserts its elements at that position.

In [40]:
# replacing a slice with as many elements as we want
l = [1,2,3,4,5,6,7,8,9,10]
l[:2] = ['a', 'b', 'c', 'd'] # place 4 elements instead of the first 2
l

['a', 'b', 'c', 'd', 3, 4, 5, 6, 7, 8, 9, 10]

In [41]:
# replacing a stepwise slice with the same number of element selected
l = [1,2,3,4,5,6,7,8,9,10]
l[::2] = ['a', 'b', 'c', 'd', 'e'] # 5 element change with 5 element
l

['a', 2, 'b', 4, 'c', 6, 'd', 8, 'e', 10]

In [42]:
# delete elements
l = [1,2,3,4,5,6,7,8,9,10]
l[:5] = []
l

[6, 7, 8, 9, 10]

In [43]:
# insert elements
l = [1,2,3,4,5,6,7,8,9,10]
l[1:1] = []
l[1:1] = [1,2,3]
l

[1, 1, 2, 3, 2, 3, 4, 5, 6, 7, 8, 9, 10]

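### Never return a mutated object
If we write a plain function, it is best practice not to modify the object passed as argument but to return a modified copy of it. If the function does mutate its argument in place, it should return nothing (`None`), so that the caller is not led to believe that a new object was created. So-called `in-place methods` are generally bound to classes.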

In [44]:
# what we should do
def reverse(s):
    s.reverse()

# what we shouldn't do
def reverse(s):
    s.reverse()
    return s

s1 = [1,2,3]
s2 = reverse(s1)

s1, s2

([3, 2, 1], [3, 2, 1])
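For comparison, a non-mutating version could look like the sketch below; `reversed_copy` is a hypothetical name chosen to make it clear that a new object is returned.

```python
def reversed_copy(seq):
    """Return a new reversed list, leaving the argument untouched."""
    return list(reversed(seq))

s1 = [1, 2, 3]
s2 = reversed_copy(s1)

print(s1, s2)    # [1, 2, 3] [3, 2, 1]
print(s1 is s2)  # False
```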

## Copying Sequences

While copying an immutable sequence is in general a safe procedure, the same cannot be said for mutable ones. A simple example that requires care comes from concatenation and repetition: they build a new sequence, but its elements are references to the same objects as in the original, so if we then mutate one of those (mutable) elements, the change shows up everywhere that element is referenced.
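The repetition pitfall can be reproduced in a few lines. This is a sketch with arbitrary names, where the list comprehension is one common way to get independent inner lists.

```python
row = [0, 0]
grid = [row] * 3          # three references to the SAME inner list
grid[0][0] = 99
print(grid)               # [[99, 0], [99, 0], [99, 0]]
print(grid[0] is grid[1]) # True

safe = [[0, 0] for _ in range(3)]   # a new inner list at every iteration
safe[0][0] = 99
print(safe)               # [[99, 0], [0, 0], [0, 0]]
```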

### Shallow copies
In the previous section we created a function that applies an in-place method to its argument `s` and then executes `return s`. The user then assigns the result of the function call to a variable `s2`, thinking that a new object was created, while instead `s2` is now pointing to the same object as `s1`, which has been modified as well. What we should do is **not** return the mutated argument from the function.

Even better would be not to modify our objects in place at all, but to create a copy first and then pass it to the function that modifies it. To copy a sequence, or any object, there are a variety of ways, some more pythonic than others:

In [45]:
# Ways of copying a list: a new list object is created, but its elements
# keep the same memory reference as the elements of the original list

s = [1, 2, 3]

# 1. simple loop (horrible)
cp = []
for i in s:
    cp.append(i)
    
# 2. list comprehension
cp = [i for i in s]

# 3. copy method (not available for immutable sequences like tuples or strings)
cp = s.copy()

# 4. slicing (with a tuple the same object is returned)
cp = s[:]

# 5. list constructor (with tuple() on a tuple the same object is returned)
cp = list(s)


# therefore we have
s == cp,  s is cp,  [s[i] is cp[i] for i in range(len(s))]


(True, False, [True, True, True])

What we have performed are called `shallow copies`, i.e. we have created a copy only of the sequence object, while the elements inside keep the same memory reference. If these elements are immutable objects then the copy is safe, meaning that we can modify it without affecting the source. However, if the sequence contains mutable objects then a shallow copy may not be enough. Let's see an example:

In [46]:
l = [[1, 2], [3, 4]]
l

[[1, 2], [3, 4]]

Now `l` is a sequence that contains 2 mutable objects. We can create a copy, try to substitute one of its elements and see what happens:

In [47]:
cp = l.copy()
cp[0] = 'python'
cp is l, cp, l

(False, ['python', [3, 4]], [[1, 2], [3, 4]])

The copy `cp` is actually a new object, and when we say `cp[0] = 'python'` we are mutating this object only, without affecting the original list `l`. But what happens if we try to modify the mutable objects inside `cp` (which share the same memory reference with the ones contained in `l`)?

In [48]:
cp = l.copy()
cp[0][0] = 100
cp, l

([[100, 2], [3, 4]], [[100, 2], [3, 4]])

As we can see, the inner list has been modified in both `cp` and `l`; this is because `l.copy()` is a shallow copy and affects only the outer object. As a matter of fact:

In [49]:
cp[0] is l[0], cp[1] is l[1]

(True, True)

### Deep Copies

To perform a copy at the deepest level of an object, a recursive approach that is able to handle circular references is needed. Python has the standard library module `copy` to carry out a deep copy.

In [50]:
from copy import deepcopy
cp = deepcopy(l)
l, cp

([[100, 2], [3, 4]], [[100, 2], [3, 4]])

In [51]:
cp[0] is l[0], cp[1] is l[1]

(False, False)

`deepcopy` is intelligent enough to retain the references between objects also in the copy. Let's see an example:

In [52]:
class MyClass:
    def __init__(self, a):
        self.a = a
        
x = MyClass(100)
y = MyClass(x)
lst = [x, y]

x is y.a

True

Now `lst` is a sequence that contains two elements, `x` and `y`, where `y` has an attribute `y.a` that points to `x` (not a circular reference). Now, if we perform a deepcopy, Python will create new objects for each element, but will retain the relationship between `y.a` and `x`.

In [53]:
cp = deepcopy(lst)
# now even if we have different objects
x is cp[0], y is cp[1]

(False, False)

In [54]:
# the relationship is retained
cp[0] is cp[1].a # same as x is y.a

True
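The handling of circular references mentioned at the beginning of this section can be sketched with a list that contains itself: `deepcopy` does not recurse forever, and the copy refers to itself rather than to the original.

```python
from copy import deepcopy

a = [1, 2]
a.append(a)          # circular reference: a[2] is a itself

b = deepcopy(a)
print(b is a)        # False, a new outer list
print(b[2] is b)     # True, the cycle is reproduced inside the copy
print(b[2] is a)     # False, the copy does not point back to the original
```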

## Slicing

Slicing is an operation that works with indexing, therefore it is applicable only to sequence type objects. A `slice` is an object of type `slice` that can also be created explicitly, and assigned to a variable, with the built-in `slice()`.
Slicing a list always returns a new object.

The easiest way to slice is to specify the start and stop:
* `l[i:j]` with i included and j excluded

Slices are independent of the sequence they are slicing, therefore even if the stop of the slice is out of bounds for the sequence, Python won't throw an error but will slice to the end of the sequence.

In [55]:
l = [1,2,3,4,5]
l[0:3], l[2:100]

([1, 2, 3], [3, 4, 5])

It is also possible to specify a third argument for the slice, which is the step (or stride), defaulting to 1. Moreover, if the stop argument is not given, Python automatically considers the `len` of the sequence as the stopping point; similarly, an omitted start defaults to the beginning of the sequence.

In [56]:
l[::2] # start:stop:step

[1, 3, 5]

As a consequence, it is easy to reverse a list with slicing:

In [57]:
l[::-1]

[5, 4, 3, 2, 1]

In general, we can have arguments of the slice that are greater than the length of the sequence or even negative. To understand which range we are actually taking into account we need to remember the following rules:

given `l[i:j:k]`:

* if `k > 0`:
    * if `i` (or `j`) `> len(seq)` -> `len(seq)`
    * if `i` (or `j`) `< 0` -> `max(0, len(seq) + i)` (same for `j`)
    * if `i` is omitted or `None` -> `0`
    * if `j` is omitted or `None` -> `len(seq)`
* if `k < 0`:
    * if `i` (or `j`) `>= len(seq)` -> `len(seq) - 1`
    * if `i` (or `j`) `< 0` -> `max(-1, len(seq) + i)` (same for `j`)
    * if `i` is omitted or `None` -> `len(seq) - 1`
    * if `j` is omitted or `None` -> `-1`

If in the end we are not sure about the slice we are taking, or we don't remember these rules, we can create a slice object with the needed `i, j, k` and then use its method `indices(len(seq))`, which returns the `start, stop, step` values of the equivalent `range(start, stop, step)`.

In [58]:
slice(0,6,2).indices(len(l))

(0, 5, 2)

In [59]:
# which is equivalent to
l, list(map(lambda x: l[x], range(0, 5, 2)))

([1, 2, 3, 4, 5], [1, 3, 5])
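The same method also helps with negative steps and out-of-bounds values. A short sketch using the list `l` from the cells above (redefined here so that the block is self-contained):

```python
l = [1, 2, 3, 4, 5]

rev = slice(None, None, -1)              # same as l[::-1]
print(rev.indices(len(l)))               # (4, -1, -1)
print(list(range(*rev.indices(len(l))))) # [4, 3, 2, 1, 0], the indices visited
print(l[rev])                            # [5, 4, 3, 2, 1]

wide = slice(-100, 100)                  # both bounds out of range
print(wide.indices(len(l)))              # (0, 5, 1)
```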

## Custom Sequences
An essential feature that has to be implemented in a custom sequence is the `__getitem__` method, since it is what makes it possible to index and iterate over the sequence, enabling all sorts of iteration, from list comprehensions to for loops. The `__getitem__` method should be coded to handle positive and negative integers as well as slice objects. A possible implementation could be the following:

In [60]:
class MySequence(object):
        
    def __init__(self, length):
        if isinstance(length, int):
            self.length = length
            self.sequence = [i for i in range(length)]
        else:
            raise TypeError('Length must be an integer.')
            
    
    def __repr__(self):
        return f'{self.__class__.__name__}(length={self.length})'
        
    
    def __getitem__(self, index):
        if isinstance(index, int):
            if index < 0: # handle negative index
                index = self.length + index
            if index < 0 or index >= self.length: # handle out of bound index
                raise IndexError
            else:
                return self.sequence[index]

        elif isinstance(index, slice):
            start, stop, step = index.indices(self.length)
            rng = range(start, stop, step)
            return [self.sequence[i] for i in rng]

        else:
            raise TypeError

In [61]:
seq = MySequence(10)
list(seq), seq[0], seq[-1], seq[1:3],

([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 0, 9, [1, 2])

### Mutation in custom sequences
We have seen that mutating a sequence means changing it without creating a new object. Concatenation and repetition can be done by creating a new object (`+`, `*`) or in place (`+=`, `*=`), and only the latter is a mutation. To add these capabilities to our custom sequence we need to `overload` the operators `+, +=, *, *=` by implementing the methods `__add__` and `__iadd__`, or `__mul__` and `__imul__`. 

Other common methods that we could implement in our sequence are:
* `__setitem__` as the complement of `__getitem__`
* `__contains__` to add the `in` functionality to check if an element is in the sequence
* `__delitem__` to delete elements with the keyword `del`
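A minimal sketch of how `__add__` and `__iadd__` could differ in a custom sequence. The class `Bag` and its attribute `_items` are hypothetical names made up for this example, not part of the `MySequence` class above.

```python
class Bag:
    def __init__(self, items=()):
        self._items = list(items)

    def __repr__(self):
        return f'Bag({self._items})'

    def __getitem__(self, index):
        return self._items[index]

    def __contains__(self, value):      # enables `value in bag`
        return value in self._items

    def __add__(self, other):           # bag1 + bag2 -> new object
        if not isinstance(other, Bag):
            return NotImplemented
        return Bag(self._items + other._items)

    def __iadd__(self, other):          # bag1 += bag2 -> mutate and return self
        if not isinstance(other, Bag):
            return NotImplemented
        self._items += other._items
        return self

b1, b2 = Bag([1, 2]), Bag([3])
alias = b1
b3 = b1 + b2
print(b3, b3 is alias)   # Bag([1, 2, 3]) False
b1 += b2
print(b1, b1 is alias)   # Bag([1, 2, 3]) True
```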

## Sorting sequences
Python has a built-in sorting function called `sorted()` with a default ascending order and an optional parameter `reverse` (default `False`). When talking about ordering we have to take into consideration that, except for numbers, it is not trivial to define an ordering criterion for each type of object. An easy example is with strings, which can be ordered lexicographically, but what about lower and upper case? In Python strings are compared character by character using their code points, so upper case letters come before lower case ones, i.e. `'a' > 'A'` (the code of a character can be retrieved with `ord(char)`).

In some cases, when the elements are not directly comparable because they don't have a natural ordering, we need to create a `sorting key`, i.e. a rule that will help Python understand which order it has to consider:

In [62]:
l = [1,'a', 'x', 'Z', '?', 100, 'A']
sorted(l) # the key argument is not provided hence python will try to sort in natural order

TypeError: '<' not supported between instances of 'str' and 'int'

In [None]:
l_func = lambda x: ord(str(x)) if isinstance(x, str) else x 
sorted(l, key=l_func)

`sorted()` returns a new list containing the sorted elements of the iterable, leaving the original untouched. It uses the TimSort algorithm (named after Tim Peters, also author of the Zen of Python shown by `import this`). Also, it is a `stable sort`, meaning that if, with or without a key, two elements compare equal, the one that appears first before the sorting will also appear first after the sorting. List objects have a `sort()` method that instead sorts in place. Both use the same algorithm, but `sorted()` has a greater overhead since it has to create a new list.
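Stability, sorting keys and the difference between `sorted()` and `list.sort()` can be seen in a short sketch; the sample data below is invented for the illustration.

```python
people = [('Anna', 30), ('Luca', 25), ('Marta', 30), ('Piero', 25)]

# stable sort: people with the same age keep their original relative order
by_age = sorted(people, key=lambda p: p[1])
print(by_age)
# [('Luca', 25), ('Piero', 25), ('Anna', 30), ('Marta', 30)]

words = ['banana', 'apple', 'Cherry']
print(sorted(words))                  # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))   # ['apple', 'banana', 'Cherry']

result = words.sort()                 # in place: the list changes, None is returned
print(result, words)                  # None ['Cherry', 'apple', 'banana']
```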

## Zero-based Index

Why are Python sequences indexed starting from `0` and not `1`?

Essentially because:
* we want to describe a range of indices using `range(l, u)` with `l <= n < u`
* in this way the number of elements in the range is `u - l`, and for `l = 0` the length is precisely the upper bound
* if we want to know how many elements precede the element `s[i]`, in a 0-based system the answer is exactly `i`
* the only strange behavior is that the last index of a sequence is `len(s) - 1`
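
The points above can be checked quickly; the sequence and the bounds are arbitrary values chosen for the example.

```python
s = ['a', 'b', 'c', 'd', 'e']
l, u = 1, 4

indices = list(range(l, u))   # l <= n < u
count = len(indices)          # equals u - l
preceding = len(s[:3])        # number of elements before s[3]
last = s[len(s) - 1]          # the last valid index is len(s) - 1

print(indices, count, preceding, last)
```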

## Application - `Polygon`
Link to [Polygon_Class](Polygon_Class.ipynb)

## Lists vs Tuples

Let's look at the difference between lists and tuples (as an immutable data structure), and in particular at why tuples are more efficient and should be preferred over lists when mutability is not required.

To do this, we first have to define what `constant folding` is: the process of recognizing and evaluating constant expressions at compile time rather than at run time.

To look at how lists and tuples are compiled we'll make use of the `dis` module, which disassembles the bytecode produced by the Python compiler.

In [None]:
from dis import dis

# let's compile a tuple and a list and disassemble the bytecode
# N.B. dis() prints its output and returns None, so there is nothing to assign
dis(compile("[1,2,3,'a']", 'string', 'eval'))
dis(compile("(1,2,3,'a')", 'string', 'eval'))

We can see the difference in compilation between tuples and lists: the tuple is loaded as a single constant representing all the elements, while the list has to be built at run time (how exactly depends on the CPython version). In this case both containers hold immutable elements; if the tuple contains a mutable object (e.g. a list) the compilation advantage is lost, since Python first has to build the list and only afterwards the tuple.

In [None]:
dis(compile("(1,2,3,['a'])", 'string', 'eval'))

### Copying

Copying a list and copying a tuple behave differently, since one is mutable and the other is not. A shallow copy of a list creates a new object; with a tuple the very same object is returned, since it doesn't make sense for Python to keep two identical immutable objects.

In [None]:
l1 = [1,2,3,4]
t1 = (1,2,3,4)
l2 = list(l1) # shallow copy
t2 = tuple(t1)

l1 is l2, t1 is t2

### Storing Efficiency

From Python 3.8 there is not much difference in storage efficiency between a tuple and a list if the size of the final object is known. Therefore there is little difference in storage between:

In [None]:
import sys

l = list(range(100))
t = tuple(range(100))

sys.getsizeof(l), sys.getsizeof(t)

However, if the final size of the sequence is unknown, i.e. we append elements to the list one by one, the list over-allocates extra memory whenever its current capacity is filled:

In [None]:
l = list()
size_prev = sys.getsizeof(l) # to catch the overhead of list creation

for i in range(10):
    l.append(i)
    size_l = sys.getsizeof(l)
    delta, size_prev = size_l - size_prev, size_l
    print(f' n° item: {i+1}, list size: {size_l}, delta:{delta}')

back to [TOC](#TOC)

# Iteration Tools - The itertools module
---
The itertools module is a collection of lazy (i.e. memory-efficient) iterator functions that can be very useful in different situations. In this section we will look at several functions from the itertools module, together with some related built-ins.

## Aggregators
Aggregators are functions that iterate over an iterable and return a summary of its content as a single value. Examples are the functions `max()`, `min()`, `sum()`, etc.

The functions `any()` and `all()` are useful aggregators that look at the truth values of the elements of an iterable. `any()` returns `True` if at least one element evaluates to `True`, while `all()` returns `True` only if all the elements evaluate to `True`.

N.B. remember that in Python every object has an associated truth value that by default is `True`. Only objects like `0`, `''`, `None`, `[]` evaluate to `False`. The truth value of a custom class can be defined in the `__bool__` method. If `__bool__` is not defined, Python will look for the `__len__` method, where a length of 0 evaluates to `False`. If neither is defined, the custom object evaluates to `True` by default.
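
The lookup order described above can be sketched with three hypothetical classes (`Basket`, `Switch` and `Plain` are names invented for this example):

```python
class Basket:
    """Only __len__ is defined: an empty basket is falsy."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Switch:
    """__bool__ takes precedence over __len__."""
    def __init__(self, on):
        self.on = on

    def __bool__(self):
        return self.on

    def __len__(self):
        return 99  # ignored for truth testing because __bool__ exists


class Plain:
    """Neither method is defined: instances are always truthy."""


truth = [bool(Basket([])), bool(Basket([1])), bool(Switch(False)), bool(Plain())]
print(truth)
```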

A `predicate` is a function that takes a single argument and returns `True` or `False`, like `bool()`, and can be used in conjunction with `all()` or `any()`. Let's say for example that we want to check whether all the elements in a list are greater than or equal to 0. We could iterate over the whole list ourselves, but a cleverer way is to lazily apply a predicate to the list and check the result with `all()`. Since `all()` stops at the first `False`, if for example there is a negative element near the start, the program won't waste resources iterating over the rest of the list, saving memory and time.

In [None]:
lst = [0, -1, 10, 5, -3, 6]
gen = (l >= 0 for l in lst)
all(gen)

## iSlicing
Slicing is the operation of cutting a sequence-type object in different ways. The classical notation is composed of up to three terms `[i:j:k]`, respectively the start, stop and step parameters. However, classical slicing does not work on general iterables such as iterators and generators. For those we have the `itertools.islice` function.

Of course, `islice` is lazy: it iterates over an iterable and returns a lazily evaluated slice.

In [None]:
from itertools import islice
result = islice(lst, 0, 3)
result, list(result)

Since the result of `islice` is an iterator, once we have converted it to a list it is exhausted, so we won't be able to use it again. As a matter of fact, calling `list` again results in an empty list.

In [None]:
list(result)

## Selecting and Filtering

Python has a built-in function `filter` that takes a predicate and an iterable and returns an iterator. What it does under the hood is essentially equivalent to a generator expression that loops over the iterable and verifies the predicate on each element.

In [None]:
lst = [1, 2, 3, 4, 5]
list((l for l in lst if l>3)), list(filter(lambda x: x>3, lst))

As expected the two approaches are equivalent: both return an iterator that is exhausted once iterated.

The itertools module offers a complement to `filter`: `filterfalse`, which yields only the elements for which the predicate evaluates to `False`.

In [None]:
from itertools import filterfalse
lst = [0, '', 'ciao', 1]
list(filterfalse(None, lst)) 
# N.B. if None is supplied as predicate python will look at the truth values of each element

Another useful itertools function for selecting and filtering is `compress`, which takes two iterables, one with the data to be selected and one with a series of selectors (values that evaluate to True or False), and pairs them up. The result is a lazy iterator that contains only the elements of the first iterable whose position evaluated to True in the second one. An example is worth 1000 words:

In [None]:
from itertools import compress
lst = [1, 2, 3, 4 ,5, 6]
test = [True, False, None, 0, 1]
list(compress(lst, test))
# N.B. compress stops at the shorter of the two iterables, so if lst has more elements than test the remaining ones are discarded

The `takewhile` function takes a predicate and an iterable as arguments and returns an iterator that yields values as long as the predicate evaluates to `True`; as soon as a `False` evaluation is encountered, the iterator is exhausted.

In [None]:
from itertools import takewhile
lst = [1,2,3,5,2,1]
list(takewhile(lambda x: x<5, lst))

As expected, when `takewhile` encountered an element for which the predicate evaluated to `False` (5<5), it stopped yielding even though the last elements would evaluate to `True`.

Similarly, `dropwhile` does the opposite: it skips elements while the predicate evaluates to `True` and starts yielding values from the iterable as soon as the predicate evaluates to `False` for the first time (from then on every element is yielded):

In [None]:
from itertools import dropwhile
lst = [1,2,3,5,2,1]
list(dropwhile(lambda x: x<5, lst))

## Infinite iterators
From the itertools module we also have a set of infinite iterators that can come in handy. `count`, for example, is a function similar to `range`, since we can define a start and a step, but it has no stop parameter. Moreover, start and step can be of any numeric type.

In [None]:
from itertools import count, takewhile
list(takewhile(lambda x: x < 11,count(10, 0.3)))

The `cycle` function lets us loop over an iterable or an iterator (yes, also an iterator) over and over.

In [None]:
from itertools import cycle
lst = [1, 2, 3, 4]
match  = ['a', 'b']
for i, j in zip(lst, cycle(match)):
    print(i, j)

`repeat` instead simply yields the same element indefinitely, or a defined number of times if specified. N.B. the element that is repeated is actually always the same object!

In [63]:
from itertools import repeat
ciao = repeat('ciao')
for _ in range(3):
    print(next(ciao))

ciao
ciao
ciao
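
The fact that `repeat` always yields the same object matters when that object is mutable. A small sketch, with a made-up list `row`:

```python
from itertools import repeat

row = [0, 0]
rows = list(repeat(row, 3))   # three references to the SAME list

rows[0][0] = 99               # mutating one entry is visible through all of them
print(rows)
print(all(r is row for r in rows))
```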


## Chaining and Teeing

Chaining is essentially the operation of concatenating multiple iterables together, something that for sequences of the same type we can easily do with the `+` sign. What `itertools.chain` adds is the ability to concatenate arbitrary iterables lazily. `chain(*args)` takes a variable number of arguments that can be iterables or iterators. However, there is a caveat: imagine we have an iterable `l` containing 3 iterators that we want to chain. If we pass `l` with unpacking we lose part of the laziness, since unpacking with `*` is an eager procedure (Python has to iterate over the whole outer object to unpack it, and if that object is itself an iterator it will be exhausted). To pass an iterable of iterables directly there is a dedicated alternative constructor: `itertools.chain.from_iterable`, another lazy iterator!

In [64]:
from itertools import chain

l1 = (i**2 for i in range(3))
l2 = (i**3 for i in range(3))
l3 = (i**4 for i in range(3))

list(chain(l1,l2,l3))

[0, 1, 4, 0, 1, 8, 0, 1, 16]

If we try to use an iterable of iterators:

In [65]:
l = [(i**j for i in range(3)) for j in range(2,5)]
list(chain(l))

[<generator object <listcomp>.<genexpr> at 0x7f62cc07dc80>,
 <generator object <listcomp>.<genexpr> at 0x7f62cc07dc10>,
 <generator object <listcomp>.<genexpr> at 0x7f62cc07dba0>]

What we get back is a list of the generators created inside `l` and not their chaining.

Instead if we use the `.from_iterable` method:

In [66]:
list(chain.from_iterable(l))

[0, 1, 16, 0, 1, 16, 0, 1, 16]
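
Note that this output differs from the one of `chain(l1, l2, l3)`: the generator expressions stored in `l` look up `j` only when they are consumed, and by then the comprehension has finished and `j` is `4` for all of them. One possible way to bind the exponent early is to create each generator through a function call; `powers` is a hypothetical helper written for this sketch.

```python
from itertools import chain

def powers(exp):
    # `exp` is bound at call time, so each generator keeps its own exponent
    return (i**exp for i in range(3))

gens = [powers(j) for j in range(2, 5)]
result = list(chain.from_iterable(gens))
print(result)
```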

Since iterators are one-time-use, it can be beneficial to create copies if we need to use them more than once in our code. The simplest way would be to use a for loop to populate, let's say, an empty list with an arbitrary number of calls of our generator function; however, Python has a smarter way to do it. The operation is called "teeing" and is performed by the `itertools.tee` function, which takes 2 arguments: an iterable/iterator and the number of copies we want (2 by default). The copies are independent objects with different memory addresses.

N.B. what comes back from `tee` is always a tuple of iterators, even if the object passed was an iterable!

In [67]:
from itertools import tee

def square(n):
    for _ in range(n):
        yield n**2
        
single_iterator = square(5)

multiple_iterators = tee(single_iterator, 5)

multiple_iterators
# each object in the tuple has a different memory address!

(<itertools._tee at 0x7f62cc083600>,
 <itertools._tee at 0x7f62cc0e5900>,
 <itertools._tee at 0x7f62cc0e5600>,
 <itertools._tee at 0x7f62cc126c80>,
 <itertools._tee at 0x7f62cc150840>)
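
A short sketch of how the copies behave when consumed; the generator expression is an arbitrary example. Once `tee` has been called, the original iterator should not be advanced directly, otherwise the copies would miss the values consumed that way.

```python
from itertools import tee

source = (n**2 for n in range(4))
a, b = tee(source, 2)

first = list(a)    # consuming `a` does not exhaust `b`
second = list(b)
print(first, second)
```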

## Mapping and Accumulation
Mapping is essentially the application of a callable (a function) to every element of an iterable, something that can easily be achieved with the `map` function (which returns a lazy iterator), a list comprehension or a generator expression.

Of course, itertools has its own mapping function. `starmap` is similar to `map`, but it unpacks every sub-element of the iterable and passes its items as separate arguments to the function. We can therefore pass to `starmap` a function that takes multiple arguments, as many as the number of elements in each nested iterable. Like the other functions in the itertools module, `starmap` returns a lazy iterator.

In [68]:
from itertools import starmap
l = [[1,2,3], [3,4,5]]
list(starmap(lambda x, y, z: x + y + z, l))


[6, 12]

Accumulation is a process that reduces an iterable to a single value, e.g. the `sum` function or, more generally, the `functools.reduce` function (which is eager and returns only the final value), which has the advantage of taking an arbitrary function to apply cumulatively to the elements of the iterable and of accepting an initializer (see [Partial functions](#Partial-functions)).

Itertools has a function similar to `reduce` called `accumulate`, which takes an iterable and a function as arguments (and, since Python 3.8, an optional `initial` keyword) and lazily returns each intermediate result of the accumulation process:

In [69]:
from itertools import accumulate
from functools import reduce
import operator

lst = [i for i in range(1,5)]

reduce(operator.mul, lst), list(accumulate(lst, operator.mul))

(24, [1, 2, 6, 24])
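
As a further sketch, when no function is given `accumulate` defaults to addition, and any two-argument callable (here the built-in `max`) can be used; the last intermediate value matches what `reduce` returns. The input values are arbitrary.

```python
from functools import reduce
from itertools import accumulate
import operator

values = [3, 1, 4, 1, 5]

running_sum = list(accumulate(values))        # default function is addition
running_max = list(accumulate(values, max))   # any 2-argument callable works
total = reduce(operator.add, values)          # only the final value

print(running_sum, running_max, total)
```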

## Zipping
The classic built-in `zip` function takes an arbitrary number of iterables and returns a lazy iterator that produces tuples. If the iterables passed have different lengths, the shortest determines the length of the resulting iterator.

In [70]:
# the shortest iterable determines the iterator length
list(zip([1,2], ['a', 'b', 'c'], [10, 20, 30 , 40]))

[(1, 'a', 10), (2, 'b', 20)]

However, we may want to zip on the longest iterable, filling the holes with a predetermined value. For this we have `itertools.zip_longest`, which also takes a variable number of iterables but lets us specify a `fillvalue` (default `None`) that serves as a placeholder for the shorter iterables.

In [71]:
from itertools import zip_longest
list(zip_longest([1,2], ['a', 'b', 'c'], [10, 20, 30 , 40], fillvalue='Filled'))

[(1, 'a', 10), (2, 'b', 20), ('Filled', 'c', 30), ('Filled', 'Filled', 40)]

## Grouping
Sometimes, while iterating over an iterable, let's say a list of tuples, we may need to group the elements based on a specific pattern or key. For this there is the function `itertools.groupby`, which takes an iterable and a key function and returns a lazy iterator of `(key, group)` pairs:

In [72]:
from itertools import groupby

iterable = [(1, 10, 100), (1, 11, 101), (1, 12, 102),
            (2, 20, 200), (2, 21, 201),
            (3, 30, 300), (3, 31, 301), (3, 32, 302)]

groups = groupby(iterable, lambda x: x[0]) # key function grouping based on the first element of the tuples
for group in groups:
    print(f'key is {group[0]}')
    print(f'resulting in group:\n{list(group[1])}', end='\n\n')

key is 1
resulting in group:
[(1, 10, 100), (1, 11, 101), (1, 12, 102)]

key is 2
resulting in group:
[(2, 20, 200), (2, 21, 201)]

key is 3
resulting in group:
[(3, 30, 300), (3, 31, 301), (3, 32, 302)]



Calling `next` on the sub-iterator `group[1]` actually consumes the original iterator that `groupby` created from the iterable passed as argument. If we decide to skip iterating over, let's say, the second group of elements, then when we ask for the third group Python automatically advances past the second in order to return the elements of the correct group.

N.B. `groupby` creates groups from consecutive elements that share the same key; it doesn't sort the iterable first. Therefore, depending on the need, it might be necessary to pre-sort the iterable (by the same key) before passing it to `groupby`.
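
The effect of pre-sorting can be sketched as follows; `data` is a made-up list in which equal keys are not adjacent.

```python
from itertools import groupby

data = [('a', 1), ('b', 2), ('a', 3), ('b', 4)]
key = lambda pair: pair[0]

# without sorting, every change of key starts a new group
unsorted_groups = [(k, list(g)) for k, g in groupby(data, key)]

# sorting by the same key first brings equal keys together
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(data, key=key), key)]

print(unsorted_groups)
print(sorted_groups)
```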

### Caveat: lazy iterators in I/O operation
Imagine we want to read a csv file that contains a list of products (brand, product) in rows; we want to group this data by brand and we opt to use the `itertools.groupby` function. We can imagine structuring something like this:

```python
from itertools import groupby

with open('my_file.csv') as f:
    grouped = groupby(f, lambda x: x[0])
```

Now we would surely expect to be able to look at the grouped iterator by, for example, casting it to a list:

```python
list(grouped)
```

However, what we get in return is `ValueError: I/O operation on closed file.`. This is because `groupby` is a lazy iterator and therefore won't actually evaluate its content until requested, which in our case happens outside the `with` statement, i.e. when the file `my_file.csv` has already been closed.
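
One way around this is to consume the groups while the file is still open. The sketch below is only an illustration under a few assumptions: it writes a small temporary file with invented rows in place of `my_file.csv`, and it parses the rows with `csv.reader` so that the key is the brand column rather than the first character of the line.

```python
import csv
import os
import tempfile
from itertools import groupby

# hypothetical sample data standing in for my_file.csv (already sorted by brand)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('acme,anvil\nacme,rocket\nbolt,wrench\n')
    path = tmp.name

with open(path, newline='') as f:
    rows = csv.reader(f)
    # materialise every group inside the with block, while the file is open
    grouped = {brand: [row[1] for row in group]
               for brand, group in groupby(rows, key=lambda row: row[0])}

os.remove(path)  # clean up the temporary file
print(grouped)
```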

## Combinatorics
The itertools module also has some useful functions related to combinatorics: `permutations`, `combinations` and the cartesian `product` of multiple iterables, all returning lazy iterators.

### Cartesian Product
The cartesian product is the set of all ordered combinations of the elements of 2 or more sets (which don't need to be of the same length). In 2 dimensions it is something that we do pretty often, and it can be achieved easily with a nested for loop, but when the number of dimensions increases it can become messy. To easily handle an `n-dimensional` cartesian product we can use `itertools.product`, which takes an arbitrary number of iterables and lazily returns their cartesian product.

In [146]:
from itertools import product

l1 = [1,2,3]
l2 = [1,2,3]

list(product(l1, l2))

[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

### Permutations
Statistically speaking, simple permutations are all the possible orderings (without repetition, i.e. no duplicate elements are present in the set) of all the elements in a set. Given a set of dimension `n`, the total number of permutations is `n!`. To compute permutations in Python we can rely on the `itertools.permutations` function, which takes as arguments an iterable and, optionally, the length of each permutation. There is a caveat though: all the elements in an iterable, say a list, are treated as distinct even if they have the same value. This means that, in case of duplicated values in an iterable, the result will contain apparent repetitions, which in reality are not, because the underlying elements are being switched in position, generating *de facto* a new permutation. **Elements are unique based on their position, not their value!**

In [4]:
from itertools import permutations

string = 'ab'

list(permutations(string))

[('a', 'b'), ('b', 'a')]

But be careful: uniqueness is given by position, not value. Therefore, if we have a duplicate value in a different position, it is treated as a distinct element and does not count as a duplicate.

In [7]:
string = 'aba'

list(permutations(string))

[('a', 'b', 'a'),
 ('a', 'a', 'b'),
 ('b', 'a', 'a'),
 ('b', 'a', 'a'),
 ('a', 'a', 'b'),
 ('a', 'b', 'a')]
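
If we only care about distinct values, one simple option (a sketch, fine for small inputs since it still generates all `n!` tuples) is to pass the result through a `set`:

```python
from itertools import permutations
from math import factorial

perms = list(permutations('aba'))
unique = sorted(set(perms))   # drop the value-level duplicates

print(len(perms), factorial(3))
print(unique)
```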

### Combinations
Unlike permutations, combinations don't care about the order of the elements (i.e. `'ab'` and `'ba'` count as the same combination). Combinations can be defined `without replacement`, meaning that once an element is picked from a set it cannot be picked again, and `with replacement`. To compute combinations in Python we can use `itertools.combinations` or `itertools.combinations_with_replacement`; both functions take an iterable and the length `r` of the combination (which is required).

In [11]:
from itertools import combinations

l1 = [1,2,3]

list(combinations(l1, r=2))

[(1, 2), (1, 3), (2, 3)]

In [12]:
from itertools import combinations_with_replacement

l1 = [1,2,3]

list(combinations_with_replacement(l1, r=2))

[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

back to [TOC](#TOC)

# Strings
---

Strings are immutable objects of the sequence type, therefore they have indices and can be used as an iterable. Strings are homogeneous containers with fixed length and order.

## Common methods
- isalpha() -> check if all characters are alphabetic (isalnum() checks for alphanumeric)
- isprintable() -> check if all characters are printable

back to [TOC](#TOC)

# Lists
---
Lists are mutable objects of the sequence type... (to be extended)

## List comprehension
The goal of a list comprehension is to **generate a list by `transforming`, and optionally `filtering`, another iterable**.

Like a function, a list comprehension has a local scope (what is inside the square brackets is essentially the body of a function), but it can freely access the global scope too. As a matter of fact, when Python compiles the list comprehension (in the CPython version used here), it creates a temporary function which is essentially the equivalent of a for loop (with optional if statements). This also implies that, after the execution, the variables inside the local scope are deleted and can't be retrieved in the global namespace.

We can try to disassemble a list comprehension to see what's happening:
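
The scoping rule can be checked directly. In this sketch `factor` and `item` are arbitrary names: the comprehension reads the outer `factor`, while its own loop variable `item` is not visible afterwards.

```python
factor = 10                                    # outer name, readable inside the comprehension
scaled = [item * factor for item in range(3)]

try:
    item                                       # the loop variable does not exist out here
except NameError:
    leaked = False
else:
    leaked = True

print(scaled, leaked)
```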

In [73]:
from dis import dis

compiled = compile('[i**2 for i in (1,2,3)]', filename='string', mode='eval')
dis(compiled)

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f62cc07ed40, file "string", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               2 ((1, 2, 3))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f62cc07ed40, file "string", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (2)
             12 BINARY_POWER
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE


As we can see, Python creates a function (`4 MAKE_FUNCTION`) and then calls it (`10 CALL_FUNCTION`).

List comprehensions can be nested one inside the other; the inner comprehension then acts as a closure over the variables of the outer one.

In [74]:
lst = [[i*j for i in range(5)] for j in range(3)]
lst 

[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]

A list comprehension can have as many nested for loops as we want:

In [75]:
l = []
for i in range(2):
    for j in range(2):
        for k in range(2):
            l.append((i,j,k))
            
            
l_1 = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]

print(l)
print(l_1)

[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]


Note that the order of the for clauses in the list comprehension is the same as in the nested for loops.

We can also add if statements to the for loops, and of course the order matters! An if clause has to come after the for clauses whose variables it references.

In [76]:
l = []
for i in range(2):
    for j in range(2):
        if i == j:
            l.append((i,j))
            
l_1 = [(i, j) for i in range(2) for j in range(2)  if i == j]

print(l)
print(l_1)     

[(0, 0), (1, 1)]
[(0, 0), (1, 1)]


We can add an else branch, but in this case the `if..else` must come before the for clauses: it is no longer a filter but a conditional expression that computes the value of each element.

In [77]:
l = []
for i in range(2):
    for j in range(2):
        if i == j:
            l.append((i,j))
        else:
            l.append('else')
            
l_1 = [(i, j) if i == j else 'else' for i in range(2) for j in range(2)  ]

print(l)
print(l_1)     

[(0, 0), 'else', 'else', (1, 1)]
[(0, 0), 'else', 'else', (1, 1)]


In [78]:
l_1 = [(i, j)  for i in range(2) for j in range(2) if i == j else 'else' ]

SyntaxError: invalid syntax (1162910910.py, line 1)

back to [TOC](#TOC)

# Tuples
---
Tuples are often described as immutable lists. They are objects of the sequence type, therefore they have indices and can be used as an iterable. Tuples can be homogeneous or heterogeneous containers. Together with immutability, the main differences from lists are that tuples have a fixed length and a fixed order (they cannot be sorted or reversed in-place like lists).

Due to these properties, we can think of tuples as data records, where the position of each piece of data, once defined, has a precise meaning:

```py
# Circle(x, y, radius)
circ1 = (0, 0, 10)
```
Once we have defined the structure of our container, tuples can be used to store data efficiently, since once created we are sure that nobody will be able to accidentally modify them.

## Named Tuples
Sometimes, defining a custom class can be an excessive effort if what we want to achieve is simply storing data in a custom data structure. Classes require at least some methods to be implemented to be properly employed in our code (such as `__eq__` and `__repr__`, for example); moreover, the instances of a class are not immutable, which can lead to potential errors. On the other hand, plain tuples are immutable and well suited to storing data, but accessing fields by index can be troublesome for the user, or hard to read for other developers.

```py
# Class vs tuple approach
class Person:

    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person('Luca', 44)
p1.name, p1.age # 'Luca', 44

p1 = ('Luca', 44)
name = p1[0]
age = p1[1]
```

Of course, Python has a perfect solution to this kind of problem: named tuples. `namedtuple` is a function shipped in the `collections` module of the standard library. It is not a type itself; it is a `class factory`, i.e. a function that generates a new class. The generated class is a subclass of `tuple` that assigns property names to the positional elements.

In [None]:
from collections import namedtuple

Pt2D = namedtuple('Point2D', ['x', 'y'])

# Pt2D is a variable alias of the class `Point2D` generated by the class factory namedtuple

pt = Pt2D(x=10,y=20)
pt, pt.x, pt.y

Each time we call the `Pt2D` class, Python uses the `__new__` method of the `Point2D` class to create a new instance and return the tuple.

There are several ways in which we can pass the arguments to the namedtuple function:

In [None]:
Pt2D = namedtuple('Point2D', ['x', 'y']) # list
Pt2D = namedtuple('Point2D', ('x', 'y')) # tuple
Pt2D = namedtuple('Point2D', 'x, y') # comma-separated string
Pt2D = namedtuple('Point2D', 'x y') # whitespace-separated string

# and remember, namedtuples are subclasses of the tuple type
pt = Pt2D(x=10,y=20)
isinstance(pt , tuple)

Unlike a regular class instance, since `pt` is a tuple it is immutable, i.e. we cannot modify its attributes (in a regular class object we could).

In [None]:
pt.x = 100 # cannot do! it is a tuple -> immutable

N.B. namedtuple field names CANNOT start with an underscore!

In [None]:
Pt2D = namedtuple('Point2D', 'x _y') # ERROR!!

unless we provide the keyword `rename=True`, in which case namedtuple will replace the invalid name with the positional index of the field preceded by an underscore:

In [79]:
from collections import namedtuple

Pt2D = namedtuple('Point2D', 'x _y', rename=True)
pt = Pt2D(10, 20)
pt

Point2D(x=10, _1=20)

### Introspection
The classes generated by namedtuple are shipped with some methods and attributes that can help us in the introspection of our code:

In [None]:
Pt2D._fields

In [None]:
pt._asdict()

### Modify and Extending
Namedtuples are immutable in essence, but they come shipped with some methods that help us handle field substitution and extension. The original tuple is not modified: a new tuple is created at a new memory address, and we can rebind the variable to it. Looking at Pt2D as an example, we can change its values with the method `_replace()`:

In [80]:
# pt right now is Point2D(x=10, _1=20)
pt = pt._replace(x=50)
pt

Point2D(x=50, _1=20)

Instead, if we want to extend the namedtuple by adding more fields, we can use the `_fields` property of the existing namedtuple (which is a tuple), add the element/s we want and create a new namedtuple with the extended fields:

In [None]:
Pt2D = namedtuple('Point2D', 'x, y')
old_fields = Pt2D._fields # is a tuple
new_fields = old_fields + ('z', ) # concatenation of two tuple
Pt3D = namedtuple('Point3D', new_fields) # equivalent to Pt3D = namedtuple('Point3D', ('x', 'y', 'z'))
pt = Pt3D(10, 20, 30)
pt

### Docstring
The namedtuple is shipped with a set of pre-generated docstrings that can be accessed as always with the `help()` function or the `__doc__` attribute.

In [None]:
help(Pt3D)

### Defaults values
When we create a namedtuple, we may want some fields to have default values. On Python 3.7 and later the simplest option is the `defaults` parameter of `namedtuple`. Another way is to define a prototype instance of the namedtuple (e.g. setting all fields to 0) and then use the `._replace()` method to specify only those fields that should differ from the defaults. Alternatively we can use the `__defaults__` attribute (it can be used in the same way on any function). To use this last approach on a namedtuple we set `__defaults__` on the `__new__` method of the generated class:

In [None]:
# `__defaults__` on a generic function
def func(x, y, z):
    print(x, y, z)

func.__defaults__ = (10, 20) # N.B. defaults are assigned to the rightmost parameters: y=10, z=20
func(x=0)

In [82]:
# Set a default value for the field z on the __new__ method of the Pt3D class
Pt3D.__new__.__defaults__ = (0,)
pt = Pt3D(x=10, y=10)
pt

Point3D(x=10, y=10, z=0)
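
The prototype approach based on `_replace()` mentioned at the start of this section could look like the sketch below; `origin` is a name chosen for the example.

```python
from collections import namedtuple

Pt3D = namedtuple('Point3D', 'x y z')

origin = Pt3D(0, 0, 0)             # prototype instance holding the default values
pt = origin._replace(x=10, y=10)   # new tuple, only the given fields change

print(origin, pt)
```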

back to [TOC](#TOC)

# Dictionaries

From Python 3.7, dictionaries are guaranteed to be ordered hashmaps containing (key, value) pairs that retain their insertion order (in CPython 3.6 this was an implementation detail). Note that insertion order is not sorted order: `print` shows a dictionary in insertion order, while pretty-printers such as `pprint.pprint` sort the keys by default, so don't count on what you see being the stored order!

Before that, to have an ordered dictionary we would have used `collections.OrderedDict`. This is not completely superseded, since it has some useful functionalities like `move_to_end` and `popitem(last=False)` to move/pop a key at the front or at the end of the dictionary.
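
A short sketch of these two methods on a made-up dictionary:

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)

od.move_to_end('a')               # 'a' goes to the end
od.move_to_end('c', last=False)   # 'c' goes to the front
first = od.popitem(last=False)    # pops from the front
remaining = list(od.items())

print(first, remaining)
```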

back to [TOC](#TOC)

# Unpacking iterables
Iterables are `packed` structures that bundle values together (list, tuple, string, set, dictionary...). As the word suggests, `unpacking` is an operation that assigns the packed values to variables:

In [83]:
a, b, c = [1,2,3]

a,b,c

(1, 2, 3)

In [84]:
a, b, c, d = 'ciao'

a, b, c, d
# N.B. we can unpack a dictionary or a set in the same way; a dictionary yields its keys in insertion order, while for a set the order of assignment is arbitrary because sets are unordered.

('c', 'i', 'a', 'o')


It comes in handy when we want to swap values between variables:


In [85]:
a = 10
b = 20

a, b = b, a

a, b
# this works in python because the RHS is evaluated first: the references to "b" and "a" are packed in a tuple and only afterwards assigned to the (swapped) variables "a" and "b".

(20, 10)

## Unpacking with *
We may want to unpack an iterable into fewer variables than it has elements, and in this case the `*` operator comes in handy:

In [86]:
l = [1,2,3,4]
# we want to unpack the first element of l and the others apart
# we could do it simply with list slicing
a, b = l[0], l[1:]

# or in a more elegant way with the * operator
a, *b = l # a=1, b =[2,3,4] 

a, *b, c = l # a=1, b=[2,3], c=4

a, b, c

(1, [2, 3], 4)

In [87]:
a, *b, *c = l # ERROR we can unpack with only one *, otherwise python won't understand who assign to whom

SyntaxError: two starred expressions in assignment (2746844977.py, line 1)

Another advantage of the `*` operator is that it can also be used with objects that don't support slicing (like sets or dictionaries). N.B. the starred variable always ends up as a list, however many elements it collects (even if, for example, the object unpacked is a tuple).

The `*` operator can be used also for unpacking objects on the RHS:

In [None]:
l1 = [1,2]
l2 = [3,4]
l = [*l1, *l2] # l=[1,2,3,4]

l

With dictionaries we have both keys and values that can be unpacked (in insertion order). With the `*` operator we unpack only the keys of the dict, while with `**` we can unpack both keys and values (N.B. `**` can be used only on the RHS).

In [None]:
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d = {*d1, *d2} # d is the set {'a','b','c'} -> n.b. 'b' is not repeated because set elements (like dictionary keys) are unique.

d

In [None]:
d = {**d1, **d2} # d={'a': 1, 'b': 3, 'c': 4} -> n.b. 'b' has the value from d2, since d2 was unpacked second and overwrites the key from d1.

d

## Nested unpacking
We can also unpack nested structures, such as lists of lists, with the same operators.

In [None]:
a, *b, (c, *d) = [1, 2, 3, 'python'] # a=1, b=[2, 3], c='p', d=['y', 't', 'h', 'o', 'n']

a, b, c, d

back to [TOC](#TOC)

# Loops
---

## While loop
to generate an infinite loop:

In [88]:
while True:
    print('Infinite Loop')
    break # to stop infinite loop

Infinite Loop


The `else` clause of a while loop is executed only if the loop terminates without a `break`.

`continue` is used to interrupt the execution of the current iteration and restart the loop with the next one. If the `continue` is inside a `try` block, the `finally` clause is still executed before the next iteration starts.
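The behavior of `else` and `continue` described above can be sketched as follows (a small illustration with made-up variable names, not taken from the original notebook):

```python
# while ... else: the else block runs only when the loop ends without `break`
found = []
n = 0
while n < 5:
    n += 1
    if n % 2 == 0:
        continue            # skip the rest of this iteration
    found.append(n)
else:
    found.append('done')    # reached because no `break` interrupted the loop

# `continue` inside a try block: `finally` still runs before the next iteration
cleanup_runs = 0
for item in range(3):
    try:
        continue
    finally:
        cleanup_runs += 1

found, cleanup_runs
```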

## Try statement
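The `try` statement runs a code block and lets us react to the errors it raises.

`except` is used to capture errors and handle exceptions.

`finally` is a code block that is always executed, whether or not an exception was raised, and even when the `try` block is left through a `break`.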
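A minimal sketch of how `try`, `except` and `finally` cooperate; `safe_divide` is a hypothetical helper written only for this illustration, and the `events` list just records which blocks were executed.

```python
def safe_divide(a, b):
    """Return (a / b, events); the result is None when b is zero."""
    events = []
    try:
        result = a / b
    except ZeroDivisionError:
        events.append('except')    # runs only when the division fails
        result = None
    finally:
        events.append('finally')   # runs in both cases
    return result, events

safe_divide(1, 2), safe_divide(1, 0)
```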

### Common exceptions

- ZeroDivisionError

-

back to [TOC](#TOC)

# Functions
---


A first semantic distinction in functions is between `parameters` and `arguments`: the former are the variables named in the function definition, while the latter are the values passed to the function when it is called.


In [89]:
def my_func(a, b): # a and b are the parameters of the function
  pass

x = 10
y = 'a'

my_func(x, y) # x and y are the arguments of the function

my_func

<function __main__.my_func(a, b)>

Note that x and y are passed by `reference` to `my_func` (more precisely, by object reference): the parameters a and b are bound to the same objects, i.e. the same memory addresses, that x and y refer to.
Therefore, the `Function Scope` contains references to the objects that are passed to the function (the ones bound to x and y in the example).

Another Pythonic distinction is between `functions` and `methods`. They are defined in the same way, but a method is a callable attribute defined in a class and, when accessed through an instance, it is bound to that instance.

In [90]:
from inspect import ismethod, isfunction

ismethod, isfunction

(<function inspect.ismethod(object)>, <function inspect.isfunction(object)>)
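As a small illustration of the distinction (the class `Greeter` is hypothetical and exists only for this sketch), the same `def` is reported as a function when accessed on the class and as a method when accessed on an instance:

```python
from inspect import isfunction, ismethod

class Greeter:                      # hypothetical class, for illustration only
    def hello(self):
        return 'hello'

g = Greeter()

# Accessed on the class, `hello` is a plain function
on_class = isfunction(Greeter.hello), ismethod(Greeter.hello)
# Accessed on an instance, it becomes a method bound to that instance
on_instance = isfunction(g.hello), ismethod(g.hello)

on_class, on_instance, g.hello.__self__ is g
```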

## Docstrings and annotations (PEP 257)
N.B. The docstring has to be the first statement in the function/class definition, otherwise it won't be stored in the `__doc__` attribute and won't be displayed by the `help()` function.

Docstrings (a string literal, single-quoted or triple-quoted) are the way to embed documentation inside Python code. They differ from comments (#) because they are actually kept by the interpreter and stored in the `__doc__` attribute of functions and classes. The content of `__doc__` is displayed by calling the `help()` function on the object.

In [91]:
def my_func():
  '''
  Here goes the docstring, which contains
  the instructions on function usage, argument types and boundaries.
  It is stored in the `__doc__` attribute and displayed
  by the `help()` function.
  '''
  pass

help(my_func)

Help on function my_func in module __main__:

my_func()
    Here goes the docstring, which contains
    the instructions on function usage, argument types and boundaries.
    It is stored in the `__doc__` attribute and displayed
    by the `help()` function.



Another way to document our code is to use annotations. These are not stored in the `__doc__` attribute, but they are shown by the `help()` function as part of the signature. An annotation can be any expression (even a function call), which is evaluated once and not at each call; however, annotations don't bind the code to a specific behavior (a: int -> doesn't force a to be an int). They are only metadata stored in the `__annotations__` attribute, which is a dictionary with parameter names as keys and annotations as values. These can be used by external tools like `Sphinx` to automatically generate documentation for our code.

In [92]:
def my_func(a: 'string', b: 'integer') -> 'a string':
  return a*b
  
my_func.__annotations__

{'a': 'string', 'b': 'integer', 'return': 'a string'}
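To make the "annotations are only metadata" point concrete, here is a small sketch (the function `repeat` is hypothetical): the annotations use real types, yet a call with the "wrong" types still runs.

```python
def repeat(text: str, times: int) -> str:   # hypothetical function
    return text * times

as_annotated = repeat('ab', 2)    # used as the annotations suggest
not_enforced = repeat(2, 3)       # ints are accepted too: nothing is checked at call time

as_annotated, not_enforced, repeat.__annotations__
```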

## `lambda` expression
`lambda` expressions are another way to create functions without the `def` statement. They are also referred to as `anonymous functions`. The body has to be a single expression; therefore no assignment statement is allowed, nor are type hints (annotations).

In [93]:
# lambda [parameters list]: expression
lambda x: x**2
lambda x, y: x + y
lambda : 'hello' # a lambda can take zero parameters and just return a constant

# we can assign a lambda function to a variable and call it later
my_func = lambda x: x**2

type(my_func), my_func(3)

(function, 9)

The lambda expression generates a `function object` that, when called, evaluates the expression and returns its value.

## Function Introspection
Functions are first-class objects that, when created, come with a series of default dunder attributes in addition to the ones we define ourselves. To look at all the attributes of a function we can use the built-in function `dir()`. Among the dunder attributes we have:

In [94]:
# this will be shown with the module
# inspect.getcomments()
def func(a, b, c='hello'):
  pass

introspection = {
'name' : func.__name__, # the name of the function
'default args' : func.__defaults__, # tuple containing the default values of the positional-or-keyword parameters
'default kwargs' : func.__kwdefaults__, # dictionary containing the default values of the keyword-only parameters (None if there are none)
'code object' : func.__code__, # returns a code object, which has its own attributes:
'variables names' : func.__code__.co_varnames, # returns the parameters followed by the local variables (defined inside the function scope)
'num of arguments' : func.__code__.co_argcount, # returns the number of positional parameters (keyword-only parameters, *args and **kwargs are not counted)
}

for key, arg in introspection.items():
  print(f'{key} : {arg}')

name : func
default args : ('hello',)
default kwargs : None
code object : <code object func at 0x7f62cc0fb5b0, file "/tmp/ipykernel_43728/3242595406.py", line 3>
variables names : ('a', 'b', 'c')
num of arguments : 3


The `inspect` module can also be used to retrieve information about the function:

In [95]:
import inspect

inspect.getcomments(func) # returns the comment just above the function definition

'# this will be shown with the module\n# inspect.getcomments()\n'
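Besides `getcomments()`, the `inspect` module offers `inspect.signature()`, which gathers most of the information above in one object. The following is only a sketch: the function `func` is redefined here with a few extra, purely illustrative parameters (`*args`, `flag`, `**kwargs`) to show the different parameter kinds.

```python
import inspect

def func(a, b, c='hello', *args, flag=False, **kwargs):   # illustrative signature
    pass

sig = inspect.signature(func)

# kind of each parameter (positional-or-keyword, keyword-only, ...)
kinds = {name: p.kind.name for name, p in sig.parameters.items()}

# default values, skipping the parameters that have none
defaults = {name: p.default
            for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}

kinds, defaults, func.__kwdefaults__, func.__code__.co_argcount
```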

## \*args and **kwargs
Unpacking can also be used in function parameters, in order to accept a variable number of arguments as input:

In [96]:
def my_func(a, b, *args): # N.B. the name 'args' is just a convention
  return a, b, args

a, b, c = my_func(1,2,3,4) # a=1, b=2, c=(3,4)
# note that inside the function scope the extra arguments are packed into a tuple, not a list

a, b, c

(1, 2, (3, 4))

The variadic positional parameter `*args` has to come after the other positional parameters, since it exhausts all the positional arguments not yet assigned; after it only keyword arguments are allowed, and any extra ones can be collected with the `**kwargs` parameter. A bare `*` in the parameter list can be used to force all the following parameters to be passed by keyword.

In [97]:
def my_func(*, name):
  # in this way my_func doesn't allow positional arguments.
  # name can only be passed as a keyword argument.
  pass

def my_func(a, *, name):
  # in this way my_func allows only one positional argument `a` and one keyword argument `name`.
  # since `name` is placed after *, it is a keyword-only argument
  pass


def my_func(*, name, **kwarg): # OK
  pass


In [98]:
def my_func(*, **kwarg): # ERROR: at least one named keyword-only parameter is required after the bare `*`
  pass


SyntaxError: named arguments must follow bare * (1713583272.py, line 1)
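Since `**kwargs` is mentioned above without an example, here is a minimal sketch (the function `describe` and its parameters are hypothetical) showing how the extra keyword arguments are collected into a dictionary, and how `*` and `**` can also unpack arguments at the call site:

```python
def describe(a, *args, sep=', ', **kwargs):   # hypothetical function
    # args is a tuple of the extra positional arguments,
    # kwargs a dict of the extra keyword arguments
    return a, args, sep, kwargs

packed = describe(1, 2, 3, sep=' | ', x=10, y=20)

# * and ** can also be used when calling, to unpack a sequence and a dict
values = [1, 2, 3]
options = {'x': 10}
unpacked = describe(*values, **options)

packed, unpacked
```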

## Parameters default
Care must be taken when assigning default values to function parameters, in particular if these are mutable objects. When Python executes a `def` statement, it creates the function object and evaluates the default values once, storing them together with the function. This means that every call that does not override a default reuses that same stored object. In some cases this results in unwanted behavior:

In [None]:
# creating a function that store a message in a log file with the datetime
from datetime import datetime
import time

def log(msg, *, dt=datetime.utcnow()):
  print(f'{dt}: {msg}')

# Now, since the value of dt was evaluated once, when the function was defined, each time we call
log('first log')
time.sleep(3)
log('second log')
# we will see that the printed time is always the same, since it was stored at definition time like a CONSTANT!

In [None]:
# SOLUTION
def log(msg, *, dt=None):
  dt = dt or datetime.utcnow() # if dt is falsy (None), the right-hand side of `or` is evaluated
  print(f'{dt}: {msg}')
# we set dt=None and check whether the caller actually passed a value for dt. If not, the function calls datetime.utcnow().
# Since this call is performed in the function scope, it is executed each time the function is called.

log('first log')
time.sleep(3)
log('second log')

Another example is when we create a mutable object directly as the default value of a parameter. In this case too, the object is created once, at definition time, and the same reference is reused each time the function is called. This means that changes made to it in one call are visible in all the following ones:

In [None]:
# create a function that store values into a list
def add_item(item, func_list=[]):
  func_list.append(item)
  return func_list

my_list1 = add_item('banana') # my_list1 references the same list object as the default `func_list`
# so if I create another list of items
my_list2 = add_item('coca')
# now we have:
# my_list1 = ['banana', 'coca']
# my_list2 = ['banana', 'coca']
# because they both reference the default list stored with the function!

my_list1, my_list2

In [None]:
# SOLUTION
def add_item(item, func_list=None):
  if func_list is None: # `func_list or list()` would also discard an empty list passed by the caller
    func_list = []
  func_list.append(item)
  return func_list

my_list1 = add_item('banana')
my_list2 = add_item('coca')

my_list1, my_list2

`KEY TAKE-AWAY`: Never use mutable objects as `default` arguments. Instead, use None and create the object in the function scope. The only case in which a mutable default comes in handy is `memoization`, to cache the values of a function that is executed multiple times.
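The memoization exception mentioned above can be sketched like this; `factorial` and the `_cache` parameter are illustrative names, and the shared mutable default is used on purpose as a cache that survives between calls:

```python
def factorial(n, _cache={0: 1, 1: 1}):   # _cache is deliberately a shared mutable default
    if n not in _cache:
        _cache[n] = n * factorial(n - 1)   # computed once, then reused by later calls
    return _cache[n]

result = factorial(5)

# the cache lives in the function's defaults and keeps the intermediate results
cached = factorial.__defaults__[0]

result, cached
```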

## Map, Filter and Zip functions
N.B. Map and Filter have been mostly replaced by list comprehensions and generator expressions.

These are `higher order functions` (i.e. functions that take a function as parameter and/or return a function). `map` returns an `iterator` that applies the function to each element of the iterable.


In [None]:
# map(func, *iterables)

def sq(x):
  return x**2

l = [1, 2, 3]

list(map(sq, l)), list(map(lambda x: x**2, l))
# map returns an iterator (a lazy map object), therefore we need to pass it to list() to see the results

The number of iterables provided to map() must match the number of arguments that the function accepts; if more than one iterable is provided, mapping stops as soon as the shortest one is exhausted. `filter` takes a function and a single iterable and returns an iterator over the elements of the iterable that satisfy the condition given in the function, i.e. it filters the iterable.

In [None]:
l = [0,1,2,3,4,5]

list(filter(lambda n: n % 2 == 0, l)) # [0, 2, 4]
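A short sketch of `map` with more than one iterable, of `zip` (named in the section title), and of the comprehension equivalents mentioned in the N.B. above. The lists `l1` and `l2` are made up for the illustration.

```python
l1 = [1, 2, 3, 4]
l2 = [10, 20, 30]

# map with a two-argument function takes one item from each iterable per step
# and stops at the shortest one
sums = list(map(lambda x, y: x + y, l1, l2))

# zip pairs the items up, also stopping at the shortest iterable
pairs = list(zip(l1, l2))

# comprehension equivalents of map and filter
sums_comp = [x + y for x, y in zip(l1, l2)]
evens_comp = [n for n in l1 if n % 2 == 0]

sums, pairs, sums_comp, evens_comp
```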

## Reducing Functions
Also called accumulators, aggregators or folding functions, these are functions that combine the elements of an iterable step by step, returning a single value (e.g. finding the max of an array, or summing up its elements).

In [None]:
# write a reducing function to compute max, min and sum of an iterable.
l = [5, 8, 6, 10, 9]

add = lambda a, b: a+b
find_max = lambda a, b: a if a > b else b
find_min = lambda a, b: a if a < b else b

def _reduce(fn, sequence):
  result = sequence[0]
  for x in sequence[1:]:
    result = fn(result, x)
  return result

_reduce(find_max, l), _reduce(find_min, l)

The function `_reduce` takes two arguments: a function (add, find_max or find_min) and a sequence of numbers. It applies the function cumulatively, feeding each partial result back in together with the next element, and returns a single value (the sum, the max or the min) depending on the function passed.
Python has a standard-library module, `functools`, that contains a `reduce` function similar to the one defined above, but which works on any iterable, including non-indexable ones.

In [None]:
from functools import reduce

l = [5, 8, 6, 10, 9]

reduce(find_max, l), reduce(lambda a, b: a if a > b else b, l) # find max of l

# reduce has a third argument called 'initializer' that serves as the first value of the reduction. This avoids a runtime error when, for example, trying to apply a sum-reduce to an empty list.
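A minimal sketch of the `initializer` argument (the variable names are illustrative): with an initial value the reduction of an empty list is well defined, without it `reduce` raises a `TypeError`.

```python
from functools import reduce

add = lambda a, b: a + b

with_start = reduce(add, [1, 2, 3], 10)   # starts from 10: ((10 + 1) + 2) + 3
empty_ok = reduce(add, [], 0)             # the initializer is returned as is

try:
    reduce(add, [])                       # no items and no initializer
    empty_fails = False
except TypeError:
    empty_fails = True

with_start, empty_ok, empty_fails
```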


Other built-in reducing functions in Python are:

In [99]:
l = [1, 2, 3, 4] # the values that produce the output shown below
red_func = {
'max': max(l),
'min': min(l),
'sum': sum(l),
'any': any(l), # returns True if at least one element of the iterable evaluates to True
'all': all(l), # returns True if all the elements of the iterable evaluate to True
} 

red_func

{'max': 4, 'min': 1, 'sum': 10, 'any': True, 'all': True}

## Partial functions
Partial functions are a way to reduce the number of arguments required by a function, by fixing some of them in advance. We could write a wrapper function ourselves or use `functools.partial` from the standard library.


In [100]:
# create a function that compute the power of a number

def pow(base, exponent):
  return base ** exponent

# we can create a partial function that fixes the exponent so that it always computes the square
def square(base):
  return pow(base, exponent=2)

# or we can use functools.partial
from functools import partial

square = partial(pow, exponent=2)

'''
N.B. if we define a variable before creating the partial and pass it as an argument, the partial stores a reference to the object the variable points to at that moment, not to the variable itself. Therefore, even if we rebind the variable afterwards, the value used by the partial function remains the same.
'''

a = 2
square = partial(pow, exponent=a)
# exponent references the same object that 'a' referenced (the value 2), not the name 'a' itself
a = 5

square(5) # the exponent stored in the partial function remains the same (2)

25

## The `operator` module
The `operator` module is part of the standard library shipped with every Python installation. Its main purpose is to provide functional equivalents of the arithmetic (and other) operators.

In [101]:
from functools import reduce
from operator import mul

# we have seen how we can use a lambda expression together with reduce to fold a sequence into a single value
reduce(lambda x, y: x*y, [1,2,3,4]) # returns the product of the elements of the list
# the same can be achieved with operator.mul
multiplication = reduce(mul, [1,2,3,4])

multiplication

24

There is a variety of functions in the operator module: for arithmetic/boolean operations (`mul()`, `add()`, `le()`, `is_()` ..), for sequence handling (`getitem()`, `setitem()`, `delitem()` ...) and for building callables (`itemgetter()`, `attrgetter()`, `methodcaller()`...). These last ones don't return values directly: they return a callable object (essentially a function) that can later be called on other objects.

In [102]:
from operator import itemgetter
l = [5, 8, 6, 10, 9]
f = itemgetter(1, 3) # create a callable that returns the items at index 1 and 3

f(l) # -> (8, 10)

(8, 10)
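The other two callable builders work in the same spirit. The sketch below uses a hypothetical `Point` class and made-up data: `attrgetter` builds a callable that reads attributes, `methodcaller` one that calls a method by name.

```python
from operator import attrgetter, methodcaller

class Point:                       # hypothetical class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(3, 1), Point(1, 2), Point(2, 0)]

# attrgetter('x') behaves like lambda p: p.x, handy as a sort key
xs_sorted = [p.x for p in sorted(points, key=attrgetter('x'))]

# with several names it returns a tuple of attributes
get_xy = attrgetter('x', 'y')
first_xy = get_xy(points[0])

# methodcaller('split', ',') behaves like lambda s: s.split(',')
split_on_comma = methodcaller('split', ',')
parts = split_on_comma('a,b,c')

xs_sorted, first_xy, parts
```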

back to [TOC](#TOC)

# Classes
---


`__init__` is the method that is called automatically right after a new instance of the class is created. Its first argument is always `self`, i.e. the instance that was just created by calling the class object.

In [103]:
class Rectangle:
  def __init__(self, width, height):
    self.width = width
    self.height = height

r1 = Rectangle(10,20)
# r1 is an instance of the class Rectangle, referred to as `self` inside the `__init__` method

r1.__class__, isinstance(r1, Rectangle)

(__main__.Rectangle, True)

## Getters and Setters
Getter and setter methods are implemented to control how the attributes of a class are read and written. In Python there are no private attributes (even if we can signal to the reader that a variable is private by beginning its name with _). Setter and getter methods let us impose some constraints on the values the user can assign.
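One common way to write getters and setters in Python is the built-in `property` decorator. The sketch below is only an illustration under that assumption: the positivity check on `width` is a made-up constraint, and `height` is left as a plain attribute for comparison.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width      # goes through the setter below
        self.height = height    # plain attribute, no validation

    @property
    def width(self):            # getter: called when reading r.width
        return self._width

    @width.setter
    def width(self, value):     # setter: called when assigning r.width = ...
        if value <= 0:
            raise ValueError('width must be positive')
        self._width = value

r = Rectangle(10, 20)
r.width = 5                     # accepted by the setter

try:
    r.width = -1                # rejected by the setter
    rejected = False
except ValueError:
    rejected = True

r.width, rejected
```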

## Overload methods
There are some special methods that every class inherits from `object`, and that are therefore available to any instance of the class even if they are not defined explicitly. For example, if we call the built-in `str` on an instance of the class we receive a standard output specifying the memory address of the instance, unless we override (or `overload`) this method explicitly inside the class definition.

### \_\_str\_\_ method

In [104]:
class Rectangle:
  def __init__(self, width, height):
    self.width = width
    self.height = height

  def __str__(self):
    return f'Rectangle: width:{self.width}, height:{self.height}'

r1 = Rectangle(10,20)

str(r1) # returns the string built inside the __str__ method
# if __str__ is not defined then the output will be:
# <__main__.Rectangle object at "some memory address">

'Rectangle: width:10, height:20'

### \_\_repr\_\_ method
It is similar to the `__str__` method, but its use is more developer-oriented. The `__repr__` method should return an unambiguous string representation of the instance, ideally the expression needed to recreate it.

In [105]:
class Rectangle:
  def __init__(self, width, height):
    self.width = width
    self.height = height

  def __repr__(self):
    return f'Rectangle({self.width}, {self.height})'

r1 = Rectangle(10,20)

repr(r1) # returns the string built inside the __repr__ method, which is exactly Rectangle(10, 20): how the object r1 was created in the first place

'Rectangle(10, 20)'

### \_\_eq\_\_ method
It is the method called when two objects are compared with the `==` operator; defining it lets us decide when two instances of the same class are considered equal.
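A possible implementation for the `Rectangle` class used in this chapter, offered as a sketch rather than the only way to do it: two rectangles are considered equal when width and height match, and `NotImplemented` is returned for unrelated types so that Python can fall back to its default comparison.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def __eq__(self, other):
        if not isinstance(other, Rectangle):
            return NotImplemented   # let Python handle comparisons with other types
        return (self.width, self.height) == (other.width, other.height)

r1 = Rectangle(10, 20)
r2 = Rectangle(10, 20)

# equal by value, but still two distinct objects in memory
r1 == r2, r1 is r2, r1 == (10, 20)
```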

back to [TOC](#TOC)

# Scopes and Namespaces
---

The `Scope` is the portion of the code in which a variable name is defined; it has an associated `namespace`, essentially a table that lists all the names in the scope and the objects (memory addresses) they refer to. There are different scopes in Python, and they are arranged in a nested structure. At the top we have the `built-in` scope, the only truly global scope, shared across all modules, which contains the definitions of core elements such as `True`, `None`, `dict` etc. Nested inside the built-in scope there is the so-called `Global` scope (which is not really global, since it exists only inside a single file, i.e. a module). Moreover, each function has its own scope, named `Local`, which is created each time the function is called (until the function is called, the variables assigned in its body are not created, and in any case they never appear in the Global scope, where the function itself is defined).

In [106]:
# module1.py
print(True)
# neither print nor True is defined in the module1 scope, therefore Python automatically goes up one level and looks up their definitions in the built-in scope

True


In [107]:
print(new_var_never_used)
# Error! 'new_var_never_used' is defined neither in the module scope nor in the built-in scope, therefore Python throws a `NameError`

NameError: name 'new_var_never_used' is not defined

To summarize: at `compile time` Python looks at the code and predetermines which variables will eventually belong to the local or to the global scope. When it encounters a `def` (function), it looks inside it: if a variable is assigned (e.g. a = 100), it will be part of the local scope only, unless `global a` is declared; if a variable is referenced but never assigned inside the function (e.g. print(a)), the compiler determines that it is a non-local reference, to be looked up in the enclosing scopes at run time.

In [None]:
a = 0 # global scope/namespace

def func1():
  print(a) # the compiler understands that it is a non-local variable, since there is no assignment inside the local scope

func1()
print(a)

In [None]:
def func2():
  a = 100 # the compiler knows that this will be a local variable
  print(a)

func2()
print(a)

In [108]:
def func3():
  global a # the compiler knows that this will refer to a global variable
  a = 100 
  print(a)

func3()
print(a)

100
100
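A consequence of this compile-time decision, sketched below with the hypothetical function `func4`: if a variable is assigned anywhere in the function body, it is local for the whole function, so reading it before the assignment raises `UnboundLocalError` instead of falling back to the global value.

```python
a = 0  # global variable

def func4():  # hypothetical name, continuing the numbering used above
    try:
        seen = a            # fails: `a` is local to func4 because of the assignment below
    except UnboundLocalError as exc:
        seen = type(exc).__name__
    a = 100                 # this assignment makes `a` local to the whole function
    return seen

outcome = func4()

outcome, a   # the global `a` is untouched
```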


## Masking
Masking (or shadowing) happens when we reuse a name from the built-in scope, and it should be avoided. Since Python looks first at the module scope, if we assign a variable with the same name as an existing element of the built-in namespace, our definition hides the built-in one and changes the behavior of that name.

In [109]:
# module1.py
print = lambda x: f'Hello {x}'
# we are locally redefining the 'meaning' of the name print, so now:
print('world') # -> 'Hello world'
# Python invokes the local definition of print and not the built-in one

'Hello world'

The same behavior applies between the Global and Local scopes. When a variable is assigned inside a function, Python sees it at compilation time and marks it as local (it will be created only when the function is called, but the compiler already knows that it will exist).
Therefore, if a variable with the same name exists in the Global scope, it is masked by the local one while the function runs.

In [110]:
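# del print # uncomment to remove the previous masking of the built-in print; while print is masked, each call below just returns a string and the notebook shows only the last one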

a = 0 # global scope/namespace

def my_func():
  a = 100 # local scope/namespace
  print(a)

my_func() # a = 100, the global scope 'a' has been masked 

print(a) # a = 0, the global scope hasn't been modified, and the local scope of `my_func` has been destroyed after its execution.

'Hello 0'

It is also possible to avoid masking by explicitly telling Python that a variable assigned in the local scope actually belongs to the global scope. This is done by declaring it with the `global` keyword at the beginning of the local scope. This tells the compiler to bind that name in the global scope: an existing global variable is reused, otherwise a new one is created when the assignment runs.

In [111]:
a = 0

def my_func():
  global a
  a = 100 # assigns to the global `a`, because of the `global a` declaration above
  print(a)

my_func() # a = 100, since `global a` has been declared, the variable `a` in the global scope has been modified

print(a) # a = 100

'Hello 100'

## Nonlocal scope
When we define a function inside another function, the inner function can see a scope that is neither the global (module level) scope nor its own local (function level) scope: the local scope of the enclosing function. It is a middle scope called the `non-local scope`. Variables that a function uses from the nonlocal scope are called `free variables`.

In [112]:
def outer_func():
  x = 10 # local scope of outer_func == non-local scope of inner_func
  
  def inner_func():
    x = 20 # local scope of inner_func
  
  inner_func()

  print(x)

# if we call outer_func:
outer_func() # x = 10 since the local scope of inner func has not modified the non-local scope (local scope of outer_func)

In the same way as we tell python that a variable in a local scope is `global`, we can specify a variable to be `non-local` (i.e. with the same reference of the one in the outer_func scope). 

In [113]:
def outer_func():
  x = 10 # local scope of outer_func == non-local scope of inner_func
  
  def inner_func():
    nonlocal x # now x refers to the variable in the non-local (outer_func) scope
    x = 20 # rebinds the x of outer_func, no new local variable is created
  
  inner_func()

  print(x)

# if we call outer_func:
outer_func() # x = 20 since the local scope of inner func has modified the non-local scope (local scope of outer_func)

N.B. if in a local scope we declare a `global` variable, Python will look in the global scope for a match, otherwise it will create the global variable. Instead, when declaring a `nonlocal` variable, Python searches the enclosing function scopes, starting from the closest one and moving outwards (the global scope is never considered); if no enclosing function defines that name, a `SyntaxError` is raised.

In [114]:

def outer():
  x = 0 # local scope of outer
  
  def inner1():
    # local scope of inner1 == closest nonlocal scope of inner2, but x is not defined here
    def inner2():
      nonlocal x # not found in inner1, so Python keeps searching outwards and finds x in outer
      x = 10 
    inner2()

  inner1()
  print(x) # x = 10 because inner2 has rebound the x defined in outer

outer()

back to [TOC](#TOC)

# Closure
---

A `Closure` is a special Python construct composed of a function and an extended scope (nonlocal scope) that contains free variables (nonlocal variables). This means that both the function and the extended scope refer to the same object, but Python doesn't do this directly. Instead a `cell object` is created, pointing to the value of the free variable, while the free variable, in both the extended scope and the function scope, points to the cell object.

In [115]:
def outer():
  a = 10

  x = 'python'  #----------
                # THIS IS A
  def inner():  # CLOSURE
    print(x)    #----------
  
  return inner

fn = outer() 
'''
Now outer has returned the function inner, which should print x without defining the variable itself.
Since the scope of the function outer is gone once the call has returned, we might expect that the variable x='python' is lost and can't be referenced by fn.
Instead, it is still available: during compilation Python sees a closure and creates an intermediate cell object that is referenced both by inner and by outer.
'''

# x = 'python' #--|
                 #|--> cell object --> str object `python`
# print(x)     #--|              

fn()

The `x` in outer and the `x` in inner both point to a `cell object`, which contains a reference to another object in memory holding the string `python`. This is what allows us to call the function inner, returned by the function outer, even if the scope of outer is already gone. We can inspect the closure and the free variables of a function:

In [116]:
fn = outer()

# fn.__code__.co_freevars # ('x',) while `a` is not a free variable, since inner never uses it
# fn.__closure__ # cell object at address xxx containing a str object ('python') at address yyy
# both `x` names (the local one in outer and the free one in inner) reference the same cell, which points to yyy

fn.__code__.co_freevars, fn.__closure__

(('x',), (<cell at 0x7f62cc0fa9d0: str object at 0x7f62d1f2e030>,))
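The value stored in a cell can be read through its `cell_contents` attribute. A small self-contained sketch (here `inner` returns `x` instead of printing it, purely to make the result easy to check):

```python
def outer():
    a = 10              # never used by inner, so it is not a free variable
    x = 'python'

    def inner():
        return x        # x is a free variable of inner

    return inner

fn = outer()

free_vars = fn.__code__.co_freevars     # names of the free variables
cell = fn.__closure__[0]                # the cell object shared with outer
value = cell.cell_contents              # the object the cell points to

free_vars, value, fn()
```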

We can have multiple instances of the same closure: each time the outer function is called a new cell object is created, so the different instances of the closure behave independently.

In [117]:
def counter():
  # beginning of Closure
  count = 0

  def inc():
    nonlocal count
    count += 1
    return count
  # end of Closure

  return inc

f1 = counter()
f2 = counter()
# f1 and f2 behavior is independent

f1()
f1()
f1(), f2()

(3, 1)

## Shared extended scope
At the same time, two different closures can share the same extended scope (`shared extended scope`).

In [118]:
def outer():
  count = 0

  def inc1():
    nonlocal count
    count += 1
    return count

  def inc2():
    nonlocal count
    count += 1
    return count

  return inc1, inc2

f1, f2 = outer() 

f1()
f2()

2

`f1` and `f2` are two closures that share the free variable `count`; therefore both functions, when called, increment the same value of count. If this behavior is intended there is no problem, but it often happens that a free variable is shared without realizing it.

In [119]:
# create a list of functions, each adding a different value to its argument
adders = []
for n in range(1, 4):
  adders.append(lambda x: x + n)

# what we expect to have is a list of functions
adders


[<function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>,
 <function __main__.<lambda>(x)>]

In [120]:
# therefore we would expect that calling
adders[1](10) # returns 12 = 10 + 2
# instead all three functions add 3, i.e. the last value that n was pointing to

13

`n` is a global variable, and it is not evaluated until the function is called; at that time, after the for loop has finished, it is equal to 3. As a matter of fact we don't even have a closure, since `n` is a global variable. The correct way to achieve the expected behavior is:

In [121]:
def create_adders():
  adders = []
  for n in range(1, 4):
    adders.append(lambda x, y=n: x + y) # in this way we are saving the value of n at each iteration 
  return adders

adders = create_adders()

adders[1](10)

# since we have specified a default value for `y`, it is evaluated at creation time, not at runtime (i.e. when the function is called).
# `y` doesn't point to the name `n` itself but to the value it had at each iteration.
# Note that these lambdas are not closures: they have no free variables, the captured value is stored in their `__defaults__`.

12

## Nested Closure
It is common, e.g. in decorators, to have nested closures:

In [122]:
# define a function that takes an increment and returns a function that takes a starting value and, in turn, returns a function that adds the increment each time it is called.
def increment(n):

  def inner(start):
    current = start
    
    def inc():
      nonlocal current
      current += n
      return current

    return inc
  return inner

# Now inc has two free variables (current, n): one lives in the `inner` scope and the other in the `increment` scope.
# if we call:
fn = increment(2) # returns the inner function with the variable n = 2
fn.__code__.co_freevars # `n` is the free variable of the closure containing the `inner` function
# if we then call:
inc_2 = fn(100) # returns the `inc` function with the variables n = 2 and current = 100
inc_2.__code__.co_freevars # `n` and `current` are the free variables of the closure containing the `inc` function
# now if we call:
inc_2() # -> 102
inc_2() # -> 104

104


## Application

*hold*


back to [TOC](#TOC)

# Decorators
---

A decorator is a function that takes a function as argument and returns a closure (which in general accepts any number of arguments through *args and **kwargs) that wraps the function passed in and adds extra functionality. Let's see an example:

In [123]:
def counter(fn): # counter takes a function as argument
  count = 0
  def inner(*args, **kwargs): 
    nonlocal count
    count +=1
    print(f'Function {fn.__name__} was called {count} times')
    return fn(*args, **kwargs)
  return inner

def add(a, b):
  return a + b

add = counter(add) # the closure returned by counter()
# now add no longer references the 'def add' function but its decorated version, returned by counter()

result = add(1, 2) # -> 3
result = add(1, 3) # -> 4
result = add(5, 2) # -> 7

result


7

In the example above, `counter()` is essentially a `decorator`: it takes an arbitrary function with arbitrary arguments and returns the same function with the new "ability" of keeping track of how many times it has been called. We reassigned the name add to the decorated function, to point out that the function is conceptually still the same, but the name now refers to the closure returned by `counter()`. Wrapping a function this way is pretty common in Python, therefore a handy syntax has been defined to decorate a function, using the `@` symbol.

In [124]:
# once the function counter has been defined from previous example

@counter
def add(a, b):
  '''Documentation'''
  return a + b

add(1,2)
add(3,2)

5

All good so far, but if we now look at the metadata of the function `add` we'll see that they refer to the closure function `inner` and not to the original definition (`__name__`, `__doc__` etc. are those of the closure). The Pythonic solution to this problem is to use the `functools.wraps` decorator:

In [125]:
add.__name__, add.__doc__

('inner', None)

In [126]:
from functools import wraps

def counter(fn):
  count = 0
  @wraps(fn) # we are decorating the inner function
  def inner(*args, **kwargs): 
    nonlocal count
    count +=1
    print(f'Function {fn.__name__} was called {count} times')
    return fn(*args, **kwargs)
  return inner

@counter
def add(a, b):
  '''Documentation'''
  return a + b

add.__name__, add.__doc__

('add', 'Documentation')

## Multiple decorators
Multiple decorators can be applied to a function; care must be taken to ensure that the order in which they are applied matches what the coder intends. For example:

In [127]:
def dec_1(fn):
  def inner(*args, **kwargs):
    print('dec_1 called')
    result = fn(*args, **kwargs) # calling the decorated function, forwarding its arguments
    return result
  return inner # return the closure

def dec_2(fn):
  def inner(*args, **kwargs):
    print('dec_2 called')
    result = fn(*args, **kwargs) # calling the decorated function, forwarding its arguments
    return result
  return inner # return the closure


@dec_1
@dec_2
def my_func():
  print('my_func called')

'''
Calling my_func, decorated in this order, is equivalent to:

my_func = dec_1(dec_2(my_func))

Therefore, first the closure dec_2(my_func) is evaluated and passed to dec_1(). 
Since inside the decorators the print() is executed before the decorated function is called, the printing output will be:

dec_1 called
dec_2 called
my_func called

because first the closure returned by dec_1 is called: it prints its output, then calls the fn it received, which is dec_2(my_func);
therefore the closure returned by dec_2 is called: it prints its output, then calls the fn it received, i.e. my_func, which prints its output.
N.B. if the print() had been placed after the fn() call, the print-out order would have been reversed!
This is to say that, depending on the functionality we want to implement with our decorators, the order of application matters!
'''

my_func()
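
To make the N.B. above concrete, here is a minimal sketch in which each decorator records its name *after* calling the decorated function. The names `dec_1_after`, `dec_2_after`, `my_func_after` and the list `calls` are hypothetical, introduced only for this illustration; a list is used instead of `print()` so that the order can be inspected.

```python
calls = []  # records the order of execution instead of printing


def dec_1_after(fn):
    def inner(*args, **kwargs):
        result = fn(*args, **kwargs)  # call the decorated function first...
        calls.append('dec_1')         # ...then record that this decorator ran
        return result
    return inner


def dec_2_after(fn):
    def inner(*args, **kwargs):
        result = fn(*args, **kwargs)
        calls.append('dec_2')
        return result
    return inner


@dec_1_after
@dec_2_after
def my_func_after():
    calls.append('my_func')


my_func_after()
print(calls)  # ['my_func', 'dec_2', 'dec_1']
```

The decorators are still applied in the same order (`dec_1_after(dec_2_after(my_func_after))`); only the position of the side effect relative to the `fn()` call has changed, and with it the order of the records.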

## Memoization
Another very powerful application of decorators is `memoization`, i.e. storing already computed results in a cache to avoid repeating expensive (typically recursive) calculations, as in the Fibonacci or factorial functions. Let's take as an example a function that computes the n-th Fibonacci number:

In [128]:
# with recursion
def fib(n):
  print(f'computing fib({n})')
  return 1 if n < 3 else fib(n-1) + fib(n-2)

fib(4)

3

Written this way, the function works and is elegant, but it is not performant, since every call recomputes all the previous numbers in the Fibonacci series. One way to solve this problem is to cache the results as they are computed, which can easily be implemented with a class:

In [129]:
class Fib:
  def __init__(self):
    self.cache = {1: 1, 2: 1}

  def fib(self, n):
    if n not in self.cache:
      print(f'computing fib({n})')
      self.cache[n] = self.fib(n-1) + self.fib(n-2)
    return self.cache[n]

f = Fib()
f.fib(4)

3

In this way, after creating an instance of `Fib`, the Fibonacci numbers are stored as they are computed (n.b. the cache is not shared between instances: each new instance starts with only the two seed values). The same can be accomplished with a closure, which is the building block of a decorator:

In [130]:
def fib():
  cache = {1: 1, 2: 1}

  def calc_fib(n):
    if n not in cache:
      # cache is a free variable captured by the closure
      print(f'computing fib({n})')
      cache[n] = calc_fib(n-1) + calc_fib(n-2)
    return cache[n] 

  return calc_fib # return the closure

f = fib()
f(4)

3

From the function `fib()` to a decorator the path is short; we just need to generalize its structure:

In [131]:
def memoize_fib(fib):
  cache = dict()

  def inner(n):
    if n not in cache:
      # the decorator is not carrying out the recursion
      # it is only caching values
      cache[n] = fib(n)
    return cache[n] 

  return inner # return the closure

@memoize_fib
def fib(n):
  print(f'computing fib({n})')
  return 1 if n < 3 else fib(n-1) + fib(n-2)

fib(4)

3

It is worth noting that `memoize_fib` is not a general-purpose decorator: it does not accept an arbitrary number of positional or keyword arguments (`*args`, `**kwargs`) as decorators usually do, but is built specifically to work with the function `fib`. Another important aspect is limiting the cache size, to balance the tradeoff between performance and memory usage. Of course, Python already has a built-in decorator specifically designed for memoization, which ships with the `functools` module.

In [132]:
from functools import lru_cache # least recently used cache

@lru_cache() # lru_cache decorators accept arguments.. see below
def fib(n):
  print(f'computing fib({n})')
  return 1 if n < 3 else fib(n-1) + fib(n-2)

fib(4)

3
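
As a small sketch of how the cache size can be controlled and inspected, `lru_cache` accepts a `maxsize` argument and the decorated function exposes `cache_info()` and `cache_clear()`. The value `n = 10` is an arbitrary choice for the illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # None = unbounded cache; an integer limits the number of stored results
def fib(n):
    return 1 if n < 3 else fib(n - 1) + fib(n - 2)


result = fib(10)          # 55
info = fib.cache_info()   # named tuple with hits, misses, maxsize, currsize
print(result, info.hits, info.misses)

fib.cache_clear()         # empty the cache, e.g. to free memory
```

With a bounded `maxsize`, the least recently used entries are discarded when the cache is full, which is the tradeoff between performance and memory mentioned above.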

## Parametrized decorators
Parametrized decorators are decorators that accept arguments (such as `wraps` and `lru_cache`). Imagine we have a decorator that runs a function a given number of times `n`:

In [133]:
def run_n_times(fn):
  n = 3

  def inner(*args, **kwargs):
    for _ in range(n):
      fn(*args, **kwargs)
    return print(f'{fn.__name__} was called {n} times')

  return inner

@run_n_times
def my_func():
  print('called')
  pass

my_func()

called
called
called
my_func was called 3 times

Here the number of times the function is called is hardcoded in the decorator, but we want to be able to change that parameter. We might think of something like:

In [134]:
def run_n_times(fn, n: int):
  def inner(*args, **kwargs):
    for _ in range(n):
      fn(*args, **kwargs)
    return print(f'{fn.__name__} was called {n} times')

  return inner

# now we would expect to call the decorator as:
@run_n_times(10)
def my_func():
  print('called')
  pass

# but we get an error since run_n_times requires two arguments

TypeError: run_n_times() missing 1 required positional argument: 'n'

In [None]:
def my_func():
    print('called')
    pass    

# the argument `10` in the decorator call takes the position of `fn`, therefore it cannot work.
# we could instead apply the decorator indirectly:
my_func = run_n_times(my_func, 3)
# this works, but how do we get the same behavior with the `@` syntax?

my_func()

To use the `@` symbol with a decorator that accepts arguments, that decorator must itself return a decorator when called. The result of `run_n_times(10)` has to be another decorator, which actually performs the decoration we want. The solution is straightforward: we enclose our decorator in a `decorator factory` that holds the extra parameters needed.

In [None]:
def run_n_times(n: int): # decorator factory

  def inner1(fn): # decorator

    def inner2(*args, **kwargs):
      for _ in range(n):
        fn(*args, **kwargs)
      return print(f'{fn.__name__} was called {n} times')

    return inner2
  
  return inner1

Now a call such as `@run_n_times(10)` actually returns the decorator `inner1`, which implements the functionality we were originally looking for:

In [None]:
@run_n_times(3) # returns the decorator `inner1`
def my_func():
  print('called')
  pass

my_func()

In [None]:
# this is equivalent to say:
my_func = run_n_times(3)(my_func)
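
The same three-layer structure can be combined with `functools.wraps`, so that the decorated function keeps its metadata. The sketch below uses hypothetical names (`repeat`, `greet`) and, as a design choice for the illustration, collects the return values of every call instead of discarding them.

```python
from functools import wraps


def repeat(n: int):               # decorator factory (hypothetical name)
    def decorator(fn):            # the actual decorator
        @wraps(fn)                # keep fn's __name__ and __doc__
        def inner(*args, **kwargs):
            # collect the return value of every call instead of discarding it
            return [fn(*args, **kwargs) for _ in range(n)]
        return inner
    return decorator


@repeat(3)
def greet(name):
    '''Return a greeting.'''
    return f'Hello {name}'


results = greet('Giovanni')
print(results, greet.__name__, greet.__doc__)
```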

## Decorator class
Decorator factories can be created not only with functions but also with classes. Thanks to the `__call__` method, we can replicate exactly the behavior seen in the previous example:

In [None]:
class MyClass:
  def __init__(self, n): # the class acts as the decorator factory: MyClass(n) returns an instance holding n
    self.n = n

  def __call__(self, fn): # this is the actual decorator
    def inner(*args, **kwargs):
      for _ in range(self.n):
        fn(*args, **kwargs)
      return print(f'{fn.__name__} was called {self.n} times')
    
    return inner # closure

@MyClass(3)
def my_func():
  print('called')

my_func()

## Monkey Patching and Decorating classes
Functions are not the only objects that can be decorated; classes can be too, thanks to the dynamic nature of Python, which allows so-called `Monkey Patching`, i.e. modifying or adding attributes and methods of a class at runtime. Essentially, we are able to mutate the behavior of a class at runtime. Imagine we are using the class `Fraction` and want to add some functionality to it; we can do the following:

In [None]:
from fractions import Fraction

f = Fraction(2,3) # create an instance of the Fraction class
# we want the class `Fraction` to be able to speak ...
# if we write:
Fraction.speak = 100
# we are Monkey Patching the class Fraction at runtime, so if we then evaluate:

f.speak # -> 100

We can also make the `Monkey Patched` attribute callable, for example using a lambda function (note that we patch the class itself, not an instance of it):

In [None]:
Fraction.speak = lambda self, message: f'Fraction says {message}'

# we need `self` as first argument because the method receives the instance of Fraction it is called on
# Now we can call:
f.speak('You cannot pass!') # -> 'Fraction says You cannot pass!'

We can see that monkey patching is essentially a decoration of a class and, as a matter of fact, it can be done with a decorator function:

In [None]:
def decorator_speak(cls): # we are passing a class to the function
  cls.speak = lambda self, message: f'{self.__class__.__name__} says {message}'
  return cls # the return is needed to decorate with the `@` symbol or to reassign the name as done below

# Now we can apply it to any class:
class Person:
  pass

Person = decorator_speak(Person) # indirect decoration
p = Person() # instance of the class
p.speak('I am ALIVE!') # method added by the decorator

Let's do something more useful: imagine we want to debug an existing class by means of a decorator.

In [None]:
def info(obj): # think of this as of the method we would write inside the class, i.e. 'obj' would be 'self'
  from datetime import datetime, timezone
  results = {
    'time' : f'{datetime.now(timezone.utc)}',
    'name' : obj.__class__.__name__,
    'id' : hex(id(obj)),
    'vars' : [(k, v) for k, v in vars(obj).items()]
  }
  return results


def debug_class(cls): # This is the decorator
  cls.debug = info
  return cls 


# if we apply the decorator in function-style:
debug_class(Person)
# we don't strictly need the function to 'return cls', since the class is modified in place.
# However, with the `@` syntax (or the equivalent assignment below) the return is required:
Person = debug_class(Person) 
# without 'return cls' the rhs would return None, and 'Person' would no longer point to the class nor to its decorated version.

In [None]:
@debug_class
class Person:
  def __init__(self, name, age, employed=True):
    self.name = name
    self.age = age
    self.employed = employed

p = Person('Giovanni', 32)
p.debug()

## Single Dispatch Generic Functions
First, let's define what overloading is:

`Overloading` in object-oriented programming is the ability to define more than one function with the same name, as long as their signatures differ (essentially, the functions must be distinguishable by the number and/or type of their arguments). In statically typed languages, the compiler decides from the signature which of the functions with the same name a call refers to.

In Python, since there is no static typing, argument types are not part of a function's signature, and a second definition with the same name simply replaces the first; overloading, in its strict sense, is therefore not possible. A workaround to this problem is the `single dispatch generic function`, which allows us to overload functions based on the type of the first argument (if we want to consider the type of more arguments we need `multi dispatch`).
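
The standard library provides this mechanism through the decorator `functools.singledispatch`. Below is a minimal sketch; the function name `describe` and the returned strings are hypothetical, chosen only for the illustration.

```python
from functools import singledispatch


@singledispatch
def describe(arg):
    # fallback, used when no more specific implementation is registered
    return f'object: {arg!r}'


@describe.register(int)
def _(arg):
    return f'integer: {arg}'


@describe.register(list)
def _(arg):
    return f'list of {len(arg)} items'


print(describe(10), describe([1, 2]), describe('abc'))
```

The implementation is chosen from the type of the first argument only; types without a registered implementation (here `str`) fall back to the undecorated base function.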


### Application - `Htmlizer`
Link to [Single_dispatch_generic_function_Htmlizer](Single_dispatch_generic_function_Htmlizer.ipynb)

back to [TOC](#TOC)

# Python optimizations
---

## Interning
At startup, Python (CPython) automatically pre-loads (caches) the integers in the range [-5, 256], so these integers have a fixed memory reference. Since these numbers show up often, not having to create a new object each time they appear is an optimization. A number outside this range generally requires a new object, and that's why:

In [None]:
a = 500
b = 500
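a is b # can be False: 500 is outside the cached range (the result depends on how the code is compiled, so compare integers with ==)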

The cached integers are `Singletons`: objects that exist only once in memory, so that every reference to the same value points to the same object.

The same may happen with some strings: Python can intern some strings (those that follow certain rules, e.g. letters and numbers joined by underscores) in order to speed up equality checks (if two strings are interned we can compare them with the `is` operator, otherwise we have to use `==`, which compares character by character). We can force Python to intern a string with the `sys` module:

In [None]:
'''
usually this is something we don't need, unless for example we are working with a large set of strings
for NLP and we need to tokenize words that are repeated often. In this case it can be a useful optimization,
since if a string is interned it becomes a Singleton and can be compared with the much faster 'is' operator.
'''
import sys

a = sys.intern('this will be interned')
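
A minimal sketch of the effect of `sys.intern`: the two strings are built at runtime (so the compiler cannot merge them as identical constants) and, once interned, both names refer to the same object. The variable names are hypothetical.

```python
import sys

# build two equal strings at runtime
words = ['this', 'will', 'be', 'interned']
a = ' '.join(words)
b = ' '.join(words)

equal_value = a == b                                 # character-by-character comparison
same_object_after = sys.intern(a) is sys.intern(b)   # identity comparison on the interned copies
print(equal_value, same_object_after)  # True True
```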

## Peephole
This is an optimization that occurs at compile time (so it is repeated each time the script is compiled). For example, we can have `constant expressions`, such as numeric calculations that are easier to read as the operation than as its result:


In [None]:
minute_in_day = 60 * 24 # 1440

The expression `60 * 24` is more readable than `1440`, but we may think that, if it is evaluated many times, we could have performance issues. This is not the case: it is a constant expression and Python knows it, so the result is computed once at compile time and stored, without having to compute it again at runtime.

The same happens for membership tests, i.e. checking whether an object is in a collection. If the collection is a constant expression, Python replaces the mutable object with its immutable counterpart (lists -> tuples, sets -> frozensets):


```py
for i in range(100000):
  if i in [1,2,3]:
    pass
```


The list `[1,2,3]` is converted into the tuple `(1,2,3)` which, being immutable, can be stored as a constant with a fixed memory address instead of being rebuilt at every iteration.

N.B. sets, being similar to dictionaries (hash maps), are much more efficient than lists for membership testing!
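
These optimizations can be observed by compiling a snippet and looking at the constants stored in the resulting code object. This is only a sketch: `co_consts` is a CPython implementation detail, and its exact contents may vary between versions. The variable names are hypothetical.

```python
# constant folding: the product is computed once, when the source is compiled
folded = compile('minute_in_day = 60 * 24', '<example>', 'exec')
print(folded.co_consts)  # the constants stored in the code object

# membership tests on literal collections: the literal is stored as an immutable constant
list_test = compile('x in [1, 2, 3]', '<example>', 'eval')
set_test = compile('x in {1, 2, 3}', '<example>', 'eval')
print(list_test.co_consts, set_test.co_consts)
```

On CPython we would expect to find `1440`, the tuple `(1, 2, 3)` and a `frozenset` among the printed constants.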

back to [TOC](#TOC)

# Common Modules
---

## `string`
Module with some useful string constants and representation.


## `functools`
Module with useful functions:

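* `total_ordering`: class decorator that automatically fills in the missing comparison methods (`__lt__`, `__le__`, `__gt__`, `__ge__`) when at least one of them is already implemented (the class should also define `__eq__`)
* `reduce`: applies a function of two arguments cumulatively to the items of an iterable, reducing it to a single value
* `partial`: lets us fix some arguments of a function, returning a new callable with fewer parameters
* `wraps`: decorator that copies the metadata (`__name__`, `__doc__`, etc.) of the decorated function onto the wrapper, so that it is kept after the decoration
* `lru_cache`: decorator that caches the results of function calls (particularly useful with recursive functions)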
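
A short sketch of the first three functions in the list (`wraps` and `lru_cache` are shown in the decorators chapter). The class `Version` and the other names are hypothetical examples.

```python
from functools import partial, reduce, total_ordering


@total_ordering
class Version:
    """Hypothetical class: only __eq__ and __lt__ are written by hand."""
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)


newer = Version(3, 10) >= Version(3, 9)                  # __ge__ is generated by total_ordering

product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4])   # ((1 * 2) * 3) * 4 = 24

parse_binary = partial(int, base=2)                      # int() with `base` fixed to 2
value = parse_binary('1010')                             # 10
```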


## `itertools`
Module with useful iteration methods:
* `cycle`: creates an iterator that repeats the items of an iterable endlessly
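
Since a `cycle` iterator never ends, it is usually consumed a bounded number of times, for example with `itertools.islice`. A minimal sketch with hypothetical data:

```python
from itertools import cycle, islice

colors = cycle(['red', 'green', 'blue'])   # never raises StopIteration
first_five = list(islice(colors, 5))       # take only the first 5 items
print(first_five)  # ['red', 'green', 'blue', 'red', 'green']
```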

## `collections`
Module with useful functions:

* `namedtuple`: tuple whose fields can also be accessed by name; a lightweight substitute for classes or dictionaries in some cases
* `Counter`: a dictionary subclass that takes an iterable and maps each element to its number of occurrences
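
A minimal sketch of both, with hypothetical names (`Point`, `counts`):

```python
from collections import Counter, namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)
coords = (p.x, p[1])                   # access by name or by index

counts = Counter(['a', 'b', 'a', 'c', 'a'])
most_common = counts.most_common(1)    # [('a', 3)]
print(coords, counts['a'], most_common)
```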

## `random`
Pseudo random number generator:

* `seed`: setting a seed value creates a repeatable random sequence (essential for testing)

In [None]:
import random
random.seed(0) # now I can execute this cell as many times as I want and the output won't change
for _ in range(3):
    print(random.randint(2,10), end=', ')

* `shuffle`: shuffles a list in place
* `gauss`: draws numbers from a Gaussian distribution
* `choice`: draws a random element from a sequence
* `choices`: draws a given number of random elements from a sequence, with replacement; the elements can optionally be weighted
* `sample`: draws a sample from a sequence without repetition
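
A small sketch of the functions listed above; the seed value, the list `deck` and the weights are arbitrary choices for the illustration.

```python
import random

random.seed(42)                      # fixed seed: the draws below are repeatable
deck = list(range(10))
random.shuffle(deck)                 # in place, returns None
picked = random.choice(deck)         # one element
weighted = random.choices(['a', 'b'], weights=[9, 1], k=5)   # with replacement
unique = random.sample(deck, k=3)    # without replacement

# re-seeding reproduces the same shuffle
random.seed(42)
deck_again = list(range(10))
random.shuffle(deck_again)
```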

## `timeit`
Platform specific timer for performance evaluation of code

In [None]:
# example of usage
from timeit import timeit

n = 1_000_000 # number of executions of the statement
timeit(stmt='math.sqrt(2)', setup='import math', number=n, globals=globals())

## `argparse`
When we run a Python script from the terminal, it might be useful to pass some arguments/variables that will be used by the script itself. The easiest way to retrieve command-line arguments is the `sys.argv` attribute, a list of strings containing the name of the script being run followed by the arguments passed (arguments must be whitespace-separated).

However, the smarter way of doing it is to use the standard library module `argparse`:

```py
import argparse

parser = argparse.ArgumentParser(description="Description of the parser")

# now we populate the parser with what we expect to retrieve in the command line
parser.add_argument('first_arg', help='description of first arg', type=str)
parser.add_argument('second_arg', help='description of second arg', type=int)

# now we need to tell the parser to parse these arguments from sys.argv[1:] 
# (by default if nothing specified inside parse_args())
args = parser.parse_args()

print(args.first_arg)
print(args.second_arg)
```

We can call the script from the terminal with the flag `-h` to print the descriptions given to the parser, which should help us understand the expected usage of the script.

We can also specify keyword arguments and many more options:

```py
parser.add_argument('-kw', '--keyword', help='first kw arg', type=str, required=False, dest='alias_name')
```

where the first two arguments are the short and long forms used to pass the keyword argument on the command line, `required` specifies whether the argument is mandatory and `dest` gives the name of the attribute under which the value will be available in the code. Other arguments that we may be interested in are:
* `nargs` to accept more than one value per argument. It can be `+` or `*`, depending on whether we require at least one value or not
* `action` to specify a behavior for the argument, like `store_true`, `store_const` etc.

Another useful thing is to define a group of mutually exclusive arguments (i.e. only one of them can be specified, not both), particularly useful when creating flags:

```py
group = parser.add_mutually_exclusive_group()
group.add_argument('-v', '--verbose', action='store_true')
group.add_argument('-q', '--quiet', action='store_true')
# this way only one of -v and -q can be specified
```
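
The pieces above can be put together in a single sketch. The argument names (`name`, `--numbers`, `values`) are hypothetical, and an explicit list is passed to `parse_args()` in place of `sys.argv[1:]` so that the example can also run inside a notebook.

```python
import argparse

parser = argparse.ArgumentParser(description='Hypothetical example parser')
parser.add_argument('name', type=str, help='positional argument')
parser.add_argument('-n', '--numbers', type=int, nargs='+', dest='values',
                    help='one or more integers')

group = parser.add_mutually_exclusive_group()
group.add_argument('-v', '--verbose', action='store_true')
group.add_argument('-q', '--quiet', action='store_true')

# passing an explicit list instead of reading sys.argv[1:]
args = parser.parse_args(['report', '-n', '1', '2', '3', '-v'])
print(args.name, args.values, args.verbose, args.quiet)
```

Passing both `-v` and `-q` would make the parser print an error message and exit.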

back to [TOC](#TOC)

# Tips and tricks

### Python version

In [None]:
import sys
sys.version_info

### Exception handling
"Look before you leap" (LBYL) and "easier to ask forgiveness than permission" (EAFP) are two different approaches we can use to handle errors; the first corresponds to an `if` statement that checks before acting, the latter to `try` and `except`. Generally speaking, the `if` statement is faster than the try-except, but it may be less expressive; moreover, the try-except becomes a burden only if the exception is raised often, otherwise there is not much difference in computational time. 

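The two approaches can be sketched on a dictionary lookup. The dictionary `settings`, the function names and the default value `8080` are hypothetical, introduced only for this illustration.

```python
settings = {'host': 'localhost'}    # hypothetical data


def port_lbyl(config):
    # check first with an if statement, then act
    if 'port' in config:
        return config['port']
    return 8080


def port_eafp(config):
    # act directly, and handle the failure if it happens
    try:
        return config['port']
    except KeyError:
        return 8080


print(port_lbyl(settings), port_eafp(settings))  # 8080 8080
```
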
### Which import is faster
N.B. it matters only if the code is run a humongous number of times

In [135]:
from timeit import timeit

# SLOWER
import math

# FASTER
from math import sqrt

n = 10_000_000

timeit(stmt='math.sqrt(2)', setup='import math',number=n), timeit(stmt='sqrt(2)', setup='from math import sqrt',number=n)

(0.6948520680016372, 0.4379689560155384)

### Sentinel values instead of None
Sometimes we need to define functions or classes with keyword arguments and have a way to check whether the user passed that argument or not. The standard way of proceeding is to set the default value to `None`. This is fine, but sometimes `None` itself is an acceptable value given by the user, and we would not be able to detect it.

In [144]:
def some_func(kw=None):
    if not kw:
        print('kw was not passed')
    else:
        print('kw was passed')
        
some_func(), some_func(None), some_func(10)

(None, None, None)

As we can see, passing `None` as argument results in the wrong behavior (because an argument was actually passed: it was `None`!). (By the way, with the check `if not kw` the same goes for any argument with a truth value of `False`.) An alternative approach is to set a sentinel value as a flag, something so unique that the user could never pass it.

To do this, a smart idea is to use as sentinel a fresh object, since its identity (its `id`) is unique in memory. The most generic way is to use an instance of the base class, `object()`, and compare against it with `is`:

In [138]:
_SENTINEL = object()

def some_func(kw=_SENTINEL):
    if kw is _SENTINEL:
        print('kw was not passed')
    else:
        print('kw was passed')
        
some_func(), some_func(None), some_func(0), some_func([]), some_func(10)

(None, None, None, None, None)

As expected, the behavior is now correct and we know exactly whether the user has passed an argument. Note that default values are evaluated only once, when Python encounters the function definition, so we could even have written `kw=object()` directly in the signature and the object would still be unique; we would, however, lose the name needed to compare against it inside the function, which is why the sentinel is stored in `_SENTINEL`.

### Switch statement in python PEP 3103
In other programming languages, like Java, a switch is a structure that selects one among several branches depending on the value passed. In Python we have different ways to simulate this behavior: the simplest (and worst) is a series of `if`/`elif` statements; alternatives are a dictionary (an associative array) or, more elegantly, a single dispatch generic function.
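
A minimal sketch of the dictionary approach: each key maps to a function, and `dict.get` provides the default branch. All the names (`start`, `stop`, `unknown`, `ACTIONS`, `switch`) are hypothetical.

```python
def start():
    return 'starting'


def stop():
    return 'stopping'


def unknown():
    return 'unknown command'


# the dictionary plays the role of the switch: key -> function to run
ACTIONS = {'start': start, 'stop': stop}


def switch(command):
    # look up the function, fall back to `unknown`, then call it
    return ACTIONS.get(command, unknown)()


print(switch('start'), switch('stop'), switch('pause'))
```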

back to [TOC](#TOC)