<h1>Python Manual</h1>

by Ing. Giovanni Frison

In [1]:
from datetime import date
print(f' last update: {date.today()}')

 last update: 2023-01-16


# TOC

1. [PEP8 naming convention](#PEP8-naming-convention)
2. [Everything is an Object](#Everything-is-an-Object)
3. [Modules](#Modules)
    1. [Some clarifications](#Some-clarifications)
    2. [Reload a module](#Reload-a-module)
4. [Packages](#Packages)
    1. [Package structure](#Package-structure)
    2. [Package Namespace PEP 420](#Package-Namespace-PEP-420)
4. [Variables and Memory](#Variables-and-Memory)
    1. [id function](#id-function)
    2. [Reference Counting](#Reference-Counting)
    3. [Shared Reference](#Shared-Reference)
    4. [Garbage Collection](#Garbage-Collection)
    5. [Object Mutability](#Object-Mutability)
    6. [Variable Equality](#Variable-Equality)
5. [Built-in methods](#Built-in-methods)
   1. [isinstance(object, class)](#isinstance(object,-class))
   2. [issubclass(subclass, class)](#issubclass(subclass,-class))  
16. [Numeric Types](#Numeric-Types)
    1.  [Integers](#Integers)
        1.  [Operations](#Operations)
        2.  [Base](#Base)
    2.  [Rational Numbers](#Rational-Numbers)
    3.  [Floats (Real Numbers)](#floats-real-numbers)
        1.  [equality](#equality)
    4.  [Booleans PEP 285](#Booleans-PEP-285)
        1.  [Booleans operators](#Booleans-operators)
        2.  [Short-Circuiting](#Short-Circuiting)
6. [Iterable and Iterators](#Iterable-and-Iterators)
    1. [Consume iterators manually](#Consume-iterators-manually)
    2. [Lazy Iterables](#Lazy-Iterables)
    3. [iter() method](#iter()-method)
        1. [iter() with callables](#iter()-with-callables)
    4. [Delegating Iterators](#Delegating-Iterators)
    5. [Reversed Iteration](#Reversed-Iteration)
    6. [Caveats: using iterators as function arguments](#Caveats:-using-iterators-as-function-arguments)
7. [Generators](#Generators)
   1. [Iterables from generators](#Iterables-from-generators)
   2. [Generator expressions](#Generator-expressions)
   3. [Yield from](#Yield-from)
8. [Sequence Type](#Sequence-Type)
   1. [Mutating sequence](#Mutating-sequence)
      1. [in-place concatenation and repetition](#in-place-concatenation-and-repetition)
      2. [Mutation by assignment](#Mutation-by-assignment)
      3. [Never return a mutated object](#Never-return-a-mutated-object)
   2. [Copying Sequences](#Copying-Sequences)
      1. [Shallow copies](#Shallow-copies)
      2. [Deep Copies](#Deep-Copies)
   3. [Slicing](#Slicing)
   4. [Custom Sequences](#Custom-Sequences)
      1. [Mutation in custom sequences](#Mutation-in-custom-sequences)
   5. [Sorting sequences](#sorting-sequences)
   6. [Zero-based Index](#Zero-based-Index)
   7. [Application - Polygon](#Application---Polygon)
   8. [Lists vs Tuples](#Lists-vs-Tuples)
       1. [Copying](#Copying)
       2. [Storing Efficiency](#Storing-Efficiency)
9.  [Iteration Tools - The itertools module](#Iteration-Tools---The-itertools-module)
    1. [Aggregators](#Aggregators)
    2. [iSlicing](#iSlicing)
    3. [Selecting and Filtering](#Selecting-and-Filtering)
    4. [Infinite iterators](#Infinite-iterators)
    5. [Chaining and Teeing](#Chaining-and-Teeing)
    6. [Mapping and Accumulation](#Mapping-and-Accumulation)
    7. [Zipping](#Zipping)
    8. [Grouping](#Grouping)
        1. [Caveat: lazy iterators in I/O operation](#Caveat:-lazy-iterators-in-I/O-operation)
    10. [Combinatorics](#Combinatorics)
        1. [Cartesian Product](#Cartesian-Product)
        2. [Combinations](#Combinations)
        3. [Permutations](#Permutations)
10. [Context Managers PEP 343](#Context-Managers-PEP-343)
    1.  [Try..Except..Finally](#Try..Except..Finally)
    2.  [The context management protocol](#The-context-management-protocol)
    3.  [The 'with' Scope](#The-'with'-Scope)
    4.  [The \_\_enter\_\_ method](#The-\_\_enter\_\_-mehtod)
    5.  [The \_\_exit\_\_ method](#The-\_\_exit\_\_-mehtod)
    6.  [Context Manager Class](#Context-Manager-Class)
    7.  [Caveat with Lazy Iterator](#Caveat-with-Lazy-Iterator)
    8.  [Additional use of Context Manager](#Additional-use-of-Context-Manager)
        1.  [Redirect Standard Output](#Redirect-Standard-Output)
        2.  [Timing a Function](#Timing-a-Function)
    9.  [Context Manager Decorator](#Context-Manager-Decorator)
    10. [Nested Context Manager: ExitStack](#Nested-Context-Manager:-ExitStack)
11. [Strings](#Strings)
    1. [Common methods](#Common-methods) 
12. [Lists](#Lists)
    1. [List comprehension](#List-comprehension)
13. [Tuples](#Tuples)
   9. [Named Tuples](#Named-Tuples)
      1. [Introspection](#Introspection)
      2. [Modify and Extending](#Modify-and-Extending)
      3. [Docstring](#Docstring)
      4. [Defaults values](#Defaults-values)
14. [Associative arrays](#Associative-arrays)
15. [Dictionaries](#Dictionaries)
    1.  [Creating dictionaries](#Creating-dictionaries)
    2.  [Common operations with dictionaries](#Common-operations-with-dictionaries)
    3.  [Dictionary view](#Dictionary-view)
    4.  [Custom Class and Hashing](#Custom-Class-and-Hashing)
    5.  [Defaultdict](#Defaultdict)
    6.  [Counters](#Counters)
    7.  [ChainMap](#ChainMap)
    8.  [UserDict](#UserDict)
    9.  [MappingProxy](#MappingProxy)
16. [Sets](#Sets)
    1.  [Basic Set Theory](#Basic-Set-Theory)
    2.  [Python implementation of sets](#Python-implementation-of-sets)
    3.  [Sets creation](#Sets-creation)
    4.  [Common operation in sets](#Common-operation-in-sets)
    5.  [Frozen Sets](#Frozen-Sets)
17. [Serialization and Deserialization](#Serialization-and-Deserialization)
    1. [Pickle](#Pickle)
    2. [JSON](#JSON)
    3. [JSONEncoder Class](#JSONEncoder-Class)
    4. [Custom Encoding](#Custom-Encoding)
    5. [JSONDecoder Class](#JSONDecoder-Class)
    6. [JSON Schema](#JSON-Schema)
    7. [Marshmallow](#Marshmallow)
    8. [PyYaml](#PyYaml)
    9. [Serpy](#Serpy)
18. [Unpacking iterables](#Unpacking-iterables)
   10. [Unpacking with *](#Unpacking-with-*)
   11. [Nested unpacking](#Nested-unpacking)
19. [Loops](#Loops)
   12. [While loop](#While-loop)
   13. [Try statement](#Try-statement)
20. [Functions](#Functions)
    1.  [Docstrings and annotations - PEP 257](#Docstrings-and-annotations---PEP-257)
    2.  [lambda expression](#lambda-expression)
    3.  [Function Introspection](#Function-Introspection)
    4.  [\*args and **kwargs](#\*args-and-**kwargs)
    5.  [Parameters default](#Parameters-default)
    6.  [Map, Filter and Zip functions](#Map-Filter-and-Zip-functions)
    7.  [Reducing Functions](#Reducing-Functions)
    8.  [Partial functions](#Partial-functions)
    9.  [The operator module](#The-operator-module)
21. [Classes](#Classes)
    1. [Attributes of classes](#Attributes-of-classes)
    2. [Functions as instance attribute](#Functions-as-instance-attribute)
    3. [Class initialization](#Class-initialization)
    4. [Class Properties](#Class-Properties)
        1. [@property class](#@property-class)
    5. [Property Decorator](#Property-Decorator)
        1. [Read-Only and Computed Properties](#Read-Only-and-Computed-Properties)
    6. [Class and Static Methods](#Class-and-Static-Methods)
    7. [Class body scope](#Class-body-scope)
    8. [Polymorphism](#Polymorphism)
    9. [Special Methods](#Special-Methods)
        1. [\_\_str\_\_ method](#\_\_str\_\_-method)
        2. [\_\_repr\_\_ method](#\_\_repr\_\_-method)
        3. [Arithmetic operators](#Arithmetic-operators)
        4. [Truth value](#Truth-value)
        5. [Callables](#Callables)
    10. [Single Inheritance](#Single-Inheritance)
        1. [instance vs type](#instance-vs-type)
        2. [The *object* class](#The-*object*-class)
        3. [Overriding](#Overriding)
        4. [Extending](#Extending)
        5. [Delegating to parent class](#Delegating-to-parent-class)
        6. [Method Binding](#Method-Binding)
        7. [Slots](#Slots)
22. [Descriptors](#Descriptors)
    1. [non-data descriptors](#non-data-descriptors)
        1. [getter and setter](#getter-and-setter)
        2. [Storing instance properties](#Storing-instance-properties)
    2. [Strong and Weak references](#Strong-and-Week-references)
    3. [The set_name method](#The-set_name-method)
    4. [Properties are descriptors!](#Properties-are-decriptors!)
    5. [Functions implement the descriptor protocol!](#Functions-implements-the-descriptor-protocol!)
22. [Scopes and Namespaces](#Scopes-and-Namespaces)
    1.  [Masking](#Masking)
    2.  [Nonlocal scope](#Nonlocal-scope)
23. [Closure](#Closure)
    1.  [Shared extend scope](#Shared-extend-scope)
    2.  [Nested Closure](#Nested-Closure)
    3.  [Application](#Application)
24. [Decorators](#Decorators)
    1.  [Multiple decorators](#Multiple-decorators)
    2.  [Memoization](#Memoization)
    3.  [Parametrized decorators](#Parametrized-decorators)
    4.  [Decorator class](#Decorator-class)
    5.  [Monkey Patching and Decorating classes](#Monkey-Patching-and-Decorating-classes)
    6.  [Single Dispatch Generic Functions](#Single-Dispatch-Generic-Functions)
        1.  [Application - Htmlizer](#Application---Htmlizer)
25. [Python optimizations](#Python-optimizations)
    1.  [Interning](#Interning)
    2.  [Peephole](#Peephole)
26. [Common Modules](#Common-Modules)
    1. [string](#string)
    2. [functools](#functools)
    3. [itertools](#itertools)
    4. [collections](#collections)
    5. [random](#random)
    6. [timeit](#timeit)
    7. [argparser](#argparser)
27. [Tips and tricks](#Tips-and-tricks)

# PEP8 naming convention
---

- `packages`: short, lowercase, without underscores, e.g. `utilities`
- `modules`: short, lowercase, with underscores, e.g. `db_utils`
- `classes`: the first letter of each word is uppercase, no spaces and no underscores, e.g. `MyClass`
- `functions`: lowercase, with underscores, e.g. `open_account`
- `variables`: lowercase, with underscores, e.g. `account_id`
- `constants`: all uppercase, with underscores, e.g. `MIN_VAL`

back to [TOC](#TOC)

# Everything is an Object
---
In Python everything is an object. Functions, for example, are instances of the built-in `function` class; the same happens for classes, which are instances of the class `type`. This implies that every object has a memory address (yes, even functions and classes). In the same way, every object can be assigned to a variable, passed as an argument to a function or returned by a function. We can look at the object type of any variable with the `type` built-in function.
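
As a small illustration of the paragraph above, the sketch below inspects a function and a class as ordinary objects. The names `greet`, `Account`, `apply` and `make_greeter` are made up for the example.

```python
import types


def greet(name):
    return f"hello {name}"


class Account:
    pass


# Functions and classes are objects: they have a type and an identity.
print(type(greet))    # <class 'function'>
print(type(Account))  # <class 'type'>
print(isinstance(greet, types.FunctionType))
print(hex(id(greet)), hex(id(Account)))

# A function can be assigned to another name, passed as an argument or returned.
say = greet


def apply(func, value):
    return func(value)


def make_greeter():
    return greet


result = apply(say, "world")
print(result)
print(make_greeter() is greet)
```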

back to [TOC](#TOC)

# Modules
---

First, let's define what a `namespace` is: essentially, it is a dictionary that maps names to the objects currently loaded in the Python interpreter. It can be accessed with two built-in functions: `globals()` returns the global namespace, and `locals()` returns the local namespace of, let's say, a function.

Like everything in Python, modules are objects too, of type `module`. When a module is imported, it gets cached in memory (not in the namespace), and a reference to it is added to the global namespace. 
Because of this, if we import the same module from two different scripts running in the same interpreter, the memory address is the same across the scripts (we can think of it as a singleton object). We can inspect this module cache through `sys.modules`, which is a dictionary as well.
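
A quick check of the caching behaviour described above, using `math` purely as an example module:

```python
import sys
import math
import math as math_again

# The module object lives in the sys.modules cache...
cached = sys.modules["math"]

# ...and every import in this process refers to that same object.
same_object = (cached is math) and (math is math_again)

print(type(math))
print(same_object)
print(hex(id(math)), hex(id(cached)))
```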

Importing is done at runtime (i.e. while the Python interpreter is already running) and not at compile time as in C, for example. The way Python retrieves its modules is quite complex, but to help understand it we can use the `sys` module, which has some useful functionality. For example, we can see where the current Python installation and the related C binaries are located with:

In [2]:
import sys
sys.prefix, sys.exec_prefix

('/usr', '/usr')

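In this output both values point to the same location, where the installation and the C binaries are stored. Inside an active virtual environment, `sys.prefix` points to the environment's directory instead, while `sys.base_prefix` keeps pointing to the base installation. As a matter of fact, changing the `prefix` is how Python activates/deactivates a virtual environment.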

But where does Python look for the modules to import? There is a list of directories that Python searches, and it can be inspected with `sys.path`. If a module import fails, we can check whether the module is actually stored in one of the directories listed in the path.

In [3]:
sys.path

['/home/fdifrison/Desktop/GitHub/PyFry-v1/Python',
 '/usr/lib/python310.zip',
 '/usr/lib/python3.10',
 '/usr/lib/python3.10/lib-dynload',
 '',
 '/home/fdifrison/.local/lib/python3.10/site-packages',
 '/usr/local/lib/python3.10/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/home/fdifrison/.local/lib/python3.10/site-packages/IPython/extensions',
 '/home/fdifrison/.ipython']

The operations that Python performs when importing a module are:
* check whether it already exists in the cache -> `sys.modules`
* if not, create a new module object -> `types.ModuleType`
* load the source code from file
* add the entry to `sys.modules`
* compile and execute the source code -> N.B. the code in the imported module is executed!
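
The steps above can be sketched by hand with `importlib.util`. This is only an approximation of what the `import` statement does, and `demo_mod` is a hypothetical module written to a temporary directory for the occasion.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

name = "demo_mod"  # hypothetical module name, used only for this sketch

with tempfile.TemporaryDirectory() as tmp_dir:
    source_file = Path(tmp_dir) / "demo_mod.py"
    source_file.write_text("print('demo_mod is being executed')\nVALUE = 42\n")

    if name in sys.modules:
        # 1. already cached: reuse the existing module object
        module = sys.modules[name]
    else:
        spec = importlib.util.spec_from_file_location(name, str(source_file))
        # 2. create a new, still empty, module object
        module = importlib.util.module_from_spec(spec)
        # 3. register it in the cache
        sys.modules[name] = module
        # 4. load, compile and execute the source inside the module namespace
        spec.loader.exec_module(module)

print(type(module), module.VALUE)
```

Note that the `print` inside the module runs during `exec_module`, which is the "code is executed" caveat from the list.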

So, importing a module seems quite straightforward; what is more complicated is how Python actually finds the module we want to load. Simply put, there are 3 constructs at play:
* finders
* loaders
* finder + loaders == importer

First, the `finders` are asked whether or not they know anything about the module we are trying to import; the list of available finders can be inspected like this:

In [4]:
sys.meta_path

[_frozen_importlib.BuiltinImporter,
 _frozen_importlib.FrozenImporter,
 _frozen_importlib_external.PathFinder,
 <six._SixMetaPathImporter at 0x7fa059cb57b0>,
 <pkg_resources.extern.VendorImporter at 0x7fa058a28580>]

If one of the finders knows the module, it builds a `ModuleSpec` and tells the `loader` to load it (e.g. the module math, since it is a built-in module, is found by the `BuiltinImporter` finder).

In [5]:
import math
math.__spec__

ModuleSpec(name='math', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in')

We can find if a module is in the python path and look at its specs at the same time with the built-in module `importlib` :

In [6]:
import importlib.util  # the `util` submodule has to be imported explicitly
importlib.util.find_spec('math')

ModuleSpec(name='math', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in')

If the module can be found through `sys.path`, its spec is returned; otherwise `find_spec` returns `None`, and we can solve the issue by appending the directory where the module is located to the path list:

```py
import sys
sys.path.append('module_path')
```

If this has to become systematic, for example in a project where many paths have to be added, the best way to proceed is to write a `.pth` file. For more information look at [https://docs.python.org/3/library/site.html](https://docs.python.org/3/library/site.html)

## Some clarifications
We have seen that when a module is imported for the first time, if it is found through `sys.path`, a reference to it is added to `sys.modules`. What goes inside the namespace `globals()`, instead, depends on how we import the module itself, whether with an alias or not. For example:

In [7]:
import math

# the module math is loaded into sys.modules and the name `math` is added to the namespace globals()
# both `math` entries point to the same address

import math as math_alias

# the module math is loaded into sys.modules, but this statement does not add the name `math` to globals()
# instead it adds the name `math_alias`, which points to the same address stored under `math` in sys.modules

from math import sqrt

# the module math is loaded into sys.modules, but this statement adds only `sqrt` to globals(),
# which points to the function `math.sqrt`

from math import *
# the module math is loaded into sys.modules and every public name of the math module is added to globals()


N.B. if a name is already present in the namespace and we import something with the same name, it gets overwritten. This is why `import *` is not recommended unless we are fully aware of every name we are importing and sure that there is no conflict between the names of different modules.

N.B.B. some think that an explicit import of a single function, like `from math import sqrt`, is more lightweight; this is essentially not true, because the entire math module is loaded into `sys.modules` anyway, and the only thing that changes is that only the name `sqrt` is added to the namespace. There is a very small advantage when calling the function, because `sqrt(2)` needs one less dictionary look-up than `math.sqrt(2)`, but since a dict look-up is very fast the difference in efficiency is tiny. Therefore, in this case, choose for READABILITY and not EFFICIENCY.

## Reload a module
If, for some reason, we want to reload a module, it is not sufficient to repeat the `import` statement: the module name is already in `sys.modules`, so the import system finds it there and skips the loading. Using `del module_name` is not enough either, because we are only deleting the name reference from the namespace. What we need to do is delete the reference in the `sys.modules` dictionary: `del sys.modules['module_name']`. Now, if we re-import the module, we create a new object with a new `id`.

As an alternative, if we want to reload the module without destroying and recreating the module object, we can use the `importlib.reload(module_name)` function, which reloads the module while keeping the same memory id both in `sys.modules` and in the namespace. However, care must be taken when reloading modules. For example, if we loaded only a specific function of a module into the namespace, let's say with `from math import sqrt`, we cannot reload the module directly, since we don't have the name `math` in the namespace; we have to go through the `sys.modules` reference -> `importlib.reload(sys.modules['math'])`. Moreover, even though `math` has now been reloaded, the name `sqrt` in our namespace has not been updated. To do that we would need something like `sqrt = sys.modules['math'].sqrt`. So be aware! Reloading is not a safe process, and it isn't something we want to do in a production environment!
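
The difference between the two approaches can be observed with a small pure-Python standard library module. `colorsys` is chosen here only because it is small and has no side effects; any module would do.

```python
import importlib
import sys

import colorsys
from colorsys import rgb_to_hsv  # direct reference to the function object

original = colorsys

# reload() re-executes the module code inside the SAME module object
reloaded = importlib.reload(sys.modules["colorsys"])
same_module = reloaded is original

# the name imported with `from ... import` still points to the old function
stale_function = colorsys.rgb_to_hsv is not rgb_to_hsv

# deleting the cache entry and importing again builds a NEW module object
del sys.modules["colorsys"]
import colorsys

new_module = colorsys is not original

print(same_module, stale_function, new_module)
```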

For more references on modules and module imports look at:
* `PEP 302`
* [https://docs.python.org/3/tutorial/modules.html](https://docs.python.org/3/tutorial/modules.html)
* [https://docs.python.org/3/reference/import.html](https://docs.python.org/3/reference/import.html)

back to [TOC](#TOC)

# Packages
---

Packages are a collection of modules and possibly sub-packages that usually have some kind of specialized scope. The substantial difference that can tell us if a module is a package is the presence of a non-empty `__path__` attribute.


99% of packages are file-based, i.e. structured into directories in the file system. In particular, the `directory name` becomes the `package name`, and the directory needs to contain the package code somewhere (since a package is also a module, but not vice versa). In fact, the code goes inside the `__init__.py` file in the directory, and the pair `directory + __init__.py` makes up the package `module`. Essentially, when Python finds an `__init__.py` file inside a folder, it knows that it is looking at a `package` and not at a standard directory. If the file is missing, Python creates an `implicit namespace package`, which still allows us to navigate through the folders inside the package itself.

Now lets look at a typical app folder structure containing modules, package and sub-packages:

```
app/

    module_1.py

    pack_1/
        __init__.py
        module_1_a.py
        
        pack_1_1/
            __init__.py
            module_1_1_a.py   
```

Imagine we are executing our python interpreter inside the `app` folder. We can now simply say:
* `import module_1`: imports what is inside module_1.py. If we try to look at `module_1.__path__` we get an `AttributeError`, since module_1 is a module and not a package;
* `import pack_1`: imports the package `pack_1` and executes what is inside its `__init__.py`. Now `pack_1.__path__` exists and is not empty, since pack_1 is a package. Also, Python has added `pack_1` to both `sys.modules` and `globals()` (namespace);
* `import pack_1_1`: results in an ERROR, since the Python finder can't reach this nested package from `sys.path`; instead we need one of the following:
  * `import pack_1.pack_1_1` to be able to access the nested package. N.B. now `pack_1.pack_1_1` is stored inside `sys.modules` but not in `globals()`, where only the root package (`pack_1`) is stored; this is because we are always going to refer to it as `pack_1.pack_1_1` in our code anyway.
  * `from pack_1 import pack_1_1`: in this way we are storing `pack_1_1` in `globals()`, but it is only an additional name, since it points to the same object as `pack_1.pack_1_1` (`id(pack_1_1) == id(sys.modules['pack_1.pack_1_1'])`)
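
The behaviour listed above can be checked on a throwaway copy of the layout. In this sketch the folder is rebuilt in a temporary directory, and the `demo_` prefix on every name is an assumption added only to avoid clashes with real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as app_dir:
    # app/demo_module_1.py, app/demo_pack_1/__init__.py, app/demo_pack_1/demo_pack_1_1/__init__.py
    nested = Path(app_dir) / "demo_pack_1" / "demo_pack_1_1"
    nested.mkdir(parents=True)
    (Path(app_dir) / "demo_module_1.py").write_text("X = 1\n")
    (nested.parent / "__init__.py").write_text("")
    (nested / "__init__.py").write_text("")

    sys.path.insert(0, app_dir)  # pretend the interpreter was started inside app/
    importlib.invalidate_caches()
    try:
        import demo_module_1
        import demo_pack_1.demo_pack_1_1
        from demo_pack_1 import demo_pack_1_1

        module_has_path = hasattr(demo_module_1, "__path__")
        package_has_path = hasattr(demo_pack_1, "__path__")
        nested_cached = "demo_pack_1.demo_pack_1_1" in sys.modules
        same_object = demo_pack_1_1 is sys.modules["demo_pack_1.demo_pack_1_1"]

        try:
            import demo_pack_1_1 as top_level  # not reachable as a top-level name
            top_level_import_fails = False
        except ModuleNotFoundError:
            top_level_import_fails = True
    finally:
        sys.path.remove(app_dir)

print(module_has_path, package_has_path, nested_cached, same_object, top_level_import_fails)
```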

## Package structure

When creating a package we need to keep in mind two quite opposite points of view: the `developer` of the package, who needs a good breakdown of the package's functionality for better modularity, debugging, readability etc., and the `user`, who only needs to access the functionality coded into the package itself. Therefore, care is needed when structuring the package, and a good use of the `__init__.py` files plays a key role. 

Looking at the previous dummy example, if the user needs to access something inside module_1_1_a.py, the import would be something like `import pack_1.pack_1_1.module_1_1_a`, i.e. something very tedious. If instead the `__init__.py` inside `pack_1` contains something like `from .pack_1_1.module_1_1_a import *` (a relative import; a bare `from pack_1_1 import *` would fail in Python 3), the user can access everything just with `import pack_1`, because when the package is imported its `__init__.py` is executed and the names it defines become attributes of the package. 

However, more often than not, a `*` import is something we want to avoid, since inside a package there will be many functions/classes that are needed only by the developer and not by the user (it might even be 'dangerous' if the user is able to access them). To avoid this there are essentially two methods:

* name the 'private' (developer side) functions with a leading underscore (e.g. `def _func_only_for_dev()`), since the `*` import skips those
* define inside each module the list `__all__ = ['names_i_want_to_export_from_a_module']`, i.e. the list of all the objects we want to be imported by a `*` import
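
Both methods can be observed with a module built in memory. The module name `demo_api` and its functions are hypothetical; in a real package the same code would live in a `.py` file.

```python
import sys
import types

# Hypothetical module built in memory, standing in for a file on disk.
demo = types.ModuleType("demo_api")
exec(
    "__all__ = ['open_account']\n"
    "def open_account(): return 'opened'\n"
    "def audit(): return 'public but not listed in __all__'\n"
    "def _helper(): return 'private'\n",
    demo.__dict__,
)
sys.modules["demo_api"] = demo


def star_import_names():
    """Run `from demo_api import *` in a fresh namespace and list what arrived."""
    namespace = {}
    exec("from demo_api import *", namespace)
    return sorted(n for n in namespace if not n.startswith("__"))


with_all = star_import_names()       # only the names listed in __all__

del demo.__all__
without_all = star_import_names()    # every name without a leading underscore

del sys.modules["demo_api"]
print(with_all)
print(without_all)
```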

## Package Namespace PEP 420
Namespace packages are essentially packages without an `__init__.py` file. Their advantage is that their parts can be spread anywhere in the file system (even in a zip file), but the import of sub-modules/packages can't be flattened (we don't have an `__init__.py` to leverage).

back to [TOC](#TOC)

# Variables and Memory
---
When a variable is created, what Python does under the hood is link the variable name to the memory slot (or slots) containing the object assigned to the variable. The name is therefore nothing more than a reference to that memory slot.

## `id` function
`id` is the function that returns the identity of an object, which in CPython is its memory address as a base-10 integer (it can be converted with `hex` to see the hexadecimal representation).
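
A minimal example of the two calls (the actual numbers differ on every run):

```python
my_var = [1, 2, 3]

address = id(my_var)          # an int, printed in base 10
print(address, hex(address))  # the same value in hexadecimal

alias = my_var                # a second name for the same object
print(id(alias) == address)
```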

## Reference Counting
Reference counting is a process carried out by the python memory manager internally. Each time we create a new variable, we are creating a reference to a memory slot. If we create a new variable that is equal to an existing one, we are adding a reference to the same memory slot (which now has a reference count equal to two).

In [8]:
import ctypes
import sys

my_var_1 = list() # my_var_1 points to the memory slot id(my_var_1)

other_var = my_var_1  # other_var points to the same memory slot as my_var_1
# at this point, the ref count of id(my_var_1) is equal to 2

print('sys.refcount returns:' , sys.getrefcount(my_var_1))
# returns the ref count of the variable + 1, because passing it to getrefcount creates a temporary reference

print('ctypes returns:' , ctypes.c_long.from_address(id(my_var_1)).value)
# a lower-level way to read the ref count of a memory slot (CPython-specific)

sys.refcount returns: 3
ctypes returns: 2


## Shared Reference
When do Python variables share memory references?

In [9]:
a = 10
b = a
# b is not copying the content of a, it is pointing to the same memory address

a, b, a == b, a is b

(10, 10, True, True)

In [10]:

a = 10
b = 10
# CPython caches small integers, so both a and b point to the same memory address (see Interning)

a, b, a == b, a is b

(10, 10, True, True)

In [11]:

a = [1,2,3]
b = a
b.append(4)
# now both b and a are equal to [1,2,3,4]: a list is a mutable object, and appending an element only modifies its internal state, keeping the same memory address

a, b, a == b, a is b

([1, 2, 3, 4], [1, 2, 3, 4], True, True)

In [12]:

a = [1,2,3]
b = [1,2,3]
# in this case python doesn't create a shared reference, so a and b point to different objects; this prevents a change to b from affecting a

a, b, a == b, a is b

([1, 2, 3], [1, 2, 3], True, False)

N.b. There is always a shared reference to the `None` object, which is created automatically by Python.

## Garbage Collection
Garbage collection is the mechanism Python uses to avoid memory leaks such as those generated by circular references (objects pointing to each other), which reference counting alone cannot free. It can be controlled through the `gc` module. It can be turned off to improve performance (only if we are absolutely sure that there are no circular references in the code). The gc runs periodically on its own but can also be called manually to schedule a specific cleanup.
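
The sketch below builds a circular reference and lets a manual collection clean it up. The `Node` class is made up for the example, and the exact number returned by `gc.collect()` depends on the Python version.

```python
import gc


class Node:
    def __init__(self):
        self.other = None


gc.disable()   # pause the automatic collection for the demo
gc.collect()   # start from a clean state

a, b = Node(), Node()
a.other, b.other = b, a   # circular reference: a -> b -> a
del a, b                  # the names are gone, but the ref counts never reach zero

unreachable = gc.collect()  # a manual run finds and frees the cycle
gc.enable()

print(f"unreachable objects found: {unreachable}")
```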

## Object Mutability
Changing the data inside an object is called `modifying the internal state`, since the memory address does not change but only its content (e.g. appending an element to a list). So we can distinguish between `Mutable` and `Immutable` objects depending on whether the internal state can be changed.

`Immutable objects:`

- Numbers
- Strings
- Tuples (if they contain mutable elements, e.g. lists, those remain mutable)
- Frozen Sets
- User-defined classes (if so defined)

`Mutable objects:`

- Lists
- Sets
- Dictionaries
- User-defined classes (if so defined)

Care must be taken when we talk about the immutability of an object that is passed to a function as an argument. We have to distinguish between the `module scope` and the `function scope`.
When we pass an object to a function we are in reality passing the `reference` to the object itself. So if we pass an immutable object, say a string, at the beginning both the module scope and the function scope point to the same memory reference, but as soon as the function "modifies" the string (e.g. by concatenating another string), a new object with a new reference is created and the original is left untouched. If the object is mutable, say a list, and the function modifies the list (e.g. by appending an element), then Python doesn't create a new object but simply modifies the internal state of the object at the existing memory reference.

In [13]:
# IMMUTABLE
def process_str(s):
  # s still has the same memory reference as my_string
  s = 'hello' + s
  # now the name s has been rebound: since strings are immutable, the concatenation creates a new object with a new reference.
  return s

my_string = 'world'

my_string, process_str(my_string), id(my_string), id(process_str(my_string))

('world', 'helloworld', 140326588266544, 140326588259184)

In [14]:
# MUTABLE
def process_lst(lst):
  # lst has the same memory reference as my_list
  lst.append(5)
  # since lst is a mutable object, only the internal state is changed but the memory reference is still the same.
  return lst

my_list = [1,2,3]

my_list, process_lst(my_list), id(my_list), id(process_lst(my_list))

([1, 2, 3, 5, 5], [1, 2, 3, 5, 5], 140326588253184, 140326588253184)

## Variable Equality
There are two ways to compare two variables in Python: the `is` and the `==` operators, respectively the identity and the equality operator. While the identity operator compares the memory references of two objects, the equality operator compares their internal state (data). Their negations are `is not` and `!=`.

In [15]:
a = 10
b = a
print(a is b) # True since b refers to the same object as a
print(a == b) # True

True
True


In [16]:
a = 500
b = 500

print(a is b) # False, preloaded integers are in the range [-5, 256] see Interning
print(a == b) # True

False
True


In [17]:
a = [1,2,3]
b = [1,2,3]

print(a is b) # False, different memory address
print(a == b) # True

False
True


In [18]:
a = 'hello'
b = 'hello'

print(a is b) # True, but not always! only if the strings are interned (see Interning)
print(a == b) # True

True
True


In [19]:
a = 10.0
b = 10

print(a is b) # False, float and int are different objects
print(a == b) # True, python recognizes they have the same value

False
True


back to [TOC](#TOC)

# Built-in methods
---

## `isinstance(object, class)`
Return `True` if an object is an instance of a particular class, `False` otherwise.

## `issubclass(subclass, class)`
Return `True` if `subclass` inherits, directly or indirectly, from `class` (a class is also considered a subclass of itself), `False` otherwise.
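
A short example of both functions, with the hypothetical classes `Animal` and `Dog`:

```python
class Animal:
    pass


class Dog(Animal):
    pass


rex = Dog()

checks = (
    isinstance(rex, Dog),     # True
    isinstance(rex, Animal),  # True: instances of a subclass count as well
    issubclass(Dog, Animal),  # True
    issubclass(Animal, Dog),  # False
    issubclass(bool, int),    # True: works for built-in types too
)
print(checks)
```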


back to [TOC](#TOC)

# Numeric Types
---

## Integers
Integers are represented internally using base-2 digits (binary representation).

```
# es. binary representation of 19

 0   0   0   1   0   0   1   1 
--- --- --- --- --- --- --- ---   -> max num of bits = 8
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

1*2^4 + 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 16 + 2 + 1 = 19

(10011)base_2 = (19)base_10

Representing the number 19 requires 5 digits, hence 5 bits.
```

But what is the largest number we can store, depending on the number of bits we want to use? It depends on whether or not we care about negative values, since in order to store the sign we have to allocate one bit.
The general formula is:


```py
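n = 8  # number of bits (example value)
max_unsigned = 2**n - 1  # largest unsigned integer; note that ^ is XOR in Python
signed_range = (-2**(n - 1), 2**(n - 1) - 1)  # (smallest, largest) signed integer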
```


Side-Note: a 32 bit operating system can address 2^32 distinct memory locations (roughly 4 GB), and this limits the amount of memory that can be addressed at the same time. This is why having more than 4 GB of RAM on a 32 bit OS is essentially useless: the machine can't address more than 4 GB at once.

### Operations
How do the div `//` (floor division) and mod `%` (modulo) operators work in Python? div returns the floor division (the quotient rounded down to the nearest smaller integer), while mod returns the remainder. They have to satisfy the following equation:



```py
from math import floor

n, d = 7, 2  # example numerator and denominator
assert n == d * (n // d) + (n % d)

# n.b. the `floor` of a real number `a` is the largest integer `<= a`
floor(3.14)  # 3
floor(-3.1)  # -4
```
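
The rule matters mostly with negative operands, where flooring differs from truncation. The loop below checks the equation for every sign combination (the values 7 and 2 are arbitrary):

```python
import math

for n, d in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = n // d, n % d
    assert n == d * q + r          # the identity always holds
    assert q == math.floor(n / d)  # the quotient is floored, not truncated
    print(f"{n:>2} // {d:>2} = {q:>2}    {n:>2} % {d:>2} = {r:>2}")
```

In Python the remainder takes the sign of the denominator, which is why `-7 % 2` is `1` and `7 % -2` is `-1`.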


### Base
We can create an `int` object by calling the `int()` constructor; it has an optional `base` parameter that tells Python how to interpret the argument, which in that case is given as a string. The default value is base=10, since that is how we are used to reading numbers (while machines work in binary, i.e. base=2). Valid bases go from 2 to 36: if the base is greater than 10, the digits beyond 9 are encoded with letters (the values 0 to 9 are written `0`-`9`, the values 10 to 35 are written `A`-`Z`).

Python has some built-in functions to translate to the most common bases, like `bin()` for binary and `hex()` for hexadecimal.



```py
bin(10)  # '0b1010', the 0b prefix tells us that the base is 2 (binary)
```
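
A few conversions in both directions, as a quick reference:

```python
# string -> int, with an explicit base
print(int("1010", base=2))    # 10
print(int("ff", base=16))     # 255
print(int("z", base=36))      # 35, the largest single digit available

# int -> string, with the prefix that identifies the base
print(bin(10), oct(10), hex(255))

# base=0 lets Python read the base from the prefix of the literal
print(int("0b1010", base=0))  # 10
```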


## Rational Numbers
Rational numbers are the numbers that can be written as a fraction of two integers; their decimal representation is either finite or periodic. The module `fractions` can be used to represent rational numbers, since the float representation can be misleading due to machine precision.


```py
from fractions import Fraction

Fraction(1, 2)  # (numerator, denominator)
Fraction('0.125')
Fraction(22/7)  # built from a float: see the caveat below

# CAVEAT
'''
Some numbers can't have a finite representation due to machine precision. For example, 0.3 is actually an approximation. The problem is that we have to look at something like the 17th decimal position in order to realize that this is the case. If we pass this number to Fraction() we might expect to receive 3/10 as output; instead we get a fraction of very large numbers that exactly matches the machine representation of 0.3 (0.29999999999999998890..)
'''
```
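The caveat can be worked around in two ways, sketched below: building the fraction from a string, or recovering the "intended" value with `limit_denominator()`. The bound of 1000 is an arbitrary choice for this example.

```python
from fractions import Fraction

exact = Fraction('0.3')       # parsed from text: exactly 3/10
from_float = Fraction(0.3)    # exact value of the float closest to 0.3

# closest fraction whose denominator does not exceed 1000
recovered = from_float.limit_denominator(1000)

print(exact, recovered)       # 3/10 3/10
print(from_float == exact)    # False
```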


## Floats (Real Numbers)
In CPython floats are implemented as a C `double`, which follows the `binary64` format (IEEE 754).
Floats use a fixed size of 64 bits, divided as follows:

- sign -> 1 bit
- exponent (in the range [-1022, 1023]) -> 11 bits
- significant digits -> 52 bits (15-17 significant digits in base 10)

To have an exact representation of decimal numbers (since floats may be affected by machine precision), we can use the `decimal` module.

```
# decimal representation of a real number
123.45 = 1*10^2 + 2*10^1 + 3*10^0 + 4*10^-1 + 5*10^-2 
```
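A minimal sketch of the `decimal` module: when a `Decimal` is built from a string, the base 10 digits are stored exactly, while building it from a float captures the float's binary approximation.

```python
from decimal import Decimal

# Build Decimals from strings so that no binary rounding sneaks in
x = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
y = Decimal('0.3')
print(x == y)  # True

# Building from a float stores the exact value of the binary approximation
z = Decimal(0.1)
print(z == Decimal('0.1'))  # False
```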

### equality
Care must be taken when looking at the equality of floats since there are some decimal numbers that cannot be represented by a finite binary representation:

In [20]:
# base_10(0.1) = base_2(0.0 0011 0011 0011 ...) 
# therefore
x = 0.1 + 0.1 + 0.1 
y = 0.3 

print(f'{x:.20f}')
print(f'{y:.20f}') 
x == y 

0.30000000000000004441
0.29999999999999998890


False

One workaround is to set a tolerance `epsilon` (e.g. a percentage of the magnitude of the larger number involved in the comparison) as the threshold to determine whether two numbers are equal:

```
|a - b| < epsilon
```

The pythonic way to approach this problem is to use the function `math.isclose`, taking care to specify appropriate relative and absolute tolerances (by default `rel_tol=1e-09` and `abs_tol=0.0`):

In [21]:
import math

# math.isclose(x, y, rel_tol, abs_tol)
math.isclose(x, y)

True
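
The choice of tolerances matters mostly near zero, where a relative tolerance alone is never satisfied. The sketch below illustrates this with arbitrary values.

```python
import math

a, b = 1e-10, 0.0

# Only the relative tolerance is active by default: it scales with the
# magnitude of the operands, so near zero it becomes vanishingly small
default_result = math.isclose(a, b)

# An absolute tolerance handles comparisons against (or near) zero
abs_result = math.isclose(a, b, abs_tol=1e-9)

# For large numbers the relative tolerance does the job on its own
big_result = math.isclose(1_000_000.0, 1_000_000.0001)

print(default_result, abs_result, big_result)  # False True True
```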

## Booleans PEP 285
Booleans are a subclass of the `int` class (i.e. `bool` inherits all its methods). Two constants are defined: `True` (int = 1) and `False` (int = 0). They are singleton objects, i.e. they point to a fixed address in memory and can be compared with both the identity and the equality operators. N.b. even if `True` and `False` evaluate to the int 1 and 0 respectively, they don't point to the same memory address, since they are not the same type of object:


In [22]:
int(True) == 1 and int(False) == 0

True

In [23]:
id(True) != id(1)

True

In [24]:
id(False) != id(0)

True

Objects have an associated `truth value`, meaning that they have a defined truth state. In particular, every object will evaluate to `True` by default except for:

- None
- False
- 0 (in any numeric type: int, float, complex ..)
- empty sequences (list, tuple, string ..)
- empty mappings and sets (dictionary, set ..)
- instances of a custom class whose `__bool__` returns `False` or whose `__len__` returns 0

As a matter of fact, when we call `bool()` on an object, Python looks for the definition of the dunder method `__bool__`; if this is not defined, it looks for the `__len__` method, and if this is not defined either, the object evaluates to `True`.

In [25]:
# e.g. __bool__ implementation for the int class
def __bool__(self):
  return self != 0
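
The lookup order described above can be sketched with three hypothetical classes (`Basket`, `Flag` and `Plain` are made-up names for this example):

```python
class Basket:
    """Truthiness falls back to __len__ because __bool__ is not defined."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Flag:
    """__bool__ takes precedence over __len__."""
    def __len__(self):
        return 10

    def __bool__(self):
        return False


class Plain:
    """Neither __bool__ nor __len__: instances are always truthy."""


truth = (bool(Basket([])), bool(Basket([1])), bool(Flag()), bool(Plain()))
print(truth)  # (False, True, False, True)
```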

### Booleans operators

```
# Truth Table
X Y  not X  X and Y  X or Y
0 0    1       0       0
0 1    1       0       1
1 0    0       0       1 
1 1    0       1       1
```

```
# De Morgan's Theorem
not(A or B) == (not A) and (not B)
not(A and B) == (not A) or (not B)

# Operations precedence (in descending order)
< > <= >= == != in is
not
and
or
```
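
Both De Morgan's laws and the precedence rules can be checked directly in Python. This is a minimal sketch that simply enumerates every combination of truth values.

```python
from itertools import product

pairs = list(product([False, True], repeat=2))   # all (A, B) combinations

# De Morgan's laws hold for every combination of truth values
law_or = all((not (a or b)) == ((not a) and (not b)) for a, b in pairs)
law_and = all((not (a and b)) == ((not a) or (not b)) for a, b in pairs)

# Precedence: `not` binds tighter than `and`, which binds tighter than `or`
implicit = True or False and not True        # parsed as True or (False and (not True))
forced = (True or False) and (not True)      # parentheses change the result

print(law_or, law_and, implicit, forced)  # True True True False
```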

### Short-Circuiting
Looking at the truth table, there are two cases in which the program can simplify its job by evaluating only part of a boolean expression. This is called `short-circuiting`:

In [26]:
True or Y # -> True 
# with an or statement, if the first argument is True it doesn't matter whether the second argument is True or False (Y is not even evaluated), the operation will always evaluate to True

True

In [27]:
False and Y # -> False
# with an and statement, if the first argument is False it doesn't matter whether the second argument is True or False (Y is not even evaluated), the operation will always evaluate to False

False

This is very useful when we have to chain two conditions together, one of which may result in raising an exception (and breaking the code) if evaluated. With short-circuiting we can add a first condition that guards against the exception that the second one may raise.

In [28]:
my_string = 'ciao'
if 'a' == my_string[0]:
  pass
# we are checking if 'a' is the first letter of my_string. But what happens if my_string is empty? The code breaks. We can solve this by short-circuiting on a first condition


In [29]:
my_string = ''
if 'a' == my_string[0]:
    pass

IndexError: string index out of range

In [None]:
if my_string and 'a' == my_string[0]:
  pass
# Since an "and" expression evaluates to True only if both members are truthy, we can safeguard our code by first checking whether my_string evaluates to False (i.e. whether it is an empty string);
# if so, the second part of the "and" expression is not executed, thus protecting our code from breaking due to the IndexError exception.

back to [TOC](#TOC)

# Iterable and Iterators
---
Iterables are, in essence, containers that can be iterated over, nothing more. For example, sequences are particular iterables upon which we can iterate on an index basis, but this is not always the case for an iterable. An iterable only needs the concept of a `next` item, meaning that it should return another element from the container, without any ordering implied. Moreover, iterators need to keep track of the elements that have been handed out, since we don't want the same element twice, and a stopping criterion is needed to tell us when the container is exhausted and no more elements are available. Other desirable features are: having a finite number of elements in the container, the possibility to "start from the beginning", i.e. reuse the iterable if needed, the use in list comprehensions etc.

Python has a built-in function called `next()`, backed by the special method `__next__`, that fits exactly the purpose of iterating over a container, i.e. handing out a new element each time it is called. We also have a built-in exception that signals that the container is exhausted: `StopIteration`.

Let's imagine we want to create our custom iterable class: how can we tell Python how to behave properly on the `__next__` method? How do we avoid exhausting our class instance once we have iterated over all the elements? 
To answer this we need to use a **`protocol`**, i.e. a way to tell the Python interpreter that our class implements certain functionality.

In particular, for an `iterator` we need the `iterator protocol`, which requires the class to have 2 methods:
* `__iter__`, a method that should only return the class instance itself (so that an iterator can be used wherever an iterable is expected, e.g. in a for loop)
* `__next__`, to handle returning the elements and eventually raising `StopIteration`

If our class satisfies these prerequisites then we have an **`iterator`** upon which we can use for loops, list comprehensions etc. What we don't get is reusability: once the iterator is consumed it has to be re-instantiated.

In the end, the solution to our problem is to have two distinct objects: the `iterable`, which is the collection of the elements, never gets exhausted and is just a container; and the `iterator`, a separate object created from the iterable that is responsible for iterating over the elements. The iterable is created once while the iterator is created any time we want to restart the iterating process. This is a good deal for many reasons: for example, we may have a huge set of data stored in our iterable and we don't want to reload it each time we need to restart the iteration!

As a formal distinction, an `iterable` is a Python object that implements the `iterable protocol`, which only requires the definition of the `__iter__` method, returning a new instance of the iterator each time it is called:

```py
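class Iterator:
    def __init__(self, iterable):
        self.iterable = iterable  # the container we are iterating over

    def __iter__(self):  # return itself
        return self

    def __next__(self):
        # return the next element, or signal that the container is exhausted
        raise StopIteration

class Iterable:
    def __init__(self):
        pass

    def __iter__(self):  # return a new iterator
        return Iterator(self)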
```

Basically, we are not iterating over the instance of the iterable object but over the iterator generated by the iterable itself! 
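
To make the distinction concrete, here is a minimal sketch of the two protocols. `Countdown` and `CountdownIterator` are hypothetical classes written only for this example.

```python
class Countdown:
    """Iterable: only stores the data and hands out fresh iterators."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: keeps track of the current position and gets exhausted."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)     # [3, 2, 1]
second_pass = list(countdown)    # [3, 2, 1] again: the iterable is reusable

iterator = iter(countdown)
consumed = list(iterator)        # [3, 2, 1]
leftover = list(iterator)        # []: the same iterator is now exhausted
```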

N.B. when we ask Python to iterate over an object, it first looks for the `__iter__` method and only afterwards for the `__getitem__` method. This means that if we implement both in our custom class, during iteration Python will use the former.

Python has different built-in (lazy) iterables and iterators, and it is fundamental to know which is which now that we know the difference between the two constructs:

* `range()` -> iterable
* `dict.keys()` / `values()` / `items()` -> iterable
* `zip()` -> iterator
* `enumerate()` -> iterator
* `open()` -> iterator
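
A simple way to tell the two apart is that an iterator returns itself from `iter()`, while an iterable returns a new object. The helper `is_iterator` below is a made-up name for this sketch.

```python
def is_iterator(obj):
    # An iterator returns itself from iter(); an iterable returns a new object
    return iter(obj) is obj


checks = {
    'range': is_iterator(range(3)),
    'dict.keys': is_iterator({'a': 1}.keys()),
    'zip': is_iterator(zip([1], [2])),
    'enumerate': is_iterator(enumerate([1])),
    'list': is_iterator([1, 2]),
}
print(checks)
# {'range': False, 'dict.keys': False, 'zip': True, 'enumerate': True, 'list': False}
```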

## Consume iterators manually
Imagine we need to parse a csv file where the first two lines are the header and the data types. We could do a for loop with nested if conditions to split the header and the data types from the actual data, or we could apply what we just learned about iterators. Essentially we want to create an iterator from the iterable (the file to be parsed), assign the header and the data types directly to variables and perform a for loop only on the data rows.

```py
with open('data_file.csv') as file:
    file_iter = iter(file)
    header = next(file_iter) # first row
    data_types = next(file_iter) # second row
    data = [row for row in file_iter] # from the third row and on
```

The result is a very clean and efficient way to parse the file.
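
The same pattern can be tried without a file on disk. The sketch below uses `io.StringIO` as an in-memory stand-in for the file, with made-up content, and additionally splits each row on the commas.

```python
import io

# In-memory stand-in for a csv file (made-up content)
fake_file = io.StringIO("name,age\nstr,int\nAda,36\nAlan,41\n")

rows = iter(fake_file)
header = next(rows).strip().split(',')       # first row
data_types = next(rows).strip().split(',')   # second row
data = [row.strip().split(',') for row in rows]  # remaining rows only

print(header, data_types, data)
# ['name', 'age'] ['str', 'int'] [['Ada', '36'], ['Alan', '41']]
```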

## Lazy Iterables
Let's first define what **Lazy Evaluation** is: the term often refers to class properties that are not evaluated when the instance of the class is created but are computed, and become available, only when the property is requested. Once the property is requested for the first time, its value is cached in the instance state, therefore it doesn't need to be computed again.

In [None]:
from math import pi

class Circle:
    def __init__(self, r):
        self.radius = r # using the setter
        self._area = None # Lazy area
        
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r
        self._area = None # reset area when the radius is changed
        
    @property
    def area(self):
        if self._area is None:
            print('Calculating area')
            self._area = pi*self.radius**2
            #return self._area
    
        return self._area # if area was not None we already have the cached value   

In [None]:
c1 = Circle(2)

In [None]:
# first call: the area is computed now
c1.area

In [None]:
# Now the result is cached
c1.area

In [None]:
# if the radius is changed
c1.radius = 3
c1.area

This concept can also be applied to iterables, and in fact it is used very often in Python `generators`; essentially, the element is computed and returned only when `next()` is called. For example we can create an infinite lazy iterable that computes a factorial only when requested.

In [None]:
from math import factorial

class Factorial:
    def __iter__(self):
        return self.FactIter()
    
    class FactIter:
        def __init__(self):
            self.i = 0 # initializer
            
        def __iter__(self):
            return self
        
        def __next__(self):
            f = factorial(self.i)
            self.i += 1
            return f

In [None]:
f = iter(Factorial())
for _ in range(10):
    print(next(f))

Looking at the code we can see that the iterable has almost nothing to do: all the computation is carried out in the iterator class. `enumerate()` and `zip()` are built-in lazy iterators of Python.

## iter() method
What happens when we call `iter(obj)`? Python first looks for an `__iter__` method defined in the object's class; if there is none, it looks for the `__getitem__` method, in which case an iterator is created over the sequence that indexes it from 0 upwards at each `next` call, as if it were a while loop, catching the `IndexError` exception and raising `StopIteration`. If neither method is defined, a `TypeError` is raised. The following is an example of a Sequence-Iterator class (something very similar to what Python does when we call `iter()` on an object that has only `__getitem__` defined).

In [None]:
class SeqIter:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        try:
            item = self.seq[self.index]
            self.index += 1
            return item
        except IndexError:
            raise StopIteration
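
The fallback can be observed directly with a class that defines `__getitem__` only. `Squares` is a hypothetical class written for this sketch.

```python
class Squares:
    """Sequence-like class with __getitem__ only (no __iter__)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)   # tells the fallback iterator to stop
        return index ** 2


squares = Squares(4)
has_iter = hasattr(squares, '__iter__')   # False: no __iter__ defined
via_fallback = list(squares)              # iter() falls back to __getitem__

print(has_iter, via_fallback)  # False [0, 1, 4, 9]
```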

### iter() with callables
There is a second form of the `iter()` function that is useful to iterate over callables. In the form `iter(callable, sentinel)` the callable is called with no arguments at each step, and the sentinel is the return value that triggers `StopIteration` (the sentinel itself is not returned). Of course, if the sentinel value is never met, we'll generate an infinite iterator.

In [None]:
def call():
    i = 0
    
    def inner():
        nonlocal i
        i += 1
        return i
    return inner

inner = call()
iter_call = iter(inner, 5) # set the sentinel value to 5

for c in iter_call:
    print(c)

## Delegating Iterators
Let's imagine we have a class that stores a list of elements in a pseudo-private variable (i.e. beginning with \_). Once an instance of the class is created, unless specifically implemented, we cannot iterate over this list (unless we are aware of the private variable, but that shouldn't be the case anyway). One option could be to add to the class the capability to generate an iterator out of that list, but it is code that we don't want to handle directly. What we can do is `delegate` the generation of the iterator to the `iter()` function itself; in this way we only need to implement the `__iter__` method in our class. As a matter of fact, in real practice we will often not create a custom iterator, unless we need specific functionality, but we'll leverage `iter()`, delegating to it the duty of returning an iterator.

In [None]:
class Delegate:
    def __init__(self, some_list):
        self._somelist = some_list
        # do something to this list
        
    def __iter__(self):
        return iter(self._somelist)
    
# since some_list is already an iterable (a list) we can easily create an iterator with iter().
# In this way the class Delegate is simply transformed into an iterable!

## Reversed Iteration
It may happen that we need to iterate over an iterable in reversed order, and what we usually do is use a for loop and slicing like:
```py
for i in lst[::-1]:
    print(i)
```
In this way we are creating a copy of the list, maybe only to iterate over a few elements of a huge sequence! A rather ugly alternative would be to iterate backwards knowing the length of the sequence:
```py
for i in range(len(lst)):
    print(lst[-i-1])
```
However, the best approach, both in terms of efficiency and readability, is to transform the list into a `reversed iterator`:

In [None]:
seq = [i for i in range(5)]
for s in reversed(seq):
    print(s)

The function `reversed()` creates an iterator over the iterable `seq`, therefore we are not creating a copy of the sequence. What Python does behind the scenes is call the `__reversed__` method, and that is it, at least for built-in sequence types.

However, the same does not apply if we want to iterate in reverse over a general iterable that doesn't support indexing. In this case we need to implement the `__reversed__` method ourselves and make it return an iterator. As for `iter()`, Python will first search for the `__reversed__` method, and if it doesn't find it, it will look for both the `__getitem__` and `__len__` methods (basically to do what we have shown before with the for loop).
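
A minimal sketch of a non-indexable iterable that supports `reversed()` through `__reversed__`. `Ladder` is a hypothetical class, and both methods delegate to `range` for brevity.

```python
class Ladder:
    """Iterable without __getitem__ that still supports reversed()."""
    def __init__(self, steps):
        self.steps = steps

    def __iter__(self):
        # top-down: steps, steps - 1, ..., 1
        return iter(range(self.steps, 0, -1))

    def __reversed__(self):
        # bottom-up: 1, 2, ..., steps
        return iter(range(1, self.steps + 1))


forward = list(Ladder(3))
backward = list(reversed(Ladder(3)))

print(forward, backward)  # [3, 2, 1] [1, 2, 3]
```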

## Caveats: using iterators as function arguments
Due to the fact that we can only iterate once over an iterator before it is exhausted, we need to be careful when using it as a function argument. 

Let's imagine that we created a class that returns an iterator over a list of numbers: now suppose we need to find the max and the min among these. If we call, for example, the `min()` function we will get the minimum value, but to get it Python had to iterate over the iterator, exhausting it! Therefore, if we now ask for `max()` we'll incur a `ValueError`, since our iterator reached the `StopIteration` condition during the `min()` call and now behaves like an empty sequence!
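
The situation can be reproduced with a plain list iterator. The sketch below also shows one possible remedy, which is to materialise the values in a list before passing them around.

```python
numbers = iter([4, 1, 7])      # an iterator, not a list

smallest = min(numbers)        # consumes the whole iterator
try:
    largest = max(numbers)     # nothing left to iterate over
except ValueError:
    largest = None

# One possible remedy: store the values once, then reuse the list
values = list(iter([4, 1, 7]))
both = (min(values), max(values))

print(smallest, largest, both)  # 1 None (1, 7)
```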

back to [TOC](#TOC)

# Generators
---
Generator functions are functions that contain at least one `yield` statement; the generators they return are essentially iterators (they implement the iterator protocol) and are inherently lazy.

Let's imagine we want to create a function that can be stopped during execution (like a for loop that is stopped at each iteration) so that we can handle the data and do something else before the function is exhausted. In Python there is a particular keyword called `yield` which emits a value and pauses the function execution; the execution can then be resumed by calling `next`. Once the function execution is finished, a `StopIteration` exception is raised.

In [None]:
# e.g.
def using_yield():
    print('Name')
    yield 'Giovanni'
    print('Surname')
    yield 'Frison'   
    
test = using_yield()
test

As we can see, `test` is not a function but a `generator object`, upon which we can call `next()` to execute the body until a `yield` statement is reached.

In [None]:
next(test)

In [None]:
next(test)

In [None]:
next(test)

As for an iterator, when the function body is exhausted, i.e. no `yield` statement remains, `StopIteration` is raised.

A function that uses the `yield` statement in its body is called a `generator function` and is essentially a `generator factory`. The `using_yield` function above is defined like a regular function, but since it contains a `yield` in the body, the Python compiler knows that it is a generator function/factory, and each time it is called a generator is created. In this way we are able to execute the function piece-wise, one `yield` statement at a time, exiting and re-entering the function in different parts of our code, by calling `next` on the generator.

The generator behaves like an iterator because it is an iterator! It has the iterator protocol implemented (`__iter__` and `__next__` methods). As a matter of fact, generators are powerful tools to create iterators in a very effective way. E.g. looking at the Factorial implementation that can be found in the [Lazy Iterables](#Lazy-Iterables) section, we can easily re-implement it using a yield statement:

In [None]:
import math

def factorials(n):
    for i in range(n):
        yield math.factorial(i)

In [None]:
for f in factorials(5):
    print(f)

## Iterables from generators
Being an iterator, a generator has the same caveat, meaning that once it is exhausted it has no use, and sometimes this can lead to unwanted bugs in our code. However, there is a solution to this: we can make an iterable from our generator simply by defining an iterable class whose `__iter__` method returns not `self` but a new generator. Let's look at an example:

In [None]:
def square_gen(n):
    for i in range(n):
        yield i ** 2

sq = square_gen(4)
for s in sq:
    print(s)

So, we created a generator factory (`square_gen`) and a generator from it (`sq`) that is now exhausted due to the for loop.

In [None]:
list(sq)

However, we can define a custom iterable ourselves (implementing the iterable protocol).

In [None]:
class Square:
    def __init__(self, n:int):
        self.n : int = n
            
    # iterable protocol
    def __iter__(self):
        # we need to return an iterator, in this case 
        # a generator created by our generator function
        return Square.square_gen(self.n) 
    
    @staticmethod # it doesn't use any of the class properties
    def square_gen(n):
        for i in range(n):
            yield i ** 2   
    

Now we can iterate over `sq` as many times as we want, since each time `__iter__` automatically returns a new generator created by the generator function `square_gen`.

In [None]:
sq = Square(4)
for s in sq:
    print(s)

In [None]:
list(sq)

## Generator expressions
Using the same syntax as list comprehensions we can create generator expressions; the only difference is that generator expressions want round brackets `()`. The advantage of generator expressions is that they are inherently `lazy`, meaning that the expressions are not evaluated until requested by the `next()` function.

In [None]:
lst = [i ** 2 for i in range(5)] # iterable
gen = (i **2 for i in range(5)) # iterator

lst, gen

Comparing list comprehensions and generator expressions we have:
* lists take longer to create since every expression has to be evaluated up front, while generators are returned immediately
* iterating over a list is faster since the elements have already been computed

Therefore, we can conclude that:
* if we need to iterate over all the elements, the time performance is almost the same, but generators occupy less space since the elements are produced one at a time and never stored all together
* if we need to iterate only over some elements then generators are the way to go, since we don't compute all the unwanted elements
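
A rough way to see the difference in space and laziness is sketched below. `sys.getsizeof` measures only the container object, and `n` is an arbitrary size chosen for the example.

```python
import sys
from itertools import islice

n = 100_000
lst = [i ** 2 for i in range(n)]     # all values computed and stored now
gen = (i ** 2 for i in range(n))     # nothing computed yet

list_size = sys.getsizeof(lst)       # grows with n
gen_size = sys.getsizeof(gen)        # small and independent of n

# Taking only the first three elements computes only three squares
first_three = list(islice(gen, 3))

print(gen_size < list_size, first_three)  # True [0, 1, 4]
```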

## Yield from
In its easiest application, `yield from` is just a way to replace a for loop over an iterable inside a generator function. Let's take for example:

In [None]:
def loop_on_nested_generator(n):
    gen = ((i*j for i in range(1,n+1)) for j in range(1,n+1))
    for row in gen:
        for item in row:
            yield item

In [None]:
loop = loop_on_nested_generator(2)
for l in loop:
    print(l)

Instead we can substitute the last for loop with a `yield from` and achieve the same result.

In [None]:
def yield_from_nested_generator(n):
    gen = ((i*j for i in range(1,n+1)) for j in range(1,n+1))
    for row in gen:
        yield from row

In [None]:
loop = yield_from_nested_generator(2)
for l in loop:
    print(l)

back to [TOC](#TOC)

# Sequence Type
---
In math terms a sequence is nothing more than a countable group of items that have a positional ordering, meaning that each element can be accessed by an index representing its position. In Python, a `list` is a sequence type while a `set` is not, since it doesn't have a positional order.
A sequence can be `mutable` (lists, bytearrays) or `immutable` (strings, tuples, ranges, bytes). 
Again, a sequence can be `homogeneous`, if it holds elements of the same type (like strings), or `heterogeneous` (lists). 
A sequence is also an `iterable type` since we can reach each element one by one, hence iterating over the sequence (n.b. an iterable is not always a sequence, a set is an example).

Common operations on sequences are:
* `in` and `not in`
* `+` for concatenation (not for the range type)
* `* int` for repetition (not for the range type)
* `len()` to retrieve the length of the sequence
* `index(x)` to retrieve the index of the first occurrence of element x
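
These operations can be tried on any built-in sequence. The values below are arbitrary.

```python
s = [10, 20, 30, 20]
t = (1, 2)

membership = (20 in s, 99 not in s)   # (True, True)
concatenated = t + (3,)               # (1, 2, 3)
repeated = t * 2                      # (1, 2, 1, 2)
length = len(s)                       # 4
first_index = s.index(20)             # 1: position of the first occurrence

# range is a sequence too, but it supports neither + nor *
try:
    range(3) + range(3)
    range_concat_ok = True
except TypeError:
    range_concat_ok = False
```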



## Mutating sequence

By mutation in Python we mean a change in the internal state of a mutable object that does not modify its memory address. For example, list concatenation is not a mutation since a new object is created, while using the method `append` is:

In [None]:
# concatenation
l = ['Giovanni']
l_id = id(l)
l = l + ['Frison']
id(l) == l_id

In [None]:
# appending
l = ['Giovanni']
l_id = id(l)
l.append('Frison')
id(l) == l_id

Other mutations can be achieved with:
* index (or slice) assignment -> `s[i] = x`
* deletion of elements -> `del s[i]`
* removing all the objects in the container -> `s.clear()`
* inserting elements -> `s.insert(i, val)`
* extending with another iterable -> `s.extend(iterable)`
* pop (return and remove the element at index i) -> `s.pop(i)`
* removing the first occurrence of x -> `s.remove(x)`
* reversing in-place -> `s.reverse()`
* and many more ..

N.B. not all sequence types must have these methods, in particular if custom made by us.

### in-place concatenation and repetition
Sequences can be concatenated or repeated using `+` or `*`. If we do it in the standard way we obtain a new object:

In [None]:
l1 = [1,2,3]
l2 = [4,5,6]
l1_prev_id = id(l1) 
l1 = l1 + l2

id(l1) == l1_prev_id

However, if the sequence is mutable and we use in-place concatenation `+=` or repetition `*=`, we are mutating the object, therefore the `id` remains the same. If the sequence is immutable, like a tuple, a new object will be created anyway.

In [None]:
l1 = [1,2,3]
l2 = [4,5,6]
l1_prev_id = id(l1) 
l1 += l2

id(l1) == l1_prev_id
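
For comparison, a minimal sketch of the immutable case: with a tuple, `+=` rebinds the name to a new object and leaves the original untouched.

```python
t = (1, 2, 3)
alias = t                # second name for the same tuple
t += (4, 5)              # tuples are immutable: a new tuple is bound to t

same_object = t is alias

print(same_object, alias, t)  # False (1, 2, 3) (1, 2, 3, 4, 5)
```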

### Mutation by assignment
Some mutable sequences, like lists, support assignment via index, meaning that we can replace elements by assigning new values to the list. This also works with slices, provided that the substituting value is an iterable, and it doesn't even need to be of the same length as the slice we are replacing! We can also replace stepwise (extended) slices, but in that case the length of the iterable must match the number of elements selected. In the same way we can delete elements just by replacing a slice with an empty list. Finally, we can also insert an iterable inside a sequence by assigning it to an empty slice, e.g. `l[1:1]`: the slice selects no element, so nothing is replaced and the new elements are inserted at that index.

In [None]:
# replacing a slice with as many elements as we want
l = [1,2,3,4,5,6,7,8,9,10]
l[:2] = ['a', 'b', 'c', 'd'] # place 4 elements instead of the first 2
l

In [None]:
# replacing a stepwise slice with the same number of element selected
l = [1,2,3,4,5,6,7,8,9,10]
l[::2] = ['a', 'b', 'c', 'd', 'e'] # 5 element change with 5 element
l

In [None]:
# delete elements
l = [1,2,3,4,5,6,7,8,9,10]
l[:5] = []
l

In [None]:
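# insert elements
l = [1,2,3,4,5,6,7,8,9,10]
l[1:1] = [] # the slice l[1:1] is empty, so this line changes nothing
l[1:1] = [1,2,3] # assigning to the empty slice inserts at index 1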
l

### Never return a mutated object
If we write a plain function it is best practice not to modify the object we are passing as argument but to return a modified copy of it. If the function does mutate its argument, it should not return it as well. So-called `in-place methods` are generally bound to classes.

In [None]:
# what we should do
def reverse(s):
    s.reverse()

# what we shouldn't do
def reverse(s):
    s.reverse()
    return s

s1 = [1,2,3]
s2 = reverse(s1)

s1, s2

## Copying Sequences

While copying immutable sequences is in general a safe procedure, the same cannot be said about mutable ones. A trivial example comes from concatenation and repetition: they copy the references to the elements, not the elements themselves, so if the elements are mutable and we then modify one of them, the change shows up in every place that references it.
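
A classic instance of this pitfall is building a nested list with repetition. The sketch below uses arbitrary values and shows a comprehension as one possible way around it.

```python
grid = [[0, 0]] * 3                  # three references to the same inner list
grid[0][0] = 1                       # the change is visible in every "row"

safe = [[0, 0] for _ in range(3)]    # three independent inner lists
safe[0][0] = 1

print(grid)  # [[1, 0], [1, 0], [1, 0]]
print(safe)  # [[1, 0], [0, 0], [0, 0]]
```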

### Shallow copies
In the previous section we created a function that applies an in-place method to its argument `s` and then returns `s`. The user may then assign the result of the call to a variable `s2`, thinking that a new object was created, while `s2` actually points to the same object as `s1`, which has been modified as well. What we should do is **not** return the mutated argument.

Even better would be not to modify our objects in place at all, but to create a copy first and then pass it to the function that modifies it. To copy a sequence, or any object, there are a variety of ways, some more pythonic than others:

In [None]:
# Ways of copying a list: each creates a new object but keeps the same
# memory references to the elements inside the list

s = [1, 2, 3]

# 1. simple loop (horrible)
cp = []
for i in s:
    cp.append(i)
    
# 2. list comprehension
cp = [i for i in s]

# 3. copy method (not available for immutable sequences like tuples or strings)
cp = s.copy()

# 4. slicing (with a tuple the same object is returned)
cp = s[:len(s)]

# 5. list constructor (with tuple() on a tuple the same object is returned)
cp = list(s)


# therefore we have
s == cp,  s is cp,  [s[i] is cp[i] for i in range(len(s))]


What we have performed are called `shallow copies`, i.e. we have created a copy only of the sequence object, but the elements inside have the same memory references. If these elements are immutable objects then the copy is safe, meaning that we can modify it without affecting the source. However, if the sequence contains mutable objects then a shallow copy may not be enough. Let's see an example:

In [None]:
l = [[1, 2], [3, 4]]
l

Now `l` is a sequence that contains 2 mutable objects. We can create a copy, try to substitute one of its elements and see what happens:

In [None]:
cp = l.copy()
cp[0] = 'python'
cp is l, cp, l

The copy `cp` is actually a new object, and when we say `cp[0] = 'python'` we are mutating this object, without affecting the original list `l`. But what happens if we try to modify the mutable objects inside `cp` (which share the same memory references with the ones contained in `l`)?

In [None]:
cp = l.copy()
cp[0][0] = 100
cp, l

As we can see, the inner lists in both `cp` and `l` have been modified; this is because `l.copy()` is a shallow copy and affects only the outer object. As a matter of fact:

In [None]:
cp[0] is l[0], cp[1] is l[1]

### Deep Copies

To perform a copy at the deepest level of an object, a recursive approach, able to handle circular references, is needed. Python's standard library has the module `copy` to carry out a deep copy.

In [None]:
from copy import deepcopy
cp = deepcopy(l)
l, cp

In [None]:
cp[0] is l[0], cp[1] is l[1]

The deepcopy is intelligent enough to retain the relationships between references after the deep copy. Let's see an example:

In [None]:
class MyClass:
    def __init__(self, a):
        self.a = a
        
x = MyClass(100)
y = MyClass(x)
lst = [x, y]

x is y.a

Now `lst` is a sequence that contains two elements, `x` and `y`, where `y` has an attribute `y.a` that points to `x` (not a circular reference). Now if we perform a deepcopy, Python will create new objects for each element, but will retain the relationship between `y.a` and `x`.

In [None]:
cp = deepcopy(lst)
# now even if we have different objects
x is cp[0], y is cp[1]

In [None]:
# the relationship is retained
cp[0] is cp[1].a # same of x is y.a

## Slicing

Slicing is an operation that works with indexing, therefore it is applicable only to sequence type objects. A `slice` is an object of type `slice` that can also be created, and assigned to a variable, with the built-in `slice()`.
Slicing a list always returns a new object (while, as noted above for copies, slicing a whole tuple may return the same object).

The easiest way to slice is to specify the start and stop:
* `l[i:j]` with i included and j excluded

Slices are independent of the sequence they are slicing, therefore even if the stop of the slice is out of bounds for the sequence, Python won't throw an error but will slice up to the end of the sequence.

In [None]:
l = [1,2,3,4,5]
l[0:3], l[2:100]

It is also possible to specify a third argument for the slice, the step (or stride), which defaults to 1. Moreover, if the stop argument is not given, Python automatically takes the `len` of the sequence as the stopping point; likewise, an omitted start defaults to 0.

In [None]:
l[::2] # start:stop:step

As a consequence, it is easy to reverse a list with slicing:

In [None]:
l[::-1]

In general, the arguments of the slice can be greater than the length of the sequence or even negative. To understand which range we are actually taking into account we need to remember the following rules:

given `l[i:j:k]`:

* if `k > 0`:
    * if `i, j > len(seq)` -> `len(seq)`
    * if `i, j < 0` -> `max(0, len(seq) + i)` (same for `j`)
    * if `i` is omitted or `None` -> `0`
    * if `j` is omitted or `None` -> `len(seq)`
* if `k < 0`:
    * if `i, j > len(seq)` -> `len(seq) - 1`
    * if `i, j < 0` -> `max(-1, len(seq) + i)` (same for `j`)
    * if `i` is omitted or `None` -> `len(seq) - 1`
    * if `j` is omitted or `None` -> `-1`

If in the end we are not sure of the slices we are taking, or we don't remember these rules, we can create a slice object with the `i,j,k` needed and then use the method `indices()`, passing the length of the sequence: it returns the indices `i,j,k` of the equivalent `range(i,j,k)`.

In [None]:
slice(0,6,2).indices(len(l))

In [None]:
# which is equivalent to
l, list(map(lambda x: l[x], range(0, 5, 2)))
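
The rules listed above can be checked with `indices()` on a few out-of-bound and negative cases. The slices below are arbitrary examples on a list of length 5.

```python
l = [1, 2, 3, 4, 5]

cases = {
    'clamped stop': slice(2, 100),
    'negative start': slice(-2, None),
    'reverse all': slice(None, None, -1),
    'reverse clamped': slice(100, -100, -1),
}

resolved = {name: s.indices(len(l)) for name, s in cases.items()}
print(resolved)
# {'clamped stop': (2, 5, 1), 'negative start': (3, 5, 1),
#  'reverse all': (4, -1, -1), 'reverse clamped': (4, -1, -1)}

# Each slice selects exactly the elements of the equivalent range
for name, s in cases.items():
    assert l[s] == [l[i] for i in range(*resolved[name])]
```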

## Custom Sequences
An essential feature that has to be implemented in a custom sequence object is the `__getitem__` method, since it is the one that makes it possible to index the sequence and, through it, to iterate over it, enabling all sorts of iterations, from list comprehensions to for loops. The `__getitem__` method should be coded in order to handle positive and negative integers as well as slice objects. A possible implementation could be the following:

In [None]:
class MySequence(object):
        
    def __init__(self, length):
        if isinstance(length, int):
            self.length = length
            self.sequence = [i for i in range(length)]
        else:
            raise TypeError('Length must be an integer.')
            
    
    def __repr__(self):
        return f'{self.__class__.__name__}(length={self.length})'
        
    
    def __getitem__(self, index):
        if isinstance(index, int):
            if index < 0: # handle negative index
                index = self.length + index
            if index < 0 or index >= self.length: # handle out of bound index
                raise IndexError
            else:
                return self.sequence[index]

        elif isinstance(index, slice):
            start, stop, step = index.indices(self.length)
            rng = range(start, stop, step)
            return [self.sequence[i] for i in rng]

        else:
            raise TypeError

In [None]:
seq = MySequence(10)
list(seq), seq[0], seq[-1], seq[1:3],

### Mutation in custom sequences
We have seen that mutating a sequence means changing it without creating a new object. Concatenation and repetition normally create a new object, but they can also happen in place, which is a mutation. To add these capabilities to our custom sequence we need to overload the operators `+`, `+=`, `*`, `*=` by implementing the methods `__add__`, `__iadd__`, `__mul__` and `__imul__`.

Other common methods that we could implement in our sequence are:
* `__setitem__` as the complement of `__getitem__`
* `__contains__` to add the `in` functionality to check if an element is in the sequence
* `__delitem__` to delete elements with the keyword `del`
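The special methods listed above can be sketched as follows. `MiniSeq` is a hypothetical class written only for illustration (it is not the `MySequence` class above), and it delegates most of the work to an internal list:

```python
class MiniSeq:
    """Hypothetical minimal sequence used only to illustrate the special methods."""

    def __init__(self, items=()):
        self._items = list(items)

    def __repr__(self):
        return f'MiniSeq({self._items})'

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]      # int and slice handling delegated to list

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __contains__(self, value):
        return value in self._items

    def __add__(self, other):          # seq + other -> new object
        return MiniSeq(self._items + list(other))

    def __iadd__(self, other):         # seq += other -> same object, mutated
        self._items.extend(other)
        return self

    def __mul__(self, n):              # seq * n -> new object
        return MiniSeq(self._items * n)

    def __imul__(self, n):             # seq *= n -> same object, mutated
        self._items *= n
        return self


a = MiniSeq([1, 2])
original_id = id(a)
b = a + [3]     # concatenation creates a new MiniSeq
a += [3, 4]     # in-place concatenation keeps the same object
a *= 2          # in-place repetition
del a[0]        # uses __delitem__
a[0] = 20       # uses __setitem__
print(a, b, 20 in a, id(a) == original_id)
```

Note that the in-place methods must return `self`: the value they return is what gets bound to the name on the left of `+=` or `*=`.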

## Sorting sequences
Python has a built-in sorting function called `sorted()`, with ascending order by default and an optional parameter `reverse` (default `False`). When talking about ordering we have to consider that, except for numbers, it is not trivial to define an ordering criterion for every type of object. An easy example is strings, which can be ordered lexicographically, but what about lower and upper case? In Python, characters are compared by their code point, so upper-case letters come before lower-case ones, i.e. `'a' > 'A'` (the code of a character can be retrieved with `ord(char)`).

In some cases, when the elements are not directly comparable because they don't have a natural ordering, we need to create a `sorting key`, i.e. a rule that tells Python which order to consider:

In [None]:
l = [1,'a', 'x', 'Z', '?', 100, 'A']
sorted(l) # no key is provided, so Python tries the natural order and raises a TypeError (int and str are not comparable)

In [None]:
l_func = lambda x: ord(str(x)) if isinstance(x, str) else x 
sorted(l, key=l_func)

`sorted()` returns a new list with the sorted elements of the iterable, using the TimSort algorithm (after Tim Peters, also the author of the Zen of Python shown by `import this`). It is also a `stable sort`, meaning that if two elements compare equal (with or without a key), the one that appears first before sorting will also be first after sorting. List objects have a `sort()` method that instead sorts in place. Both use the same algorithm, but `sorted()` has a greater overhead since it has to create a copy of the list.
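Stability and the difference between `sorted()` and `list.sort()` can be seen in a small sketch (the word list is made up for the example):

```python
words = ['banana', 'Apple', 'cherry', 'apple', 'Banana']

# Case-insensitive key: 'Apple'/'apple' and 'banana'/'Banana' compare equal,
# so a stable sort keeps each pair in its original relative order.
by_lower = sorted(words, key=str.lower)

# sorted() returns a new list; list.sort() sorts in place and returns None
copy = list(words)
result = copy.sort(key=str.lower)

print(by_lower)
print(result, copy == by_lower)
```

Here `'Apple'` stays before `'apple'` and `'banana'` before `'Banana'` because that is the order in which they appear in the original list.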

## Zero-based Index

Why are Python sequences indexed starting from `0` and not `1`?

Essentially because:
* we want to describe a range of indices using `range(l, u)` with `l <= n < u`
* in this way the number of elements in the range is exactly `u - l`, so with `l = 0` the length of the sequence is precisely the upper bound `u`
* if we want to know how many elements precede the element `s[i]`, in a zero-based system it is exactly `i`
* the only strange behavior is that the last index of a sequence is `len(s) - 1`

## Application - `Polygon`
Link to [Polygon_Class](Polygon_Class.ipynb)

## Lists vs Tuples

Let's look at the difference between lists and tuples (as an immutable data structure), and in particular at why tuples are more efficient and should be used instead of lists when mutability is not required.

To do this, we first have to define what `constant folding` is: the process of recognizing and evaluating constant expressions at compile time rather than computing them at run time.

To look at the different compilation of lists and tuples we'll make use of the `dis` module, which essentially disassembles the bytecode produced by the Python compiler.

In [None]:
from dis import dis

# let's compile a list and a tuple and disassemble the resulting bytecode
# (dis prints the disassembly and returns None, so there is nothing to assign)
dis(compile("[1,2,3,'a']", 'string', 'eval'))
dis(compile("(1,2,3,'a')", 'string', 'eval'))

We can see a big difference in compilation between tuples and lists: the tuple is loaded as a single constant representing all its elements, while the list has to be built at run time from its elements (the exact bytecode depends on the Python version). In this case both containers hold immutable elements, but if the tuple contains a mutable object (e.g. a list) the compilation advantage is lost, since Python first has to build the list and only afterwards the tuple.

In [None]:
dis(compile("(1,2,3,['a'])", 'string', 'eval'))

### Copying

There is a difference in copying a list and a tuple, since one is mutable and the other is not. If, for example, we make a shallow copy of a list, a new object is created; with a tuple, instead, the same object is returned, since it doesn't make sense for Python to keep two identical immutable objects.

In [None]:
l1 = [1,2,3,4]
t1 = (1,2,3,4)
l2 = list(l1) # shallow copy
t2 = tuple(t1)

l1 is l2, t1 is t2

### Storing Efficiency

From Python 3.8 there is not much difference in storage efficiency between a tuple and a list if the size of the final object is known in advance. Therefore there is little difference in storage when doing:

In [None]:
import sys

l = list(range(100))
t = tuple(range(100))

sys.getsizeof(l), sys.getsizeof(t)

However, if the final size of the sequence is unknown, i.e. we append elements to the list one at a time, the list over-allocates extra memory each time its current capacity is filled:

In [None]:
l = list()
size_prev = sys.getsizeof(l) # to catch the overhead of list creation

for i in range(10):
    l.append(i)
    size_l = sys.getsizeof(l)
    delta, size_prev = size_l - size_prev, size_l
    print(f' n° item: {i+1}, list size: {size_l}, delta:{delta}')

back to [TOC](#TOC)

# Iteration Tools - The itertools module
---
The itertools module is a collection of lazy (i.e. memory-efficient) iterator functions that can be very useful in different situations. In this section we will look at several functions from the itertools module, together with some related built-ins.

## Aggregators
Aggregators are functions that iterate over an iterable and return a summary of its content as a single value. Examples are the functions `max()`, `min()`, `sum()`, etc.

The functions `any()` and `all()` are useful aggregators that look at the truth values of the elements of an iterable. `any()` returns `True` if at least one element evaluates to `True`, while `all()` returns `True` only if all the elements evaluate to `True`.

N.B. remember that in Python every object has an associated truth value, which is `True` by default. Only elements like `0`, `''`, `None`, `[]` evaluate to `False`. The truth value can also be defined in a custom class by implementing a specific rule in the `__bool__` method. If `__bool__` is not defined, Python looks for the `__len__` method, where a length of 0 evaluates to `False`. If neither is defined, the custom object evaluates to `True` by default.
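The lookup order described in the note can be sketched with three hypothetical classes (the names are made up for the example):

```python
class WithBool:
    """Truth value defined explicitly through __bool__."""
    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value > 0


class WithLen:
    """No __bool__: Python falls back on __len__ (0 is falsy)."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


class Plain:
    """Neither __bool__ nor __len__: instances are always truthy."""


objects = [WithBool(1), WithLen(0), Plain()]
print([bool(o) for o in objects], any(objects), all(objects))
```

Since `WithLen(0)` is falsy, `all(objects)` is `False` while `any(objects)` is `True`.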

A `predicate` is a function that takes a single argument and returns `True` or `False`, like `bool()`, and can be used in conjunction with `all()` or `any()`. Let's say, for example, that we want to check whether all the elements in a list are greater than or equal to 0. We could iterate over the whole list ourselves, but a smarter way is to apply the predicate lazily, with a generator expression, and then check the result with `all()`. In this way, if for example there is a negative element near the beginning of the list, `all()` stops at that element and the program won't waste resources iterating over the rest of the list, saving memory and time.

In [None]:
lst = [0, -1, 10, 5, -3, 6]
gen = (l >= 0 for l in lst)
all(gen)

## iSlicing
Slicing is the operation of cutting a sequence-type object in different ways. The classical notation is composed of up to three terms, `[i:j:k]`, respectively the start, stop and step parameters. However, with classical slicing it is not possible to slice general iterables such as iterators and generators. To do this, we have the `itertools.islice` function.

Of course, `islice` is a lazy iterator: it iterates over an iterable and returns a lazily evaluated slice.

In [None]:
from itertools import islice
result = islice(lst, 0, 3)
result, list(result)

Since the result of `islice` is an iterator, once we have turned it into a list it is exhausted, and we won't be able to use it again. As a matter of fact, calling `list` on it again results in an empty list.

In [None]:
list(result)
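The example above slices a list, which would also support the classical notation. The sketch below applies `islice` to something that cannot be sliced otherwise; `naturals` is a hypothetical infinite generator written for the example:

```python
from itertools import islice


def naturals():
    """Hypothetical infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


gen = naturals()
# gen[2:10:3] would raise a TypeError: generators do not support indexing
first = list(islice(gen, 2, 10, 3))   # elements at positions 2, 5 and 8
print(first)
```

Keep in mind that `islice` has to consume the skipped elements of the underlying generator, so `gen` has advanced after the call.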

## Selecting and Filtering

Python has a built-in function `filter` that takes a predicate and an iterable and returns an iterator. What it does under the hood is essentially equivalent to a generator expression that loops over the iterable, keeping the elements for which the predicate is true.

In [None]:
lst = [1, 2, 3, 4, 5]
list((l for l in lst if l>3)), list(filter(lambda x: x>3, lst))

As expected, the two approaches are equivalent: both produce an iterator that is exhausted once iterated.

The itertools module offers a complementary lazy function, `filterfalse`, which keeps the elements for which the predicate is false, i.e. it filters on the negation of the predicate function.

In [None]:
from itertools import filterfalse
lst = [0, '', 'ciao', 1]
list(filterfalse(None, lst)) 
# N.B. if None is supplied as predicate python will look at the truth values of each element

Another useful itertools function for selecting and filtering is `compress`, which takes two iterables, one with the data to be selected and one with a series of selectors (values that evaluate to True or False), and pairs the two. The result is a lazy iterator that contains only the elements of the first iterable whose position corresponds to a selector that evaluated to True in the second one. An example is worth a thousand words:

In [None]:
from itertools import compress
lst = [1, 2, 3, 4 ,5, 6]
test = [True, False, None, 0, 1]
list(compress(lst, test))
# N.B. compress stops at the shorter of the two iterables: since lst has more elements than test, the remaining ones are discarded

The `takewhile` function takes a predicate and an iterable as arguments and returns an iterator that yields values as long as the predicate evaluates to `True`; as soon as a `False` evaluation is encountered, the iterator is exhausted.

In [None]:
from itertools import takewhile
lst = [1,2,3,5,2,1]
list(takewhile(lambda x: x<5, lst))

As expected, when `takewhile` encountered an element for which the predicate evaluated to `False` (5<5), it stopped yielding, even if the remaining elements would evaluate to `True`.

Similarly, `dropwhile` does the opposite: it skips the elements as long as the predicate evaluates to `True` and starts yielding values from the iterable as soon as the predicate evaluates to `False` for the first time; from then on every remaining element is yielded:

In [None]:
from itertools import dropwhile
lst = [1,2,3,5,2,1]
list(dropwhile(lambda x: x<5, lst))

## Infinite iterators
From the itertools module we also have a set of infinite iterators that can come in handy. `count`, for example, is a function similar to `range`, since we can define a start and a step, but it has no stop parameter. Moreover, start and step can be of any numeric type.

In [None]:
from itertools import count, takewhile
list(takewhile(lambda x: x < 11,count(10, 0.3)))

The `cycle` function lets us iterate over an iterable or an iterator (yes, also an iterator) over and over.

In [None]:
from itertools import cycle
lst = [1, 2, 3, 4]
match  = ['a', 'b']
for i, j in zip(lst, cycle(match)):
    print(i, j)

`repeat` instead simply yields the same element indefinitely, or a defined number of times if specified. N.B. the element that is repeated is actually always the same object!

In [None]:
from itertools import repeat
ciao = repeat('ciao')
for _ in range(3):
    print(next(ciao))
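The fact that `repeat` always yields the same object matters when that object is mutable. A minimal sketch (variable names are illustrative):

```python
from itertools import repeat

shared = list(repeat([], 3))            # three references to ONE list
shared[0].append('x')                   # the change is visible through every reference

independent = [[] for _ in range(3)]    # three distinct lists
independent[0].append('x')

print(shared, independent)
```

With the list comprehension each element is a new list, so only the first one is modified.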

## Chaining and Teeing

Chaining is essentially the operation of concatenating multiple iterables together, something that we can easily do with the `+` sign for sequences of the same type. What `itertools.chain` adds is the ability to concatenate iterables lazily. `chain(*args)` takes a variable number of arguments that can be iterables or iterators. However, there is a caveat: imagine we have an iterable `l` containing 3 iterators that we want to chain. If we pass `l` with unpacking, we lose the laziness, since unpacking with `*` is an eager procedure (it requires Python to iterate over the whole object being unpacked, and if that object is an iterator it will be exhausted). To pass an iterable of iterables directly there is a dedicated alternative constructor: `itertools.chain.from_iterable`, another lazy iterator!

In [None]:
from itertools import chain

l1 = (i**2 for i in range(3))
l2 = (i**3 for i in range(3))
l3 = (i**4 for i in range(3))

list(chain(l1,l2,l3))

If we try to use an iterable of iterators:

In [None]:
l = [(i**j for i in range(3)) for j in range(2,5)]
list(chain(l))

What we get back is a list of the generators created inside `l` and not their chaining.

Instead if we use the `.from_iterable` method:

In [None]:
list(chain.from_iterable(l))

Since iterators are one-time-use, it can be beneficial to be able to create copies if we need to use them more than once in our code. The simplest way would be to use a for loop to populate, let's say, an empty list with an arbitrary number of calls to our generator function; however, Python has a smarter way to do it. The operation is called "teeing" and is performed by the `itertools.tee` function, which takes 2 arguments: an iterable/iterator and the number of copies we want (2 by default). The copies are independent objects with different memory addresses.

N.B. what comes back from `tee` is always a tuple of iterators, even if the object passed was an iterable!

In [None]:
from itertools import tee

def square(n):
    for i in range(n):
        yield i**2
        
single_iterator = square(5)

multiple_iterators = tee(single_iterator, 5)

multiple_iterators
# each object in the tuple has a different memory address!
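A small sketch of the independence of the copies (variable names are illustrative). Note that, as a general rule, the original iterator should not be used after teeing it, since the copies consume it:

```python
from itertools import tee

source = (i ** 2 for i in range(4))
first, second = tee(source, 2)

a = list(first)            # consuming one copy ...
b = list(second)           # ... does not exhaust the other
leftover = list(source)    # the original has been consumed through the copies

print(a, b, leftover)
```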

## Mapping and Accumulation
Mapping is essentially the application of a callable (a function) to every element of an iterable, something that can be easily achieved with the `map` function (which returns a lazy iterator), a list comprehension or a generator expression.

Of course, itertools has its own mapping function. `starmap` is similar to `map`, but it unpacks every sub-element of the iterable and passes its items as separate arguments to the function. Therefore we can pass to `starmap` a function that takes multiple arguments, as many as the number of elements in each nested iterable. As for every function in the itertools module, what `starmap` returns is a lazy iterator.

In [None]:
from itertools import starmap
l = [[1,2,3], [3,4,5]]
list(starmap(lambda x, y, z: x + y + z, l))


Accumulation is a process that reduces an iterable to a single value, e.g. the `sum` function or, more generally, the `functools.reduce` function (which is not lazy: it consumes the whole iterable and returns a single value). `reduce` has the advantage of taking an arbitrary two-argument function that is applied cumulatively to the elements of the iterable, and it also accepts an initializer (see [Partial functions](#Partial-functions)).

Itertools has a function similar to `reduce` called `accumulate`, which takes an iterable and, optionally, a function as arguments (note the argument order, which is the opposite of `reduce`; an initial value can be passed with the `initial` keyword from Python 3.8) and lazily returns each intermediate result of the accumulation process.

In [None]:
from itertools import accumulate
from functools import reduce
import operator

lst = [i for i in range(1,5)]

reduce(operator.mul, lst), list(accumulate(lst, operator.mul))
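For comparison, a short sketch of a starting value in both functions. It assumes Python 3.8 or later, where `accumulate` accepts the keyword-only `initial` argument; the numbers are made up:

```python
from functools import reduce
from itertools import accumulate
import operator

values = [1, 2, 3, 4]

# reduce: the function comes first, the initializer is the third positional argument
total = reduce(operator.add, values, 100)

# accumulate: the iterable comes first, the starting value is keyword-only
running = list(accumulate(values, operator.add, initial=100))

print(total, running)
```

The last element produced by `accumulate` is the same value returned by `reduce`, and with `initial` the output has one more element than the input.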

## Zipping
The classic built-in `zip` function takes an arbitrary number of iterables and returns a lazy iterator that produces tuples. If the iterables passed have different lengths, the shortest determines the length of the resulting iterator.

In [None]:
# the shortest iterable determines the iterator length
list(zip([1,2], ['a', 'b', 'c'], [10, 20, 30 , 40]))

However, we may want to zip on the longest iterable, filling the holes with a predetermined value. To do this we have `itertools.zip_longest`, which also takes a variable number of iterables but lets us specify a `fillvalue` (`None` by default) that serves as a placeholder for the shorter iterables.

In [None]:
from itertools import zip_longest
list(zip_longest([1,2], ['a', 'b', 'c'], [10, 20, 30 , 40], fillvalue='Filled'))

## Grouping
Sometimes, while iterating over an iterable, let's say a list of tuples, we may need to group the elements based on a specific pattern or key. To do this there is the function `itertools.groupby`, which takes an iterable and a key function and returns a lazy iterator of `(key, group)` pairs, where each group is itself a lazy iterator.

In [None]:
from itertools import groupby

iterable = [(1, 10, 100), (1, 11, 101), (1, 12, 102),
            (2, 20, 200), (2, 21, 201),
            (3, 30, 300), (3, 31, 301), (3, 32, 302)]

groups = groupby(iterable, lambda x: x[0]) # key function grouping based on the first element of the tuples
for group in groups:
    print(f'key is {group[0]}')
    print(f'resulting in group:\n{list(group[1])}', end='\n\n')

Calling next on the sub-iterator `group[1]` actually consumes the original iterator built on the iterable passed as argument to `groupby`. However, if we decide to skip iterating over, let's say, the second group of elements, when we advance to the third group Python automatically iterates over the second one as well, in order to return the elements of the correct group.

N.B. `groupby` creates groups of consecutive elements that have the same key; it doesn't sort the iterable first. Therefore, depending on the need, it might be necessary to pre-sort the iterable before passing it to `groupby`.
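The effect of pre-sorting can be sketched as follows (the data and the `first_letter` key function are made up for the example):

```python
from itertools import groupby

data = ['apple', 'banana', 'avocado', 'blueberry', 'cherry']


def first_letter(word):
    return word[0]


# Without sorting, equal keys that are not adjacent end up in separate groups
unsorted_groups = [(k, list(g)) for k, g in groupby(data, first_letter)]

# Sorting by the same key first brings equal keys together
sorted_groups = [(k, list(g))
                 for k, g in groupby(sorted(data, key=first_letter), first_letter)]

print(unsorted_groups)
print(sorted_groups)
```

Each group is turned into a list inside the comprehension, i.e. before `groupby` advances to the next key.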

### Caveat: lazy iterators in I/O operation
Imagine we want to read a CSV file that contains a list of products (brand, product) in rows; we want to group this data by brand and we opt to use the `itertools.groupby` function. We can imagine structuring something like this:

```python
from itertools import groupby

with open('my_file.csv') as f:
    grouped = groupby(f, lambda line: line.split(',')[0])  # group by the first column (the brand)
```

Now we would surely expect to be able to look at the grouped iterator by, for example, casting it to a list:

```python
list(grouped)
```

However, what we get in return is a `ValueError: I/O operation on closed file.`. This is because `groupby` is a lazy iterator and therefore it won't actually evaluate its content until requested, which in our case happens outside the `with` statement, i.e. when the file `my_file.csv` has already been closed.
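One way around the problem is to consume the groups while the file is still open. The sketch below writes a small throwaway CSV (its content and the `brand` helper are made up for the example) so that it can run on its own:

```python
import os
import tempfile
from itertools import groupby

# Build a small temporary CSV file (hypothetical content)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('acme,anvil\nacme,rocket\nglobex,widget\n')
    path = tmp.name


def brand(line):
    return line.split(',')[0]


with open(path) as f:
    # materialize every group inside the with block, while the file is open
    grouped = {key: [line.strip() for line in lines]
               for key, lines in groupby(f, brand)}

os.remove(path)
print(grouped)
```

The price is that the grouped data now lives in memory; if the file is large, the alternative is to do all the processing inside the `with` block.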

## Combinatorics
The itertools module also has some useful functions related to combinatorics; as a matter of fact we have `permutations`, `combinations` and the cartesian `product` of multiple iterables, all returning lazy iterators.

### Cartesian Product
The cartesian product is the set of all the possible tuples obtained by taking one element from each of 2 or more sets (which don't need to have the same length). In 2 dimensions it is something that we do pretty often, and it can be achieved easily with a nested for loop, but when the number of dimensions increases it can become messy. To easily handle an `n-dimensional` cartesian product we can use `itertools.product`, which takes an arbitrary number of iterables and lazily returns their cartesian product.

In [None]:
from itertools import product

l1 = [1,2,3]
l2 = [1,2,3]

list(product(l1, l2))

### Permutations
Mathematically speaking, simple permutations are all the possible orderings (without repetition, i.e. assuming no duplicate elements are present in the set) of all the elements in a set. Given a set of size `n`, the total number of permutations is `n!`. To compute permutations in Python we can rely on the `itertools.permutations` function, which takes as arguments an iterable and, optionally, the length of the permutation. There is a caveat though: all the elements in an iterable, say a list, are treated as distinct even if they have the same value. This means that, in case of duplicated values in an iterable, the permutations will contain apparent repetitions; in reality they are not repetitions, because the underlying elements have switched position, generating *de facto* a new permutation. **Elements are unique based on their position, not their value!**

In [None]:
from itertools import permutations

string = 'abc'

list(permutations(string))

But be careful: uniqueness is given by position, not by value; therefore a duplicated value in a different position counts as a different element, not as a duplicate.

In [None]:
string = 'aba'

list(permutations(string))

### Combinations
Unlike permutations, combinations don't care about the order of the elements (i.e. `'ab'` and `'ba'` are the same combination). Combinations can be defined `without replacement`, meaning that once an element is picked from a set it cannot be picked again, and `with replacement`. To compute combinations in Python we can use `itertools.combinations` or `itertools.combinations_with_replacement`; both functions take an iterable and the length `r` of the combination.

In [None]:
from itertools import combinations

l1 = [1,2,3]

list(combinations(l1, r=2))

In [None]:
from itertools import combinations_with_replacement

l1 = [1,2,3]

list(combinations_with_replacement(l1, r=2))

back to [TOC](#TOC)

# Context Managers PEP 343
----
The definition of 'Context' in Python is `the state surrounding a section of code`; we can think of it as the `scope` in which the code references its variables while running. A classical example is the `with` statement used, for example, to open a file: in this case we create a `context` that is specifically designed to handle the file, we `enter` the context by opening the file, we `work` in the context while manipulating the file and we `exit` the context by closing the file (leaving the 'with' indentation).

```python
with open('file.txt', 'r') as f:  # Entering the context
    print(f.readlines()) # Working in the context
print('We are out of the context') # Exiting the context
```

We could have used a `try-finally` statement to make sure the file is closed even if an error occurs inside the 'file-handling' context, but this is a much cleaner approach.

We can summarize a context manager as a way to manage data in our scope by providing on-entry/on-exit functionality (opening and closing the file in the example above). Other classical examples are querying a database, setting a tolerance and resetting it back, locking and releasing a thread, etc.

### Try..Except..Finally
Let's first look at the most direct way to create a context, i.e. a section of code that provides entering and exiting functionality. For this purpose the most important part is the `finally` clause, since it is always executed, even if an exception is raised (i.e. it is our `on-exit` event). However, writing a try-finally statement each time can become cumbersome and messy, therefore there is a better way to handle this process.

Basically, the pattern we want to reproduce is:
* create an object
    * work with the object
* clean up the object after the work is done (we want to do this automatically!)
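The pattern above can be sketched with a plain try-finally. The `log` list and the `work_with_resource` function are hypothetical stand-ins for a real resource:

```python
log = []


def work_with_resource(fail=False):
    log.append('create')            # stand-in for acquiring a resource
    try:
        log.append('work')
        if fail:
            raise ValueError('something went wrong')
    finally:
        log.append('clean up')      # executed whether or not an exception occurred


work_with_resource()
try:
    work_with_resource(fail=True)
except ValueError:
    log.append('error propagated')

print(log)
```

The clean-up step runs in both calls; in the second one the exception still reaches the caller after the `finally` clause has been executed.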

PEP 343 introduced the `with` statement, the keyword used to enter a context manager:

```python
with open('file.txt', 'r') as f:
    ''' ENTERING THE CONTEXT
    'with ... as f' are the keywords that allow us to enter the context manager
    created by 'open()'. 'as f' is optional and binds the object returned by the context manager
    (here, the open file object) to the name 'f'
    '''
    print(f.readlines()) # Working in the context
    ''' WORKING IN THE CONTEXT
    We are in the 'with' block, inside the context, where we can manipulate the object 'f' (optional)
    '''
    
print('We are out of the context') # Exiting the context
''' EXITING THE CONTEXT
without the need for any 'f.close()', when we leave the indentation of the block, the file
is automatically closed and the context cleaned up
'''
```

### The context management protocol
As we have seen for iterators, context management is nothing more than a protocol, meaning that we can write a custom class that implements it by providing two special methods: `__enter__`, which handles the setup, and `__exit__`, which handles the cleanup. Let's try to understand step by step what is happening:

```python
# with CtxManager() as obj:
#     ...  # do something
#
# is roughly equivalent to the following steps.

# An instance of CtxManager is created (the class is called). With the 'with' statement we
# don't bind it to a name, and we actually don't need one; 'manager' is used here for clarity.
manager = CtxManager()

# The 'with' statement calls the '__enter__' method on the context manager and, if 'as obj'
# is specified (it is optional), binds the returned value to that name.
obj = manager.__enter__()

# The work inside the context must reach the exit step even if an exception is
# encountered: this translates into a try-finally statement under the hood.
try:
    pass  # do something
finally:
    # we are done with the context (simplified: the real protocol passes the exception
    # type, value and traceback, or three None values, to __exit__)
    manager.__exit__(None, None, None)
```

The `CtxManager` in the example above is simply a class that implements the context management protocol:

In [None]:
class MyClass:
    def __init__(self):
        # print('initialization')   
        pass
    def __enter__(self):    
        # whatever is returned here is bound to the name given after 'as'
        return 'object returned by __enter__'
    def __exit__(self, *args):
        # print('clean up obj') 
        pass

At this point it is equivalent to do:

In [None]:
my_obj = MyClass()

with MyClass() as obj:
    pass

The only difference is that in the second case we don't have a direct handle on the instance of MyClass that is created (in the first case it is the variable 'my_obj'), but no worries, Python has one!

The `with` statement looks for the `__enter__` method of MyClass and, if found, calls it: `manager.__enter__()` ('manager' is just a fictitious name to indicate the instance of MyClass created under the hood by the `with` statement).

Whatever is returned by the `__enter__()` method is assigned to `obj` (if `as obj` is provided). N.B. `obj` is not necessarily the instance of MyClass that was created by the `with` statement! (It can be, but only if the `__enter__` method returns `self`, i.e. the instance of the context manager we have just entered.)

Whenever we exit the `with` block, or an exception occurs, the MyClass `__exit__` method is called (`manager.__exit__(...)`), as would happen with `finally` in a try-except.

### The 'with' Scope
The with block, unlike a function or a list comprehension, does not have its own scope. It lives in the scope where it is running, local if it is inside a function or global otherwise, and the same applies to the object returned by the `__enter__` method.

### The \_\_enter\_\_ method
The enter method is pretty straightforward: it just needs to handle what has to happen when we enter the context manager, and optionally it can return an object (the one bound in the with statement by 'as obj').

### The \_\_exit\_\_ method
The exit method is similar to the `finally` clause in a try-except block: it always has to be executed, even if an exception occurs. But there is more to it: we may want to handle whichever exception has been raised and modify the behavior of the `__exit__` method accordingly. Therefore, the `__exit__` method needs to know which exception occurred, and it has to tell Python whether to silence it (the program keeps running) or let it propagate (the exception is raised and the program interrupted).

The exit method needs 3 arguments:
* the exception type that occurred (if any, None otherwise)
* the exception value that occurred (if any, None otherwise)
* the traceback object, if an exception occurred (None otherwise)

It has to return `True` or `False`:
* True if the exception raised must be silenced
* False if the exception has to propagate

In [None]:
def __exit__(self, exc_type, exc_value, exc_trace):
    # do the cleanup
    return True # silence any exception
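As a sketch of how the three arguments can drive the return value, the hypothetical `Suppress` class below silences only the exception types it is given and lets everything else propagate (the standard library offers `contextlib.suppress` for the same purpose):

```python
class Suppress:
    """Hypothetical context manager that silences only the given exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        if exc_type is not None and issubclass(exc_type, self.exc_types):
            self.caught = exc_value
            return True      # silence the exception
        return False         # no exception, or one we do not handle: propagate


with Suppress(ZeroDivisionError) as s:
    1 / 0
# execution continues here: the ZeroDivisionError has been silenced

propagated = False
try:
    with Suppress(ZeroDivisionError):
        raise KeyError('not handled')
except KeyError:
    propagated = True

print(repr(s.caught), propagated)
```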

### Context Manager Class
Now, let's try to build a context manager class, i.e. a class that implements the `__enter__` and `__exit__` methods, with some print statements that will help us understand how the process is handled.

In [None]:
class CtXManager:
    def __init__(self):
        self.obj = None
        print('Ctx initialized...')
        
    def __enter__(self):
        print('__enter__: entering context...')
        obj = 'Obj returned by __enter__'
        return obj
    
    def __exit__(self, exception_type, exception_value, exception_traceback):
        print('__exit__: exiting context...')
        if exception_type: # if an exception is raised
            print(f'An exception has been raised and handled by __exit__')
            print(f'Exception type: {exception_type}, Exception value: {exception_value}')
        return False # False tell python to not silence the exception       

Now we can equivalently use two approaches: instantiate the CtXManager class and then enter the context manager with the `with` statement, or instantiate it directly in the `with` statement.

In [None]:
ctx = CtXManager()
print('***Context Ready***')
with ctx as obj:
#with CtXManager() as obj:
    print(f'Entering the with block with the {obj}...')
    raise ValueError('A dummy error')

### Caveat with Lazy Iterator
Care must be taken when returning a lazy iterator from inside a context manager, for example when reading a file. If we simply `return` the object from within the `with` block, with the aim of using it at a later stage outside the block, we will get a `ValueError: I/O operation on closed file`: once we leave the `with` block the `__exit__` method is called and therefore the file is closed! If instead we `yield from` inside the `with` block, we don't exit the block until the generator is exhausted.
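The difference between the two approaches can be sketched as follows. The temporary file and the two helper functions (`read_returning`, `read_yielding`) are made up for the example:

```python
import os
import tempfile

# Small throwaway file so that the sketch is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a\nb\nc\n')
    path = tmp.name


def read_returning(fname):
    with open(fname) as f:
        return f                 # the file is closed as soon as we return


def read_yielding(fname):
    with open(fname) as f:
        yield from f             # the with block stays open while lines are requested


try:
    next(read_returning(path))
    error = None
except ValueError as exc:
    error = str(exc)

lines = [line.strip() for line in read_yielding(path)]
os.remove(path)
print(error, lines)
```

In the second function the file is closed only when the generator is exhausted (or garbage collected), so a generator that is never fully consumed keeps the file open for longer.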

### Additional use of Context Manager
Context managers can be used for a lot more than just opening and closing files. Some examples of application follow.

#### Redirect Standard Output

In [None]:
import sys

class OutToFile:
    def __init__(self, fname):
        self._fname = fname
        self._current_stdout = sys.stdout  # remember the original stream
        
    def __enter__(self):
        self._file = open(self._fname, 'w')
        sys.stdout = self._file  # from now on print() writes to the file
        
    def __exit__(self, exc_type, exc_value, exc_tb):
        sys.stdout = self._current_stdout  # restore the original stream
        self._file.close()
        return False        
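For reference, the standard library ships a ready-made context manager for the same job, `contextlib.redirect_stdout`. A minimal sketch that captures the output in an in-memory buffer instead of a file (the `buffer` name is illustrative):

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    print('captured')            # goes to the buffer, not to the console
print('back on the console')     # sys.stdout has been restored

captured = buffer.getvalue()
```

Any writable file-like object can be used as the target, including a file opened with `open(fname, 'w')`.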

#### Timing a Function

In [None]:
from time import perf_counter, sleep

class Timer:
    def __init__(self):
        self.elapsed = 0

    def __enter__(self):
        self.start = perf_counter()
        return self
        
    def __exit__(self, exc_type, exc_value, exc_tb):
        self.stop = perf_counter()
        self.elapsed = self.stop - self.start
        return False

## Context Manager Decorator
One of the drivers that pushed the Python community to develop the context management protocol was the ability to handle the context of generator functions. Essentially, to create a context manager from a generator function we need to wrap the entering and exiting of the context in a try-finally, where 'try' has the duty of yielding the value from the generator and 'finally' has to clean up the context. We can then build a context manager class that takes as argument an instance of our generator (the generator object returned by calling the generator function) and implements the context manager protocol: on enter we call `next` on the generator, which runs it up to the `yield` (e.g. opening a file); on exit we call `next` again inside a try-except, so that the generator runs to completion (therefore closing the file) and the `StopIteration` exception that it naturally raises is trapped. That's it: we are able to use a generator as a proper context manager. To simplify the process, the Python standard library provides `contextlib.contextmanager`, a decorator that turns a generator function into a context manager.
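The hand-written class described above can be sketched as follows. `GenCtxManager`, `resource` and `events` are hypothetical names, and the class is a deliberately simplified stand-in: among other things, the real `contextlib.contextmanager` also forwards exceptions raised in the `with` block into the generator.

```python
class GenCtxManager:
    """Hypothetical, simplified wrapper that turns a generator into a context manager."""

    def __init__(self, gen):
        self._gen = gen

    def __enter__(self):
        return next(self._gen)      # run the generator up to its yield

    def __exit__(self, exc_type, exc_value, exc_tb):
        try:
            next(self._gen)         # resume after the yield: runs the cleanup
        except StopIteration:
            pass                    # expected: the generator is finished
        return False                # never silence exceptions


events = []


def resource(name):
    events.append(f'open {name}')
    try:
        yield name.upper()          # the value bound by 'as'
    finally:
        events.append(f'close {name}')


with GenCtxManager(resource('db')) as r:
    events.append(f'using {r}')

print(events)
```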

In [None]:
from contextlib import contextmanager

@contextmanager
def open_file_gen(f_name): # our generator function
    f = open(f_name)
    try:
        print('yielding...')
        yield f
    finally:
        print('closing file...')
        f.close()
        
# now we can use the generator function as a context manager
with open_file_gen('nyc_parking_tickets_extract.csv') as gen:
    for _ in range(3):
        print(next(gen).strip('\n').split(','))
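
The hand-made wrapper described at the beginning of this section can be sketched as follows. `GenContextManager`, `resource_gen` and the `events` list are illustrative names, not part of any library, and the sketch is deliberately minimal: unlike `contextlib.contextmanager`, it does not forward exceptions raised inside the `with` block into the generator.

```python
class GenContextManager:
    """Minimal wrapper turning a generator object into a context manager."""

    def __init__(self, gen):
        self._gen = gen  # a generator object, i.e. the result of calling a generator function

    def __enter__(self):
        # run the generator up to its yield and hand the yielded value to `as`
        return next(self._gen)

    def __exit__(self, exc_type, exc_value, exc_tb):
        try:
            next(self._gen)  # resume the generator so its finally block runs
        except StopIteration:
            pass             # expected: the generator is exhausted
        return False         # do not swallow exceptions from the with block


events = []

def resource_gen(name):
    events.append(f'open {name}')
    try:
        yield name.upper()
    finally:
        events.append(f'close {name}')


with GenContextManager(resource_gen('demo')) as res:
    events.append(f'using {res}')

print(events)
```

The `events` list records the order in which the set-up, the body of the `with` block and the clean-up ran.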

## Nested Context Manager: ExitStack
We can easily nest multiple context managers by hand, simply letting the indentation grow, and this is totally fine when the number of context managers is known and small, but what if this is not the case? `contextlib.ExitStack` is the standard library class that allows us to handle a variable number of context managers. Basically, `ExitStack` returns itself on enter, and its `enter_context` method enters each context manager passed to it and stores its `__exit__` method. On exit it simply calls the stored exit methods in reverse order (closing the contexts from the innermost out).

In [None]:
import os  # needed for os.remove in the clean-up below
from contextlib import contextmanager, ExitStack
f_names = 'file1.txt', 'file2.txt', 'file3.txt'

@contextmanager
def create_and_destroy(f_name):
    f = open(f_name, 'w')
    try:
        f.write('Hello')
        yield f
    finally:
        print('Destroy the secret!')
        f.close()
        os.remove(f_name)
           
with ExitStack() as stack:
    files = [stack.enter_context(create_and_destroy(f)) for f in f_names]
    print('working in all the files at once...')
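
To make the reverse exit order visible without touching the file system, here is a small sketch in which each context manager only records when it is entered and exited. The `tracked` helper and the `log` list are hypothetical names used only for this illustration.

```python
from contextlib import ExitStack, contextmanager

log = []

@contextmanager
def tracked(name):
    log.append(f'enter {name}')
    try:
        yield name
    finally:
        log.append(f'exit {name}')

with ExitStack() as stack:
    # enter a variable number of context managers in a loop
    names = [stack.enter_context(tracked(n)) for n in ('a', 'b', 'c')]

print(log)
```

The contexts are entered in the order `a`, `b`, `c` and closed in the order `c`, `b`, `a`, exactly as if they had been nested by hand.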

back to [TOC](#TOC)

# Strings
---

Strings are immutable objects of the sequence type; therefore they have indices and can be used as iterables. Strings are homogeneous containers (every element is a character) with a fixed length and order.

## Common methods
- isalpha() -> check if all characters are alphabetic (use isalnum() to check for alphanumeric)
- isprintable() -> check if all characters are printable
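
A quick check of these methods on a few sample strings (the samples are arbitrary and chosen only to show where the methods differ):

```python
samples = ['abc', 'abc1', 'a b', 'tab\there']

# for each sample: (isalpha, isalnum, isprintable)
results = {s: (s.isalpha(), s.isalnum(), s.isprintable()) for s in samples}

for s, flags in results.items():
    # repr() makes invisible characters such as the tab visible
    print(repr(s), flags)
```

A digit makes `isalpha()` fail but not `isalnum()`; a space fails both, although it is still printable; a tab is not printable.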

back to [TOC](#TOC)

# Lists
---
Lists are mutable objects of the sequence type... (to be extended)

## List comprehension
The goal of a list comprehension is to **generate a list by `transforming`, and optionally `filtering`, another iterable**.

Like a function, a list comprehension has a local scope (what is inside the square brackets is essentially the body of a function), but it can also freely access the global scope. As a matter of fact, in the Python versions this manual was written for (before 3.12), when Python compiles the right-hand side of a list comprehension it creates a temporary function, which is essentially the equivalent of a for loop (with optional if statements). From Python 3.12 comprehensions are inlined instead, but the loop variable is still kept isolated. This implies that, after execution, the variables in the local scope are gone and can't be retrieved from the global namespace.
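
The isolation of the loop variable can be verified with a short sketch. The function name `comprehension_scope_demo` and the variable `loop_item` are arbitrary; the example assumes no global called `loop_item` exists.

```python
def comprehension_scope_demo():
    squares = [loop_item ** 2 for loop_item in range(4)]
    try:
        loop_item  # the loop variable lived only inside the comprehension
    except NameError:
        leaked = False
    else:
        leaked = True
    return squares, leaked

squares, leaked = comprehension_scope_demo()
print(squares, leaked)
```

The list is built as expected, while the name used inside the brackets is not visible once the comprehension has finished.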

We can try to disassemble a list comprehension to see what's happening:

In [None]:
from dis import dis

compiled = compile('[i**2 for i in (1,2,3)]', filename='string', mode='eval')
dis(compiled)

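As we can see, Python creates a function (`4 MAKE_FUNCTION`). On Python 3.12 and later the comprehension is inlined, so this instruction no longer appears in the output.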

List comprehensions can be nested one inside the other, creating closures between them.

In [None]:
lst = [[i*j for i in range(5)] for j in range(3)]
lst 

A list comprehension can have as many nested for loops as we want:

In [None]:
l = []
for i in range(2):
    for j in range(2):
        for k in range(2):
            l.append((i,j,k))
            
            
l_1 = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]

print(l)
print(l_1)

Note that the order of the for clauses in the list comprehension is the same as in the nested for loops.

We can also add if statements inside the for loops, and of course the order matters! An if clause has to come after the for clauses whose variables it references.

In [None]:
l = []
for i in range(2):
    for j in range(2):
        if i == j:
            l.append((i,j))
            
l_1 = [(i, j) for i in range(2) for j in range(2)  if i == j]

print(l)
print(l_1)     

We can also add an else branch, but in this case the `if..else` must come before the for clauses: it is no longer a filter but a conditional expression that decides which value is produced for each element.

In [None]:
l = []
for i in range(2):
    for j in range(2):
        if i == j:
            l.append((i,j))
        else:
            l.append('else')
            
l_1 = [(i, j) if i == j else 'else' for i in range(2) for j in range(2)  ]

print(l)
print(l_1)     

In [None]:
# SyntaxError: an if clause placed after the for clauses is a filter and cannot have an else
l_1 = [(i, j)  for i in range(2) for j in range(2) if i == j else 'else' ]

back to [TOC](#TOC)

# Tuples
---
Tuples are often described as immutable lists. They are objects of the sequence type; therefore they have indices and can be used as iterables. Tuples can be homogeneous or heterogeneous containers. Besides immutability, the main differences in comparison with lists are that tuples have a fixed length and a fixed order (they cannot be sorted or reversed in place like lists).

Due to these properties, we can think of tuples as data records, where the position of each piece of data, once defined, has a precise meaning:

```py
# Circle(x, y, radius)
circ1 = (0, 0, 10)
```
Once we have defined the structure of our container, tuples can be used to store data efficiently, since once created we are sure that nobody will be able to accidentally modify them.
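
As a small illustration of the record idea, the sketch below unpacks the circle by position and shows that an attempt to modify it fails. The variable names are arbitrary.

```python
# Circle(x, y, radius): the meaning of each position is fixed by convention
circ1 = (0, 0, 10)

x, y, radius = circ1      # unpack the record by position

try:
    circ1[2] = 20         # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)

print(x, y, radius)
print(error_message)
```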

## Named Tuples
Sometimes, defining a custom class can be an excessive effort if what we want to achieve is simply storing data in a custom data structure. Classes require at least some methods to be implemented to be properly employed in our code (such as the `__eq__` and `__repr__` methods, for example); moreover, the instance of a class is not immutable, which can lead to potential errors. On the other hand, plain tuples are immutable and well suited to storing data, but accessing fields by index can be troublesome for the user and hard for other developers to read.

```py
# Class vs tuple approach
class Person:

    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person('Luca', 44)
p1.name, p1.age # 'Luca', 44

p1 = ('Luca', 44)
name = p1[0]
age = p1[1]
```

Of course, Python has a solution to this kind of problem: named tuples. `namedtuple` is a function shipped in the `collections` standard library module. It is not a type itself; instead, it is a function that generates a new class (a `class factory`), a subclass of the `tuple` type, which assigns property names to the positional elements.

In [None]:
from collections import namedtuple

Pt2D = namedtuple('Point2D', ['x', 'y'])

# Pt2D is a variable alias of the class `Point2D` generated by the class factory namedtuple

pt = Pt2D(x=10,y=20)
pt, pt.x, pt.y

Each time we call `Pt2D`, Python uses the `__new__` method of the `Point2D` class to create a new instance of that class and return the tuple.

There are several ways in which we can pass the arguments to the namedtuple function:

In [None]:
Pt2D = namedtuple('Point2D', ['x', 'y']) # list
Pt2D = namedtuple('Point2D', ('x', 'y')) # tuple
Pt2D = namedtuple('Point2D', 'x, y') # comma separated string
Pt2D = namedtuple('Point2D', 'x y') # whitespace separated string

# and remember, namedtuple classes are subclasses of the tuple type
pt = Pt2D(x=10,y=20)
isinstance(pt , tuple)

Unlike a regular class instance, since `pt` is a tuple it is immutable, i.e. we cannot modify its attributes (in a regular class object we could).

In [None]:
pt.x = 100 # cannot do! it is a tuple -> immutable (raises AttributeError)

N.B. namedtuple field names CANNOT start with an underscore!

In [None]:
Pt2D = namedtuple('Point2D', 'x _y') # ERROR!!

unless we provide the keyword `rename=True`, in which case namedtuple will replace the invalid name with the positional index of the field preceded by an underscore

In [None]:
Pt2D = namedtuple('Point2D', 'x _y', rename=True)
pt = Pt2D(10, 20)
pt

### Introspection
The classes generated by namedtuple are shipped with some attributes and methods that can help us in the introspection of our code:

In [None]:
Pt2D._fields

In [None]:
pt._asdict()

### Modify and Extending
Named tuples are immutable in essence, but they come shipped with some methods that help us handle field substitution and extension. Basically, a new tuple is created (at a new memory address) and the original one is left untouched; if we rebind the same variable name to the new tuple, the original appears overwritten. Looking at Pt2D as an example, we can modify its fields with the method `_replace()`:

In [None]:
# pt right now is Point2D(x=10, _1=20)
pt = pt._replace(x=50)
pt

Instead, if we want to extend the namedtuple by adding more fields, we can take the `_fields` attribute of the existing namedtuple (which is a tuple), add the element(s) we want, and create a new namedtuple with the extended fields:

In [None]:
Pt2D = namedtuple('Point2D', 'x, y')
old_fields = Pt2D._fields # is a tuple
new_fields = old_fields + ('z', ) # concatenation of two tuple
Pt3D = namedtuple('Point3D', new_fields) # same as Pt3D = namedtuple('Point3D', ('x', 'y', 'z'))
pt = Pt3D(10, 20, 30)
pt

### Docstring
The namedtuple is shipped with a set of pre-generated docstrings that can be accessed as always with `help()` or the `__doc__` attribute.

In [None]:
help(Pt3D)

### Defaults values
When we create a namedtuple with the basic call shown so far, we do not specify default values (since Python 3.7 the `namedtuple` function also accepts a `defaults` keyword argument for this purpose). One way to get defaults by hand is to define a prototype instance of the namedtuple (e.g. setting all fields to 0) and then use the `._replace()` method to override only the fields that should differ from the defaults. Alternatively we can use the `__defaults__` attribute (it can be used in the same way on any function). To use this last approach on the namedtuple, we assign a tuple of default values to the `__defaults__` attribute of the class's `__new__` method:

In [None]:
# `__defaults__` on a generic function
def func(x, y, z):
    print(x, y, z)

func.__defaults__ = (10, 20) # N.B. defaults are assigned starting from the last parameter: y=10, z=20
func(x=0)

In [None]:
# Set a default value for the field z by assigning to the __defaults__ of Pt3D.__new__
Pt3D.__new__.__defaults__ = (0,)
pt = Pt3D(x=10, y=10)
pt
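
On Python 3.7 and later the same result can be obtained directly with the `defaults` keyword of `namedtuple`; as with `__defaults__`, the values are applied to the rightmost fields. A minimal sketch, reusing the `Point3D` field names from above (the name `Pt3DWithDefault` is made up for this example):

```python
from collections import namedtuple

# defaults=(0,) applies to the last field only, i.e. z
Pt3DWithDefault = namedtuple('Point3D', 'x y z', defaults=(0,))

pt_default = Pt3DWithDefault(x=10, y=10)
print(pt_default)
print(Pt3DWithDefault._field_defaults)  # mapping of field name -> default value
```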

back to [TOC](#TOC)

## Associative arrays
Associative arrays are abstract data structures that can be implemented in different ways but that share the concept of a `collection of key-value pairs`. Dictionaries are associative arrays, and in Python they have extra functionality, but in general, to be an associative array a data structure should implement:
* adding/removing elements
* modifying elements
* looking up values via key

### Hash Maps
Hash maps are one of the possible implementations of associative arrays. They are based on the concept of a `hash function`, a function that maps from a set of arbitrary size (theoretically infinite) to a smaller set of fixed size. The result of the hash function determines the slot of the hash table (a table with a fixed number of slots) in which a key-value pair is stored. Moreover, we want the output of the hash function to be as evenly distributed as possible over the size of the hash table. This is to avoid collisions, meaning that the hash function sends a key to an already occupied slot. This problem is directly related to the initial size of the hash table: if the table is very large we will rarely have collisions, but we pay a large memory cost for a data structure that may be only partially in use; on the other hand, we could choose to start with a small table and resize it when needed, but resizing is an expensive operation since we have to recompute the slots and move the data.
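
The ideas above (slots, collisions, resizing) can be sketched with a toy hash map. This is only an illustration under simplifying assumptions: the class `ToyHashMap` is invented for this example, it resolves collisions by keeping a small list per slot (chaining), and it is not how CPython implements `dict`.

```python
class ToyHashMap:
    """Toy associative array: a list of slots, each holding (key, value) pairs."""

    def __init__(self, n_slots=8):
        self._slots = [[] for _ in range(n_slots)]
        self._size = 0

    def _bucket(self, key):
        # the hash function output is reduced to a slot index
        return self._slots[hash(key) % len(self._slots)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (possibly colliding with others)
        self._size += 1
        if self._size > 2 * len(self._slots):
            self._resize()               # too crowded: grow the table

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return self._size

    def _resize(self):
        # expensive step: every pair must be placed again in the bigger table
        pairs = [pair for bucket in self._slots for pair in bucket]
        self._slots = [[] for _ in range(2 * len(self._slots))]
        for k, v in pairs:
            self._bucket(k).append((k, v))


m = ToyHashMap()
for i in range(50):
    m[i] = i * i
m['seven'] = 7
print(m[7], m['seven'], len(m))
```

Starting from 8 slots, inserting 51 keys forces the table to grow a couple of times, which is exactly the cost the text warns about.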

### Python Dictionaries implementation (PEP 412)
In Python, dictionaries are probably the most important data structure, since many of its objects rely on the dictionary type (namespaces, classes, modules, functions, sets ...).

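From Python 3.6 (CPython), a major change in the dictionary implementation had the side effect of also preserving the insertion order of the key-value pairs; since Python 3.7 this ordering is a guaranteed language feature. PEP 412 describes an earlier optimization, key-sharing dictionaries (Python 3.3), which lets instances of the same class share the storage of their attribute names.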

### Python `hash()`
Python of course has a built-in `hash()` function that always returns an int of fixed size (depending on the platform and the Python build you are running, e.g. 32 or 64 bit). If two variables hold the same value (i.e. `a == b` is True), then their hashes are also equal. However, the hash of the same value can differ from one run of the interpreter to the next: by default the hashes of `str` and `bytes` objects are randomized at startup, a security measure against attacks that craft colliding keys on purpose.

In [None]:
import sys

sys.hash_info

Not everything is hashable in Python; for example, lists and dictionaries are *not* hashable (because they are **mutable objects**).
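
A short sketch of these properties: objects that compare equal share the same hash, an immutable container of hashable items is hashable, and a list is rejected. The variable names are arbitrary.

```python
# 1 == 1.0 == True, so their hashes must be equal too
same_hash = hash(1) == hash(1.0) == hash(True)

# a tuple of hashable elements is itself hashable
tuple_hash = hash((1, 'a'))

try:
    hash([1, 2])              # lists are mutable -> not hashable
except TypeError as exc:
    list_error = str(exc)

print(same_hash, isinstance(tuple_hash, int), list_error)
```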

back to [TOC](#TOC)

# Dictionaries
---

From Python 3.6, dictionaries are ordered hash maps containing (key, value) pairs. Note that while insertion order is retained, some pretty-printers (e.g. `pprint.pprint`, by default) display the keys sorted, so don't rely on the printed output to judge the actual order!

Before 3.6, to have an ordered dictionary we would have used `collections.OrderedDict`. It is not completely superseded, since it has some useful functionality like `move_to_end` and `popitem(last=False)`, to move a key to the front or the end of the dictionary, or to pop an item from either side.
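
A brief sketch of the two `OrderedDict` features just mentioned, on a small made-up dictionary:

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)

od.move_to_end('a')                # 'a' goes to the end: b, c, a
od.move_to_end('c', last=False)    # 'c' goes to the front: c, b, a
order_after_moves = list(od)

first_item = od.popitem(last=False)  # pop from the front instead of the end
print(order_after_moves, first_item, list(od))
```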

## Creating dictionaries
As a recap, dictionaries are associative arrays and therefore rely on the `key-value` pair structure. The `value` can be any kind of Python object, while the `key` has to be hashable, which in practice usually means an immutable object (lists and sets are not hashable, for example). More generally, an object is hashable if `hash()` returns an integer for it that never changes during its lifetime, and if, whenever two objects compare equal (`==`), their hashes are also equal.

Following a list of ways to create a dictionary:

In [None]:
# with the curly brackets notation keys can be any hashable object
dict1 = {'key1':1, 222:2, print:1, (0,1):tuple}
dict1

In [None]:
# in this case keys must be valid identifiers, like function or class names
dict2 = dict(key=1, john=list, man=5)
dict2

In [None]:
# dict comprehension
dict3 = {i:i**2 for i in range(5)}
dict3

In [None]:
# fromkeys, passing an iterable
dict4 = dict.fromkeys(['a', (0,0), list], 'whaat?')
dict4

## Common operations with dictionaries
Given a dictionary `d` we can:

In [None]:
d = {'key1':1, 222:2, print:1, (0,1):tuple}

`d[key] = value` - creates a key-value pair if the key doesn't exist, or replaces its value if the key is already present

In [None]:
d['key2'] = 'new!'
d

`d[key]` - requests the value associated with `key`; if `key` is not in the dictionary we get a `KeyError`

In [None]:
d['key3']

`d.get(key, default)` - to avoid the `KeyError`; if `default` is not specified return `None`

In [None]:
d.get('key3', 0)

`d.pop(key)` returns the value and deletes the pair if the key exists; if not, it raises a `KeyError` unless a default is specified

In [None]:
d.pop('key3', 'Not Found!')

`d.popitem()` returns the last key-value pair inserted and removes it from the dict

In [None]:
d.popitem()

In [None]:
d

`d.setdefault(key, value)` tests if a key exists; if not, it inserts the `value` and returns it. If the key already exists, it returns its current value

In [None]:
result = d.setdefault('aaa',dict)
result

In [None]:
d

## Dictionary view PEP 3106
There are essentially three ways of getting a view of dictionaries in Python:
* `d.keys()` - returns the keys of the dict; it behaves like a set since keys must be unique
* `d.values()` - returns the values of the dict
* `d.items()` - returns the key-value pairs as tuples; it behaves like a set since the pairs are unique (as long as the values are hashable)

Behaving like sets, `keys()` and `items()` can be manipulated with set operations like intersection, union, etc. But be aware! While dictionaries preserve insertion order, and therefore so do their views, sets don't, meaning that after performing a set operation on a dictionary view the order is no longer guaranteed.

Another important feature of modern dictionary implementations is that the views are dynamic, meaning that if we store, let's say, the keys of a dictionary in a variable with `key = d.keys()`, and later on the keys of the dictionary change, then the view changes as well.

Finally, if we can, we should always iterate over a view of the dictionary, since it is faster than doing the hash lookup each time we need the value associated with a key (e.g. instead of `d[key]` inside the loop, it is better to write `for k, v in d.items():`).

In [None]:
d.keys()

In [None]:
d.values()

In [None]:
d.items()
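
The dynamic and set-like behaviour of views can be checked with a small sketch. The dictionaries `d1` and `d2` are made up for the example.

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 20, 'c': 30, 'd': 40}

keys_view = d1.keys()
d1['z'] = 26                          # change the dict after taking the view
view_is_live = 'z' in keys_view       # the view reflects the change

common_keys = d1.keys() & d2.keys()   # set intersection: the result is a plain set
only_in_d1 = d1.keys() - d2.keys()    # set difference

pairs = [(k, v) for k, v in d1.items()]  # iterate on the view, no d1[k] lookups
print(view_is_live, common_keys, only_in_d1, pairs)
```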

## Custom Class and Hashing
Python has a default behavior when calling the hash() function on an object: it looks for the `__hash__` method, and if the class does not define one, the inherited default hashes based on the `id` of the object (so, by default, identity, `==` equality and hash equality all coincide). In some particular situations this leads to unwanted behavior.

Imagine we have two instances of a class `Person` that actually represent the same person; we want to insert one of these in a dictionary as a key and be able to retrieve its value using either of the two instances, since for us they represent the same person.


In [None]:
class Person:
    def __init__(self, name):
        self.name = name
        
p1 = Person('Gio')
p2 = Person('Gio')

p1 == p2, p1 is p2

As expected, neither of the comparisons evaluates to True, and therefore their hashes are not equal either, leading to..

In [None]:
d = {p1:10}
d[p1]

In [None]:
d[p2]

We get a `KeyError` as expected, since `p1` and `p2` do not have equal hashes (nor do they compare equal).

However, for our purposes this is unwanted: for us `Gio` is the same person, therefore the two instances of the class should be seen as the same object by the dictionary. What we can do to tweak this behavior is to implement the `__eq__` method in the `Person` class:

In [None]:
class Person:
    def __init__(self, name):
        self.name = name
        
    def __eq__(self, other):
        if isinstance(other, Person):
            return self.name == other.name
        else:
            return False

p1 = Person('Gio')
p2 = Person('Gio')

p1 == p2, p1 is p2

Now, even if the two objects are still different in `id`, they can be compared with `==` according to the `name` property. Problem solved?

In [None]:
d = {p1:10}
d[p1]

Nope! We get a new error: now the class `Person` is unhashable. This is because, when we implement the `__eq__` method, Python disables the default hash protocol (of course.. it is based on the object `id`). So what to do?

By the way, if we want to tell Python explicitly that our object is not hashable, we can set the `__hash__` attribute to `None`; since returning an integer is a mandatory property of a hash function, Python will understand that the object cannot be hashed. And this is exactly what Python does when we define only the `__eq__` method: it sets `__hash__ = None` in the class.

What we need to do is to define our custom hash method for the class and make it return what is appropriate for our purpose. In the case of our `Person` class, we just need to hash the name attribute.

In [None]:
class Person:
    def __init__(self, name):
        self.name = name
        
    def __eq__(self, other):
        if isinstance(other, Person):
            return self.name == other.name
        else:
            return False
        
    # __hash__ = None -> if we want the class to be non-hashable
        
    def __hash__(self):
        return hash(self.name)

p1 = Person('Gio')
p2 = Person('Gio')

p1 == p2, p1 is p2, hash(p1) == hash(p2)

(True, False, True)

And now we can use p1 and p2 as the same key for our dictionary, since they are equal and have the same hash. From a dictionary perspective, p1 and p2 are exactly the same even if they are different instances of the same class, i.e. different objects.

In [None]:
d = {p1:10}
d[p1]

10

In [48]:
d[p2]

10

## Defaultdict

Defaultdicts solve the problem of getting a `KeyError` when trying to access a key that doesn't exist in a dictionary. In reality, there is already a built-in way to handle this, i.e. the `get(key, default)` method of dictionaries, but it is not systematic and can get messy in a bigger code base, since we need to specify the default in each `get` call every time we look up a key.

The pythonic way to do it is to use a specialized dictionary: `collections.defaultdict`. defaultdict is a subclass of `dict` and is defined with a callable as argument, i.e. a function that gets called whenever the default value is required. The callable has to take zero arguments; if it is not specified (the factory is `None`), a missing key raises a `KeyError` as in a normal dict. While the get method just returns a default value, which we may then store in the dictionary ourselves, the defaultdict automatically creates an entry for the not-found key with the default value returned by the callable.

The callable doesn't need to be deterministic: it could be an API call that returns a particular value in different situations; the important thing is that it takes no arguments.

In [57]:
from collections import defaultdict
d = defaultdict(lambda: 'notFound')
d.items()

dict_items([])

In [58]:
d['test']
d.items()

dict_items([('test', 'notFound')])
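
A typical use of this automatic insertion is grouping items, with `list` as the factory. The sketch below also shows that a defaultdict created without a factory raises `KeyError` like a plain dict. The word list and variable names are made up for the example.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_initial = defaultdict(list)        # list() -> [] is the default value
for word in words:
    by_initial[word[0]].append(word)  # missing keys are created on first access

no_factory = defaultdict()            # no callable given: default_factory is None
try:
    no_factory['missing']
except KeyError:
    raised = True
else:
    raised = False

print(dict(by_initial), raised)
```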

Following is an example from Fred of how to leverage the power of defaultdict. Essentially he creates a function that returns a decorator together with a defaultdict that stores, for each decorated function, the number of calls and the utcnow() of its first call... amazing

In [67]:
from collections import defaultdict, namedtuple
from datetime import datetime
from functools import wraps

def function_stats():
    d = defaultdict(lambda: {'count':0, 'first_called': datetime.utcnow()})
    Stats = namedtuple('Stats', ['decorator', 'data'])
    
    def decorator(fn):
        @wraps(fn) # decorator for storing the metadata 
        def inner(*args, **kwargs):
            d[fn.__name__]['count'] +=1
            # if the function already exists in d, only the count is updated,
            # otherwise the default entry is created and utcnow() is stored too
            return fn(*args, **kwargs)
        return inner
    
    return Stats(decorator, d)


stats = function_stats()
stats.decorator, stats.data

(<function __main__.function_stats.<locals>.decorator(fn)>,
 defaultdict(<function __main__.function_stats.<locals>.<lambda>()>, {}))

In [68]:
@stats.decorator
def func1():
    pass

@stats.decorator
def func2():
    pass

func1()
func1()
func2()

In [69]:
from pprint import pprint
pprint(stats.data)

defaultdict(<function function_stats.<locals>.<lambda> at 0x7ffa16dd91b0>,
            {'func1': {'count': 2,
                       'first_called': datetime.datetime(2022, 6, 17, 11, 7, 49, 680927)},
             'func2': {'count': 1,
                       'first_called': datetime.datetime(2022, 6, 17, 11, 7, 49, 681008)}})


## Counters

One common application of dictionaries is to use them as counters, and we have seen how we can solve the problem of missing keys with the `get()` method or by leveraging `defaultdict`. However, if we have multiple dictionaries this can become tedious; therefore Python has a specialized dictionary for this case: `collections.Counter`, the Python implementation of a multiset (a set in which each element can appear more than once).

So the `Counter` is a specialized dictionary that:
* acts like a defaultdict with default = 0
* is a subclass of dict and therefore inherits most of its methods (not `fromkeys`, and `update` becomes an in-place addition to the counts)
* has additional functionality to generate `frequency tables`, i.e. count how many times a key appears
* supports addition, subtraction and set operations between Counter objects

In [1]:
from collections import Counter

c1 = Counter('aaabbcdddffrr')
c1

Counter({'a': 3, 'b': 2, 'c': 1, 'd': 3, 'f': 2, 'r': 2})

In [2]:
c1.most_common()

[('a', 3), ('d', 3), ('b', 2), ('f', 2), ('r', 2), ('c', 1)]

In [3]:
c1.total()

13
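
The arithmetic and set-like operations between counters listed above can be sketched as follows (the strings are arbitrary; the variable names are chosen so as not to clash with `c1` above):

```python
from collections import Counter

ca = Counter('aab')     # a: 2, b: 1
cb = Counter('abc')     # a: 1, b: 1, c: 1

added = ca + cb         # counts are summed
subtracted = ca - cb    # counts are subtracted, non-positive results are dropped
minimum = ca & cb       # intersection: minimum of the counts
maximum = ca | cb       # union: maximum of the counts

missing = ca['z']       # missing keys count as 0 and no KeyError is raised
ca.update('aa')         # update adds to the existing counts in place

print(added, subtracted, minimum, maximum, missing, ca)
```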

## ChainMap

ChainMap is a class from the `collections` module that works similarly to the `itertools.chain` function, but is specialized for dictionaries. Essentially it creates a view (no extra storage is required, nothing new is copied when chaining) of the combination of multiple dictionaries. Like the `dict.items()` view, a ChainMap sees changes in the underlying dicts; unlike it, a ChainMap can also be updated.

While a classical `chain` returns a lazy iterable, a `ChainMap` is a mapping, therefore keys must be unique. So, if the chained dictionaries share a key, the ChainMap will see only its first appearance (i.e. the order of chaining is very important). N.B. the opposite happens when unpacking two dictionaries to create a combination of the two: there the last instance of a key is retained (d3 = {\*\*d1, \*\*d2})

**N.B. the key ordering in a ChainMap does not simply follow the chaining order (in current CPython, iteration scans the maps from last to first), therefore, unlike with a single standard dictionary (post 3.6), we shouldn't assume the keys of the first map come first**

ChainMaps are usually thought of as a `Parent-Child` relationship, where the child is the first element to be chained and the parents are all the others. This is because the child is the only one assured to have all its keys visible, while, cascading, the parents might have their keys overridden. With this structure in mind, ChainMaps have ad hoc members like the `d.parents` property, which selects only the parents in the ChainMap, or the `d.new_child(d1)` method, which adds a new dictionary in front of the others as the new child.

To modify a ChainMap, the best way is to use the `d.maps` attribute, which is a `mutable list` of the dictionaries in the chain (retaining the child-parent order). Being a mutable list, `d.maps` has the methods `append` and `insert`, which mutate the ChainMap.

In [25]:
from collections import ChainMap

In [26]:
d1 = dict(a=10, b=100, c=30)
d2 = dict(e=10, b=5, f=60)
d3 = dict(a=1000, e=3, l=300)

d = ChainMap(d1, d2, d3)
d

ChainMap({'a': 10, 'b': 100, 'c': 30}, {'e': 10, 'b': 5, 'f': 60}, {'a': 1000, 'e': 3, 'l': 300})

In [27]:
print(f'The first occurrence of the key "a" is {d["a"]}')

The first occurrence of the key "a" is 10


In [28]:
d.maps

[{'a': 10, 'b': 100, 'c': 30},
 {'e': 10, 'b': 5, 'f': 60},
 {'a': 1000, 'e': 3, 'l': 300}]

We can also modify the source dictionaries through the ChainMap, for example by associating a new value with an existing key; however, **the changes will affect only the child map**, therefore if a key is updated but lives in a parent map, a new entry is created in the child. Similarly, if we call `del` on a key, it will only look in the child map, and if there is another appearance of that key in a parent map, it will be retained; moreover, if we try to delete a key that is in a parent map but not in the child, we will get a `KeyError` exception.

In [37]:
d['l'] = 1 # the changes happen in the child map!
d

ChainMap({'a': 10, 'b': 100, 'c': 30, 'l': 1}, {'e': 10, 'b': 5, 'f': 60}, {'a': 1000, 'e': 3, 'l': 300})

A cool use case of ChainMaps could be to keep a reference dictionary that is never modified (like a set of default options), and chain it with an empty dictionary that will be the *user_configuration*, e.g.:

In [34]:
default_config = {'channel':5, 'volume':55, 'user_name':'admin'}
user_config = ChainMap({}, default_config)
user_config

ChainMap({}, {'channel': 5, 'volume': 55, 'user_name': 'admin'})

In [36]:
user_config['volume'] = 70
user_config['user_name'] = 'Giovanni'
user_config

ChainMap({'volume': 70, 'user_name': 'Giovanni'}, {'channel': 5, 'volume': 55, 'user_name': 'admin'})
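
The parent-child helpers and the deletion rules described above can be sketched on a similar configuration example. The dictionaries and variable names below are made up for the illustration.

```python
from collections import ChainMap

defaults = {'channel': 5, 'volume': 55}
user = {'volume': 70}
config = ChainMap(user, defaults)

# new_child puts a new dict in front; parents skips the first map
session = config.new_child({'channel': 9})
channel_in_session = session['channel']          # found in the new child
channel_in_parents = session.parents['channel']  # falls back to defaults

# del only looks in the child: the parent's value becomes visible again
del config['volume']
volume_after_del = config['volume']

# a key that lives only in a parent cannot be deleted through the ChainMap
try:
    del config['channel']
except KeyError:
    parent_key_protected = True
else:
    parent_key_protected = False

print(channel_in_session, channel_in_parents, volume_after_del, parent_key_protected)
```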

## UserDict

Sometimes we may want to have a dictionary with specific use-case restrictions, like accepting only certain keys, limiting the values to a certain range, etc.

We could implement our own class, specifying the `__getitem__` and `__setitem__` methods, but we wouldn't inherit the whole functionality of dicts. Moreover, even if we subclass the `dict` class, many of its methods (like `.get`, `.update` or the constructor) are implemented in highly optimized C code that works directly on the internal data structure, and therefore there is no guarantee that our redefined special methods will be used (e.g. in CPython, if we redefine `__setitem__`, `dict.update` will not call it).

This is why `collections.UserDict` was implemented: it is not a subclass of dict but uses a dictionary as a backing data structure (for items(), keys(), values(), etc.). Here we are sure that the specialized methods in our custom dict class will work as expected.

Let's see an example where we build a class that stores RGB colors; therefore the keys can only be 'R', 'G' or 'B' and the values must be between 0 and 255.

In [52]:
from collections import UserDict

class LimitedDict(UserDict):
    def __init__(self, keyset, min_value, max_value, *args, **kwargs):
        self._keyset = keyset
        self._min_value = min_value
        self._max_value = max_value
        # for the rest use the standard implementation of UserDict
        super().__init__(*args, **kwargs)
        
    def __setitem__ (self, key, value):
        if key not in self._keyset:
            raise KeyError('Invalid Key name.')
        if not isinstance(value, int):
            raise ValueError('Value must be integer type')
        if value < self._min_value or value > self._max_value:
            raise ValueError(f'Values must be between {self._min_value} and {self._max_value}')
        # once the checks have passed, leave to the super class the duty to insert the element
        super().__setitem__(key, value)

rgb = LimitedDict(set('RGB'), 0, 255, R=0, G=0, B=0)
rgb

{'R': 0, 'G': 0, 'B': 0}

In [53]:
rgb['Y']

KeyError: 'Y'

In [54]:
rgb['R'] = 300

ValueError: Values must be between 0 and 255
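
The difference between subclassing `dict` and `UserDict` can be seen with a small experiment. `UpperDict` and `UpperUserDict` are hypothetical classes made up for this sketch; the behaviour shown for the `dict` subclass is that of CPython, where the constructor and `update` bypass an overridden `__setitem__`.

```python
from collections import UserDict

class UpperDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)   # store keys in upper case

class UpperUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)   # same customization

plain = UpperDict(a=1)      # constructor: __setitem__ is not called
plain.update(b=2)           # update: __setitem__ is not called
plain['c'] = 3              # direct assignment: __setitem__ is called

user = UpperUserDict(a=1)   # every insertion goes through __setitem__
user.update(b=2)
user['c'] = 3

print(list(plain), list(user))
```

Only the `UserDict` subclass applies the customization consistently, which is the guarantee the RGB example above relies on.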

## MappingProxy

It is probably the fastest way to create a read-only view of a dictionary that reflects any update made to the source dict. It is part of the `types` module.

In [55]:
from types import MappingProxyType

d = {'a':1, 'b':10}
mp = MappingProxyType(d)
mp

mappingproxy({'a': 1, 'b': 10})

In [56]:
mp['a'] = 'cant reassing'

TypeError: 'mappingproxy' object does not support item assignment

In [57]:
mp['c'] = 'cant insert'

TypeError: 'mappingproxy' object does not support item assignment

In [58]:
del mp['a'] # cant delete

TypeError: 'mappingproxy' object does not support item deletion

In [59]:
d['c'] = 'new_key'
mp

mappingproxy({'a': 1, 'b': 10, 'c': 'new_key'})

back to [TOC](#TOC)

# Sets
---

Mathematically speaking, a set is a gathering (unordered ensemble) of unique elements.

## Basic Set Theory

Given two sets `s1` and `s2`:

* **Union**: the elements contained in s1 `or` s2, without repetitions (in Python the operator is `s1 | s2`, but we also have the method `s1.union(s2, ...)`)

* **Intersection**: the elements that are both in s1 `and` s2 (in Python the operator is `s1 & s2`, but we also have the method `s1.intersection(s2, ...)`)

* **Difference**: the elements that are in s1 `and not` in s2 (in Python we use the `-` sign, since it is overloaded, or the method `s1.difference(s2, ...)`); difference is not commutative. Two sets are **disjoint** if there are no elements in their intersection (`len(s1 & s2) == 0` or `s1.isdisjoint(s2)`)

* **Symmetric Difference**: the union minus the intersection, i.e. what is in s1 or in s2 but not in both (in Python we use `s1 ^ s2` or the method `s1.symmetric_difference(s2)`).

* The **Cardinality** of a set is the number of elements it contains; the empty set contains no elements, thus its cardinality is 0. To create an empty set in Python we need to use the `set()` function, because empty curly brackets `{}` create an empty dictionary.

* s1 is a **subset** of s2 if all the elements in s1 are contained in s2 (in Python `s1 <= s2` or `s1.issubset(s2)`). We can also say that s2 is a **superset** of s1 (in Python `s2 >= s1` or `s2.issuperset(s1)`)

None of the above-mentioned operations mutates the original set objects: a new set object is created each time. However, it is possible to update a set in place (as with lists) by adding the `=` sign after the operator (e.g. `&=` or `|=` etc.) or by using the methods with the `_update` ending (e.g. `s1.intersection_update(s2)`; for the union the method is simply `s1.update(s2)`).
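
The operators listed above, and the difference between creating a new set and updating in place, can be checked with a short sketch. The sets `sa` and `sb` are made up for the example.

```python
sa = {1, 2, 3, 4}
sb = {3, 4, 5}

union = sa | sb                 # {1, 2, 3, 4, 5}
intersection = sa & sb          # {3, 4}
difference = sa - sb            # {1, 2}
sym_difference = sa ^ sb        # {1, 2, 5}
is_subset = {3, 4} <= sa        # True
is_superset = sa >= {3, 4}      # True
are_disjoint = sa.isdisjoint({8, 9})

alias = sa
sa |= sb                        # in-place union: the same object is modified
same_object = alias is sa

print(union, intersection, difference, sym_difference, same_object)
```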

## Python implementation of sets
Python implements the `set` datatype on top of a hash table that only stores keys (the implementation was once very close to that of dictionaries); as a matter of fact, the elements of a set must be hashable and distinct (in the sense that they don't compare equal with `==`). No order is guaranteed among the elements (whereas dictionaries retain insertion order: an implementation detail of CPython 3.6, guaranteed by the language from 3.7).

A set is a mutable object, since we can add and remove elements, and therefore it is not hashable; for the same reason it can't be used as a dictionary key and cannot contain another set.

Since sets are essentially hash tables, membership testing (`in` or `not in`) is very efficient, in particular compared to a list, where we need to scan the elements one by one. The trade-off is that sets have a higher storage cost, because hash tables keep spare capacity to avoid collisions as much as possible.
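
A quick sketch of the hashability and distinctness rules described above (the variable names are illustrative):

```python
s = {1, 'a', (2, 3)}     # ints, strings and tuples of hashables are fine

try:
    s.add([4, 5])        # lists are mutable, hence unhashable
except TypeError as exc:
    list_error = str(exc)

try:
    s.add({6, 7})        # a set cannot contain another set
except TypeError as exc:
    set_error = str(exc)

# Elements that compare equal with == collapse into a single element
collapsed = {1, 1.0, True}
cardinality = len(collapsed)   # 1
```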

## Sets creation

There are few ways to create sets:

* using a literal expression:

In [61]:
s1 = {'a', 1, 44}
s1

{1, 44, 'a'}

* using the `set` constructor, passing an iterable:

In [62]:
s1 = set('abc')
s1

{'a', 'b', 'c'}

it is also the only way to create an empty set

In [63]:
s1 = set()
s1

set()

* set comprehension:

In [64]:
s1 = {i for i in 'python'}
s1

{'h', 'n', 'o', 'p', 't', 'y'}

Sets can also be unpacked with `*`, but the order of unpacking depends on the hash table, so it probably won't be what we expect:

In [65]:
l = [*s1]
l

['p', 't', 'o', 'h', 'y', 'n']

Still, we can use unpacking to create the union of two sets:

In [66]:
s1 = set('abcd')
s2 = set('cdef')
s3 = {*s1, *s2}
s3

{'a', 'b', 'c', 'd', 'e', 'f'}

## Common operation in sets

The most common operations on sets are:

* adding an element:

In [67]:
s3.add('g')
s3

{'a', 'b', 'c', 'd', 'e', 'f', 'g'}

* removing elements

In [68]:
s3.remove('a')
s3

{'b', 'c', 'd', 'e', 'f', 'g'}

however, if we try to remove an element that doesn't exist we get a `KeyError`; this can be avoided using the `discard` method.

In [69]:
s3.discard('a')

To remove elements we could also use the `pop` method, but the element popped is arbitrary, and we get a `KeyError` exception if the set is empty.

In [70]:
s3.pop()

'e'

Instead, if we want to empty the set we have the `clear` method:

In [71]:
s3.clear()
s3

set()

To copy a set, as for any iterable, we have methods for shallow copies like:

In [72]:
s2 = s1.copy()
s2 = set(s1)
s2 = {*s1}

or, for a deep copy, we have the `copy.deepcopy` function, which takes care of all the checks and the recursion required.

## Frozen Sets

Frozen sets are the immutable version of a set; as a matter of fact they are hashable and, as a consequence, can be used as keys in dictionaries and can contain other frozen sets.

To create a frozenset we simply need to call the corresponding constructor, `frozenset`.

We can carry out set operations between sets and frozensets, and the resulting type will depend on the first operand.
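
A minimal sketch of both properties (hashability and the type of mixed operations); the names `fs`, `lookup` and `nested` are only illustrative:

```python
fs = frozenset('abc')
s = set('bcd')

# A frozenset is hashable, so it can be a dictionary key or a set element
lookup = {fs: 'letters'}
nested = {frozenset({1, 2}), frozenset({3})}

# In mixed operations the result takes the type of the left operand
left_frozen = fs | s      # frozenset
left_set = s | fs         # set
```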

back to [TOC](#TOC)

# Serialization and Deserialization
---

Serialization is a procedure used to store and transmit data both during and after program execution. A classical example is a REST API, where data is transmitted in JSON format: it has to be serialized into that format and eventually deserialized before we can operate on it. In principle we can serialize any kind of object; what is possible mainly depends on the serialization mechanism we adopt. Python, for example, has a specific serialization mechanism called `pickle`, which uses a binary (non-human-readable) representation. Other examples are databases (SQL and NoSQL) or JSON; text-based formats such as JSON are more limited in the datatypes they can serialize/deserialize, but have the advantage of being human readable.


## Pickle
Pickle is a Python-specific serialization/deserialization (`marshalling`) tool which creates and loads object representations in binary format.

A side effect of pickling/unpickling is that we lose the object's id; therefore an object restored from a pickle will still compare equal (`==`) to the original but will not be identical (`is`).

Some objects may hold state that pickle cannot serialize natively, like an open file handle; to solve this problem we need to write, in the class that holds the unserializable attribute, a custom `__reduce__` method (or the `__getstate__`/`__setstate__` pair) that tells pickle how to treat it.

N.B. when deserializing, pickle can actually execute code, therefore it is important to unpickle only data that we trust.
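
As a minimal sketch of the round trip (using only built-in types; the `original` dictionary is just sample data), we can see both the binary representation and the equal-but-not-identical behavior:

```python
import pickle

original = {'name': 'Python', 'versions': [2, 3], 'tags': {'a', 'b'}}

blob = pickle.dumps(original)      # bytes, not human readable
restored = pickle.loads(blob)      # only unpickle data you trust!

equal = restored == original       # True: same content
identical = restored is original   # False: it is a new object
```

Note that, unlike JSON, pickle preserves Python-specific types such as the set stored under `'tags'`.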

## JSON

JSON stands for **J**ava**S**cript **O**bject **N**otation and it is a text-based object serialization format, thus human readable. It is the standard for APIs and for web/systems communication in general.

Being string based, it is limited in the data types it can serialize:

* strings: **only** with double quotes -> "python"
* numbers: JSON itself makes no distinction between integers, floats, exponentials etc., although many JSON deserializers (Python's included) are able to recognize more specific datatypes
* booleans: true, false
* arrays: essentially lists; the order matters!
* dictionaries (objects): the keys must be strings and the values can be any supported datatype (unordered)
* empty value: `null`


The fact that JSON is a string format with limited datatypes, while we may want to serialize many different types of objects, can create problems. How can we serialize a Python set? Python dictionary keys only need to be hashable, not to be strings: how does this fit with JSON dictionaries? To solve these problems we need to create our own `custom encoding rules`, and this can get complicated. This is why we usually fall back on third-party libraries that have already sorted out the problem, like `marshmallow`.

To work with JSON in Python we need to import the `json` module and then use the `dump` and `load` functions to serialize and deserialize to and from files (or `dumps` and `loads` if we want to work with the string representation).

In [96]:
import json
from pprint import pprint

json_data = '''
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

'''

In [98]:
json_py = json.loads(json_data)
pprint(json_py)

{'menu': {'id': 'file',
          'popup': {'menuitem': [{'onclick': 'CreateNewDoc()', 'value': 'New'},
                                 {'onclick': 'OpenDoc()', 'value': 'Open'},
                                 {'onclick': 'CloseDoc()', 'value': 'Close'}]},
          'value': 'File'}}
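
Going the other way, the following sketch (with made-up sample data) shows how `dumps` maps Python types onto the limited JSON datatypes, and what is lost in a round trip:

```python
import json

data = {1: 'int key', 'point': (1, 2), 'flag': True, 'missing': None}

text = json.dumps(data)
# '{"1": "int key", "point": [1, 2], "flag": true, "missing": null}'

round_trip = json.loads(text)
# the int key came back as the string '1' and the tuple as a list

try:
    json.dumps({'letters': {'a', 'b'}})   # sets have no JSON equivalent
except TypeError as exc:
    set_error = str(exc)
```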


## Custom Encoding
So what if we want to JSON-encode something that is not supported by JSON, let's say an instance of a custom class? Well, essentially we need to create our custom encoding function. The `dump` function has an option called `default`, which takes a single-argument callable that gets called any time Python encounters a type that cannot be serialized to JSON.

Let's see an example: imagine we want to serialize a date generated by the datetime module, an object that cannot be serialized to JSON.

In [141]:
import json
from datetime import datetime

now = datetime.utcnow()
now

datetime.datetime(2022, 6, 9, 14, 31, 10, 833635)

In [142]:
try:
    test = json.dumps(now)
except TypeError as exc:
    print(f'{exc.__class__.__name__} : {exc}')

TypeError : Object of type datetime is not JSON serializable


Therefore, we need to create our custom encoder; we will use the format `YYYY-MM-DDTHH:MM:SS`, i.e. year-month-day, `T` as a separator before the time, then hours\:minutes\:seconds.

This is close to what we get from the string representation of datetime, which uses a space instead of the `T` and also includes the microseconds:

In [143]:
str(now)

'2022-06-09 14:31:10.833635'

So we are going to define a single-argument function that converts our datetime object, and pass it as `default` to the JSON encoder:

In [144]:
def format_iso(dt: datetime):
    return dt.strftime('%Y-%m-%dT%H:%M:%S')

In [145]:
test = json.dumps(now, default=format_iso)
print(test)

"2022-06-09T14:31:10"


But what if we need more than one custom encoder at a time? Then we have two alternatives: define a function that checks the type of the argument passed and selects the proper custom encoding or, more efficiently, use the [`functools.singledispatch`](#Single-Dispatch-Generic-Functions) decorator.
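
A minimal sketch of the `singledispatch` approach could look like the following; the function name `to_json` and the choice of representations (ISO string, sorted list, string) are assumptions made for the example, not a standard:

```python
import json
from datetime import datetime
from decimal import Decimal
from functools import singledispatch

@singledispatch
def to_json(obj):
    # fallback for every type we have not registered
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

@to_json.register(datetime)
def _(obj):
    return obj.isoformat()

@to_json.register(set)
def _(obj):
    return sorted(obj)   # assumes the elements can be ordered

@to_json.register(Decimal)
def _(obj):
    return str(obj)

payload = {'when': datetime(2022, 6, 9, 14, 31, 10),
           'tags': {'b', 'a'},
           'price': Decimal('9.99')}

encoded = json.dumps(payload, default=to_json)
# '{"when": "2022-06-09T14:31:10", "tags": ["a", "b"], "price": "9.99"}'
```

Supporting a new type then only requires registering one more small function, without touching the existing ones.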

Beyond the `default` argument, there are other tweaks we can use with the `dump` function:
* `skipkeys`: defaults to False; if True, dictionary keys that are not of a basic type (`str`, `int`, `float`, `bool`, `None`) are skipped instead of raising a `TypeError`
* `indent`: specifies the indentation, to improve human readability
* `separators`: a tuple `(item_separator, key_separator)`; the default is `(', ', ': ')` (comma with space, colon with space) when `indent` is None, and `(',', ': ')` otherwise. The spaces improve human readability, but we may want to remove them, e.g. `(',', ':')`, to save space when transmitting the JSON.
* `sort_keys`: defaults to False; if True, the output dictionaries are sorted by key.
* `cls`: it lets us specify our custom `JSONEncoder`
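
The effect of these options can be seen in a short sketch (the dictionaries are made-up sample data):

```python
import json

d = {'b': 1, 'a': [1, 2]}

# most compact form: sorted keys, no spaces after the separators
compact = json.dumps(d, sort_keys=True, separators=(',', ':'))
# '{"a":[1,2],"b":1}'

# human-friendly form: one element per line, indented by two spaces
pretty = json.dumps(d, sort_keys=True, indent=2)

# the tuple key is silently dropped instead of raising TypeError
skipped = json.dumps({'ok': 1, (1, 2): 'tuple key'}, skipkeys=True)
# '{"ok": 1}'
```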

### JSONEncoder Class
Python uses an instance of the `JSONEncoder` class, from the `json` module, to serialize data. It shares basically all its arguments with the `dump` function, but it becomes useful as a single place where we keep our encoding settings. Imagine that in our file we have many `dump` calls and some time later we need to change something in our encoding strategy, let's say the indentation, for whatever reason. In this case we would need to modify every `json.dump` call. The smart approach instead is to always specify the `cls` argument, pointing to a variable that holds the encoder class (even the default one); in this way, if we later need to change the encoding strategy, we only need to modify one place, i.e. create our custom `JSONEncoder`.

To create a custom JSONEncoder we just need to subclass it, modify the arguments we need and fall back on the parent class for the others. Let's see an example where we use the default JSONEncoder as a variable:

In [146]:
import json
default_encoder = json.JSONEncoder
json.dumps((1,2,3), cls=default_encoder)

'[1, 2, 3]'

Now lets create a custom encoder:

In [147]:
from datetime import datetime

class CustomEncoder(json.JSONEncoder):
    # json.dumps passes its own settings to the encoder as keyword arguments: we accept
    # them with *args and **kwargs (otherwise python will complain) and deliberately ignore them
    def __init__(self, *args, **kwargs): 
        # redefine our custom arguments
        super().__init__(skipkeys=True,
                         allow_nan=False,
                         indent='---',
                         separators=('', ' = '))
        
    # define the custom encoding rules
    def default(self, arg):
        if isinstance(arg, datetime):
            return arg.isoformat()
        else:
            # delegate back to parent class
            return super().default(arg)        

In [148]:
d = {
    'time': datetime.utcnow(),
     1+1j: 'complex will be skipped by skipkeys=True',
     'name': 'Python'
    }

print(json.dumps(d, cls=CustomEncoder))

{
---"time" = "2022-06-09T14:31:14.307764"
---"name" = "Python"
}


## Custom Decoding
Decoding, or deserialization, is performed with the `json.load` function and it works out-of-the-box with standard JSON datatypes, but not with specialized objects, like datetime, for which it will only return a string. If we want to recover particular objects while decoding, we need to work around the JSON limitations.

One approach is to encode using a scheme that defines not only the value but also its type, in order to have that information ready when decoding. We could encode a single object as a dictionary that contains two keys, one for the value and one for the object type. After decoding to a dictionary, we can iterate over it and replace the two key-value pairs with the actual object. But this is a tedious approach, and difficult to carry out recursively.

A simplification of this approach is the `object_hook` argument of the `load` function; it takes a function that is called each time the decoder encounters a dictionary, from the innermost to the outermost (root), receiving the dictionary and returning a substitute value/object. In this way the recursion is handled for us.
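
The type-tagging scheme combined with `object_hook` can be sketched as follows; the `'__type__'`/`'value'` keys and the helper names `encode_typed`/`decode_typed` are conventions invented for this example:

```python
import json
from datetime import datetime

def encode_typed(obj):
    # store the value together with a tag that names its type
    if isinstance(obj, datetime):
        return {'__type__': 'datetime', 'value': obj.isoformat()}
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

def decode_typed(d):
    # called for every decoded dictionary, innermost first
    if d.get('__type__') == 'datetime':
        return datetime.fromisoformat(d['value'])
    return d

event = {'name': 'release', 'meta': {'created': datetime(2022, 6, 9, 14, 31, 10)}}

text = json.dumps(event, default=encode_typed)
restored = json.loads(text, object_hook=decode_typed)
```

Since the hook is applied to the nested dictionaries before the outer ones, the datetime is rebuilt no matter how deep it sits in the structure.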

An alternative is `object_pairs_hook`: instead of the deserialized dictionary, it passes a list of tuples containing the key-value pairs, in the order in which they appear in the document (a list is guaranteed to retain order, which was not the case for dictionaries before Python 3.7). If both hooks are specified, `object_pairs_hook` takes priority.

There are other useful arguments of the `load` function that take care of decoding specific values; these can be used together with *object_hook*, and they are applied first, since the hook receives an already parsed dictionary:
* `parse_float`
* `parse_int`
* `parse_constant`
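
A short sketch of `parse_float` and `object_pairs_hook` at work (the JSON text, with its deliberately duplicated key, is made up for the example):

```python
import json
from decimal import Decimal

text = '{"price": 0.1, "qty": 3, "price": 0.2}'

plain = json.loads(text)
# {'price': 0.2, 'qty': 3} -> for a repeated key the last value wins

exact = json.loads(text, parse_float=Decimal)
# floats are built with Decimal instead of float

pairs = json.loads(text, object_pairs_hook=list)
# [('price', 0.1), ('qty', 3), ('price', 0.2)] -> order and duplicates preserved
```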

## JSONDecoder Class
Like for the encoder, we have a similar class for decoding that we can customize and use as the `cls` argument of the `load` function. However, while for encoding we overrode the `default` method only for the specific types we wanted to custom-encode, leaving the rest to the parent class, in decoding we override `decode`, which receives the JSON string as a whole; therefore, we need to fully parse the text and return the object we want... much more work to do! A smart way to approach this is to write our custom decoding class and inside it use the default `json.loads` as a first step, to get at least the basic Python objects (lists, dictionaries etc.), and then work on this result with plain Python.
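
A minimal sketch of this idea follows. The class name `DateTimeDecoder` and the rule "string values under keys ending in `_at` are ISO datetimes" are assumptions made up for the example; the first parsing step is delegated to the parent `decode` method, which plays the role of the default `json.loads`:

```python
import json
from datetime import datetime

class DateTimeDecoder(json.JSONDecoder):
    def decode(self, s, *args, **kwargs):
        # step 1: let the parent class build the basic python objects
        obj = super().decode(s, *args, **kwargs)
        # step 2: walk the result and convert what we recognize
        return self._convert(obj)

    def _convert(self, obj):
        if isinstance(obj, dict):
            return {k: self._parse(k, v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [self._convert(item) for item in obj]
        return obj

    def _parse(self, key, value):
        if key.endswith('_at') and isinstance(value, str):
            return datetime.fromisoformat(value)
        return self._convert(value)

text = '{"name": "job", "runs": [{"started_at": "2022-06-09T14:31:10"}]}'
decoded = json.loads(text, cls=DateTimeDecoder)
```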

## JSON Schema
https://json-schema.org/

Aside from third-party libraries specialized in decoding/encoding JSON, the best way to approach decoding is to define a schema, i.e. a pattern that the data we expect to receive (e.g. from an API call) must be consistent with.

A JSON schema is essentially a dictionary that describes each component of the data we are expecting to decode. Let's see an example; imagine we receive from an API call a series of information describing a person, which we want to decode into a Python object with some properties, and the following is the expected template:

```json
{
    "firstName": "...",
    "lastName":"...",
    "age":"..."
}
```

Given this expected incoming data format we can create our dictionary schema:

In [7]:
person_schema = {
    "type":"object",
    "properties":{
        "firstName":{"type":"string"},
        "lastName":{"type":"string"},
        "age":{"type":"number"},
    }
}

But we are still very generic in our schema: what if an empty string is passed as a name? What if a negative number is passed as age? Well, we can specify the schema requirements even further:

In [24]:
person_schema = {
    "type":"object",
    "properties":{
        "firstName":{"type":"string",
                    "minLength":1
        },
        "lastName":{"type":"string",
                   "minLength":1
        },
        "age":{"type":"integer",
              "minimum":0
        }
    },
    "required": ["firstName", "lastName"]
}

And there are many more properties, limitation and specifications that can be added to our schema, depending on the usecase.

Ok, now that we have our JSON schema set up, we need to use it! To do this we leverage the module `jsonschema`. Given a dummy API call that results in a person, we will try to validate it with our schema.

In [23]:
p1 = '''{
    "firstName": "Giovanni",
    "lastName":"Frison",
    "age": 32
}'''


p2 =  '''{
    "firstName": "Giovanni",
    "lastName":"Frison",
    "age": "Unknown"
}'''

p3 =  '''{
    "firstName": "Giovanni",
    "age": 32
}'''

In [19]:
from jsonschema import validate
from jsonschema.exceptions import ValidationError
from json import loads, dumps, JSONDecodeError

def my_json_schema(json_doc):
    try:
        # validate() wants a Python object (here a dictionary), therefore
        # we first need to deserialize the JSON document with loads()
        validate(loads(json_doc), person_schema)
    # We need to catch two possible errors: first, the document is not proper JSON;
    # second, the document is not compliant with our schema
    except JSONDecodeError as ex:
        print(f'Invalid JSON: {ex}')
    except ValidationError as ex:
        print(f'Validation error: {ex}')
    else:
        print(json_doc)
        print('JSON is a valid and compliant schema')
    

In [20]:
my_json_schema(p1)

{
    "firstName": "Giovanni",
    "lastName":"Frison",
    "age": 32
}
JSON is a valid and compliant schema


In [21]:
my_json_schema(p2)

Validation error: 'Unknown' is not of type 'integer'

Failed validating 'type' in schema['properties']['age']:
    {'minimum': 0, 'type': 'integer'}

On instance['age']:
    'Unknown'


In [22]:
my_json_schema(p3)

Validation error: 'lastName' is a required property

Failed validating 'required' in schema:
    {'properties': {'age': {'minimum': 0, 'type': 'integer'},
                    'firstName': {'minLength': 1, 'type': 'string'},
                    'lastName': {'minLength': 1, 'type': 'string'}},
     'required': ['firstName', 'lastName'],
     'type': 'object'}

On instance:
    {'age': 32, 'firstName': 'Giovanni'}


So far so good, but with this approach validation stops as soon as an exception is raised, hence we only see the first error and we might need to iterate many times until the data fits our schema. To solve this, the jsonschema library provides the class `Draft4Validator`, whose `iter_errors` method lets us inspect all the errors at once:

In [26]:
p4 =  '''{
    "firstName": "Giovanni",
    "age": "32"
}'''

from jsonschema import Draft4Validator

validator = Draft4Validator(person_schema)

for error in validator.iter_errors(loads(p4)):
    print(error, end= '\n------------------\n')

'32' is not of type 'integer'

Failed validating 'type' in schema['properties']['age']:
    {'minimum': 0, 'type': 'integer'}

On instance['age']:
    '32'
------------------
'lastName' is a required property

Failed validating 'required' in schema:
    {'properties': {'age': {'minimum': 0, 'type': 'integer'},
                    'firstName': {'minLength': 1, 'type': 'string'},
                    'lastName': {'minLength': 1, 'type': 'string'}},
     'required': ['firstName', 'lastName'],
     'type': 'object'}

On instance:
    {'age': '32', 'firstName': 'Giovanni'}
------------------


## Marshmallow

https://marshmallow.readthedocs.io/en/stable/

Marshmallow is a third-party library devoted to encoding and decoding (not only JSON). It does a ton of things, but essentially it has a lot of built-in utilities to build specific schema classes to handle JSON.

**Re-watch lecture 57 of deep dive part 3 for more info**

## PyYaml

https://pyyaml.org/wiki/PyYAMLDocumentation

Library to work with .yml files. Be careful: loading YAML with an unsafe loader can construct arbitrary Python objects (much like unpickling), so only load files whose source you know, or use `yaml.safe_load`.

**Re-watch lecture 58 of deep dive part 3 for more info**

## Serpy

https://serpy.readthedocs.io/en/latest/

It does only the serialization, but much faster than marshmallow.

**Re-watch lecture 59 of deep dive part 3 for more info**

back to [TOC](#TOC)

# Unpacking iterables
Iterables are `packed` structures that bundle values together (list, tuple, string, set, dictionary...). As the word suggests, `unpacking` is an operation that assigns the packed values to variables:

In [None]:
a, b, c = [1,2,3]

a,b,c

In [None]:
a, b, c, d = 'ciao'

a, b, c, d
# N.B. we can unpack a set or a dictionary in the same way; for a set the order of assignment is arbitrary, while a dictionary yields its keys in insertion order (Python 3.7+).


It comes handy when we want to swap values between variables:


In [None]:
a = 10
b = 20

a, b = b, a

a, b
# this works in Python because the RHS is evaluated first: the references to the objects named "b" and "a" are packed into a tuple, and only afterwards assigned to the (swapped) names "a" and "b".

## Unpacking with *
We may want to unpack part of an iterable into one variable and the rest into another; here the `*` operator comes in handy:

In [None]:
l = [1,2,3,4]
# we want to unpack the first element of l and the others apart
# we could do it simply with indexing and list slicing
a, b = l[0], l[1:]

# or in a more elegant way with the * operator
a, *b = l # a=1, b =[2,3,4] 

a, *b, c = l # a=1, b=[2,3], c=4

a, b, c

In [None]:
a, *b, *c = l # ERROR: only one starred target is allowed, otherwise python can't decide how to split the elements

Another advantage of the `*` operator is that it can also be used with objects that don't support slicing (like sets or dictionaries). N.B. whatever is captured by the starred variable always ends up in a list, even if it holds zero or one element and even if the object being unpacked is, for example, a tuple.
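
A few quick cases illustrating the "always a list" rule (the variable names are arbitrary):

```python
first, *rest = (1, 2, 3)     # rest == [2, 3]: a list, even though the source is a tuple
head, *tail = 'py'           # tail == ['y']: a single captured element is still a list
only, *empty = [42]          # empty == []: nothing left to capture

x, *others = {10, 20, 30}    # works on a set too, but which element lands in x is arbitrary
```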

The `*` operator can be used also for unpacking objects on the RHS:

In [None]:
l1 = [1,2]
l2 = [3,4]
l = [*l1, *l2] # l=[1,2,3,4]

l

With dictionaries we have both keys and values that can be unpacked (in insertion order from Python 3.7; older versions give no ordering guarantee). With the `*` operator we unpack only the keys of the dict, while with `**` we unpack both keys and values (N.B. `**` can be used only on the RHS).

In [None]:
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d = {*d1, *d2} # d={'a','b','c'} -> n.b. b was not repeated because keys are unique in sets and dictionaries.

d

In [None]:
d = {**d1, **d2} # d={'a': 1, 'b': 3, 'c': 4} -> n.b. b has the value contained in d2, since d2 was unpacked second and overwrote the key coming from d1.

d

## Nested unpacking
We can unpack also nested structures, such as list of lists, with the same operators.

In [None]:
a, *b, (c, *d) = [1, 2, 3, 'python']

a, b, c, d

back to [TOC](#TOC)

# Loops
---

## While loop
to generate an infinite loop:

In [None]:
while True:
    print('Infinite Loop')
    break # to stop infinite loop

The `else` clause is executed after a while loop only if the loop terminates without a `break`

`continue` interrupts the current iteration and restarts the loop from the next one. If the `continue` is inside a `try` block, its `finally` clause is still executed before the next iteration starts.
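
The interplay of `break`, `continue`, `else` and `finally` can be sketched with a small helper (`first_multiple` is a hypothetical function written only for this illustration):

```python
def first_multiple(numbers, divisor):
    """Return the first multiple of divisor (or None) and a log of what happened."""
    i = 0
    result = None
    log = []
    while i < len(numbers):
        n = numbers[i]
        i += 1
        try:
            if n % divisor != 0:
                continue        # skip the rest of this iteration...
            result = n
            break               # ...or leave the loop altogether
        finally:
            log.append(n)       # runs in both cases
    else:
        log.append('no break')  # runs only if the loop ended without break
    return result, log

found = first_multiple([1, 2, 3, 4], 3)   # (3, [1, 2, 3])
missing = first_multiple([1, 2], 5)       # (None, [1, 2, 'no break'])
```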

## Try statement
`try` tests a code block.

`except` captures errors and handles exceptions

`finally` is a code block that is always executed, whether or not an exception is raised or a `break` is invoked
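
A minimal sketch of the order in which the clauses run; `safe_divide` is a made-up helper, and the example also uses the optional `else` clause of `try`, which runs only when no exception was raised:

```python
def safe_divide(a, b):
    steps = []
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append('except')    # only when the division fails
        result = None
    else:
        steps.append('else')      # only when no exception was raised
    finally:
        steps.append('finally')   # always
    return result, steps

ok = safe_divide(1, 2)      # (0.5, ['else', 'finally'])
failed = safe_divide(1, 0)  # (None, ['except', 'finally'])
```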

### Common exceptions

- ZeroDivisionError

-

back to [TOC](#TOC)

# Functions
---


A first semantic distinction in functions is between `parameters` and `arguments`: the former are the variables named in the function definition, while the latter are the values passed in a call to the function.


In [None]:
def my_func(a, b): # a and b are the parameters of the function
  pass

x = 10
y = 'a'

my_func(x, y) # x and y are the arguments of the function

my_func

Note that x and y are passed by `reference` to `my_func`, i.e. a and b refer to the same objects (the same memory addresses) as x and y; no copy of the objects is made.
Therefore, the `Function Scope` contains references to the objects that are passed to the function (the ones named x and y in the example).
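
A short sketch of what this implies in practice (the functions `append_one` and `rebind` are hypothetical): mutating the object through the parameter is visible to the caller, while rebinding the parameter name is not.

```python
def append_one(items):
    items.append(1)      # mutates the very object the caller passed
    return id(items)

def rebind(items):
    items = [99]         # rebinds the local name only; the caller's object is untouched
    return items

data = []
same_object = append_one(data) == id(data)   # True
rebound = rebind(data)
# data is still [1], rebound is the new list [99]
```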

Another pythonic distinction is between `functions` and `methods`. They are defined in the same way, but a method is bound to a class: it is a callable attribute of the class that, when accessed through an instance, receives that instance as its first argument.

In [None]:
from inspect import ismethod, isfunction

ismethod, isfunction
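
For instance, in the following sketch (the names `greet` and `Greeter` are illustrative) the same `def` produces a plain function, which only becomes a bound method when accessed through an instance:

```python
from inspect import ismethod, isfunction

def greet():
    return 'hello'

class Greeter:
    def greet(self):
        return 'hello'

g = Greeter()

checks = {
    'plain function': (isfunction(greet), ismethod(greet)),                  # (True, False)
    'accessed on the class': (isfunction(Greeter.greet), ismethod(Greeter.greet)),  # (True, False)
    'accessed on an instance': (isfunction(g.greet), ismethod(g.greet)),     # (False, True)
}
```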

## Docstrings and annotations (PEP 257)
N.B. The docstring has to be the first statement in the function/class body, otherwise it won't be stored in the `__doc__` attribute and won't be displayed by the `help()` function.

Docstrings (single-quoted or triple-quoted strings) are the way to generate documentation inside Python code. They differ from comments (#) in that they are actually compiled by the interpreter and stored in the `__doc__` attribute of functions and classes. The content of `__doc__` is displayed by the `help()` function for any object that defines it.

In [None]:
def my_func():
  '''
  Here goes the docstring, which contains
  the instructions on function usage and on argument types and boundaries.
  It is stored in the `__doc__` attribute
  and displayed by the `help()` function.
  '''
  pass

help(my_func)

Another way to document our code is to use annotations. These are not stored in the `__doc__` attribute but are still displayed by the `help()` function. Annotations can be any expression, including function calls, and are normally evaluated once, when the function is defined; however, they don't bind the code to a specific behavior (`a: int` doesn't force a to be an int): they are only metadata stored in the `__annotations__` attribute, a dictionary with parameter names as keys and annotations as values. These can be used by external tools like `Sphinx` to automatically generate documentation for our code.

In [None]:
def my_func(a: 'string', b: 'integer') -> 'a string':
  return a*b
  
my_func.__annotations__

## `lambda` expression
`lambda` expressions are another way to create functions, without the `def` statement. They are also referred to as `anonymous functions`. The body has to be a single expression, therefore no assignment statement is allowed, nor are type hints (annotations).

In [None]:
# lambda [parameters list]: expression
lambda x: x**2
lambda x, y: x + y
lambda : 'hello' # we can assign 0 parameters and just return a constant

# we can assign lambda function to variable and later call it
my_func = lambda x: x**2

type(my_func), my_func(3)

The lambda expression generates a `function object` that, when called, evaluates the expression and returns its value.
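
A typical use of lambdas is as short, throwaway arguments to other functions, for example as the `key` of `sorted` (the word list is sample data):

```python
words = ['banana', 'Apple', 'cherry']

by_lower = sorted(words, key=lambda w: w.lower())   # case-insensitive order
by_last = sorted(words, key=lambda w: w[-1])        # order by last letter

# a lambda is an ordinary function object, with a generic name
name = (lambda x: x).__name__                       # '<lambda>'
```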

## Function Introspection
Functions are first-class objects that, when created, come with a series of default dunder attributes in addition to the ones that we define. To look at all the attributes of a function we can use the built-in function `dir()`. Among the dunder attributes we have:

In [None]:
# this comment will be returned by
# inspect.getcomments(func)
def func(a, b, c='hello'):
  pass

introspection = {
'name' : func.__name__, # the name of the function
'default args' : func.__defaults__, # tuple containing the default values of the positional parameters
'default kwargs' : func.__kwdefaults__, # dictionary containing the default values of the keyword-only parameters
'code object' : func.__code__, # returns a code object, which has its own attributes:
'variables names' : func.__code__.co_varnames, # returns the parameters followed by the local variables (defined inside the function scope)
'num of arguments' : func.__code__.co_argcount, # returns the number of positional parameters (excluding keyword-only ones, *args and **kwargs)
}

for key, arg in introspection.items():
  print(f'{key} : {arg}')

Also the module `inspect` can be used to retrieve information about the function:

In [None]:
import inspect

inspect.getcomments(func) # returns the comment just above the function definition

## \*args and **kwargs
Unpacking can also be used in function parameters, in order to accept a variable number of arguments as input:

In [None]:
def my_func(a, b, *args): # N.B. the name 'args' is just a convention
  return a, b, args

a, b, c = my_func(1,2,3,4) # a=1, b=2, c=(3,4)
# note that inside the function scope the extra arguments are packed into a tuple, not a list

a, b, c

The variadic positional parameter `*args` has to be the last positional parameter of the function, since it exhausts all the remaining positional arguments; after it only keyword arguments are allowed, and any that are not named explicitly can be collected with the `**kwargs` parameter. A bare `*` can be used to force the parameters that follow it to be passed by keyword.

In [None]:
def my_func(*, name):
  # in this way my_func doesn't allow positional arguments.
  # name is automatically a keyword argument.
  pass

def my_func(a, *, name):
  # in this way my_func allows only one positional argument `a` and requires the keyword argument `name`.
  # since `name` is placed after the * it is a keyword-only argument
  pass


def my_func(*, name, **kwarg): # OK
  pass


In [None]:
def my_func(*, **kwarg): # ERROR an explicit keyword argument is required after the `*`
  pass
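
Putting `*args` and `**kwargs` together, a small sketch (the function `describe` and its parameter names are illustrative) shows where each argument ends up, and that the same operators unpack arguments at the call site:

```python
def describe(a, *args, sep='-', **kwargs):
    return a, args, sep, kwargs

result = describe(1, 2, 3, sep='+', x=10, y=20)
# (1, (2, 3), '+', {'x': 10, 'y': 20})

# * and ** also work in the call, unpacking a sequence and a dictionary
positional = (1, 2, 3)
named = {'x': 10}
same_shape = describe(*positional, **named)
# (1, (2, 3), '-', {'x': 10})
```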


## Parameters default
Care must be taken when assigning default values to function parameters, in particular if these are mutable objects. When Python executes the `def` statement it creates the function object and evaluates, once, any default values, storing them with the function. This means that each time the function is called without that argument, it reuses the very same stored object. In some cases this results in unwanted behavior:

In [None]:
# creating a function that store a message in a log file with the datetime
from datetime import datetime
import time

def log(msg, *, dt=datetime.utcnow()):
  print(f'{dt}: {msg}')

# Now, since the value of dt was computed once, when the function was defined, each time we call
log('first log')
time.sleep(3)
log('second log')
# we will see that the time printed is always the same: it was stored at definition time and is reused as if it were a CONSTANT!

In [None]:
# SOLUTION
def log(msg, *, dt=None):
  dt = dt or datetime.utcnow() # if dt is falsy (None) the right-hand side of 'or' is evaluated
  print(f'{dt}: {msg}')
# we set dt=None and check whether the user actually passed a value for dt. If not, the function calls datetime.utcnow().
# Since this call is performed in the function body, it gets executed each time the function is called.

log('first log')
time.sleep(3)
log('second log')

Another example is when we create a mutable object directly as the default value of a parameter. Also in this case the object is created once, at definition time, and the same reference is reused each time the function is called. This means that any mutation made during one call is still visible in the next one:

In [None]:
# create a function that store values into a list
def add_item(item, func_list=[]):
  func_list.append(item)
  return func_list

my_list1 = add_item('banana') # the returned list is the very object stored as default of `func_list`
# so if I create another list of items
my_list2 = add_item('coca')
# now we have:
# my_list1 = ['banana', 'coca']
# my_list2 = ['banana', 'coca']
# because they both reference the same default list!

my_list1, my_list2

In [None]:
# SOLUTION
def add_item(item, func_list=None):
  if func_list is None: # `func_list or list()` would also replace an empty list passed by the caller
    func_list = []
  func_list.append(item)
  return func_list

my_list1 = add_item('banana')
my_list2 = add_item('coca')

my_list1, my_list2

`KEY TAKE-AWAY`: Never use mutable objects as `default` arguments. Instead use None and create the object in the function scope. The only time it can come in handy is when using `memoization` to cache values from a function that is executed multiple times.
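
The memoization exception mentioned above can be sketched as follows; the function `fib` and the parameter name `_cache` are illustrative, and the shared default dictionary is used on purpose:

```python
def fib(n, _cache={0: 0, 1: 1}):
    # _cache is created once, at definition time, and shared by every call
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]

value = fib(30)                        # 832040
cached_entries = len(fib.__defaults__[0])   # the cache lives in the function's defaults
```

In real code the standard library decorator `functools.lru_cache` achieves the same result more explicitly.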

## Map, Filter and Zip functions
N.B. Map and Filter have been mostly replaced by list comprehensions and generator expressions.

These are `higher order functions` (i.e. functions that take a function as parameter and/or return a function). `map` returns an `iterator` that applies the function to each element of the iterable.


In [None]:
# map(func, *iterables)

def sq(x):
  return x**2

l = [1, 2, 3]

list(map(sq, l)), list(map(lambda x: x**2, l))
# map returns a lazy iterator, therefore we need to pass it to list() to see the values

The number of iterables provided to map() must match the number of parameters of the function passed; if more than one iterable is provided, map stops as soon as the shortest one is exhausted. `filter` is a function that takes a function and a single iterable and returns an iterator over the elements of the iterable that satisfy the condition given in the function, i.e. it filters the iterable.

In [None]:
l = [0,1,2,3,4,5]

list(filter(lambda n: n % 2 == 0, l)) # [0, 2, 4]
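
The section title also mentions `zip`: it pairs up the elements of several iterables and, like `map`, stops at the shortest one. A short sketch with sample data, including the comprehension equivalents of `map` and `filter`:

```python
names = ['a', 'b', 'c']
values = [1, 2]

pairs = list(zip(names, values))                            # [('a', 1), ('b', 2)]
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))   # [11, 22]

# the same results as map/filter, written as list comprehensions
squares = [x**2 for x in range(4)]                          # [0, 1, 4, 9]
evens = [x for x in range(6) if x % 2 == 0]                 # [0, 2, 4]

# map, filter and zip return iterators: they can be consumed only once
m = map(str, [1, 2])
first_pass = list(m)    # ['1', '2']
second_pass = list(m)   # []
```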

## Reducing Functions
Also called accumulators, aggregators or folding functions, these are functions that repeatedly combine the elements of an iterable, returning a single value (e.g. finding the max of an array, or summing up its elements).

In [None]:
# write a reducing function to compute max, min and sum of an iterable.
l = [5, 8, 6, 10, 9]

add = lambda a, b: a+b
find_max = lambda a, b: a if a > b else b
find_min = lambda a, b: a if a < b else b

def _reduce(fn, sequence):
  result = sequence[0]
  for x in sequence[1:]:
    result = fn(result, x)
  return result

_reduce(find_max, l), _reduce(find_min, l)

The function `_reduce` takes two arguments: a function (add, find_max or find_min) and a sequence of numbers. It applies the function cumulatively and returns a single value (the sum, the max or the min), depending on the function passed.
The standard library module `functools` contains a `reduce` function similar to the one defined above, but it works on any iterable, including non-indexable ones.

In [None]:
from functools import reduce

l = [5, 8, 6, 10, 9]

reduce(find_max, l), reduce(lambda a, b: a if a > b else b, l) # find max of l

`reduce` has a third argument, called 'initializer', that serves as the first value for the reduction. This avoids the runtime error raised when, for example, we try to apply a sum-reduce to an empty list.
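A small sketch of the effect of the initial value, passed positionally as third argument. The variable names are chosen only for this example.

```python
from functools import reduce
from operator import add

empty = []

# without an initial value, reduce has nothing to return for an empty iterable
try:
    reduce(add, empty)
    raised = False
except TypeError:
    raised = True

# with an initial value, the empty case simply returns that value
total_empty = reduce(add, empty, 0)
total = reduce(add, [5, 8, 6, 10, 9], 0)

print(raised, total_empty, total)  # True 0 38
```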


Other builtin reducing functions in Python are:

In [None]:
red_func = {
'max': max(l),
'min': min(l),
'sum': sum(l),
'any': any(l), # returns True if at least one element in the sequence evaluates to True
'all': all(l), # returns True if all the elements in the sequence evaluate to True
} 

red_func

## Partial functions
Partial functions are a way to reduce the number of arguments required by a function by fixing some of them in advance. We could write a wrapper around the function ourselves or use the `functools.partial` function from the standard library.


In [None]:
# create a function that compute the power of a number

def pow(base, exponent):
  return base ** exponent

# we can create a partial function that fixes the exponent so that it always computes the square
def square(base):
  return pow(base, exponent=2)

# or we can use functools.partial
from functools import partial

square = partial(pow, exponent=2)

'''
N.B. the arguments given to partial are evaluated when the partial is created: if we pass a variable, the partial stores a reference to the object the variable points to at that moment, not to the variable name. Rebinding the variable afterwards therefore does not affect the partial function.
'''

a = 2
square = partial(pow, exponent=a)
# exponent references the object that 'a' pointed to (the value 2), not the name 'a' itself
a = 5

square(5) # -> 25: the exponent stored in the partial is still 2

## The `operator` module
The `operator` module is part of the Python standard library. Its main purpose is to provide functional equivalents of Python's operators (arithmetic, comparison, item access, etc.).

In [None]:
from functools import reduce
from operator import mul

# we have seen how we can use a lambda expression together with reduce to fold a sequence into a single value
reduce(lambda x, y: x*y, [1,2,3,4]) # returns the product of the elements of the list
# the same can be achieved with operator.mul
multiplication = reduce(mul, [1,2,3,4])

multiplication

There is a variety of functions in the operator module: for arithmetic/boolean operations (`mul()`, `add()`, `le()`, `is_()` ..), for sequence handling (`getitem()`, `setitem()`, `delitem()` ...) and for building callables (`itemgetter()`, `attrgetter()`, `methodcaller()`). These last ones don't return values but callable objects; each becomes an operator itself (a function, essentially) to be called on other objects.

In [None]:
from operator import itemgetter
l = [5, 8, 6, 10, 9]
f = itemgetter(1, 3) # create a function that returns the items at index 1 and 3

f(l) # -> (8, 10)
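The other two callable builders, `attrgetter` and `methodcaller`, follow the same pattern. Below is a minimal sketch; the `Point` class is a made-up example, not something defined elsewhere in this manual.

```python
from operator import attrgetter, methodcaller

class Point:  # hypothetical class, used only for this illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)

get_x = attrgetter('x')                      # get_x(obj) is equivalent to obj.x
get_xy = attrgetter('x', 'y')                # several names give a tuple
upper = methodcaller('upper')                # upper(obj) is equivalent to obj.upper()
split_on_comma = methodcaller('split', ',')  # extra arguments are forwarded to the method

print(get_x(p), get_xy(p))      # 3 (3, 4)
print(upper('python'))          # PYTHON
print(split_on_comma('a,b,c'))  # ['a', 'b', 'c']
```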

back to [TOC](#TOC)

# Classes
---
We have seen that everything in Python is an object, and classes are the quintessence of objects! We can think of classes as containers that store data (state, i.e. attributes) and functionality (behavior, i.e. methods). We can think of a class as the blueprint of an object; the objects created from that class are referred to as `instances` of that class. As a consequence, there is a huge difference between the state and behavior of a class and those of an instance of that class (actually, the creation of an instance is a behavior of the class!). 

Finally, since classes are objects that generate objects, how are classes created in the first place? The answer is the `metaclass` type, an advanced concept (metaprogramming) that we will investigate later.

## Attributes of classes

Classes, once created, come with a series of default attributes that Python generates for us, like `__name__` or `__doc__`; however, we can easily add attributes to a class simply by declaring variables inside its definition. We can access these variables with the dot notation or with the `getattr` function, to which we can pass a default value in case we request an attribute that doesn't exist (otherwise we would incur an `AttributeError`).

In [12]:
class MyClass:
    language = 'Python'
    version = 3.6

MyClass.language, getattr(MyClass, 'version', 'Not an attribute'), getattr(MyClass, 'name', 'Not an attribute')

('Python', 3.6, 'Not an attribute')

Similarly, we can set an attribute on the class with the dot notation or with the `setattr` function; if the attribute already exists we mutate the state of the class, otherwise we add a new one (ofc, Python is a dynamic language!).

To delete class attributes we can simply use the `del` statement or the `delattr` function (which raises an `AttributeError` if the attribute doesn't exist and, unlike getattr, accepts no default value).

In [14]:
MyClass.inventor = 'G.V.Rossum'
setattr(MyClass, 'version', 3.7)

MyClass.inventor, MyClass.version

('G.V.Rossum', 3.7)
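Deleting class attributes, as described above, can be sketched like this. The `Language` class is a throwaway example, so that `MyClass` is left untouched for the cells that follow.

```python
class Language:  # hypothetical class, used only for this illustration
    name = 'Python'
    version = 3.7

del Language.version         # statement form
delattr(Language, 'name')    # function form

print(hasattr(Language, 'version'), hasattr(Language, 'name'))  # False False

# deleting a missing attribute raises AttributeError
try:
    delattr(Language, 'name')
    failed = False
except AttributeError:
    failed = True

print(failed)  # True
```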

The state of a class is stored inside a dictionary, exposed through the builtin attribute `__dict__`, which returns a [mappingproxy](#MappingProxy): essentially a read-only view of the key-value pairs that represent the attributes and their stored values (since it cannot be modified directly, attributes can only be set through `setattr` or the dot notation, which forces their names to be strings).

In [15]:
MyClass.__dict__

mappingproxy({'__module__': '__main__',
              'language': 'Python',
              'version': 3.7,
              '__dict__': <attribute '__dict__' of 'MyClass' objects>,
              '__weakref__': <attribute '__weakref__' of 'MyClass' objects>,
              '__doc__': None,
              'inventor': 'G.V.Rossum'})

Attributes can be of any type, for example a function:

In [20]:
def say_hello():
    print('hello from PyFry!')
    
setattr(MyClass, 'say_hello', say_hello)

MyClass.say_hello()

hello from PyFry!


So, as we have said, classes are callables: Python makes every class callable for us, through the `__call__` method of its metaclass `type`. When a class is called, it returns an object whose type is the class itself, called an `instance of that class`. The instance of a class is an object itself and as such it has its own namespace and its own `__dict__`.

To recap, classes are callables (of type() == `type`) that return an instance of themselves, whose type is the class from which it was generated.

In [30]:
myObj = MyClass()
# myObj is an instance of the class MyClass  

type(MyClass), type(myObj), isinstance(myObj, MyClass)

(type, __main__.MyClass, True)

In [31]:
myObj.__dict__, myObj.language

({}, 'Python')

Looking at the instance dict we notice something strange: the dict itself is not a mappingproxy (but a simple dict that can be directly manipulated) and it is empty; however, we are able to retrieve the attribute *language* defined in the class. This is because Python first looks in the instance namespace and, if it finds the attribute, returns it; if not, it looks one level above, i.e. in the namespace of the class.

If instead we set an attribute on the instance, we populate the instance dict and not the class one, i.e. we create an `instance attribute`.

In [32]:
myObj.language = 'Java'

myObj.__dict__, myObj.language

({'language': 'Java'}, 'Java')

As expected, Python retrieves the instance attribute, since it looks first in the instance namespace. However, if we create a new instance of the class, its dictionary will be empty, since it is a different object that doesn't share its namespace with the other instances of the same class.
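The independence of the instance namespaces can be checked with a short sketch. The `Config` class and the instance names are made up for this example.

```python
class Config:  # hypothetical class, used only for this illustration
    language = 'Python'

first = Config()
second = Config()

first.language = 'Java'   # creates an instance attribute on `first` only

print(first.__dict__)     # {'language': 'Java'}
print(second.__dict__)    # {}
print(second.language)    # Python, still resolved on the class
print(Config.language)    # Python, the class attribute is untouched
```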

## Functions as instance attribute

We have seen that we can also set a function as a class attribute; however, unlike data-like attributes, functions are not passed as they are to the instances of that class. Accessing the function from an instance returns something else, i.e. a `bound method`: an object that holds a reference both to the function defined in the class and to the specific instance. But let's see this in action:

In [27]:
MyClass.say_hello # a simple function, an attribute of the class MyClass

<function __main__.say_hello()>

In [52]:
myObj.say_hello, hex(id(myObj)) # a method bound to the instance myObj of the class MyClass

(<bound method MyClass.say_hello of <__main__.MyClass object at 0x000001EE58F3F0D0>>,
 '0x1ee58f3f0d0')

In [53]:
type(MyClass.say_hello), type(myObj.say_hello)

(function, method)

An even more unexpected behavior happens when trying to call the function! If we call it on the class, where it is defined, then we have seen that the function is executed correctly; if we try to execute it on the instance we get a `TypeError`:

In [65]:
myObj.say_hello()

TypeError: say_hello() takes 0 positional arguments but 1 was given

Python says that the function takes no arguments, which is true according to its definition, but that we passed one argument! What's happening? And what is this `bound method` that Python created on the instance of the class?

`method` is an actual object type that, like a function, is callable, but it is bound to some object, **and that same object is passed to the method as first argument**. 

Therefore, when we call **myObj.say_hello()**, **say_hello** is a method bound to the object **myObj**, which is injected as the first parameter of the method itself! Essentially, what we are doing is similar to:

```py
MyClass.say_hello(myObj)
```
But of course this is not what we want to achieve, and we will sort this out soon.
However, the utility of having the method bound to the object is clear: now **say_hello** has access to the object namespace.

Being objects, `methods` have their own attributes, like `__self__`, which is the instance the method is bound to, and `__func__`, which is the original function defined in the class. Calling `myObj.say_hello(args)` essentially translates into calling `myObj.say_hello.__func__(myObj.say_hello.__self__, args)` where:

* `say_hello.__func__` is the function say_hello stored in the class MyClass
* `say_hello.__self__` is the instance to which the method is bound, i.e. `myObj`
* `args` are any other parameters that the function may take as input 

In [66]:
myObj.say_hello.__func__ , myObj.say_hello.__self__

(<function __main__.say_hello()>,
 <__main__.MyClass at 0x1ee58f3f0d0>)

Now, we need a way to take into account the extra argument `say_hello.__self__` that is automatically passed to the method when it is called from the instance, and this has to be done inside the class definition, otherwise we won't be able to call the function from the instance. To do this we need to define the function inside the class with at least one parameter, which will receive the instance once the function is bound to it.

In [67]:
class MyClass:
    def say_hello(instance_obj):
        print('hello from PyFry!')
    
myObj = MyClass()
myObj.say_hello()

hello from PyFry!


The function inside the class is often called an `instance method`, but in reality, until it is bound to an instance of the class, it is just a regular function. When we create `myObj` and access `say_hello` through it, the function becomes a method bound to that instance, and since it now accepts one argument (called `instance_obj` in this example), it can be called from the instance without raising the error:

**TypeError: say_hello() takes 0 positional arguments but 1 was given** because now we have one positional parameter that is filled by the bound instance itself, `say_hello.__self__`.

This is the same as calling:

In [68]:
MyClass.say_hello(myObj)

hello from PyFry!
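The equivalence between calling the bound method and calling its `__func__` with `__self__` can be verified directly. This is a minimal sketch; the `Greeter` class and the name `bound` are introduced only for this example.

```python
class Greeter:  # hypothetical class, used only for this illustration
    def say_hello(instance_obj, name):
        return f'hello {name}'

g = Greeter()
bound = g.say_hello   # accessing the function through the instance binds it

print(bound.__self__ is g)                    # True
print(bound.__func__ is Greeter.say_hello)    # True
print(bound('Ann'))                           # hello Ann
print(bound.__func__(bound.__self__, 'Ann'))  # hello Ann
```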


Of course, the functions defined inside the class can have their own parameters, as long as the first one is reserved for the bound instance. The power of an instance method is that it can access both the instance and the class namespaces thanks to the bound instance that is automatically passed as first argument, making communication between the instance and its class possible.

In [69]:
class MyClass:
    master = 'Pyfry'
    def say_hello(instance_obj, name):
        print(f'Hi {name}, greetings from {instance_obj.master}!')
    
myObj = MyClass()
myObj.say_hello('Giovanni'), myObj.__dict__

Hi Giovanni, greetings from Pyfry!


(None, {})

Again, Python looks for the attribute *master* first in the instance namespace and, since it doesn't find it there, looks it up in the class namespace. If we redefine the attribute *master* on the instance, then:

In [70]:
myObj.master = 'God'
myObj.say_hello('Giovanni'), myObj.__dict__

Hi Giovanni, greetings from God!


(None, {'master': 'God'})

The universal convention is to call the first argument of a function defined in a class, the one referring to the bound instance, **`self`**.

**N.B. if at runtime we assign a function to an instance of a class, it will be only a plain function and not a bound method**. We can create bound methods at runtime, but we need a special class from the `types` module called `MethodType`, which takes two arguments: the function and the instance object to which we want to bind it. Of course, in this case only the namespace of that particular instance is affected; new instances won't see anything more than what is defined inside the class. This comes in handy when different instances of the same class need, for some reason, a different behavior; monkey-patching a bound method at runtime can help us in this kind of task.
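A minimal sketch of the difference between assigning a plain function and binding it with `types.MethodType`. The `Robot` class and the `greet` function are made up for this example.

```python
from types import MethodType

class Robot:  # hypothetical class, used only for this illustration
    def __init__(self, name):
        self.name = name

def greet(self):
    return f'I am {self.name}'

r1 = Robot('R1')
r2 = Robot('R2')

r1.plain = greet                  # plain function: no instance is injected
r1.greet = MethodType(greet, r1)  # bound method: r1 is passed as self

print(type(r1.plain).__name__, type(r1.greet).__name__)  # function method
print(r1.greet())                 # I am R1
print(hasattr(r2, 'greet'))       # False, only r1 was patched
```

Calling `r1.plain()` would fail with a `TypeError`, because nothing is passed as `self`.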

## Class initialization

When creating an instance of a class, Python does two things: it creates the new instance and it initializes its namespace. We can, and we usually do, override the default initialization by defining a custom initializer. This is designed to work as a bound instance method and has the special name `__init__`. Like any function defined inside the class, the *init* is a class attribute, i.e. a function that lives in the class namespace. Only when we instantiate the class is `__init__` called as a bound method of the new instance, as always with the first argument equal to the instance itself.

In [77]:
class MyClass:
    attribute = 'a class attribute'
    
    def __init__(self, whoAmI):
        print(f'Initializing {self}')
        self.whoAmI = whoAmI
        
obj = MyClass(f"I'm an instance of {MyClass}")
obj.whoAmI

Initializing <__main__.MyClass object at 0x000001EE57AE3F40>


"I'm an instance of <class '__main__.MyClass'>"

Unlike any other function, the *init* is a special method and gets called automatically by python each time an instance of the class is created.

*whoAmI* is an `instance attribute`: inside the class it is assigned to `self`, i.e. to the future instances of the class MyClass; as a matter of fact, *whoAmI* is not in the class namespace but only in the instance one.

In [76]:
MyClass.__dict__, obj.__dict__

(mappingproxy({'__module__': '__main__',
               'attribute': 'a class attribute',
               '__init__': <function __main__.MyClass.__init__(self, whoAmI)>,
               '__dict__': <attribute '__dict__' of 'MyClass' objects>,
               '__weakref__': <attribute '__weakref__' of 'MyClass' objects>,
               '__doc__': None}),
 {'whoAmI': "I'm an instance of <class '__main__.MyClass'>"})

What's happening under the hood is that Python has already created the object and its namespace before the *init* is executed; only in this way can the *init* be called as a method bound to the newly created instance.
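The two steps can be reproduced by hand as a rough sketch. It relies on the special method `__new__`, which is not covered above and is used here only to make the creation step visible; the `Sample` class is a made-up example.

```python
class Sample:  # hypothetical class, used only for this illustration
    def __init__(self, value):
        self.value = value

# roughly what Sample(42) does, spelled out by hand
obj = Sample.__new__(Sample)  # step 1: create the instance, its namespace is still empty
before = dict(obj.__dict__)

obj.__init__(42)              # step 2: initialize it through the bound __init__
after = dict(obj.__dict__)

print(before, after)          # {} {'value': 42}
```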

## Class Properties

We have seen how we can define bare attributes on class instances by assigning them as *self.attribute* in the *init* method or, at runtime, with the `setattr` function or the dot notation. These attributes are freely accessible to the user since Python, unlike other OOP languages like Java or C#, doesn't have the concept of private variables. As a matter of fact, not all the attributes we define in a class should be accessible to the user, since changing their value can affect other behaviors of the class in unexpected ways. Where private variables exist, attributes are usually not left openly accessible; instead they are declared private and exposed through two public methods, called *getter* and *setter*, that give indirect access to the private variable.

Even if in Python we don't have the concept of private variables, there is a common convention among Pythonistas: **if an attribute starts with an underscore, it is meant to be protected, e.g. self._attribute**. However, this is just a convention, and nothing strictly prevents the user from accessing it. To declare a **private** attribute a double underscore is used, **self.__attribute**, but again the privacy is only apparent, since Python only performs `name mangling`, i.e. it replaces the attribute name with `_ClassName__attributeName`; the protection on that attribute is stronger, but we are still not strictly talking about a private variable.

In [92]:
class MyClass:
    def __init__(self, name, surname):
        self._name = name
        self.__surname = surname

obj = MyClass('Giovanni', 'Frison')
obj._name

'Giovanni'

In [81]:
obj.__surname

AttributeError: 'MyClass' object has no attribute '__surname'

In [83]:
obj._MyClass__surname

'Frison'

As stated before, the correct approach to handle a private variable is to create methods that explicitly set or get that attribute. In Python too, even if the concept of privacy is fictitious, we can declare a variable as protected (`self._attribute`) and create a setter and a getter function suited to read and modify it. We can always access the attribute directly, but to other programmers or to our future self we are explicitly saying "use that attribute only through the setter and the getter methods!".

In [86]:
# Mimic JAVA
class MyClass:
    def __init__(self, name):
        self._name = name
       
    def get_name(self):
        return self._name
    
    def set_name(self, name):
        self._name = name
        
obj = MyClass('Giovanni')

obj.get_name()

'Giovanni'

In [88]:
obj.set_name('Gianni')
obj.get_name()

'Gianni'

Getters and setters are more than a simple way to protect an attribute: they are meant to provide control over the class attributes themselves, giving extra capabilities. For example, we may add a validation step in the setter method or perform a computation in the getter, and that's why we call them properties.

In [94]:
# Example of a validation that we can carry out with a setter method
def set_name(self, name):
    if isinstance(name, str) and len(name) > 0:
        self._name = name.strip()
    else:
        raise ValueError('name must be a non-empty string')

Moreover, since the setter validates the same bare attribute that is assigned for the first time in the *init*, why not call it directly in the *init*? In this way the validation will also occur at the creation of the instance!

In [97]:
class MyClass:
    def __init__(self, name):
        # self._name = name
        self.set_name(name)
    
    def set_name(self, name):
        if isinstance(name, str) and len(name) > 0:
            self._name = name.strip()
        else:
            raise ValueError('name must be a non-empty string')

Smart!
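A quick check that the validation really runs at creation time. This is a self-contained sketch, so the class is repeated under the made-up name `Person`.

```python
class Person:  # hypothetical name; same pattern as the class above
    def __init__(self, name):
        self.set_name(name)

    def set_name(self, name):
        if isinstance(name, str) and len(name) > 0:
            self._name = name.strip()
        else:
            raise ValueError('name must be a non-empty string')

p = Person('  Giovanni ')
print(p._name)  # Giovanni

rejected = []
for bad in ('', 42):
    try:
        Person(bad)
    except ValueError:
        rejected.append(bad)  # the instance is never handed back to the caller

print(rejected)  # ['', 42]
```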

### @**property** class

Python has a solution to mimic the Java-like approach to setters and getters in a more consistent way: the `property` class. The `property` class takes two arguments, `fget` and `fset`, which are the functions called when the property is read or assigned respectively (other parameters are `fdel`, to specify the function to call when we delete the property from the instance, and `doc`, to specify a docstring).

In [95]:
class MyClass:
    def __init__(self, name):
        self._name = name
       
    def get_name(self):
        print('getter called')
        return self._name
    
    def set_name(self, name):
        print('setter called')
        self._name = name
        
    name = property(fget=get_name, fset=set_name)
    
obj = MyClass('Giovanni')
obj.name = 'God'
obj.name

setter called
getter called


'God'

Now we are able to access the attribute using the dot notation, but in reality we are invoking the *fget* and *fset* functions of the property object. This helps us with the code structure; in fact, in Python we usually start by writing bare attributes and, if needed, we can later turn them into properties without breaking the interface of the class, simply by defining a getter and a setter method. 

Another way to use the `property` class is to call it without arguments to create an empty property object and then attach the functions to it through its `getter, setter and deleter` methods:

```py
x = property()
x = x.getter(get_x)
x = x.setter(set_x)
```
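Placed inside a class body, the pattern above could look like the following sketch. The `Point` class and the `get_x`/`set_x` functions are assumptions made for this example.

```python
class Point:  # hypothetical class, used only for this illustration
    def __init__(self, x):
        self._x = x

    def get_x(self):
        return self._x

    def set_x(self, value):
        self._x = value

    x = property()       # empty property: no getter and no setter yet
    x = x.getter(get_x)  # each call returns a new property object
    x = x.setter(set_x)

p = Point(1)
p.x = 10                 # goes through set_x
print(p.x)               # 10, read through get_x
```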

## Property Decorator

Following the example above, creating a property object and then assigning the getter and setter methods to it is the same pattern used by decorators (a decorator takes the original function and adds some functionality to it, just like the property class takes functions as arguments and returns a property object that wraps them). As a matter of fact, the most common way to add a property to a class is through decorators.

So to create a `property` we simply decorate a function with the `@property` decorator; this has the same effect as creating a property object with the same name as the function and assigning the function to it as getter. 


In [14]:
class MyClass:
    def __init__(self, name):
        self._name = name
     
    @property
    def name(self):
        print('getter called')
        return self._name
    
obj = MyClass('Giovanni')
obj.name

getter called


'Giovanni'

Now *name* is no longer a plain function in the class namespace but a property object, whose getter is the function we decorated; reading `obj.name` calls it with the instance as argument.

Then, if we want to define a `setter`, we can refer to the *name* property we have just created and decorate a setter function with `@name.setter`, adding the setter functionality to the already created property (**the name of the setter function has to be the same as the name of the function decorated with @property**, because when creating the setter we don't want to create a new property but to replace the first one with a copy that also has the setter functionality).

In [15]:
class MyClass:
    def __init__(self, name):
        self._name = name
     
    @property
    def name(self):
        print('getter called')
        return self._name
    
    @name.setter
    def name(self, value):
        print('setter called')
        self._name = value   
    
    
obj = MyClass('Giovanni')
obj.name = 'God'
obj.name

setter called
getter called


'God'

**BONUS**: if we want to define a docstring to the property using the decorator approach, we just need to insert the docstring in the getter function (i.e. the one decorated with @property).
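A short sketch of a property with a docstring, which also shows the `deleter` counterpart of the `fdel` parameter. The `Person` class is a made-up example.

```python
class Person:  # hypothetical class, used only for this illustration
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        """The person's name."""
        return self._name

    @name.deleter
    def name(self):
        del self._name

print(Person.name.__doc__)  # The person's name.

p = Person('Giovanni')
del p.name                  # calls the deleter
print(hasattr(p, '_name'))  # False
```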

### Read-Only and Computed Properties

In general, we want our Python objects to be as lazy as possible, meaning that we want to compute at initialization only what is strictly necessary and defer the computation of attributes that may or may not be requested by the user. Moreover, once these attributes are requested, we want to store their results until one of the inputs changes, in order to save computation if the request is made multiple times.

Looking at the example below, we have a circle class that requires the radius as argument. The area of the circle is needed only when the user requests it, therefore it is an optimal candidate for a  **read-only property**.

In [16]:
from math import pi
class Circle:
    
    def __init__(self, r):
        self.r = r
        
    @property
    def area(self):
        print('computing area..')
        return pi*self.r**2
    
c1 = Circle(3)
c1.area

computing area..


28.274333882308138

However, each time we access the property area we compute its value even if the radius hasn't changed, and this is a waste of resources. What we can do is cache the value: we set the area to None at initialization and each time the radius changes, and we compute it only when it is requested and no cached value is available.

In [17]:
from math import pi
class Circle:
    
    def __init__(self, r):
        self._r = r
        self._area = None
    
    @property
    def radius(self):
        return self._r
    
    @radius.setter
    def radius(self, value):
        print('setting new radius..')
        if isinstance(value, (int, float)):
            self._r = value
            self._area = None
    
    @property
    def area(self):
        if self._area is None:  # `not self._area` would also recompute a cached area of 0
            print('computing area..')
            self._area = pi*self.radius**2
        return self._area
    
c1 = Circle(3)
c1.area
c1.radius = 4
c1.area
c1.area

computing area..
setting new radius..
computing area..


50.26548245743669

Done! The last time we asked for the circle's area we retrieved the value cached by the previous call without wasting computational resources.

**N.B. in the *area* property we used the *radius* property to retrieve the circle's radius instead of the semi-private attribute *self._r*, and this is a good practice: getter and setter methods are usually enhanced with additional functionalities, such as data validation, and by going through the property we keep our pipeline consistent**

This is so true that we could have set the radius in the initialization directly through the property setter instead of the semi-private attribute:

In [18]:
from math import pi
class Circle:
    
    def __init__(self, r):
        self.radius = r
        self._area = None
    
    @property
    def radius(self):
        return self._r
    
    @radius.setter
    def radius(self, value):
        print('setting new radius..')
        if isinstance(value, (int, float)):
            self._r = value
            self._area = None
            
    @property
    def area(self):
        if self._area is None:  # `not self._area` would also recompute a cached area of 0
            print('computing area..')
            self._area = pi*self.radius**2
        return self._area
            
c1 = Circle(5)

setting new radius..


## Class and Static Methods

**Class methods** are, as the name suggests, methods that are bound to the class, even when called from an instance of the class. We have seen how normally, when we define a function inside the class, it needs at least the *self* parameter for the instance to be able to call it. With a class method instead, the call always refers to the class, not to the instance, and the class is passed as first argument (by convention *cls*); therefore, calling the function from the instance or from the class gives the same output.

Let's see an example:

In [30]:
class MyClass:
    def hello():
        return 'hello'
    
    def instance_hello(self):
        return f'hello from {self}'
        
    @classmethod
    def class_hello(cls):
        return f'hello from {cls}'
        

Now, the function *hello()* will work only on the class; when called from an instance it raises an error, because the instance is passed as first argument and the function has no *self* parameter to receive it.

In [31]:
mC = MyClass()
mC.hello()

TypeError: hello() takes 0 positional arguments but 1 was given

While it will work fine if called from the class:

In [32]:
MyClass.hello()

'hello'

The instance method and the class method behave differently: accessed from the class, the instance method is just a plain function, so calling it as `MyClass.instance_hello()` fails because nothing is passed as *self*; it becomes a bound method only when accessed from an instance. The class method instead is always bound to the class, whether it is called from the class or from an instance:

In [33]:
MyClass.instance_hello, mC.instance_hello()

(<function __main__.MyClass.instance_hello(self)>,
 'hello from <__main__.MyClass object at 0x0000023A612A77C0>')

In [34]:
MyClass.class_hello(), mC.class_hello()

("hello from <class '__main__.MyClass'>",
 "hello from <class '__main__.MyClass'>")

**Static methods** instead are plain functions that will never be bound to an instance of the class or to the class itself. There isn't a strong use case for static methods, other than the need to keep a function enclosed in the class definition for completeness or logical order (but it could be defined at module level so.. why bother??)

In [36]:
class MyClass:
    
    @staticmethod
    def hello():
        return 'hello'
    
mC = MyClass()

In [37]:
MyClass.hello(), mC.hello()

('hello', 'hello')

In [39]:
type(MyClass.hello), type(mC.hello)

(function, function)

As expected, the *type* of the function, both on the class and on the instance, is *function* and not a bound method.
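One situation where a class method can be handy is an alternative constructor, since `cls` gives the function access to the class itself. The sketch below is only an illustration; the `Circle` class and the `from_diameter` method are made-up names.

```python
class Circle:  # hypothetical class, used only for this illustration
    def __init__(self, r):
        self.r = r

    @classmethod
    def from_diameter(cls, d):
        # cls is the class itself, whoever the caller is
        return cls(d / 2)

c1 = Circle.from_diameter(10)  # called from the class
c2 = c1.from_diameter(4)       # called from an instance: cls is still Circle

print(type(c1).__name__, c1.r)  # Circle 5.0
print(type(c2).__name__, c2.r)  # Circle 2.0
```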

## Class body scope

We know the difference between global and local scope, but what about the class scope, i.e. what's defined inside the class body? All the variables defined inside the class body are part of the class scope, therefore the names of the methods defined inside the class also belong to the class namespace. However, what about the methods' inner scope? Methods are essentially functions, therefore they have their own local scope; is this a scope nested inside the class scope? 

The answer is **NO**! The enclosing scope of these functions is the scope that contains the class itself, therefore be aware that **Python won't look in the class scope for any symbol unless the class scope is explicitly referenced, e.g. through self, cls or the class name**.

Let's see an example:

In [51]:
CONST=2

class ClassBodyScope:
    CONST = 1
    
    def __init__(self, var):
        self.var = var
        
    def add_to_var_1(self):
        return self.var + self.CONST
    
    @classmethod
    def add_to_var_2(cls, var):
        return var + cls.CONST
    
    @staticmethod
    def add_to_var_3(var):
        return var + ClassBodyScope.CONST
    
    def add_to_var_globalscope(self):
        return self.var + CONST
    
test = ClassBodyScope(1)

test.add_to_var_1(), test.add_to_var_2(1), test.add_to_var_3(1), test.add_to_var_globalscope()


(2, 2, 2, 3)

From the example above we can see that the name 'CONST' defined inside the class scope can be accessed by the `add_to_var` functions from 1 to 3 because we are explicitly telling Python where to look for it (i.e. inside the class). In the last method, however, Python looks for 'CONST' in the enclosing scope of the function, which is the global scope, where it is defined with a different value.

**N.B. a list comprehension has its own scope, just like a function; therefore, if we use it inside a class body, the names used in its expression are not looked up in the class scope but in the scope that contains the class (here the global one)!! BUUGSS!!**

In [60]:
name = 'Giovanni'

class Parrot:
    name = 'Luca'
    test1 = [name]*3
    test2 = [name for _ in range(3)] # This is a function! its outer scope is the global scope!
    
    @classmethod   
    def exec_test(cls):
        print(f'test 1 got: {cls.test1}')
        print(f'test 2 got: {cls.test2}')
        
Parrot.exec_test()

test 1 got: ['Luca', 'Luca', 'Luca']
test 2 got: ['Giovanni', 'Giovanni', 'Giovanni']
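One detail worth knowing: the iterable of the outermost `for` of a comprehension is evaluated in the class body, so it still sees the class variables; only the rest of the comprehension skips the class scope. A minimal sketch, where the attribute names `from_element` and `from_iterable` are made up for this example:

```python
name = 'Giovanni'

class Parrot:
    name = 'Luca'
    # the element expression `name` is resolved outside the class scope
    from_element = [name for _ in range(3)]
    # the iterable `[name] * 3` is evaluated in the class body, so it sees 'Luca'
    from_iterable = [n for n in [name] * 3]

print(Parrot.from_element)   # ['Giovanni', 'Giovanni', 'Giovanni']
print(Parrot.from_iterable)  # ['Luca', 'Luca', 'Luca']
```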


## Polymorphism

`Polymorphism` is the ability to define a behavior that changes when applied to different types of objects. Python is very polymorphic in nature since it supports the paradigm of `duck typing`:

*"If it walks like a duck and quacks like a duck then it is a duck!"*

Meaning that we don't care about what a specific object is, but about what it can do! In the section about iterators we saw how easily we can implement our own iterator so that we can iterate over its elements; we don't care if the collection is a list, a tuple or a dictionary, we care about iterability only!

Other examples are operators such as `+` and `*`, which behave differently depending on whether they work with strings, lists or numbers!
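Both ideas can be sketched in a few lines. The classes `Duck` and `Robot` and the function `make_it_speak` are made up for this illustration.

```python
class Duck:  # hypothetical classes, used only for this illustration
    def speak(self):
        return 'Quack'

class Robot:
    def speak(self):
        return 'Beep'

def make_it_speak(thing):
    # no isinstance check: anything with a speak() method is accepted
    return thing.speak()

print([make_it_speak(x) for x in (Duck(), Robot())])  # ['Quack', 'Beep']

# the same operator dispatches to a different behavior depending on the type
print(1 + 2, 'a' + 'b', [1] + [2])  # 3 ab [1, 2]
print(2 * 3, 'ab' * 2, [0] * 2)     # 6 abab [0, 0]
```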

## Special Methods

`Special Methods` are those defined with a double underscore at the beginning and at the end of the name (e.g. \_\_init\_\_). These names are not private, but they are reserved for the Python language itself, since they serve the implementation of a lot of functionalities (e.g. iterators, context managers, arithmetic operations etc.); we implement the existing ones, but we should not invent new ones.

### \_\_str\_\_ method

The `__str__` method is a special method that is invoked by `print` and by `str()` on the object; it is typically used to display the purpose of the object or some information to the user. If it is not defined, Python falls back to the `__repr__` method; if that isn't defined either, the default representation shows the class of the object together with its memory address.

In [65]:
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def __str__(self):
        return f'Rectangle: width:{self.width}, height:{self.height}'

r1 = Rectangle(10,20)
str(r1)

'Rectangle: width:10, height:20'

### \_\_repr\_\_ method

The `__repr__` method is a special method similar to `__str__`, but its use is more developer-oriented. It should return an unambiguous string representation of the instance, ideally one that could be used to recreate the object.

In [66]:
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def __repr__(self):
        return f'Rectangle({self.width}, {self.height})'

r1 = Rectangle(10,20)
repr(r1)

'Rectangle(10, 20)'
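
A short sketch of the fallback chain between the two methods (the class names `WithRepr` and `WithNothing` are made up for illustration):

```python
class WithRepr:
    def __repr__(self):
        return 'WithRepr()'

class WithNothing:
    pass

print(str(WithRepr()))      # no __str__ defined, so __repr__ is used
print(repr(WithNothing()))  # default from object: class name and memory address
```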

### Arithmetic operators

Arithmetic operations can be defined for any type of object, including our custom classes. There are several flavors of operators: standard (e.g. `__add__`), in-place (e.g. `__iadd__`) and reflected (e.g. `__radd__`).

For example, the standard arithmetic operations are:
* `__add__`: +
* `__sub__`: -
* `__mul__`: *
* `__truediv__`: /
* `__floordiv__`: //
* `__mod__`: %
* `__pow__`: **


More info can be found in Deep Dive part 4 Lesson 50.
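
A minimal sketch of the three flavors on a made-up `Vector` class (the class and its fields are assumptions for illustration, not taken from the course):

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Vector({self.x}, {self.y})'

    def __add__(self, other):            # standard: v1 + v2
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):           # standard: v * 2
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vector(self.x * scalar, self.y * scalar)

    def __rmul__(self, scalar):          # reflected: 2 * v
        return self * scalar

    def __iadd__(self, other):           # in-place: v1 += v2, mutates and returns self
        if not isinstance(other, Vector):
            return NotImplemented
        self.x += other.x
        self.y += other.y
        return self

v1 = Vector(1, 2)
v2 = Vector(3, 4)
print(v1 + v2, v1 * 2, 2 * v1)
v1 += v2
print(v1)
```

Returning `NotImplemented` for unsupported operands lets Python try the reflected method of the other operand before raising a `TypeError`.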

### Truth value

We should know by now that every Python object has a truth value; actually, we can say that any object evaluates to `True` when used as a boolean, except for some special cases like 0, None, '', empty collections, etc.

When `bool` is called on an object, Python will first look for the definition of the `__bool__` method and, if not found, it will look for the `__len__` method: if this returns 0 then the result is `False`, and `True` in any other case. If neither `__bool__` nor `__len__` is defined, Python will return `True` no matter what.

In our custom class we can override the bool behavior with our custom rules, but `__bool__` always needs to return `True` or `False`.
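
The lookup order can be sketched with three made-up classes (`Account`, `Basket` and `Plain` are illustrative names only):

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def __bool__(self):          # takes precedence over __len__
        return self.balance > 0

class Basket:
    def __init__(self, *items):
        self.items = items

    def __len__(self):           # used because __bool__ is not defined
        return len(self.items)

class Plain:                     # neither __bool__ nor __len__: always truthy
    pass

print(bool(Account(0)), bool(Account(10)))
print(bool(Basket()), bool(Basket('apple')))
print(bool(Plain()))
```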

### Callables

The `__call__` special method is used to create callable classes, meaning that they generate instances that are themselves callable.

It is a very common practice in Python development to create callable classes and use them, for example, as decorator classes.

**For some examples see Deep Dive 4 Lesson 58**

In [74]:
class Person:
    def __call__(self):
        print(f"The class {self.__class__.__name__} generates callable now")
        
p = Person()
p()

The class Person generates callable now
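
As a hedged sketch of the decorator use case mentioned above (the class `CountCalls` is a made-up example, not taken from the course): the instance stores the decorated function and `__call__` forwards the call to it.

```python
from functools import update_wrapper

class CountCalls:
    def __init__(self, func):
        update_wrapper(self, func)   # copy name and docstring of the wrapped function
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1              # state is kept on the instance between calls
        return self.func(*args, **kwargs)

@CountCalls
def add(a, b):
    return a + b

add(1, 2)
add(3, 4)
print(add.calls, add.__name__)
```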


## Single Inheritance

`Inheritance` is a fundamental concept in OOP; since classes define properties and methods, these can be inherited by child classes, forming a natural hierarchy. For example we could have the class **Shape** with 3 child classes that inherit from it: **Polygon**, **Line** and **Ellipse**; **Polygon** in turn has two `child classes`: **Quadrilateral** and **Triangle**, and so on. 
In the same way as a triangle is a polygon and also a shape, we can imagine having a class that inherits, and potentially extends or overrides, the characteristics (state and behavior) of one or more parent classes.

### instance vs type

There is a subtle difference when looking at instances of child classes; in fact, if we look at the `type()` of an instance we get back the class from which it was generated, with no relation to a potential parent class; with the `isinstance()` function instead, we get `True` also if we check the instance against a parent class (which is not directly the generator of the instance itself). We will more often use *isinstance()* since we are usually more interested in knowing whether an object has inherited certain behaviors (from the list or dict class for example).

We can also look at the relationship between classes directly, instead of looking at instances, using the `issubclass()` function.
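
A small sketch of the three checks, reusing the Shape hierarchy described earlier (the class bodies are left empty for illustration):

```python
class Shape:
    pass

class Polygon(Shape):
    pass

class Triangle(Polygon):
    pass

t = Triangle()
print(type(t) is Triangle, type(t) is Shape)                     # True False
print(isinstance(t, Triangle), isinstance(t, Shape))             # True True
print(issubclass(Triangle, Shape), issubclass(Shape, Triangle))  # True False
```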

### The *object* class

When we create a class, even if we are not inheriting explicitly from another class, Python is subclassing a special built-in class called `object`. N.B. built-in classes with lowercase names (such as `object`, `int` or `list`) are implemented in C in CPython, the reference implementation.

Modules and functions are also instances of classes, which are exposed through the `types` module.

In [13]:
import types
dir(types)

['AsyncGeneratorType',
 'BuiltinFunctionType',
 'BuiltinMethodType',
 'CellType',
 'ClassMethodDescriptorType',
 'CodeType',
 'CoroutineType',
 'DynamicClassAttribute',
 'EllipsisType',
 'FrameType',
 'FunctionType',
 'GeneratorType',
 'GenericAlias',
 'GetSetDescriptorType',
 'LambdaType',
 'MappingProxyType',
 'MemberDescriptorType',
 'MethodDescriptorType',
 'MethodType',
 'MethodWrapperType',
 'ModuleType',
 'NoneType',
 'NotImplementedType',
 'SimpleNamespace',
 'TracebackType',
 'UnionType',
 'WrapperDescriptorType',
 '_GeneratorWrapper',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_calculate_meta',
 '_cell_factory',
 'coroutine',
 'new_class',
 'prepare_class',
 'resolve_bases']

So, for example, if we define a function, its type (and hence the name shown in its string representation) will be `function`, which in reality is the `types.FunctionType` class; like everything else, it descends from `object`.

In [11]:
def my_func():
    pass

type(my_func), types.FunctionType is type(my_func), isinstance(my_func, object)

(function, True, True)

As a matter of fact, the `object` class implements a number of dunder methods that are inherited by every class, like `__init__`, `__repr__`, `__new__`, etc.

In [12]:
dir(object)

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

### Overriding

Any method inherited from a parent class (at any level) can be overridden to have functionalities specific to that class. This works when we inherit from another custom class (like Person -> Student), but since any class inherits from `object`, we can also override its methods, something that we do almost always when we define a custom `__init__` or `__repr__` method.

An example case where we leverage this behavior is the representation method of a class, where instead of hard-coding the name of the class (which might change, or a child class might have a different name) we can use the `__class__` attribute inherited from the `object` class.

In [18]:
class MyClass:
    def __repr__(self):
        return f'{self.__class__.__name__}()'
    
class MySecondClass(MyClass):
    pass
    
aClass = MyClass()
print(aClass)
aSecondClass = MySecondClass()
print(aSecondClass)

MyClass()
MySecondClass()


### Extending

Extending is the same concept as overriding, but instead of replacing a method or a property of the parent class we are enhancing it with extra functionalities.
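
A minimal sketch of extending, with made-up `Person` and `Student` classes: `describe` reuses the parent behavior and adds to it, while `study` is a brand new method.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f'{self.name} is a person'

class Student(Person):
    def describe(self):
        # super() delegates to the parent implementation, then we add something on top
        return super().describe() + ' and a student'

    def study(self):             # new behavior, not present in Person
        return f'{self.name} is studying'

s = Student('Gio')
print(s.describe())
print(s.study())
```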

### Delegating to parent class

Often, we don't want to repeat code that is already present in our parent class; instead we want to be able to leverage it and extend it where applicable. To do this we have the built-in `super()`. When using `super().someMethod()`, we are delegating the execution of that method to the parent class, as if we were executing it from the parent itself (careful with side effects). Moreover, `super().someMethod()` will look for the method up the whole class hierarchy, not only in the direct parent class.

In [25]:
class Person:
    def talk(self):
        return "Im' a Person"

class Man(Person):
    pass

class Athlete(Man):
    def talk(self):
        return super().talk() + " but also an Athlete!"
    

p = Person()
m = Man()
a = Athlete()

p.talk(), m.talk(), a.talk()

("Im' a Person", "Im' a Person", "Im' a Person but also an Athlete!")

The most common use case of delegating is in the `__init__` method.

N.B. it is always better to call `super().__init__(*args)` first in the child class init, because you never know; the parent class might overwrite what you did before. (In Java, for example, it is mandatory to first init the parent class.)

In [34]:
class Person:
    
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
        self.smoker = True
        
class Man(Person):
    def __init__(self, name: str, age: int, smoker: bool):
        self.smoker = smoker
        super().__init__(name, age)
    

m = Man('Gio', 33, False)
m.smoker

True

Even if we pass smoker=False when creating the instance, since `super().__init__()` is called afterwards, the parent class overrides the value of `smoker`.
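
For comparison, a sketch of the same two classes with the delegation moved to the first line of the child initialiser, so that the child-specific value is applied last:

```python
class Person:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age
        self.smoker = True            # default set by the parent

class Man(Person):
    def __init__(self, name: str, age: int, smoker: bool):
        super().__init__(name, age)   # let the parent initialise first
        self.smoker = smoker          # then apply the child-specific value

m = Man('Gio', 33, False)
print(m.smoker)
```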

### Method Binding

Even if we are delegating a method to a parent class, that same method is bound to the instance from which it is called; therefore `self` will always be the instance of the class we are using.

In [40]:
class Person:
    def talk(self):
        print( f"Im {self.__class__.__name__}")
        
class Man(Person):
    def talk(self):
        super().talk()
        print(f"Im {self.__class__.__name__}")

m = Man()
m.talk()

Im Man
Im Man


### Slots

When a class is created, its instances are, by default, shipped with an instance dictionary (`__dict__`), which comes with some overhead in terms of memory and speed. To reduce this overhead, memory in particular, we can use slots. By defining the `__slots__` attribute we can assign a tuple of names to our class, and these will be the only attributes that its instances can use. Therefore if, for example, we have a mapper class that always has the same attributes, and this class is instantiated many times (e.g. when parsing a db), then using slots can be an efficient way to go.

In [46]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
class Point_with_slots:
    __slots__ = ('x', 'y', )
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
p = Point(1,1)
p_slot = Point_with_slots(1,1)

p.__dict__

{'x': 1, 'y': 1}

In [47]:
p_slot.__dict__

AttributeError: 'Point_with_slots' object has no attribute '__dict__'

However, using slots has some limitations and should be the choice only when there is a clear advantage in terms of performance. For example, inheriting from a parent class that uses slots might be tricky: a child class that does not define `__slots__` itself inherits the slotted attributes but its instances also get an instance dictionary `__dict__`, which however won't contain the attributes inherited from the parent class. Moreover, we don't want to redefine in a child class the slots inherited from a parent class, because we would hide the parent class definition and increase the memory usage. 

Slots and properties have in common that they are not stored in the instance dictionary, while they are both in the class dictionary. They both use **data descriptors**, which essentially create properties for us (getters, setters, deleters, etc.).

To summarize, slots are faster at attribute access and use less memory, while instance dictionaries are heavier but more versatile, since attributes can be added at run time. We could get the best of both worlds using single inheritance, where only the parent or the child class implements the slots while the other provides the dictionary, or by adding `__dict__` to the slots of the same class.
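
The two "best of both worlds" options can be sketched as follows (the class names `Point3D` and `FlexiblePoint` are made up for illustration):

```python
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

class Point3D(Point):                    # no __slots__ here: instances get a __dict__ again
    pass

class FlexiblePoint:
    __slots__ = ('x', 'y', '__dict__')   # slots plus an instance dictionary

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3                              # 'z' is not a slot and there is no __dict__
except AttributeError as error:
    print(f'AttributeError: {error}')

p3 = Point3D(1, 2)
p3.z = 3                                 # stored in the instance dictionary of the child
fp = FlexiblePoint(1, 2)
fp.z = 3
print(p3.__dict__, fp.__dict__)          # x and y live in the slots, not in __dict__
```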

back to [TOC](#TOC)

# Descriptors
HARD TOPIC

---

https://docs.python.org/3/howto/descriptor.html

We have seen that to validate an attribute of a class, we usually have to define properties, getters and setters ourselves, which contain our custom validation rules and determine how the user can interact with instances of that class. However this can become tedious pretty quickly and a lot of boilerplate code can be produced. This is where descriptors come in handy.

The idea is to be able to define class attributes that are also bound to the instances at run time, the same way as properties, which are defined at class level but are somehow also bound to the instances of the class.

The `descriptor protocol` is made of 4 main methods, not all required:
* `__get__`
* `__set__`
* `__delete__`
* `__set_name__`

Moreover, descriptors can be divided into:
* `non-data descriptors`, which only implement `__get__`
* `data descriptors`, which also implement `__set__` and/or `__delete__`

## non-data descriptors

Let's start with a dummy example where we want to create a class that throws 2 dice; the approach using properties would be to define at class level two functions decorated with `@property`, so that they are read-only.

In [76]:
from random import choice, seed

seed(123)

class Dices:
    @property
    def dice_1(self):
        return choice(tuple('123456'))
    @property
    def dice_2(self):
        return choice(tuple('123456'))
    
    @property
    def throw(self):
        return print(self.dice_1, self.dice_2)
    
d = Dices()
for _ in range(3):
    d.throw

1 3
1 4
3 1


What we could do instead is to define a descriptor class that will handle the choice mechanism and that, moreover, can be re-used for other classes as well.

In [77]:
class Choice:
    def __init__(self, *choices):
        self.choices = choices
        
    def __get__(self, instance, owner_class):
        return choice(self.choices)

class Dices:
    dice_1 = Choice(*'123456')
    dice_2 = Choice(*'123456')
    
    @property
    def throw(self):
        return print(self.dice_1, self.dice_2)

In [78]:
d = Dices()
for _ in range(3):
    d.throw

1 4
5 5
3 3


### getter and setter

But how is the `__get__` method of the Choice class being called? The class Dices defines instances of Choice as class attributes and, since Choice implements `__get__`, Python will use that method when retrieving the attribute from an instance. We can also access the attribute from the class itself, but we have to be careful because the *instance* argument will be None in that case. Therefore, when a descriptor is called it might be important to know whether it is called from an instance (*instance* is None if called from a class) and to which class the descriptor belongs (*owner_class*, Dices in this specific case).

We usually want to differentiate the behavior of the descriptor depending on whether it is called from an instance or from the class itself. Usually, if called from an instance we want the attribute value, while if called from the class we want a handle on the descriptor, i.e. the instance of the descriptor itself. In the case of the Choice class it will be:

In [None]:
class Choice:
    def __init__(self, *choices):
        self.choices = choices
        
    def __get__(self, instance, owner_class):
        if instance is None:  # accessed from the class; `not instance` would misfire on falsy instances
            return self
        return choice(self.choices)

The `__set__` method behaves quite similarly and its signature is `(self, instance, value)` where again:
* self is the reference to an instance of the descriptor class
* instance is the instance of the class that uses the descriptor, on which the attribute is being set
* value is the value we want to assign to the attribute

There is no *owner_class* since `__set__` is only called when setting the attribute on an instance; assigning to the attribute on the class itself simply replaces the descriptor.
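
A small sketch of when `__set__` is invoked (the descriptor `Logged` and the list `calls` are made-up names used only to record what happens):

```python
calls = []

class Logged:
    def __get__(self, instance, owner_class):
        return self if instance is None else 'from __get__'

    def __set__(self, instance, value):
        calls.append((type(instance).__name__, value))   # record who called the setter

class Point:
    x = Logged()

p = Point()
p.x = 1                      # goes through Logged.__set__, with instance=p
print(calls)
Point.x = 2                  # plain rebinding on the class: __set__ is not called
print(calls, Point.__dict__['x'])
```

After the assignment on the class, the descriptor is gone and `x` is just the integer 2.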

However, there is a caveat with setter and deleter descriptors; since they are declared at class level, the descriptor instance is shared across all the instances of the class that uses it; this will of course cause problems when different instances of the class should hold different values. For this reason we need both the getter and the setter to be aware of which instance is calling them, in order to properly store the data relative to the specific instance that called the method.

### Storing instance properties

Now comes the problem of where to store the instance values set through our descriptor; we could place them inside the instance dictionary, but what if we don't have one, what if the class implements slots? And even if the instance dictionary is there, which name should we use to store the value, being sure that it doesn't shadow an existing one?

One solution is not to use the instance dictionary of the instances of our class, but to use a dictionary stored inside the instance of the descriptor.

Now the problem is how to choose the key-value pair to insert in this dictionary: the value is trivial, it is simply what we set, but the key might be a problem since it has to be hashable and not all objects are. The solution, for now, is to use as key the instance that is calling the setter method!

In this way we will have a common dictionary bound to the single instance of the descriptor, and in it we will have key-value pairs that translate to `instance -> set value`, so that when using the getter from a particular instance we are sure to retrieve the correct value.

In [19]:
class IntegerValue:
    def __init__(self):
        self.data = {} # the descriptor dictionary
    
    def __set__(self, instance, value):
        print("descripto setter called")
        self.data[instance] = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            print(f"descripto getter called from {owner_class.__class__.__name__}")
            return self
        print(f"descripto getter called from {instance.__class__.__name__}")
        return self.data.get(instance, None) # use get in case getter is called before setter
    
class Point2D:
    x = IntegerValue()
    y = IntegerValue()
    
p = Point2D()
Point2D().x = 10
Point2D().x
p.x = 1
p.y = 1
p.x
Point2D.x.data, Point2D.y.data # I'm accessing the instance dictionary of the descriptor instance inside the class

descripto setter called
descripto getter called from Point2D
descripto setter called
descripto setter called
descripto getter called from Point2D
descripto getter called from type
descripto getter called from type


({<__main__.Point2D at 0x17f249bebc0>: 10,
  <__main__.Point2D at 0x17f249bd750>: 1},
 {<__main__.Point2D at 0x17f249bd750>: 1})

However this causes a side effect, a memory leak to be precise; when we instantiate the `p` object its reference count is 1, and when we set its x value the reference count goes up to 2, since we are storing a reference to it as a key in the descriptor dictionary. Therefore, even if we delete `p`, the garbage collector won't be able to free the memory related to that object!

## Strong and Weak references

What we just described above is called a `strong reference`: both the variable `p` holding the Point2D instance and the key in the descriptor dictionary are strong references to the underlying object, hence as long as at least one of them is alive, the garbage collector can't free the allocated memory.

A `weak reference` instead is a reference that does not impact the reference count of an object in memory; as a consequence, if we delete the last strong reference, the object can be garbage collected and the weak reference becomes "dead". This is the kind of reference that we want our descriptor to hold in its dictionary in order to avoid memory leaks.

To create a weak reference Python has a built-in module called `weakref`. A weak reference created with `weakref.ref` is a callable, and calling it returns the original object the reference points to (or None if the object no longer exists).

In [24]:
import weakref

p1 = Point2D()
p2 = weakref.ref(p1)

p2()

<__main__.Point2D at 0x17f26682320>

Now p2 is a callable and when called it returns the object to which p1 is pointing, therefore it returns a strong reference. So be careful: if we assign the result of calling p2 to another variable we are creating a second strong reference!

In [25]:
p3 = p2() # N.B. this is a new strong reference! -> now the reference count is 2!
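
A short sketch of a weak reference dying. It assumes CPython, where an object is freed as soon as its last strong reference disappears; on other implementations the collection may be delayed. The empty `Point2D` class is redefined here only to keep the example self-contained.

```python
import weakref

class Point2D:
    pass

p1 = Point2D()
weak = weakref.ref(p1)
print(weakref.getweakrefcount(p1))   # number of weak references to the object
print(weak() is p1)                  # the weak reference is alive

del p1                               # remove the only strong reference
print(weak())                        # a dead weak reference returns None
```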

Therefore, in the descriptor's dictionary we only want to have weak references, and instead of creating a weak reference key each time we can import `WeakKeyDictionary` from the weakref module: this special dictionary holds a weak reference to each key stored, and removes the entry when the key object is garbage collected.
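
A minimal sketch of the descriptor rewritten on top of `WeakKeyDictionary`. It assumes that the instances are hashable and can be weakly referenced, and relies on CPython freeing the instance immediately after `del`.

```python
import weakref

class IntegerValue:
    def __init__(self):
        self.data = weakref.WeakKeyDictionary()   # keys are held weakly

    def __set__(self, instance, value):
        self.data[instance] = int(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.data.get(instance)

class Point2D:
    x = IntegerValue()

p = Point2D()
p.x = 10.7
print(p.x, len(Point2D.x.data))

del p                                # the entry disappears together with the instance
print(len(Point2D.x.data))
```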

All great, but we still have to deal with the fact that an object has to be hashable in order to be a key of a dictionary, and not all objects are. We could use `id(instance)` as key, but we might run into other problems: first, if the strong reference to the object is deleted then our key might refer to an object that doesn't exist anymore; second, even if very unlikely, a new object might be created that obtains the same id as the previous object, so that now the dictionary contains an entry for the wrong object.

Long story short, we are going to leverage the callback functionality of `weakref.ref`, which automatically calls a function when the object it points to is garbage collected.

The final recipe is:
- use a regular dictionary
- store as key the id of the instance, which is always hashable
- store as value a tuple composed of the weak reference and the value; the weak reference will be used to remove dead entries

Using a descriptor like this we will achieve:
- instance-specific storage of variables
- instances are not directly used for storage
- non-hashable objects are handled
- the data storage mechanism has no leaks

This is how the previous descriptor should be coded:

In [41]:
class IntegerValue:
    def __init__(self):
        self.data = {} 
    
    def __set__(self, instance, value):
        self.data[id(instance)] = (weakref.ref(instance, self._remove_object), int(value))
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        value_tuple = self.data.get(id(instance))
        # the value is always the second element of the tuple; None if it was never set
        return value_tuple[1] if value_tuple else None
    
    def _remove_object(self, weak_ref):
        reverse_lookup = [key for key, value in self.data.items() if value[0] is weak_ref]
        if reverse_lookup: # if the key relative to the supposedly dead instance is found, delete it
            print(f'removing dead entry for {weak_ref}')
            key = reverse_lookup[0]
            del self.data[key]
        
class Point:
    x = IntegerValue()

In [42]:
p1 = Point()
p1.x = 1
Point.x.data

{1645616832784: (<weakref at 0x0000017F267F1F30; to 'Point' at 0x0000017F26682110>,
  1)}

In [43]:
del p1
Point.x.data

removing dead entry for <weakref at 0x0000017F267F1F30; dead>


{}

N.B. since classes that implement slots have neither an instance dictionary nor a `__weakref__` attribute, their instances cannot be the target of a weak reference! The only solution, as for the instance dictionary, is to add `__weakref__` to the slots itself.

In [46]:
class Person:
    pass
    
class Person_slot:
    __slots__ = ('name',)

In [47]:
Person.__dict__

mappingproxy({'__module__': '__main__',
              '__dict__': <attribute '__dict__' of 'Person' objects>,
              '__weakref__': <attribute '__weakref__' of 'Person' objects>,
              '__doc__': None})

In [48]:
Person_slot.__dict__

mappingproxy({'__module__': '__main__',
              '__slots__': ('name',),
              'name': <member 'name' of 'Person_slot' objects>,
              '__doc__': None})

In [51]:
p = Person_slot()
hasattr(p, '__weakref__')

False

In [52]:
class Person_slot:
    __slots__ = ('name', '__weakref__')

p = Person_slot()
hasattr(p, '__weakref__')

True

## The set_name method

N.B. for the following to be applicable we assume that the classes don't implement slots.

From Python 3.6 we have another method applicable to descriptors: the `__set_name__` method. We can use it to retrieve the name of the attribute that the descriptor instance has been assigned to in our class (e.g. x in the Point example above).
The descriptor is instantiated when the class body is executed, because it is defined inside the class context, and `__set_name__` is called once, when the owner class is created. Let's see an example:

In [161]:
class ValidateString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__: owner={owner_class}, property_name={property_name}')

class Person:
    name = ValidateString()

__set_name__: owner=<class '__main__.Person'>, property_name=name


As we can see, `__set_name__` has already been called, and since the descriptor is instantiated inside the Person class, the latter is the owner class and the property_name is exactly the name that was assigned as class attribute.

The next step is to store the property name in the instance dictionary of the descriptor, which is not a problem, since the class attribute tied to that descriptor instance will always have that same name. Now, if we define a getter or a setter, these will know exactly which attribute is being retrieved, through the property_name stored in the instance dictionary of the descriptor.

Now we still have to decide where to store the value itself, in a way that allows us to retrieve it and doesn't cause memory leaks. An idea could be to store it in the instance, using as attribute name the property name with an underscore in front of it, but this doesn't give us the certainty that the same name is not used for something else in our class. 



In [39]:
class ValidateString:
    def __set_name__(self, owner_class, property_name):
        print(f'__set_name__: owner={owner_class}, property_name={property_name}')
        self.property_name = property_name
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        print(f'__get__ called for property {self.property_name} of instance {instance}')
        key = '_' + self.property_name
        return getattr(instance, key, None)
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.property_name} must be a String')
        key = '_' + self.property_name
        setattr(instance, key, value)

        
class Person:
    name = ValidateString()

__set_name__: owner=<class '__main__.Person'>, property_name=name


In [43]:
p = Person()
p.name = 'Giovanni'
p.name
p.__dict__, Person.__dict__

__get__ called for property name of instance <__main__.Person object at 0x000002408FCCAD40>


({'_name': 'Giovanni'},
 mappingproxy({'__module__': '__main__',
               'name': <__main__.ValidateString at 0x2408df60e50>,
               '__dict__': <attribute '__dict__' of 'Person' objects>,
               '__weakref__': <attribute '__weakref__' of 'Person' objects>,
               '__doc__': None}))

After all, we cannot use the property name itself as key in the instance dictionary, or it would clash with the descriptor defined under the same name in the class... right?

Well... it depends! Here the distinction between `data` descriptors (which implement `__set__` and/or `__delete__`) and `non-data` descriptors (only `__get__` is defined) becomes crucial. 

**A data descriptor always takes precedence over the instance dictionary**: even if we hard-wire an entry with the same name directly into the instance dictionary, when accessing that attribute Python will call the descriptor's `__get__`, and will not look at the instance dictionary.

With a non-data descriptor, instead, the instance dictionary takes precedence, and the descriptor's `__get__` is called only if the attribute is not found in the instance dictionary.

In [189]:
class ValidateString:

    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        print(f'getting {self.property_name} from __get__')
        return self.__dict__.get(self.property_name, None)
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.property_name} must be a String')
        print(f'setting {self.property_name}={value}')
        self.__dict__[self.property_name] = value
        
class Person:
    name = ValidateString()

In [190]:
p = Person()
p.name = 'Giovanni'
p.__dict__['name'] = 'Fred'

setting name=Giovanni


Since we have a data descriptor, even if we try to hard-wire a value into the instance dictionary, we get back the value stored in the descriptor instance.

In [191]:
p.name, p.__dict__

getting name from __get__


('Giovanni', {'name': 'Fred'})

We can see that the instance dictionary actually has a `name` entry in it, but it gets shadowed by the descriptor, whose getter is called instead. (N.B. here the value is stored in the dictionary of the descriptor instance under the property name, so it is shared by all instances of Person; this is only meant to show the lookup order.)

As stated above, things would have been different with a non-data descriptor; in that case Python would have looked first in the instance dictionary, and only afterwards at the descriptor.
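
A minimal sketch of the non-data case (the descriptor `DefaultName` is a made-up example): once an entry with the same name exists in the instance dictionary, `__get__` is no longer consulted.

```python
class DefaultName:
    # only __get__ is defined, so this is a non-data descriptor
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return 'unknown'

class Person:
    name = DefaultName()

p = Person()
print(p.name)                 # nothing in the instance dictionary: __get__ is used
p.__dict__['name'] = 'Fred'
print(p.name)                 # the instance dictionary now takes precedence
```

A plain assignment such as `p.name = 'Fred'` has the same effect, since without `__set__` the value goes straight into the instance dictionary.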

This means that, with a data descriptor, we can store the data in the instance dictionary under the very same name as the descriptor, without the risk of shadowing, but being careful not to incur infinite recursion.

In [44]:
class ValidateString:

    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        print(f'getting {self.property_name} from __get__')
        return instance.__dict__.get(self.property_name, None) 
    # we use the get method to avoid a runtime error in case the value has not been set yet
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f'{self.property_name} must be a String')
        print(f'setting {self.property_name}={value}')
        instance.__dict__[self.property_name] = value
        #setattr(instance, self.property_name, value) # N.B. this will lead to infinite recursion!!
        
class Person:
    name = ValidateString()

In [46]:
p = Person()
p.name = 'Giovanni'
p.__dict__, Person.__dict__

setting name=Giovanni


({'name': 'Giovanni'},
 mappingproxy({'__module__': '__main__',
               'name': <__main__.ValidateString at 0x2408f13c6d0>,
               '__dict__': <attribute '__dict__' of 'Person' objects>,
               '__weakref__': <attribute '__weakref__' of 'Person' objects>,
               '__doc__': None}))

In [47]:
p.name

getting name from __get__


'Giovanni'

Now the instance dictionary stores the value under the same name as the descriptor, but since it is a data descriptor, attribute access always goes through the descriptor's getter and setter!

## Properties are descriptors!

Back to the beginning, where we started talking about the `@property` decorator, how we use it in classes and how we can reduce boilerplate code using descriptors. Well, plot twist, properties are actually data descriptors! As a matter of fact they implement the get, set and delete methods. Whenever we use a property with the dot notation, Python is calling the get/set method of the property, passing the instance that is performing the call.

If we define a property using the `property()` class, so that we can assign that property to a variable, we can easily confirm with the function `hasattr` that the property has `__get__`, `__set__` and `__delete__`, even if no getter, setter or deleter function was specified! Since it is a data descriptor it has all these methods by default; if the corresponding function was not provided they simply raise an `AttributeError` when called, but they are there!
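
A short sketch of this check on a read-only property (the class and the `get_name` function are made up for illustration): the descriptor methods exist even though only a getter was supplied, and assigning through the missing setter raises an `AttributeError`.

```python
class Person:
    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name

    name = property(fget=get_name)   # read-only: no setter or deleter supplied

prop = Person.__dict__['name']
print(hasattr(prop, '__get__'), hasattr(prop, '__set__'), hasattr(prop, '__delete__'))

p = Person('Giovanni')
print(p.name)
try:
    p.name = 'Fred'                  # __set__ exists, but there is no setter function behind it
except AttributeError as error:
    print(f'AttributeError: {error}')
```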

## Functions implement the descriptor protocol!

We have seen before that some magic happens when the functions defined inside a class become bound to the instances of that class. But how does this happen? How can Python differentiate between calling a function from the class in which it is defined and from an instance? Well... guess what, functions are objects that implement the non-data descriptor protocol! As such, they have the `__get__` method and, depending on how it is called, it returns either the function itself or the method bound to the instance.

In [48]:
def add(a,b):
    return a + b

hasattr(add, '__get__'), hasattr(add, '__set__')

(True, False)

To call the `__get__` method we need the instance and the owner class.

To see the different behaviors of `__get__` we need to be able to call it as if the add function were accessed from its owner class, i.e. in this case the module we are writing in now.

In [52]:
import sys
me = sys.modules['__main__']

Now we can call `__get__` as if the function were accessed from the owner class, without passing an instance.

In [54]:
f = add.__get__(None, me)
f, f is add

(<function __main__.add(a, b)>, True)

`f` is now exactly the function; so the `__get__` method returns the function itself when it is called without an instance, here using the module in which the function was defined as its owner (i.e. playing the role of the class that contains it).

Now let's see what happens when accessing a function from a class:

In [55]:
class Person:
    def __init__(self, name):
        self.name = name
        
    def say_hello(self):
        return f'{self.name} says hello!'
    
Person.say_hello

<function __main__.Person.say_hello(self)>

See? The behavior is exactly the same as before; when we access the function, Python uses the `__get__` method with the instance set to `None`, since we are accessing it from the class itself, and the owner class set to `Person`, i.e. the class that owns that function. 
If instead we instantiate the `Person` class and access the function from there (therefore with the instance set to the instance itself), we get the bound method instead:

In [56]:
p = Person('Giovanni')
p.say_hello

<bound method Person.say_hello of <__main__.Person object at 0x000002408FD71DE0>>

Which is equivalent to:

In [59]:
bound_method = Person.say_hello.__get__(p, Person)
bound_method

<bound method Person.say_hello of <__main__.Person object at 0x000002408FD71DE0>>

When we get the bound method from an instance, Python actually creates a new method object each time. To see which function the bound method wraps we can use the special `__func__` attribute (the instance it is bound to is stored in `__self__`).

In [60]:
f1 = p.say_hello
f2 = p.say_hello
print(f'Are f1 and f2 the same object? {f1 is f2}, because Python creates a new method each time')
type(f), type(bound_method)

Are f1 and f2 the same object? False, because Python creates a new method each time


(function, method)

In [65]:
bound_method.__func__

<function __main__.Person.say_hello(self)>

We can actually mimic the descriptor that Python uses to bind a function to an instance using `types.MethodType`, which takes two arguments: the function and the instance that we want to bind the function to.

In [66]:
import types

class MyFunc:
    def __init__(self, func):
        self._func = func
        
    def __get__(self, instance, owner_class):
        if instance is None:
            print(f"Called from class={owner_class.__class__.__name__}")
            return self._func
        else:
            print(f"Called from instance={instance.__class__.__name__}")
            return types.MethodType(self._func, instance)            

Now we bind a function, defined outside the class itself, to the class using the descriptor:

In [68]:
def hello(self):
    print(f'{self.name} says hello')

class Person:
    def __init__(self, name):
        self.name = name
        
    say_hello = MyFunc(hello)
    
p = Person('Giovanni')

In [69]:
Person.say_hello, p.say_hello

Called from class=type
Called from instance=Person


(<function __main__.hello(self)>,
 <bound method hello of <__main__.Person object at 0x000002408FD71DE0>>)

back to [TOC](#TOC)

# Scopes and Namespaces
---

A `scope` is the portion of the code in which a variable name is defined; it has an associated `namespace`, essentially a table that maps each name in the scope to the object it references. Python has several scopes, arranged in a nested structure. At the top we have the `built-in` scope, the only truly global scope, shared by all modules, which contains the definitions of core elements such as `print`, `len`, `dict` etc. Nested inside the built-in scope is the so-called `global` scope (which is global only within a single module, i.e. a single file). Moreover, each function has its own `local` scope, which is created each time the function is called (until the function is called, its local variables do not exist, and they never appear in the global scope where the function is defined).

In [None]:
# module1.py
print(True)
# `print` is not defined in the module1 scope, therefore Python automatically goes up one level and looks for its definition in the built-in scope
# (`True`, like `None`, is a keyword in Python 3, so it needs no lookup at all)

In [None]:
print(new_var_never_used)
# Error! `new_var_never_used` is defined neither in the module scope nor in the built-in scope, therefore Python raises a `NameError`
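
The three namespaces can also be inspected directly. The following is only a sketch: `x`, `y` and `show_local` are names invented here, while `builtins`, `globals()` and `locals()` come with Python.

```python
import builtins

x = 10  # lives in the module (global) namespace


def show_local():
    y = 20  # lives in the local namespace of this call
    return locals()  # snapshot of the local namespace as a dict


in_builtins = hasattr(builtins, 'print')  # print is defined in the built-in scope
in_globals = 'x' in globals()             # x is defined in the module scope
local_names = show_local()                # only y is local to the function

print(in_builtins, in_globals, local_names)
```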

To summarize: at `compile time` Python looks at the code and determines in advance which variables will live in the local scope and which in the global scope. When it encounters a `def` (function), it looks inside of it; if a name is assigned anywhere in the body (e.g. `a = 100`), that name will be local to the function, unless `global a` is declared; if a name is only read but never assigned inside the function (e.g. `print(a)`), the compiler determines that it is a non-local reference, to be looked up in the enclosing scopes at run time.

In [None]:
a = 0 # global scope/namespace

def func1():
  print(a) # the compiler understands that `a` is a non-local variable, since there is no assignment to it inside the local scope

func1()
print(a)

In [None]:
def func2():
  a = 100 # the compiler knows that this will be a local variable
  print(a)

func2()
print(a)

In [None]:
def func3():
  global a # the compiler knows that this will refer to a global variable
  a = 100 
  print(a)

func3()
print(a)
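
A consequence of this compile-time decision is worth a short sketch: if a name is assigned anywhere in a function body, it is local for the whole body, so reading it before the assignment fails even when a global with the same name exists. The names `total` and `broken_increment` are made up for this example.

```python
total = 0  # global variable


def broken_increment():
    # the assignment two lines below makes `total` local for the WHOLE body,
    # so this read happens before the local name has received a value
    print(total)
    total = 1


try:
    broken_increment()
    outcome = 'no error'
except UnboundLocalError as exc:
    outcome = f'UnboundLocalError: {exc}'

print(outcome)
```

Declaring `global total` at the top of the function, as in `func3`, would make both lines refer to the global variable.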

## Masking
We talk about masking (or shadowing), which should be avoided, when we reuse a name from the built-in scope. Since Python looks in the module scope first, if we assign a variable whose name already exists in the built-in namespace, within our module we change its standard behavior.

In [None]:
# module1.py
print = lambda x: f'Hello {x}'
# we are redefining, in this module, the 'meaning' of the name print, so now:
print('world') # -> 'Hello world'
# Python invokes the module-level definition of print and not the built-in one

The same behavior applies between global and local scopes. When a variable is assigned inside a function scope, Python sees it at compile time and records it (the variable will be created only when the function is called, but the compiler is already aware that it exists).
Therefore, if a variable with the same name exists in the global scope, when the function is called it will be masked by the assignment in the local scope.

In [None]:
# del print # removes the previous override of the built-in print function

a = 0 # global scope/namespace

def my_func():
  a = 100 # local scope/namespace
  print(a)

my_func() # a = 100, the global scope 'a' has been masked 

print(a) # a = 0, the global scope hasn't been modified, and the local scope of `my_func` has been destroyed after its execution.

It is also possible to avoid masking by explicitly telling Python that a variable assigned in the local scope actually belongs to the global scope. This is done by declaring the variable with the `global` keyword at the beginning of the local scope. This tells the compiler to use the global scope for that particular variable: assignments inside the function rebind the global name, or create it if it does not exist yet.

In [None]:
a = 0

def my_func():
  global a
  a = 100 # assigns to the global name, thanks to the `global a` declaration
  print(a)

my_func() # a = 100, since `global a` has been declared, the variable `a` in the global scope has been modified

print(a) # a = 100

## Nonlocal scope
When we define a function inside another function, the local scope of the outer function becomes, for the inner function, a scope that is neither the global (module-level) scope nor its own local (function-level) scope. It is an intermediate scope called the `non-local scope`. Variables that the inner function uses but that belong to the nonlocal scope are called `free variables`.

In [None]:
def outer_func():
  x = 10 # local scope of outer_func == non-local scope of inner_func
  
  def inner_func():
    x = 20 # local scope of inner_func
  
  inner_func()

  print(x)

# if we call outer_func:
outer_func() # x = 10 since the local scope of inner func has not modified the non-local scope (local scope of outer_func)

In the same way as we tell python that a variable in a local scope is `global`, we can specify a variable to be `non-local` (i.e. with the same reference of the one in the outer_func scope). 

In [None]:
def outer_func():
  x = 10 # local scope of outer_func == non-local scope of inner_func
  
  def inner_func():
    nonlocal x # now the reference of x is shared with the non-local (outer_func) scope
    x = 20 # local scope of inner_func
  
  inner_func()

  print(x)

# if we call outer_func:
outer_func() # x = 20 since the local scope of inner func has modified the non-local scope (local scope of outer_func)

N.B. if in a local scope we declare a `global` variable, Python will look in the global scope for a match, otherwise it will create the global variable. Instead, when declaring a `nonlocal` variable, Python looks only in the enclosing function scopes, starting from the closest one and moving outwards; it never reaches the global scope, and if no enclosing function defines that name a `SyntaxError` is raised.

In [None]:

def outer():
  x = 0 # local scope of outer
  
  def inner1():
    # local scope of inner1 == closest nonlocal scope of inner2 (it does not define x)
    def inner2():
      nonlocal x # not found in inner1, so the search continues in outer
      x = 10 
    inner2()

  inner1()
  print(x) # x = 10 because inner2 has rebound the x defined in outer

outer()
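
That a `nonlocal` declaration never falls back to the global scope can be checked without running any function, since the error is raised at compile time. This is a small sketch: the snippet held in `source` is made up, and `compile()` is only used to trigger the check.

```python
source = '''
y = 0  # module-level (global) name

def outer():
    def inner():
        nonlocal y  # no enclosing function defines y
        y = 1
    inner()
'''

try:
    compile(source, '<nonlocal-demo>', 'exec')
    result = 'compiled'
except SyntaxError as exc:
    result = f'SyntaxError: {exc.msg}'

print(result)
```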

back to [TOC](#TOC)

# Closure
---

A `closure` is a special Python construct composed of a function and an extended scope (the nonlocal scope) that contains its free variables (nonlocal variables). This means that both the function and the extended scope refer to the same object, but Python does not do this directly. Instead a `cell object` is created that points to the value of the free variable, while the variable name, in both the extended scope and the function scope, points to the cell object.

In [None]:
def outer():
  a = 10

  x = 'python'  #----------
                # THIS IS A
  def inner():  # CLOSURE
    print(x)    #----------
  
  return inner

fn = outer() 
'''
now outer has returned the function inner, which should print x without directly containing a definition of the variable.
Therefore, since the scope of the function outer is gone after the call has returned, we might expect the variable x = 'python' to be lost and unreachable from fn.
Instead, this works because Python, during compilation, sees a closure and creates an intermediate cell object through which both the inner and outer functions reference x.
'''

# x = 'python' #--|
                 #|--> cell object --> str object `python`
# print(x)     #--|              

fn()

Both `x` in the outer and inner functions point to a `cell object`, which contains a reference to another object in memory holding the string `python`. This lets us call the function inner, returned from the function outer, even after the scope of outer is gone. We can inspect the closure and the free variables of a function:

In [None]:
fn = outer()

# fn.__code__.co_freevars # ('x',) while `a` is not a free variable, since inner never uses it
# fn.__closure__ # cell object at address xxx containing a str object ('python') at address yyy
# both `x` (the local one in outer and the free one in inner) go through the same cell, which points to yyy

fn.__code__.co_freevars, fn.__closure__
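
The object held by each cell can be read through its `cell_contents` attribute. A self-contained sketch, repeating the `outer`/`inner` pair from above with `inner` returning `x` instead of printing it:

```python
def outer():
    a = 10        # never used by inner: stays a plain local variable
    x = 'python'  # used by inner: becomes a free variable of inner

    def inner():
        return x

    return inner


fn = outer()

free_names = fn.__code__.co_freevars  # names of the free variables
cell_values = [cell.cell_contents for cell in fn.__closure__]  # objects held by the cells
print(free_names, cell_values)
```

Only `x` shows up, since `a` is never referenced by `inner` and therefore needs no cell.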

We can have multiple instances of the same closure; this means that a new cell object is created each time, leaving the behavior of the different instances of the closure independent.

In [39]:
def counter():
  # beginning of Closure
  count = 0

  def inc():
    nonlocal count
    count += 1
    return count
  # end of Closure

  return inc

f1 = counter()
f2 = counter()
# f1 and f2 behavior is independent

f1()
f1()
f1(), f2()

(3, 1)

## Shared extended scope
At the same time we can have `shared extended scope` of two different closures.

In [None]:
def outer():
  count = 0

  def inc1():
    nonlocal count
    count += 1
    return count

  def inc2():
    nonlocal count
    count += 1
    return count

  return inc1, inc2

f1, f2 = outer() 

f1()
f2()

`f1` and `f2` are two closures that share the free variable `count`, therefore both functions, when called, increment the same value of `count`. If this behavior is wanted, then no problem, but it often happens that the same free variable is shared without knowing it.

In [None]:
# create a list of functions that each add a different value to their argument
adders = []
for n in range(1, 4):
  adders.append(lambda x: x + n)

# what we expect to have is a list of functions
adders


In [None]:
# therefore calling 
adders[1](10) # we would expect 12 = 10 + 2 
# instead all three functions add 3, i.e. the last value n was pointing to

`n` is a global variable, and it is not evaluated until the function is called; by that time the for loop has finished and `n` is equal to 3. As a matter of fact we don't even have a closure here, since `n` is a global variable. The correct way to achieve this would be:

In [None]:
def create_adders():
  adders = []
  for n in range(1, 4):
    adders.append(lambda x, y=n: x + y) # in this way we are saving the value of n at each iteration 
  return adders

adders = create_adders()

adders[1](10)

# since we have specified a default value for `y`, it is evaluated at creation time (when the lambda is defined), not at runtime (i.e. when the function is called).
# `y` does not refer to the name `n` but to the object `n` was pointing to at that iteration.
# `y` is a parameter, hence a local variable of each lambda: these functions have no free variables, so strictly speaking they are not closures.
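
An alternative that gives each function its own free variable is a small factory function. This is a sketch; `make_adder` is a name invented here.

```python
def make_adder(n):
    # n is a local variable of this particular call; the lambda closes over it
    return lambda x: x + n


adders = [make_adder(n) for n in range(1, 4)]

results = [add(10) for add in adders]
free_vars = adders[0].__code__.co_freevars  # each adder has its own cell for n
print(results, free_vars)
```

Each call to `make_adder` creates a new scope, so the three lambdas no longer share a single `n`.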

## Nested Closure
It is common, e.g. in decorators, to have nested closures:

In [None]:
# define a function that takes an increment and returns a function that takes a starting value and returns a function that adds the increment each time it is called.
def increment(n):

  def inner(start):
    current = start
    
    def inc():
      nonlocal current
      current += n
      return current

    return inc
  return inner

# Now inc has two free variables (current, n) one that lives in the `inner` scope and one in the `increment` scope.
# if we call:
fn = increment(2) # we will return the inner function with the variable n = 2
fn.__code__.co_freevars # `n` is the free variable of the closure containing the `inner` function
# if we then call:
inc_2 = fn(100) # we will return the `inc` function with the variable n = 2 and current = 100
inc_2.__code__.co_freevars # `n` and `current` are the free variables of the closure containing the `inc` function
# now if we call:
inc_2() # -> 102
inc_2() # -> 104


## Application

*hold*


back to [TOC](#TOC)

# Decorators
---

A decorator is a function that takes a function as argument and returns a closure (which in general accepts any number of arguments, `*args` and `**kwargs`) that calls that same function while adding extra functionality. Let's see an example:

In [None]:
def counter(fn): # counter takes a function as argument
  count = 0
  def inner(*args, **kwargs): 
    nonlocal count
    count +=1
    print(f'Function {fn.__name__} was called {count} times')
    return fn(*args, **kwargs)
  return inner

def add(a, b):
  return a + b

add = counter(add) # closure function is returned by counter()
# now add no longer refers to the 'def add' function but to its decorated version, returned by counter()  

result = add(1, 2) # -> 3
result = add(1, 3) # -> 4
result = add(5, 2) # -> 7

result


In the example above, `counter()` is essentially a `decorator`; it takes an arbitrary function with arbitrary arguments and returns the same function with the new "ability" of keeping track of how many times it has been called. We reassigned the name `add` to the decorated function, to point out that the function is conceptually still the same, but the name now refers to the closure returned by `counter()`. Decorating a function in this way is pretty common in Python, therefore a handy syntax has been defined to decorate a function using the `@` symbol.

In [None]:
# once the function counter has been defined from previous example

@counter
def add(a, b):
  '''Documentation'''
  return a + b

add(1,2)
add(3,2)

All good so far, but now if we look at the metadata of the function `add` we'll see that it now refers to the closure function `inner` and not to the original definition (`__name__`, `__doc__` etc. are those of the closure function). The Pythonic solution to this problem is to use the `wraps` decorator from the `functools` module:

In [None]:
add.__name__, add.__doc__

In [None]:
from functools import wraps

def counter(fn):
  count = 0
  @wraps(fn) # we are decorating the inner function
  def inner(*args, **kwargs): 
    nonlocal count
    count +=1
    print(f'Function {fn.__name__} was called {count} times')
    return fn(*args, **kwargs)
  return inner

@counter
def add(a, b):
  '''Documentation'''
  return a + b

add.__name__, add.__doc__
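
Besides copying the metadata, `wraps` also stores the original function on the wrapper as `__wrapped__`, which is handy when the undecorated version is needed (e.g. in tests). A minimal sketch, with a made-up pass-through decorator called `logged`:

```python
from functools import wraps


def logged(fn):  # hypothetical decorator, it only forwards the call
    @wraps(fn)
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner


@logged
def add(a, b):
    '''Documentation'''
    return a + b


print(add.__name__, add.__doc__)  # metadata of the original function
original = add.__wrapped__        # the undecorated function, stored by wraps
print(original(1, 2))
```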

## Multiple decorators
Multiple decorators can be applied to a function; care must be taken to ensure that the order of execution of the two or more decorators is the one intended by the coder. For example:

In [None]:
def dec_1(fn):
  def inner(*args, **kwargs):
    print('dec_1 called')
    result = fn(*args, **kwargs) # calling the decorated function
    return result
  return inner # return the closure

def dec_2(fn):
  def inner(*args, **kwargs):
    print('dec_2 called')
    result = fn(*args, **kwargs) # calling the decorated function
    return result
  return inner # return the closure


@dec_1
@dec_2
def my_func():
  print('my_func called')

'''
Calling my_func, decorated in this order, is equivalent to:

my_func = dec_1(dec_2(my_func))

Therefore, first the closure dec_2(my_func) is created and then passed to dec_1(). 
Since inside the decorators the print() is executed before the function call (result = fn(*args, **kwargs)), the printed output will be:

dec_1 called
dec_2 called
my_func called

because first the closure returned by dec_1 is called: it prints its output, then calls the fn passed as argument, which is dec_2(my_func);
therefore the closure returned by dec_2 is called: it prints its output, then calls the fn passed as argument, i.e. my_func, which prints its output.
N.B. if the print() had been placed after the fn() call, the print-out order would have been reversed!
This is to say that, depending on the functionality we want to implement with our decorators, the order of application matters!
'''

my_func()
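
To see both orders at once, here is a sketch in which each decorator records one entry before and one after calling the decorated function. The list `calls` and the `before`/`after` labels are additions made for this example.

```python
calls = []  # records the order in which things run


def dec_1(fn):
    def inner(*args, **kwargs):
        calls.append('dec_1 before')
        result = fn(*args, **kwargs)
        calls.append('dec_1 after')
        return result
    return inner


def dec_2(fn):
    def inner(*args, **kwargs):
        calls.append('dec_2 before')
        result = fn(*args, **kwargs)
        calls.append('dec_2 after')
        return result
    return inner


@dec_1
@dec_2
def my_func():
    calls.append('my_func')


my_func()
print(calls)
```

The code placed before the call runs from the outermost decorator inwards, the code placed after the call runs in the opposite order.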

## Memoization
Another very powerful application of decorators is `memoization`, i.e. the process of caching the results of a function to avoid repeating the same calculation over and over (as happens in recursive functions like Fibonacci or factorial). Let's take as an example a function that computes the Fibonacci number at position n:

In [None]:
# with recursion
def fib(n):
  print(f'computing fib({n})')
  return 1 if n < 3 else fib(n-1) + fib(n-2)

fib(4)

Written this way, the function works and is elegant, but it is not efficient, since each call recomputes all the previous numbers of the Fibonacci series. A way to solve this problem is to cache the results as they are computed, which can be easily implemented with a class:

In [None]:
class Fib:
  def __init__(self):
    self.cache = {1: 1, 2: 1}

  def fib(self, n):
    if n not in self.cache:
      print(f'computing fib({n})')
      self.cache[n] = self.fib(n-1) + self.fib(n-2)
    return self.cache[n]

f = Fib()
f.fib(4)

In this way, after creating an instance of `Fib`, the Fibonacci sequence is stored while it is computed (n.b. the cache is not shared between instances: each new instance starts with only the two seed values in its cache). The same can be accomplished with a closure (i.e. with a decorator):

In [None]:
def fib():
  cache = {1: 1, 2: 1}

  def calc_fib(n):
    if n not in cache:
      # cache is a nonlocal parameter
      print(f'computing fib({n})')
      cache[n] = calc_fib(n-1) + calc_fib(n-2)
    return cache[n] 

  return calc_fib # return the closure

f = fib()
f(4)

From the function `fib()` to a decorator the path is short; we just need to generalize its structure:

In [None]:
def memoize_fib(fib):
  cache = dict()

  def inner(n):
    if n not in cache:
      # the decorator is not carrying out the recursion
      # it is only caching values
      cache[n] = fib(n)
    return cache[n] 

  return inner # return the closure

@memoize_fib
def fib(n):
  print(f'computing fib({n})')
  return 1 if n < 3 else fib(n-1) + fib(n-2)

fib(4)

It is worth noting that `memoize_fib` is not a general-purpose decorator, since it does not accept an arbitrary number of positional or keyword arguments (`*args`, `**kwargs`) as decorators usually do; it is built precisely to work with functions of one argument such as `fib`. Another important aspect to handle is limiting the cache size, to balance the tradeoff between performance and memory usage. Of course Python already has a built-in decorator specifically designed for memoization, shipped with the `functools` module.

In [None]:
from functools import lru_cache # least recently used cache

@lru_cache() # lru_cache decorators accept arguments.. see below
def fib(n):
  print(f'computing fib({n})')
  return 1 if n < 3 else fib(n-1) + fib(n-2)

fib(4)
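
The cache size mentioned above is controlled by the `maxsize` argument, and the decorated function exposes `cache_info()` to check how the cache is doing. A small sketch; the list `computed` is added here only to record which values were really calculated.

```python
from functools import lru_cache

computed = []  # arguments for which the function body really ran


@lru_cache(maxsize=128)  # keep at most 128 results, evicting the least recently used
def fib(n):
    computed.append(n)
    return 1 if n < 3 else fib(n - 1) + fib(n - 2)


value = fib(10)
info = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize
print(value, computed, info)
```

Each value from 1 to 10 is computed exactly once; every other request is answered from the cache.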

## Parametrized decorators
Parametrized decorators are the ones that accept arguments (such as `wraps` and `lru_cache`). Imagine we have a decorator that runs a function `n` times, with `n` set by the user:

In [None]:
def run_n_times(fn):
  n =3

  def inner(*args, **kwargs):
    for _ in range(n):
      fn(*args, **kwargs)
    return print(f'{fn.__name__} was called {n} times')

  return inner

@run_n_times
def my_func():
  print('called')
  pass

my_func()

Now, the number of times the function is called has been hardcoded in the decorator, but we want to be able to change that parameter. We can think of something like:

In [None]:
def run_n_times(fn, n: int):
  def inner(*args, **kwargs):
    for _ in range(n):
      fn(*args, **kwargs)
    return print(f'{fn.__name__} was called {n} times')

  return inner

# now we would expect to call the decorator as:
@run_n_times(10)
def my_func():
  print('called')
  pass

# but we get an error since run_n_times requires two arguments

In [None]:
def my_func():
    print('called')
    pass    

# however, the argument `10` in the decorator call is in the position of `fn`, therefore it won't work.
# we could instead apply the decorator indirectly as:
my_func = run_n_times(my_func, 3)
# and this will work but how to implement the same behavior with the @ method?

my_func()

In order to be able to use the `@` symbol with a decorator that accepts arguments, we need that decorator to return a decorator itself when called. The result of `run_n_times(10)` has to be another decorator that actually performs the decoration we want. The solution is straightforward: we need to enclose our decorator in a `decorator factory` that holds the extra parameters needed.

In [None]:
def run_n_times(n: int): # decorator factory

  def inner1(fn): # decorator

    def inner2(*args, **kwargs):
      for _ in range(n):
        fn(*args, **kwargs)
      return print(f'{fn.__name__} was called {n} times')

    return inner2
  
  return inner1

Now the call `@run_n_times(10)` actually returns the decorator `inner1`, which implements the functionality we originally looked for:

In [None]:
@run_n_times(3) # returns the decorator `inner1`
def my_func():
  print('called')
  pass

my_func()

In [None]:
# this is equivalent to say:
my_func = run_n_times(3)(my_func)

## Decorator class
Not only functions can be used to create decorator factories, but also classes. As a matter of fact, thanks to the `__call__` method, we can replicate the exact same behavior seen in the previous example:

In [None]:
class MyClass:
  def __init__(self, n): # calling the class plays the role of the decorator factory
    self.n = n

  def __call__(self, fn): # this is the actual decorator
    def inner(*args, **kwargs):
      for _ in range(self.n):
        fn(*args, **kwargs)
      return print(f'{fn.__name__} was called {self.n} times')
    
    return inner # closure

@MyClass(3)
def my_func():
  print('called')

my_func()

## Monkey Patching and Decorating classes
Functions are not the only objects that can be decorated; classes can too, thanks to the dynamic nature of Python that allows the so-called `Monkey Patching`, i.e. the modification/addition of attributes/methods of classes at runtime. Essentially, we are able to change the behavior of a class at runtime. Imagine we are using the class `Fraction` and we want to add some functionality to it; we can do the following:

In [None]:
from fractions import Fraction

f = Fraction(2,3) # create an instance of the Fraction class
# we want the class `Fraction` to be able to speak ...
# if we write:
Fraction.speak = 100
# we are Monkey Patching the class Fraction at runtime, so if following we say:

f.speak

We can make the `Monkey Patched` methods also callable, for example using a lambda function (we can directly patch the class instead of an instance of it):

In [None]:
Fraction.speak = lambda self, message: f'Fraction says {message}'

# we need `self` as argument because we will pass to the method an instance of the class Fraction
# Now we can call:
f.speak('You cannot pass!') # -> 'Fraction says You cannot pass!'

We can see how the process of monkey-patching is essentially a decoration of a class and, as a matter of fact it can be done with a decorator function:

In [None]:
def decorator_speak(cls): # we are passing a class to the function
  cls.speak = lambda self, message: f'{self.__class__.__name__} says {message}'
  return cls # return is only needed if we want to decorate with the `@` symbol

# Now we can simply write on any class:
class Person:
  pass

Person = decorator_speak(Person) # indirect decoration
p = Person() # instance of the class
p.speak('I am ALIVE!') # method added to the class by the decorator

Let's do something more useful; Imagine we want to debug an existing class creating a decorator.

In [None]:
def info(obj): # think of this as of the method we would write inside the class, i.e. 'obj' would be 'self'
  from datetime import datetime, timezone
  results = {
    'time' : f'{datetime.now(timezone.utc)}',
    'name' : obj.__class__.__name__,
    'id' : hex(id(obj)),
    'vars' : [(k, v) for k, v in vars(obj).items()]
  }
  return results


def debug_class(cls): # This is the decorator
  cls.debug = info
  return cls 


# if we want to apply the decorator in function style:
debug_class(Person)
# here the return value is not needed, since the class object is modified in place.
# However, the `@` syntax rebinds the class name to whatever the decorator returns, exactly like:
Person = debug_class(Person) 
# without `return cls` the rhs would be None, and `Person` would no longer point to the class nor to its decorated version.

In [None]:
@debug_class
class Person:
  def __init__(self, name, age, employed=True):
    self.name = name
    self.age = age
    self.employed = employed

p = Person('Giovanni', 32)
p.debug()

## Single Dispatch Generic Functions
First let's define what overloading is:

`Overloading` in object-oriented programming is the ability to create more than one function with the same name as long as their signatures are different (essentially the functions must be distinguishable, i.e. different number/type of arguments etc.). When the program is compiled, the compiler works out, based on the signature, which of the functions with the same name we are referring to. 

In Python, since there is no static typing, the types of the parameters are not part of a function's definition, and a second `def` with the same name simply replaces the first one; therefore overloading, in its strict sense, is not possible. A workaround to this problem is the `single dispatch generic function`, which allows us to overload functions based on the type of the first argument (if we want to consider the type of more arguments we need `multi dispatch`).
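
In the standard library, single dispatch is provided by the `functools.singledispatch` decorator. A minimal sketch, where the function `describe` and its messages are invented for illustration:

```python
from functools import singledispatch


@singledispatch
def describe(arg):  # fallback, used for every type that is not registered
    return f'object: {arg!r}'


@describe.register(int)
def _(arg):
    return f'integer: {arg}'


@describe.register(str)
def _(arg):
    return f'string of length {len(arg)}'


@describe.register(list)
@describe.register(tuple)
def _(arg):  # one implementation registered for two types
    return f'sequence with {len(arg)} items'


print(describe(3), describe('abc'), describe([1, 2]), describe(2.5))
```

The implementation is chosen by looking only at the type of the first argument; any other argument is passed along untouched.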


### Application - `Htmlizer`
Link to [Single_dispatch_generic_function_Htmlizer](Single_dispatch_generic_function_Htmlizer.ipynb)

back to [TOC](#TOC)

# Python optimizations
---

## Interning
At startup Python (CPython) automatically pre-loads (caches) a global list of integers in the range [-5, 256], so these integers have a fixed memory reference. Since these numbers show up often, not having to create a new object each time they appear is an optimization. A number outside this range generally requires a new object, and that's why:

In [None]:
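a = 500
b = 500
a is b # may return False (e.g. when the lines are typed one by one in the interactive shell): never rely on `is` to compare numbers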

The cached integers are `singletons`, i.e. objects of which only one instance exists in the whole program.

The same might happen with some strings; Python can intern some strings (those that look like identifiers: letters, digits and underscores) in order to speed up equality tests (if two strings are interned I can compare them with the `is` operator, otherwise I have to use `==`, which compares character by character). We can force Python to intern a string with the `sys` module:

In [None]:
'''
usually this is something we don't need, unless for example we are working with a large set of strings
for NLP and we need to tokenize some words that are repeated often. In this case it can be a useful optimization,
since if a string is interned it becomes a singleton and can be compared with the much faster 'is' operator.
'''
import sys

a = sys.intern('this will be interned')
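
A short sketch of the effect: two equal strings are built at runtime (so that the compiler has no chance to merge them) and then interned. The variable names are invented for this example.

```python
import sys

# build two equal strings at runtime
words = ['this', 'will', 'be', 'interned']
a = ' '.join(words)
b = ' '.join(words)

equal_before = a == b  # same characters, compared one by one

a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_object = a_interned is b_interned  # both names refer to the single interned object

print(equal_before, same_object)
```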

## Peephole
It is an optimization that occurs at compile time, i.e. when the source is translated to bytecode (for the main script this happens at each launch, while imported modules are cached as `.pyc` files). For example we can have `constant expressions`, like numeric calculations that are more readable as the operation than as the result:


In [None]:
minute_in_day = 60 * 24 # 1440

The expression `60 * 24` is more readable than `1440`, but we may think that, if the line is executed many times, we may have performance issues. This is not the case: this is a constant expression and Python knows it, so the compiler computes the result once and stores it as a constant, without having to compute it again at run time.

The same happens for membership tests, i.e. checking whether an object is in a collection. If the collection is a constant expression, Python replaces the mutable object with its immutable counterpart (lists -> tuples, sets -> frozensets)


```py
for i in range(100000):
  if i in [1,2,3]:
    pass
```


The list `[1,2,3]` is converted into the tuple `(1,2,3)` which, being immutable, can be stored once as a constant instead of being rebuilt at each iteration.

N.B. sets, being similar to dictionaries (hash maps), are much more efficient than lists for membership testing!
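
These optimizations can be observed by compiling a statement and looking at the constants stored in the resulting code object. This is a sketch that relies on CPython implementation details; the exact content of `co_consts` may differ between versions.

```python
# compile a single statement and inspect the constants kept in the code object
code = compile('minute_in_day = 60 * 24', '<demo>', 'exec')
folded = 1440 in code.co_consts  # the product was already computed by the compiler

# same inspection for a membership test against a constant list
membership = compile('found = i in [1, 2, 3]', '<demo>', 'exec')

print(folded, code.co_consts, membership.co_consts)
```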

back to [TOC](#TOC)

# Common Modules
---

## `string`
Module with some useful string constants and representation.


## `functools`
Module with useful functions:

* `total_ordering`: class decorator that automatically fills in the missing comparison methods (`__lt__`, `__le__`, `__gt__`, `__ge__`) when the class defines `__eq__` and at least one of them
* `reduce`: applies a two-argument function cumulatively to the items of an iterable, reducing it to a single value
* `partial`: lets us freeze some arguments of a function, returning a new callable
* `wraps`: decorator that copies the metadata (`__name__`, `__doc__`, etc.) of the decorated function onto the wrapper, so that it survives the decoration
* `lru_cache`: decorator that caches the results of a function, keyed by its arguments (very useful for recursive functions)
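
A compact sketch of the first three entries (`wraps` and `lru_cache` are shown in the Decorators chapter). The class `Version` and the names `parse_binary`, `product` and `newer` are made up for the example.

```python
from functools import partial, reduce, total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.key = (major, minor)

    def __eq__(self, other):
        return self.key == other.key

    def __lt__(self, other):  # the only ordering method written by hand
        return self.key < other.key


newer = Version(1, 2) >= Version(1, 0)  # __ge__ was generated by total_ordering

product = reduce(lambda acc, item: acc * item, [1, 2, 3, 4])  # ((1 * 2) * 3) * 4

parse_binary = partial(int, base=2)  # int() with the base argument frozen to 2
five = parse_binary('101')

print(newer, product, five)
```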


## `itertools`
Module with useful iteration methods:
* `cycle` create a cyclic iterator from an iterable
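
Since a cyclic iterator never ends, it is usually consumed together with something that stops it, for example `islice`. A minimal sketch with a made-up list of colors:

```python
from itertools import cycle, islice

colors = cycle(['red', 'green', 'blue'])  # endless iterator: it never raises StopIteration
first_seven = list(islice(colors, 7))     # islice takes a finite slice of it
print(first_seven)
```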

## `collections`
Module with useful functions:

* `namedtuple`: tuple whose fields can also be accessed by name; a lightweight substitute for classes or dictionaries in some cases
* `Counter`: dict subclass that takes an iterable and counts the number of occurrences of each element
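
Both can be tried in a few lines. This is only a sketch: `Point` and the sample letters are invented for the example.

```python
from collections import Counter, namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)
total = p.x + p[1]  # fields are reachable by name or by index

counts = Counter(['a', 'b', 'a', 'c', 'a', 'b'])
top_two = counts.most_common(2)  # list of (element, count) pairs, most frequent first
missing = counts['z']            # elements never seen count as 0 instead of raising KeyError

print(p, total, counts['a'], top_two, missing)
```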

## `random`
Pseudo random number generator:

* `seed` value is required to create a repeatable random sequence (essential for testing)

In [None]:
import random
random.seed(0) # now I can execute this cell as many times as I want and the output won't change
for _ in range(3):
    print(random.randint(2,10), end=', ')

* `shuffle` shuffles a list in place 
* `gauss` draws numbers from a Gaussian distribution
* `choice` draws a random element from a list
* `choices` draws a given number of random elements from a list, with repetition; optional weights control how likely each element is to appear.
* `sample` draws a sample from a list without repetition
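
The functions above can be combined in a small sketch that also shows the effect of the seed. The helper `draw` and the weights are arbitrary choices made for this example.

```python
import random

population = list(range(10))


def draw(seed):
    random.seed(seed)         # same seed -> same sequence of draws
    shuffled = population[:]  # copy, because shuffle works in place
    random.shuffle(shuffled)
    return {
        'shuffle': shuffled,
        'choice': random.choice(population),
        # the last element is ten times more likely than the others
        'choices': random.choices(population, weights=[1] * 9 + [10], k=3),
        'sample': random.sample(population, k=4),
        'gauss': random.gauss(0, 1),
    }


first = draw(0)
second = draw(0)
print(first == second)
```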

### `timeit`
Platform specific timer for performance evaluation of code

In [None]:
# example of usage
from timeit import timeit

n = 100_000 # number of times the statement is executed
timeit(stmt='math.sqrt(2)', setup='import math', number=n, globals=globals())

### `argparse`
When we run a Python script from the terminal it might be useful to be able to pass some arguments/variables that will be used by the script itself. The easiest way to retrieve command-line arguments is to use the `sys.argv` list, which contains the name of the script that was run followed by the arguments passed (arguments must be whitespace-separated).

However, the smart way of doing it is to use the standard library module `argparse`:

```py
import argparse

parser = argparse.ArgumentParser(description="Description of the parser")

# now we populate the parser with what we expect to retrieve in the command line
parser.add_argument('first_arg', help='description of first arg', type=str)
parser.add_argument('second_arg', help='description of second arg', type=int)

# now we need to tell the parser to parse these arguments from sys.argv[1:] 
# (by default if nothing specified inside parse_args())
args = parser.parse_args()

print(args.first_arg)
print(args.second_arg)
```

We can call the module from the terminal with the `-h` flag to print the descriptions given to the parser, which should help us understand the expected usage of the module.

We can also specify keyword arguments and many more options:

```py
parser.add_argument('-kw', '--keyword', help='first kw arg', type=str, required=False, dest='alias_name')
```

where the first two arguments are the short and long forms used to assign the keyword argument on the command line, `required` specifies whether the argument is mandatory, and `dest` gives the name of the attribute under which the value is stored and accessed in the code. Other arguments that we may be interested in are:
* `nargs` to accept more than one value per argument. It can be `+` (at least one value is required) or `*` (zero or more values)
* `action` to specify a behavior for the argument, like `store_true`, `store_const`, etc.

Another useful thing is to define a group of mutually exclusive arguments (i.e. only one of them can be specified, not both), which is particularly useful when creating flags:

```py
group = parser.add_mutually_exclusive_group()
group.add_argument('-v', '--verbose', action='store_true')
group.add_argument('-q', '--quiet', action='store_true')
# this way only one of -v and -q can be specified
```
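
Putting the pieces together, here is a minimal sketch. The argument names (`name`, `--numbers`, `values`) are hypothetical, and an explicit list is passed to `parse_args` only so that the example runs without a real command line:

```python
import argparse

parser = argparse.ArgumentParser(description="Hypothetical demo parser")
parser.add_argument('name', help='positional argument', type=str)
parser.add_argument('-n', '--numbers', help='one or more integers',
                    type=int, nargs='+', dest='values', default=[])

group = parser.add_mutually_exclusive_group()
group.add_argument('-v', '--verbose', action='store_true')
group.add_argument('-q', '--quiet', action='store_true')

# equivalent to running: python script.py report -n 1 2 3 -v
args = parser.parse_args(['report', '-n', '1', '2', '3', '-v'])
print(args.name, args.values, args.verbose, args.quiet)
```

Because of `dest='values'`, the numbers are read from `args.values` rather than `args.numbers`. Passing both `-v` and `-q` makes the parser print an error and exit.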

back to [TOC](#TOC)

# Tips and tricks

### Python version

In [None]:
import sys
sys.version_info

### Exceptions handling
"Look before you leap" (LBYL) and "easier to ask for forgiveness than permission" (EAFP) are two different approaches we can use to deal with errors; the first corresponds to an `if` statement that checks a condition beforehand, the latter to `try`/`except`. Generally speaking, the `if` statement is faster than the `try`/`except` when the exception is actually raised, but it may be less expressive; moreover, the `try`/`except` becomes a burden only if the exception is raised often, otherwise there is not much difference in computational time.

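As a minimal sketch of the two styles, consider reading an optional key from a dictionary. The `config` data and the default port are hypothetical:

```python
config = {'host': 'localhost'}   # hypothetical data, 'port' is missing

def port_lbyl(cfg):
    # "look before you leap": check first, then act
    if 'port' in cfg:
        return cfg['port']
    return 8080

def port_eafp(cfg):
    # "ask for forgiveness": act, and handle the failure if it happens
    try:
        return cfg['port']
    except KeyError:
        return 8080

print(port_lbyl(config), port_eafp(config))
```

Both functions return the same result; they differ only in where the cost is paid, on every call for the check or only on the calls where the key is missing.
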
### which import is faster
N.B. it matters only if the code is run a humongous number of times

In [None]:
from timeit import timeit

# SLOWER
import math

# FASTER
from math import sqrt

n = 10_000_000

timeit(stmt='math.sqrt(2)', setup='import math',number=n), timeit(stmt='sqrt(2)', setup='from math import sqrt',number=n)

### Sentinel values instead of None
Sometimes we need to define a function or class with keyword arguments and have a way to check whether the user passed a given argument or not. The standard way of proceeding is to set the default value to `None`. This is fine, but sometimes `None` itself is an acceptable value for the user to pass, and then we cannot tell it apart from a missing argument.

In [None]:
def some_func(kw=None):
    if not kw:
        print('kw was not passed')
    else:
        print('kw was passed')
        
some_func(), some_func(None), some_func(10)

As we can see, passing `None` as the argument results in the wrong behavior (because an argument actually was passed: it was `None`!). By the way, in this case the same goes for any argument whose truth value is `False`, such as `0` or `[]`. An alternative approach is to set a sentinel value as a flag, something so unique that the user could never pass it.

To do this, a smart idea is to use as the sentinel value an object whose identity (its `id`) is unique in memory. The most generic way is to create a fresh instance of the base class with `object()` and to compare against it with `is`:

In [None]:
_SENTINEL = object()

def some_func(kw=_SENTINEL):
    if kw is _SENTINEL:
        print('kw was not passed')
    else:
        print('kw was passed')
        
some_func(), some_func(None), some_func(0), some_func([]), some_func(10)

As expected, the behavior is now correct and we know exactly whether the user has passed an argument. Actually, we could have set the default of `kw` directly to `object()`: default values are evaluated once, when Python executes the function definition, so that object would have been unique as well. Keeping it in a named constant such as `_SENTINEL`, however, is what lets the function body compare against it.

### Switch statement in python PEP 3103
In other programming languages, like Java, a switch is a structure that holds different branches and selects one of them depending on the value passed. In Python we have different ways to simulate this behavior: the simplest (and worst) is a series of `if`/`elif` statements; the alternatives are a dictionary or, more elegantly, a single-dispatch function (`functools.singledispatch`).
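
A minimal sketch of the two alternatives to an `if`/`elif` chain. The names `ACTIONS`, `switch` and `describe` are hypothetical; note that the dictionary dispatches on a value, while `singledispatch` dispatches on the type of the first argument:

```python
from functools import singledispatch

def start():
    return 'starting'

def stop():
    return 'stopping'

# dictionary-based switch: each key maps to the function to call
ACTIONS = {'start': start, 'stop': stop}

def switch(command):
    # the lambda plays the role of the "default" branch
    handler = ACTIONS.get(command, lambda: 'unknown command')
    return handler()

# single dispatch: the implementation is chosen from the argument's type
@singledispatch
def describe(value):
    return 'something else'

@describe.register
def _(value: int):
    return 'an integer'

@describe.register
def _(value: str):
    return 'a string'

print(switch('start'), switch('pause'), describe(3), describe('a'), describe(2.0))
```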

back to [TOC](#TOC)