# Course 2: Working with collections. User defined functions. Modules, packages. Working with files. Exception handling

## Iterating collections

The `for` loop is a handy way to iterate over a collection, or more generally over an *iterable* object. Lists, tuples and dictionaries are iterable containers. All these objects define an `__iter__()` method, which returns an *iterator*; the iterator is then used to traverse the collection element by element:

In [None]:
my_list = [1, 2, 3]
dir(my_list)

In [None]:
# get an iterator
...

Stepping to the next element is done by calling the built-in function `next()` on the iterator:

In [None]:
...

Stepping beyond the last element raises a `StopIteration` exception:

In [None]:
# print(next(my_iterator))

# ---------------------------------------------------------------------------
# StopIteration                             Traceback (most recent call last)
# ----> 1 print(next(my_iterator))

# StopIteration:

To iterate over a collection manually and stop cleanly at its end, catch the `StopIteration` exception:

In [None]:
my_iterator = iter(my_list)
 
while True:
    try:
        # Iterate by calling next
        item = next(my_iterator)
        print(item)
    except StopIteration:
        # exception appears when there are no more elements
        break

What are the benefits of iterators? An iterator can produce each item only when it is requested, instead of computing the whole collection up front (*lazy evaluation*). When only part of the collection is needed, or when items are processed one at a time, this saves memory and CPU time. `range` relies on this idea: a `range` object does not store its values, but produces them on demand, unless you force the generation of the whole collection:

In [None]:
my_range = range(100)
print(my_range)
# forcing generation of all the elements in the range:
print(list(my_range))
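One way to observe the saving is to compare memory footprints with `sys.getsizeof()`, which reports the size of the container object itself (not of the elements it refers to). The exact byte counts depend on the Python version and platform, so treat the printed numbers as indicative only:

```python
import sys

lazy = range(1_000_000)
eager = list(lazy)  # materializes one million element references

print(sys.getsizeof(lazy))   # small, and the same whatever the length of the range
print(sys.getsizeof(eager))  # grows with the number of elements
```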

Internally, a `for` loop calls `iter()` on the collection, then calls `next()` repeatedly until `StopIteration` is raised, exactly as in the `while` loop above.
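To make this protocol concrete, here is a minimal sketch of a hypothetical class `Countdown` that implements it by hand: `__iter__()` returns the iterator object itself, and `__next__()` returns the next value or raises `StopIteration` when there is nothing left. A `for` loop can then drive it like any built-in collection:

```python
class Countdown:
    """Hypothetical iterator producing start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # an iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            # signal the end of the iteration
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)  # prints 3, then 2, then 1
```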

## Collection comprehension

Suppose we have a list of numbers and we want to produce another list with squared values. It might be done as:

In [None]:
numbers = [1, 3, 6,  21, 22, 32, 33]
squared_numbers = []
...
    
print(squared_numbers)

However, there is a cleaner and usually faster approach.

Starting from a collection (most often a list), one can create another list using a *list comprehension*: an expression that loops over the elements of the initial collection and computes a new element from each of them:

In [None]:
squared_numbers = [...]
print(squared_numbers)

Optionally, you can add an `if` clause that keeps only the elements satisfying a condition:

In [None]:
squares_of_even_numbers = [...]
# note the position of if: for filtering, it is at the end of comprehension expression
print(squares_of_even_numbers)

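... and you can even choose between two values with a conditional expression (`if`...`else`); in this case the condition is placed before `for`: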

In [None]:
squares_or_cubes = [...]
# inline if-else, see Course 1
print(squares_or_cubes)

Exercise: if a list consists of other lists, how can you get the flattened list? For example, starting from `a1 = [[1, 2], [3, 4, 5], [10]]` we want to get its flattened version `a2 = [1, 2, 3, 4, 5, 10]`.

In [None]:
a1 = [[1, 2], [3, 4, 5], [10]]
a2 = [...]
print(a2)

Comprehension is available for other types of collections as well: for example, we can start from a list and create a dictionary:

In [None]:
numbers = [1, 3, 6,  21, 22, 32, 33]
dict_numbers_and_squares = {...}
print(dict_numbers_and_squares)

... or lists of tuples:

In [None]:
# cartesian product
colours = [ "red", "green", "yellow", "blue" ]
things = [ "house", "car", "tree" ]
coloured_things = [...]
print(coloured_things)

# perform postprocessing checking
assert ...
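The same syntax also builds sets: curly braces without a `key: value` pair give a *set comprehension*, which drops duplicates. With parentheses instead of brackets you get a *generator expression*, which produces its items lazily, like the iterators discussed earlier. The word list below is made up for illustration:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# set comprehension: the distinct first letters
first_letters = {w[0] for w in words}
print(first_letters)  # a set; the order of the elements is not guaranteed

# generator expression: nothing is computed until sum() consumes it
total_length = sum(len(w) for w in words)
print(total_length)  # 33
```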

Below are a few examples of comprehensions over collections.

In [None]:
# Convert a list of Celsius temperatures to Fahrenheit:
# Fahrenheit = 1.8 * Celsius + 32
celsius_degrees = [-20, -10, 0, 5, 23, 35]
fahrenheit_degrees = [...]
print(fahrenheit_degrees)

In [None]:
# Sum of squares of the numbers from 1 to 20
print(sum([x**2 for x in range(1, 21)]))  # note: range's upper bound (21) is excluded

In [None]:
# We want to filter out stop-words from a list:
stop_words = ["a", "about", "above", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also","although","always","am","among", "amongst", "amoungst", "amount",  "an", "and", "another", "any","anyhow","anyone","anything","anyway", "anywhere", "are", "around", "as",  "at", "back","be","became", "because","become","becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom","but", "by", "call", "can", "cannot", "cant", "co", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven","else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own","part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side", "since", "sincere", "six", "sixty", "so", "some", 
"somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thickv", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves", "the"]
paragraph_list = ['Stopword','filtering','is','a','common','step','in','preprocessing','text','for','various','purposes','This','is','a','list','of','several','different','stopword','lists','extracted','from','various','search','engines','libraries','and','articles','There', 'is','a','surprising','number','of','different','lists']
print('Initial:\n',paragraph_list)
filtered = [...]
print('\nAfter filtering:\n', filtered)        

### Quick test on collection comprehension speedup

In [None]:
N = 1000000

my_list = range(N)


In [None]:
%%timeit

power_list = []
for item in my_list:
    ...

In [None]:
%%timeit
power_list = [...]

## Functions

There are three types of functions:
* Python built-in or library functions, *e.g.* `len()`, `print()`, `sum()`, `np.sum()`
* User defined functions
* Lambda functions

A function is defined with the keyword `def`, and the statements composing its body must be indented. If a function has no `return` statement (or uses a bare `return`), calling it produces `None`. A function returns a single object, but that object can be a tuple, a list, a dict, etc., so several values can be returned at once.
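A short sketch of both cases, using two hypothetical functions: one without `return`, whose call evaluates to `None`, and one returning two values, which Python packs into a tuple:

```python
def log_message(text):
    # no return statement: the call evaluates to None
    print("LOG:", text)


def divide_with_remainder(a, b):
    # two values separated by a comma are packed into one tuple
    return a // b, a % b


result = log_message("starting")
print(result)  # None

pair = divide_with_remainder(17, 5)
print(pair)  # (3, 2)

quotient, remainder = divide_with_remainder(17, 5)  # tuple unpacking
```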

### User defined functions

We jump directly to some examples of user defined functions.

In [None]:
def hello():
    print('Hi there')
    
hello()

In [None]:
def hello_with_name(name):
    """
    The function takes an argument and shows the message: Hello followed by the argument's value.
    It returns the given argument, uppercase
    :param name: the name to greet
    :return: uppercase of :param name:
    """
    ...

name = 'David'
uppercase_name = hello_with_name(name)
print(uppercase_name)

The docstring ('documentation string') is the string literal placed as the first statement of the function body; it documents the function and is displayed by `help()`. Docstring conventions are described in [PEP 257](https://www.python.org/dev/peps/pep-0257/).

In [None]:
help(hello_with_name)
print(hello_with_name.__doc__)

Styles for writing docstrings:

```
"""
This is a reST style.

:param param1: this is a first param
:param param2: this is a second param
:returns: this is a description of what is returned
:raises KeyError: raises an exception
"""
```

```
"""
This is a javadoc style.

@param param1: this is a first param
@param param2: this is a second param
@return: this is a description of what is returned
@raise KeyError: raises an exception
"""
```

```
"""
This is an example of Google style.

Args:
    param1: This is the first param.
    param2: This is a second param.

Returns:
    This is a description of what is returned.

Raises:
    KeyError: Raises an exception.
"""
```

```
"""
This is an example of NumPy style (numpydoc),
a very exhaustive docstring format.

Parameters
----------
first : array_like
    the 1st param name `first`
second :
    the 2nd param
third : {'value', 'other'}, optional
    the 3rd param, by default 'value'

Returns
-------
string
    a value in a string

Raises
------
KeyError
    when a key error occurs
OtherError
    when another error occurs
"""
```

In [None]:
# An example of function returning multiple results at once
# The result is a tuple with two values

def min_max(a, b):
    """
    Computes the min and max of two values
    :param a: the former parameter, numerical type
    :param b: the latter parameter, numerical type
    :return: a tuple with min and max of :param a: and :param b:, in this order
    """
    if a<b:
        return a, b
    else:
        return b, a
    
x, y = 20, 10
min_2, max_2 = min_max(x, y)
print('Minimum value:', min_2, '; maximum value:', max_2)

In [None]:
# you can pass the arguments by name (keyword arguments)
min_max(a=5, b=14)

In [None]:
# with keyword arguments, the order does not matter
min_max(b=3, a=20)

Parameters can have default values; in the function's signature, they must come after the parameters without defaults:

In [None]:
def greet(name, msg="Good morning!"):
    """
    This function greets the person with the provided message.

    If the message is not provided, it defaults to "Good morning!"
    :param name: name of the person to be greeted
    :param msg: a message shown as greeting. It defaults to "Good morning!"
    """

    print("Hello", name + ', ' + msg)

greet("Kate")
greet("Bruce","How do you do?")
# equivalent: greet(name="Bruce",msg="How do you do?")

A function can accept a variable number of positional arguments. Such a parameter is written with a leading `*` followed by its name, traditionally `*args`; inside the function it is a tuple.

In [None]:
# Function with variable number of values
def greet(*names, msg = "Good morning!"):
    ...
        
greet('Dan', 'John', 'Mary')      
greet('Dan', 'John', 'Mary', msg='How do you do?')
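Inside the function, the starred parameter is an ordinary tuple holding the extra positional arguments, so it can be measured, indexed or iterated. Symmetrically, `*` at the call site unpacks a sequence into positional arguments. A minimal sketch with a hypothetical function `describe_args`:

```python
def describe_args(*args):
    # args collects all the positional arguments into a tuple
    return type(args).__name__, len(args), args


print(describe_args())              # ('tuple', 0, ())
print(describe_args(1, "a", 2.5))   # ('tuple', 3, (1, 'a', 2.5))

# a list can be unpacked into positional arguments with *
values = [10, 20, 30]
print(describe_args(*values))       # ('tuple', 3, (10, 20, 30))
```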

A function can also accept a variable number of keyword arguments, passed as `param_name=param_value`. Such a parameter is prefixed with `**` and is traditionally named `kwargs` (keyword arguments):

In [None]:
...
    
demo_kwargs(fruits='apples', quantity='3', measurement_unit='kg')

Inside the function, `kwargs` is an ordinary dictionary, so we can iterate over its items:

In [None]:
def demo_kwargs_iter(**kwargs):
    for key, value in kwargs.items():
        print(key, value)
        
demo_kwargs_iter(fruits='apples', quantity='3', measurement_unit='kg')

Conversely, at the call site, `**` unpacks a dictionary into keyword arguments:

In [None]:
dictionary_arguments = {'fruits':'apples', 'quantity':'3', 'measurement_unit':'kg'}
demo_kwargs(**dictionary_arguments)

When they are combined, the parameters must be listed in this order:
1. regular parameters, given by position (optionally with default values)
1. `*args`
1. keyword-only parameters, usually with default values
1. `**kwargs`

```python
def example2(arg_1, arg_2, *args, param_3="shark", param_4="blobfish", **kwargs):
    pass  # function body goes here
```
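To see how the arguments of a call are distributed among these parameter kinds, the sketch below gives `example2` a body that simply returns everything it received (this body is an assumption added for illustration). Note that `param_3` and `param_4`, placed after `*args`, can only be set by name:

```python
def example2(arg_1, arg_2, *args, param_3="shark", param_4="blobfish", **kwargs):
    # return every parameter so that the binding can be inspected
    return arg_1, arg_2, args, param_3, param_4, kwargs


print(example2(1, 2))
# (1, 2, (), 'shark', 'blobfish', {})

print(example2(1, 2, 3, 4, param_4="whale", colour="grey"))
# (1, 2, (3, 4), 'shark', 'whale', {'colour': 'grey'})
```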

### Lambda functions

Lambda functions (also called anonymous functions) consist of a single expression and are handy when a separately defined function would not be reused. A lambda can take any number of arguments, and its body is one expression whose value is the result: the `return` keyword is not used. It is good practice for a lambda to be short and to depend only on the arguments it receives.

In [None]:
sum_as_lambda = ...
print(sum_as_lambda(3, 4))

In [None]:
# lambda function for value filtering
list_30 = list(range(30))
filtered = list(filter(lambda x: x%3==0, list_30))
print(filtered)

In [None]:
# lambda function for sorting:
sorted([-1, -2, -3, 2, 3, 4, -5, 6, 7, 8, 9], key=...)
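Lambdas are also commonly passed as the `key` argument of `max()` and `min()`, and as the function argument of `map()`. A small sketch with made-up data:

```python
words = ["kiwi", "banana", "fig", "cherry"]

# key=: compare the words by their length instead of alphabetically
# (key=len would do the same; the lambda is shown for illustration)
longest = max(words, key=lambda w: len(w))
print(longest)  # banana (the first of the two 6-letter words)

# map(): apply the lambda to every element (lazily, hence list())
lengths = list(map(lambda w: len(w), words))
print(lengths)  # [4, 6, 3, 6]
```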

### Callback functions

In Python, functions are objects, and a function's name is a reference to the function object:

In [None]:
def sum_2(x, y):
    return x+y

def dif_2(x, y):
    return x - y

# print functions, not results of function calls
print(sum_2)
print(dif_2)

Therefore, we can pass a function, by its name, as an argument to another function, which can then call it (a *callback*):

In [None]:
def complex_operation(x, y, to_be_called):
    ...

print(complex_operation(2, 3, sum_2))
print(complex_operation(2, 3, dif_2))
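Because functions are objects, they can also be stored in data structures. A common pattern is a dispatch table: a dictionary mapping names to functions. The sketch below uses `operator.add`, `operator.sub` and `operator.mul` from the standard library, which behave like `sum_2`, `dif_2` and a product; the names `OPERATIONS` and `calculate` are hypothetical:

```python
import operator

# dictionary mapping an operator symbol to the function implementing it
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def calculate(x, symbol, y):
    # look up the function by its symbol, then call it
    func = OPERATIONS[symbol]
    return func(x, y)


print(calculate(2, "+", 3))  # 5
print(calculate(2, "-", 3))  # -1
```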

### Generators

Suppose you want to implement a Python function which returns a collection of elements. You may proceed in two ways:
1. Create the collection and return it as such
2. Give access to each element of it, one by one

The first approach is "eager loading" and may consume a lot of memory: if we stop iterating after the 10th item, we have wasted time and memory building and storing the whole collection.

The second approach follows "lazy loading": an item of the collection is produced only when it is requested.

The second approach is implemented with generator functions, which use the `yield` statement:

In [None]:
def lazy_generator():
    ...

In [None]:
# iterate with for
for item in lazy_generator():
    print(item)

In [None]:
# using iterators
iterator = iter(lazy_generator())
print(next(iterator))
print(next(iterator))
print(next(iterator))

Note that the values are produced with `yield`, not with `return`. Each `yield` suspends the function until the next item is requested; a `return` ends the generator, so the iteration stops there and no further elements are produced.
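The sketch below, with a hypothetical generator `until_negative`, shows that a `return` inside a generator simply ends the iteration: the items yielded before it are still delivered, and nothing after it is produced:

```python
def until_negative(values):
    # yield the values one by one, stopping at the first negative number
    for v in values:
        if v < 0:
            return  # ends the generator: the caller's loop stops here
        yield v


print(list(until_negative([3, 1, 4, -1, 5])))  # [3, 1, 4]

gen = until_negative([-2, 7])
print(next(gen, "exhausted"))  # exhausted (next() returns its default on StopIteration)
```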

The example above yields hardcoded values; a more realistic generator computes them:

In [None]:
def gen_squares(up_to):
    ...
        
for v in gen_squares(10):
    print(v)

## Type annotations

Starting with Python 3.5, one can annotate function parameters and return types with type hints, introduced in the Python Enhancement Proposal [PEP484](https://www.python.org/dev/peps/pep-0484/); annotations for variables were added in Python 3.6 (PEP 526). Annotations are optional and improve code readability.

Besides readability, annotations enable static analysis tools, such as those built into PyCharm or [MyPy](http://mypy-lang.org), and better code completion.

An annotation is written after the annotated name, as a colon followed by the type:

In [None]:
age...
name...

Annotations are not enforced at runtime: nothing prevents assigning a value of a different type later on (although a static checker such as MyPy would report it):

In [None]:
age = 21.5

A short example of function annotations follows:

In [None]:
def f(name: str, age: int) -> str:
    return 'Hello ' + name + ', you are ' + str(age) + ' years old'

f('William', 23)
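The annotations are stored on the function object, in its `__annotations__` dictionary, where tools such as MyPy or IDEs can read them; Python itself does not check them when the function is called. A sketch with a hypothetical function `describe`:

```python
def describe(name: str, age: int) -> str:
    return f"{name} is {age} years old"


print(describe.__annotations__)
# {'name': <class 'str'>, 'age': <class 'int'>, 'return': <class 'str'>}

# not enforced at runtime: a float is accepted for `age` without any error
print(describe("Ana", 20.5))  # Ana is 20.5 years old
```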

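For composite types we can use the module `typing`, which provides annotation types like `Dict`, `Tuple`, `List`, `Set`, etc. (since Python 3.9, the built-in types can also be used directly, e.g. `list[int]` or `dict[str, int]`).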

In [None]:
from typing import List

def multiple_greetings(names: ..., ages: ...) -> None:
    for n, v in zip(names, ages):
        print(f'Hello {n}, you are {v} years old')

multiple_greetings(['Ana', 'Dan', 'George'], [20, 21, 22])

You can define your own type aliases for annotations:

In [None]:
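import math
from typing import Tuple

Point2D = Tuple[int, int]

def plot_point(point: Point2D) -> bool:
    ## ...do something
    return True

def rotate_point(p: Point2D, angle: float) -> Point2D:
    # a sketch, assuming rotation about the origin with `angle` in radians;
    # coordinates are rounded back to integers to match Point2D
    x, y = p
    new_x = round(x * math.cos(angle) - y * math.sin(angle))
    new_y = round(x * math.sin(angle) + y * math.cos(angle))
    return new_x, new_y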

If a value can have one of several types, we can use `Union`:

In [None]:
from typing import Union

def print_value(value: ...) -> None:
    print(value)

If a value may also be `None`, you can annotate it with `Optional`:

In [None]:
from typing import Optional

def f(param: ...) -> str:
    if param is not None:
        return param.upper()
    else:
        return ""
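`Optional[X]` is shorthand for `Union[X, None]`, which the `typing` module itself confirms (since Python 3.10, the same type can also be written `X | None`, see PEP 604). The sketch below, with a hypothetical function `shout`, also shows the usual pattern of ruling out `None` before using the value:

```python
from typing import Optional, Union

# Optional[str] is the same type as Union[str, None]
assert Optional[str] == Union[str, None]


def shout(text: Optional[str] = None) -> str:
    # a checker such as MyPy would flag text.upper() unless None is ruled out first
    if text is None:
        return ""
    return text.upper() + "!"


print(shout("hello"))  # HELLO!
print(repr(shout()))   # ''
```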

For more elaborate annotations (callback functions, generic collections, etc.), the interested reader may dive into the bibliography below.

### Recommended bibliography

1. [PEP484](https://www.python.org/dev/peps/pep-0484/) 
1. [typing — Support for type hints](https://docs.python.org/3/library/typing.html)
1. [Type hints cheat sheet (Python 3)](https://mypy.readthedocs.io/en/latest/cheat_sheet_py3.html)

## User-defined Python modules

A Python module is a file with the extension `.py` containing functions, classes, and variables. A module is imported with the `import` statement.

Example: we create a module, the Python file `mySmartModule.py`, containing a function which computes the sum of the elements in a list:


```python
# file mySmartModule.py
from typing import List, Union

def my_sum(lst: List[Union[float, int]]) -> Union[float, int]:
    total = 0  # avoid shadowing the built-in sum()
    for item in lst:
        total += item
    return total
```

We can use this module as:
```python
import mySmartModule

lst = [1, 2, 3]

sum_lst = mySmartModule.my_sum(lst)
print(sum_lst)
```

We can define an alias for the imported module:
```python
import mySmartModule as msm
```
and in this case we use it as:
```python
sum_lst = msm.my_sum(lst)
```

The content of a module can be found with `dir`:
```python
>>> dir(msm)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'my_sum']
```

The elements with `__` as prefix and suffix are automatically added by Python. 

If you want to access all the public items defined in a module without the `module_name.` prefix, you can write:
```python
from mySmartModule import *

print(my_sum([1, 2, 3]))
```

However, it is recommended to import only the required items (explicitly enumerating them, if needed), to avoid accidentally overwriting already defined entities with the same name:

```python
from mySmartModule import my_sum

print(my_sum([1, 2, 3]))
```

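When a module is imported, Python first checks the built-in modules, then searches, in order:
1. the directory containing the script being run (or the current directory, in an interactive session)
1. the directories listed in the environment variable `PYTHONPATH`, if defined
1. the installation-dependent default path

The resulting search path is stored in the variable `path` from module `sys`: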

In [None]:
import sys
print(sys.path)

If a module is located in a directory that is not on this search path, you can append that directory to `sys.path`:

In [None]:
sys.path.append('./my_modules/')
from mySmartModule import my_sum
print(my_sum([1, 2, 3]))

A module can be used in two ways:
1. to expose functions or classes, or to give access to predefined variables (e.g. `math.pi`):
```python
import math
print(math.pi)
```
2. it can be run as a script from the command line: `python mySmartModule.py`. In this case, the code to execute is written inside `mySmartModule.py` as:
```python
if __name__ == '__main__':
    # code which is executed when one launches this script
    pass
```

The code under `if __name__ == '__main__':` is not executed when the module is imported, because `__name__` is then set to the module's name instead of `'__main__'`.

Example:
```python
def my_sum(my_list):
    total = 0
    for item in my_list:
        total += item
    return total

if __name__ == '__main__':
    print('Usage example')
    a_list = list(range(100))
    print(my_sum(a_list))
```

## User-defined Python packages

A package is a collection of modules. Physically, it is a directory containing modules and other packages. A directory is treated as a regular package when it contains a file named `__init__.py`; this file may even be empty at the beginning. (Since Python 3.3, a directory without `__init__.py` can still be imported as a *namespace package*, see PEP 420, but regular packages are the usual choice.)

We start from the following structure:
```
---myUtils\
 |------ mySmartModule.py
 |------ __init__.py
```

To import the function `my_sum` from module `mySmartModule.py` which is situated in directory (package) `myUtils` we could write:
```python
from myUtils.mySmartModule import my_sum
print(my_sum([1, 2, 3]))
```
but we would like to write it more briefly:
```python
from myUtils import my_sum
print(my_sum([1, 2, 30]))
```
that is, to avoid explicitly mentioning the module `mySmartModule` inside the package `myUtils`. To do this, we add the following line to the file `__init__.py` inside `myUtils`:
```python
from .mySmartModule import my_sum 
```
where the leading `.` refers to the current package (a relative import).

In [None]:
from myUtils.mySmartModule import my_sum
print(my_sum([1, 2, 10, 300]))

In [None]:
from myUtils import my_sum
print(my_sum([1, 2, 10, 300]))

The file `__init__.py` holds everything related to the package's initialization, e.g. loading data from disk or setting some variables to proper values.

If you want to create packages to be used by a large community and to publish them on PyPI, please follow [this tutorial](https://python-packaging.readthedocs.io/en/latest/).

Other short examples of using modules from the standard library follow:

In [None]:
import re  # module for regular expressions
my_string = 'I bought: apples, pies, bread... and coffee'
tokens = re.split(r'\W+', my_string)
print(tokens)

In [None]:
# Serialize values with pickle
import os
import pickle

toys = {"lion": "yellow", "kitty": "red"}

with open("toys.pkl", "wb") as f:  # the file is closed when the block ends
    pickle.dump(toys, f)
del toys  # no longer needed, will recover it from the pickle file

# restore it
with open("toys.pkl", "rb") as f:
    toys_restored = pickle.load(f)
print('After deserialization:', toys_restored)

os.remove("toys.pkl")  # remove the pickle file from the disk (portable, unlike `!del`)

## Working with files in Python

A file is opened with the built-in function `open()`:

In [None]:
f = open("log_file.txt")
# which is the same as the preferable explicit form:
f = ...
# rt: read mode, text file
# for reading a binary file:
# f = open("image.zzx", "rb")

You must close the file once you are done working with it:

In [None]:
f.close()

To ensure the file is always closed, even if an error occurs, we suggest following this pattern:

In [None]:
with open("log_file.txt", "rt") as f:
    # read file's content
    pass
# no need to explicitly close it: leaving the `with` block automatically calls f.close()

Getting a single line from `f` is done with the method `f.readline()`:

In [None]:
with open("log_file.txt", "rt") as f:
    first_line = ...
    
print(first_line)

Getting all the lines is done with `readlines()`:

In [None]:
with open("log_file.txt", "rt") as f:
    all_lines = ...
    
print(all_lines)
# strip the newline char

A file can be opened for writing in two modes: append (`a`) or write (`w`). The latter truncates the file, overwriting its old content.

In [None]:
with open("log_file.txt", "at") as f:
    f.write('Line added from program\n')  # write() does not add a newline by itself
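A sketch contrasting the two modes on a throw-away file created in a temporary directory, so that `log_file.txt` is not touched (the file name `demo.txt` is arbitrary). The line breaks are written explicitly with `\n` and removed again with `rstrip('\n')` when reading:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")

    with open(path, "wt") as f:  # "w": create the file, or truncate it
        f.write("first line\n")
    with open(path, "wt") as f:  # opening again with "w" discards the old content
        f.write("second line\n")
    with open(path, "at") as f:  # "a": keep the content, write at the end
        f.write("third line\n")

    with open(path, "rt") as f:
        lines = [line.rstrip("\n") for line in f.readlines()]

print(lines)  # ['second line', 'third line']
```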

If you want to manipulate files and directories, use the modules `os` or `shutil` (or `pathlib`).

## Exception handling

Exceptions signal exceptional situations encountered while running the code: invalid numerical operations, files not found, illegal accesses to data structures, etc.

If a piece of code (a block of statements, a function call, etc.) may raise an exception, we put it inside a `try`...`except` block:

In [None]:
my_values = [1, 2, 0, 3]

try:
    ...
except:
    print('Cannot compute it')

It is recommended to catch a specific exception class (or at least `Exception`) and to bind the caught exception to a name, so that it can be inspected:

In [None]:
my_values = [1, 2, 0, 3]

try:
    for v in my_values:
        print(1.0 / v)
except Exception as ex:
    print(f'Cannot compute it: {ex}')

Multiple except clauses may be used:

In [None]:
try:
    # compute
    pass

except ValueError:
    # handler for ValueError exception
    pass

except (IndexError, KeyError):
    # handle multiple exceptions:
    # IndexError and KeyError
    pass

except:
    # catch-all: also catches KeyboardInterrupt and SystemExit, use sparingly
    pass

You can also raise an exception explicitly, with the `raise` statement:

In [None]:
# a = -1
# # ... code
# if a < 0:
#     raise ValueError(f'The argument must be >= 0, got {a}')
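Raising and handling usually happen in different places: a function detects an invalid argument and raises, and its caller decides what to do. A sketch with a hypothetical function `checked_sqrt`:

```python
import math


def checked_sqrt(a):
    # refuse invalid input instead of returning a meaningless value
    if a < 0:
        raise ValueError(f'The argument must be >= 0, got {a}')
    return math.sqrt(a)


for value in [16, -1]:
    try:
        print(checked_sqrt(value))
    except ValueError as ex:
        print(f'Invalid input: {ex}')
# prints 4.0, then: Invalid input: The argument must be >= 0, got -1
```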

A `try` statement can optionally have a `finally` block. This block is always executed, regardless of whether the `try` block succeeded or raised an exception.

In [None]:
try:
    f = open("log_file.txt", "rt")
    _ = 1/0
except ZeroDivisionError:
    pass
finally:
    ...
    print('For sure, f is closed now')
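The `finally` block also runs when the exception is *not* handled by that `try` statement and keeps propagating to the caller. The sketch below records the order of events in a list (`events` and `risky` are hypothetical names):

```python
events = []


def risky():
    try:
        events.append("try")
        return 1 / 0  # raises ZeroDivisionError, not caught in this function
    finally:
        events.append("finally")  # runs before the exception leaves risky()


try:
    risky()
except ZeroDivisionError:
    events.append("caught by caller")

print(events)  # ['try', 'finally', 'caught by caller']
```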