# Lecture 2: Working with collections. User-defined functions. Modules, packages. The NumPy package

## Collection comprehension

Starting from a collection (most often a list), one can create another list using a *list comprehension*. This is essentially a loop over the elements of the initial collection:

In [None]:
numbers = [1, 3, 6,  21, 22, 32, 33]
squared_numbers = [...]
print(squared_numbers)

Optionally, you can add an `if` clause that filters the elements:

In [None]:
squares_of_even_numbers = [...]
print(squares_of_even_numbers)

... and you can even use a conditional expression with an `else` branch:

In [None]:
squares_or_cubes = [...]
print(squares_or_cubes)
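One possible way to fill in the three cells above is sketched below. The source does not say which numbers are squared and which are cubed in the `else` variant, so squaring the even numbers and cubing the odd ones is an assumption made only for illustration.

```python
numbers = [1, 3, 6, 21, 22, 32, 33]

# plain comprehension: one output element per input element
squared_numbers = [n ** 2 for n in numbers]

# a trailing `if` acts as a filter: only even numbers are kept
squares_of_even_numbers = [n ** 2 for n in numbers if n % 2 == 0]

# a conditional expression chooses the value; no element is dropped
# (assumption: square the even numbers, cube the odd ones)
squares_or_cubes = [n ** 2 if n % 2 == 0 else n ** 3 for n in numbers]

print(squared_numbers)
print(squares_of_even_numbers)
print(squares_or_cubes)
```

Note the position of `if`: after the `for` it filters, before the `for` (together with `else`) it selects between two values.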

Exercise: if a list consists of other lists, how can you get the flattened list? For example, starting from `a1 = [[1, 2], [3, 4, 5], [10]]` we want to get its flattened version `a2 = [1, 2, 3, 4, 5, 10]`.

In [None]:
a1 = [[1, 2], [3, 4, 5], [10]]
a2 = [...]
print(a2)
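A possible solution, given as a sketch, uses two `for` clauses in the same comprehension. The names `sublist` and `item` are chosen here for illustration.

```python
a1 = [[1, 2], [3, 4, 5], [10]]

# the `for` clauses are read left to right, like nested loops:
# for sublist in a1:
#     for item in sublist:
#         ...
a2 = [item for sublist in a1 for item in sublist]
print(a2)
```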

Comprehension is available for other types of collections as well: for example, we can start from a list and create a dictionary:

In [None]:
numbers = [1, 3, 6,  21, 22, 32, 33]
dict_numbers_and_squares = {...}
print(dict_numbers_and_squares)

... or lists of tuples:

In [None]:
# cartesian product
colours = [ "red", "green", "yellow", "blue" ]
things = [ "house", "car", "tree" ]
coloured_things = [...]
print(coloured_things)

assert len(coloured_things) == len(colours) * len(things)
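The two cells above could be completed, for instance, as in the sketch below. Building each pair as a `(colour, thing)` tuple is an assumption. The source only requires that the result has `len(colours) * len(things)` elements.

```python
numbers = [1, 3, 6, 21, 22, 32, 33]

# dictionary comprehension: key: value pairs inside curly braces
dict_numbers_and_squares = {n: n ** 2 for n in numbers}
print(dict_numbers_and_squares)

colours = ["red", "green", "yellow", "blue"]
things = ["house", "car", "tree"]

# cartesian product: every colour is paired with every thing
coloured_things = [(colour, thing) for colour in colours for thing in things]
print(coloured_things)
```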

Below are a few examples of using comprehensions over collections.

In [None]:
# Conversion from a list of Celsius temperatures to Fahrenheit values: Fahrenheit = 1.8 * Celsius + 32
celsius_degrees = [-20, -10, 0, 5, 23, 35]
fahrenheit_degrees = [...]
print(fahrenheit_degrees)

In [None]:
# Sum of squares of the numbers from 1 to 20
print(sum([...]))  # note the range 2nd param

In [None]:
# We want to filter out stop-words from a list:
stop_words = ["a", "about", "above", "above", "across", "after", "afterwards", "again", "against", "all", "almost", "alone", "along", "already", "also","although","always","am","among", "amongst", "amoungst", "amount",  "an", "and", "another", "any","anyhow","anyone","anything","anyway", "anywhere", "are", "around", "as",  "at", "back","be","became", "because","become","becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", "between", "beyond", "bill", "both", "bottom","but", "by", "call", "can", "cannot", "cant", "co", "con", "could", "couldnt", "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven","else", "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", "front", "full", "further", "get", "give", "go", "had", "has", "hasnt", "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "him", "himself", "his", "how", "however", "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into", "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own","part", "per", "perhaps", "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she", "should", "show", "side", "since", "sincere", "six", "sixty", "so", "some", 
"somehow", "someone", "something", "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", "these", "they", "thickv", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever", "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet", "you", "your", "yours", "yourself", "yourselves", "the"]
paragraph_list = ['Stopword','filtering','is','a','common','step','in','preprocessing','text','for','various','purposes','This','is','a','list','of','several','different','stopword','lists','extracted','from','various','search','engines','libraries','and','articles','There', 'is','a','surprising','number','of','different','lists']
print('Initial:\n',paragraph_list)
filtered = [...]
print('\nAfter filtering:\n', filtered)        
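The three examples above might be completed as in the following sketch. The stop-word set is a tiny illustrative subset of the list in the source, and lowercasing each word before the lookup is an assumption (the paragraph contains capitalised words such as `'This'`).

```python
# Celsius to Fahrenheit
celsius_degrees = [-20, -10, 0, 5, 23, 35]
fahrenheit_degrees = [1.8 * c + 32 for c in celsius_degrees]
print(fahrenheit_degrees)

# sum of squares from 1 to 20; range(1, 21) stops before 21
sum_of_squares = sum([i ** 2 for i in range(1, 21)])
print(sum_of_squares)

# stop-word filtering; a set makes the `in` test fast
stop_words = {"is", "a", "in", "for", "of", "this", "and", "there", "several"}
paragraph_list = ['Stopword', 'filtering', 'is', 'a', 'common', 'step']
filtered = [word for word in paragraph_list if word.lower() not in stop_words]
print(filtered)
```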

## Functions

There are three types of functions:
* Python built-in functions, *e.g.* `len()`, `print()`
* User defined functions
* Lambda functions

A function is defined with the keyword `def`, and the statements composing its body must be indented. A function with no `return` statement in its body returns `None`. Otherwise, it can return any object, including several values at once, packed in a tuple, list, dict, etc.

### User defined functions

We jump directly to some examples of user defined functions.

In [None]:
def hello():
    print('Hi there')
    
hello()

In [None]:
def hello_with_name(name):
    """
    The function takes an argument and shows the message: Hello followed by the argument's value.
    It returns the given argument, uppercase
    :param name: the name to greet
    :return: uppercase :param name:
    """
    ...

name = 'David'
uppercase_name = hello_with_name(name)
print(uppercase_name)
help(hello_with_name)
print(hello_with_name.__doc__)
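A minimal body matching the docstring above could look like the sketch below. The exact greeting text is an assumption.

```python
def hello_with_name(name):
    """Greet `name` and return it in uppercase."""
    print('Hello', name)
    return name.upper()


uppercase_name = hello_with_name('David')
print(uppercase_name)
# the docstring is stored on the function object itself
print(hello_with_name.__doc__)
```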

In [None]:
# An example of function returning multiple results at once
# The result is a tuple with two values
def min_max(a, b):
    ...
    
x, y = 20, 10
min_2, max_2 = min_max(x, y)
print('Minimum value:', min_2, '; maximum value:', max_2)

In [None]:
# you can handle the params by specifying their name and value
min_max(a=5, b=14)

In [None]:
# you may swap the order
min_max(b=3, a=20)
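One way to write `min_max` is sketched below. The body is an assumption, since the source leaves it blank. Any implementation returning the smaller value first would behave the same in the calls above.

```python
def min_max(a, b):
    # the comma builds a tuple: (smaller, larger)
    if a <= b:
        return a, b
    return b, a


# tuple unpacking assigns both results at once
min_2, max_2 = min_max(20, 10)
print('Minimum value:', min_2, '; maximum value:', max_2)

# keyword arguments may be given in any order
swapped = min_max(b=3, a=20)
print(swapped)
```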

Parameters with default values can be specified at the end of the function's parameter list:

In [None]:
def greet(name, msg = "Good morning!"):
   """
   This function greets the person with the provided message.

   If the message is not provided, it defaults to "Good morning!"
   :param name: name of the person to be greeted
   :param msg: the message shown as greeting. It defaults to "Good morning!"
   """
   print("Hello",name + ', ' + msg)

greet("Kate")
greet("Bruce","How do you do?")
# equivalent: greet(name="Bruce",msg="How do you do?")

We can have a parameter that collects a variable number of positional arguments. Such a parameter is written with a leading `*` followed by its name, e.g. `*args`.

In [None]:
# Function with variable number of values
def greet(*names, msg = "Good morning!"):
    for name in names:
        print('Hello', name + ', ' + msg)
        
greet('Dan', 'John', 'Mary')      
greet('Dan', 'John', 'Mary', msg='How do you do?')

One can define functions which accept a variable number of arguments passed as `param_name=param_value`. The traditional name for this parameter is `kwargs` (keyword arguments), and it is prefixed with `**`:

In [None]:
...
    
demo_kwargs(fruits='apples', quantity='3', measurement_unit='kg')
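A hypothetical body for `demo_kwargs` is sketched below. The source leaves the definition blank, so both the printing and the returned value are assumptions made for illustration.

```python
def demo_kwargs(**kwargs):
    # kwargs collects every keyword argument passed by the caller
    print(type(kwargs))
    print(kwargs)
    return kwargs


collected = demo_kwargs(fruits='apples', quantity='3', measurement_unit='kg')
```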

By looking at kwargs above, we realize it is a dictionary under the hood:

In [None]:
def demo_kwargs_iter(**kwargs):
    for key, value in kwargs.items():
        print(key, value)
        
demo_kwargs_iter(fruits='apples', quantity='3', measurement_unit='kg')

Using `**` one can unpack a dictionary:

In [None]:
dictionary_arguments = {'fruits':'apples', 'quantity':'3', 'measurement_unit':'kg'}
demo_kwargs(**dictionary_arguments)

The parameters are declared in a specific order:
1. parameters given by their position
1. `*args`
1. parameters with default values
1. `**kwargs`

```python
def example2(arg_1, arg_2, *args, param_3="shark", param_4="blobfish", **kwargs):
    pass  # body omitted; only the parameter order matters here
```
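To see where each argument ends up, the signature can be given a small body that simply reports what it received. The body and the call below are illustrative assumptions, not part of the original example.

```python
def example2(arg_1, arg_2, *args, param_3="shark", param_4="blobfish", **kwargs):
    # report how the arguments were distributed among the parameters
    return {
        "positional": (arg_1, arg_2),
        "args": args,
        "param_3": param_3,
        "param_4": param_4,
        "kwargs": kwargs,
    }


bound = example2(1, 2, 3, 4, param_4="eel", depth=100)
print(bound)
```

The extra positional values `3, 4` land in `args`, `param_3` keeps its default, and the unknown keyword `depth` is collected in `kwargs`.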

### Lambda functions

You may use lambda functions (also known as anonymous functions) when a separate function definition is not needed or would not be reused. A lambda function consists of a single expression; it can take any number of arguments and returns the value of that expression, so the `return` keyword is omitted. Lambda functions should work only with the passed arguments.

In [None]:
sum_as_lambda = ...
print(sum_as_lambda(3, 4))

In [None]:
# lambda function for value filtering
list_30 = list(range(30))
filtered = ...
print(filtered)

In [None]:
# lambda function for sorting:
sorted(...)
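The three lambda cells above could be filled in along the following lines. The filtering condition (multiples of 3) and the `words` list are hypothetical, chosen only to have something concrete to run.

```python
# a lambda bound to a name behaves like a small function
sum_as_lambda = lambda x, y: x + y
print(sum_as_lambda(3, 4))

# lambda as the predicate of filter (assumption: keep multiples of 3)
list_30 = list(range(30))
filtered = list(filter(lambda value: value % 3 == 0, list_30))
print(filtered)

# lambda as the sorting key (hypothetical data: sort words by length)
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda word: len(word))
print(by_length)
```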

### Callback functions

In Python, a function's name is a reference to the function object:

In [None]:
def sum_2(x, y):
    return x+y

def dif_2(x, y):
    return x - y

print(sum_2)
print(dif_2)

We can pass a function, by its name, as an argument to another function:

In [None]:
def complex_operation...

print(complex_operation(2, 3, sum_2))
print(complex_operation(2, 3, dif_2))
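A minimal sketch of such a function follows. The parameter name `operation` and the one-line body are assumptions. The source only shows how `complex_operation` is called.

```python
def sum_2(x, y):
    return x + y


def dif_2(x, y):
    return x - y


def complex_operation(x, y, operation):
    # `operation` is any callable accepting two arguments
    return operation(x, y)


print(complex_operation(2, 3, sum_2))
print(complex_operation(2, 3, dif_2))
```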

## Type annotations

Starting with Python 3.5, one can add type hints to function parameters and return values; variable annotations followed in Python 3.6. Annotations are optional and improve code readability. They were introduced in the Python Enhancement Proposal document [PEP484](https://www.python.org/dev/peps/pep-0484/).

Aside from readability, annotations can be checked by code analysis tools, as found for example in PyCharm or [MyPy](http://mypy-lang.org), and they improve code completion support.

A type annotation is written after the annotated entity's name, using a colon followed by the type's name:

In [None]:
age...
name...

An annotation does not prevent assigning a value of another type to the variable in subsequent statements:

In [None]:
age = 21.5

A short example on function annotation follows:

In [None]:
def f...

f('William',23)
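The annotation cells above might look like the sketch below. The concrete values and the body of `f` are assumptions, since the source leaves them blank.

```python
# variable annotations: name, colon, type, then the usual assignment
age: int = 21
name: str = 'William'


# parameter annotations use the same syntax; `->` gives the return type
def f(name: str, age: int) -> str:
    return f'{name} is {age} years old'


message = f(name, age)
print(message)
# annotations are stored on the function, but not enforced at run time
print(f.__annotations__)
```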

For complex types we can use the module `typing`, which provides annotation types like `Dict`, `Tuple`, `List`, `Set`, etc.

In [None]:
from typing import List

def mutiple_greetings(names..., ages...) -> None:
    for n, v in zip(names, ages):
        print(f'Hello {n}, you are {v} years old')
        
mutiple_greetings(['Ana', 'Dan', 'George'], [20, 21, 22])

User-defined types can be built from the available ones:

In [None]:
from typing import Tuple

Point2D = Tuple[int, int]

def plot_point(point: Point2D) -> bool:
    ## ...do something
    return True

def rotate_point(p:Point2D, angle:float) -> Point2D:
    ## some statements
    ## newPoint = ....
    return newPoint

If a variable can have more than one possible type, we can do as follows:

In [None]:
from typing import Union

def print_value(value: Union[str, int, float]) -> None:
    print(value)

If a variable can be `None`, you can annotate it as:

In [None]:
from typing import Optional

def f(param: ...) -> str:
    if param is not None:
        return param.upper()
    else:
        return ""
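As a sketch, `Optional[str]` is one annotation that fits the function above; it is shorthand for `Union[str, None]`. The function name `to_upper` is hypothetical, used here to avoid clashing with the earlier `f`.

```python
from typing import Optional


def to_upper(param: Optional[str]) -> str:
    # Optional[str] means: either a str or None
    if param is not None:
        return param.upper()
    return ""


print(to_upper("abc"))
print(repr(to_upper(None)))
```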

For more elaborate annotations (callback functions, collections, etc.) we refer the interested reader to the bibliography.

### Recommended bibliography

1. [PEP484](https://www.python.org/dev/peps/pep-0484/) 
1. [typing — Support for type hints](https://docs.python.org/3/library/typing.html)
1. [Type hints cheat sheet (Python 3)](https://mypy.readthedocs.io/en/latest/cheat_sheet_py3.html)

## Modules

A Python module is a file with the extension `py` hosting functions, classes, and variables. A module is imported with the `import` statement.

Example: we create a module, the Python file `mySmartModule.py`, containing a function which computes the sum of the elements in a list:


```python
# file mySmartModule.py
from typing import List, Union


def my_sum(lst: List[Union[float, int]]) -> Union[float, int]:
    total = 0  # avoid shadowing the built-in sum
    for item in lst:
        total += item
    return total
```

We can use this module as:
```python
import mySmartModule

lst = [1, 2, 3]

sum_lst = mySmartModule.my_sum(lst)
print(sum_lst)
```

We can define an alias for the imported module:
```python
import mySmartModule as msm
```
and in this case we use it as:
```python
suma = msm.my_sum(lst)
```

The content of a module can be found with `dir`:
```python
>>> dir(msm)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'my_sum']
```

The elements with `__` as prefix and suffix are automatically added by Python. 

If one wants to be able to access all items defined in a module, without using `module_name.item_name`, one can write as follows:
```python
from mySmartModule import *

print(my_sum([1, 2, 3]))
```

However, it is recommended to import strictly the required items (or to explicitly enumerate all items, if needed), to avoid accidentally overwriting already defined entities with the same name:

```python
from mySmartModule import my_sum

print(my_sum([1, 2, 3]))
```

When a module is imported, Python searches for it in this order:
1. the current directory
1. the locations defined by the environment variable `PYTHONPATH`, if defined
1. the default path

The complete search path is stored in the variable `path` from module `sys`:

In [None]:
import sys
print(sys.path)

If a user puts a module into a path which is not among the ones above, then she can append it to the `sys.path` variable:

In [None]:
sys.path.append('./my_modules/')
from mySmartModule import my_sum
print(my_sum([1, 2, 3]))
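The mechanism can be tried end to end with the self-contained sketch below. It writes a small module into a temporary directory, which stands in for `./my_modules/`. The module name `my_demo_module` is hypothetical.

```python
import importlib
import os
import sys
import tempfile

module_source = (
    "def my_sum(lst):\n"
    "    total = 0\n"
    "    for item in lst:\n"
    "        total += item\n"
    "    return total\n"
)

# create a directory that is not on the search path yet
modules_dir = tempfile.mkdtemp()
with open(os.path.join(modules_dir, "my_demo_module.py"), "w") as f:
    f.write(module_source)

sys.path.append(modules_dir)    # make the directory searchable
importlib.invalidate_caches()   # the file appeared after the interpreter started
my_demo_module = importlib.import_module("my_demo_module")

result = my_demo_module.my_sum([1, 2, 3])
print(result)
```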

A module can be used in two ways:
1. to expose different function or class implementations, or to access predefined variables (e.g. `math.pi`):
```python
import math
print(math.pi)
```
2. it can be launched by itself as a script, by writing in command line interface: `python mySmartModule.py`. In this case, the desired code is written inside the Python module `mySmartModule.py` as:
```python
if __name__ == '__main__':
    # code which is executed when one launches this script
```

The code under the `if __name__ == '__main__'` will not be executed if the module is imported.

Example:
```python
def my_sum(lst):
    total = 0  # avoid shadowing the built-in sum
    for item in lst:
        total += item
    return total


if __name__ == '__main__':
    print('Usage example')
    a_list = list(range(100))
    print(my_sum(a_list))
```
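The effect of the `__name__` guard can be observed without leaving the interpreter, using the standard `runpy` module to execute the same file under two different names. This is only a sketch: the file `guard_demo.py` and the flag `executed_as_script` are hypothetical.

```python
import os
import runpy
import tempfile

source = (
    "executed_as_script = False\n"
    "if __name__ == '__main__':\n"
    "    executed_as_script = True\n"
)

path = os.path.join(tempfile.mkdtemp(), "guard_demo.py")
with open(path, "w") as f:
    f.write(source)

# run_path returns the globals of the executed file
as_script = runpy.run_path(path, run_name="__main__")    # like `python guard_demo.py`
as_module = runpy.run_path(path, run_name="guard_demo")  # like `import guard_demo`

print(as_script["executed_as_script"])
print(as_module["executed_as_script"])
```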

## Python packages

A package is a collection of modules. Physically, it is a directory containing modules and other packages. A regular package must contain a file named `__init__.py` in every directory which is to be seen as a package. This file may even be empty at the beginning.

We start from the following structure:
```
---myUtils\
 |------ mySmartModule.py
 |------ __init__.py
```

To import the function `my_sum` from module `mySmartModule.py` which is situated in directory (package) `myUtils` we could write:
```python
from myUtils.mySmartModule import my_sum
print(my_sum([1, 2, 3]))
```
but we would like to write it more briefly:
```python
from myUtils import my_sum
print(my_sum([1, 2, 30]))
```
that is, to avoid explicitly mentioning the module `mySmartModule` from within package `myUtils`. To do this, we add the following line to the file `__init__.py` inside `myUtils`:
```python
from .mySmartModule import my_sum 
```
where the leading `.` denotes a relative import, i.e. a module located in the same package.

In [None]:
from myUtils.mySmartModule import my_sum
print(my_sum([1, 2, 10, 300]))

In [None]:
from myUtils import my_sum
print(my_sum([1, 2, 10, 300]))

One puts into `__init__.py` everything related to the package's initialization, e.g. loading data from disk or setting some variables to proper values.
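The package layout described above can be reproduced in a temporary directory, as in the sketch below. The names `my_demo_utils` and `smart_module` are hypothetical stand-ins for `myUtils` and `mySmartModule`.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
package_dir = os.path.join(root, "my_demo_utils")
os.makedirs(package_dir)

# the module inside the package
with open(os.path.join(package_dir, "smart_module.py"), "w") as f:
    f.write("def my_sum(lst):\n    return sum(lst)\n")

# __init__.py re-exports my_sum, so callers can skip the module name
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("from .smart_module import my_sum\n")

sys.path.append(root)
importlib.invalidate_caches()
my_demo_utils = importlib.import_module("my_demo_utils")

result = my_demo_utils.my_sum([1, 2, 10, 300])
print(result)
```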

If you want to create packages to be used by a large community, and to publish them on PyPI, please follow [this tutorial](https://python-packaging.readthedocs.io/en/latest/).

Other short examples of package usage follow:

In [None]:
import re # package for regular expressions
my_string = 'I bought: apples, pies, bread... and coffee'
tokens = re.split(r'\W+', my_string)
print(tokens)

In [None]:
# Serialize values with pickle
import os
import pickle

toys = { "lion": "yellow", "kitty": "red" }

# `with` closes the file even if an error occurs
with open("toys.pkl", "wb") as f:
    pickle.dump(toys, f)
del toys # no longer needed, will recover it from pickle file

# restore it
with open("toys.pkl", "rb") as f:
    toys_restored = pickle.load(f)
print('After deserialization:', toys_restored)

os.remove("toys.pkl") # remove the pickle file from the disk (works on any OS, unlike `!del`)

## The NumPy package

NumPy (Numerical Python) is the core package for scientific computing in Python. It supports vectors and multidimensional arrays (tensors), and provides functions for random number generation, linear algebra, signal processing, basic statistics, etc. NumPy is a standard used by other packages. The manipulated data must fit into RAM. NumPy relies on compiled C code.

In many situations data can be converted to numbers:
* a greyscale image can be seen as a 2D matrix; every number is the intensity of the corresponding pixel (0 - black, 255 - white)
![MNIST image scaled to 0-1](./images/mnist_image.png)
* a colour image can be seen as a 3D matrix, for example represented as RGB: each of the "parallel" matrices represents a colour channel:
![RGB image split into 3 channels](./images/channelsrgb.gif)
* an audio file can be seen as one or more vectors, where the number of vectors is the number of recording channels. For a wav file, the values represent the membrane's displacement, sampled in time:
![sound](./images/sound.png)
* a text can be translated into numerical vectors by using techniques such as [Bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model), [Word2vec](https://en.wikipedia.org/wiki/Word2vec), or any other type of embedding.

The vector and matrix representation used by NumPy is much more efficient than Python lists. NumPy code actually runs native libraries, not interpreted code. If you vectorize your code, the efficiency is even greater on modern CPUs.

The basic data type in NumPy is `ndarray`, the n-dimensional array.

In [None]:
# importing numpy, traditionally aliased as np
import numpy as np

# create a vector starting from a Python list
x = np.array([1, 4, 2, 5, 3])

# printing x's type
print(...) 

# all the elements in the array are of the same type
print(...) 

# you can explicitly specify the underlying type in the array
y = np.array([1, 2, 3], dtype=np.float16)
print(y.dtype)

In [None]:
# useful functions
all_zeros = ...
print(all_zeros)
# printing number of elements on each dimension (axis)
print(...)

In [None]:
# 2d matrix
mat = ...   # note we start from lists of lists
print(mat)
print(mat.shape)
print(mat[0, 1])
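The blanks in the cells above could be filled in as sketched here. The length of `all_zeros` and the contents of `mat` are assumptions, since the source leaves them open.

```python
import numpy as np

x = np.array([1, 4, 2, 5, 3])
print(type(x))    # the container type: numpy.ndarray
print(x.dtype)    # the common element type (an integer type; exact width is platform dependent)

# a vector filled with zeros; `shape` gives the number of elements per axis
all_zeros = np.zeros(5)
print(all_zeros)
print(all_zeros.shape)

# a 2D matrix is built from a list of lists (one inner list per row)
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat)
print(mat.shape)
print(mat[0, 1])  # row 0, column 1
```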

In [None]:
# matrix with the same value:
mat_7 = np.ones((4, 10)) * 7
print(mat_7)

The number of dimensions of a matrix is given by:

In [None]:
print('Number of dimensions of all_zeros:', all_zeros.ndim)
print('Number of dimensions of mat:', mat.ndim)

The total number of elements and the byte size of an element are found as:

In [None]:
print('mat size: {0}\nmat element size: {1} bytes\nmat.dtype:{2}'.format(mat.size, mat.itemsize, mat.dtype))

In [None]:
# useful cases
all_pi = np.full((3, 2), np.pi)

print(all_pi)
print(np.eye(3))

In [None]:
# equally spaced values in an interval; the interval bounds are part of the vector
print(np.linspace(0, 10, 5))

In [None]:
# `arange` works like Python's range function, but returns an ndarray
some_values = ...
print(some_values)
print(type(some_values))

In [None]:
# random numbers
x = ...
print(x)
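For the two cells above, a possible completion is sketched below. The bounds passed to `arange` and the shape of the random matrix are assumptions.

```python
import numpy as np

# start, stop (excluded), step, as with range
some_values = np.arange(0, 10, 2)
print(some_values)
print(type(some_values))

# uniformly distributed random numbers in [0, 1), arranged as a 2 x 3 matrix
x = np.random.random((2, 3))
print(x)
```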

The data types usable with ndarrays are:

| Type  | Explanation |
| ---- | -----------|
| bool_ | 	Boolean (True or False) stored as a byte | 
| int_ | 	Default integer type (same as C long; normally either int64 or int32) | 
| intc | 	Identical to C int (normally int32 or int64) | 
| intp | 	Integer used for indexing (same as C ssize_t; normally either int32 or int64) | 
| int8 | 	Byte (-128 to 127) | 
| int16 | 	Integer (-32768 to 32767) | 
| int32 | 	Integer (-2147483648 to 2147483647) | 
| int64 | 	Integer (-9223372036854775808 to 9223372036854775807) | 
| uint8 | 	Unsigned integer (0 to 255) | 
| uint16 | 	Unsigned integer (0 to 65535) | 
| uint32 | 	Unsigned integer (0 to 4294967295) | 
| uint64 | 	Unsigned integer (0 to 18446744073709551615) | 
| float_ | Shorthand for float64 (this alias was removed in NumPy 2.0; use float64). | 
| float16 | 	Half precision float: sign bit, 5 bits exponent, 10 bits mantissa | 
| float32 | 	Single precision float: sign bit, 8 bits exponent, 23 bits mantissa | 
| float64 | 	Double precision float: sign bit, 11 bits exponent, 52 bits mantissa | 
| complex_ | Shorthand for complex128 (this alias was removed in NumPy 2.0; use complex128). | 
| complex64 | 	Complex number, represented by two 32-bit floats (real and imaginary components) | 
| complex128 | 	Complex number, represented by two 64-bit floats (real and imaginary components) |

A common operation is taking an array and changing its shape:

In [None]:
# from a vector to a matrix
vec = np.arange(10)
mat = vec.reshape(2, 5)
print(vec)
print(mat)

In [None]:
#...and vice versa:
vec2 = mat.flatten()
print(vec2)

Matrices can be concatenated; the concatenation dimension (axis) should be specified:

In [None]:
a = np.array([[1, 2], [3, 4]], float)
b = np.array([[5, 6], [7,8]], float)

In [None]:
# vertical concatenation (stacking)
vertical = np....
print(vertical)

Every dimension of an array is an axis. For a 2D matrix, axis 0 runs vertically (down the rows), while axis 1 runs horizontally (across the columns).

In [None]:
# horizontal concatenation
horizontal = np...
print(horizontal)

In [None]:
# same as:
vertical = np.vstack((a, b))
horizontal = np.hstack((a, b))
print(vertical)
print(horizontal)

In [None]:
matrix = np.arange(15).reshape(3, 5)
print(matrix)

In [None]:
sum_by_columns= ...
print(sum_by_columns)

In [None]:
sum_by_rows= ...
print(sum_by_rows)
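The concatenation and summation cells above can be completed with the `axis` argument, as in this sketch. Reading "sum by columns" as one sum per column (`axis=0`) is an interpretation of the variable names.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], float)
b = np.array([[5, 6], [7, 8]], float)

vertical = np.concatenate((a, b), axis=0)    # rows of b below rows of a
horizontal = np.concatenate((a, b), axis=1)  # columns of b to the right of a
print(vertical)
print(horizontal)

matrix = np.arange(15).reshape(3, 5)
sum_by_columns = matrix.sum(axis=0)  # collapses axis 0: one sum per column
sum_by_rows = matrix.sum(axis=1)     # collapses axis 1: one sum per row
print(sum_by_columns)
print(sum_by_rows)
```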

### Operations with `ndarrays`

The usual mathematical operations from linear algebra are implemented: multiplication by a scalar, addition, subtraction, matrix multiplication.

Examples:
* addition
* Hadamard product, matrix product, scalar product:

In [None]:
# multiplying by a scalar
a = np.array([[1, 2, 3], [4, 5, 6]])
print('a=\n', a)
b = ...
print('b=\n', b)

In [None]:
# add and subtract
sum_mat = a + b
print(sum_mat)
diff_mat = a - b
print(diff_mat)

The multiplication operator \* is implemented differently from the matrix product of linear algebra: for two matrices with the same shape, the elements at the same coordinates are multiplied: c[i, j] = a[i, j] * b[i, j]. This is called pointwise multiplication, or Hadamard product, and it is often used in signal processing and machine learning.

In [None]:
# multiplication by * leads to Hadamard product: c[i, j] = a[i, j] * b[i, j]
c = a*b
print(c)
for i in range(c.shape[0]):  # c.shape[0] = number of rows of matrix c
    for j in range(c.shape[1]): # c.shape[1] = number of columns of matrix c
        print(c[i, j] == a[i, j] * b[i, j])

The above operations run in compiled code, which is optimized for current microprocessors. It is recommended to use them instead of (nested) `for` loops:

In [None]:
# create matrices
matrix_shape = (100, 100)
a_big = np.random.random(matrix_shape)
b_big = np.random.random(matrix_shape)

In [None]:
%%timeit
c_big = np.empty_like(a_big)
for i in range(c_big.shape[0]):
    for j in range(c_big.shape[1]):
        c_big[i, j] = a_big[i, j] * b_big[i, j]

In [None]:
%%timeit
c_big = a_big * b_big

In [None]:
# 'raising to power' using ** : each element of the matrix is individually raised to that power.
print('initial matrix:\n', a)
a_to_the_power_of_2 = a ** 2
print('after squaring each component:\n', a_to_the_power_of_2)
power_3 = np.power(a, 3)
print('after raising to the power of 3:\n', power_3)

You can use the / operator to get pointwise division (element by element) of two matrices:

In [None]:
print('a=', a)
print('b=', b)
print('a/b=', a/b)

In [None]:
# raising a square matrix to a power, as defined in linear algebra
square_matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
pow_3 = np.linalg.
print(pow_3)
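One NumPy function that computes this kind of power is `np.linalg.matrix_power`. Whether the blank in the cell above was meant to be this call is an assumption. The sketch also checks that the result differs from the elementwise `**`.

```python
import numpy as np

square_matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# repeated matrix product: square_matrix @ square_matrix @ square_matrix
pow_3 = np.linalg.matrix_power(square_matrix, 3)
print(pow_3)

# elementwise power, shown for contrast
elementwise_3 = square_matrix ** 3
print(elementwise_3)
```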

If you call a NumPy-defined numerical function on an ndarray, the result will be an ndarray of the same shape as the original one. Its elements are computed by applying the function to each of the initial matrix components, one by one:

In [None]:
x = np.arange(6).reshape(2, 3)
print(x)
y = np.exp(x)
assert x.shape == y.shape
for i in range(0, x.shape[0]):
    for j in range(0, x.shape[1]):
        assert ...

In [None]:
# linear-algebra style matrix product
a = np.random.rand(3, 5)
b = np.random.rand(5, 10)
assert a.shape[1] == b.shape[0]
c = np....
# equivalent form
c = a....
assert a.shape[0] == c.shape[0] and b.shape[1] == c.shape[1]
# or even shorter:
c = a@b
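The three spellings of the matrix product can be compared directly. In this sketch the blanks are assumed to be `np.dot(a, b)` and `a.dot(b)`. The results are stored under separate names so they can be checked against each other.

```python
import numpy as np

a = np.random.rand(3, 5)
b = np.random.rand(5, 10)

# inner dimensions must agree: (3, 5) x (5, 10) -> (3, 10)
assert a.shape[1] == b.shape[0]

c_function = np.dot(a, b)  # function form
c_method = a.dot(b)        # method form
c_operator = a @ b         # operator form

print(c_function.shape)
```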

NumPy defines a whole plethora of functions: `all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where` - [docs here](https://docs.scipy.org/doc/numpy-dev/reference/generated/).

## Indexing 

So far, we used individual indices to refer to particular elements in a NumPy array:
```python
vector[index]
# or
matrix[i, j]
```

In [None]:
vector = np.arange(10)
print(vector)
print('vector[4]={0}'.format(vector[4]))

In [None]:
matrix = np.arange(12).reshape(3, 4)
print(matrix)
print(matrix[2, 1])

For a matrix one can use indices as:
```python
m[i][j] 
```
but this is inefficient compared to `m[i, j]`: in the former case a temporary array for row `i` is created first, and the element at index `j` is then picked from it.

By using `slicing`, one can refer to a whole subset of elements. For example, for vectors one gets:

In [None]:
vector = 10 * np.arange(10)
print(vector)
print(...) # note that the rightmost boundary is not used for selection; the elements are retrieved up to index 6-1

In [None]:
indices = [1, 3, 2, 7]
print(vector)
print(...)

In [None]:
# or we can use a sequence of indices, with initial/step/final value given
vector[...]
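The three selection styles for vectors are sketched below. Only the stop index 6 is taken from the comment in the source. The start of the slice and the start/stop/step values are assumptions.

```python
import numpy as np

vector = 10 * np.arange(10)

# slice: from index 2 up to, but not including, index 6
sliced = vector[2:6]
print(sliced)

# list of indices: elements are returned in the order requested
indices = [1, 3, 2, 7]
picked = vector[indices]
print(picked)

# start:stop:step
stepped = vector[1:8:2]
print(stepped)
```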

For a matrix you can use:

In [None]:
matrix = 10 * np.arange(20).reshape(4, 5)
print(matrix)

In [None]:
print(matrix[1,])
# which is the same as the more explicit form:
print(matrix[...])

In [None]:
# you can slice indices, on any axis
matrix[1:3, :]

In [None]:
# slicing on every axis
matrix[1:3, 2:4]

### Logical indexing

You can execute logical operations against the elements of a NumPy array. As a result of this, you will obtain an ndarray of the same shape as the initial array, filled in with `True` and `False`. The exact boolean value is the result of the logical operation on each value:

In [None]:
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)
print(a > 2)

Furthermore, the resulting ndarray can be used for indexing. You will get only those elements for which the boolean operation yielded `True`:

In [None]:
larger_than_2 = a > 2
print(a[larger_than_2])
# direct approach
print(a[a>2])

# Note that the initial shape of a is lost. Can you explain why?

For joint conditions one can use boolean operators. For example, to select elements which are larger than 2, but smaller than 6, one writes:

In [None]:
a[np.logical_and(a > 2, a < 6)]

Similar operators are: `np.logical_or`, `np.logical_not`, `np.logical_xor`.

A common request for ndarrays is getting those floating point elements which are defined, i.e. those which are not NaN (NaN = not a number, a value resulting from invalid operations like 0/0, np.sqrt(-1), np.log(-1), etc.):

In [None]:
tab = np.array([[1.0, 2.3, np.nan, 4], [10, np.nan, np.nan, 0]])
print(tab[~np.isnan(tab)])

Basic slicing returns a view of the initial array, while indexing with a boolean mask or with a list of indices returns a copy. In both cases, however, assigning through the index expression updates the underlying initial array:

In [None]:
print('Before:\n', tab)
tab[...] = 0.0
print('After:\n', tab)
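A small sketch of how assignment through a mask behaves, and of which index expressions share memory with the original array. Replacing the NaN values is an assumption about the blank in the cell above. The arrays `base`, `view` and `copy` are hypothetical.

```python
import numpy as np

tab = np.array([[1.0, 2.3, np.nan, 4], [10, np.nan, np.nan, 0]])
# assignment through a boolean mask writes into `tab` itself
tab[np.isnan(tab)] = 0.0
print(tab)

base = np.arange(6)
view = base[1:4]       # basic slice: shares memory with `base`
view[0] = 100          # visible in `base`

copy = base[base > 2]  # boolean mask on the right-hand side: independent copy
copy[0] = -1           # `base` is not affected
print(base)
```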

Logical indexing allows selecting the elements of an array whose content is to be changed:

In [None]:
# even numbers are multiplied by 10, the other ones remain unchanged 
matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print('Before update:\n', matrix)
matrix[...] *= 10
print('After update:\n', matrix)

You may perform an update along a specific axis:

In [None]:
matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.float32)
print('Before update:\n', matrix)
# columns 0, 2, 3 will be updated
bool_columns = [...]
matrix[:, bool_columns] = (matrix[:, bool_columns] +3 )/10
print('After update\n', matrix)
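Both update cells can be completed with boolean masks, as sketched below. The mask expressions are assumptions derived from the comments ("even numbers", "columns 0, 2, 3"). The second matrix is renamed `float_matrix` here only to keep both results available.

```python
import numpy as np

# even numbers are multiplied by 10, the other ones remain unchanged
matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
matrix[matrix % 2 == 0] *= 10
print(matrix)

# one boolean per column: columns 0, 2, 3 are selected
float_matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.float32)
bool_columns = np.array([True, False, True, True])
float_matrix[:, bool_columns] = (float_matrix[:, bool_columns] + 3) / 10
print(float_matrix)
```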

### Recommended bibliography 
1. https://docs.scipy.org/doc/numpy-1.13.0/glossary.html
1. https://engineering.ucsb.edu/~shell/che210d/numpy.pdf
1. http://www.scipy-lectures.org/intro/numpy/numpy.html#indexing-and-slicing

## Broadcasting

Broadcasting allows operations on arrays of incompatible shapes, under specific circumstances. For example, following the mathematical definition of matrix addition, the matrices `a` and `b` below cannot be added:

In [None]:
a = np.array([[0.0,0.0,0.0],[10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]]) 
b = np.array([0.0,1.0,2.0])  

print('a=\n{0}\n'.format(a))
print('b=\n{0}\n'.format(b))

Through broadcasting, the matrix `b` is automatically extended by replicating its row:
![broadcast](./images/broadcast1.png)

In [None]:
# broadcasting
result = a + b
print('result=\n{0}\n'.format(result))

Broadcasting is done if some conditions are fulfilled. When one operates on two matrices, NumPy compares the sizes on each dimension, starting with the last dimension. Two dimensions are compatible when: 

1. they are equal, or
1. one of them is 1

The rules above are not fulfilled, for example, by:

![Impossible broadcast](./images/broadcast2.png)

or for the case below:

In [None]:
x = np.arange(4)
y = np.ones(5)
print(x.shape, y.shape)

#print(x+y) # ValueError: operands could not be broadcast together with shapes (4,) (5,) 

If NumPy can solve the lack of data through content replication, it will do this automatically:

In [None]:
x = np.arange(4).reshape(4, 1)
print('x shape: ', x.shape)
print('x:\n', x)

In [None]:
y = np.arange(5).reshape(1, 5)
print('y shape: ', y.shape)
print('y:\n', y)

In [None]:
z = x + y
print('z shape:', z.shape)
print('z\n', z)
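The compatibility rule can also be checked without performing any arithmetic, using `np.broadcast_shapes`. This sketch assumes NumPy 1.20 or newer, where that function is available.

```python
import numpy as np

x = np.arange(4).reshape(4, 1)
y = np.arange(5).reshape(1, 5)

# (4, 1) and (1, 5): on each axis one of the sizes is 1, so the result is (4, 5)
result_shape = np.broadcast_shapes(x.shape, y.shape)
z = x + y
print(result_shape, z.shape)

# (4,) and (5,): sizes differ and neither is 1, so broadcasting fails
try:
    np.broadcast_shapes((4,), (5,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```

Each entry of the result is `z[i, j] = x[i, 0] + y[0, j]`.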

### Broadcast example

([Source](https://eli.thegreenplace.net/2015/broadcasting-arrays-in-numpy/)) For several foods, we know the amounts of fats, carbs, and proteins they contain, in grams. We want to convert them into calories, by using multiplicative constants:
1. calories for fats = 9 * fats in grams
1. calories for proteins = 4 * protein grams
1. calories for carbs = 4 * carbs in grams

![servings table](./images/broadcast3.png)

The multiplication relies on broadcasting:

In [None]:
weights = np.array([
  [0.3, 2.5, 3.5],
  [2.9, 27.5, 0],
  [0.4, 1.3, 23.9],
  [14.4, 6, 2.3]])

cal_per_g = np.array([9, 4, 4])

# broadcasting
calories = weights * cal_per_g

print('Calories:\n', calories)

### Bibliography

[Basic broadcasting: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)

http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc

http://cs231n.github.io/python-numpy-tutorial/#numpy-broadcasting

## Vectorized computation

Motivational presentation: https://www.kdnuggets.com/2017/11/forget-for-loop-data-science-code-vectorization.html 

### Example

We have two collections of numbers: the first one contains distances, the second one contains the time required to cover them. We want to compute the speeds. We will present two approaches: a classical `for` loop, and vectorized computation.

In [None]:
distances = [10, 20, 23, 14, 33, 45]
durations = [0.3, 0.44, 0.9, 1.2, 0.7, 1.1]

In [None]:
# Version 1: traditional loop
speeds = []
for i in range(len(distances)):
    ...
    
print('Speeds: ', speeds)

In [None]:
# Version 2: vectorization
# Vectorization works on vectors/matrices; the first step is to convert lists to NumPy arrays

distances_array = ...
durations_array = ...

# We use NumPy operations which work on full arrays. The C code
# uses Single Instruction Multiple Data (SIMD) facilities from the CPU.

speeds_array = ...

print(speeds_array)
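Both versions could be completed as in the sketch below, which also checks that they agree. The loop body and the array expressions are assumptions, since the source leaves them blank.

```python
import numpy as np

distances = [10, 20, 23, 14, 33, 45]
durations = [0.3, 0.44, 0.9, 1.2, 0.7, 1.1]

# Version 1: one division per loop iteration
speeds = []
for i in range(len(distances)):
    speeds.append(distances[i] / durations[i])
print('Speeds: ', speeds)

# Version 2: a single elementwise division over whole arrays
distances_array = np.array(distances)
durations_array = np.array(durations)
speeds_array = distances_array / durations_array
print(speeds_array)
```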


### Benefits of vectorization

1. Fast execution
1. Short and often more readable code

Example: let us compute
$$
\sum\limits_{i=0}^{N-1} (i\%3-1) \cdot i
$$

In [None]:
# Python function with non-vectorized computation (`for` loop)

N = 100000

def func_python(N):
    ...

print(func_python(N))

In [None]:
%timeit func_python(N)

In [None]:
# function with vectorized computation

def func_numpy(N):
    i_array = np.arange(N)
    return ...

print(func_numpy(N))

In [None]:
%timeit func_numpy(N)
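The two functions could be written as sketched here. The bodies are assumptions that follow the formula above. The explicit `dtype=np.int64` is an extra precaution against integer overflow on platforms where the default integer is 32 bits wide.

```python
import numpy as np

N = 100000


def func_python(N):
    # explicit loop: one term of the sum per iteration
    total = 0
    for i in range(N):
        total += (i % 3 - 1) * i
    return total


def func_numpy(N):
    # the same formula applied to all indices at once
    i_array = np.arange(N, dtype=np.int64)
    return ((i_array % 3 - 1) * i_array).sum()


print(func_python(N))
print(func_numpy(N))
```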

Most operators and NumPy functions work on any number of values, element by element; these functions are also called universal functions (ufuncs). They are optimized to use SIMD instructions. The following operators and functions work efficiently with arrays:

- arithmetic operators: + - * / // % **
- bitwise operators: & | ~ ^ >> <<
- comparisons: < <= > >= == !=
- math functions: np.sin, np.log, np.exp, ...
- special scipy functions: `scipy.special.*`

Although Python already provides some of these functions (e.g. the built-ins `sum` and `min`), the NumPy versions are much faster on arrays:

In [None]:
from random import random
c = [random() for i in range(N)]

In [None]:
# Python builtin function
%timeit sum(c)

In [None]:
# NumPy vectorized: prepare the NumPy array
c_array = np.array(c)

In [None]:
# NumPy vectorized: use the array's sum method
%timeit c_array.sum()

### Solved exercise

We have $n$ points in 2D space. Their coordinates are given in vectors **x** and **y**, respectively. Compute the closest pair of points, using Euclidean distance:
$$
d^2((x_i, y_i), (x_j, y_j)) = (x_i-x_j)^2 + (y_i-y_j)^2
$$

In [None]:
n = 1000
x = np.random.random(size = n)
y = np.random.random(size = n)

In [None]:
# Version 1: compute the matrix `d` of pairwise distances. d[i, j] will store the square of the distance 
# between points of coordinates (xi, yi) and (xj, yj), respectively.

In [None]:
%%timeit

d = np.empty((n, n))
for i in range(n):
    for j in range(n):
        d[i, j] = (x[i] - x[j])**2 + (y[i]-y[j])**2

In [None]:
# compute the indices i and j, i != j, for which the distance is minimized

def closest_pair(mat):
    n = mat.shape[0]
    # distance between a point and itself is always 0; we will exclude these cases by setting infinity on the main diagonal of distances
    i = np.arange(n)
    mat[i, i] = np.inf
    pos_flatten = np.argmin(mat)
    return pos_flatten // n, pos_flatten % n
    

In [None]:
# Version 2: vectorized computation and broadcasting

In [None]:
%%timeit

dx = (x[:, np.newaxis] - x[np.newaxis, :]) ** 2
dy = (y[:, np.newaxis] - y[np.newaxis, :]) ** 2

d = dx + dy
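To check the broadcasting version on data where the answer is known, the sketch below uses four hand-picked points (hypothetical values) and locates the closest pair with `np.fill_diagonal` and `np.unravel_index`, an alternative to the integer division used in `closest_pair`.

```python
import numpy as np

# points 0 and 2 are the closest: (0, 0) and (0.1, 0)
x = np.array([0.0, 1.0, 0.1, 5.0])
y = np.array([0.0, 1.0, 0.0, 5.0])

# column vector minus row vector -> full matrix of pairwise differences
dx = (x[:, np.newaxis] - x[np.newaxis, :]) ** 2
dy = (y[:, np.newaxis] - y[np.newaxis, :]) ** 2
d = dx + dy

# a point is at distance 0 from itself; exclude the main diagonal
np.fill_diagonal(d, np.inf)

# argmin works on the flattened matrix; convert back to (row, column)
i, j = np.unravel_index(np.argmin(d), d.shape)
print(i, j)
```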

### Bibliography
[https://speakerdeck.com/jakevdp/losing-your-loops-fast-numerical-computing-with-numpy-pycon-2015](https://speakerdeck.com/jakevdp/losing-your-loops-fast-numerical-computing-with-numpy-pycon-2015)

[Losing your Loops Fast Numerical Computing with NumPy](https://www.youtube.com/watch?v=EEUXKG97YRw)