# Python for Machine Learning

## Maciej Szankin

# About me

<div style="float:left; padding:30px;">
    <img src="https://avatars0.githubusercontent.com/u/4785345?s=460&u=347ee5cbd03d6f3972af3b66ec530eef19f1af1e&v=4" style="width:300px;" />
</div>
<div style="float:left; padding:30px;">
    <h2>Maciej Szankin</h2>
    <h4>Deep Learning Software Engineer @ Intel</h4>
</div>

# Introduction - How do I participate?

* Run as script locally on your own machine
* Use Jupyter Notebooks (vs IPython Notebook) on one of the platforms below

## Google Colab

* Available at https://colab.research.google.com/
* It's free! You only need a Google account to access
* Offers different runtime environments: CPU / GPU / TPU
* Resources are not guaranteed
* Avoid hogging resources - you may get lower priority in the future if requested resources are not actively used!
* For an improved experience, there is Colab Pro - https://colab.research.google.com/signup

## Google Colab - running code

# TODO: link to this presentation
# TODO: RUN

## Google Colab - Installing new packages

* You can execute bash commands in a cell.
* Prepend your command with `!` to run it in a shell
* When running Google Colab you get your own virtualized environment - you can install packages
* The environment will be cleaned upon exiting - make your life easier and install all additional dependencies in the first cell.

* Example command to install Python's wget:
  ```bash
  !pip install wget
  ```

* Verify:
  ```python
  import wget
  wget.__version__
  ```

In [2]:
import wget
print(wget.__version__)

3.2


## ML.eti.pg.gda.pl

* Available at http://ml.eti.pg.gda.pl/
* JupyterHub-based preconfigured environment deployed for this School
* Server is located in Gdansk, Poland
* Use it if you can't use Google Colab

## ML.eti.pg.gda.pl - running code

# TODO: link to this presentation
# TODO: RUN

## ML.eti.pg.gda.pl - Installing new packages

<center><h2>You can't! Sorry! 💔</h2></center>

## Run locally

```bash
pip install jupyter

# or even better:
conda install jupyter

jupyter notebook
```

# Python

<center>
    <img src="https://imgs.xkcd.com/comics/python.png" />
    <div><i>Source: https://xkcd.com/353/</i></div>
</center>

## Python
* Object Oriented
* however #1 - no enforced encapsulation: "After all, we're all consenting adults here."
* however #2 - no dedicated interface construct, only (multiple) inheritance
* multi-paradigm
* interpreted
* strongly typed
* dynamically typed
* designed for code clarity
* object introspection
* interactive mode (terminal / IPython)
* interfaces to many popular programming languages: 
    * C++
    * Java
    * .NET
    * and many others
* has its own package manager - pip (and the older, now deprecated easy_install)
* before writing a library - check if it exists

## How to get Python

* https://www.python.org/downloads/
* https://www.anaconda.com/products/individual
* apt / brew


## Loops

In [None]:
def find(seq, target):
    found = False
    for i, value in enumerate(seq):
        if value == target:
            found = True
            break
    if not found:
        return -1
    return i

find(range(0,10), 6)

In [None]:
def find(seq, target):
    for i, value in enumerate(seq):
        if value == target:
            break
    else:
        return -1
    return i

find(range(0,10), 6)
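
The second version relies on the `else` clause of a `for` loop, which runs only when the loop finishes without reaching `break`. A minimal sketch of that behaviour (the helper name `first_negative` is hypothetical, introduced only for illustration):

```python
def first_negative(values):
    """Return the index of the first negative number, or -1 if there is none."""
    for index, value in enumerate(values):
        if value < 0:
            break       # leaving via break skips the else clause
    else:
        return -1       # runs only when the loop was never broken
    return index


print(first_negative([3, -1, 4]))
print(first_negative([3, 1, 4]))
print(first_negative([]))
```

Note that an empty sequence also ends up in the `else` branch, because the loop body never executes and therefore never reaches `break`.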

## List comprehensions

In [None]:
X = [0, 1, 2, 3, 4, 5]
# Square only the even elements of X
X_even_sqr = [value**2 for value in X if value % 2 == 0]
X_even_sqr

## List slicing
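
A short sketch of the slicing syntax `sequence[start:stop:step]`, assuming a small example list (`numbers` and the other variable names are illustrative). `start` is inclusive, `stop` is exclusive, and any of the three parts may be omitted.

```python
numbers = [0, 1, 2, 3, 4, 5]

first_three = numbers[:3]        # elements at indices 0, 1, 2
from_index_three = numbers[3:]   # everything from index 3 onwards
every_second = numbers[::2]      # step of 2
last_two = numbers[-2:]          # negative indices count from the end
reversed_copy = numbers[::-1]    # a step of -1 walks the list backwards

print(first_three, from_index_three, every_second, last_two, reversed_copy)
```

Slicing always returns a new list, so `numbers` itself is left unchanged.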

## Dictionary comprehensions
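
Dictionary comprehensions follow the same pattern as list comprehensions, but produce `key: value` pairs. A minimal sketch, assuming an illustrative list of words:

```python
words = ["tensor", "numpy", "graph"]

# Map each word to its length
lengths = {word: len(word) for word in words}

# Keep only the entries whose value satisfies a condition
long_words = {word: size for word, size in lengths.items() if size > 5}

print(lengths)
print(long_words)
```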

## Functions

## Decorators

In [19]:
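import functools
import time

# Decorator definition
def timeit(function):
    @functools.wraps(function)  # keep the wrapped function's name and docstring
    def timed(*args, **kwargs):
        time_start = time.time()
        result = function(*args, **kwargs)
        time_end = time.time()
        # time.time() returns seconds, so the elapsed time is reported in seconds
        args_repr = ', '.join(repr(arg) for arg in args)
        print('Function {}({}) returned {} in: {:6.5f}s'.format(
            function.__name__, args_repr, result, time_end - time_start))
        return result
    return timed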
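
Applying a decorator with `@name` is shorthand for `function = name(function)`. The sketch below is self-contained, so it defines its own small timing decorator; `report_time` and `slow_square` are hypothetical names that are not part of the original material. It shows that the decorated function still returns its normal result.

```python
import functools
import time


def report_time(function):
    """Print how long each call to `function` takes, in seconds."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f'{function.__name__} took {elapsed:.5f}s')
        return result
    return wrapper


@report_time
def slow_square(x):
    time.sleep(0.01)  # simulate some work
    return x * x


value = slow_square(4)
print(value)
```

Using `functools.wraps` keeps `slow_square.__name__` and its docstring intact, which matters for tools such as `help()`.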

## Classes

* `__init__` is not a true constructor: the instance is created by `__new__`, and `__init__` only initializes it - https://www.quora.com/Can-we-say-init__-self-is-a-constructor-in-Python-How-can-this-be-justified
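
The order of object creation and initialization can be observed directly. In this minimal sketch, the class `Point` and the list `calls` are illustrative names; each special method records when it is invoked.

```python
calls = []


class Point:
    def __new__(cls, *args, **kwargs):
        calls.append('__new__')       # the instance is created here
        return super().__new__(cls)

    def __init__(self, x, y):
        calls.append('__init__')      # the existing instance is initialized here
        self.x = x
        self.y = y


p = Point(1, 2)
print(calls)
```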


## What makes your Python code better

### PEP8 Style Guide

Guidelines for:
* Code layout (Indentation - Tabs or Spaces?, Maximum Line Length, Imports etc.)
* String quotes
* Whitespace
* Naming conventions
* Overriding principles

Validate your code with the `pycodestyle` utility (the `pep8` command-line tool was renamed to `pycodestyle`; see GitHub issue #466).

```bash
pip install pycodestyle
```

Examples: https://pypi.org/project/pep8/, more at https://www.python.org/dev/peps/pep-0008/

```bash
$ pycodestyle <directory or a file>

  python101/math.py:4:1: E302 expected 2 blank lines, found 1
  python101/math.py:4:10: E231 missing whitespace after ','
```

### PEP257 Docstrings

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the `__doc__` special attribute of that object.

* All modules should normally have docstrings, and all functions and classes exported by a module should also have docstrings.
* Public methods (including the `__init__` constructor) should also have docstrings.
* A package may be documented in the module docstring of the `__init__.py` file in the package directory.
* For consistency, always use `"""triple double quotes"""`
* Validate your code with the `pep257` utility (later renamed to `pydocstyle`).
    ```bash
    pip install pep257
    ```
* Good read: https://blog.dolphm.com/pep257-good-python-docstrings-by-example/

```bash
$ pep257 python101/
python101/__init__.py:1 at module level:
        D104: Missing docstring in public package
python101/math.py:1 at module level:
        D100: Missing docstring in public module
python101/math.py:1 in public function `add`:
        D103: Missing docstring in public function
python101/math.py:4 in public function `sub`:
        D103: Missing docstring in public function
```
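
As a small illustration of the `__doc__` attribute, the sketch below documents a function and reads its docstring back. The `add` function here is an illustrative stand-in, not the actual contents of `python101/math.py`.

```python
def add(a, b):
    """Return the sum of a and b."""
    return a + b


print(add.__doc__)
```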

### PEP257 Docstrings

In [None]:
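class Class1(object):
    """Example class with a one-line summary.

    A longer, multiline description follows after a blank line.
    """

    a = None

    def __init__(self):
        """Initialize the instance.

        A longer, multiline description of the initializer.
        """
        self.b = None

help(Class1)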

### Typical project structure

```
/example_pkg
--/example_pkg
----__init__.py
----my_script.py
--setup.py
--LICENSE
--README.md
```

### Packaging

1. Prepare the directory tree as shown in `Typical project structure`
1. Install required packages:
    ```bash
    pip install setuptools wheel
    ```
1. Run this command from the same directory where `setup.py` is located:
    ```bash
    python setup.py sdist bdist_wheel
    ```
1. This creates a `dist` directory containing the built package. The commands below assume the package is named `python_tutorial`; substitute the name of the wheel that was actually built. To test:
    1. Install the created package:
        ```bash
        python -m pip install dist/python_tutorial-0.0.1-py3-none-any.whl
        ```
    1. Go to a different directory, enter the Python shell, and try entering the code below:
        ```python
        import python_tutorial
        dir(python_tutorial)
        ```

# Tensor


> a mathematical object analogous to but more general than a vector, represented by an array of components that are functions of the coordinates of a space.

# NumPy

Let's do a quick overview of NumPy.

## NumPy


* Numerical Python Library
* high performance implementation of multidimensional arrays

```python
import numpy as np
```

## NumPy | Creating an Array

Let's go through a couple of ways to create a NumPy array.
We are already familiar with Python's list, which is a sequence of elements.
Note: a NumPy array and a Python list are two different things.

```python
py_list = [1,2,3]
```

To convert it to a NumPy array, we can do:
```python
np.array(py_list)
```

If you check the type of this object, you can see that it is no longer a list: it is an `np.ndarray`, where `ndarray` stands for N-dimensional array.


In [5]:
py_list = [1,2,3]

np.array(py_list)

array([1, 2, 3])
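
Converting a list is only one way to create an array. The sketch below shows a few other common constructors and checks the resulting type; the variable names are illustrative.

```python
import numpy as np

py_list = [1, 2, 3]
from_list = np.array(py_list)              # convert an existing Python list
zeros = np.zeros((2, 3))                   # 2x3 array filled with 0.0
ones = np.ones(4)                          # 1-D array of four 1.0 values
counting = np.arange(0, 10, 2)             # like range(): 0, 2, 4, 6, 8
evenly_spaced = np.linspace(0.0, 1.0, 5)   # 5 points from 0 to 1 inclusive

print(type(py_list), type(from_list))
print(zeros.shape, ones.shape, counting, evenly_spaced)
```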

## NumPy | Array of Type

* NumPy arrays are homogeneous - all elements are of the same type
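
One consequence of homogeneity is that NumPy picks a single element type able to hold every value passed in. A minimal sketch with illustrative variable names:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])     # the integers are converted to floats
as_text = np.array([1, 'two'])    # everything becomes a string

print(ints.dtype, mixed.dtype, as_text.dtype)
```

The exact integer and string dtypes printed depend on the platform and NumPy version, so they are not listed here.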

## NumPy

* Indexed by a tuple of non-negative integers
* In NumPy dimensions are called axes
* NumPy’s array class is called `ndarray`. It is also known by the alias `array`
* `numpy.array` is not the same as the Python standard library's `array.array`
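
Indexing by a tuple of integers can be sketched as follows, with one integer per axis. The array `grid` is an illustrative example.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

element = grid[1, 2]         # row 1, column 2
same_element = grid[(1, 2)]  # the explicit tuple form is equivalent
first_row = grid[0]          # indexing only the first axis returns a whole row

print(element, same_element, first_row)
print(type(grid) is np.ndarray)
```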

## NumPy

__Example #1__

* RGB value of the pixel can be represented as `[0, 113, 197]`
* This array has one axis
* That axis has 3 elements in it. We say it has a length of 3

__Example #2__

* A multicolor line can be represented as
  ```
  [[  0, 113, 197],
   [ 58,  64,  59]]
  ```
* This array has 2 axes
* The first axis is of length 2
* The second axis has a length of 3
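
The two examples can be written as NumPy arrays to check their axes. The sketch also adds a hypothetical 2x2 image, which is an assumption for illustration and not part of the original examples, to show an array with three axes.

```python
import numpy as np

pixel = np.array([0, 113, 197])       # Example #1: one axis of length 3
line = np.array([[0, 113, 197],
                 [58, 64, 59]])       # Example #2: two axes, lengths 2 and 3

# Hypothetical 2x2 image: height and width axes plus one axis for the RGB channels
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = pixel                   # set the top-left pixel

print(pixel.ndim, pixel.shape)
print(line.ndim, line.shape)
print(image.ndim, image.shape)
```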

TODO: Example 3 on a small image, a bit later, after shape etc.

## NumPy

In [4]:
!pip install numpy

Collecting numpy
  Downloading numpy-1.19.1-cp36-cp36m-win_amd64.whl (12.9 MB)
Installing collected packages: numpy
Successfully installed numpy-1.19.1


## NumPy

In [54]:
import numpy as np

a = np.array([[  0, 113, 197],
              [ 58,  64,  59]])

# ndarray.ndim
# the number of axes (dimensions) of the array.
print(f'ndarray.ndim = {a.ndim}')

# ndarray.shape
# the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.
# For a matrix with n rows and m columns, shape will be (n,m). 
# The length of the shape tuple is therefore the number of axes, ndim.
print(f'ndarray.shape = {a.shape}')

# ndarray.size
# the total number of elements of the array. This is equal to the product of the elements of shape.
print(f'ndarray.size = {a.size}')

# ndarray.dtype
# an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
print(f'ndarray.dtype = {a.dtype}')

# ndarray.itemsize
# the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
print(f'ndarray.itemsize = {a.itemsize}')

# ndarray.data
# the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

ndarray.ndim = 2
ndarray.shape = (2, 3)
ndarray.size = 6
ndarray.dtype = int64
ndarray.itemsize = 8


In [38]:
# TODO: maybe this should come later

@timeit
def fun_numpy(x):
    import numpy as np
    a = np.ones((x,x), dtype=np.int32)
    return a.sum()

@timeit
def fun_std(x):
    a = [[1 for _ in range(x)] for _ in range(x)]
    s = 0
    for i, _ in enumerate(a):
        for j, _ in enumerate(a[0]):
            s += a[i][j]
    return s

n = 1000
fun_numpy(n)
fun_std(n)

Function fun_numpy(1000) returned 1000000 in: 0.00170s
Function fun_std(1000) returned 1000000 in: 0.12697s


1000000

TODO argument for multiprocessing

## NumPy: arrays

## NumPy: boolean array indexing
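
A minimal sketch of boolean array indexing, assuming an illustrative array `values`: a comparison produces a boolean mask, and indexing with that mask selects the matching elements.

```python
import numpy as np

values = np.array([3, -1, 4, -1, 5, -9])

mask = values > 0           # boolean array, one entry per element
positives = values[mask]    # keeps only the elements where mask is True

clipped = values.copy()
clipped[clipped < 0] = 0    # a mask can also select elements to assign to

print(mask)
print(positives)
print(clipped)
```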

## NumPy: datatypes
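
A short sketch of how the element type can be chosen or changed. The variable names are illustrative; the default integer dtype depends on the platform, so it is printed but not assumed.

```python
import numpy as np

default_ints = np.array([1, 2, 3])                       # integer dtype chosen by NumPy
floats = np.array([1, 2, 3], dtype=np.float32)           # dtype requested explicitly
truncated = np.array([1.7, 2.2, 3.9]).astype(np.int32)   # astype drops the fraction

print(default_ints.dtype, floats.dtype, truncated.dtype)
print(floats.itemsize, truncated)
```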

## NumPy: operations on arrays
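
Arithmetic operators on arrays work element by element, while matrix multiplication has its own operator. A minimal sketch with two illustrative 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

elementwise_sum = a + b       # adds matching elements
elementwise_product = a * b   # multiplies matching elements (not a matrix product)
matrix_product = a @ b        # matrix multiplication, same as a.dot(b)
column_sums = a.sum(axis=0)   # reduce along axis 0 (down the rows)

print(elementwise_sum, elementwise_product, matrix_product, column_sums, sep='\n')
```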

## NumPy: broadcasting
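
Broadcasting lets NumPy combine arrays of different but compatible shapes by virtually repeating the smaller one. A minimal sketch, with illustrative names and shapes noted in the comments:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

plus_row = matrix + row        # row is repeated for each of the 2 rows
plus_column = matrix + column  # column is repeated across the 3 columns
scaled = matrix * 2            # a scalar is broadcast to every element

print(plus_row, plus_column, scaled, sep='\n')
```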

# Graph Framework

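Like TensorFlow in its graph-and-session mode, the small framework built in this section is lazy: nothing is computed until a session runs.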

    1. Build the GRAPH, which represents the data flow of the computations
    2. Run a SESSION, which executes the operations in the graph

## Graph Framework: Operations

In [None]:
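class Operation(object):
    def __init__(self, input_nodes=None):
        # Use None instead of a mutable default list shared between calls
        self.input_nodes = input_nodes if input_nodes is not None else []
        self.output_nodes = []

        # Register this operation as a consumer of each of its inputs
        for node in self.input_nodes:
            node.output_nodes.append(self)

        # _default_graph is set by Graph.set_as_default()
        _default_graph.operations.append(self)

    def compute(self):
        # Subclasses implement the actual computation
        raise NotImplementedError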

In [None]:
class Add(Operation):
    def __init__(self, x, y):
        super().__init__([x, y])
    
    def compute(self, x_var, y_var):
        self.inputs = [x_var, y_var]
        return x_var + y_var

In [None]:
class Mul(Operation):
    def __init__(self, x, y):
        super().__init__([x, y])
    
    def compute(self, x_var, y_var):
        self.inputs = [x_var, y_var]
        return x_var * y_var

In [None]:
class MatMul(Operation):
    def __init__(self, x, y):
        super().__init__([x, y])
    
    def compute(self, x_var, y_var):
        self.inputs = [x_var, y_var]
        return x_var.dot(y_var)

## Graph Framework: Placeholders, Variables and Graph object

* Placeholder - a special node which is used as data input
* Variable - a changeable parameter of the graph
* Graph - a global variable which connects variables and placeholders to operations

In [None]:
class Placeholder(object):
    def __init__(self):
        self.output_nodes = []
        _default_graph.placeholders.append(self)

In [None]:
class Variable(object):
    def __init__(self, initial_value=None):
        self.value = initial_value
        self.output_nodes = []
        
        _default_graph.variables.append(self)

In [None]:
class Graph(object):
    def __init__(self):
        self.operations = []
        self.placeholders = []
        self.variables = []
    
    def set_as_default(self):
        global _default_graph
        _default_graph = self

Example: $z = Ax + b$ with $A = 10$ and $b = 1$, so $z = 10x + 1$.

In [None]:
g = Graph()
g.set_as_default()

A = Variable(10)
b = Variable(1)
x = Placeholder()

y = Mul(A, x)
z = Add(y, b)

print(z)

## Graph Framework: Session

In [None]:
class Session(object):
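    def run(self, operation, feed_dict=None):
        # Avoid a mutable default argument
        feed_dict = {} if feed_dict is None else feed_dict
        # Post-order guarantees that inputs are evaluated before the operations using them
        nodes_postorder = self._traverse_postorder(operation)

        for node in nodes_postorder:
            if isinstance(node, Placeholder):
                # Placeholders take their value from feed_dict
                node.output = feed_dict[node]
            elif isinstance(node, Variable):
                node.output = node.value
            else:
                # Operation: gather the input values and compute the result
                node.inputs = [input_node.output for input_node in node.input_nodes]
                node.output = node.compute(*node.inputs)

            # Convert plain lists to NumPy arrays (relies on the earlier `import numpy as np`)
            if isinstance(node.output, list):
                node.output = np.array(node.output)
        return operation.output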
                
    def _traverse_postorder(self, operation):
        nodes_postorder = []
        
        def recurse(node):
            if isinstance(node, Operation):
                for input_node in node.input_nodes:
                    recurse(input_node)
            nodes_postorder.append(node)
            
        recurse(operation)
        return nodes_postorder
        

In [None]:
sess = Session()
sess.run(operation=z, feed_dict={x: 10})
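
For the graph built above, this call evaluates 10 * 10 + 1 = 101. The session visits nodes in post-order so that the inputs of every operation are computed before the operation itself. The self-contained sketch below shows that ordering on a nested-tuple encoding of the same expression; the tuple encoding and the names `postorder` and `evaluate` are assumptions made for illustration and are not the classes defined above.

```python
import operator

# ('add', left, right) and ('mul', left, right) are operations; strings are leaf nodes
expression = ('add', ('mul', 'A', 'x'), 'b')
OPERATIONS = {'add': operator.add, 'mul': operator.mul}


def postorder(node):
    """Return the nodes so that every operation comes after its inputs."""
    if isinstance(node, tuple):
        _, left, right = node
        return postorder(left) + postorder(right) + [node]
    return [node]


def evaluate(node, values):
    """Evaluate the expression using `values` for the leaf nodes."""
    results = {}
    for current in postorder(node):
        if isinstance(current, tuple):
            name, left, right = current
            results[current] = OPERATIONS[name](results[left], results[right])
        else:
            results[current] = values[current]
    return results[node]


order = [n if isinstance(n, str) else n[0] for n in postorder(expression)]
print(order)
print(evaluate(expression, {'A': 10, 'x': 10, 'b': 1}))
```

The printed order lists both inputs of the multiplication before `mul`, and `add` last, which mirrors the order in which a session would fill in `node.output`.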

# TensorFlow

# PyTorch