## Python Workshop

---
---

### What is Python? Why Python?


Python is an interpreted, interactive, object-oriented programming language that combines remarkable power with very clear syntax.


<u>Pros</u>
- Open source with a vibrant community
- Simple and parsimonious syntax - makes its code easy to learn and share
- Easy OOP
- Huge array of third-party packages
- Flexibility and versatility (e.g. used for hacking (CIA), scientific computing (NASA), producing films (Pixar, Disney), crawling web pages (Google), recommending songs (Spotify), etc.)


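<u>Cons</u>
- Speed limitations: not as fast as compiled low-level languages such as C or C++
- Limited multi-threaded parallelism for CPU-bound code because of the Global Interpreter Lock (GIL) in CPython (PyPy speeds up execution with a JIT compiler, but it also has a GIL)
- Not a good choice for memory-intensive tasks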

Conclusion: Python is a good general-purpose language, but also "second best for everything".

 
<a href="https://www.economist.com/science-and-technology/2018/07/19/python-has-brought-computer-programming-to-a-vast-new-audience" target="_blank">
  <img src="https://www.economist.com/sites/default/files/imagecache/1280-width/20180728_WOC883.png">
</a><br>
<br>

---
<br>

### Installation instructions

There are quite a few Python distributions out there (see [here](https://wiki.python.org/moin/PythonDistributions)).<br>
Most of them are built around the reference implementation of Python, called [CPython](https://www.python.org/doc/).<br>
The main difference between them is the packages they come preloaded with (check out [this article](https://www.infoworld.com/article/3267976/python/anaconda-cpython-pypy-and-more-know-your-python-distributions.html)). <br>

For data science and machine learning the [Anaconda](https://www.anaconda.com/what-is-anaconda/) distribution of Python is the most popular.<br>
Please install it by following the step-by-step instructions:

* <a href="https://docs.anaconda.com/anaconda/install/" target="_blank">Install Anaconda</a>

**Note**: During the installation you will be asked whether you want to install Visual Studio Code. Please install that as well.<br><br>


Upon the completion of the installation:

- Open the terminal (or Anaconda Prompt on Windows) and run: `jupyter notebook`<br>
If a Jupyter notebook opens in your browser, everything went well and you have a reason to celebrate!<br>
If not, go back to the installation instructions and make sure that you have followed them closely.

- To prepare the VS Code environment please look at the [Python in VS Code](https://code.visualstudio.com/docs/languages/python) page.<br>
Make sure that you have managed to run the `hello.py` file you are asked to create.

---
---
---

### Table of Contents:
1. [Python Environment](#one)

   - Jupyter Notebooks
   - Jupyter Lab
   - IDEs
   

2. [Package management](#two)

   - Numpy, Pandas, Scikit-Learn
   - Plotting
     - Matplotlib
     - Plotly
     - Bokeh


3. [Types, Operators, Variables](#three)

4. [Control Structures](#four)

5. [Functions](#five)

6. [Fast Python](#six)
   - Numba, Cython, etc
   - `pybind11`
   

7. [OOP and Packaging](#seven)

8. [Files](#eight)

9. [Machine Learning in Python](#nine)

[The Python Tutorial](https://docs.python.org/3/tutorial/index.html)

---
## 1. Python Environment <a class="anchor" id="one"></a>


- **Jupyter Notebooks**<br><br>

- **Jupyter Lab**<br><br>

- **IDEs**

In [None]:
print('Hello world!')

In [None]:
1+1

In [None]:
%lsmagic

In [None]:
%pwd

In [None]:
%%bash
info python

In [None]:
%debug

In [None]:
%%time
sum(range(1_000_000))

In [None]:
%timeit sum(range(1000))

- **IDEs**

Since October 2018 it is also possible to access and create Jupyter Notebooks from inside VS Code ([Python in Visual Studio Code – October 2018 Release](https://blogs.msdn.microsoft.com/pythonengineering/2018/11/08/python-in-visual-studio-code-october-2018-release/)).

This feature allows one to write interactive Python code within the IDE.


---
## 2. Package Management <a class="anchor" id="two"></a>

Installing packages may be done by running in the terminal:

`pip install <package_name>`


Sometimes, e.g. when working with virtual environments, you may want to use `pipenv` instead of `pip`.
See [this guide](https://realpython.com/pipenv-guide/).
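
After installing, it can be handy to check from Python which version of a package (if any) is available in the current environment. The following is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` and the made-up package name are illustrative assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# The last name is deliberately made up to show the "not installed" case.
for name in ("numpy", "pandas", "surely-not-installed-xyz"):
    print(name, installed_version(name))
```

A result of `None` suggests the package still needs to be installed with `pip install <package_name>` in this environment.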

In [None]:
%%bash 
pip install pgpasslib

In [None]:
import this

In [None]:
import numpy as np

In [None]:
# Check version of the package
np.__version__

In [None]:
np.random.normal(loc=1, scale=3, size=10)

---

In [None]:
import pandas as pd

In [None]:
pd.DataFrame({'A':[1,2,4,4],
              'B':['alpha','beta','gamma','omega']
             })

In [None]:
from scipy import stats

In [None]:
stats.norm.pdf(0)

- **Plotting**

`Matplotlib`, `Bokeh`, `Plotly`, `Altair`, `Seaborn`, `ggplot`, `Pygal`, etc.

In [None]:
import matplotlib
import matplotlib.pyplot as plt

In [None]:
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

In [None]:
fig, ax = plt.subplots()
ax.plot(t, s)

ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()

# fig.savefig("test.png")
plt.show()

---

![image.png](attachment:image.png)

---

In [None]:
import plotly.graph_objs as go
# import plotly.figure_factory as ff
from ipywidgets import interact
# from plotly import tools
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

In [None]:
fig = go.FigureWidget()
fig.add_scatter(x=t, y=s)
fig.layout = dict(title='Sin function', xaxis=dict(title='time(s)'))
fig

In [None]:
fig = go.FigureWidget()
scatt = fig.add_scatter(line=dict(shape='spline'))

xs=np.linspace(0, 6, 100)

@interact(a=(1.0, 4.0, 0.01), b=(0, 10.0, 0.01), color=['red', 'green', 'blue'])
def update(a=3.6, b=4.3, color='blue'):
    with fig.batch_update():
        scatt.x=xs
        scatt.y=np.sin(a*xs-b)
        scatt.line.color=color
fig

---
## 3. Types, Operators, Variables <a class="anchor" id="three"></a>

- **Data Types**: Integers, Floats, Booleans, Strings

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type(True)

In [None]:
type('alphabeta')

In [None]:
for i in range(5):
    print('Iteration '+str(i))

In [None]:
for i in range(5):
    print('Iteration {0}'.format(i))

In [None]:
'2'+'3'

---

In [None]:
type([1,'two',3,1])

In [None]:
type({1,'two',3,1})

In [None]:
type((1,'two',3,1))

In [None]:
type({'a': 10, 'b': 20, 'c': 30})

In [None]:
type(np.array([110,1]))

In [None]:
type(pd.DataFrame())

- **Operators**: Arithmetic, Comparison, Logical

In [None]:
# Addition
1+1

In [None]:
# Division
10/1

In [None]:
# Exp
np.exp(1j*np.pi)

In [None]:
# Power
np.e**10

In [None]:
# Floor division
20//6

Python's order of operations is the same as that of normal mathematics: parentheses first, then exponentiation, then multiplication/division, and then addition/subtraction.

![image.png](attachment:image.png)
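
The precedence rules described above can be checked with a few small expressions. This is only an illustrative sketch; the variable names are arbitrary.

```python
a = 2 + 3 * 4               # multiplication before addition
b = (2 + 3) * 4             # parentheses are evaluated first
c = -2 ** 2                 # exponentiation binds tighter than the unary minus
d = 2 ** 3 ** 2             # exponentiation is right-associative: 2 ** (3 ** 2)
e = 20 // 6 * 6 + 20 % 6    # //, * and % share one level and run left to right

print(a, b, c, d, e)
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.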

In [None]:
# Modulo
20%6

In [None]:
# Comparison
1 == 1+1e-16
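
The comparison above evaluates to `True` because `1e-16` is smaller than half the spacing between `1.0` and the next representable float, so the sum rounds back to `1.0`. A small sketch of the same effect, and of the usual remedy `math.isclose` from the standard library:

```python
import math
import sys

# Spacing between 1.0 and the next representable float
eps = sys.float_info.epsilon
print(eps)

# 1e-16 is below eps / 2, so the addition is lost to rounding
print(1 == 1 + 1e-16)

# Exact equality is fragile for floats in general
print(0.1 + 0.2 == 0.3)

# Compare with a tolerance instead
print(math.isclose(0.1 + 0.2, 0.3))
```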

In [None]:
# Logical
True or False

- **Variables**

In [None]:
x = 1

In [None]:
# Inplace operators
x += 7

In [None]:
x = 'a'

In [None]:
del x

##### Exercise

1. Use `numpy` to generate two 5x5 matrices with uniformly distributed random numbers. Find their product.

2. Invert the resulting matrix

---
---
## 4. Control Structures <a class="anchor" id="four"></a>

In [None]:
1 == 2

In [None]:
1 != 2

In [None]:
a, b = np.random.uniform(size=(2,10))
a > b

### `IF` statements

In [None]:
if 1 > 2:
    print('A')

In [None]:
if 1:
    print(2)

In [None]:
if (1>2) or (3>2):
    print('Expression')

In [None]:
if (1>2):
    print('Expression 1')
else:
    print('Expression 2')

In [None]:
if (1>2):
    print('Expression 1')
elif (3==2):
    print('Expression 2')

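- Indentation is important! Python uses it (conventionally four spaces per level) to delimit blocks, so inconsistent indentation raises an `IndentationError`.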
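
As a rough illustration of how whitespace defines blocks, the sketch below groups statements by indentation and then shows what happens when the body of an `if` is not indented. The function name `classify` and the sample source string are made up for this example.

```python
def classify(n):
    if n > 0:
        # Both lines belong to the `if` block because they share its indentation
        label = "positive"
        sign = 1
    else:
        label = "non-positive"
        sign = -1 if n < 0 else 0
    # Dedenting returns to the function level
    return label, sign


print(classify(3))
print(classify(-2))

# A body that is not indented is rejected before the code even runs
bad_source = "if True:\nprint('not indented')\n"
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    print("IndentationError:", err.msg)
```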

---
### `FOR` loops

In [None]:
for i in range(10):
    print(i)

In [None]:
arr = ['alpha', 1, 'beta', 2, 3, 'red']
for (i, a) in enumerate(arr):
    print(i,'\t',a)

---
* List comprehensions

In [None]:
[a for a in arr]

In [None]:
[a for a in arr if isinstance(a, int)]
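
A list comprehension is a compact form of a loop that appends to a list. The sketch below spells out the equivalent loop for the filter above and shows that the same syntax also works for dictionaries; the names `ints_loop`, `ints_comp` and `squares` are illustrative.

```python
arr = ['alpha', 1, 'beta', 2, 3, 'red']

# Explicit loop
ints_loop = []
for a in arr:
    if isinstance(a, int):
        ints_loop.append(a)

# Equivalent list comprehension
ints_comp = [a for a in arr if isinstance(a, int)]

# Dictionary comprehension built from the filtered values
squares = {a: a ** 2 for a in ints_comp}

print(ints_loop, ints_comp, squares)
```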

---
### `WHILE` loops

In [None]:
i = 1
while i <=5:
    print(i)
    i = i + 1

In [None]:
i = 0
while True:
    i = i +1
    if i == 2:
        print("Skipping 2")
        continue
    if i == 5:
        print("Breaking")
        break
    print(i)

---
##### Exercises

Problems 1, 2, and 8 from ProjectEuler.net

<u>Optional</u>:<br>
Generate an array of 10000 random integers between 0 and 1000.<br>
Sort this array in ascending order.<br>
Measure the execution time and compare it with `np.sort()`.

---
## 5. Functions <a class="anchor" id="five"></a>

In [None]:
def fun1(a):
    a+=1
    print(a)

In [None]:
fun1(2)

In [None]:
def fun2(a=1):
    a+=1
    return a

In [None]:
fun2()

In [None]:
def fun3(a):
    a+=1
    if a%3 == 0:
        return True
    elif (a%3 == 1):
        return 1
    return 'Last return'

---

In [None]:
def func(named_arg, *args):
    print(named_arg)
    print(args)

In [None]:
func(1, 2, 'gamma')

In [None]:
# **kwargs (standing for keyword arguments) allows you to handle named arguments that you have not defined in advance.
def my_func(x, y=7, *args, **kwargs):
    print(args)
    print(kwargs)

In [None]:
my_func(2, 3, 4, 5, 6, a=7, b=8)

---

In [None]:
f1 = lambda x: (np.exp(x)-1)/x
f2 = lambda x: (np.exp(x)-1)/np.log(np.exp(x))

In [None]:
f1(1e-15)

In [None]:
f2(1e-15)
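
The two lambdas above are algebraically identical, since `log(exp(x)) == x`, yet they behave differently in floating point: for tiny `x` the subtraction `exp(x) - 1` loses most of its significant digits, and dividing by `log(exp(x))` happens to cancel that rounding error. The sketch below repeats the comparison and adds `np.expm1`, NumPy's dedicated function for `exp(x) - 1`, as a third option. The true value of `(exp(x) - 1) / x` approaches 1 as `x` goes to 0.

```python
import numpy as np

x = 1e-15

naive = (np.exp(x) - 1) / x                       # suffers from cancellation
rewritten = (np.exp(x) - 1) / np.log(np.exp(x))   # rounding errors cancel out
stable = np.expm1(x) / x                          # purpose-built, accurate for small x

print(naive, rewritten, stable)
```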

---

In [None]:
def fib(n):
    """Compute the n'th Fibonacci number"""
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a

    return a

##### Exercises

1. Write a function that searches a list/array for a given value. Return a corresponding boolean value.<br><br>

2. Generate 10000 numbers from the standard normal distribution.<br>
Plot a histogram.<br>
Overlay the pdf line.<br><br>

3. Generate 10000 numbers from the normal distribution with mean zero and standard deviation of 10.<br>
Apply the `tanh` function to these values.<br>
Plot the histogram of the resulting array.

---
## 6. Fast Python <a class="anchor" id="six"></a>

- ### Better Code

Example task: draw 20 Bernoulli variables (here with success probability 0.5) 1,000,000 times and compute the fraction of draws in which more than 10 of the 20 are successes.

In [None]:
np.mean(np.random.binomial(n=20, p=0.5, size=1000000) > 10)
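
To make the one-liner above less opaque, the sketch below performs the same experiment explicitly, drawing the individual Bernoulli trials and counting successes, and compares the result with the binomial shortcut. It uses fewer repetitions (100,000) and a fixed seed purely to keep the example fast and reproducible; both choices are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experiments = 100_000

# Explicit version: one row per experiment, 20 Bernoulli(0.5) trials per row
trials = rng.random((n_experiments, 20)) < 0.5
successes = trials.sum(axis=1)
freq_explicit = np.mean(successes > 10)

# Shortcut: the number of successes in 20 trials is Binomial(20, 0.5)
freq_binomial = np.mean(rng.binomial(n=20, p=0.5, size=n_experiments) > 10)

print(freq_explicit, freq_binomial)
```

Both estimates should land close to each other; the binomial version avoids building the 20-column array of individual trials.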

---
* Vectorize

In [None]:
v = np.random.normal(size=(1000,1000))

In [None]:
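@np.vectorize
def noneg(n):
    # Return floats in both branches: np.vectorize infers the output dtype
    # from the first result, so an integer 0 could truncate the whole array.
    if n < 0:
        return 0.0
    return n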

In [None]:
noneg(v)
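
Note that `np.vectorize` is mainly a convenience: as the NumPy documentation points out, it is essentially a Python loop and is not provided for performance. For this particular task a built-in array operation gives the same result without calling a Python function per element. The sketch below uses a smaller array than the cell above and passes `otypes=[float]` explicitly; both are choices made for this example.

```python
import numpy as np

v = np.random.normal(size=(200, 200))

# Element-wise Python function wrapped with np.vectorize (output dtype fixed to float)
noneg = np.vectorize(lambda n: 0.0 if n < 0 else n, otypes=[float])
clipped_vectorize = noneg(v)

# Built-in alternatives that operate on the whole array at once
clipped_maximum = np.maximum(v, 0)
clipped_where = np.where(v < 0, 0, v)

print(np.array_equal(clipped_vectorize, clipped_maximum))
print(np.array_equal(clipped_where, clipped_maximum))
```

Timing the three variants with `%timeit` is a quick way to see the difference on your own machine.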

---

In [None]:
def polyn(n):
    total = 0
    for i in range(n):
        total += (7*n*n) + (-3*n) + 42
    return total

In [None]:
ntimes = 10000
%timeit -n $ntimes polyn(1000)

---
- Numba

In [None]:
import numba

In [None]:
@numba.jit
def polyn(n):
    total = 0
    for i in range(n):
        total += (7*n*n) + (-3*n) + 42
    return total

In [None]:
%timeit -n $ntimes polyn(1000)

---
- Cython

In [None]:
%load_ext cython

In [None]:
%%cython
def polyn(int n):
    # C integer types; `long long` keeps the running total from overflowing a 32-bit int
    cdef long long total = 0
    cdef int i

    for i in range(n):
        total += (7*n*n) + (-3*n) + 42
    return total

In [None]:
%timeit -n $ntimes polyn(1000)

---

In [None]:
%%cython
def fibx(int n):
    cdef int i, a, b
    a, b = 1, 1
    for i in range(n):
        a, b = a + b, a
    return a

In [None]:
%timeit fib(10)
%timeit fibx(10)

---
- [`pybind11`](https://pybind11.readthedocs.io/en/master/basics.html) (recommended for C++11 because of its simplicity)

---
Also check out: `Dask`, `Pandas on Ray` (now `Modin`), `CuPy`, `PyCUDA`

---
## 7. OOP and Packaging Essentials <a class="anchor" id="seven"></a>

* Classes

In [None]:
class Dog:
    # Class Attribute
    species = 'mammal'

    # Initializer / Instance Attributes
    def __init__(self, name, color, age):
        self.name = name
        self.color = color
        self.age = age

In [None]:
dog1 = Dog('Felix','ginger', 4)
dog2 = Dog('Rover','dog-colored', 1)
dog3 = Dog('Boxer','brown', 3)

In [None]:
dog1.age

In [None]:
dog2.species

---

In [None]:
class Dog:
    # Class Attribute
    species = 'mammal'

    # Initializer / Instance Attributes
    def __init__(self, name, color, age):
        self.name = name
        self.color = color
        self.age = age
        
    # instance method
    def description(self):
        return "{} is {} years old".format(self.name, self.age)

In [None]:
dog3 = Dog('Boxer','brown', 3)

In [None]:
dog3.description()

---
Sometimes it is useful to make a file that can be both imported as a module and run as a script.<br>
To do this, place script code inside **if \_\_name\_\_ == "\_\_main\_\_"**. 
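
The notebook imports a module named `example` whose contents are not shown here. The sketch below is one hypothetical way such a file could look; the function bodies and default arguments are assumptions made only to illustrate the `__name__` guard.

```python
# example.py (hypothetical contents)

def function():
    """Return a greeting; available to anyone who imports the module."""
    return "Hello from example"


def add(a=1, b=2):
    """Add two numbers; the defaults are arbitrary."""
    return a + b


def main():
    # Script-only behaviour lives here
    print(function())
    print(add())


# True when the file is run as `python example.py`,
# False when it is loaded with `import example`
if __name__ == "__main__":
    main()
```

With this layout, `import example` only defines the functions, while running the file directly also executes `main()`.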

In [None]:
import example

In [None]:
example.function()

In [None]:
example.add()

---
In Python, the term packaging refers to putting modules you have written in a standard format, so that other programmers can install and use them with ease.<br>
This typically involves **setuptools**; the older **distutils** module was deprecated and then removed from the standard library in Python 3.12.

---
More on OOP: [here](https://docs.python.org/3/tutorial/classes.html)<br>
More on packaging: [here](https://packaging.python.org/).

---
## 8. Files <a class="anchor" id="eight"></a>

In [None]:
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

In [None]:
iris.head(10)

In [None]:
df = pd.read_stata('pew_research_center_june_elect_wknd_data.dta')

In [None]:
df[['rid','zipcode','msa']].to_csv('file.txt', sep=' ', header=False)

---

In [1]:
import numpy as np
np.loadtxt('file.txt')

#### Exceptions

In [None]:
# Exceptions
try:
    print(1/0)
except ZeroDivisionError:
    print(4)
else:
    print(5)

In [None]:
try:
    data = np.loadtxt('file.tt')
except OSError:
    # The misspelled file does not exist, so fall back to the correct name
    print('No such file! Falling back to file.txt')
    data = np.loadtxt('file.txt')
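
The order in which the `try`, `except`, `else` and `finally` branches run is easy to mix up: `else` runs only when no exception was raised, and `finally` runs in every case. The sketch below records which branches execute; the function name `describe_division` is made up for this example.

```python
def describe_division(a, b):
    """Divide a by b and record which branches of the try statement ran."""
    steps = []
    try:
        result = a / b
    except ZeroDivisionError:
        # Runs only if the division failed
        steps.append("except")
        result = None
    else:
        # Runs only if the try block finished without an exception
        steps.append("else")
    finally:
        # Runs in every case, typically used for clean-up
        steps.append("finally")
    return result, steps


print(describe_division(4, 2))
print(describe_division(1, 0))
```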

---

In [None]:
# Display the first few lines without reading the entire file
filename = 'file.txt'  # assumption: the file written in the cell above
with open(filename, encoding='utf-8') as file:
    for lnum, line in enumerate(file):
        if lnum > 2:
            break
        print(line.rstrip('\n'), '\n')

In [None]:
# Read from file
import csv

# `filename` is the path used in the previous cell
with open(filename, encoding='utf-8', newline='') as csvFile:
    reader = csv.reader(csvFile, delimiter='⁞')
    data = list(reader)

In [None]:
# Write to file
import csv

with open('data.csv', 'w', encoding='utf-8', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerows(data)

#### Problem 7 from the Self-Assessment Assignment for the Machine Learning course

---
## 9. Machine Learning in Python <a class="anchor" id="nine"></a>

[Machine Learning Course](https://www.coursera.org/learn/machine-learning)<br>
Assignment 1

---
---

##### Additional Resources:


* [SoloLearn](https://www.sololearn.com/Course/Python/)
* [Udacity: Intro to Python Programming](https://classroom.udacity.com/courses/ud1110)  
    