## Python Workshop

---
---

### What is Python? Why Python?


Python is an interpreted, interactive, object-oriented programming language that combines remarkable power with very clear syntax.


<u>Pros</u>
- Open source with a vibrant community
- Simple, concise syntax that makes code easy to learn and share
- Easy OOP
- Huge array of third-party packages
- Flexibility and versatility (e.g. used for hacking (CIA), scientific computing (NASA), producing films (Pixar, Disney), crawling web pages (Google), recommending songs (Spotify), etc.)


<u>Cons</u>
- Speed limitations: pure Python is slower than compiled, low-level languages such as C
- Limited multithreaded parallelism for CPU-bound code because of the Global Interpreter Lock (GIL) in CPython
- Python is not a good choice for memory-intensive tasks

Conclusion: Python is a good general-purpose language, but also "second best for everything".

 
<a href="https://www.economist.com/science-and-technology/2018/07/19/python-has-brought-computer-programming-to-a-vast-new-audience" target="_blank">
  <img src="https://www.economist.com/sites/default/files/imagecache/1280-width/20180728_WOC883.png">
</a><br>
<br>

---
<br>

### Installation instructions

There are quite a few Python distributions out there (see [here](https://wiki.python.org/moin/PythonDistributions)).<br>
Most of them are built around the reference implementation of Python, called [CPython](https://www.python.org/doc/).<br>
The main difference between them is the set of packages they come preloaded with (check out [this article](https://www.infoworld.com/article/3267976/python/anaconda-cpython-pypy-and-more-know-your-python-distributions.html)). <br>

For data science and machine learning, the [Anaconda](https://www.anaconda.com/what-is-anaconda/) distribution of Python is among the most popular.<br>
Please install it by following the step-by-step instructions:

* <a href="https://docs.anaconda.com/anaconda/install/" target="_blank">Install Anaconda</a>

**Note**: During the installation you will be asked whether you want to install Visual Studio Code. Please install that as well.<br><br>


Upon the completion of the installation:

- Open the terminal (or Anaconda Prompt on Windows) and run: `jupyter notebook`<br>
If a Jupyter notebook opens in your browser, everything went well and you have a reason to celebrate!<br>
If not, go back to the installation instructions and make sure that you have followed them closely.

- To prepare the VS Code environment please look at the [Python in VS Code](https://code.visualstudio.com/docs/languages/python) page.<br>
Make sure that you have managed to run the `hello.py` file you are asked to create.

---
---
---

### Table of Contents:
1. [Python Environment](#one)

   - Jupyter Notebooks
   - Jupyter Lab
   - IDEs
   

2. [Package Management](#two)

   - Numpy, Pandas, Scikit-Learn
   - Plotting
     - Matplotlib
     - Plotly
     - Bokeh


3. [Types, Variables, Operators](#three)

4. [Control Structures](#four)

5. [Functions](#five)

6. [Fast Python](#six)
   - Numba, Cython, etc
   - `pybind11`
   

7. [OOP and Packaging](#seven)

8. [Files](#eight)

9. [Machine Learning in Python](#nine)

https://docs.python.org/3/tutorial/index.html

---
## 1. Python Environment <a class="anchor" id="one"></a>


- **Jupyter Notebooks**<br><br>

- **Jupyter Lab**<br><br>

- **IDEs**

In [1]:
print('Hello world!')

Hello world!


In [2]:
1+1

2

- Python is case-sensitive.

- Spacing is important: indentation defines which statements belong to a block.
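
Both points can be seen in a few lines. The following is a minimal sketch; the names `value`, `Value`, and `total` are made up for illustration.

```python
# Names that differ only in case are different variables.
value = 1
Value = 2
assert value != Value

# Indentation defines which statements belong to a block.
total = 0
for i in range(3):
    total += i      # indented: runs on every iteration
total += 100        # not indented: runs once, after the loop

print(total)  # 0 + 1 + 2 + 100 = 103
```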

In [None]:
%lsmagic

In [None]:
%pwd

In [None]:
%%bash 
info python

In [None]:
%debug

In [None]:
%time

In [None]:
%timeit
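
The `%time` and `%timeit` magics only exist inside IPython and Jupyter. In a plain script, a rough equivalent can be sketched with the standard-library `timeit` module; the function `sum_of_squares` and the choice `number=1000` are arbitrary and used only for illustration.

```python
import timeit

def sum_of_squares(n):
    """Sum of i**2 for i in 0..n-1."""
    return sum(i * i for i in range(n))

# Run the call 1000 times and measure the total elapsed time in seconds.
elapsed = timeit.timeit(lambda: sum_of_squares(1000), number=1000)
print(f"{elapsed / 1000 * 1e6:.1f} microseconds per call")
```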

- **IDEs**

Since October 2018 it is also possible to access and create Jupyter Notebooks from inside VS Code ([Python in Visual Studio Code – October 2018 Release](https://blogs.msdn.microsoft.com/pythonengineering/2018/11/08/python-in-visual-studio-code-october-2018-release/)).

This feature allows one to write interactive Python code.


---
## 2. Package Management <a class="anchor" id="two"></a>

Packages can be installed by running the following in the terminal:

`pip install <package_name>`

When you want to install packages into a project-specific virtual environment, you can use `pipenv` instead of `pip`.

In [None]:
%%bash 
pip install psycopg2

In [None]:
import this

In [None]:
import numpy as np

In [None]:
# Check version of the package
np.__version__
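
Not every package exposes a `__version__` attribute. As an alternative sketch, the standard-library `importlib.metadata` module (Python 3.8+) can look up the installed version by distribution name. The helper `installed_version` and the package name `no-such-package-for-demo` are hypothetical and introduced here only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

print(installed_version("numpy"))                     # version string, if NumPy is installed
print(installed_version("no-such-package-for-demo"))  # None
```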

In [None]:
import pandas as pd

In [None]:
pd.DataFrame({'A':[1,2,4,4],
              'B':['alpha','beta','gamma','omega']
             })

- Plotting

Popular plotting libraries include Matplotlib, Bokeh, and Plotly.

In [None]:
import matplotlib
import matplotlib.pyplot as plt

In [None]:
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

In [None]:
fig, ax = plt.subplots()
ax.plot(t, s)

ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()

# fig.savefig("test.png")
plt.show()

---

In [None]:
import plotly.graph_objs as go
import plotly.figure_factory as ff
from ipywidgets import interact
# from plotly import tools
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

In [None]:
fig = go.FigureWidget()
fig.add_scatter(x=t, y=s)
fig.layout = dict(title='Sin function', xaxis=dict(title='time(s)'))
fig

---
## 3. Types, Variables, Operators <a class="anchor" id="three"></a>

- **Data Types**: Integers, Floats, Booleans, Strings

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type(True)

In [None]:
type('alphabeta')

In [None]:
input("Enter something please: ")

In [None]:
type([1,'two',3,1])

In [None]:
type({1,'two',3,1})

In [None]:
type((1,'two',3,1))

In [None]:
type({'a': 10, 'b': 20, 'c': 30})
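
The four built-in containers above behave differently with the same input. The following sketch reuses the list from the cells above; the variable names are illustrative.

```python
items = [1, 'two', 3, 1]

as_list = list(items)     # ordered, mutable, keeps duplicates
as_tuple = tuple(items)   # ordered, immutable
as_set = set(items)       # unordered, duplicates removed

as_list.append(4)         # lists can grow in place
print(len(as_list), len(as_tuple), len(as_set))  # 5 4 3

try:
    as_tuple[0] = 99      # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

prices = {'a': 10, 'b': 20, 'c': 30}
print(prices['b'])        # dictionaries look up a value by key: 20
```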

In [None]:
type(np.array([110,1]))

In [None]:
type(pd.DataFrame())

- **Operators**: Arithmetic, Assignment, Comparison, Logical

In [None]:
# Addition
1+1

In [None]:
# Division
10/1

In [None]:
# Exp
np.exp(1j*np.pi)

In [None]:
# Power
np.e**10

In [None]:
# Floor division
20//6

In [None]:
# Modulo
20%6

In [None]:
# Comparison
1 == 1+1e-16
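
The comparison above evaluates to `True` because `1e-16` is smaller than the spacing between double-precision numbers near 1. Exact equality on floats can also fail in the other direction, so a tolerance-based comparison is usually safer. A small sketch using the standard-library `math.isclose`:

```python
import math

a = 1
b = 1 + 1e-16

print(a == b)                        # True: 1e-16 is lost when added to 1
print(0.1 + 0.2 == 0.3)              # False: rounding error in binary floating point
print(math.isclose(0.1 + 0.2, 0.3))  # True: comparison with a relative tolerance
```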

In [None]:
# Logical
True or False

---

In [None]:
f1 = lambda x: (np.exp(x)-1)/x
f2 = lambda x: (np.exp(x)-1)/np.log(np.exp(x))

In [None]:
f1(1e-15)

In [None]:
f2(1e-15)
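
The two cells above evaluate (e^x - 1)/x close to x = 0, where the result should be very close to 1. Computing `exp(x) - 1` directly cancels most significant digits. One common remedy, sketched here with the standard-library `math.expm1` (NumPy offers `np.expm1` as well), is to use a function that evaluates e^x - 1 accurately. The names `naive` and `stable` are illustrative.

```python
import math

def naive(x):
    """(exp(x) - 1) / x computed directly; inaccurate for tiny x."""
    return (math.exp(x) - 1) / x

def stable(x):
    """Same quantity via expm1, which evaluates exp(x) - 1 accurately."""
    return math.expm1(x) / x

x = 1e-15
print(naive(x))   # noticeably different from 1
print(stable(x))  # very close to 1
```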

---
---
## 4. Control Structures <a class="anchor" id="four"></a>

- Spacing is important!

In [None]:
for i in range(10):
    print(i)

In [None]:
arr = ['alpha', 1, 'beta', 2, 3, 'red']

In [None]:
for (i, a) in enumerate(arr):
    print(i,'\t',a)
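
Besides `for` loops, the other basic control structures are `if`/`elif`/`else` and `while`. The following is a minimal sketch that reuses the list `arr` from above; the labels and the halving example are made up for illustration.

```python
arr = ['alpha', 1, 'beta', 2, 3, 'red']

# if / elif / else: branch on the type of each element
labels = []
for item in arr:
    if isinstance(item, str):
        labels.append('text')
    elif isinstance(item, int):
        labels.append('number')
    else:
        labels.append('other')
print(labels)

# while: repeat as long as a condition holds
n = 1000
halvings = 0
while n > 1:
    n //= 2
    halvings += 1
print(halvings)  # 9
```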

**Exercises**

Problems 1, 48, and 8 from ProjectEuler.net

In [None]:
# Problem 1
sum([i for i in range(1,1000) if i%3==0 or i%5==0])

---

In [None]:
# Problem 48 (self powers): last ten digits of 1^1 + 2^2 + ... + 1000^1000
s = sum([i**i for i in range(1,1001)])
str(s)[-10:]
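
The cell above builds the full sum 1^1 + 2^2 + ... + 1000^1000, an integer with about three thousand digits, and then slices off the last ten. An alternative sketch keeps only the last ten digits at every step by using the three-argument form of the built-in `pow`; the names `MOD` and `last_ten_digits` are illustrative.

```python
MOD = 10**10  # keep only the last ten digits

def last_ten_digits(limit):
    """Last ten digits of 1**1 + 2**2 + ... + limit**limit, as a string."""
    total = 0
    for i in range(1, limit + 1):
        total = (total + pow(i, i, MOD)) % MOD  # modular exponentiation
    return str(total).zfill(10)                 # preserve leading zeros

print(last_ten_digits(1000))
```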

---

In [None]:
# Problem 8
num = """
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
"""

In [None]:
num = num.replace('\n','')

In [None]:
prod_max = 0
for i in range(len(num)-12):  # the last 13-digit window starts at len(num)-13
    prod = np.prod([int(d) for d in num[i:i+13]], dtype=np.int64)  # avoid 32-bit overflow
    if prod > prod_max:
        prod_max = prod
prod_max
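
The same sliding-window idea can be written as a reusable function without NumPy. This is a sketch using `math.prod` (Python 3.8+); the function name `max_window_product` is made up here, and the short test string stands in for the 1000-digit number.

```python
import math

def max_window_product(digits, width):
    """Largest product of `width` adjacent digits in the string `digits`."""
    best = 0
    # The last valid window starts at len(digits) - width, hence the +1.
    for start in range(len(digits) - width + 1):
        window = digits[start:start + width]
        best = max(best, math.prod(int(d) for d in window))
    return best

print(max_window_product('9089', 2))  # windows 90, 08, 89 give 0, 0, 72 -> 72
```

Calling `max_window_product(num, 13)` on the cleaned-up digit string applies it to the problem itself.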

---
## 5. Functions <a class="anchor" id="five"></a>

In [None]:
a=1

In [None]:
def fun(a=1):
    a+=1

In [None]:
fun()
a
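
The cells above show that `a += 1` inside the function only rebinds the local parameter: the global `a` is still 1 afterwards. To get the new value out, the function has to return it. A minimal sketch, with the hypothetical name `increment`:

```python
a = 1

def increment(a=1):
    a += 1          # rebinds the local name only
    return a        # hand the new value back to the caller

result = increment(a)
print(a, result)    # 1 2: the global `a` is unchanged

a = increment(a)    # to update the global, assign the returned value
print(a)            # 2
```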

In [None]:
def fib(n):
    """Compute the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    if n < 2:  # base cases stop the recursion
        return n
    return fib(n-1) + fib(n-2)

In [None]:
fib(10)
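
A plain recursive Fibonacci function recomputes the same values many times and becomes slow for larger `n`. Two common alternatives are sketched below: caching results with the standard-library `functools.lru_cache`, and a simple loop. The names `fib_cached` and `fib_iter` are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_cached(n):
    """n-th Fibonacci number with fib_cached(0) = 0 and fib_cached(1) = 1."""
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

def fib_iter(n):
    """Same sequence computed with a loop and two running values."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib_iter(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fib_cached(50) == fib_iter(50))    # True
```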

---
## 6. Fast Python <a class="anchor" id="six"></a>

- Better Code

Example task: draw 20 Bernoulli(0.5) variables 1,000,000 times and compute the sample frequency of observing more than 10 successes among the 20.

In [None]:
np.mean(np.random.binomial(n=20, p=0.5, size=1000000) > 10)
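
The simulated frequency can be checked against the exact binomial tail probability. The sketch below uses only the standard library (`math.comb` needs Python 3.8+); the function names, the seed, and the smaller number of trials are choices made here for illustration. It also shows what the one-line NumPy call replaces: an explicit Python loop over trials.

```python
import math
import random

def exact_tail(n=20, p=0.5, threshold=10):
    """P(X > threshold) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(threshold + 1, n + 1))

def simulated_tail(trials, n=20, p=0.5, threshold=10, seed=0):
    """Monte Carlo estimate of the same probability in pure Python."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        hits += successes > threshold
    return hits / trials

print(round(exact_tail(), 4))   # 0.4119
print(simulated_tail(20_000))   # close to the exact value
```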

---
- Numba

In [None]:
%%time
s=0
for i in range(1,100000):
    s += 1/i**2
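
A loop like the one above is a typical candidate for Numba, which compiles a decorated function to machine code on its first call. The following is a sketch assuming Numba is installed; the `try`/`except` fallback is an addition made here so the snippet still runs (uncompiled) without it, and the function name `partial_sum` is illustrative.

```python
import math

try:
    from numba import njit      # JIT compiler; optional dependency
except ImportError:
    def njit(func):             # fallback so the sketch still runs without Numba
        return func

@njit
def partial_sum(n):
    """Sum of 1 / i**2 for i = 1 .. n-1."""
    s = 0.0
    for i in range(1, n):
        s += 1.0 / (i * i)
    return s

partial_sum(10)                 # first call triggers compilation when Numba is present
print(partial_sum(100_000), math.pi**2 / 6)  # the sum approaches pi^2 / 6
```

When timing the compiled version, the first call should be excluded, since it includes the compilation time.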

---
- Cython

---
- `pybind11`

---
## 7. OOP and Packaging <a class="anchor" id="seven"></a>

More on packaging: [here](https://packaging.python.org/).

---
## 8. Files <a class="anchor" id="eight"></a>

In [None]:
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

In [None]:
iris.head(10)

In [None]:
df = pd.read_stata('pew_research_center_june_elect_wknd_data.dta')

In [None]:
df[['rid','zipcode','msa']].to_csv('file.txt', sep=' ', header=False)

---

In [None]:
np.loadtxt('file.txt')

---

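In [None]:
import csv

# `data` is assumed to be an iterable of rows (lists or tuples) defined earlier
with open('data.csv', 'w', newline='', encoding='utf-8') as fp:
    myFile = csv.writer(fp)
    myFile.writerows(data)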
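
A complete write-and-read round trip with the `csv` module can be sketched as follows. The sample rows are made up, and the file is written to a temporary directory so that nothing in the working directory is overwritten.

```python
import csv
import os
import tempfile

rows = [['rid', 'zipcode'], ['1', '10001'], ['2', '94105']]  # made-up sample rows

path = os.path.join(tempfile.mkdtemp(), 'data.csv')

# newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='', encoding='utf-8') as fp:
    csv.writer(fp).writerows(rows)

with open(path, newline='', encoding='utf-8') as fp:
    loaded = list(csv.reader(fp))

print(loaded == rows)  # True: csv.reader returns every field as a string
```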

#### Problem 7 from the Self-Assessment Assignment for the Machine Learning course

---
## 9. Machine Learning in Python <a class="anchor" id="nine"></a>

Machine Learning Course<br>
Assignment 1

---
---

Additional Resources:
- SoloLearn
- DataCamp
- https://www.python-course.eu/numpy.php
- https://www.python-course.eu/python3_course.php
    
    