# Week 1 Cheat Sheet

* Session 1 - Refresher course with Python 3.7
* Session 2 - NumPy, SciPy, and matplotlib
* Session 3 - Pandas, Scikit-Learn

## Refresher course with Python 3.7


* Generators
* List comprehensions
* Lambda operators
* String and list operations

## Generators

* A special kind of function that returns a lazy iterator


In [None]:
def fib():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a+b 
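Because `fib()` never terminates, the caller decides how many values to pull from it. The sketch below repeats the generator so that it runs on its own and assumes `itertools.islice` from the standard library is an acceptable way to cut the stream off; the name `first_ten` is only illustrative.

```python
from itertools import islice

def fib():
    # Same generator as above: yields 1, 1, 2, 3, 5, ...
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

# Nothing is computed until values are requested; islice stops after 10 items.
first_ten = list(islice(fib(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

# next() pulls one value at a time from a generator object.
gen = fib()
print(next(gen), next(gen), next(gen))  # 1 1 2
```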

## List comprehensions

* List comprehensions provide a concise way to create lists
* Expression followed by a **for** clause, then zero or more **for** or **if** clauses
* List comprehensions can be nested (matrix)


In [None]:
# General form: [expression for a in iterable_a for b in iterable_b if condition]
[(a, b) for a in range(3) for b in range(3) if a != b]
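As a concrete illustration of the clauses and of nesting, the sketch below flattens, transposes, and filters a small matrix. The names `matrix`, `flat`, `transposed`, and `evens` are illustrative and not part of the course material.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Two for clauses read left to right: the outer loop comes first.
flat = [x for row in matrix for x in row]
print(flat)  # [1, 2, 3, 4, 5, 6]

# Nested comprehension: the inner list is built once per column index j.
transposed = [[row[j] for row in matrix] for j in range(3)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]

# A trailing if clause filters the items.
evens = [x for row in matrix for x in row if x % 2 == 0]
print(evens)  # [2, 4, 6]
```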

## Dictionary comprehension

* Provide a concise way to create dictionaries

In [None]:
a={ i:i**2 for i in range(10) }
print(a)

## Map, reduce and lambda functions

* Map - apply $f$ to each item $x_k$ of an iterable, giving $y_k=f(x_k)$
* Reduce - fold the items $x_k$ of an iterable into a cumulative value by repeatedly computing $y_k=f(y_{k-1},x_k)$
* Lambda - anonymous function not bound to an identifier, often passed as an argument to other functions, e.g. as $f$ in map and reduce

In [None]:
from functools import reduce
reduce(lambda x,y:x+y, map(lambda x:x**2, [1,2,3]))
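To make the fold visible step by step, the sketch below splits the one-liner into its parts and uses `itertools.accumulate` to expose every intermediate value. The variable names are illustrative. The last line shows a generator-expression equivalent that is often considered more idiomatic for a plain sum.

```python
from functools import reduce
from itertools import accumulate

xs = [1, 2, 3]

# map applies the lambda to every item (lazily, hence the list() call).
squares = list(map(lambda x: x**2, xs))
print(squares)  # [1, 4, 9]

# reduce calls f(accumulated_value, next_item) from left to right.
total = reduce(lambda acc, x: acc + x, squares)
print(total)  # 14

# accumulate yields each intermediate value of the same fold.
partial_sums = list(accumulate(squares, lambda acc, x: acc + x))
print(partial_sums)  # [1, 5, 14]

# Equivalent result with a generator expression and the built-in sum.
print(sum(x**2 for x in xs))  # 14
```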

# Let's get started

* Open [notebook 1](./DSLab_week1-1.ipynb)
* Meet us back in 20min

## Python Mathematical libraries
<img align="left" width="30%" src="./figs/pythonstack.svg"/>

## NumPy

* Core library for scientific computing in Python
* Provides a high-performance multidimensional array object
* Large collection of high-level mathematical functions to operate on array objects
* Optimized for size and performance

## NumPy
* Optimized for size and performance

In [None]:
import numpy as np
pyList  = list(range(1,1000000))  # materialize a real list for a fair comparison
npArray = np.arange(1,1000000)

In [None]:
%%timeit
[1./i for i in pyList]

In [None]:
%%timeit
1./npArray

## NumPy
* A NumPy *ndarray* is a grid of values, all of the same type
* Indexed by an integer or a slice (`:`) along each dimension
* The *ndarray.shape* is the size of the array along each dimension

In [None]:
import numpy as np
a=np.array([[1,2,3],[4,5,6]])
print(a[:,:])
print(a.shape)
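A few more indexing patterns on the same array may help. This is a minimal sketch that only uses basic integer and slice indexing.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a[0, 1])       # single element: row 0, column 1 -> 2
print(a[:, 0])       # first column -> [1 4]
print(a[1, :])       # second row -> [4 5 6]

# Slicing a range of columns keeps both dimensions.
sub = a[:, 1:]
print(sub.shape)     # (2, 2)

# Every element shares one dtype, which is what "all of the same type" means.
print(a.dtype)
```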

## Relevant NumPy functions
* **arange**([start,] stop[, step][, dtype=None])
* **linspace**(start, stop, num=50, endpoint=True, retstep=False)
* **reshape**(array, newshape, order='C')
* **copy**(obj, order='K')
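The four functions above can be sketched together as follows. The variable names are illustrative. The example also contrasts `copy`, which always allocates new data, with `reshape`, which returns a view when it can, as it does for this contiguous array.

```python
import numpy as np

# arange: half-open interval [start, stop) with a fixed step.
r = np.arange(0, 10, 2)
print(r)  # [0 2 4 6 8]

# linspace: a fixed number of points, endpoint included by default.
l = np.linspace(0.0, 1.0, num=5)
print(l)  # [0.   0.25 0.5  0.75 1.  ]

# reshape: same data, new shape (row-major order by default).
m = np.reshape(np.arange(6), (2, 3))
print(m.shape)  # (2, 3)

# copy: independent data, so changing c leaves m untouched.
c = np.copy(m)
c[0, 0] = 99
print(m[0, 0])  # 0

# A reshape of this contiguous array is a view that shares memory with m.
v = m.reshape(-1)
v[0] = 7
print(m[0, 0])  # 7
```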

## Relevant NumPy functions
* **matmul**(x1, x2, ...) or x1 @ x2
* **multiply**(x1, x2, ...) or x1 * x2
* **dot**(x1, x2, ...)

In [None]:
import numpy as np
a=np.arange(12).reshape((4,3),order='C') # order can be 'C' (row-major), 'F' (column-major), or 'A'
print(a)

In [None]:
b=np.arange(12).reshape((3,4),order='C')
b @ a
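The difference between the matrix product and the element-wise product is easy to miss. The sketch below uses two small 2x2 matrices, chosen only for illustration, where the results can be checked by hand.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Matrix product: rows of x times columns of y.
print((x @ y).tolist())             # [[19, 22], [43, 50]]
print(np.matmul(x, y).tolist())     # same result as x @ y

# Element-wise product: shapes must match (or broadcast).
print((x * y).tolist())             # [[5, 12], [21, 32]]
print(np.multiply(x, y).tolist())   # same result as x * y

# For 2-D arrays, dot gives the same result as matmul.
print(np.array_equal(np.dot(x, y), x @ y))  # True

# Shapes follow the usual rule: (3, 4) @ (4, 3) -> (3, 3).
a = np.arange(12).reshape((4, 3))
b = np.arange(12).reshape((3, 4))
print((b @ a).shape)  # (3, 3)
```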

## SciPy
* Built on NumPy
* Mathematical library for Scientific and Technical Computing
   - Linear algebra, Interpolation, Integration
   - Image and signal processing, FFT
   - Optimization (linear and nonlinear)
   - Spatial data structures and algorithms
   - Statistical functions
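A brief sketch of three of these areas, assuming SciPy is installed. The test problems (integrating $x^2$, a 2x2 linear system, and a one-dimensional quadratic) are chosen only because their answers are easy to verify by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: the integral of x^2 over [0, 1] is 1/3.
value, abs_err = integrate.quad(lambda x: x**2, 0.0, 1.0)
print(value)

# Linear algebra: solve 3x + y = 9, x + 2y = 8 (solution x = 2, y = 3).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
sol = linalg.solve(A, b)
print(sol)

# Optimization: the minimum of (t - 2)^2 is at t = 2.
res = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)
print(res.x)
```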


# Let's get started

* Open [notebook 2](./DSLab_week1-2.ipynb)
* See you in 20min

## Pandas and Pandas DataFrames
* Powerful & flexible data munging library
* Built on top of NumPy
    - NumPy stores your data in arrays
    - Pandas takes the arrays and adds a labeled index to them
* A pandas *DataFrame* is a 2-D labeled data structure (table) with columns of potentially different types
    - Roughly, a dictionary of columns backed by NumPy *ndarray*s
* Recommended reading: [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html)
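The relationship between a NumPy array and a labeled DataFrame can be sketched as follows. The index labels, column names, and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Start from a plain NumPy array and attach row and column labels.
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
df = pd.DataFrame(data, index=["a", "b", "c"], columns=["x", "y"])

# Columns may hold different types.
df["label"] = ["low", "mid", "high"]
print(df.dtypes)

# Label-based access to a single cell.
print(df.loc["b", "x"])  # 3.0

# A column converts back to a NumPy array.
x_values = df["x"].to_numpy()
print(x_values)  # [1. 3. 5.]

# Boolean filtering keeps the row labels.
selected = df[df["y"] > 2.0].index.tolist()
print(selected)  # ['b', 'c']
```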

## Scikit-learn - Machine Learning in Python
* [documentation](http://scikit-learn.org/stable/index.html)
    - Classification, [Decision Trees](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.tree) and [Random Forests](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)
    - Regression (logistic regression)
    - [Clustering](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster) ([K-Means](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html))
    - [Nearest Neighbors](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.neighbors)
    - Dimensionality reduction ([PCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA))
    - [Model selection](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection) ([hyper-parameters](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV))
    - ...
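Most scikit-learn estimators follow the same `fit` / `predict` pattern. The sketch below shows it for one classifier and one clustering model on a tiny, made-up dataset of two well-separated groups; the data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Two well-separated groups of 2-D points (toy data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised: fit on labeled data, then predict labels for new points.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict([[0.05, 0.05], [5.05, 5.05]])
print(pred)  # [0 1]

# Unsupervised: KMeans never sees y and assigns arbitrary cluster ids.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(labels)
```

The cluster ids returned by `KMeans` carry no meaning of their own, so only the grouping should be compared with `y`, not the numeric values.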

# Let's get started

* Open [notebook 3](./DSLab_week1-3.ipynb)
* See you in 20min