# Machine Learning 101 in Python
### Led by Matteo Hessel, UCL ML alumnus, currently a Research Engineer at Google DeepMind


# Preparation

## Why Python?
If you want to work as a data scientist and apply machine learning, you will encounter Python regardless of the application domain. Several things make it an excellent language for data science:

First, it is a very simple language: if you already know another programming language, you will learn it very fast.

Second, it is very flexible: it offers features from imperative, scripting, object-oriented, and functional programming.
This flexibility makes prototyping very fast, because you can work at whatever level of abstraction you need. If you want to run some tests with different models, you can do it in a few lines of code without caring about structure and classes, and add that structure later, when you put the selected model into production.
If you try to do the same in Java, you will have written 20 classes and 5 interfaces before you can even start tackling your problem...

Third, you can rely on countless open-source libraries developed by a huge community:
- NumPy, which wraps highly optimized C code, for fast matrix operations,
- Pandas, which integrates naturally with NumPy and offers tools for data loading, formatting, and transformation,
- NLTK, which offers tools for natural language processing (tokenizing, sentence splitting, parsing); these are conceptually simple, but would take hours of boring coding if you had to write them yourself,
- Scikit-learn, which we shall see during the workshop; it integrates with NumPy and Pandas and offers many models for statistics and machine learning,
- Finally, Theano, Chainer, and TensorFlow, which offer neural network frameworks and easy integration with high-performance computing hardware such as GPUs.
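
As a small illustration of how the first two libraries in the list fit together, here is a minimal sketch. The numbers and column names are made up for the example and are not workshop data.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized matrix operations, with no explicit Python loops.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 1.0])
product = A @ b  # matrix-vector product
print(product)

# pandas: labelled tabular data stored on top of NumPy arrays.
# The column names and values below are invented for illustration.
df = pd.DataFrame({
    "height_cm": [170.0, 180.0, 160.0],
    "weight_kg": [65.0, 80.0, 55.0],
})
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2  # column-wise arithmetic

# Convert selected columns back to a plain NumPy array,
# which is the input format most Scikit-learn models expect.
features = df[["height_cm", "weight_kg"]].to_numpy()
print(features.shape)
```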

### The fundamental features of Python
- Interpreted language
- Dynamic Typing
- Two versions: Python 2.x and Python 3.x
- Just-in-time compiler: PyPy (an alternative Python implementation)
- First class functions
- Class system and Inheritance

Remember to **indent**!
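
The short sketch below touches on several of the features listed above: dynamic typing, first-class functions, and classes with inheritance. All names in it (`apply_twice`, `Model`, `LinearModel`) are invented for the example. Note that the indentation is what delimits each function and class body.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
value = "forty-two"

# First-class functions: a function can be passed as an argument like any other value.
def apply_twice(func, x):
    return func(func(x))

def add_three(x):
    return x + 3

result = apply_twice(add_three, 10)

# Class system and inheritance: LinearModel extends Model and overrides describe().
class Model:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Model: " + self.name

class LinearModel(Model):
    def describe(self):
        # super() calls the parent implementation (Python 3 syntax).
        return "Linear " + super().describe()

description = LinearModel("baseline").describe()
print(result, description)
```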


## What is Jupyter?
Jupyter is a multi-language, open-source project that provides notebooks: web-based interactive scripting environments that are particularly useful for data scientists. IPython is the Python kernel for Jupyter.

Installing Jupyter:
- Download [Anaconda Python 3.5](https://www.continuum.io/downloads)
- Run the Anaconda installer
- Check the installation by typing `python` in Terminal (exit with Ctrl+D)
- Install or update Jupyter by typing `conda install jupyter` in Terminal
- Run Jupyter by typing `jupyter notebook` in Terminal

Install the packages:
```
conda install numpy
conda install scipy
conda install scikit-learn
conda install pandas
conda install nltk
conda install matplotlib
```
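
Once the packages are installed, one way to check that they can be imported is a short script like the sketch below. The helper name `installed_versions` is made up for this example. Note that the import name can differ from the conda package name: `scikit-learn` is imported as `sklearn`.

```python
import importlib

# Import names for the packages installed above (scikit-learn is imported as sklearn).
MODULES = ["numpy", "scipy", "sklearn", "pandas", "nltk", "matplotlib"]

def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            # Not every module defines __version__, so fall back to a placeholder.
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions

for name, version in installed_versions(MODULES).items():
    print(name, version if version is not None else "NOT INSTALLED")
```

Any package reported as `NOT INSTALLED` can be added with the corresponding `conda install` command.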

---


# Supervised Learning
## Aims


## Classification
Practical 1: K-NN on Jupyter using Scikit-learn
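
The notebook for this practical is not reproduced here. As a minimal sketch of what a K-NN classifier looks like in Scikit-learn, the example below uses the bundled Iris dataset and `n_neighbors=5`; both choices are assumptions made for illustration, not necessarily what the practical uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the Iris dataset stands in for the practical's data.
X, y = load_iris(return_X_y=True)

# Keep 30% of the examples aside to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# K-NN predicts the majority class among the k closest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

knn_accuracy = knn.score(X_test, y_test)
print("K-NN test accuracy: {:.2f}".format(knn_accuracy))
```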

Practical 2: Logistic Regression on Jupyter using Scikit-learn
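
A possible sketch of logistic regression in Scikit-learn follows. The breast cancer dataset, the feature scaling step, and `max_iter=1000` are assumptions chosen for the example rather than details taken from the practical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a bundled binary classification dataset stands in for the practical's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Standardizing the features first helps the optimizer converge.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

logreg_accuracy = clf.score(X_test, y_test)
# Unlike K-NN's hard vote, logistic regression models class probabilities directly.
probabilities = clf.predict_proba(X_test[:3])
print("Logistic regression test accuracy: {:.2f}".format(logreg_accuracy))
print(probabilities)
```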


## Regression
Practical 3: Linear Regression on Jupyter using Scikit-learn
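
As an illustrative sketch, linear regression can be tried on synthetic data where the true relationship is known. The data-generating line `y = 3x + 2` plus noise is invented for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3x + 2 plus Gaussian noise.
rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=100)

# Scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)

slope = model.coef_[0]
intercept = model.intercept_
print("slope: {:.2f}, intercept: {:.2f}".format(slope, intercept))
```

The fitted slope and intercept should land close to the values used to generate the data.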

Practical 4: Polynomial Regression on Jupyter using Scikit-learn
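
One common way to do polynomial regression in Scikit-learn is to expand the inputs with `PolynomialFeatures` and then fit an ordinary `LinearRegression` on the expanded features. The sketch below assumes a synthetic quadratic dataset invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy quadratic.
rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + 2.0 + rng.normal(scale=0.3, size=200)

# A straight line cannot capture the curvature...
linear = LinearRegression().fit(x, y)
linear_r2 = linear.score(x, y)

# ...but a linear model on the features [1, x, x^2] can.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(x, y)
poly_r2 = poly.score(x, y)

print("R^2 linear: {:.2f}, R^2 degree 2: {:.2f}".format(linear_r2, poly_r2))
```

The model is still linear in its parameters; only the features are non-linear functions of `x`.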

## Overfitting and Generalization
Practical 5: an experimental pipeline
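
The content of this practical is not given here, so the following is only one plausible shape for such a pipeline: split the data into training, validation, and test sets, fit models of increasing complexity, select one on the validation set, and report its error on the test set. The synthetic data and the candidate degrees are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine wave.
rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=(90, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.2, size=90)

# Step 1: three-way split into train / validation / test (30 points each).
x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=30, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(x_rest, y_rest, test_size=30, random_state=0)

# Step 2: fit candidate models of increasing complexity on the training set only.
results = {}
for degree in (1, 3, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    val_mse = mean_squared_error(y_val, model.predict(x_val))
    results[degree] = (model, train_mse, val_mse)
    print("degree {:2d}: train MSE {:.3f}, validation MSE {:.3f}".format(degree, train_mse, val_mse))

# Step 3: select the model with the lowest validation error.
best_degree = min(results, key=lambda d: results[d][2])

# Step 4: touch the test set once, only for the selected model.
test_mse = mean_squared_error(y_test, results[best_degree][0].predict(x_test))
print("selected degree {}, test MSE {:.3f}".format(best_degree, test_mse))
```

Training error can only go down as the degree grows, so a gap between training and validation error is the signal to watch for overfitting.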


## Testing: Holdout vs Cross-Validation
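
A minimal sketch of the two evaluation strategies named in the heading, assuming the Iris dataset and a K-NN classifier purely for illustration. Holdout evaluates on a single train/test split, while k-fold cross-validation averages over several splits so that every example is used for testing exactly once.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: dataset and model are chosen only to make the comparison concrete.
X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

# Holdout: one split, one score. Cheap, but the estimate depends on the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
holdout_score = model.fit(X_train, y_train).score(X_test, y_test)

# 5-fold cross-validation: five fits, five scores. More expensive, less noisy.
cv_scores = cross_val_score(model, X, y, cv=5)

print("holdout accuracy: {:.2f}".format(holdout_score))
print("cross-validation accuracy: {:.2f} +/- {:.2f}".format(cv_scores.mean(), cv_scores.std()))
```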

## Other Supervised Learning Algorithms:
Random Forests

Practical 6: using Random Forests on a more complex task (e.g. my affective computing project at UCL)
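
The data for the task mentioned above is not available here, so the sketch below uses Scikit-learn's bundled handwritten digits dataset as a stand-in. The dataset and the hyperparameters are assumptions made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the digits dataset (8x8 images, 10 classes) stands in for the real task.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# A random forest averages many decision trees, each trained on a bootstrap
# sample of the data and a random subset of the features at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

forest_accuracy = forest.score(X_test, y_test)
# One importance value per input feature (here, per pixel); they sum to 1.
importances = forest.feature_importances_
print("Random forest test accuracy: {:.2f}".format(forest_accuracy))
```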