# Basic Principles of Machine Learning

<img src="figs/traditional_vs_ML.jpg" width="70%">

## What does the figure above mean exactly?

#### Let us demonstrate it using a simple regression problem.

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# use seaborn for plot defaults
# this can be safely commented out
import seaborn; seaborn.set()

First, let us generate some data!

In [None]:
x = np.arange(10)
y = 2 * x + 1

In [None]:
print("X values: ", x)
print("Corresponding Y values: ", y)

In [None]:
plt.plot(x, y, '--r');

Here we'll dive into the basic principles of machine learning, and how to
utilize them via the Scikit-Learn API.

## The Scikit-learn Estimator Object

Every algorithm in scikit-learn is exposed via an *Estimator* object. For instance, linear regression is implemented as follows:

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
LinearRegression?

**Estimator parameters**: All the parameters of an estimator can be set when it is instantiated, and have suitable default values:

In [None]:
# `normalize` was removed in scikit-learn 1.2; `fit_intercept` is used here instead
model = LinearRegression(fit_intercept=True)
print(model.fit_intercept)

In [None]:
print(model)

**Estimated model parameters**: When an estimator is *fit* to data, parameters are estimated from the data at hand. All the estimated parameters are attributes of the estimator object whose names end with an underscore:

In [None]:
# The input data for sklearn is 2D: (samples == 10 x features == 1)
X = x[:, np.newaxis]
print(X)
print(y)

In [None]:
# fit the model on our data
model.fit(X, y)

In [None]:
dir(model)

In [None]:
# underscore at the end indicates a fit parameter
print(model.coef_)
print(model.intercept_)

The model found a line with slope 2 and intercept 1, as we'd expect.

### Why did we get perfect estimates?

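The data were generated exactly from $y = 2x + 1$, with no noise, so a linear model can reproduce them exactly.

Now, let us add some noise to our data: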

In [None]:
x = np.arange(200)
y = 2 * x + 1 + np.random.randn(200)

In [None]:
# print("X values: ", x)
# print("Corresponding Y values: ", y)

In [None]:
plt.plot(x, y, '+r');

In [None]:
X = x[:, np.newaxis]
model.fit(X,y)

In [None]:
# underscore at the end indicates a fit parameter
print(model.coef_)
print(model.intercept_)

In [None]:
theta = np.array([[model.coef_[0]],[model.intercept_]])
# print(theta)

X_new = np.array([[0],[200]])
# print(X_new)

X_new_b = np.c_[np.ones((2,1)),X_new]
# print(X_new_b)

y_est = X_new_b.dot(theta)
# print(y_est)

In [None]:
plt.plot(X_new,y_est,'b-')
plt.plot(X,y,'r+')
plt.xlabel("$x$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18);

What we saw above was an example of **model fitting**. 
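The manual matrix product used above to draw the fitted line is what the estimator's `predict` method computes for a linear model. A minimal sketch, assuming freshly generated data with a fixed seed (the names `demo_model` and `X_query` and the seed are illustrative, not part of the original notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed data: same shape as the noisy example, with a fixed seed for repeatability
rng = np.random.default_rng(0)
x_demo = np.arange(200)
y_demo = 2 * x_demo + 1 + rng.standard_normal(200)

demo_model = LinearRegression().fit(x_demo[:, np.newaxis], y_demo)

# Two query points at the ends of the data range
X_query = np.array([[0], [200]])

# Manual prediction: intercept + slope * x
manual = demo_model.intercept_ + demo_model.coef_[0] * X_query[:, 0]

# Built-in prediction through the estimator API
builtin = demo_model.predict(X_query)

print(manual)
print(builtin)
```

Both printed arrays should agree up to floating-point rounding, so `predict` can replace the hand-built `theta` and `X_new_b` when only predictions are needed.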

**General Setting:**

Given a sample of training data, fit a (mathematical) model, i.e. choose a “suitable” mathematical function and determine “appropriate” parameters from the data.

#### Problems often encountered during model fitting:

**Overfitting**: The model matches the training data too closely and fails to generalize.

**Underfitting**: The model does not match the data closely enough.

What we aim for is a well-fitted model that represents our data accurately.
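Underfitting and overfitting can be illustrated by fitting polynomials of increasing degree to the same noisy sample. The sketch below is one possible illustration under assumed settings: the sine target, the noise level, the degrees 1, 4 and 15, and all variable names are choices made here, not taken from the notebook.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth function for this illustration
    return np.sin(2 * np.pi * x)

# Small noisy training sample and a larger held-out test sample
x_train = np.linspace(0, 1, 20)
y_train = true_fn(x_train) + 0.2 * rng.standard_normal(20)
x_test = np.linspace(0, 1, 200)
y_test = true_fn(x_test) + 0.2 * rng.standard_normal(200)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

errors = {}
for degree in (1, 4, 15):
    # Least-squares polynomial fit of the given degree
    poly = Polynomial.fit(x_train, y_train, deg=degree)
    errors[degree] = (mse(y_train, poly(x_train)), mse(y_test, poly(x_test)))
    print(f"degree {degree:2d}: train MSE {errors[degree][0]:.4f}, "
          f"test MSE {errors[degree][1]:.4f}")
```

The straight line (degree 1) cannot follow the curve and has high error on both samples, which is underfitting. The training error keeps shrinking as the degree grows, while the test error typically stops improving or gets worse for the most flexible model, which is the signature of overfitting.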

Once we have a well-fitted model, we can use it for **generalization**, e.g. for

- **inference**

- **reasoning**

- **predictions**

- **decision making**


on **test data**, i.e. on previously unseen data, as encountered in practice.

**Aim:** Recognize patterns

**Steps:**

1. Extract features from the pattern
2. Represent each example as a feature vector in feature space
3. Examples of the same category form blobs in the feature space (vector space)
4. Decide on a boundary in feature space (a line or, more generally, a function)
5. Predict new examples according to which side of the boundary they fall on in feature space

Our task is to come up with boundaries that separate points of different classes.
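The steps above can be sketched on synthetic data. This is only an illustration under assumptions: the two blob centers, the use of `LogisticRegression` as the boundary-finding method, and the names `features`, `clf` and `new_points` are all chosen here for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Steps 1-3: two blobs of 2-D feature vectors, one blob per class (assumed centers)
features, labels = make_blobs(
    n_samples=200, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)

# Step 4: learn a linear boundary w . x + b = 0 in feature space
clf = LogisticRegression().fit(features, labels)
w, b = clf.coef_[0], clf.intercept_[0]

# Step 5: classify new examples by the side of the boundary they fall on
new_points = np.array([[-3.0, -2.5], [2.5, 3.5]])
side = new_points @ w + b          # negative: class 0, positive: class 1
predicted = clf.predict(new_points)

print("signed score:", side)
print("predicted class:", predicted)
```

The sign of `side` and the output of `predict` carry the same information: each new point is assigned to the class whose side of the line it lies on.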


## Three components of ML

### 1. Representation
### 2. Evaluation
### 3. Optimization

<img src="figs/pipeline.jpg" width="90%">

#### How to choose a model?

Consider the following aspects:

- Nature of the data
	(categorical, relational, vectorial…)

- Problem to be solved
	(forecasting, classification, ranking…)
    
- Available background information
	(physical laws of nature, theories…)
    
- Computational issues
	(ease of implementation, runtime ...)


#### Now that we have decided on a model, how do we fit it?

- By minimizing or maximizing an appropriate objective. Many different methods and criteria exist, for example:

1. Maximum likelihood
2. Maximum entropy
3. Least squares (quadratic optimization)
4. Gradient descent
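Two of the approaches listed above, least squares and gradient descent, can be compared on a small linear problem. The sketch below assumes data on the interval [0, 1] generated from the line $y = 2x + 1$ plus noise; the learning rate, the iteration count and the variable names are hypothetical choices that happen to work for this data, not general recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)
x_fit = np.linspace(0, 1, 100)
y_fit = 2 * x_fit + 1 + 0.1 * rng.standard_normal(100)

# Design matrix with a column of ones for the intercept: rows are [1, x]
A = np.c_[np.ones_like(x_fit), x_fit]

# Least squares: solve min ||A theta - y||^2 directly
theta_ls, *_ = np.linalg.lstsq(A, y_fit, rcond=None)

# Gradient descent on the mean squared error
theta_gd = np.zeros(2)
learning_rate = 0.5      # assumed step size, small enough to converge here
for _ in range(5000):
    gradient = 2 / len(y_fit) * A.T @ (A @ theta_gd - y_fit)
    theta_gd -= learning_rate * gradient

print("least squares    [intercept, slope]:", theta_ls)
print("gradient descent [intercept, slope]:", theta_gd)
```

Both methods minimize the same objective, so they arrive at essentially the same parameters. Least squares gets there in one linear-algebra step, whereas gradient descent approaches the solution iteratively and also applies to objectives without a closed-form solution.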



#### Problems with model fitting:

- The training sample may not be representative (poor generalization)

- The chosen model may be too flexible, over-fitting (poor generalization)


#### Take-home messages (this is what separates data scientists from the average Joe doing machine learning)

- The amount of training data may impact generalization (the more, the better)

- The number of degrees of freedom may impact generalization (the fewer, the better)

**Note:** You should not apply algorithms blindly
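The first take-home message can be checked with a small experiment: keep the model fixed and vary only the size of the training sample. The sketch below makes several assumptions of its own (a noisy sine target, a degree-9 polynomial as the fixed model, and the sample sizes 15, 50 and 500); none of these come from the notebook.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def noisy_sine(x):
    # Assumed data-generating process: sine curve plus Gaussian noise
    return np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.shape)

# Held-out evaluation data, shared by all fits
x_eval = np.linspace(0, 1, 1000)
y_eval = noisy_sine(x_eval)

size_errors = {}
for n_train in (15, 50, 500):
    x_small = np.linspace(0, 1, n_train)
    # Same model (degree-9 polynomial) each time; only the sample size changes
    fit = Polynomial.fit(x_small, noisy_sine(x_small), deg=9)
    size_errors[n_train] = float(np.mean((y_eval - fit(x_eval)) ** 2))
    print(f"n_train = {n_train:3d}: test MSE {size_errors[n_train]:.4f}")
```

With only a few points the flexible model chases the noise, while with many points the same model has enough evidence to settle near the underlying curve, so its error on unseen data drops.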

Rephrasing it slightly, the process of building/implementing and using an ML system involves the following phases:

- **training phase**
- **test phase**
- **application phase**
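The three phases can be mapped onto a few scikit-learn calls. This is a minimal sketch, assuming the noisy linear data from earlier is regenerated with a fixed seed and that 25% of it is held out for testing; the split ratio, the query value 250 and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x_all = np.arange(200)
y_all = 2 * x_all + 1 + rng.standard_normal(200)
X_all = x_all[:, np.newaxis]

# Hold out a quarter of the data that the model never sees during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, random_state=0
)

# Training phase: estimate the parameters from the training data only
phase_model = LinearRegression().fit(X_train, y_train)

# Test phase: measure the quality (R^2) on the held-out data
test_r2 = phase_model.score(X_test, y_test)
print("R^2 on test data:", test_r2)

# Application phase: predict for a genuinely new input
prediction = phase_model.predict(np.array([[250]]))
print("prediction for x = 250:", prediction)
```

Keeping the test data out of `fit` is what makes the test-phase score an honest estimate of how the model will behave in the application phase.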

## Three Fundamental Machine Learning problems

<img src="figs/ML.png" width="90%">

## Commonly practiced Machine Learning approaches

<img src="figs/ML_branches.png" width="90%">

<img src="figs/deep-learning.png" width="90%">

## Three fundamental ML problems

### 1. Classification

### 2. Regression

### 3. Clustering
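All three problem types follow the same estimator pattern in scikit-learn. The sketch below is one possible illustration on synthetic data; the blob centers, the linear regression target and the choice of `LogisticRegression`, `LinearRegression` and `KMeans` as representatives are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
points, groups = make_blobs(
    n_samples=150, centers=[[-4, 0], [4, 0]], cluster_std=1.0, random_state=0
)

# 1. Classification: supervised, the target is a discrete label
classifier = LogisticRegression().fit(points, groups)
class_accuracy = classifier.score(points, groups)

# 2. Regression: supervised, the target is a continuous value
target = 3 * points[:, 0] - 2 + 0.5 * rng.standard_normal(150)
regressor = LinearRegression().fit(points[:, [0]], target)

# 3. Clustering: unsupervised, no labels are given to the algorithm
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
cluster_ids = clusterer.labels_
# Cluster ids are arbitrary, so compare with the true groups up to relabeling
match = np.mean(cluster_ids == groups)
agreement = max(match, 1 - match)

print("classification accuracy:", class_accuracy)
print("regression slope:", regressor.coef_[0])
print("clustering agreement with true groups:", agreement)
```

Classification and regression receive the targets in `fit(X, y)`, while clustering calls `fit(X)` alone and has to discover the groups from the features.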

<img src="figs/sklearn_cheatsheet.png" width="90%">