## <font color='darkblue'>Section 7 - Machine Learning Basics</font>
This notebook is created from the Udemy course "<b>[Deep Learning Prerequisites: The Numpy Stack in Python](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/)</b>", which introduces Numpy, Scipy, Pandas, and Matplotlib: prep for deep learning, machine learning, and artificial intelligence.

## <font color='darkblue'>Section Introduction</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643806#overview))<br/>

### <font color='darkgreen'>New Section: Machine Learning</font>
* We'll even apply a deep neural network
* Think of this like a machine learning mini course
* This section was not originally part of this course. Why?
    * Learn to add and multiply before learning calculus
    * What order should I take your course in?
    
![neural network](images/S7_1.png)
<br/>

### <font color='darkgreen'>Problem</font>
Those who want to market ML do the opposite:
* Make machine learning sound as complex and magical as possible
* Give you a basic, non-mathematical explanation that "reveals the magic" and makes you feel as if you've learned a clever trick
* The more high-level and mystical-sounding it is, the more you feel you've accomplished
* 2 or 3 lines of code to call an API: the magic spell. But you still don't know how the spell works.

### <font color='darkgreen'>The realistic approach</font>
* Instead of making ML try to sound magical, let's make it sound as dumb as possible.
* Demonstrate that ML is no more than a geometry problem: this is real intuition
* Instead of magic, use spatial/mathematical reasoning.

### <font color='darkgreen'>Numpy's brothers and sisters</font>
* Numpy
* Scipy
* Pandas
* Matplotlib
* Scikit-learn
* (and more...)

## <font color='darkblue'>What's Classification</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643808#overview)) <br/>
* Classic machine learning benchmark
* Given: input image -> predict: what digit:
![neural network](images/S7_2.png)
<br/>

### <font color='darkgreen'>2 Common types of data</font>
* Images (ex. MNIST) -- computer vision
* Text (ex. emails) -- natural language processing (NLP)
* The entire Internet is (mostly) made up of these 2 things.

### <font color='darkgreen'>Text Example: Spam Detection</font>
* Input: email, predict: spam or not spam

### <font color='darkgreen'>Common Theme</font>
![neural network](images/S7_3.png)
<br/>

### <font color='darkgreen'>Learning</font>
* How does the machine learning model learn to make correct predictions?
* We're given a dataset on which to train
* It consists of input data and targets (aka labels)
* Learning: given many examples, figure out the "pattern"

### <font color='darkgreen'>Code</font>
Remember, we have to do 2 things: learn and make new predictions:
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X, Y)                 # learning: X = inputs, Y = targets
predictions = model.predict(X)  # make predictions (normally on new inputs)
```

### <font color='darkgreen'>Evaluation</font>
* How do we know the model has learned something useful?
* Measure its accuracy (aka classification rate)
    * Accuracy = #correct / #total
* In code (but also easy to do manually):
```python
model.score(X, Y)
```
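A small worked example of the accuracy formula, using made-up labels (`y_true` and `y_pred` are hypothetical, not the output of a real model). `sklearn.metrics.accuracy_score` computes the same fraction that a scikit-learn classifier's `score` method reports.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical targets and predictions for 5 samples
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Accuracy = #correct / #total
n_correct = np.sum(y_pred == y_true)   # 4 matches
accuracy_manual = n_correct / len(y_true)

print(accuracy_manual)                 # 0.8
print(accuracy_score(y_true, y_pred))  # 0.8
```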

## <font color='darkblue'>Classification in code</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643814#overview))<br/>

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the data
data = load_breast_cancer()
type(data)
```
```
sklearn.utils.Bunch
```

```python
# Check which fields the data object provides
data.keys()
```
```
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])
```

```python
# We have 569 instances with 30 features
data.data.shape
```
```
(569, 30)
```

```python
# Names of the 30 features
data.feature_names
```
```
array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error',
       'fractal dimension error', 'worst radius', 'worst texture',
       'worst perimeter', 'worst area', 'worst smoothness',
       'worst compactness', 'worst concavity', 'worst concave points',
       'worst symmetry', 'worst fractal dimension'], dtype='<U23')
```

```python
# target=0 -> malignant; target=1 -> benign
data.target_names
```
```
array(['malignant', 'benign'], dtype='<U9')
```

```python
# Extract X (data), y (target)
X, y = data.data, data.target
```

```python
from sklearn.model_selection import train_test_split

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```

```python
# Start training
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
```
```
RandomForestClassifier()
```

```python
# Evaluation: training score
model.score(X_train, y_train)
```
```
1.0
```

```python
# Evaluation: testing score
model.score(X_test, y_test)
```
```
0.9707602339181286
```

```python
# Make predictions on the test set
predictions = model.predict(X_test)
predictions[:10]
```
```
array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
```

```python
# Check the accuracy manually: #correct / #total (matches model.score above)
N = len(y_test)
np.sum(predictions == y_test) / N
```
```
0.9707602339181286
```

```python
# Let's solve the same problem with a neural network (multilayer perceptron)
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Preprocessing: standardize each feature; fit the scaler on the training set only
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
```

```python
# Training
model = MLPClassifier()
model.fit(X_train_std, y_train)
```
```
MLPClassifier()
```

```python
# Evaluation
train_score, test_score = model.score(X_train_std, y_train), model.score(X_test_std, y_test)

print(f"train_score={train_score:.03f}; test_score={test_score:.03f}")
```
```
train_score=0.997; test_score=0.982
```


## <font color='darkblue'>What's Regression</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643816#overview)) <br/>

### <font color='darkgreen'>Regression</font>
* We just covered classification, which seems very intuitive.
* Regression is also very intuitive

![Regression ex](images/S7_4.png)
<br/>

### <font color='darkgreen'>Regression vs. Classification</font>
* Classification: Predict a category
* Regression: Predict a number on the real line
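* In regression, the numbers do have meaning (their order and distance matter); in classification, labels like 0 and 1 are just codes for categories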

### <font color='darkgreen'>Ex: Predicting House Prices</font>
* Real estate application
* Can have more than one input
    * neighborhood
    * floor
    * ...
    

### <font color='darkgreen'>Obligatory Business Example</font>
![Regression ex](images/S7_5.png)
<br/>

### <font color='darkgreen'>What will the code look like?</font>
![Regression ex](images/S7_6.png)
<br/>
```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)                 # learning
predictions = model.predict(X)  # make predictions
score = model.score(X, y)       # evaluation: returns R^2, not MSE
```

### <font color='darkgreen'>Why not MSE</font>
* House prices will range from hundreds of thousands to millions
* Grades will range from 0-100
* A squared error of $100^2$ (i.e. being off by 100) is not bad for house prices, but it is bad for grades!
![Regression ex](images/S7_7.png)
<br/>
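The scale problem can be checked numerically. In this sketch the numbers are invented for illustration: the same predictions are expressed in two units, and multiplying both targets and predictions by 1000 multiplies the MSE by $1000^2$, so an MSE value means little without knowing the scale of the target. The $R^2$ score, covered next, does not change.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical targets and predictions (e.g. prices in thousands of dollars)
y_true = np.array([200.0, 350.0, 500.0, 650.0])
y_pred = np.array([210.0, 340.0, 520.0, 630.0])

# The same values expressed in dollars (1000x larger)
scale = 1000.0
y_true_big, y_pred_big = y_true * scale, y_pred * scale

mse_small = mean_squared_error(y_true, y_pred)
mse_big = mean_squared_error(y_true_big, y_pred_big)

print(mse_small, mse_big)  # MSE grows by scale**2
print(r2_score(y_true, y_pred), r2_score(y_true_big, y_pred_big))  # R^2 is unchanged
```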

### <font color='darkgreen'>The $R^2$</font>
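For a least-squares linear model with an intercept, it equals the square of the correlation coefficient between the observed and predicted values ([coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination)). More generally: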
> In statistics, the **coefficient of determination**, denoted $R^2$ or $r^2$ and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

![Regression ex](images/S7_8.png)
<br/>

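We'll look at $R^2$ again in a later course. For now, just remember these important points:
* $R^2 = 1$ is the best possible score ($SS_{res} = 0$)
* $R^2 = 0$ is the dumbest model (always predict the average target, regardless of input)
* $R^2$ can even be negative for a model that does worse than predicting the average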
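For reference, the standard definition behind these points is $R^2 = 1 - SS_{res}/SS_{tot}$, with $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ and $SS_{tot} = \sum_i (y_i - \bar{y})^2$. A minimal sketch with invented numbers that computes it by hand (`r_squared` is a hypothetical helper) and compares it with `sklearn.metrics.r2_score`; it also checks the two special cases above.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])

print(r_squared(y, y_hat), r2_score(y, y_hat))  # same value
print(r_squared(y, y))                          # perfect model -> 1.0
print(r_squared(y, np.full_like(y, y.mean())))  # always predict the mean -> 0.0
```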

## <font color='darkblue'>Regression in code</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643818#overview)) <br/>
In this lecture, we're going to look at how to do regression in code, using the Airfoil Self-Noise dataset ([data source: airfoil_self_noise.dat](https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise)).

```python
# Loading data
import numpy as np
import pandas as pd

df = pd.read_csv('datas/airfoil_self_noise.dat', sep='\t', header=None)
df.head()
```

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 800 | 0.0 | 0.3048 | 71.3 | 0.002663 | 126.201 |
| 1 | 1000 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.201 |
| 2 | 1250 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.951 |
| 3 | 1600 | 0.0 | 0.3048 | 71.3 | 0.002663 | 127.591 |
| 4 | 2000 | 0.0 | 0.3048 | 71.3 | 0.002663 | 127.461 |


```python
# Check columns
df.info()
```
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1503 entries, 0 to 1502
Data columns (total 6 columns):
0    1503 non-null int64
1    1503 non-null float64
2    1503 non-null float64
3    1503 non-null float64
4    1503 non-null float64
5    1503 non-null float64
dtypes: float64(5), int64(1)
memory usage: 70.6 KB
```


```python
# Retrieve X (columns 0-4) and y (column 5)
data = df[list(range(5))].values
target = df[5].values
```

```python
type(data)
```
```
numpy.ndarray
```

```python
# Split data into train and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3)
```

```python
# Training
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
```
```
LinearRegression()
```

```python
# Evaluation: for regressors, score() returns R^2
train_score, test_score = model.score(X_train, y_train), model.score(X_test, y_test)
print(f"train score={train_score:.03f}; test score={test_score:.03f}")
```
```
train score=0.515; test score=0.512
```


```python
# Prediction
predictions = model.predict(X_test)
predictions[:10]
```
```
array([124.37422429, 122.27092714, 123.61013951, 131.19680768,
       132.16258229, 126.51044553, 120.38656008, 124.92217894,
       126.23651668, 118.09571416])
```

```python
# Try another model
from sklearn.ensemble import RandomForestRegressor

model2 = RandomForestRegressor()
model2.fit(X_train, y_train)
```
```
RandomForestRegressor()
```

```python
# Evaluation: the random forest scores much higher than linear regression
train_score, test_score = model2.score(X_train, y_train), model2.score(X_test, y_test)
print(f"train score={train_score:.03f}; test score={test_score:.03f}")
```
```
train score=0.989; test score=0.924
```


## <font color='darkblue'>What is a feature vector</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643820#overview)) <br/>

![Regression ex](images/S7_9.png)
<br/>

### <font color='darkgreen'>How can I come up with good features?</font>
* First option: use your domain knowledge (ex. your height won't affect your grade, so it makes a poor feature)
* Second option: a purely mathematical approach (ex. convolution, Taylor expansion)
* In practice, you can combine the two approaches:
![feature vector](images/S7_10.png)
<br/>
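A rough illustration of combining the two options: domain knowledge picks the raw features, and a mathematical expansion adds new ones. This sketch assumes two hypothetical raw features for predicting a grade and uses scikit-learn's `PolynomialFeatures`, which appends squares and pairwise products of the inputs (a Taylor-expansion-like idea).

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical raw feature vectors chosen with domain knowledge:
# [hours_studied, hours_slept]
X = np.array([[2.0, 7.0],
              [5.0, 6.0]])

# Expand to all terms up to degree 2.
# Column order: 1, x1, x2, x1^2, x1*x2, x2^2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)
```

Each row is still one feature vector per sample; it just lives in a 6-dimensional space instead of a 2-dimensional one.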

## <font color='darkblue'>ML is nothing but Geometry</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643822#overview)) <br/>
* Machine learning isn't magic, it's just geometry!
* We will see how this "works" for both classification and regression
![feature vector](images/S7_11.png)
<br/>

### <font color='darkgreen'>Regression</font>
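To make the geometric picture concrete: in one dimension, linear regression means finding the line $\hat{y} = wx + b$ that passes closest (in squared distance) to the points. A minimal sketch on synthetic data generated from $y = 2x + 1$ plus noise (the data and noise level are assumptions for illustration), using NumPy's least-squares `np.polyfit`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic points scattered around the line y = 2x + 1
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.shape)

# Least-squares fit of a degree-1 polynomial, i.e. a line
w, b = np.polyfit(x, y, deg=1)
print(f"slope={w:.2f}, intercept={b:.2f}")  # close to 2 and 1

# "Prediction" is just reading a point off the line
x_new = 4.0
print(w * x_new + b)
```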

### <font color='darkgreen'>Classification</font>
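For classification, the geometric picture is a boundary between clouds of points. A minimal sketch, assuming two synthetic Gaussian clusters and scikit-learn's `LogisticRegression` as one example of a linear classifier: the learned weights define the line $w \cdot x + b = 0$, and predicting a class amounts to checking which side of that line a point falls on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two synthetic clusters in 2-D: class 0 around (-2, -2), class 1 around (2, 2)
X0 = rng.normal(loc=-2.0, scale=1.0, size=(100, 2))
X1 = rng.normal(loc=2.0, scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

model = LogisticRegression()
model.fit(X, y)

# The decision boundary is the line w . x + b = 0
w, b = model.coef_[0], model.intercept_[0]

# Which side of the line is each point on? Positive side -> class 1
side = (X @ w + b > 0).astype(int)

print(np.array_equal(side, model.predict(X)))  # prediction is a side-of-line test
print(model.score(X, y))                       # training accuracy
```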


## <font color='darkblue'>All data are the same</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643826#overview)) <br/>
* The algorithm is the same no matter the dataset.
* Once the data are plotted as points in a coordinate space, all you need to do is find the best way to separate the classes (classification) or find a line that fits the trend (regression).
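To illustrate that the code does not change with the dataset, here is a sketch that runs identical fit/score lines on three of scikit-learn's built-in toy datasets. The choice of datasets, model, and split is an assumption for illustration; the exact accuracies depend on the library version and random seed.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

scores = {}
for name, loader in [("breast_cancer", load_breast_cancer),
                     ("iris", load_iris),
                     ("wine", load_wine)]:
    # Every dataset becomes an (N, D) input array plus an (N,) target array
    X, y = loader(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Exactly the same lines of code, whatever the data represent
    model = RandomForestClassifier(random_state=0)
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: test accuracy={score:.3f}")
```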

## <font color='darkblue'>Comparing Different Machine Learning Models</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643830#overview)) <br/>
* Which model should I choose?
* Same lines of code no matter which model I choose
* Shouldn't I always choose the most powerful one?
* How do I know which one is the most powerful?
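Because scikit-learn models share the same `fit`/`score` interface, trying several of them is just a loop, and the measured scores (rather than how powerful a model sounds) can guide the choice. A minimal sketch, assuming the breast cancer dataset from earlier and 5-fold cross-validation via `cross_val_score` instead of a single split; the model list is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Scaling helps models that rely on distances or gradient-based fitting
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Same line of code for every model: mean 5-fold cross-validated accuracy
    results[name] = cross_val_score(model, X, y, cv=5).mean()

for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.3f}")
```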

### <font color='darkgreen'>The Approach</font>
* This lecture: some general ideas and concepts
* Nothing will replace learning how these models work
* Learn the algorithms, their pros and cons, where they fail and succeed

### <font color='darkgreen'>Linear Models</font>
* Very easy to interpret
 
![feature vector](images/S7_12.png)
<br/>
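Part of what makes linear models easy to interpret is that each learned weight says how much the prediction changes per unit change in one input. A minimal sketch, assuming synthetic data generated from a known rule $y = 3x_1 - 2x_2 + 5$ plus a little noise, so the recovered coefficients can be checked against it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data from a known linear rule: y = 3*x1 - 2*x2 + 5 + noise
X = rng.uniform(-1, 1, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=200)

model = LinearRegression()
model.fit(X, y)

# Each coefficient is the change in prediction per unit change of that input
print("weights:", model.coef_)    # approximately [3, -2]
print("bias:", model.intercept_)  # approximately 5
```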

### <font color='darkgreen'>Basic Nonlinear Models</font>
* Don't be fooled! They are not necessarily "better" than a linear model
* Examples: Naive Bayes, Decision Tree, K-Nearest Neighbors

### <font color='darkgreen'>Ensemble Models</font>
* Random Forest, AdaBoost, ExtraTrees, Gradient Boosted Trees.
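* Combine the predictions from many trees (Random Forest and ExtraTrees average them; boosting methods such as AdaBoost and Gradient Boosted Trees add up trees trained one after another)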
* [XGBoost](https://en.wikipedia.org/wiki/XGBoost) has been used to win a significant number of Kaggle contests
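For Random Forest, the averaging can be checked directly: scikit-learn's `RandomForestClassifier.predict_proba` is the mean of the individual trees' predicted class probabilities, and the fitted trees are available in `estimators_`. A sketch on the breast cancer data (the number of trees is an arbitrary choice); boosted ensembles combine their trees differently, so this check does not carry over to them.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Ask every individual tree for its class probabilities, then average them
tree_probas = np.array([tree.predict_proba(X) for tree in forest.estimators_])
averaged = tree_probas.mean(axis=0)

# The forest's own probabilities are this average
print(np.allclose(averaged, forest.predict_proba(X)))  # True
```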

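### <font color='darkgreen'>Support Vector Machine (SVM)</font>
* Was the "go-to" method for a long time
* Today that is deep learning, but SVMs used to beat neural networks
* Powerful and nonlinear (via kernels), but they do not scale: kernel SVM training time grows at least quadratically with the number of samples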
* Most datasets these days are too large.
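A minimal sketch of the "nonlinear" part, assuming scikit-learn's `SVC` and the synthetic two-moons dataset from `make_moons`, whose classes cannot be separated by a straight line. The RBF kernel lets the SVM bend its decision boundary; the linear kernel cannot. With only 300 samples training is instant, so this does not show the scaling problem.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: not separable by a straight line
X, y = make_moons(n_samples=300, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)  # nonlinear kernel

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print(f"linear kernel: {linear_acc:.3f}; RBF kernel: {rbf_acc:.3f}")
```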

### <font color='darkgreen'>Deep Learning</font>
* You've only seen the tip of the iceberg (MLP)
* State of the art in CV and NLP
* Not "plug-and-play" (unlike Random Forest)
* You shouldn't normally use scikit-learn for it (for example, it has no GPU support).

### <font color='darkgreen'>Summary</font>
* Don't take this table as gospel
* ML is a field of experimentation, not philosophy
![feature vector](images/S7_13.png)
<br/>

## <font color='darkblue'>Machine Learning and Deep Learning: Future topics</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643834#overview)) <br/>

### <font color='darkgreen'>Unsupervised learning</font>

### <font color='darkgreen'>Reinforcement learning</font>

### <font color='darkgreen'>Practical concepts</font>

### <font color='darkgreen'>Hyperparameters</font>

## <font color='darkblue'>Summary</font>
([course link](https://www.udemy.com/course/deep-learning-prerequisites-the-numpy-stack-in-python/learn/lecture/19643838#overview)) <br/>
* Understanding ML as a black box
* Input/Output
* What function it performs
* In the future, implementing ML algorithms just means "fill in the blanks"

### <font color='darkgreen'>It's not magic, it's geometry</font>

### <font color='darkgreen'>All data is the same</font>

### <font color='darkgreen'>Integrating ML Code</font>

### <font color='darkgreen'>Comparing ML models</font>