# Machine learning overview

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/insop/ML_crash_course/blob/main/1_ml_overview.ipynb)

## Outline

- Objectives
- Target audiences
- Machine learning: what and examples
- Types of machine learning systems
- A short example
- Tutorials: numpy, pandas

## Objectives
- discuss a high-level overview of ML
- review representative ML systems
- be able to identify areas where AI/ML could be helpful
- understand the development cycle of AI/ML applications
- understand the end-to-end view of AI/ML applications

## Target audiences
- want to understand the high-level view
    - read the notebook
- want to know how the internals of an ML system work
    - run the code in the notebook
- want to know more
    - look at the material provided as references

## Machine learning: what and examples

### What is machine learning?

- a predictor $f$ takes some input $x$ and generates an output $\hat{y}$
    - $x$ $\rightarrow$ $f$ $\rightarrow$ $\hat{y}$
- a predictor $f$ learns from a training dataset
- to learn, prediction performance must be measured, for example by prediction accuracy
- a predictor is usually called an ML model
- input can be structured or unstructured data
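
The bullets above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the training data is made up, and the helper names `make_predictor`, `accuracy`, and `learn` are hypothetical. "Learning" here means choosing the threshold that gives the best accuracy on the training dataset.

```python
# Made-up training dataset: (input x, ground-truth label y)
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

def make_predictor(threshold):
    """Return a predictor f that outputs 1 when x >= threshold, else 0."""
    def f(x):
        return 1 if x >= threshold else 0
    return f

def accuracy(f, data):
    """Fraction of examples whose prediction matches the label."""
    correct = sum(1 for x, y in data if f(x) == y)
    return correct / len(data)

def learn(data):
    """Pick the candidate threshold with the best training accuracy."""
    candidates = [x for x, _ in data]
    return max(candidates, key=lambda t: accuracy(make_predictor(t), data))

best_threshold = learn(train)
f = make_predictor(best_threshold)
print("threshold:", best_threshold, "training accuracy:", accuracy(f, train))
```

Real ML models have many more parameters than a single threshold, but the loop is the same: predict, measure performance, and adjust the parameters.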

Machine learning is a subfield of artificial intelligence; it includes statistical machine learning and artificial neural networks.

In the following, we introduce three task areas of ML: classification, regression, and other types of prediction tasks.

#### Classification

- binary classification predicts a positive or negative output
    - $x$ $\rightarrow$ $f$ $\rightarrow$ $0$ or $1$
- examples of binary classification
    - spam detection: is an email spam or not?
    - tagging offensive comments on discussion forums
- multiclass classification with $k$ classes predicts one of the $k$ classes
    - $x$ $\rightarrow$ $f$ $\rightarrow$ $0, 1, \ldots, k-1$
- examples of multiclass classification
    - image classification
    - digit classification

Note that logistic regression, despite its name, is a common model for binary classification; do not confuse it with the regression task described next. Each training example consists of an input and an output label, which is the ground-truth class used for supervised training. We will discuss supervised machine learning in more detail.
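
As a minimal sketch of how a logistic-regression-style classifier produces a $0$ or $1$ output: it computes a score, squashes it into a probability with the sigmoid function, and thresholds the result. The weight `w`, bias `b`, and the feature (a count of suspicious words in an email) are hand-picked assumptions for illustration, not learned values.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """Estimated probability of the positive class."""
    return sigmoid(w * x + b)

def predict_label(x, w, b, threshold=0.5):
    """Binary decision: 1 if the probability reaches the threshold, else 0."""
    return (predict_proba(x, w, b) >= threshold).astype(int)

# Hypothetical parameters and inputs (number of suspicious words per email)
w, b = 1.5, -3.0
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

probabilities = predict_proba(x, w, b)
labels = predict_label(x, w, b)
print(probabilities.round(3))
print(labels)
```

In a real model, `w` and `b` would be learned from labeled training examples rather than chosen by hand.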

#### Regression
- prediction output is a real number, $\hat{y} \in \mathbb{R}$
- examples
    - housing price prediction given many features of the house
    - forecasting a company's revenue for next year based on performance metrics


The best-known form of regression is linear regression, whose least-squares method has a long history dating back to Legendre (1805) and Gauss (1809).
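
To make the least-squares idea concrete, here is a small sketch using NumPy. The data points are made up and lie exactly on the line $y = 1 + 2x$; the names `theta_0` (intercept) and `theta_1` (slope) are chosen for illustration.

```python
import numpy as np

# Made-up data that lies exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones (for the intercept) next to x
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution of A @ theta ≈ y
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
theta_0, theta_1 = theta

print(f"intercept = {theta_0:.2f}, slope = {theta_1:.2f}")

# Use the fitted line to predict the output for a new input
x_new = 4.0
y_hat = theta_0 + theta_1 * x_new
```

With noisy real-world data the points do not lie on a single line, and least squares returns the line that minimizes the sum of squared prediction errors.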

#### Other types of prediction tasks
- any other type of model that predicts complex objects
- such a task can often be decomposed into, or built as a cascade of, one or more classification and regression tasks
- examples:
    - object detection: predicts object types and locations
    - language translation: given text in one language, translates it into another language

These other types of tasks may sound daunting, but they are essentially built from simpler regression and classification tasks.
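
As a rough sketch of this decomposition, an object detection output can be viewed as a class label (a classification result) combined with box coordinates (regression results). The `Detection` class, the helper functions, and all the numbers below are hypothetical stand-ins for real model outputs.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

@dataclass
class Detection:
    label: str                              # result of a classification step
    box: Tuple[float, float, float, float]  # regression results: x, y, width, height

def classify(class_scores: Dict[str, float]) -> str:
    """Classification: pick the class with the highest score."""
    return max(class_scores, key=class_scores.get)

def detect(class_scores: Dict[str, float], box_estimate: Sequence[float]) -> Detection:
    """Combine the two simpler predictions into one complex output."""
    return Detection(label=classify(class_scores), box=tuple(box_estimate))

# Made-up scores and box coordinates standing in for real model outputs
result = detect({"cat": 0.1, "dog": 0.7, "car": 0.2}, [12.0, 30.0, 64.0, 48.0])
print(result)
```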

## Types of machine learning systems
- Supervised learning
- Unsupervised learning
- Reinforcement learning

### *Supervised learning

### *Unsupervised learning

### *Reinforcement learning

## Short ML example

We will look at a simple ML example that predicts a $y$ value given an $x$ value, which is a regression task. The following [code](https://github.com/ageron/handson-ml2/blob/3cffb49fffb4d79db5e68de1fc5f91d5e74262e8/01_the_machine_learning_landscape.ipynb) is from book [1].

Import the Python modules; [sklearn](https://scikit-learn.org/stable) provides many widely used ML models.

In [None]:
import sklearn
import sys

import matplotlib.pyplot as plt # graph
import numpy as np # number handling
import pandas as pd # structured data handling
import sklearn.linear_model
import os

A helper function to prepare the dataset using the OECD's life satisfaction values and the IMF's GDP per capita.

In [None]:
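def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Keep only the rows whose INEQUALITY value is "TOT"
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
    # Reshape to one row per country and one column per indicator
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Note: these two in-place calls modify the caller's gdp_per_capita;
    # the Cyprus lookup later in this notebook relies on that
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    # Join the two tables on the country index
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    full_country_stats.sort_values(by="GDP per capita", inplace=True)
    # Drop a few countries by row position; the rest form the training set
    remove_indices = [0, 1, 6, 8, 33, 34, 35]
    keep_indices = list(set(range(36)) - set(remove_indices))
    return full_country_stats[["GDP per capita", 'Life satisfaction']].iloc[keep_indices]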
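
The pivot-then-merge pattern in the helper above can be tried on a tiny table first. The sketch below uses made-up countries and values (not the OECD or IMF data), and the variable names are chosen for illustration only.

```python
import pandas as pd

# Made-up long-format table: one row per (country, indicator) pair
bli = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Indicator": ["Life satisfaction", "Air pollution",
                  "Life satisfaction", "Air pollution"],
    "Value": [7.0, 10.0, 6.0, 20.0],
})

# Made-up GDP table; country "C" has no indicator data
gdp = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "GDP per capita": [50000.0, 20000.0, 30000.0],
})

# Long to wide: one row per country, one column per indicator
wide = bli.pivot(index="Country", columns="Indicator", values="Value")

# Join on the country index; the default inner join keeps only
# countries present in both tables
merged = pd.merge(left=wide, right=gdp.set_index("Country"),
                  left_index=True, right_index=True)
merged = merged.sort_values(by="GDP per capita")

print(merged[["GDP per capita", "Life satisfaction"]])
```

Unlike the helper, this sketch uses `gdp.set_index("Country")` without `inplace=True`, so the original `gdp` table is left unchanged.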

In [None]:
# To plot pretty figures directly within Jupyter
%matplotlib inline
import matplotlib as mpl
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

In [None]:
# prepare download path
datapath = os.path.join("datasets", "lifesat", "")

In [None]:
# Download the training dataset
import urllib.request
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
os.makedirs(datapath, exist_ok=True)
for filename in ("oecd_bli_2015.csv", "gdp_per_capita.csv"):
    print("Downloading", filename)
    url = DOWNLOAD_ROOT + "datasets/lifesat/" + filename
    urllib.request.urlretrieve(url, datapath + filename)

In [None]:
# load training data and prepare input dataset

oecd_bli = pd.read_csv(datapath + "oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv(datapath + "gdp_per_capita.csv",thousands=',',delimiter='\t',
                             encoding='latin1', na_values="n/a")

# Prepare the data
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]
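
scikit-learn estimators expect the input as a 2-D array of shape `(n_samples, n_features)`, which is why `np.c_` is used above to turn a 1-D column of values into a 2-D column. A small sketch with made-up numbers:

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0])  # made-up 1-D feature values

# np.c_ stacks its arguments as columns, giving shape (3, 1)
column = np.c_[values]

print(values.shape)
print(column.shape)

# reshape(-1, 1) is an equivalent way to build the same column
same = np.array_equal(column, values.reshape(-1, 1))
```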

In [None]:
# Visualize the training dataset
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()

The following two lines of code train the ML model (linear regression).

In [None]:
# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
reg = model.fit(X, y)

We can check the fitted linear equation
$y = \theta_0 + \theta_1 x$,

where $x$ is "GDP per capita", $y$ is "Life satisfaction", $\theta_0$ is the intercept, and $\theta_1$ is the slope.

In [None]:
𝜃1 = reg.coef_[0][0]
𝜃0 = reg.intercept_[0]
print(f"y = {𝜃1:.6f}x + {𝜃0:.2f}")
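
The `[0][0]` and `[0]` indexing above follows from the array shapes scikit-learn returns when `y` is a 2-D column. The sketch below checks this on made-up data lying on the line $y = 1 + 2x$; the names `X_toy`, `y_toy`, and `toy_model` are hypothetical and separate from the notebook's variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data on the line y = 1 + 2x, shaped as 2-D columns
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([[1.0], [3.0], [5.0], [7.0]])

toy_model = LinearRegression().fit(X_toy, y_toy)

# With a 2-D y, coef_ is 2-D (n_targets, n_features) and intercept_ is 1-D
print(toy_model.coef_.shape, toy_model.intercept_.shape)

slope = toy_model.coef_[0][0]
intercept = toy_model.intercept_[0]

# predict also returns a 2-D array, so a single value needs [0][0]
prediction = toy_model.predict([[4.0]])[0][0]
print(intercept, slope, prediction)
```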

Now we can plot the linear equation on top of the training dataset.

In [None]:
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')

plt.axis([0, 60000, 0, 10])
# Use a separate name so the training input X is not overwritten
X_line = np.linspace(0, 60000, 1000)
plt.plot(X_line, 𝜃0 + 𝜃1*X_line, "r")
plt.text(5000, 3.1, r"$\theta_0$ = {:.2f}".format(𝜃0), fontsize=14, color="b")
plt.text(5000, 2.2, r"$\theta_1$ = {:.6f}".format(𝜃1), fontsize=14, color="b")
plt.show()

Now we can make a prediction for Cyprus.

In [None]:
cyprus_gdp_per_capita = gdp_per_capita.loc["Cyprus"]["GDP per capita"]
print("Cyprus's GDP per capita {}".format(cyprus_gdp_per_capita))


In [None]:
# Make a prediction for Cyprus

cyprus_predicted_life_satisfaction = model.predict([[cyprus_gdp_per_capita]])[0][0]

print("Predicted life satisfaction of Cyprus is: {}".format(cyprus_predicted_life_satisfaction))

We can overlay our prediction on the linear equation and the training data.

In [None]:
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')

plt.axis([0, 60000, 0, 10])
# Use a separate name so the training input X is not overwritten
X_line = np.linspace(0, 60000, 1000)
plt.plot(X_line, 𝜃0 + 𝜃1*X_line, "r")
plt.text(5000, 3.1, r"$\theta_0$ = {:.2f}".format(𝜃0), fontsize=14, color="b")
plt.text(5000, 2.2, r"$\theta_1$ = {:.6f}".format(𝜃1), fontsize=14, color="b")

plt.plot([cyprus_gdp_per_capita, cyprus_gdp_per_capita], [0, cyprus_predicted_life_satisfaction], "r--")
plt.text(25000, 5.0, r"Prediction = {:.2f}".format(cyprus_predicted_life_satisfaction), fontsize=14, color="b")
plt.plot(cyprus_gdp_per_capita, cyprus_predicted_life_satisfaction, "rx")

plt.show()

## Credits

This notebook follows ...

## References

1. Chapter 1 of the book [Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)
2. Code example: [01_the_machine_learning_landscape](https://github.com/ageron/handson-ml2/blob/3cffb49fffb4d79db5e68de1fc5f91d5e74262e8/01_the_machine_learning_landscape.ipynb)