# Chapter 2 Slides
## Ethem Alpaydin - Introduction to Machine Learning

Open-source notebooks/slides for Ethem Alpaydin's *Introduction to Machine Learning*, with examples in Python 3.6.

In [1]:
import numpy as np
from support import basic_plot

from bokeh.plotting import Figure, output_notebook, show
from bokeh.models import Range1d
from bokeh.models import NumeralTickFormatter
from bokeh.palettes import Spectral8 as colors
from bokeh.layouts import gridplot

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

In [2]:
import sys
sys.path.insert(0, '/home/vagrant/notebooks')
from custom_theme import output_notebook_themed
output_notebook_themed()

# output_notebook()

# Classification

\\[{\cal X} = \left\{ {{{\bf{x}}^t},{r^t}} \right\}_{t = 1}^N\\]

\\[{\bf{x}^t} = \left[ \begin{array}{l}
x_1^t\\
x_2^t
\end{array} \right]\\]

\\[r^t = \left\{ \begin{array}{l}
1{\text{ if }}{\bf{x}^t}{\text{ is a positive example}}\\
0{\text{ if }}{\bf{x}^t}{\text{ is a negative example}}
\end{array} \right.\\]

# Classification example

Is a given car a "family car"?

\\[{\bf{x}^t} = \left[ \begin{array}{l}
x_{\text{price}}^t\\
x_{\text{engine power}}^t
\end{array} \right]\\]

\\[r^t = \left\{ \begin{array}{l}
1{\text{ if }}{\bf{x}^t}{\text{ is a family car}}\\
0{\text{ if }}{\bf{x}^t}{\text{ is not a family car}}
\end{array} \right.\\]

In [3]:
X = np.array([
        [10000,100],[20000,400],[25000,300],[40000,250],[20000,200],
        [15000,150],[12500,250],[29000,170],[30000,250],[25000,200],
    ])

y = np.array([
    'not_family','not_family','family','not_family','family',
    'not_family','not_family','not_family','family','family'
])

print(X, '\n')
print(y)

[[10000   100]
 [20000   400]
 [25000   300]
 [40000   250]
 [20000   200]
 [15000   150]
 [12500   250]
 [29000   170]
 [30000   250]
 [25000   200]] 

['not_family' 'not_family' 'family' 'not_family' 'family' 'not_family'
 'not_family' 'not_family' 'family' 'family']


In [4]:
def car_plot():
    plot = basic_plot(title='Family Car Example', xaxis_label='Price (USD)', 
                  yaxis_label='Engine Power (hp)')
    plot.scatter(x=X[y=='family',0], y=X[y=='family',1], 
                 legend='family car', color='red', fill_alpha=0.3333, size=10)
    plot.scatter(x=X[y!='family',0], y=X[y!='family',1], 
                 legend='not family car', color='blue', fill_alpha=0.3333, size=10)
    plot.min_border = None
    return(plot)

In [5]:
show(car_plot())

# Hypothesis Class

We induce the hypothesis that "the class \\(C\\) of family car is a rectangle
in the price-engine power space."

\\[h \in {\cal H}\\]

where \\(h\\) is one particular rectangle drawn from the set \\({\cal H}\\) of all possible rectangles.

# Classification Model

\\[h = ({p_1} \le {\text{price}} \le {p_2}){\text{ AND }}({e_1} \le {\text{engine power}} \le {e_2})\\]

\\[h({\mathbf{x}}) = \left\{ \begin{gathered}
  1{\text{ if }}h{\text{ classifies }}{\mathbf{x}}{\text{ as a positive example}} \hfill \\
  0{\text{ if }}h{\text{ classifies }}{\mathbf{x}}{\text{ as a negative example}} \hfill \\ 
\end{gathered}  \right.\\]

In [6]:
class RectangleClassifier:
    def __init__(self, rect_left=None, rect_right=None, rect_bottom=None, rect_top=None):
        self.rect_left = rect_left
        self.rect_right = rect_right
        self.rect_bottom = rect_bottom
        self.rect_top = rect_top
        
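    def fit(self, X, y):
        # Split the samples by label (1 = positive, anything else = negative)
        pos_samples = X[y == 1]
        neg_samples = X[y != 1]
        # Tightest axis-aligned rectangle around the positive samples
        pos_x_min, pos_x_max = pos_samples[:, 0].min(), pos_samples[:, 0].max()
        pos_y_min, pos_y_max = pos_samples[:, 1].min(), pos_samples[:, 1].max()
        # Nearest negative coordinate beyond each side of that rectangle.
        # This assumes at least one negative sample lies beyond every side;
        # otherwise .max()/.min() is called on an empty array and raises.
        neg_sample_left =   neg_samples[:,0][neg_samples[:,0] < pos_x_min].max()
        neg_sample_right =  neg_samples[:,0][neg_samples[:,0] > pos_x_max].min()
        neg_sample_below =  neg_samples[:,1][neg_samples[:,1] < pos_y_min].max()
        neg_sample_above =  neg_samples[:,1][neg_samples[:,1] > pos_y_max].min()
        # Place each edge halfway between the positive extreme and that negative coordinate
        self.rect_left =   (pos_x_min + neg_sample_left)/2
        self.rect_right =  (pos_x_max + neg_sample_right)/2
        self.rect_bottom = (pos_y_min + neg_sample_below)/2
        self.rect_top =    (pos_y_max + neg_sample_above)/2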
        
    def predict(self, X):
        prediction = (X[:,0] > self.rect_left) & \
                     (X[:,0] < self.rect_right) & \
                     (X[:,1] > self.rect_bottom) & \
                     (X[:,1] < self.rect_top)
        return(prediction)
    
    def loss(self, X, y):
        prediction = self.predict(X)
        error = (prediction != y).sum()
        return(error)

# One-hot encoding

In [7]:
y_one_hot = (y == 'family').astype(int)
print(y_one_hot)

[0 0 1 0 1 0 0 0 1 1]


In [8]:
classifier = RectangleClassifier()

classifier.fit(X, y_one_hot)

print('x_min', classifier.rect_left)
print('x_max', classifier.rect_right)
print('y_min', classifier.rect_bottom)
print('y_max', classifier.rect_top, '\n')

print('empirical error', classifier.loss(X, y_one_hot))

x_min 17500.0
x_max 35000.0
y_min 185.0
y_max 350.0 

empirical error 0


In [9]:
plot_rect_hypo = car_plot()
plot_rect_hypo.title.text = 'Family Car Example: Rectangle Hypothesis'
plot_rect_hypo.background_fill_color = '#E6E6FE'
plot_rect_hypo.quad(left=classifier.rect_left, right=classifier.rect_right, 
                    bottom=classifier.rect_bottom, top=classifier.rect_top,
                    color='#FEE6E6', level='underlay')

In [10]:
show(plot_rect_hypo)

# Empirical Error

\\[E(h|\mathcal{X}) = \sum\limits_{t = 1}^N {1(h({{\mathbf{x}}^t}) \ne {r^t})}  = \sum\limits_{t = 1}^N {\left\{ \begin{gathered}
  {\text{1 if }}h({{\mathbf{x}}^t}) \ne {r^t} \hfill \\
  0{\text{ otherwise}} \hfill \\ 
\end{gathered}  \right.} \\]
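
As a minimal sketch of this count, the snippet below evaluates the 0/1 error of three rectangles on the family car data from earlier in the notebook. The helper names `h` and `empirical_error` are illustrative and not part of the notebook; the rectangle bounds are the fitted values printed above plus the two modified bounds (bottom edge 160, top edge 280) used in the plots that follow.

```python
import numpy as np

# Family car data from the notebook: (price, engine power) and 0/1 labels
X = np.array([
    [10000, 100], [20000, 400], [25000, 300], [40000, 250], [20000, 200],
    [15000, 150], [12500, 250], [29000, 170], [30000, 250], [25000, 200],
])
r = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 1])

def h(X, p1, p2, e1, e2):
    """Rectangle hypothesis: 1 inside the rectangle, 0 outside."""
    inside = (X[:, 0] > p1) & (X[:, 0] < p2) & (X[:, 1] > e1) & (X[:, 1] < e2)
    return inside.astype(int)

def empirical_error(predictions, r):
    """Count the training instances where h(x^t) != r^t."""
    return int(np.sum(predictions != r))

fitted = empirical_error(h(X, 17500, 35000, 185, 350), r)        # fitted rectangle
lower_bottom = empirical_error(h(X, 17500, 35000, 160, 350), r)  # admits (29000, 170)
lower_top = empirical_error(h(X, 17500, 35000, 185, 280), r)     # excludes (25000, 300)

print(fitted, lower_bottom, lower_top)  # prints: 0 1 1
```

Lowering the bottom edge pulls one negative example into the rectangle (a false positive), and lowering the top edge pushes one positive example out (a false negative); each contributes exactly 1 to the sum.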

In [11]:
plot_fp = car_plot()
classifier_fp = RectangleClassifier(rect_left=classifier.rect_left, 
                                    rect_right=classifier.rect_right, 
                                    rect_bottom=160, 
                                    rect_top=classifier.rect_top)
plot_fp.title.text = 'Family Car Example: False Positive. Loss: ' + \
                     str(classifier_fp.loss(X, y_one_hot))
plot_fp.background_fill_color = '#E6E6FE'
plot_fp.quad(left=classifier_fp.rect_left, right=classifier_fp.rect_right, 
             bottom=classifier_fp.rect_bottom, top=classifier_fp.rect_top,
             color='#FEE6E6', level='underlay')

In [12]:
show(plot_fp)

In [13]:
plot_fn = car_plot()
classifier_fn = RectangleClassifier(rect_left=classifier.rect_left, 
                                    rect_right=classifier.rect_right, 
                                    rect_bottom=classifier.rect_bottom, 
                                    rect_top=280)
plot_fn.title.text = 'Family Car Example: False Negative. Loss: ' + \
                     str(classifier_fn.loss(X, y_one_hot))
plot_fn.background_fill_color = '#E6E6FE'
plot_fn.quad(left=classifier_fn.rect_left, right=classifier_fn.rect_right, 
             bottom=classifier_fn.rect_bottom, top=classifier_fn.rect_top,
             color='#FEE6E6', level='underlay')

In [14]:
show(plot_fn)

# Regression

\\[{\cal X} = \left\{ {{{\bf{x}}^t},{r^t}} \right\}_{t = 1}^N\\]

If there is no noise, we want to **interpolate**:

\\[r^t = f({\bf{x}}^t)\\]

In **regression**, random noise is assumed to be added to the function:

\\[r^t = f({\bf{x}}^t) + \varepsilon \\]

where \\(\varepsilon\\) is random Gaussian noise. The usual explanation for this noise is that there are extra hidden (latent) variables \\({\bf{z}}^t\\) that we cannot observe:

\\[r^t = f({\bf{x}}^t, {\bf{z}}^t)\\]
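
To make the difference between the noise-free and noisy settings concrete, here is a small sketch with synthetic data. The function `f`, its coefficients, and the noise scale are assumptions chosen purely for illustration; they do not come from the book or the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def f(x):
    # Hypothetical underlying function, for illustration only
    return 50000 - 0.4 * x

x = np.linspace(0, 80000, 50)
noise = rng.normal(loc=0.0, scale=2000.0, size=x.shape)  # Gaussian epsilon

r_interp = f(x)         # noise-free case: r^t = f(x^t)
r_noisy = f(x) + noise  # regression setting: r^t = f(x^t) + epsilon

# The residuals of the noisy targets around f are exactly the sampled noise
residuals = r_noisy - r_interp
print(residuals.mean(), residuals.std())
```

In the noise-free case a curve through every point recovers \\(f\\); with noise, passing through every point would also fit \\(\varepsilon\\), which is why regression aims at the underlying function instead.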

In [15]:
class SimpleRegression:
    def __init__(self, w=None, b=None):
        self.w = w
        self.b = b
        
    def fit(self, x, y):
        x_mean = x.mean()
        y_mean = y.mean()
        N = len(x)
        self.w = (np.sum(x * y) - x_mean * y_mean * N) /  \
                 (np.sum(x**2) - N * x_mean**2)
        self.b = y_mean - self.w * x_mean
        
    def predict(self, x):
        return (self.w * x) + self.b
    
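    def loss(self, x, y):
        # Square root of the summed squared residuals, scaled by 1/N.
        # Note: this is not the RMSE, which would be sqrt(mean(residual**2)).
        prediction = self.predict(x)
        loss = (1/len(x)) * np.sqrt(np.sum(((y - prediction)**2)))
        return loss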

In [16]:
x = np.array([0, 5000, 10000, 20000, 40000, 60000, 80000])
y = np.array([50000, 45000, 40000, 35000, 30000, 25000, 20000])

linear_model = SimpleRegression()
linear_model.fit(x, y)
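
The `fit` call above relies on the closed-form least-squares solution for a line \\(g(x) = wx + b\\). As a sketch of where those expressions come from, assuming the squared-error loss (which the notebook uses implicitly rather than stating), we minimize

\\[E(w,b|\mathcal{X}) = \sum\limits_{t = 1}^N {{{\left( {{r^t} - (w{x^t} + b)} \right)}^2}} \\]

Setting \\(\partial E/\partial b = 0\\) gives \\(b = \bar r - w\bar x\\), and substituting this into \\(\partial E/\partial w = 0\\) gives

\\[w = \frac{{\sum\nolimits_t {{x^t}{r^t}}  - N\bar x\bar r}}{{\sum\nolimits_t {{{({x^t})}^2}}  - N{{\bar x}^2}}}\\]

where \\(\bar x = \sum\nolimits_t {{x^t}} /N\\) and \\(\bar r = \sum\nolimits_t {{r^t}} /N\\). In the code, `y` plays the role of \\(r\\), and these two expressions are the assignments to `self.w` and `self.b`.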

In [19]:
plot = basic_plot(title='linear regression', xaxis_label='x: Mileage', 
                  yaxis_label='y: Price')

plot.xaxis[0].formatter = NumeralTickFormatter(format='0')
plot.y_range = Range1d(0, int(max(y)+5000))
plot.scatter(x=x, y=y, size=10)
plot.line(x=np.linspace(0,120000,100), 
          y=linear_model.predict(np.linspace(0,120000,100)), 
          color=colors[0], line_width=2.5, alpha=0.75,
          legend = '1: ' + str(int(linear_model.loss(x, y))));

In [20]:
show(plot)

# Polynomial Features

For example, a quadratic model:

\\[g(x) = {w_2}{x^2} + {w_1}x + {w_0}\\]
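
The quadratic model is still linear in its weights, so it can be fitted by ordinary least squares once \\(x\\) is expanded into the columns \\([1, x, x^2]\\). The sketch below does this with NumPy only, on synthetic data generated from a known quadratic (the data and the "true" weights are assumptions for illustration). For a single input feature, these are the same columns that `PolynomialFeatures(degree=2)` produces.

```python
import numpy as np

# Synthetic inputs and a known quadratic, chosen only for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
w2_true, w1_true, w0_true = 0.5, -2.0, 3.0
r = w2_true * x**2 + w1_true * x + w0_true

# Design matrix with columns [1, x, x^2]
features = np.vander(x, N=3, increasing=True)

# Ordinary least squares on the expanded features
w, *_ = np.linalg.lstsq(features, r, rcond=None)
w0, w1, w2 = w

print(features[2])  # row for x = 2: [1. 2. 4.]
print(w0, w1, w2)   # approximately 3.0 -2.0 0.5
```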

In [19]:
models = {}
for degree in [2,3,4,5,6,7,8]:
    poly = PolynomialFeatures(degree=degree)
    x_ = poly.fit_transform(x.reshape(-1, 1))
    models[degree] = LinearRegression()
    models[degree].fit(x_, y)
    prediction = models[degree].predict(x_)
    loss = (1/len(x)) * np.sqrt(np.sum((y - prediction)**2))
    plot.line(x=np.linspace(0,120000,100), 
              y=models[degree].predict(poly.transform(np.linspace(0,120000,100).reshape(-1,1))), 
              line_color=colors[degree-1], line_width=2.5, alpha=0.5,
              legend=str(degree) + ': ' + str(int(loss)))

plot.title.text = 'overfitting'

In [20]:
show(plot)

# Ill-Posed Problem

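Learning is an **ill-posed problem**: the data alone is never sufficient to find a unique solution.

For binary classification with \\(d\\) binary inputs, there are \\(2^d\\) possible instances and hence \\(2^{2^d}\\) possible Boolean functions. After seeing \\(N\\) distinct example cases, \\({2^{{2^d} - N}}\\) of those functions remain consistent with the data.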
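
The count can be checked by brute force for a small case. The sketch below enumerates every Boolean function of \\(d = 2\\) binary inputs and keeps those that agree with a training set of \\(N = 2\\) labelled instances; the training set itself is hypothetical.

```python
from itertools import product

d = 2                                     # number of binary inputs
inputs = list(product([0, 1], repeat=d))  # all 2^d possible instances

# Every Boolean function is one assignment of an output bit to each instance
all_functions = [dict(zip(inputs, outputs))
                 for outputs in product([0, 1], repeat=len(inputs))]

# Hypothetical training set: N = 2 distinct labelled instances
training = {(0, 0): 0, (1, 1): 1}

# Keep the functions that agree with every training example
consistent = [f for f in all_functions
              if all(f[x] == r for x, r in training.items())]

print(len(all_functions))  # 16 = 2^(2^2)
print(len(consistent))     # 4  = 2^(2^2 - 2)
```

Each training example fixes one output bit, which halves the set of candidate functions; the outputs on the unseen instances remain completely free.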

In [23]:
def rectangle_plot(title=None): 
    plot = basic_plot(width=200, height=200, title=title)
    plot.rect(x=[0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5],
              y=[0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5], 
              width=1, height=1, line_color='grey', color=None,)
    plot.outline_line_color = None
    plot.axis.visible = False
    return(plot)

p1 = rectangle_plot()
p1.rect(x=[0.5,1.5,2.5], y=[1.5,1.5,0.5], width=1, height=1, color='grey')
# p1.title. = 'r^t = 1'

p2 = rectangle_plot()
p2.rect(x=[2.5,1.5,0.5], y=[0.5,1.5,2.5], width=1, height=1, color='grey')

p3 = rectangle_plot()
p3.rect(x=[2.5,1.5,0.5,0.5], y=[0.5,1.5,2.5,0.5], width=1, height=1, color='grey')

In [24]:
# Source: Learning from Data (Abu-Mostafa et al., 2012)
show(gridplot([[p1,p2,p3]], toolbar_location=None))

## Inductive Bias 

The set of assumptions we make so that learning is possible is called the **inductive bias** of the learning algorithm.

One way we introduce inductive bias is by assuming a hypothesis class (e.g., rectangles).

Choosing between possible hypothesis classes \\({\cal H}\\) is called **model selection**.

# Generalization

How well a model trained on the training set predicts the correct output for new instances is called **generalization.** The goal of machine learning is to generalize well on new, unseen examples.

**Triple tradeoff** among three factors:
* the complexity of the hypothesis we fit to data
* the amount of training data
* the generalization error on new examples
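
One way to see the interaction between these factors is to vary model complexity on a fixed amount of data and compare the error on the training set with the error on held-out data. The sketch below does this for polynomial fits; the target function, noise level, and train/validation split are all assumptions made for illustration, and the validation error is only an estimate of the generalization error.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "true" function, for illustration only
    return np.sin(2 * np.pi * x)

# Synthetic noisy samples, split into a training and a validation set
x_train = np.linspace(0, 1, 20)
x_val = np.linspace(0.025, 0.975, 20)
r_train = f(x_train) + rng.normal(scale=0.2, size=x_train.shape)
r_val = f(x_val) + rng.normal(scale=0.2, size=x_val.shape)

degrees = range(1, 9)
train_err, val_err = [], []
for degree in degrees:
    # Least-squares polynomial fit of the given degree on the training set
    coeffs = np.polyfit(x_train, r_train, deg=degree)
    train_err.append(np.mean((np.polyval(coeffs, x_train) - r_train) ** 2))
    val_err.append(np.mean((np.polyval(coeffs, x_val) - r_val) ** 2))

for degree, e_train, e_val in zip(degrees, train_err, val_err):
    print(degree, round(e_train, 4), round(e_val, 4))
```

Because each polynomial class contains the lower-degree ones, the training error cannot increase with the degree. The validation error carries no such guarantee, which is why it, and not the training error, is the quantity to watch when choosing complexity.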

# Dimensions of a Supervised Machine Learning Algorithm

* Model 

\\[g(x|\theta)\\]

* Loss

\\[E(\theta |\mathcal{X}) = \sum\limits_t {L({r^t},g({x^t}|\theta ))} \\]

* Optimization

\\[\theta^* = \arg \mathop {\min }\limits_\theta  E(\theta |\mathcal{X})\\]
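
The three pieces can be written down separately in code. In the sketch below, the model is a line with \\(\theta = (w, b)\\), the loss is the summed squared error, and the optimizer is plain gradient descent. Gradient descent, the learning rate, the step count, and the synthetic data are all choices made for this illustration, not something prescribed by the slides.

```python
import numpy as np

# Synthetic data from a known line, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0])
r = 2.0 * x + 1.0

def g(x, theta):
    """Model: g(x | theta) with theta = (w, b)."""
    w, b = theta
    return w * x + b

def E(theta, x, r):
    """Loss: sum over t of the squared error L(r^t, g(x^t | theta))."""
    return np.sum((r - g(x, theta)) ** 2)

def grad_E(theta, x, r):
    """Gradient of E with respect to (w, b)."""
    residual = g(x, theta) - r
    return np.array([2 * np.sum(residual * x), 2 * np.sum(residual)])

# Optimization: gradient descent as one possible way to approximate argmin E
theta = np.zeros(2)
learning_rate = 0.01
for _ in range(5000):
    theta = theta - learning_rate * grad_E(theta, x, r)

print(theta, E(theta, x, r))  # theta approaches (2, 1) and E approaches 0
```

Swapping any one piece (a polynomial model, a different loss, a closed-form solver) leaves the other two unchanged, which is the point of separating the three dimensions.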