**Artificial Intelligence (CS550)**
<br>
Date: **12 February 2020**
<br>
Location: **SU, NEW STEM building**
<br>
Room: **304**

Title: **Seminar №4**
<br>
Speaker: **Dr. Shota Tsiskaridze**

In [None]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

<h2 align="center">Linear Regression</h2>

<h3 align="center">Land Price</h3>

$\bullet$ For simplicity, we consider a **toy linear regression problem**.

$\bullet$ For this we will use **synthetic data**.

$\textbf{Definition}$. **Synthetic data** is **any production data applicable to a given situation that are not obtained by direct measurement** according to the McGraw-Hill Dictionary of Scientific and Technical Terms.

$\bullet$ We are going to predict a **land price** (for simplicity, suppose the plot is **square**, i.e. **$x$ by $x$ meters**) given the **length of its side** (in meters).

$\bullet$ Let's assume that the true **dependence** between the **length of the land's side** and its **price** is given by a **quadratic equation**:

$$y = f(x) = a \cdot x^2 + b \cdot x + c > 0,$$

where $x$ is the length of land's side and $y$ is its price.

$\bullet$ We also assume that our **observations are noisy**, and we model this noise by adding a **normally distributed (Gaussian)** term $\varepsilon$ to our quadratic equation.
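
Written out, the noisy observation model described above could be stated as follows, where $\sigma$ is introduced here only as a symbol for the standard deviation of the noise (the `eps` argument of the generator function) and $y_{\text{obs}}$ denotes an observed price:

$$y_{\text{obs}} = f(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2).$$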

In [None]:
# Import the required Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression

from sklearn.preprocessing import PolynomialFeatures
import sklearn.preprocessing as sk

Let's define a function that generates a dataset.

In [None]:
def generate_dataset(a, b, c, size: int, eps: float=2):
    """
    Input:
    a, b, c: the coefficients of our quadratic equation
    size:    size of the dataset
    eps:     standard deviation (spread or "width") of the noise

    Output:
    X:       array of independent variables
    y:       array of dependent variables
    e:       error term, i.e. noise
    """
    
    # define a quadratic equation f(x)
    f = lambda x: a*x**2 + b*x + c
    
    # define X axis
    X = np.linspace(0.1, 10, num=size)
    
    # define Y axis
    y = [f(i) for i in X]
    
    # generate the normally distributed noise
    e = np.random.normal(0, eps, size=(size))
    
    return np.array(X), np.array(y), e

In [None]:
# define the true parameters for quadratic equation
a = 4
b = 1
c = 2

# generate the dataset
X, y_true, e = generate_dataset(a, b, c, size=20, eps=10)

# create true observations by adding noise
y_obs = y_true + e

plt.figure(figsize=(10, 10))
plt.scatter(X, y_obs, s=20)
plt.plot   (X, y_true, 'g')
plt.xlabel("$x$ [meters]",   fontsize=40)
plt.ylabel("$y$ [currency]", fontsize=40)
plt.grid()
#plt.savefig("p_original.png")

In [None]:
type(X), type(y_obs), type(e)

$\bullet$ Let's select a model that we want to fit to the above dataset:

$$\hat{f_1}(x) = \theta_0 + \theta_1 \cdot x.$$

In [None]:
# defining linear regression model
model = LinearRegression(fit_intercept=False)  #fit_intercept=False means disabling bias term


$\bullet$ Therefore, our **regression model** involves the following **components**:
- The **unknown parameters**, denoted as a vector $\theta$;
- The **independent variables**, denoted as an array $X$;
- The **dependent variable**, denoted as an array $y$;
- The **error term**, defined as noise and denoted as an array $e$.
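
As a rough illustration of how these components fit together, the sketch below builds $y = X\theta + e$ from hypothetical parameters (`theta_true` and the noise level are assumptions made only for this example) and recovers $\theta$ by least squares, once with NumPy and once with Scikit-Learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)            # fixed seed, chosen only for reproducibility

x = np.linspace(0.1, 10, num=20)          # independent variable
X1 = np.c_[np.ones(x.shape[0]), x]        # design matrix: column of ones for theta_0, then x
theta_true = np.array([2.0, 3.0])         # hypothetical parameters for this illustration
e = rng.normal(0, 1.0, size=x.shape[0])   # error term (noise)
y = X1 @ theta_true + e                   # dependent variable

# least-squares estimate of theta computed directly with NumPy
theta_lstsq, *_ = np.linalg.lstsq(X1, y, rcond=None)

# the same estimate via Scikit-Learn (the bias is already a column of X1)
model = LinearRegression(fit_intercept=False).fit(X1, y)

print("NumPy lstsq: ", theta_lstsq)
print("Scikit-Learn:", model.coef_)
```

Both estimates should agree up to numerical precision, since `LinearRegression` solves the same least-squares problem.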

In [None]:
# add dummy column for the bias
X1 = np.c_[np.ones(X.shape[0]), X]
X1[0:5]

In [None]:
# fitting the model using Ordinary Least Squares
model.fit(X1, y_obs) 

# display learned weights: 𝜃_0 and 𝜃_1
model.coef_

In [None]:
# generate predictions
y_pred = model.predict(X1) 

In [None]:
# plot learned model

plt.figure(figsize=(10, 10))
plt.scatter(X, y_obs, s=20)
plt.plot   (X, y_true, 'g')
plt.plot   (X, y_pred, 'r')
plt.xlabel("$x$", fontsize=40)
plt.ylabel("$y$", fontsize=40)
plt.grid()
#plt.savefig("p_1.png")

$\textbf{Question}$: Is it a **good** fit or **bad** fit, and **Why**?

&ensp; This model is too simple and doesn't have enough **power** to capture the quadratic dependence.

&ensp; Mathematically speaking, a linear function cannot approximate a quadratic one well over this range.

$\bullet$ Let's introduce additional features to help the model. We are going to use **polynomial features** up to the **10th degree**:

$$\hat{f_{10}}(x)=\theta_0 + \theta_1 \cdot x + \theta_2 \cdot x^2 + \theta_3 \cdot x^3 + \theta_4 \cdot x^4 + \theta_5 \cdot x^5 + \theta_6 \cdot x^6 + \theta_7 \cdot x^7 + \theta_8 \cdot x^8 + \theta_9 \cdot x^9 +\theta_{10} \cdot x^{10}.$$

$\textbf{Note}$: even though we add polynomial features, the **model stays linear**, because linearity refers to the weights $\theta$ and not to the features.
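
A small sketch of this point, using a hypothetical weight vector `theta` and a cubic instead of the 10th-degree polynomial to keep the numbers readable: the features are nonlinear in $x$, but the prediction is a plain dot product with the weights.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# expand the single value x = 2 into the features 1, x, x^2, x^3
poly3 = PolynomialFeatures(degree=3, include_bias=True)
features = poly3.fit_transform(np.array([[2.0]]))
print(features)                             # [[1. 2. 4. 8.]]

theta = np.array([1.0, 0.5, -2.0, 0.25])    # hypothetical weights, for illustration only
prediction = features @ theta               # 1*1 + 0.5*2 - 2*4 + 0.25*8
print(prediction)                           # [-4.]

# linear in the weights: doubling theta doubles the prediction
print(features @ (2 * theta))               # [-8.]
```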

Let's use the **Scikit-Learn** library, which helps with generating new features:

`from sklearn.preprocessing import PolynomialFeatures`

In [None]:
# define the helper class
poly = PolynomialFeatures(degree=10, include_bias=True)

# create features
X2 = poly.fit_transform(X.reshape(-1, 1))
X2[:1]

In [None]:
model = LinearRegression(fit_intercept=False) 

model.fit(X2, y_obs) # fitting again with new feature set
model.coef_

In [None]:
y_pred = model.predict(X2)

In [None]:
plt.figure(figsize=(10, 10))
plt.scatter(X, y_obs, s=20)
plt.plot   (X, y_true, 'g')
plt.plot   (X, y_pred, 'r')
plt.xlabel("$x$", fontsize=40)
plt.ylabel("$f(x)$", fontsize=40)
plt.grid()
#plt.savefig("p_10.png")


$\textbf{Question}$: Is it a **good** fit or **bad** fit, and **Why**?

$\textbf{Exercise 1}$. **Generate several samples** and **fit the model**. Explain the observations!

**Conclusions**:

 - **Very simple** models have **poor performance** due to a lack of expressive power to learn the data distribution.
 - **Very complex** models have **poor performance** due to excessive expressive power, which leads to **fitting the noise** instead of the real data.
 - In other words, the **first model** **underfits** and the **second model** **overfits** our dataset.
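
One way to see these conclusions numerically is to compare, for several degrees, the error on the noisy observations with the error against the noise-free curve. The sketch below is self-contained and regenerates a dataset with the same settings as above; the seed and the dictionary names `train_mse` and `true_mse` are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)              # fixed seed, chosen only for reproducibility

# same quadratic and noise level as in the seminar
a, b, c = 4, 1, 2
x = np.linspace(0.1, 10, num=20)
y_true = a * x**2 + b * x + c
y_obs = y_true + rng.normal(0, 10, size=x.shape[0])

train_mse = {}   # error measured on the noisy observations
true_mse = {}    # error measured against the noise-free curve
for degree in (1, 2, 10):
    poly = PolynomialFeatures(degree=degree, include_bias=True)
    features = poly.fit_transform(x.reshape(-1, 1))
    model = LinearRegression(fit_intercept=False).fit(features, y_obs)
    y_pred = model.predict(features)
    train_mse[degree] = np.mean((y_pred - y_obs) ** 2)
    true_mse[degree] = np.mean((y_pred - y_true) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:10.2f}  "
          f"MSE vs true curve={true_mse[degree]:10.2f}")
```

The degree-1 model should show a large error on both measures (underfitting). A high-degree model typically lowers the training error further while moving away from the true curve, which is the signature of fitting the noise.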

$\textbf{Exercise 2}$. **Write the code** that **fits** our data with a **quadratic polynomial model**. Plot the results.

In [None]:
# Write me

- For a given **single data point** $x_0$, let's calculate the following quantities using the **first model**:
 - **True Value** 
 - **Predicted Value**
 - **Bias**
 - **Variance**

- For this, let's first **write a function** that **generates** the dataset, **fits** the model and returns the **true and predicted values**.
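
For reference, the standard definitions of these quantities can be written as follows, assuming $\hat{f}(x_0)$ denotes the prediction at $x_0$ of a model fitted on a randomly generated dataset and the expectation $\mathbb{E}$ is taken over such datasets:

$$\text{Bias}(x_0) = \mathbb{E}\big[\hat{f}(x_0)\big] - f(x_0), \qquad \text{Variance}(x_0) = \mathbb{E}\Big[\big(\hat{f}(x_0) - \mathbb{E}\big[\hat{f}(x_0)\big]\big)^2\Big].$$

In practice, both expectations are approximated by averaging over many independently generated datasets.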

In [None]:
def generate_and_predict(x, poly):
    """
    Input:
    x:        selected data point value
    poly:     polynomial feature transformer (e.g. PolynomialFeatures)

    Output:
    true_val: true value for the selected data point x
    pred_val: predicted value

    Note: uses the global coefficients a, b, c defined above.
    """

    # generate the dataset
    X, y_true, e = generate_dataset(a, b, c, size=20, eps=10)

    # create noisy observations by adding noise to the true values
    y_obs = y_true + e 
        
    # augment the dataset with selected model features
    X1 = poly.fit_transform(X.reshape(-1, 1))

    # define linear regression model
    model = LinearRegression(fit_intercept=False) 
    
    # fit the model using Ordinary Least Squares
    model.fit(X1, y_obs) 
    
    # true value of the quadratic at x
    true_val = a * x**2 + b * x + c

    # predicted value at x, using the already fitted feature transformer
    pred_val = model.predict(poly.transform([[x]]))[0]
    
    return true_val, pred_val

$\textbf{Exercise 3}$. **Write the code** that trains the model on **different datasets** and calculates the **expected prediction** and its **fluctuation**.

In other words, you need to:

- select the data point on which you will measure prediction quality and fluctuations:  `x0 = 5`
- select the polynomial function, for example of degree 10: `poly = ...`
- train model on different datasets using the **generate_and_predict** method: `N = 1000`
- save all the predicted values in an array: `y_preds`
- get the **True Value**: `true_val`
- get the **Average Estimate**: `mean_val`
- get the **Bias**: `bias`
- get the **Variance**: `variance`
- fill the code below, print it and plot the distributions
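
One possible way to organise this computation is sketched below. It is self-contained and does not reuse `generate_and_predict`; the helper name `bias_variance_at`, the seed and the number of runs are assumptions made for this example, not part of the seminar code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def bias_variance_at(x0, degree, n_runs=300, seed=0):
    """Hypothetical helper: refit on n_runs fresh noisy datasets and
    summarise the predictions at the single point x0."""
    rng = np.random.default_rng(seed)
    a, b, c = 4, 1, 2                                   # same quadratic as in the seminar
    x = np.linspace(0.1, 10, num=20)
    y_clean = a * x**2 + b * x + c

    poly = PolynomialFeatures(degree=degree, include_bias=True)
    features = poly.fit_transform(x.reshape(-1, 1))
    x0_features = poly.transform([[x0]])

    y_preds = np.empty(n_runs)
    for i in range(n_runs):
        # a new noise realisation gives a new dataset
        y_obs = y_clean + rng.normal(0, 10, size=x.shape[0])
        model = LinearRegression(fit_intercept=False).fit(features, y_obs)
        y_preds[i] = model.predict(x0_features)[0]

    true_val = a * x0**2 + b * x0 + c
    mean_val = y_preds.mean()                           # average estimate
    bias = mean_val - true_val                          # systematic error at x0
    variance = y_preds.var()                            # spread of the predictions at x0
    return true_val, mean_val, bias, variance

for degree in (1, 2, 10):
    true_val, mean_val, bias, variance = bias_variance_at(x0=5, degree=degree)
    print(f"degree={degree:2d}  true={true_val}  mean={mean_val:.2f}  "
          f"bias={bias:.2f}  variance={variance:.2f}")
```

With these settings, the linear model should show a large bias at $x_0 = 5$, while the quadratic model should be close to unbiased.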

In [None]:
# fill me
true_val = 0
mean_val = 0
bias     = 0
variance = 0  

print('True Value: ',       true_val)
print('Average Estimate: ', mean_val)
print('Bias: ',             bias)
print('Variance: ',         variance)

In [None]:
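# fill me: y_preds must hold the predicted values from Exercise 3

fig, ax = plt.subplots(1, 2, figsize=(15, 5))
# histplot replaces distplot, which is deprecated in recent seaborn versions
sns.histplot(y_preds, kde=True, ax=ax[0])
ax[1].scatter([0]*len(y_preds), y_preds, s=0.1)
# mark the true value at x0 (a scalar), not the whole y_true array
ax[1].scatter([0], [true_val], s=100)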

$\textbf{Exercise 4}$. **Plot** similar **distributions** for the **first-degree polynomial function**. Explain both observations!


- Let's add a **regularization term** by importing the **Ridge** and **Lasso** regressions from the Scikit-Learn library.
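
For orientation, the objectives minimised by these two estimators, as stated in the Scikit-Learn documentation, are shown below, where $n$ is the number of samples and $\alpha$ is the regularization strength passed as `alpha`:

$$\text{Ridge:}\quad \min_{\theta}\; \lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_2^2, \qquad\qquad \text{Lasso:}\quad \min_{\theta}\; \frac{1}{2n} \lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_1.$$

Setting $\alpha = 0$ recovers Ordinary Least Squares in both cases; larger values of $\alpha$ penalise large weights more strongly.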

In [None]:
from sklearn.linear_model import Ridge, Lasso

# define the regularization strength alpha (often written as lambda)
alpha = 100

# define linear regression model with regularization term
model_ridge = Ridge(alpha=alpha, fit_intercept=False) 
model_lasso = Lasso(alpha=alpha, fit_intercept=False) 

poly = PolynomialFeatures(degree=6, include_bias=True)

X2 = poly.fit_transform(X.reshape(-1, 1))

# fit the regularized models
model_ridge.fit(X2, y_obs)
model_lasso.fit(X2, y_obs)

yr_pred = model_ridge.predict(X2)
yl_pred = model_lasso.predict(X2)


print("\nSquared L2 norm of the Ridge model weights:", np.sum(model_ridge.coef_**2))
print('weights:')
print('\n'.join('𝜽%d=\t%.3f' % (i, theta) for i, theta in enumerate(model_ridge.coef_)))

print("\nSquared L2 norm of the Lasso model weights:", np.sum(model_lasso.coef_**2))
print('weights:')
print('\n'.join('𝜽%d=\t%.3f' % (i, theta) for i, theta in enumerate(model_lasso.coef_)))

plt.figure(figsize=(10, 10))
plt.scatter(X, y_obs, s=20)
plt.plot   (X, y_true, 'g')
plt.plot   (X, yr_pred, 'r')
plt.plot   (X, yl_pred, 'b')
plt.xlabel("$x$ [meters]",   fontsize=40)
plt.ylabel("$y$ [currency]", fontsize=40)
plt.grid()
#plt.savefig("p_regularization.png")
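
To get a feeling for the effect of `alpha`, the following sketch fits Ridge models with increasing regularization strength and prints the squared L2 norm of the weights. It is self-contained; the seed, the list of `alpha` values and the rescaling of $x$ to $[0, 1]$ (used here only to keep the polynomial features well-conditioned) are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)              # fixed seed, chosen only for reproducibility

# same quadratic and noise level as in the seminar
x = np.linspace(0.1, 10, num=20)
y_obs = 4 * x**2 + x + 2 + rng.normal(0, 10, size=x.shape[0])

# assumption: x is divided by 10 so that all polynomial features stay in [0, 1]
poly = PolynomialFeatures(degree=6, include_bias=True)
features = poly.fit_transform((x / 10).reshape(-1, 1))

sq_norms = {}
for alpha in (0.001, 1.0, 100.0):
    ridge = Ridge(alpha=alpha, fit_intercept=False).fit(features, y_obs)
    sq_norms[alpha] = np.sum(ridge.coef_ ** 2)
    print(f"alpha={alpha:7.3f}  squared L2 norm of weights={sq_norms[alpha]:.3f}")
```

The norm should shrink as `alpha` grows, because the penalty term makes large weights increasingly expensive.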

$\textbf{Exercise 5}$. **Try** polynomial functions with **different degrees**, for example `degree = 5`, `degree = 10`, `degree = 20`. Explain the observations!

<h1 align="center">End of Seminar</h1>