# Linear Regression

Let's assume that we have $N$ observations and $M$ features. The problem of linear regression is defined as:

$$\vec{y}_p = \boldsymbol{X} \vec{w},$$
* $\vec{y}_p$, a vector of size $N$, represents our predictions. 
* $\boldsymbol{X}$ is the feature matrix of shape $(N\times M)$.
* $\vec{w}$, a vector of size $M$, holds the fitting parameters; it is our job to find them.

To measure how good our predictions are, we use the Mean Squared Error (MSE), which quantifies the distance of our predictions from the true values:

$$J = \frac{1}{N} \sum_{i=1}^{N} \left(y_{p,i} - y_{t,i}\right)^2 = \frac{1}{N} \lVert \vec{y}_p - \vec{y}_t \rVert^2$$

* $\vec{y}_t$ is a vector of true values. 
* $J$ is the MSE; it is a scalar.
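
As a minimal sketch of the two definitions above, the snippet below forms predictions as $\boldsymbol{X}\vec{w}$ and then averages the squared residuals. It assumes that "mean" refers to an average over the $N$ observations; the name `mse_sketch` and the sample numbers are hypothetical and only serve the illustration.

```python
import numpy as np

def mse_sketch(y_t, y_p):
    """Mean of the squared residuals (assumed convention: average over N)."""
    y_t = np.asarray(y_t, dtype=float)
    y_p = np.asarray(y_p, dtype=float)
    return float(np.mean((y_p - y_t) ** 2))

# Hypothetical toy problem: N = 3 observations, M = 2 features.
X = np.array([[-1.0, -1.0], [1.0, 1.0], [0.0, 1.0]])
w = np.array([0.5, 0.5])
y_t = np.array([-1.0, 1.0, 1.0])

y_p = X @ w                # predictions, shape (N,)
J = mse_sketch(y_t, y_p)   # a single scalar
```

Only the third observation is mispredicted here (by 0.5), so $J = 0.25 / 3$.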

Our job now is to find the $\vec{w}$ that minimizes the cost function $J$. Here we use Stochastic Gradient Descent (SGD) to do that.
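
For reference, here is a sketch of the two quantities gradient descent needs. It assumes $J$ is the squared error averaged over the $N$ observations, and it writes $\eta$ for the learning rate (the `eta` parameter in the docstrings further down):

$$\nabla_{\vec{w}} J = \frac{2}{N} \boldsymbol{X}^\top \left(\boldsymbol{X}\vec{w} - \vec{y}_t\right), \qquad \vec{w} \leftarrow \vec{w} - \eta \, \nabla_{\vec{w}} J$$

In the stochastic (mini-batch) variant, each update evaluates this gradient on a randomly drawn subset of the rows of $\boldsymbol{X}$ rather than on all $N$ rows.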



In [3]:
def mse(y_t, y_p):
    '''Return the mean squared error.

    Parameters
    ----------
    y_t : numpy array, shape (n_samples,)
        True labels.
    y_p : numpy array (float), shape (n_samples,)
        Output of classifier (not labels).
    
    Returns
    -------
    float
        Returns MSE.
    
    Example
    -------
    >>> import numpy as np
    >>> t = np.array([-1,1,1])
    >>> y = np.array([1,-1,0])
    >>> mse(t, y)
    3.0
    '''

In [6]:
def d_mse(X, y_t, y_p):
    '''
    Gradient of mean squared error. 

    Parameters
    ----------
    X : numpy array, shape (n_samples, n_features)
        Matrix of features.
    y_t : numpy array, shape (n_samples,)
        True labels.
    y_p : numpy array (float), shape (n_samples,)
        Output of classifier (not labels).
    
    Returns
    -------
    numpy array, shape (n_features,)
        Returns the gradient of MSE.
    
    Example
    -------
    >>> import numpy as np
    >>> X = np.array([[-1,-1],[1,1],[0,1]])
    >>> y_t = np.array([-1,1,1])
    >>> y_p = np.array([1,0,1])
    >>> d_mse(X, y_t, y_p)
    array([?, ?])
    '''
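
One possible way to fill in the gradient stub is sketched below. It assumes the cost is the squared error averaged over the samples and that `y_p` was produced as `X @ w`, so the gradient with respect to `w` is `2/N * X.T @ (y_p - y_t)`. The name `d_mse_sketch` is hypothetical; other normalizations (for example dropping the factor of 2) are equally common and would rescale the result.

```python
import numpy as np

def d_mse_sketch(X, y_t, y_p):
    """Gradient of mean((y_p - y_t)**2) w.r.t. w, assuming y_p = X @ w."""
    X = np.asarray(X, dtype=float)
    residual = np.asarray(y_p, dtype=float) - np.asarray(y_t, dtype=float)
    return 2.0 / X.shape[0] * (X.T @ residual)

# Inputs taken from the docstring example above.
X = np.array([[-1, -1], [1, 1], [0, 1]])
y_t = np.array([-1, 1, 1])
y_p = np.array([1, 0, 1])
grad = d_mse_sketch(X, y_t, y_p)
```

Under this convention the residual is `[2, -1, 0]`, `X.T @ residual` is `[-3, -3]`, and `grad` evaluates to `[-2., -2.]`.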

In [2]:
def l2(w):
    '''
    Return l2 penalty. 
    w : weights of the model, a numpy vector of (n_features)
    '''

def d_l2(w):
    '''
    Return gradient of l2 penalty.
    w : weights of the model, a numpy vector of (n_features)
    '''
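
A minimal sketch of the two penalty stubs follows, assuming the l2 penalty is the plain sum of squared weights (some texts include a factor of 1/2, which would turn the gradient into `w`). The names `l2_sketch` and `d_l2_sketch` are hypothetical.

```python
import numpy as np

def l2_sketch(w):
    """l2 penalty under the assumed convention: sum of squared weights."""
    w = np.asarray(w, dtype=float)
    return float(np.sum(w ** 2))

def d_l2_sketch(w):
    """Gradient of sum(w**2) with respect to w."""
    return 2.0 * np.asarray(w, dtype=float)

w = np.array([3.0, -4.0])
penalty = l2_sketch(w)          # 9 + 16 = 25
penalty_grad = d_l2_sketch(w)   # [6, -8]
```

In the regularized cost the penalty would be scaled by the regularization parameter `alpha` before being added to the MSE.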


In [None]:
def GD(X, y_t, y_p, eta=0.0001, alpha=1.0, b=100, epoches=1000, normalize=True, tol=0.00001):
    '''
    Gradient Descent learning. 
    
    The default loss function is MSE and the default penalty is l2. 
    
    If the test set is provided, it keeps running until the cost stops decreasing.
    If the test set is not provided, it keeps running until the improvement in the cost is less than **tol**.
    
    Parameters
    ----------
    eta: float
        Learning rate, defaults to 0.0001.
        
    alpha: float
        Regularization parameter, defaults to 1.0
        
    b: int 
        batch size, defaults to 100.
    
    epoches: int
        Number of epochs to train, defaults to 1000.
    
    normalize: bool
        True: normalize data by mean and std (default).
        False: do not change the data.
        
    tol: float 
        Stop training if the improvement in the cost function is less than tol (when the test set is not provided).
        Defaults to 0.00001.

    Returns
    -------
    numpy array, shape (n_features,)
        Returns the weights (fitting parameters).
        
    Example
    -------
    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    ... (write your test here.)
    '''
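
The full-batch loop described in this docstring could be sketched as follows. This is only one possible reading: it assumes the cost is MSE plus `alpha` times the sum of squared weights, it implements only the `tol` stopping rule (no test set), and it skips normalization. The name `gd_sketch`, the simplified signature, and the larger demo learning rate are assumptions made for the illustration.

```python
import numpy as np

def gd_sketch(X, y_t, eta=0.1, alpha=0.0, epoches=1000, tol=1e-10):
    """Full-batch gradient descent on MSE + alpha * sum(w**2)."""
    X = np.asarray(X, dtype=float)
    y_t = np.asarray(y_t, dtype=float)
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    prev_cost = np.inf
    for _ in range(epoches):
        residual = X @ w - y_t
        cost = np.mean(residual ** 2) + alpha * np.sum(w ** 2)
        # Stop once the improvement in the cost falls below tol.
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        grad = 2.0 / n * (X.T @ residual) + 2.0 * alpha * w
        w = w - eta * grad
    return w

# Noise-free data from y = x + 1; the column of ones carries the intercept.
x = np.linspace(-2.0, 2.0, 21)
X = np.column_stack([x, np.ones_like(x)])
w = gd_sketch(X, x + 1.0)   # approaches [slope, intercept] = [1, 1]
```

The demo sets `alpha=0.0` because any positive penalty shrinks the weights and would pull the fit away from the true slope and intercept.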

In [8]:
def SGD(X, y_t, y_p, eta=0.0001, alpha=1.0, b=100, epoches=1000, normalize=True, tol=0.00001):
    '''
    mini-batch Stochastic Gradient Descent learning. 
    
    The default loss function is MSE and the default penalty is l2. 
    
    If the test set is provided, it keeps running until the cost stops decreasing.
    If the test set is not provided, it keeps running until the improvement in the cost is less than **tol**.
    
    Parameters
    ----------
    eta: float
        Learning rate, defaults to 0.0001.
        
    alpha: float
        Regularization parameter, defaults to 1.0
        
    b: int 
        batch size, defaults to 100.
    
    epoches: int
        Number of epochs to train, defaults to 1000.
    
    normalize: bool
        True: normalize data by mean and std (default).
        False: do not change the data.
        
    tol: float 
        Stop training if the improvement in the cost function is less than tol (when the test set is not provided).
        Defaults to 0.00001.

    Returns
    -------
    numpy array, shape (n_features,)
        Returns the weights (fitting parameters).
        
    Example
    -------
    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    ... (write your test here.)
    '''
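
A mini-batch variant might look like the sketch below. It assumes the same cost as the docstring (MSE with an l2 penalty), reshuffles the rows once per epoch, and takes one gradient step per batch of size `b`. For brevity it leaves out normalization and both stopping rules and simply runs for a fixed number of epochs. The name `sgd_sketch`, the `seed` argument, and the demo values of `eta` and `b` are hypothetical.

```python
import numpy as np

def sgd_sketch(X, y_t, eta=0.05, alpha=0.0, b=10, epoches=200, seed=0):
    """Mini-batch SGD on MSE + alpha * sum(w**2), fixed number of epochs."""
    X = np.asarray(X, dtype=float)
    y_t = np.asarray(y_t, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(epoches):
        order = rng.permutation(n)            # reshuffle the rows every epoch
        for start in range(0, n, b):
            idx = order[start:start + b]      # indices of the current batch
            residual = X[idx] @ w - y_t[idx]
            grad = 2.0 / len(idx) * (X[idx].T @ residual) + 2.0 * alpha * w
            w = w - eta * grad
    return w

# Noise-free data from y = x + 1, with a column of ones for the intercept.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=200)
X = np.column_stack([x, np.ones_like(x)])
w = sgd_sketch(X, x + 1.0)   # approaches [slope, intercept] = [1, 1]
```

Because the data here are noise-free, every batch gradient vanishes at the true parameters, so the iterates settle there; with noisy targets a fixed `eta` would leave the weights fluctuating slightly around the minimum.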
    

Now consider a simple linear model, $y = x + 1$. Given $x$, it yields the true value of $y$ (that is, $y_t$).
Generate some $(x, y)$ pairs and use your code to infer the fitting parameters. Plot the ground truth $y = x + 1$ and your fitted model on the interval $x \in [-2, 2]$.

Now generate another set of $(x, y)$ pairs, add normally distributed noise ($\mu = 0$, $\sigma = 0.1$) to $y$, and infer the fitting parameters again. Plot the ground truth and the fitted line.
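
The data for both experiments could be set up along the lines of the sketch below. Note that it does not use gradient descent: NumPy's least-squares solver, `np.linalg.lstsq`, stands in for your own routine so that the resulting numbers can serve as a reference to check your fit against. The sample size, the seed, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample x uniformly on [-2, 2] and build the targets for both experiments.
x = rng.uniform(-2.0, 2.0, size=500)
y_clean = x + 1.0
y_noisy = y_clean + rng.normal(loc=0.0, scale=0.1, size=x.shape)

# Design matrix: one feature column plus a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Reference fits (stand-in for the GD/SGD routines written above).
w_clean, *_ = np.linalg.lstsq(X, y_clean, rcond=None)
w_noisy, *_ = np.linalg.lstsq(X, y_noisy, rcond=None)

# Evaluate the ground truth and the fitted line on the plotting interval.
x_grid = np.linspace(-2.0, 2.0, 101)
y_truth = x_grid + 1.0
y_fit = w_noisy[0] * x_grid + w_noisy[1]
max_gap = float(np.max(np.abs(y_fit - y_truth)))
```

`x_grid`, `y_truth`, and `y_fit` are the arrays one would hand to a plotting call to draw the two lines; `max_gap` gives a quick numerical measure of how far the fitted line strays from the ground truth over $[-2, 2]$.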