# 1. Univariate Regression
### 1.1. Model Definition
The model defines a linear relationship between a single feature and a real-valued label.

$$y=\beta_0+\beta x+\epsilon$$

where 
- $\beta_0$ is the intercept
- $\beta$ is the regression coefficient (the slope)
- $\epsilon \sim N(0,\sigma^2)$ is Gaussian noise with variance $\sigma^2$
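Here is a minimal sketch of sampling from this model with NumPy. It assumes a standard-normal feature distribution and made-up parameter values ($\beta_0=2$, $\beta=3$, $\sigma=0.5$). `simulate_univariate` is a hypothetical helper, not part of the original code. With many samples, the spread of $y$ around the true line should be close to $\sigma$.

```python
import numpy as np

def simulate_univariate(n, beta0, beta, sigma, seed=0):
    """Hypothetical helper: draw n samples from y = beta0 + beta * x + eps.

    eps ~ N(0, sigma^2). The standard-normal feature distribution is an
    assumption made only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                # assumed feature distribution
    eps = rng.normal(0.0, sigma, size=n)  # Gaussian noise with variance sigma^2
    y = beta0 + beta * x + eps
    return x, y

x, y = simulate_univariate(n=10_000, beta0=2.0, beta=3.0, sigma=0.5)
residuals = y - (2.0 + 3.0 * x)  # deviations from the true line are just eps
print(residuals.std())           # close to sigma = 0.5
```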

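### 1.2. MCLE for Univariate Regression (mean-centered)
We fit the model without the intercept by mean-centering both $x$ and $y$ (subtracting their sample means), which makes the least-squares intercept zero. The MCLE (maximum conditional likelihood estimate) maximizes $p(y \mid x, \beta)$. Because $\epsilon$ is Gaussian, this is equivalent to minimizing the sum of squared errors: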

$$\hat{ \beta}_{MCLE}=\underset{\beta}{argmin}\sum_{i=1}^N(y_i - \hat{y}_i)^2$$

where $\hat{y}_i=\beta x_i$

***

This minimization has a closed-form solution:

$$\hat{ \beta}_{MCLE}=\frac{\sum_{i=1}^N x_iy_i}{\sum_{i=1}^N (x_i)^2}$$
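The closed form follows from setting the derivative of the squared-error objective with respect to $\beta$ to zero:

$$\frac{d}{d\beta}\sum_{i=1}^N (y_i-\beta x_i)^2=-2\sum_{i=1}^N x_i\left(y_i-\beta x_i\right)=0 \;\Longrightarrow\; \beta\sum_{i=1}^N x_i^2=\sum_{i=1}^N x_i y_i$$

The second derivative, $2\sum_{i=1}^N x_i^2$, is positive whenever some $x_i \neq 0$. This stationary point is therefore the minimum.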


import numpy as np


class UnivariateRegression:

    def fit(self, X, y, mean_centered=True):
        """
        Learns the regression coefficient beta given X, y.

        Args:
            X: 1D data set (mean-centered)

            y: 1D label (mean-centered)

            mean_centered: True if the data are mean-centered, False otherwise.

        Returns:
            The fitted UnivariateRegression model.
        """
        if mean_centered:
            # Closed-form MCLE: sum(x_i * y_i) / sum(x_i ** 2)
            Xy_sum = np.dot(X.T, y).item(0)
            X_square_sum = np.dot(X.T, X).item(0)
            beta = Xy_sum / X_square_sum
        else:
            raise NotImplementedError('mean_centered=False is not implemented.')

        self.weights = beta

        return self
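The class above returns only the slope. The sketch below shows the full workflow with NumPy arrays. It centers both variables, applies the closed-form slope, and then recovers the intercept from the sample means as $\hat\beta_0=\bar y-\hat\beta\bar x$. The helper name `fit_centered_univariate` and the synthetic data are hypothetical. As a sanity check, the result is compared against `np.polyfit` with degree 1, which fits an ordinary least-squares line with an intercept.

```python
import numpy as np

def fit_centered_univariate(x, y):
    """Hypothetical helper: slope from mean-centered data, then the intercept."""
    x_mean, y_mean = x.mean(), y.mean()
    xc, yc = x - x_mean, y - y_mean         # mean-center feature and label
    beta = np.dot(xc, yc) / np.dot(xc, xc)  # closed-form MCLE on centered data
    beta0 = y_mean - beta * x_mean          # intercept recovered from the means
    return beta0, beta

rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, size=200)
y = 1.5 - 0.8 * x + rng.normal(0, 0.3, size=200)  # illustrative data

beta0, beta = fit_centered_univariate(x, y)
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)  # highest power first
print(beta0, beta)
print(intercept_ref, slope_ref)
```

Centering is what allows the intercept-free fit to agree with a full least-squares line.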



# 2. Multivariate Regression

### 2.1. Model Definition (augmented)
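The model defines a linear relationship between multiple features and a real-valued label. The augmented version folds the intercept $\beta_0$ into the coefficient vector $\beta$ by prepending a constant feature $x_0 = 1$ to every sample. As a result, $X$ has a leading column of ones.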

$$y=X\beta+\epsilon$$

where 
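- $X$ is the $N \times (J+1)$ design matrix, whose first column is all ones
- $\beta$ is the $(J+1)$-dimensional vector of regression coefficients, including the intercept $\beta_0$
- $\epsilon \sim N(0,\sigma^2 I)$ is a vector of independent Gaussian noise terms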
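The augmentation step can be sketched in a few lines of NumPy. The helper name `augment` and the example values are hypothetical. Once the column of ones is prepended, `beta[0]` plays the role of the intercept in a plain matrix product.

```python
import numpy as np

def augment(X):
    """Hypothetical helper: prepend a column of ones so beta[0] is the intercept."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

X = np.array([[2.0, 3.0],
              [4.0, 5.0],
              [6.0, 1.0]])
beta = np.array([10.0, 1.0, -2.0])  # [beta_0, beta_1, beta_2], illustrative
X_aug = augment(X)
print(X_aug)
print(X_aug @ beta)  # equals 10 + X @ [1, -2], i.e. [6, 4, 14]
```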

### 2.2. MCLE for Multivariate Regression (augmented)

$$\hat{ \beta}_{MCLE}=\underset{\beta}{argmin} \left( y-X\beta \right)^T \left( y-X\beta \right)$$


***


When $X^TX$ is invertible (that is, when $X$ has full column rank), the minimization has a closed-form solution given by the normal equations:

$$\hat{ \beta}_{MCLE}=(X^TX)^{-1}X^Ty$$
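Here is a minimal NumPy sketch of the closed-form estimate on synthetic data; the true coefficients are made up for illustration. It solves the normal equations $X^TX\beta=X^Ty$ with `np.linalg.solve` instead of forming the inverse explicitly. The result is checked against `np.linalg.lstsq`, which solves the same least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)
N, J = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, J))])  # augmented design matrix
beta_true = np.array([0.5, 2.0, -1.0, 3.0])                # illustrative values
y = X @ beta_true + rng.normal(0, 0.1, size=N)

# Closed-form MCLE via the normal equations (X^T X) beta = X^T y
beta_mcle = np.linalg.solve(X.T @ X, X.T @ y)

# Reference least-squares solution
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_mcle)
print(np.allclose(beta_mcle, beta_lstsq))
```

Solving the linear system is usually preferred over computing $(X^TX)^{-1}$ because it is cheaper and numerically more stable.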

### 2.3. MAP Estimation for Multivariate Regression (mean-centered) (ridge)
Ridge regression is MAP estimation with a zero-mean Gaussian prior $\beta \sim N(0,\sigma_0^2 I)$ on the coefficients. It is especially useful when $N<J$, where $N$ is the number of samples and $J$ is the number of features. In that case the closed-form MCLE cannot be used: $X^TX$ has rank at most $N<J$, so it is not invertible. No prior is placed on the intercept, so we assume that the data have been mean-centered.
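The rank argument can be checked numerically. The sketch below uses a random matrix with made-up dimensions $N=5$ and $J=10$. `np.linalg.matrix_rank` reports the numerical rank of the $J \times J$ matrix $X^TX$, which cannot exceed $N$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J = 5, 10                 # fewer samples than features
X = rng.normal(size=(N, J))
gram = X.T @ X               # J x J matrix X^T X

print(gram.shape)                   # (10, 10)
print(np.linalg.matrix_rank(gram))  # at most N = 5, so X^T X is singular
```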

$$\hat{ \beta}_{MAP}=\underset{\beta}{argmin} \left( \left( y-X\beta \right)^T \left( y-X\beta \right)+\lambda\beta^T\beta\right)$$

where 
- $\lambda$ is a regularization parameter, $\lambda=\frac{\sigma^2}{\sigma_0^2}$
- $\sigma^2$: variance of the noise $\epsilon$
- $\sigma_0^2$: variance of the zero-mean Gaussian prior on each coefficient $\beta_j$
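The link between $\lambda$ and the two variances comes from the negative log-posterior. Take the likelihood $y \mid X,\beta \sim N(X\beta,\sigma^2 I)$ and the prior $\beta \sim N(0,\sigma_0^2 I)$, and drop constants that do not depend on $\beta$:

$$-\log p(\beta \mid X, y)=\frac{1}{2\sigma^2}\left(y-X\beta\right)^T\left(y-X\beta\right)+\frac{1}{2\sigma_0^2}\beta^T\beta+\text{const}$$

Multiplying by $2\sigma^2$ does not change the minimizer. It gives the ridge objective with $\lambda=\sigma^2/\sigma_0^2$.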

***

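Setting the gradient of the objective to zero gives a closed-form solution. For any $\lambda>0$, $X^TX+\lambda I$ is positive definite and therefore invertible, even when $N<J$: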

$$\hat{ \beta}_{MAP}=\left( X^TX + \lambda I \right)^{-1} X^Ty$$
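A minimal sketch that checks this closed form on a small $N<J$ problem with random, mean-centered data; the sizes and $\lambda=0.5$ are illustrative. `ridge_closed_form` is a hypothetical helper. At the MAP estimate, the gradient of the ridge objective, $-2X^T(y-X\beta)+2\lambda\beta$, should vanish.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Hypothetical helper: (X^T X + lam * I)^{-1} X^T y via a linear solve."""
    J = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ y)

rng = np.random.default_rng(4)
N, J = 5, 10                  # N < J: X^T X alone is singular
X = rng.normal(size=(N, J))
X -= X.mean(axis=0)           # mean-center features
y = rng.normal(size=N)
y -= y.mean()                 # mean-center labels

lam = 0.5
beta_map = ridge_closed_form(X, y, lam)

# Gradient of the ridge objective: -2 X^T (y - X beta) + 2 lam beta
grad = -2 * X.T @ (y - X @ beta_map) + 2 * lam * beta_map
print(np.abs(grad).max())     # numerically zero at the MAP estimate
```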



import numpy as np


class RidgeRegression:

    def __init__(self, l):
        '''
        Args:
            l: regularization parameter lambda
        '''
        self.l = l

    def fit(self, X, y):
        """
        Learns multivariate regression coefficients using ridge regression.

        Args:
            X: multivariate data set of shape (N, J), mean-centered

            y: 1D label of length N, mean-centered

        Returns:
            The fitted RidgeRegression model.
        """
        cols = X.shape[1]

        # Left-hand side of the regularized normal equations: X^T X + lambda * I
        A = X.T @ X + self.l * np.identity(cols)
        # Right-hand side: X^T y
        b = X.T @ y
        # Solve A beta = b directly instead of forming the inverse,
        # which is cheaper and numerically more stable
        self.weights = np.linalg.solve(A, b)

        return self
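The sketch below shows how $\lambda$ trades data fit for shrinkage on synthetic mean-centered data; all values are illustrative. It repeats the linear solve from `RidgeRegression.fit` in a hypothetical standalone helper so that it runs on its own. As $\lambda$ grows, the norm of the estimated coefficient vector decreases toward zero.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Hypothetical helper: same closed-form solve as RidgeRegression.fit."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)                                  # mean-center features
y = X @ np.array([3.0, -2.0, 0.0, 1.0]) + rng.normal(0, 0.5, size=50)
y -= y.mean()                                        # mean-center labels

lams = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(ridge_weights(X, y, lam)) for lam in lams]
for lam, nrm in zip(lams, norms):
    print(f"lambda={lam:>7}: ||beta|| = {nrm:.3f}")
```

With $\lambda=0$ this reduces to the MCLE, which is only valid here because $N=50>J=4$.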


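### 2.4. MAP Estimation for Multivariate Regression (lasso)
Also known as lasso regression, this is a sparse regression method: the $L_1$ penalty tends to set some coefficients exactly to zero. As with ridge, the intercept is not penalized, so we assume mean-centered data. Unlike ridge, the objective is not differentiable at $\beta_j=0$ and has no closed-form solution in general, so it is usually solved iteratively.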

$$\hat{ \beta}_{MAP}=\underset{\beta}{argmin} \left( \left( y-X\beta \right)^T \left( y-X\beta \right)+\lambda \sum_{j=1}^J |\beta_j|\right)$$

where 
- $\lambda$ is a regularization parameter. It corresponds to an independent zero-mean Laplace prior $p(\beta_j) \propto \exp(-|\beta_j|/b)$ on each coefficient, with $\lambda=\frac{2\sigma^2}{b}$
- $\sigma^2$: variance of the noise $\epsilon$
- $b$: scale of the Laplace prior (its variance is $2b^2$)
- $|\beta_j|$: absolute value of the $j$-th coefficient
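One common way to solve the lasso objective is cyclic coordinate descent with soft-thresholding. This technique is a swapped-in choice, not necessarily the author's method, and the sketch below makes several assumptions: mean-centered data, no all-zero columns, a fixed iteration count, and made-up synthetic data. Minimizing the objective over a single $\beta_j$ gives $\beta_j = S(\rho_j, \lambda/2)/\lVert X_j\rVert^2$. Here $\rho_j = X_j^T r_j$ is computed from the partial residual $r_j$, and $S(a,t)=\operatorname{sign}(a)\max(|a|-t,0)$. The threshold is $\lambda/2$ because the objective above has no factor of $\tfrac{1}{2}$ on the squared error.

```python
import numpy as np

def soft_threshold(a, t):
    """S(a, t) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Sketch: minimize (y - X b)^T (y - X b) + lam * sum_j |b_j|.

    Assumes mean-centered X and y and no all-zero columns in X.
    """
    N, J = X.shape
    beta = np.zeros(J)
    col_sq = (X ** 2).sum(axis=0)  # ||X_j||^2 for each column
    for _ in range(n_iters):
        for j in range(J):
            # Partial residual with feature j's contribution added back
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            # No 1/2 factor in the objective, so the threshold is lam / 2
            beta[j] = soft_threshold(rho, lam / 2) / col_sq[j]
    return beta

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 6))
X -= X.mean(axis=0)
beta_true = np.array([4.0, 0.0, -3.0, 0.0, 0.0, 2.0])  # illustrative sparse truth
y = X @ beta_true + rng.normal(0, 0.5, size=100)
y -= y.mean()

beta_lasso = lasso_coordinate_descent(X, y, lam=80.0)
print(np.round(beta_lasso, 3))  # irrelevant features are typically driven to exactly 0
```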
