# Assignment 2

### Extra Task
Implement matrix multiplication of two matrices
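
As a starting point, here is a minimal sketch of matrix multiplication on plain Python lists of rows. The function name `matmul` and the list-of-rows representation are assumptions made for illustration, not something the assignment prescribes.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows.

    `matmul` is a hypothetical helper name used only for this sketch.
    """
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match: (n x k) times (k x m)")
    n, k, m = len(a), len(b), len(b[0])
    # result[i][j] is the dot product of row i of a and column j of b
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

With NumPy arrays, the same product is written `A @ B`.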

### Extra Task
Implement Hadamard product of two matrices
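
The Hadamard product multiplies two matrices of the same shape element by element. A minimal sketch, again assuming matrices stored as lists of rows and using the hypothetical name `hadamard`:

```python
def hadamard(a, b):
    """Element-wise product of two equally shaped matrices (lists of rows).

    `hadamard` is a hypothetical helper name used only for this sketch.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    # multiply matching entries, row by row
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

print(hadamard([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[5, 12], [21, 32]]
```

With NumPy arrays, `A * B` is already the element-wise product, which is easy to confuse with the matrix product `A @ B`.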

### Data 

Please use the HW_gender data from Assignment 1. Use a person's weight to predict their height. You can try different variants: one model per gender or one model for the overall data. In the report, please explain why you preferred one variant over the other.

# Linear Regression 

### Implement "loss" function
 

Machine learning algorithms generally rely on minimizing or maximizing a function, which we call the “objective function”. Objective functions that are minimized are called “loss functions”. A loss function measures how well a prediction model predicts the expected outcome.

No single loss function works for all kinds of data. The choice depends on a number of factors, including the presence of outliers, the choice of machine learning algorithm, the time efficiency of gradient descent, the ease of finding the derivatives, and the confidence of predictions.

Loss functions can be broadly categorized into 2 types: Classification and Regression Loss.
<img src="loss.png" />
Our area of interest in this homework is **Regression Loss**, and only 2 regression loss functions are covered here.

### 1. Mean Square Error, Quadratic loss, L2 Loss
is the most commonly used regression loss function. MSE is the mean of the squared differences between the target values and the predicted values.
<img src="mse.png" />

In [1]:
import numpy as np
def mse(true, pred):
    """
    true: array of true values    
    pred: array of predicted values
    
    returns: mean square error loss
    """
    
    return np.mean((np.asarray(true) - np.asarray(pred))**2)

### 2. Mean Absolute Error, L1 Loss
is another loss function used for regression models. MAE is the mean of the absolute differences between the target values and the predicted values. It measures the average magnitude of the errors in a set of predictions without considering their direction. (If we also consider direction, we get the Mean Bias Error (MBE), which is the mean of the signed residuals/errors.) Like MSE, MAE ranges from 0 to ∞.
<img src="mae.png" />

In [2]:
def mae(true, pred):
    """
    true: array of true values    
    pred: array of predicted values
    
    returns: mean absolute error loss
    """
    
    return np.mean(np.abs(np.asarray(true) - np.asarray(pred)))

### MSE vs. MAE (L2 loss vs L1 loss)
In short, the squared error is easier to optimize because it is differentiable everywhere, while the absolute error is more robust to outliers because it does not square large residuals.
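
To make the outlier claim concrete, the sketch below compares both losses on a small made-up data set before and after one target value is replaced by an outlier. The data values and the function names `mean_squared_error` and `mean_absolute_error` are assumptions chosen for illustration, and both functions use the mean-based definitions.

```python
import numpy as np

def mean_squared_error(true, pred):
    # mean of squared residuals
    return np.mean((true - pred) ** 2)

def mean_absolute_error(true, pred):
    # mean of absolute residuals
    return np.mean(np.abs(true - pred))

# made-up data: every prediction is off by exactly 0.5
true = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.5, 2.5, 3.5, 4.5])

# same predictions, but the last target is now an outlier
true_outlier = np.array([1.0, 2.0, 3.0, 14.0])

print("clean:   MSE =", mean_squared_error(true, pred),
      " MAE =", mean_absolute_error(true, pred))
print("outlier: MSE =", mean_squared_error(true_outlier, pred),
      " MAE =", mean_absolute_error(true_outlier, pred))
```

On this data a single outlier raises the MSE from 0.25 to 22.75, while the MAE only rises from 0.5 to 2.75.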

### Implement "fit" function using gradient descent

We could minimize J(θ) by trial and error: try lots of values and visually inspect the resulting graph. There must be a better way. Cue gradient descent. Gradient descent is a general algorithm for minimizing a function, in this case the Mean Squared Error cost function.

Gradient descent does what we would otherwise do by hand: it changes the theta values, or parameters, bit by bit, until we hopefully arrive at a minimum.

We start by initializing theta0 and theta1 to any two values, say 0 for both, and go from there. Formally, the algorithm is as follows:

<img src="gd.png" />

where α, alpha, is the learning rate, or how quickly we want to move towards the minimum. If α is too large, however, we can overshoot.

<img src="gd2.png" />
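
The update rule can be sketched in a few lines of NumPy. This is one possible implementation under stated assumptions, not the required solution: it assumes the hypothesis h(x) = theta0 + theta1 * x, the cost J = (1 / 2m) * sum((h(x) - y)^2), and a simultaneous update of both parameters. The function name `gradient_descent` and the defaults `alpha=0.1` and `n_iters=2000` are hypothetical choices.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent (a sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    theta0, theta1 = 0.0, 0.0   # start from 0 for both parameters
    history = []                # cost J before each update
    for _ in range(n_iters):
        error = theta0 + theta1 * x - y          # h(x) - y for every sample
        history.append(np.sum(error ** 2) / (2 * m))
        grad0 = np.sum(error) / m                # dJ/dtheta0
        grad1 = np.sum(error * x) / m            # dJ/dtheta1
        # simultaneous update: both gradients use the old thetas
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1, history

theta0, theta1, history = gradient_descent([1, 2, 3], [1, 2.5, 3.5])
print("theta0 =", theta0, "theta1 =", theta1)
print("first cost =", history[0], "last cost =", history[-1])
```

On this three-point data set the parameters should approach the least-squares values of roughly -0.167 and 1.25. The returned `history` list is what a plot of loss over iterations would show, and rerunning with different `alpha` values gives the curves for a learning rate comparison.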


In [3]:
import matplotlib.pyplot as plt
import numpy as np

# original data set
X = [1, 2, 3]
y = [1, 2.5, 3.5]

# slope of best_fit_1 is 0.5
# slope of best_fit_2 is 1.0
# slope of best_fit_3 is 1.5

hyps = [0.5, 1.0, 1.5] 

# multiply the original X values by theta
# to produce hypothesis values for each X
def multiply_matrix(mat, theta):
    mutated = []
    for i in range(len(mat)):
        mutated.append(mat[i] * theta)

    return mutated

# calculate cost by looping over each sample:
# subtract y from hyp(x),
# square the result,
# sum them all together, then divide by 2m
def calc_cost(m, y, hyp):
    total = 0
    for i in range(m):
        squared_error = (hyp[i] - y[i]) ** 2
        total += squared_error
    
    return total * (1 / (2*m))

# calculate cost for each hypothesis
for i in range(len(hyps)):
    hyp_values = multiply_matrix(X, hyps[i])

    print("Cost for ", hyps[i], " is ", calc_cost(len(X), y, hyp_values))

Cost for  0.5  is  1.0833333333333333
Cost for  1.0  is  0.08333333333333333
Cost for  1.5  is  0.25


### Implement "predict" function 

In [None]:
import numpy as np 
import matplotlib.pyplot as plt 
  
def estimate_coef(x, y): 
    # number of observations/points 
    n = np.size(x) 
  
    # mean of x and y vector 
    m_x, m_y = np.mean(x), np.mean(y) 
  
    # calculating cross-deviation and deviation about x 
    SS_xy = np.sum(y*x) - n*m_y*m_x 
    SS_xx = np.sum(x*x) - n*m_x*m_x 
  
    # calculating regression coefficients 
    b_1 = SS_xy / SS_xx 
    b_0 = m_y - b_1*m_x 
  
    return(b_0, b_1) 
  
def plot_regression_line(x, y, b): 
    # plotting the actual points as scatter plot 
    plt.scatter(x, y, color = "m", 
               marker = "o", s = 30) 
  
    # predicted response vector 
    y_pred = b[0] + b[1]*x 
  
    # plotting the regression line 
    plt.plot(x, y_pred, color = "g") 
  
    # putting labels 
    plt.xlabel('x') 
    plt.ylabel('y') 
  
    # function to show plot 
    plt.show() 
  
def main():
    # observations
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
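
The cell above estimates the coefficients and plots the line, but it does not define a separate prediction step. A minimal sketch of one, assuming the coefficients come as a pair `(b_0, b_1)` like the one `estimate_coef` returns; the name `predict` and the example coefficients are hypothetical:

```python
import numpy as np

def predict(x, b):
    """Return predicted responses b_0 + b_1 * x for the inputs x."""
    b_0, b_1 = b
    return b_0 + b_1 * np.asarray(x, dtype=float)

# hypothetical coefficients, chosen only to keep the arithmetic easy to check
b = (1.0, 2.0)
print(predict([0, 1, 2], b))  # [1. 3. 5.]
```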

### Depict the plot of loss over iterations
 

### Choose the "learning rate" value, show the comparison to other values via loss plot
 

### Plot the regression line you have found
 