# $l2 \text{-} norm$ vs $l1 \text{-} norm$
### By Jacob Marshall

# Introduction
This notebook interactively shows the difference between the $l2 \text{-} norm$ and the $l1 \text{-} norm$ in regression by applying both norms in two cases: Case 1, with an outlier in the sample data, and Case 2, with noise in the sample data. The code is not critical to understanding the two methods, but it is needed to generate the interactive widgets.

# Nomenclature
$x_{i}$ and $y_{i} \hspace{10mm}$ Actual recorded values of sample data. <br>
$\hat{x}_{i}$ and $\hat{y}_{i}  \hspace{10mm}$ Input and output values of a model, $f(\hat{x}_{i}) = \hat{y}_{i}$. By convention, the $\hat{}$ symbol signifies that a value is an input to or an output from a model. <br>
$r_{i}  \hspace{22mm}$ A residual, the difference between an actual recorded value and a predicted value. $r_{i} = y_{i} - \hat{y}_{i}$ <br>
$\epsilon \hspace{23mm}$ Model error, calculated by summing a function of each residual (its square or its absolute value, depending on the norm). The objective of regression is to minimize this error.

# Theory
## $l2 \text{-} norm$
It is common in regression to square the residuals before summing them to find the error. The result is called the summed squared error (SSE). The SSE is the square of the $l2 \text{-} norm$ of the residual vector, so minimizing the SSE is equivalent to minimizing the $l2 \text{-} norm$. Anyone doing ordinary least squares is therefore using the $l2 \text{-} norm$, often without realizing it.
$$\epsilon = SSE =\sum^{n-1}_{i=0} r_{i}^2 = \sum^{n-1}_{i=0} (y_{i} - \hat{y}_{i})^2$$
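
As a minimal sketch of the formula above, the snippet below computes the SSE directly and compares it with the squared $l2 \text{-} norm$ returned by `np.linalg.norm`. The arrays `y` and `y_hat` are hypothetical values chosen only for illustration; they are not taken from the cases later in this notebook.

```python
import numpy as np

# Hypothetical recorded values and model outputs (illustration only)
y = np.array([10.0, 12.0, 9.0, 11.0])
y_hat = np.array([10.5, 11.0, 9.5, 10.0])

r = y - y_hat                            # residuals r_i = y_i - y_hat_i
sse = np.sum(r**2)                       # summed squared error
l2_squared = np.linalg.norm(r, 2)**2     # square of the l2-norm of the residual vector

print(sse, l2_squared)                   # the two values agree up to rounding
```

Because squaring is monotonic for non-negative numbers, the parameters that minimize the SSE also minimize the $l2 \text{-} norm$ itself.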

## $l1 \text{-} norm$
An alternative to the $l2 \text{-} norm$ is the $l1 \text{-} norm$, where the absolute value of each residual is taken before summation to find the error. <br>
$$ \epsilon = \sum^{n-1}_{i=0} \begin{vmatrix} r_{i} \end{vmatrix} = \sum^{n-1}_{i=0} \begin{vmatrix} y_{i} - \hat{y}_{i} \end{vmatrix}$$
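
A similar sketch for the $l1 \text{-} norm$: the sum of absolute residuals matches `np.linalg.norm` with order 1. As before, `y` and `y_hat` are hypothetical values used only to illustrate the formula.

```python
import numpy as np

# Hypothetical recorded values and model outputs (illustration only)
y = np.array([10.0, 12.0, 9.0, 11.0])
y_hat = np.array([10.5, 11.0, 9.5, 10.0])

r = y - y_hat                            # residuals r_i = y_i - y_hat_i
l1_error = np.sum(np.abs(r))             # sum of absolute residuals
l1_from_numpy = np.linalg.norm(r, 1)     # l1-norm of the residual vector

print(l1_error, l1_from_numpy)           # the two values agree up to rounding
```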

# Code that is common to both cases

In [18]:
# import modules
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.optimize import minimize
import ipywidgets as widgets

In [19]:
# values for the two norms
l1_norm = 1
l2_norm = 2

# regression (class): perform regression on the model y = mx + b
class regression():
    def __init__(self, y_sample, x_sample, norm):
        self.y_sample = y_sample 
        self.x_sample = x_sample 
        self.norm = norm
        
        parameters_guess = (1,1) # initial guess for m and b in model
        
        self.fit = minimize(self.obj, parameters_guess) # minimize the error (perform regression)
        self.parameters = self.fit.x # store regressed parameters
        self.error = self.obj(self.parameters)
        
        # use the regressed parameters in model
        self.x_model = np.linspace(min(self.x_sample), max(self.x_sample), 100)
        self.y_model = self.model(self.parameters, self.x_model)

    def obj(self, parameters):
        y_hat = self.model(parameters, self.x_sample)
        residual = self.y_sample - y_hat
        residual_normalized = (np.abs(residual))**self.norm
        error = np.sum(residual_normalized)
        return error
    
    def model(self, parameters, x):
        # the model is a straight line
        m, b = parameters
        return m * x + b

# plot_helper (def): help graph the results    
def plot_helper(y_true, x_true, l1, l2, word):
    plt.figure(figsize=(8, 6))
    plt.plot(l1.x_sample, l1.y_sample, 'gx', label='Sample Data with {}'.format(word)) # plot sample data
    plt.plot(x_true, y_true, 'k--', label='True value') # plot true value
    plt.plot(l2.x_model, l2.y_model, 'b-', label='Model using l2-norm') # plot l2-norm
    plt.plot(l1.x_model, l1.y_model, 'r-', label='Model using l1-norm') # plot l1-norm
    plt.grid()
    plt.ylim(0,30)
    plt.legend(bbox_to_anchor=[1.37, 1])
    plt.show()

# Case 1: Outlier in sample data

In [20]:
def outlier(magnitude):
    # create true data
    n = 20 # number of sample points
    y_true = np.ones(n) * 10
    x_true = np.linspace(0,100,n)
    
    # sample data set with an outlier
    y_outlier = y_true.copy()
    y_outlier[n//3] = y_outlier[n//3] * magnitude # add outlier to y sample data
    x_outlier = x_true.copy()
    
    outlier_l1 = regression(y_outlier, x_outlier, l1_norm)
    outlier_l2 = regression(y_outlier, x_outlier, l2_norm)
    
    plot_helper(y_true, x_true, outlier_l1, outlier_l2, 'outlier')
    print('l1 error: {} \t l1 parameters: {}'.format(outlier_l1.error, outlier_l1.parameters))
    print('l2 error: {} \t l2 parameters: {}'.format(outlier_l2.error, outlier_l2.parameters))

In [21]:
widgets.interact(outlier, magnitude=(1.1,3,0.1)) # adjust the magnitude of the outlier


The model using the $l2 \text{-} norm$ is pulled further toward the outlier as the outlier's magnitude increases, because squaring makes a single large residual dominate the error. The model using the $l1 \text{-} norm$ changes little in the presence of an outlier.
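
One way to see why this happens is to simplify the model to a constant, $\hat{y} = b$ (an assumption made only for this sketch; the notebook's model is a straight line). For a constant model, the $l2 \text{-} norm$ error is minimized by the mean of the samples and the $l1 \text{-} norm$ error by their median. The snippet below rebuilds the outlier data with a magnitude of 3 and confirms both minimizers with a simple grid search over hypothetical candidate values of $b$.

```python
import numpy as np

# Same sample data as Case 1, with the outlier magnitude fixed at 3
n = 20
y = np.ones(n) * 10.0
y[n // 3] *= 3.0                      # one sample becomes 30

# Closed-form minimizers for a constant model y_hat = b
b_l2 = np.mean(y)                     # minimizes the sum of squared residuals
b_l1 = np.median(y)                   # minimizes the sum of absolute residuals

# Grid search over candidate values of b (grid chosen for illustration)
candidates = np.linspace(9.0, 13.0, 401)
residuals = y[None, :] - candidates[:, None]
grid_l2 = candidates[np.argmin(np.sum(residuals**2, axis=1))]
grid_l1 = candidates[np.argmin(np.sum(np.abs(residuals), axis=1))]

print(b_l2, grid_l2)                  # mean is shifted toward the outlier
print(b_l1, grid_l1)                  # median stays at the true value
```

The mean moves in proportion to the outlier's size, while the median does not move at all as long as fewer than half of the samples are outliers.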

# Case 2: Noise in sample data

In [22]:
def noise(sigma):
    # create true data
    n = 20 # number of sample points
    y_true = np.ones(n) * 10
    x_true = np.linspace(0,100,n)
    
    # sample data set with noise
    y_noise = y_true.copy()
    y_noise = y_noise + np.random.normal(0, sigma, n) # add noise to y sample data
    x_noise = x_true.copy()
    
    noise_l1 = regression(y_noise, x_noise, l1_norm)
    noise_l2 = regression(y_noise, x_noise, l2_norm)
    
    plot_helper(y_true, x_true, noise_l1, noise_l2, 'noise')
    print('l1 error: {} \t l1 parameters: {}'.format(noise_l1.error, noise_l1.parameters))
    print('l2 error: {} \t l2 parameters: {}'.format(noise_l2.error, noise_l2.parameters))

In [23]:
widgets.interact(noise, sigma=(0.1,10,0.1)) # adjust the σ of the noise


The model using the $l2 \text{-} norm$ tends to perform better than the model using the $l1 \text{-} norm$. However, there are times when there is little difference in performance between the two norms.
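
Since a single random draw can favor either norm, a rough way to check this tendency is to repeat the experiment many times. The sketch below again assumes a constant model $\hat{y} = b$, so that the $l2 \text{-} norm$ fit is the mean and the $l1 \text{-} norm$ fit is the median, and uses Gaussian noise as in `noise` above. The seed, number of trials, and value of `sigma` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen arbitrarily
n, trials, sigma = 20, 5000, 2.0      # illustrative settings
true_value = 10.0

# Each row is one noisy sample set of n points around the true value
samples = true_value + rng.normal(0.0, sigma, size=(trials, n))

b_l2 = samples.mean(axis=1)           # l2-norm fit of a constant model
b_l1 = np.median(samples, axis=1)     # l1-norm fit of a constant model

# Root-mean-square deviation of each fit from the true value
rmse_l2 = np.sqrt(np.mean((b_l2 - true_value)**2))
rmse_l1 = np.sqrt(np.mean((b_l1 - true_value)**2))

print(rmse_l2, rmse_l1)
```

Averaged over many trials, the $l2 \text{-} norm$ fit lands closer to the true value under Gaussian noise, although in any individual trial the $l1 \text{-} norm$ fit can still be the closer of the two.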

# Conclusion
The disadvantage of the $l2 \text{-} norm$ is that it biases the model toward outliers in the sample data. The advantage of the $l2 \text{-} norm$ is that it generally, though not always, performs better when there is noise in the sample data.