In this notebook we show a simple example of performing linear regression using Maximum Likelihood Estimation. 

Some of the useful resources we used can be found here:
* http://jekel.me/2016/Maximum-Likelihood-Linear-Regression/
* http://suriyadeepan.github.io/2017-01-22-mle-linear-regression/
* https://arxiv.org/pdf/1008.4686.pdf

The last one is a research paper: "Data analysis recipes: Fitting a model to data" by David W. Hogg, Jo Bovy, and Dustin Lang (2010). See Equations 9-11.

In [1]:
from __future__ import division
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from math import pi
from numpy import std, exp, log, log10


# 1. EXAMPLE 1
This example uses two separate functions to do the calculation. Keeping the concepts separate is a good idea, since you can use the LogLikelihood function with many models, not just a linear model. 

This function also uses the dot product to handle the errors: np.dot(error.T, error). It forms the likelihood first and only then takes its log, which becomes problematic if we don't pass the function reasonable starting parameters: large residuals can push the exponential toward zero, and the log of zero is -inf. 
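For reference, the two formulas evaluated by the functions in this notebook can be written compactly. This is only a transcription of the expressions in the docstring below, assuming Gaussian scatter of the same size $\sigma$ on every point, with $e = y - \hat{y}$ the vector of residuals and $n$ the number of data points:

```latex
\begin{aligned}
L &= \left(\frac{1}{2\pi\sigma^{2}}\right)^{n/2}
     \exp\!\left(-\frac{e^{\mathsf{T}} e}{2\sigma^{2}}\right), \\
\ln L &= -\frac{n}{2}\,\ln\!\left(2\pi\sigma^{2}\right)
         - \frac{e^{\mathsf{T}} e}{2\sigma^{2}}.
\end{aligned}
```

Since $e^{\mathsf{T}} e = \sum_i e_i^2$, the dot product is simply the sum of squared residuals.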

## The first example we show uses two separate functions:

In [7]:
def LogLikelihood(estimate, true, n):
    '''
    LogLikelihood(estimate, true, n)
    -- this function calculates the log-likelihood. To maximize 
        the likelihood, minimize -1 * log-likelihood. 

    
    PARAMETERS:
    ----------
    
    estimate: array of floats, the estimate for y based on the model. 
    true:     array of floats, the y-data values. 
    n:        int, number of data points (length of the estimate). 
    
    RETURNS:
    ----------
    returns the log-likelihood. 
    
    
    NOTES:
    ----------
    The error is the true - estimate, or ydata - ymodel. These are also known 
    as the residuals of the fit; the distances between the actual data point 
    along the y-axis and the model. Errors/residuals are along y-axis only
    meaning they are the vertical offsets and not perpendicular offsets. 
    Vertical offsets are the most common in linear regression. 
    
    The likelihood function is set up for chi-squared as the statistic to minimize. 
        
    It is in likelihood format:
    L = ((1./(2*pi* (sig^2)))^(n/2)) * exp(- ( np.dot(error.T, error)/(2 * (sig^2))))

    In log-likelihood format:
    LogLike = -(n/2)*log(2*pi*(sig^2)) - (1/(2*(sig^2))) * np.dot(error.T, error)
    
    L is called the likelihood function. LogLike is the log of the likelihood function;
    thus, LogLike = log(L).
    
    np.dot(error.T, error) is the sum of squared residuals. In matrix form:
    (Y - X * theta).T  (Y - X * theta)
    
    .T stands for the transpose. 
    
    '''
    error       = true - estimate   # ydata - yModel
    sigma       = std(error)        # scatter: standard deviation of the residuals
    print(sigma)
    L = ((1.0/(2.0*pi*sigma*sigma))**(n/2.)) * exp(-1*((np.dot(error.T, error))/(2.*sigma*sigma)))
    return log(L) # log-likelihood.

    
def line(parameters):
    '''
    line(parameters)
    -- Linear model. Returns the negative log-likelihood of the line, 
        which is the quantity that minimize() will minimize. 
    
    PARAMETERS:
    ----------
    parameters: list, [m, b], where m and b are the slope and y-intercept. 
        
    '''
    m,b       = parameters # m:slope, b:yintercept
    yModel    = m * x + b  # estimate of y based on model. 
    f         = LogLikelihood(yModel, y, len(yModel))
    return (-1*f)

In [None]:
def LogLikelihood(estimate, true, n, sigma):
    error = true - estimate   # ydata - yModel, aka residual
    f = (-0.5*n*log(2.0*pi*sigma*sigma)) - (sum(error**2)/(2.0*sigma*sigma))
    return f
    
def line(parameters):
    m,b,sigma = parameters # m:slope, b:yintercept, sigma:scatter
    yModel    = m * x + b  # estimate of y based on model. 
    f         = LogLikelihood(yModel, y, len(yModel), sigma)
    return (-1*f)

def lnlike(theta, x, y, yerr):
    m, b, lnf = theta
    model = m * x + b
    inv_sigma2 = 1.0/(yerr**2 + model**2*np.exp(2*lnf))
    return -0.5*(np.sum((y-model)**2*inv_sigma2 - np.log(inv_sigma2)))

Another way to set up the LogLike function is to bring the log into the L function. If we do that, we rename L to f, since it's no longer the likelihood but the log-likelihood. 

    f = (-0.5*n*log(2.0*pi*sigma*sigma)) - (sum(error**2)/(2.0*sigma*sigma))
    return f

Notice that the errors are handled differently here: the squared residuals are summed with sum(error**2) instead of np.dot(error.T, error). The two are mathematically identical, so you still get the same results. 
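The difference between the two forms can be checked numerically. The sketch below is not part of the original analysis: it uses made-up Gaussian residuals and two hypothetical helper names, `loglike_direct` and `loglike_logform`, to show that the forms agree on well-behaved residuals, while the form that builds the likelihood first can underflow when the residuals are large (as they might be after a poor starting guess).

```python
import numpy as np

def loglike_direct(error, sigma):
    # Build the likelihood first, then take its log.
    n = len(error)
    L = (1.0 / (2.0 * np.pi * sigma**2)) ** (n / 2.0) \
        * np.exp(-np.dot(error, error) / (2.0 * sigma**2))
    return np.log(L)

def loglike_logform(error, sigma):
    # Work with the log-likelihood directly.
    n = len(error)
    return -0.5 * n * np.log(2.0 * np.pi * sigma**2) \
        - np.sum(error**2) / (2.0 * sigma**2)

rng = np.random.default_rng(0)
sigma = 0.2
small = rng.normal(0.0, sigma, size=50)   # synthetic residuals, made up

# On well-behaved residuals the two forms agree.
agree = np.isclose(loglike_direct(small, sigma), loglike_logform(small, sigma))

# Residuals with a large offset: the exponential underflows to 0,
# so the direct form returns log(0) = -inf, while the log form stays finite.
shifted = small + 10.0
with np.errstate(divide='ignore'):
    direct_bad = loglike_direct(shifted, sigma)
logform_bad = loglike_logform(shifted, sigma)

print(agree, direct_bad, logform_bad)
```

An optimizer cannot make progress from a value of -inf, which is one reason to prefer the log form when the starting parameters are uncertain.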

### READ IN DATA 

We take the log of our data because this particular linear relationship, the Amati correlation, is usually presented as a fit to logged x- and y-axis data: x-axis (logged Eiso), y-axis (logged Epeak). We also divide the Eiso energies by 1E52 so that our y-intercept is in line with most publications. 
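To see what the division by 1E52 does, note that log10(Eiso/1E52) = log10(Eiso) - 52, so it shifts the x-axis without changing the slope. The sketch below illustrates this with made-up numbers (not the Amati sample) and `np.polyfit` as a stand-in fitter: the slope is unchanged and the intercept moves by 52 times the slope.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for log10(Eiso) and log10(Epeak); values are made up.
log_eiso = rng.uniform(51.0, 54.0, size=40)
log_epeak = 0.5 * (log_eiso - 52.0) + 2.0 + rng.normal(0.0, 0.2, size=40)

# Fit against the raw logged energies ...
m_raw, b_raw = np.polyfit(log_eiso, log_epeak, 1)
# ... and against energies normalized by 1E52 (a shift of 52 in log space).
m_norm, b_norm = np.polyfit(log_eiso - 52.0, log_epeak, 1)

print(m_raw, m_norm)              # same slope
print(b_norm, b_raw + 52 * m_raw) # intercepts related by 52 * slope
```

With the normalization, the intercept is the predicted log Epeak at Eiso = 1E52 rather than at Eiso = 1, which is a far less extreme extrapolation.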

In [8]:
data  = pd.read_csv('/Users/KimiZ/GRBs2/Sample/AmatiDataSample.txt', sep=',', header=0)
x     = log10(data.eiso/(1.0E52))
y     = log10(data.epeakRest)

In [9]:
# initial input guess of 0.5 for slope and 2 for y-intercept.
res = minimize(line, np.array([0.5, 2]), method='L-BFGS-B')
res
      

0.220564660689
0.220564660048
0.220564660689
0.644496485464
0.644496493192
0.644496485464
0.224441058552
0.224441060198
0.224441058552
0.220243803733
0.220243804196
0.220243803733
0.220163944057
0.220163944463
0.220163944057
0.219894459343
0.21989445934
0.219894459343
0.219894444059
0.219894444059
0.219894444059
0.219894444059
0.219894444059
0.219894444059


      fun: -14.637374487654853
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([ -1.81188398e-05,  -1.13686838e-05])
  message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 24
      nit: 5
   status: 0
  success: True
        x: array([ 0.52089876,  2.05148951])

In [10]:
# no informed initial guess; start every parameter at 1.
res = minimize(line, np.array([1,1]), method='L-BFGS-B')
res

0.451106704495
0.451106711673
0.451106704495
1.07689185281
1.07689186086
1.07689185281
0.650436032598
0.650436040335
0.650436032598
0.614767786556
0.614767794233
0.614767786556
0.475787803612
0.475787810903
0.475787803612
0.268333869846
0.268333874557
0.268333869846
0.387980808061
0.387980814835
0.387980808061
0.307083147033
0.307083141294
0.307083147033
0.223991640153
0.223991641718
0.223991640153
7.72679840429
7.72679839608
7.72679840429
1.75182417684
1.75182416868
1.75182417684
0.345430887631
0.34543088129
0.345430887631
0.242438130779
0.242438127317
0.242438130779
0.231676518168
0.23167651558
0.231676518168
0.224186755889
0.22418675749
0.224186755889
0.220698848858
0.220698848156
0.220698848858
0.219920088539
0.219920088665
0.219920088539
0.219895357216
0.219895357193
0.219895357216
0.219894447161
0.21989444716
0.219894447161
0.219894444106
0.219894444106
0.219894444106
0.219894444059
0.219894444059
0.219894444059
0.219894444059
0.219894444059
0.219894444059


      fun: -14.637374487654958
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([  8.88178420e-07,   1.42108547e-06])
  message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 66
      nit: 13
   status: 0
  success: True
        x: array([ 0.52089876,  2.05148951])

---

---

# LogLikelihood that returns scatter (sigma).
The LogLikelihood function above does not let the scatter vary, so it cannot return the scatter as an estimated parameter. This version does. 

In [None]:
def LogLikelihood(estimate, true, n, sigma):
    '''
    LogLikelihood(estimate, true, n, sigma)
    -- this function calculates the log-likelihood. To maximize 
        the likelihood, minimize -1 * log-likelihood. 

    
    PARAMETERS:
    ----------
    
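    estimate: array of floats, the estimate for y based on the model. 
    true:     array of floats, the y-data values. 
    n:        int, number of data points (length of the estimate). 
    sigma:    float, scatter about the fit, left as a free parameter. 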
    
    RETURNS:
    ----------
    returns the log-likelihood. 
    
    
    NOTES:
    ----------
    The error is the true - estimate, or ydata - ymodel. These are also known 
    as the residuals of the fit; the distances between the actual data point 
    along the y-axis and the model. Errors/residuals are along y-axis only
    meaning they are the vertical offsets and not perpendicular offsets. 
    Vertical offsets are the most common in linear regression. 
    
    The likelihood function is set up for chi-squared as the statistic to minimize. 
        
    It is in likelihood format:
    L = ((1./(2*pi* (sig^2)))^(n/2)) * exp(- ( np.dot(error.T, error)/(2 * (sig^2))))

    In log-likelihood format:
    LogLike = -(n/2)*log(2*pi*(sig^2)) - (1/(2*(sig^2))) * np.dot(error.T, error)
    
    L is called the likelihood function. LogLike is the log of the likelihood function;
    thus, LogLike = log(L).
    
    np.dot(error.T, error) is the sum of squared residuals. In matrix form:
    (Y - X * theta).T  (Y - X * theta)
    
    .T stands for the transpose. 
    
    '''
    error       = true - estimate   # ydata - yModel
    L = ((1.0/(2.0*pi*sigma*sigma))**(n/2.)) * exp(-1*((np.dot(error.T, error))/(2.*sigma*sigma)))
    return log(L) # log-likelihood.

    
def line(parameters):
    '''
    line(parameters)
    -- Linear model. Returns the negative log-likelihood of the line, 
        which is the quantity that minimize() will minimize. 
    
    PARAMETERS:
    ----------
    parameters: list, [m, b, sigma], where m and b are the slope and y-intercept 
                and sigma is the scatter about the line. 
        
    '''
    m,b,sigma = parameters # m:slope, b:yintercept, sigma:scatter
    yModel    = m * x + b  # estimate of y based on model. 
    f         = LogLikelihood(yModel, y, len(yModel), sigma)
    return (-1*f)

In [None]:
# with approximate estimates for the parameters as inputs
res = minimize(line, np.array([0.5, 2, 0.3]), method='L-BFGS-B')
res
    

In [None]:
# with no initial input guess.
res = minimize(line, np.array([1,1,1]), method='L-BFGS-B')
res

### The third parameter in res.x is the 1$\sigma$ scatter about the linear relation. 
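As a rough check on that statement, the sketch below fits slope, intercept, and scatter to synthetic data with a known scatter of 0.25 and compares the fitted sigma with the root-mean-square of the residuals, which is the closed-form maximum-likelihood value. The data, the name `neg_loglike`, and the lower bound on sigma are assumptions made for this illustration; the bound is only there to keep the optimizer away from sigma <= 0.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglike(params, x, y):
    # Negative Gaussian log-likelihood of a straight line with free scatter.
    m, b, sigma = params
    resid = y - (m * x + b)
    n = len(resid)
    return 0.5 * n * np.log(2.0 * np.pi * sigma**2) \
        + np.sum(resid**2) / (2.0 * sigma**2)

# Synthetic data (made up): true slope 0.5, intercept 2.0, scatter 0.25.
rng = np.random.default_rng(42)
x_syn = rng.uniform(-1.0, 2.0, size=200)
y_syn = 0.5 * x_syn + 2.0 + rng.normal(0.0, 0.25, size=200)

res = minimize(neg_loglike, np.array([1.0, 1.0, 1.0]),
               args=(x_syn, y_syn), method='L-BFGS-B',
               bounds=[(None, None), (None, None), (1e-6, None)])
m_fit, b_fit, sigma_fit = res.x

# Closed-form ML scatter for the fitted line: RMS of the residuals.
resid = y_syn - (m_fit * x_syn + b_fit)
sigma_closed = np.sqrt(np.mean(resid**2))

print(res.x, sigma_closed)
```

The fitted sigma and the residual RMS should agree closely, and both should sit near the scatter used to generate the data.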

In [None]:
def calcLogLikelihood(estimate, true, n, sigma):
    '''
    This is already set up for the chi-squared as the statistic to minimize.
    
    It is in likelihood format:
    f = ((1./(2*pi* (sig^2)))^(n/2)) * exp(- ( np.dot(error.T, error)/(2 * (sig^2))))
    f is the likelihood function
    loglike = log(f)
    
    In log-likelihood format, this would be:
    
    l(theta) = -(n/2)*log(2*pi*(sig^2)) - (1/(2*(sig^2))) * np.dot(error.T, error)
    where np.dot(error.T, error) is the sum of squared residuals, in matrix form:
    (Y - X * theta).T (Y - X * theta)
    
    '''
    error = true - estimate   # ydata - yModel, aka residual
    f = (-0.5*n*log(2.0*pi*sigma*sigma)) - (sum(error**2)/(2.0*sigma*sigma))
    return f
    
#     f           = ((1.0/(2.0*pi*sigma*sigma))**(n/2))* \
#                     exp(-1*((np.dot(error.T, error))/(2*sigma*sigma)))
#     return np.log(f)

    
def line(parameters):
    '''
    
    Function to be minimized. This is a linear model. 
    
    '''
    m,b,sigma = parameters # m-slope, b-yintercept, scatter
    yModel    = m * x + b  # estimate of y based on model. 
    f         = calcLogLikelihood(yModel, y, len(yModel), sigma)
    return (-1*f)

In [None]:
# minimize the model function line(), which calls calcLogLikelihood internally.
res = minimize(line, np.array([0.52, 2, 0.3]), method='L-BFGS-B'); res

In [None]:
# same fit, starting every parameter at 1.
res = minimize(line, np.array([1,1,1]), method='L-BFGS-B'); res

### The above can be combined into one function. 

In [None]:
def LogLikelihood(parameters):
    '''
    LogLikelihood(parameters)
    
    PARAMETERS:
    ----------
    parameters:  list of floats containing m and b. m is the slope, b is the y-intercept. 
                 sigma, the scatter about the fit, is not passed in here; it is 
                 computed from the residuals inside the function. 
                 
    NOTES:
    ----------
    Take the log of the likelihood first, and then return that as f. 
    
    '''
    m, b        = parameters
    ymodel      = m * x + b    # linear model
    n           = len(ymodel)
    error       = y - ymodel
    sigma       = np.std(error)
    f           = (-0.5*n*log(2.0*pi*sigma*sigma)) - (sum(error**2)/(2.0*sigma*sigma))
    return (-1*f)

In [None]:
res = minimize(LogLikelihood, np.array([0.52,2]), method='L-BFGS-B'); res

In [None]:
res = minimize(LogLikelihood, np.array([1,1]), method='L-BFGS-B'); res

### We really should be passing the x and y data to the function, instead of assuming the function will read the global x and y variables. 

In [None]:
del x, y

In [None]:
def LogLikelihood(parameters, x, y):
    '''
    LogLikelihood(parameters, x, y)
    
    PARAMETERS:
    ----------
    parameters:  list of floats containing m and b. m is the slope, b is the y-intercept. 
                 sigma, the scatter about the fit, is not passed in here; it is 
                 computed from the residuals inside the function. 
    x, y:        arrays of floats, the x- and y-data, passed in through 
                 the args keyword of minimize(). 
                 
    NOTES:
    ----------
    Take the log of the likelihood first, and then return that as f. 
    
    '''
    m, b        = parameters
    ymodel      = m * x + b    # linear model
    n           = len(ymodel)
    error       = y - ymodel
    sigma       = np.std(error)
    f           = (-0.5*n*log(2.0*pi*sigma*sigma)) - (sum(error**2)/(2.0*sigma*sigma))
    return (-1*f)

In [None]:
data  = pd.read_csv('/Users/KimiZ/GRBs2/Sample/AmatiDataSample.txt', sep=',', header=0)
xdata = np.log10(data.eiso/(1.0E52))
ydata = np.log10(data.epeakRest)

In [None]:
res = minimize(LogLikelihood, np.array([0.52,2]), method='L-BFGS-B', args=(xdata, ydata)); res

In [None]:
res = minimize(LogLikelihood, np.array([1,1]), method='L-BFGS-B', args=(xdata, ydata)); res
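For a straight line with Gaussian scatter, the maximum-likelihood slope and intercept should coincide with the ordinary least-squares values, which gives a quick sanity check on fits like the ones above. The sketch below runs that check on synthetic data (not the Amati sample), using a hypothetical function `neg_loglike_profiled` that mirrors the two-parameter version above and `np.polyfit` as the least-squares reference.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglike_profiled(params, x, y):
    # Two-parameter version: sigma is computed from the residuals,
    # not fitted as a free parameter.
    m, b = params
    resid = y - (m * x + b)
    n = len(resid)
    sigma = np.std(resid)
    return 0.5 * n * np.log(2.0 * np.pi * sigma**2) \
        + np.sum(resid**2) / (2.0 * sigma**2)

# Synthetic data (made up): true slope 0.5, intercept 2.0, scatter 0.25.
rng = np.random.default_rng(7)
x_syn = rng.uniform(-1.0, 2.0, size=200)
y_syn = 0.5 * x_syn + 2.0 + rng.normal(0.0, 0.25, size=200)

# Data are passed through args instead of being read from globals.
res = minimize(neg_loglike_profiled, np.array([1.0, 1.0]),
               args=(x_syn, y_syn), method='L-BFGS-B')
m_mle, b_mle = res.x

# Ordinary least-squares reference.
m_ols, b_ols = np.polyfit(x_syn, y_syn, 1)

print(m_mle, b_mle)
print(m_ols, b_ols)
```

Small differences in the last digits are expected, since the optimizer stops at a finite tolerance while the least-squares solution is computed directly.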

# PLOTS

In [None]:
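# Perform a least-squares fit using scikit-learn.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
# PolynomialFeatures adds the constant column, so the intercept is fit as a
# coefficient and fit_intercept is turned off. Note that degree=2 fits a
# quadratic; use degree=1 for a straight line.
model = Pipeline([('poly', PolynomialFeatures(degree=2)),
    ('linear', LinearRegression(fit_intercept=False))])

# x and y were deleted above, so fit the xdata/ydata arrays instead.
model = model.fit(np.asarray(xdata)[:, np.newaxis], ydata)
coefs = model.named_steps['linear'].coef_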