## Exercise: Show that for OLS linear regression the formula above produces the right answer.
    

As seen in the notes for the Generalized Linear Statistical Models course at Princeton University (http://data.princeton.edu/wws509/notes/a1.pdf):

Calculation of the MLE often requires iterative procedures. Consider expanding
the score function evaluated at the MLE $\hat θ$ around a trial value $θ_0$ using
a first-order Taylor series, so that

$$ u(\hat θ) ≈ u(θ_0) + \frac{∂u(θ)}{∂θ} (\hat θ - θ_0) $$

Let H denote the Hessian or matrix of second derivatives of the log-likelihood
function:
$$ H(θ) = \frac{∂^2 \log L}{∂θ\,∂θ'} = \frac{∂u(θ)}{∂θ} $$

Setting the left-hand side of the first equation to zero and solving for $\hat θ$ gives
the first-order approximation

$$ \hat θ = θ_0 - H^{-1}(θ_0)u(θ_0)  $$

This result provides the basis for an iterative approach for computing the MLE known as the Newton-Raphson technique. Given a trial value, we use the previous equation to obtain an improved estimate and repeat the process until differences between successive estimates are sufficiently close to zero (or until the elements of the vector of first derivatives are sufficiently close to zero). This procedure tends to converge quickly if the log-likelihood is well-behaved (close to quadratic) in a neighborhood of the maximum and if the starting value is reasonably close to the MLE.

An alternative procedure, first suggested by Fisher, is to replace minus the Hessian by its expected value, the information matrix. The resulting procedure takes as our improved estimate:

$$ \hat θ = θ_0 + I^{-1}(θ_0)u(θ_0) $$

and is known as Fisher Scoring.
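
The exercise in the heading can be checked numerically. The following is a minimal sketch, assuming that "the formula above" refers to the Newton-Raphson update and that the errors are Gaussian, so that maximizing the likelihood in the coefficients is the same as least squares. The toy data and the names `X`, `y`, `score`, `hessian`, `beta_0` and `beta_1` are illustrative, not taken from the course notes. Because the log-likelihood is exactly quadratic in the coefficients, the Hessian is constant, Newton-Raphson and Fisher scoring coincide, and a single step from any trial value should land on the OLS solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 50 observations, an intercept plus 2 predictors
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)

def score(beta):
    # Gradient of the Gaussian log-likelihood in beta, up to the factor 1/sigma^2
    return X.T @ (y - X @ beta)

# Hessian in beta, up to the same factor 1/sigma^2; it does not depend on beta
hessian = -X.T @ X

beta_0 = np.zeros(3)  # arbitrary trial value
# One Newton-Raphson step: beta_1 = beta_0 - H^{-1} u(beta_0)
beta_1 = beta_0 - np.linalg.solve(hessian, score(beta_0))

# Reference least-squares solution computed by NumPy
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_1, beta_ols))
```

The factor $1/σ^2$ appears in both the score and the Hessian and cancels in $H^{-1}u$, which is why it is dropped in the sketch. Algebraically the step reduces to $(X'X)^{-1}X'y$ for any starting value.
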

Example: Fisher scoring in the geometric distribution. In this case, setting
the score to zero leads to an explicit solution for the MLE and no iteration is
needed. It is instructive, however, to try the procedure anyway. Using the
results we have obtained for the score and information, the Fisher scoring
procedure leads to the updating formula

$$ \hat π = π_0 + (1 - π_0 - π_0\bar y)π_0 $$
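
The score and information used here are not reproduced in this section. As a sketch, assuming $n$ independent observations with geometric probability function $π(1-π)^y$ for $y = 0, 1, 2, \dots$ (an assumption about the parameterization, chosen because it is consistent with the stated MLE), the update can be reconstructed as follows:

$$ \log L(π) = n\,[\log π + \bar y \log(1 - π)] $$

$$ u(π) = n\left(\frac{1}{π} - \frac{\bar y}{1 - π}\right), \qquad I(π) = \frac{n}{π^2(1 - π)} $$

$$ I^{-1}(π_0)\,u(π_0) = π_0^2(1 - π_0)\left(\frac{1}{π_0} - \frac{\bar y}{1 - π_0}\right) = (1 - π_0 - π_0\bar y)\,π_0 $$

Setting $u(π) = 0$ directly gives the explicit solution $\hat π = 1/(1 + \bar y)$.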

If the sample mean is $\bar y = 3$ and we start from $π_0 = 0.1$, say, the procedure
converges to the MLE $\hat π = 0.25$ in four iterations.
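
The iteration described above is easy to reproduce. This is a minimal sketch of the updating formula only; the function name `fisher_scoring_geometric` and its parameters are hypothetical, chosen for illustration.

```python
def fisher_scoring_geometric(y_bar, pi_0, n_iter):
    """Apply the update pi <- pi + (1 - pi - pi * y_bar) * pi n_iter times.

    Returns the list of estimates, starting with the trial value pi_0.
    """
    estimates = [pi_0]
    pi = pi_0
    for _ in range(n_iter):
        pi = pi + (1.0 - pi - pi * y_bar) * pi
        estimates.append(pi)
    return estimates

# Sample mean 3 and trial value 0.1, as in the example
estimates = fisher_scoring_geometric(y_bar=3.0, pi_0=0.1, n_iter=4)
for step, value in enumerate(estimates):
    print(step, value)
```

With $\bar y = 3$ the error $e = 0.25 - π$ obeys $e_{new} = 4e^2$, so it shrinks quadratically: 0.15, 0.09, 0.0324, about 0.0042, then about 0.00007 after the fourth step.
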

## Importing the SUSY data set with Pandas
#### Exercise: In what follows, use Pandas to import the first 10,000 examples as the training data and the next 1,000 examples as the test data.

# Import Pandas
import pandas as pd

filename = "SUSY.csv"  # Download SUSY.csv.gz from the UCI ML Repository
                       # (https://archive.ics.uci.edu/ml/datasets/SUSY)
                       # and extract it (~1 GB) into this local folder

column_names = ["signal", "lepton 1 pT", "lepton 1 eta", "lepton 1 phi", "lepton 2 pT", "lepton 2 eta",
                "lepton 2 phi", "missing energy magnitude", "missing energy phi", "MET_rel",
                "axial MET", "M_R", "M_TR_2", "R", "MT2", "S_R", "M_Delta_R", "dPhi_r_b", "cos(theta_r1)"]

# The file is assumed to have no header row, so the column names are passed
# explicitly; otherwise the first data row would be consumed as the header.

# Import the first 10k rows as training data
training_data_frame = pd.read_csv(filename, sep=',', header=None, names=column_names, nrows=10000)

# Import the next 1k rows as test data: skip the first 10k, load the next 1k
test_data_frame = pd.read_csv(filename, sep=',', header=None, names=column_names,
                              skiprows=10000, nrows=1000)

# The 'sep' argument is optional here: ',' is already the default for 'read_csv'

# Uncomment the following if you want to check the sets
# print(training_data_frame)
# print(test_data_frame)

# Uncomment the following if you want to check the sets' dimensions
# print(training_data_frame.shape)
# print(test_data_frame.shape)

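
The row-slicing logic can be tried without downloading the full file. The following is a minimal sketch, assuming a comma-separated file without a header row. The in-memory data and the names `toy_csv`, `toy_names`, `train` and `test` are hypothetical stand-ins, not part of the SUSY data set.

```python
import io
import pandas as pd

# Hypothetical stand-in for SUSY.csv: 12 header-less rows, 3 columns
toy_csv = "\n".join(f"{i % 2},{i},{i * 10}" for i in range(12))
toy_names = ["signal", "feature_a", "feature_b"]

# First 8 rows as "training" data
train = pd.read_csv(io.StringIO(toy_csv), header=None, names=toy_names, nrows=8)

# Skip those 8 rows, then read the next 3 rows as "test" data
test = pd.read_csv(io.StringIO(toy_csv), header=None, names=toy_names,
                   skiprows=8, nrows=3)

print(train.shape, test.shape)
print(test["feature_a"].tolist())
```

Since `feature_a` simply counts the rows from 0, the test slice should start at 8, confirming that the two frames do not overlap.
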