# Maximum Likelihood Estimation Lab

## Problem Description

In this lab, we'll explore _Maximum Likelihood Estimation_ and strategies for implementing it in python while making use of industry-standard tools such as the `scipy` library!

## Objectives

In this lab, we will:

* Demonstrate a conceptual understanding of Maximum Likelihood Estimation, and what it is used for
* Demonstrate synthetic datasets from multiple different distributions for practice with MLE
* Demonstrate understanding as to why we use Negative Log Likelihood instead of Likelihood for MLE in python
* Write a general-purpose function for Maximum Likelihood Estimation by using industry-standard packages such as `scipy`


Run the cell below to import everything we'll need for this lab. 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
import statsmodels.api as sm
from statsmodels.base.model import GenericLikelihoodModel
from scipy.optimize import minimize

### Probability vs. Likelihood

Explain the difference between **_Probability_** and **_Likelihood_** below the line.  Use the two graphs below as aids for your explanation..    

<center><h3>Probability</h3></center>
<img src='probability.jpg' height=50% width=50%>
<br>
<br>
<center><h3>Likelihood</h3></center>
<img src='likelihood.jpg' height=50% width=50%>

________________________________________________________________________________________________________________________________

**_Probability_** is the amount of a distribution that falls between two different values in a distribution. When visualized, this amounts to the area under the curve between the two values of interest.  

**_Likelihood_** is the value for a fixed data point given a distribution (note that the distribution can change).  When visualized, this amounts to the y-axis value for the data point given the distribution it is visualized against. 



### Generating Datasets From Different Distributions

We're going to generate two different datasets to test our MLE function.  In the cell below:

* Create a sample Gaussian Distribution using numpy with 10,000 values in it. 
* Create a sample Exponential Distribution using numpy with 10,000 values in it. 
* Use a distplot from seaborn to visualize the distribution of each. 

For each distribution, the scale parameter should equal 1.0 (this is the default parameter for both functions).  

In [11]:
# Generate Gaussian Distribution
gaussian_x = 4 * np.random.normal(scale=1.0, size=10000) + 5
gaussian_y = 3 + gaussian_x + np.random.normal(scale=1.0, size=10000)
# Generate Exponential Distribution
exponential_x = np.random.exponential(scale=1.0, size=(10000, 2))
exponential_y = np.random.exponential(scale=1.0, size=(10000, 2))
#

gaussian_df = pd.DataFrame(gaussian, columns=['x', 'y'])
exponential_df = pd.DataFrame(exponential, columns=['x', 'y'])
gaussian_df['constant'] = 1
exponential_df['constant'] = 1

gaussian_df.head()

# plt.plot()
# display(sns.distplot(gaussian))
# plt.show()
# plt.plot()
# sns.distplot(exponential)
# plt.show()

Unnamed: 0,x,y,constant
0,-0.559576,1.837071,1
1,0.267077,-0.497606,1
2,0.656983,0.033125,1
3,-0.101089,0.710458,1
4,0.135346,-1.154525,1


### Log Likelihood vs. Negative Log Likelihood

In your own words, answer the following questions:

Why do we use the log of likelihood rather than just likelihood?  In terms of optimization operations, what is the relationship between log likelihood and negative log likelihood?

Bonus question: Why do we typically use negative log likelihood in python instead of likelihood or log likelihood? (This question may take a little research)

Write your answer to these questions below this line:
________________________________________________________________________________________________________________________________

We use the log instead of the raw likelihood value to help us avoid integer overflow/underflow errors. Maximizing the Log Likelihood (LL) is the same as minimizing the Negative Log Likelihood(NLL).  This is the reason we use negative log likelihood in python--optimization libraries provide a minimize function, but not a maximize function.  Therefore, in order to use a minimizer, we must make use of NLL instead of LL. 

### Negative Log Likelihood

In the cell below, complete the following negative log likelihood function. This function should take in an array of theta parameters and return the negative log likelihood for those parameters.  

In [None]:
def neg_log_likelihood(theta):
    mu = theta[0] + x * theta[1]
    return -1*norm(mu, theta[2]).logpdf(y).sum()

### MLE from Scratch With Scipy

### MLE and MAP with PyMC3

### Conclusion