# Maximum Likelihood Estimation

In [1]:
# Import some helper functions (please ignore this!)
from utils import * 

**Context:** At this point, our modeling toolkit is already getting quite expressive. 
1. We can develop simple *predictive models* using *conditional distributions*: we can specify models of the form $p_{A | B}(a | b)$, which allow us to predict the probability that $A = a$ given that $B = b$. We do this by specifying a distribution over the random variable (RV) $A$ whose parameters are a *function* of $b$.
2. We can develop simple *generative models* using *joint distributions*: we can specify models of the form $p_{A, B}(a, b)$, which allow us to sample (or generate) data. We do this by factorizing this joint probability into a product of conditional and marginal distributions, e.g. $p_{A, B}(a, b) = p_{A | B}(a | b) \cdot p_B(b)$, which we already know how to specify.
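
As a minimal sketch of the two points above, the snippet below factorizes a joint distribution over two binary RVs into a marginal and a conditional, then samples from it by drawing $B$ first and $A$ second. The probabilities (`P_B1`, `0.9`, `0.2`) and the function names are hypothetical choices made for illustration only, and plain Python is used so that the arithmetic stays visible.

```python
import random

# Hypothetical marginal p_B(b): B = 1 with probability 0.3.
P_B1 = 0.3


def p_a1_given_b(b):
    """Hypothetical conditional p_{A|B}(1 | b): the parameter is a function of b."""
    return 0.9 if b == 1 else 0.2


def joint_prob(a, b):
    """p_{A,B}(a, b) = p_{A|B}(a | b) * p_B(b)."""
    p_b = P_B1 if b == 1 else 1.0 - P_B1
    p_a1 = p_a1_given_b(b)
    p_a = p_a1 if a == 1 else 1.0 - p_a1
    return p_a * p_b


def sample_joint(rng):
    """Generate one (a, b) pair: sample b from the marginal, then a given b."""
    b = 1 if rng.random() < P_B1 else 0
    a = 1 if rng.random() < p_a1_given_b(b) else 0
    return a, b


rng = random.Random(0)  # fixed seed so the draws are reproducible
samples = [sample_joint(rng) for _ in range(5)]

# Sanity check: the joint probabilities over all four outcomes sum to 1.
total = sum(joint_prob(a, b) for a in (0, 1) for b in (0, 1))
```

The same function `p_a1_given_b` serves both roles: on its own it is a predictive model for $A$ given $B$, and combined with the marginal it becomes a generative model for the pair.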

Of course, the predictive and generative models you may have heard about in the news can do far more than the instances we've covered so far; we will build up to these fancier models over the course of the semester. What's important for now, though, is that you understand how such models can be represented using probability distributions. 

**Challenge:** So what stands in the way of applying our modeling tools to real-world data? First, we've only instantiated our models with *discrete* distributions. Much real-world data, however, requires *continuous* distributions; that is, distributions over real numbers (e.g. blood pressure, body-mass index, time spent in REM sleep, etc.). We'll get into the details of continuous modeling a bit later. Second, we still don't have a way of *automatically* fitting a model to data. So far, you've fit all models by hand via inspection: you looked at the data and adjusted the model until it matched. As models and data grow more complex, fitting by hand becomes prohibitively difficult. Today, we'll introduce one technique for fitting a model automatically: maximum likelihood estimation (MLE). The idea behind MLE is to find the model under which the probability of the observed data is highest. 
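
To make the idea concrete, here is a minimal sketch in plain Python. It assumes a hypothetical data set of binary observations, a Bernoulli model with a single parameter `theta`, and independent observations; none of these choices come from the notes above. It tries a grid of candidate parameters and keeps the one under which the data is most probable.

```python
import math

# Hypothetical binary observations (1 = event occurred, 0 = it did not).
data = [1, 0, 1, 1, 0, 1, 1, 1]


def log_likelihood(theta, data):
    """Log-probability of the data under a Bernoulli(theta) model.

    Assumes the observations are independent, so the log-probabilities add.
    """
    return sum(math.log(theta) if x == 1 else math.log(1.0 - theta) for x in data)


# Candidate parameters 0.01, 0.02, ..., 0.99 (the endpoints would give log(0)).
candidates = [i / 100 for i in range(1, 100)]

# The maximum likelihood estimate is the candidate with the highest log-likelihood.
theta_hat = max(candidates, key=lambda t: log_likelihood(t, data))
```

For this model the best candidate coincides with the fraction of ones in the data (6 out of 8), which is what fitting by inspection would also suggest. A grid search only works for very small models; it is used here purely to show what "highest probability of the data" means.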

**Outline:**
* Formally introduce and motivate the MLE
* Extend the notation of directed graphical models to represent a full data set instead of just one observation
* Implement MLE in `NumPyro`

## The MLE Goal

* Goal: describe probability of data given parameters, then maximize
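
One common way to write this goal down is sketched here. It assumes the observed data set is denoted $\mathcal{D}$ and the model's parameters $\theta$; this notation is an assumption, not one fixed by the notes above.

$$
\theta^{\text{MLE}} = \underset{\theta}{\mathrm{argmax}} \; p(\mathcal{D}; \theta)
$$

Because $\log$ is strictly increasing, maximizing $\log p(\mathcal{D}; \theta)$ yields the same $\theta^{\text{MLE}}$, and the log form is usually easier to work with numerically.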

## "Plate Notation" for Directed Graphical Models

* Introduce statistical independence (between observations)
* Introduce graphical representation
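
As a sketch of why independence matters here, suppose the $N$ observations $x_1, \dots, x_N$ are independent and identically distributed given the parameters $\theta$ (the symbols are assumptions made for illustration). The probability of the whole data set then factorizes into a product over observations, and its logarithm into a sum:

$$
p(x_1, \dots, x_N; \theta) = \prod_{n=1}^{N} p(x_n; \theta)
\quad \Longrightarrow \quad
\log p(x_1, \dots, x_N; \theta) = \sum_{n=1}^{N} \log p(x_n; \theta)
$$

In a directed graphical model, a plate is a box drawn around the node for $x_n$ and labeled $N$; it stands in for $N$ repeated copies of that node, all sharing the same $\theta$.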

## MLE in `NumPyro`

* Vectorization in NumPyro distributions
* NumPyro primitives
* Implement model in NumPyro