# Likelihood of an Estimated Probability Distribution  
## Setting  
- Let $p(y)$ be a probability distribution on a space $Y$.
- $p(y)$ is unknown, and we want to estimate it.  
- Assume $p(y)$ is either a
  - probability density function on a continuous space $Y$, or  
  - probability mass function on a discrete space $Y$.  
- Typically $Y$ is   
  - $Y = \mathbb{R}$ or $Y = \mathbb{R}^d$ (continuous), or  
  - $Y = \{-1,1\}$ or $Y = \{0,1,2,\dots,K\}$ (discrete).  
  
## Evaluating a Probability Distribution Estimate
Somebody gives us an estimate $\hat{p}(y)$ of the probability distribution. How can we evaluate how good it is?   

## Likelihood of a Predicted Distribution
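- Suppose we have  
$$\mathcal{D} = (y_1,...,y_n)$$  
sampled i.i.d. from the true distribution $p(y)$.  
- Then the **likelihood** of $\hat{p}$ for the data $\mathcal{D}$ is defined to be  
$$\hat{p}(\mathcal{D})=\prod_{i=1}^{n} \hat{p}\left(y_{i}\right)$$  
- If $\hat{p}$ is a probability mass function, then the likelihood is the probability that $\hat{p}$ assigns to observing exactly the sample $\mathcal{D}$. If $\hat{p}$ is a density, the likelihood is a density value rather than a probability.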
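
As a minimal sketch of this definition, the snippet below multiplies $\hat{p}(y_i)$ over a sample. The pmf `p_hat` and the sample `data` are hypothetical values chosen only for illustration; they do not come from any real dataset.

```python
import math

def likelihood(p_hat, data):
    """Likelihood of the estimate p_hat: the product of p_hat(y_i) over the sample."""
    return math.prod(p_hat(y) for y in data)

# Hypothetical estimate: a pmf on {0, 1} that puts mass 0.7 on y = 1.
def p_hat(y):
    return {0: 0.3, 1: 0.7}[y]

# Hypothetical sample, assumed to be drawn i.i.d. from the unknown true distribution.
data = [1, 1, 0, 1]

lik = likelihood(p_hat, data)  # 0.7 * 0.7 * 0.3 * 0.7
print(lik)
```

Because `p_hat` is a probability mass function here, `lik` is the probability that `p_hat` assigns to seeing exactly this sample.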


  


# Parametric Families of Distributions  
## Parametric Model  
A **parametric model** is a set of probability distributions indexed by a parameter $\theta \in \Theta$. We denote this as  
$$\{p(y ; \theta) \mid \theta \in \Theta\}$$  
where $\theta$ is the parameter and $\Theta$ is the parameter space.  
## Poisson Family
- Support $Y = \{0,1,2,3,\dots\}$.  
- Parameter space: $\{\lambda \in \mathbb{R} \mid \lambda > 0\}$  
- Probability mass function on $k \in Y$:  
$$p(k ; \lambda)=\lambda^{k} e^{-\lambda} /(k !)$$
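
The Poisson mass function above can be sketched directly with the standard library. The rate `lam = 3.0` is an arbitrary value picked for illustration, and the infinite support is truncated at 100 terms, which is an approximation that works because the tail mass is negligible for this rate.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass: lam^k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical rate, chosen only for illustration

# Truncated sums over the support {0, 1, 2, ...}.
total = sum(poisson_pmf(k, lam) for k in range(100))
mean = sum(k * poisson_pmf(k, lam) for k in range(100))

print(total, mean)
```

The truncated total should be very close to 1, and the truncated mean very close to `lam`.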

## Beta Family  
- Support $Y = (0,1)$ [The unit interval.]   
- Parameter space: $\{\theta = (\alpha,\beta) | \alpha,\beta > 0\}$   
- Probability density function on $y \in Y$:  
$$p(y ; \alpha, \beta)=\frac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha, \beta)}$$  
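
A small sketch of the Beta density follows, using the identity $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ for the normalizing constant. The parameter values and the midpoint-rule integration check are illustrative assumptions, not part of the notes.

```python
import math

def beta_function(a, b):
    """B(a, b) expressed through the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(y, alpha, beta):
    """Beta density on the open unit interval; zero outside the support."""
    if not 0.0 < y < 1.0:
        return 0.0
    return y ** (alpha - 1) * (1 - y) ** (beta - 1) / beta_function(alpha, beta)

alpha, beta = 2.0, 3.0  # hypothetical parameters

# Midpoint-rule approximation of the integral of the density over (0, 1).
n = 10_000
area = sum(beta_pdf((i + 0.5) / n, alpha, beta) for i in range(n)) / n

print(area)
```

The approximate area should be close to 1, as expected for a density.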

## Gamma Family  
- Support $Y = (0,\infty)$ [Positive real numbers]   
- Parameter space: $\{(k,\theta) \mid k > 0,\theta > 0\}$ (shape $k$, scale $\theta$)  
- Probability density function on $y \in Y$:  
$$p(y ; k, \theta)=\frac{1}{\Gamma(k) \theta^{k}} y^{k-1} e^{-y / \theta}$$
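
The Gamma density can be sketched the same way. As a sanity check, the snippet compares the case $k = 1$ against the exponential density $e^{-y/\theta}/\theta$, which is what the formula reduces to in that case. The evaluation point and scale are arbitrary illustrative values.

```python
import math

def gamma_pdf(y, k, theta):
    """Gamma density with shape k and scale theta; zero outside (0, inf)."""
    if y <= 0.0:
        return 0.0
    return y ** (k - 1) * math.exp(-y / theta) / (math.gamma(k) * theta ** k)

y, theta = 1.2, 2.0  # hypothetical evaluation point and scale

gamma_density_k1 = gamma_pdf(y, 1.0, theta)
exponential_density = math.exp(-y / theta) / theta

print(gamma_density_k1, exponential_density)
```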




# Maximum Likelihood Estimation  
## Likelihood in a Parametric Model
Suppose we have a parametric model $\{p(y;\theta) \mid \theta \in \Theta \}$ and a sample $\mathcal{D} = (y_1,...,y_n)$.  
- The **likelihood** of parameter estimate $\hat{\theta}$ for sample $\mathcal{D}$ is  
$$p(\mathcal{D} ; \hat{\theta})=\prod_{i=1}^{n} p\left(y_{i} ; \hat{\theta}\right)$$  
- In practice, we prefer to work with the **log-likelihood**:  
$$\log p(\mathcal{D} ; \hat{\theta})=\sum_{i=1}^{n} \log p\left(y_{i} ; \hat{\theta}\right)$$  
Because $\log$ is strictly increasing, the log-likelihood has the same maximizer as the likelihood, and a sum is easier to work with than a product.  
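
One practical reason to prefer the sum is floating-point underflow. The sketch below uses the Poisson family with a hypothetical sample of 2000 counts and an arbitrary rate: the product of many probabilities below 1 collapses to zero in double precision, while the sum of logs stays finite.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass: lam^k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical sample of 2000 counts and an arbitrary candidate rate.
data = [3, 5, 2, 4] * 500
lam = 3.5

# Product form: underflows to 0.0 for a sample this large.
likelihood = math.prod(poisson_pmf(k, lam) for k in data)

# Sum-of-logs form: stays a finite (large negative) number.
log_likelihood = sum(math.log(poisson_pmf(k, lam)) for k in data)

print(likelihood, log_likelihood)
```

Once the product has underflowed, every candidate parameter looks equally bad, so comparing parameters on the log scale is the safer choice.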

## Maximum Likelihood Estimation
The maximum likelihood estimator (MLE) for $\theta$ in the model $\{p(y;\theta) \mid \theta \in \Theta\}$ is  
$$\begin{aligned}
\hat{\theta} &=\underset{\theta \in \Theta}{\arg \max } \log p(\mathcal{D} ; \theta) \\
&=\underset{\theta \in \Theta}{\arg \max } \sum_{i=1}^{n} \log p\left(y_{i} ; \theta\right)
\end{aligned}$$   
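
When no closed form is at hand, the arg max can be approximated numerically. The sketch below does a plain grid search over the Poisson rate. The helper name `grid_mle`, the grid, and the counts are hypothetical choices for illustration; a grid search is only one simple option, not the method prescribed by these notes.

```python
import math

def poisson_log_likelihood(lam, data):
    """Sum over the sample of k*log(lam) - lam - log(k!)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

def grid_mle(log_likelihood, grid, data):
    """Return the grid point with the highest log-likelihood (hypothetical helper)."""
    return max(grid, key=lambda theta: log_likelihood(theta, data))

data = [2, 4, 3, 5, 1]                    # hypothetical counts
grid = [i / 100 for i in range(1, 2001)]  # candidate rates 0.01, 0.02, ..., 20.00

lam_hat = grid_mle(poisson_log_likelihood, grid, data)
print(lam_hat)
```

The answer is only as fine as the grid, and the grid must cover the region where the maximizer lies.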

## MLE Existence  
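- In certain situations, the MLE may not exist.  
- Example: the Gaussian family $\{N(\mu,\sigma^2) \mid \mu \in \mathbb{R},\sigma^2 > 0\}$ with a single observation $y$.  
  - The likelihood is $\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$.  
  - Taking $\mu = y$ and $\sigma^2 \to 0$ drives the likelihood to infinity.  
  - The supremum is not attained by any $\sigma^2 > 0$, so the MLE does not exist.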
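
The divergence can be seen numerically. In this sketch the single observation `y = 1.7` and the sequence of variances are arbitrary illustrative values; with $\mu = y$ the exponent vanishes and the density equals $1/\sqrt{2\pi\sigma^2}$.

```python
import math

def gaussian_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

y = 1.7  # hypothetical single observation

# Fix mu = y and shrink the variance toward zero.
variances = [1.0, 1e-2, 1e-4, 1e-6]
likelihoods = [gaussian_pdf(y, y, s2) for s2 in variances]

for s2, lik in zip(variances, likelihoods):
    print(s2, lik)
```

Each hundredfold reduction in $\sigma^2$ multiplies the likelihood by ten, so there is no finite maximum to report.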

## Example: MLE for Poisson
Suppose we observe counts $\mathcal{D} = (k_1,...,k_n)$ of taxi cab pickups over $n$ weeks, and we want to fit a Poisson distribution to this data.  
The Poisson log-likelihood for a single count is  
$$\begin{aligned}
\log [p(k ; \lambda)] &=\log \left[\frac{\lambda^{k} e^{-\lambda}}{k !}\right] \\
&=k \log \lambda-\lambda-\log (k !)
\end{aligned}$$  
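The full log-likelihood is  
$$\log p(\mathcal{D} ; \lambda)=\sum_{i=1}^{n}\left[k_{i} \log \lambda-\lambda-\log \left(k_{i} !\right)\right]$$  
The first-order condition gives
$$\begin{aligned}
0=\frac{\partial}{\partial \lambda}[\log p(\mathcal{D} ; \lambda)] &=\sum_{i=1}^{n}\left[\frac{k_{i}}{\lambda}-1\right] = \frac{1}{\lambda}\sum_{i=1}^{n} k_{i} - n \\
\Longrightarrow \lambda &=\frac{1}{n} \sum_{i=1}^{n} k_{i}
\end{aligned}$$  
The log-likelihood is concave in $\lambda$, so this stationary point is the maximizer. So the MLE $\hat{\lambda}$ is just the mean of the counts (assuming at least one $k_i > 0$, so that $\hat{\lambda}$ lies in the parameter space $\lambda > 0$).  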
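
The closed-form result can be checked numerically. The weekly counts below are made up for illustration (they are not real taxi data); the sketch confirms that the derivative of the log-likelihood vanishes at the sample mean and that nearby rates score lower.

```python
import math

def poisson_log_likelihood(lam, counts):
    """Sum over the sample of k*log(lam) - lam - log(k!)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def score(lam, counts):
    """Derivative of the log-likelihood with respect to lam."""
    return sum(k / lam - 1 for k in counts)

counts = [12, 7, 9, 15, 10, 8, 11]  # hypothetical weekly pickup counts

lam_hat = sum(counts) / len(counts)  # closed-form MLE: the sample mean

print(lam_hat, score(lam_hat, counts))
print(poisson_log_likelihood(0.9 * lam_hat, counts),
      poisson_log_likelihood(lam_hat, counts),
      poisson_log_likelihood(1.1 * lam_hat, counts))
```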

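## Estimating Distributions, Overfitting, and Hypothesis Spaces
- Just as in classification and regression, MLE can overfit!  
- To guard against this, compare candidate models (hypothesis spaces) by their likelihood on a held-out validation set, and choose the one with the highest validation likelihood.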
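
A toy sketch of validation-based selection follows. It compares a Poisson fit against the empirical frequency table of the training data, which is the MLE when the hypothesis space is every pmf on the observed values. Both the data and the choice of these two candidates are assumptions made for illustration.

```python
import math
from collections import Counter

def fit_poisson(train):
    """Poisson MLE: the rate is the sample mean."""
    lam = sum(train) / len(train)
    return lambda k: lam ** k * math.exp(-lam) / math.factorial(k)

def fit_empirical(train):
    """Unrestricted pmf MLE: the empirical frequencies of the training values."""
    counts = Counter(train)
    n = len(train)
    return lambda k: counts[k] / n

def log_likelihood(pmf, data):
    """Sum of log pmf values; -inf if any point has zero probability."""
    total = 0.0
    for k in data:
        p = pmf(k)
        if p == 0.0:
            return -math.inf
        total += math.log(p)
    return total

train = [2, 3, 3, 4, 5]   # hypothetical training counts
validation = [3, 4, 6]    # hypothetical held-out counts

models = {"poisson": fit_poisson(train), "empirical": fit_empirical(train)}
train_scores = {name: log_likelihood(pmf, train) for name, pmf in models.items()}
val_scores = {name: log_likelihood(pmf, validation) for name, pmf in models.items()}

best = max(val_scores, key=val_scores.get)
print(train_scores, val_scores, best)
```

The empirical model wins on the training set but assigns zero probability to the unseen value 6, so its validation log-likelihood is $-\infty$ and the Poisson model is selected.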
