# Optimization

```python
# Import some helper functions (please ignore this!)
from utils import *
```

**Context:** We can perform MLE on a class of models composed of discrete distributions.

**Challenge:**

**Outline:**


## Analytic Solutions to Optimization Problems



## Example of Analytic MLE

**Model.** Let's see how this works by analytically performing the MLE on a simple example. Suppose we want to model the probability of a patient being hospitalized overnight. We can do this using a Bernoulli distribution:
\begin{align}
H \sim p_H(\cdot; \rho) = \mathrm{Bern}(\rho).
\end{align}
Recall that the PMF of a Bernoulli RV is,
\begin{align}
p_H(h; \rho) = \rho^{\mathbb{I}(h = 1)} \cdot (1 - \rho)^{\mathbb{I}(h = 0)},
\end{align}
where $\mathbb{I}(\cdot)$ is the *indicator function*: it evaluates to 1 if the condition in parentheses is true and 0 otherwise.
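To make the indicator-exponent notation concrete, here is a minimal plain-Python sketch of the PMF above. The helper names `indicator` and `bernoulli_pmf` are hypothetical and chosen only for illustration. For $h \in \{0, 1\}$ exactly one exponent equals 1, so the formula picks out either $\rho$ or $1 - \rho$.

```python
def indicator(condition):
    """Indicator function I(condition): 1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0


def bernoulli_pmf(h, rho):
    """Bernoulli PMF p_H(h; rho) = rho^{I(h = 1)} * (1 - rho)^{I(h = 0)}."""
    return rho ** indicator(h == 1) * (1 - rho) ** indicator(h == 0)


print(bernoulli_pmf(1, 0.3))  # 0.3
print(bernoulli_pmf(0, 0.3))  # 0.7
```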

**Joint Data Likelihood.** Now, let's write the joint data log-likelihood for our model:
\begin{align}
\log p(\mathcal{D}; \rho) &= \log \prod\limits_{n=1}^N p(\mathcal{D}_n; \rho) \quad (\text{since observations are i.i.d.}) \\
&= \log \prod\limits_{n=1}^N p_H(h_n; \rho) \\
&= \log \prod\limits_{n=1}^N \rho^{\mathbb{I}(h_n = 1)} \cdot (1 - \rho)^{\mathbb{I}(h_n = 0)} \quad (\text{using the definition of Bernoulli PMF}) \\
&= \sum\limits_{n=1}^N \left( \log \rho^{\mathbb{I}(h_n = 1)} + \log (1 - \rho)^{\mathbb{I}(h_n = 0)} \right) \quad (\text{using the fact that } \log (x \cdot y) = \log x + \log y) \\
&= \sum\limits_{n=1}^N \left( \mathbb{I}(h_n = 1) \cdot \log \rho + \mathbb{I}(h_n = 0) \cdot \log (1 - \rho) \right) \quad (\text{using the fact that } \log x^y = y \cdot \log x)  \\
&= \underbrace{\left( \sum\limits_{n=1}^N \mathbb{I}(h_n = 1) \right)}_{\text{Total number of times $H = 1$}} \cdot \log \rho + \underbrace{\left( \sum\limits_{n=1}^N \mathbb{I}(h_n = 0) \right)}_{\text{Total number of times $H = 0$}} \cdot \log (1 - \rho) \quad (\text{splitting the sum and factoring out } \log \rho \text{ and } \log (1 - \rho) \text{, which do not depend on } n) \\
&= T \cdot \log \rho + (N - T) \cdot \log (1 - \rho)
\end{align}
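where $T = \sum\limits_{n=1}^N \mathbb{I}(h_n = 1)$ is the total number of hospitalizations. Since each $h_n$ is either 0 or 1, the remaining $N - T$ observations are exactly those with $H = 0$.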
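As a sanity check on the derivation, the sketch below computes the log-likelihood two ways: by summing the log PMF over every observation, and with the closed form $T \cdot \log \rho + (N - T) \cdot \log (1 - \rho)$, which needs only the counts. It assumes each $h_n$ is coded as 0 or 1. The dataset and the function names are hypothetical, made up for illustration.

```python
import math


def log_likelihood_direct(data, rho):
    """Sum the log Bernoulli PMF over the i.i.d. observations, one term per patient."""
    total = 0.0
    for h in data:
        # rho^{I(h=1)} * (1 - rho)^{I(h=0)}: exactly one factor is active for h in {0, 1}
        total += math.log(rho) if h == 1 else math.log(1 - rho)
    return total


def log_likelihood_counts(data, rho):
    """Closed form T * log(rho) + (N - T) * log(1 - rho), which only needs the counts."""
    N = len(data)
    T = sum(1 for h in data if h == 1)  # total number of hospitalizations
    return T * math.log(rho) + (N - T) * math.log(1 - rho)


# Hypothetical toy data (made up for illustration): 1 = hospitalized overnight, 0 = not.
data = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

for rho in (0.1, 0.3, 0.5):
    print(rho, log_likelihood_direct(data, rho), log_likelihood_counts(data, rho))
```

The two columns agree (up to floating-point rounding) for every $\rho$. This is why the data enter the MLE problem only through $T$ and $N$.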

**MLE Objective.** Our MLE objective is therefore:
\begin{align}
\rho^\text{MLE} &= \mathrm{argmax}_{\rho} \text{ } \log p(\mathcal{D}; \rho) \\
&= \mathrm{argmax}_{\rho} \left( T \cdot \log \rho + (N - T) \cdot \log (1 - \rho) \right) \\
&= \mathrm{argmin}_{\rho} \underbrace{\left( -T \cdot \log \rho - (N - T) \cdot \log (1 - \rho) \right)}_{\text{Our loss function: } \mathcal{L}(\rho)} \quad (\text{maximizing a function is equivalent to minimizing its negative})
\end{align}
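Before solving analytically, we can check numerically that minimizing $\mathcal{L}(\rho)$ picks the same $\rho$ as maximizing the log-likelihood. The sketch below does a brute-force grid search over $(0, 1)$ with hypothetical counts $T = 3$, $N = 10$ (made up for illustration). It is only a sanity check and does not scale to models with many parameters. The name `bernoulli_loss` is an illustrative choice.

```python
import math


def bernoulli_loss(rho, T, N):
    """Negative log-likelihood L(rho) = -T*log(rho) - (N - T)*log(1 - rho)."""
    return -T * math.log(rho) - (N - T) * math.log(1 - rho)


# Hypothetical counts (made up): T = 3 hospitalizations out of N = 10 observations.
T, N = 3, 10

# Brute-force search over a grid in the open interval (0, 1); the endpoints are
# excluded because log(0) is undefined.
grid = [i / 1000 for i in range(1, 1000)]
rho_best = min(grid, key=lambda rho: bernoulli_loss(rho, T, N))

# Maximizing the log-likelihood (the negative loss) selects the same grid point.
rho_best_ll = max(grid, key=lambda rho: -bernoulli_loss(rho, T, N))
print(rho_best, rho_best_ll)
```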

**Analytic Optimization.** We take the derivative of the loss $\mathcal{L}(\rho)$ above with respect to $\rho$, set it to $0$, and solve:
\begin{align}
0 &= \frac{d \mathcal{L}(\rho)}{d \rho} \\
&= -\frac{T}{\rho} + \frac{N - T}{1 - \rho} \quad (\text{taking the derivative of } \mathcal{L}(\rho)) \\
&= \frac{T - N \cdot \rho}{\rho \cdot (\rho - 1)} \quad (\text{bringing the fractions over a common denominator}) \\
&= T - N \cdot \rho \quad (\text{multiplying both sides by } \rho \cdot (\rho - 1) \text{, which is nonzero for } 0 < \rho < 1) \\
\rho &= \frac{T}{N}
\end{align}
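The solution, $\rho^\text{MLE} = \frac{T}{N}$, is exactly the proportion of hospitalizations out of the $N$ observed hospital visits! To confirm that this stationary point is a minimum of $\mathcal{L}$ (and hence a maximum of the log-likelihood), note that $\frac{d^2 \mathcal{L}(\rho)}{d \rho^2} = \frac{T}{\rho^2} + \frac{N - T}{(1 - \rho)^2} > 0$ for all $\rho \in (0, 1)$, so $\mathcal{L}$ is convex on that interval.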
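The derivative step is easy to get wrong by a sign, so here is a small numerical check. It assumes the same hypothetical counts $T = 3$, $N = 10$ (made up for illustration). It confirms that the analytic derivative $-\frac{T}{\rho} + \frac{N - T}{1 - \rho}$ vanishes at $\rho = T / N$ and that it matches a central finite-difference estimate at another point. The helper names are hypothetical.

```python
import math


def bernoulli_loss(rho, T, N):
    """L(rho) = -T*log(rho) - (N - T)*log(1 - rho)."""
    return -T * math.log(rho) - (N - T) * math.log(1 - rho)


def bernoulli_loss_grad(rho, T, N):
    """Analytic derivative dL/drho = -T/rho + (N - T)/(1 - rho)."""
    return -T / rho + (N - T) / (1 - rho)


def central_difference(f, x, eps=1e-6):
    """Numerical derivative of f at x via a symmetric finite difference."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Hypothetical counts (made up): 3 hospitalizations out of 10 observations.
T, N = 3, 10
rho_mle = T / N

# The analytic derivative should vanish (up to rounding) at rho = T / N ...
print(bernoulli_loss_grad(rho_mle, T, N))

# ... and agree with a finite-difference estimate at any other point in (0, 1).
rho = 0.6
numeric = central_difference(lambda r: bernoulli_loss(r, T, N), rho)
print(bernoulli_loss_grad(rho, T, N), numeric)
```

A finite-difference comparison like this is a cheap way to catch sign or algebra errors before trusting a hand-derived gradient.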

## Challenges with Analytic Optimization


**Needs a specialized solution for every model.** 


**Cannot solve analytically for the parameters of every model.**



## Numeric Solutions to Optimization Problems