# Analytic solutions to most likely hidden state by MLE/MAP estimation

This notebook shows how one can use maximum likelihood estimation (MLE) or maximum a posteriori (MAP) estimation to find the most likely hidden state given some sensory observation.

==========================================================================

* **Notebook dependencies**:
    * ...

* **Content**: Jupyter notebook accompanying Chapter 2 of the textbook "Fundamentals of Active Inference"

* **Author**: Sanjeev Namjoshi (sanjeev.namjoshi@gmail.com)

* **Version**: 0.1

In [1]:
import numpy as np
import os
import sys

from types import SimpleNamespace

module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from src.utils import create_environment

In Chapter 2 we introduced a strategy for finding the most likely value of the hidden state given sensory data. This approach is non-Bayesian because it does not apply Bayes' theorem to compute a full posterior distribution. Rather, it exploits algebraic manipulation of the Gaussian distributions for the likelihood (and/or prior) to obtain an analytic point estimate of the hidden state.

## Maximum likelihood of the hidden state

We will first examine the **maximum likelihood estimate** (MLE) of the hidden state. To obtain this quantity we first start with the likelihood function in our model.

$$
    \mathcal{L}(x) \triangleq \prod_{i=0}^{N} p(y^{(i)} \mid x)
$$

Here we have emphasized that the likelihood is a function of $x$. 


To obtain the MLE, what we want to know is: "Which value of $x$ has the highest credibility for having produced the observations $y^{(i)}$?" We write this as

$$
    x^{MLE} \triangleq \underset{x}{\text{argmax}} \hspace{1mm} \mathcal{L}(x)
$$

This represents our **loss function** or **objective** for finding the optimal hidden state estimate. It is customary to take the log of this quantity to turn the product into a sum and obtain the log-likelihood for the MLE. We denote this with the $\ell$ symbol as follows:

$$
    \ell(x) \triangleq \log \mathcal{L}(x) = \sum_{i=0}^N \log p(y^{(i)} \mid x) 
$$
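The product-to-sum step can be checked numerically. The snippet below is a minimal sketch, assuming a Gaussian likelihood whose mean is $\beta_0 + \beta_1 x$; the helper names and the parameter values (including a unit standard deviation) are illustrative choices, not taken from the textbook code. It also shows a practical reason for working in log space: with many samples the raw product underflows to zero while the sum of logs stays finite.

```python
import numpy as np

def gaussian_pdf(y, mean, std):
    """Density of a normal distribution evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def gaussian_logpdf(y, mean, std):
    """Log-density of a normal distribution evaluated at y."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((y - mean) / std) ** 2

# Illustrative values (assumptions for this sketch only)
rng = np.random.default_rng(0)
beta_0, beta_1, std, x = 3.0, 2.0, 1.0, 2.0
mean = beta_0 + beta_1 * x

y_small = rng.normal(mean, std, size=30)
y_large = rng.normal(mean, std, size=5000)

# For a modest number of samples, log(product) and sum(log) agree
likelihood = np.prod(gaussian_pdf(y_small, mean, std))
log_likelihood = np.sum(gaussian_logpdf(y_small, mean, std))
print(np.log(likelihood), log_likelihood)

# For many samples the product underflows, the log-likelihood does not
likelihood_large = np.prod(gaussian_pdf(y_large, mean, std))
log_likelihood_large = np.sum(gaussian_logpdf(y_large, mean, std))
print(likelihood_large, log_likelihood_large)
```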

Finally, we will also want to take the negative of this quantity, as is tradition, so we minimize this loss function instead of maximizing it. The objective thus becomes

$$
    x^{MLE} \triangleq \underset{x}{\text{argmin}} \hspace{1mm} \left[ -\ell(x) \right] = \underset{x}{\text{argmin}} \hspace{1mm} \left[ -\sum^N_{i=0} \log p(y^{(i)} \mid x) \right]
$$

First, we will assume the likelihood is Gaussian with a linear generating function, so that its mean is $\beta_0 + \beta_1 x$. We substitute this mean into the equation for the normal distribution and take the negative log, which produces an algebraic expression for the negative log-likelihood, our loss function. Finally, we take the partial derivative of this loss function with respect to the unknown variable, in this case the hidden state $x$, set it equal to zero, and solve for $x$. The value that minimizes the objective is:

$$
    x^{MLE} = \frac{\bar{y} - \beta_0}{\beta_1}
$$

where $\bar{y}$ denotes the average over samples of $y$. 
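The intermediate steps can be sketched as follows. The symbol $\sigma_y^2$ for the likelihood variance is an assumption of this sketch, and sums run over all samples. Up to a constant that does not depend on $x$, the negative log-likelihood is

$$
    -\ell(x) = \text{const} + \frac{1}{2\sigma_y^2} \sum_i \left( y^{(i)} - \beta_0 - \beta_1 x \right)^2
$$

Setting its derivative with respect to $x$ to zero gives

$$
    \frac{\partial (-\ell)}{\partial x} = -\frac{\beta_1}{\sigma_y^2} \sum_i \left( y^{(i)} - \beta_0 - \beta_1 x \right) = 0
$$

The factor $\beta_1 / \sigma_y^2$ cancels, and dividing the remaining sum by the number of samples leaves $\bar{y} - \beta_0 - \beta_1 x = 0$, which rearranges to the expression for $x^{MLE}$ above.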

Note that in doing so we do not get the entire posterior distribution but only a point estimate. With no prior in the objective, this estimate is the mode of the likelihood, which coincides with the **posterior mode** under a flat prior (and for a Gaussian the mode is identical to the mean). This is why the method is non-Bayesian. Furthermore, note that the likelihood variance does not appear in the equation: the variance scales the loss but does not move its minimum, so if we are only interested in the location of the maximum we do not need it.
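The claim that the variance drops out can be checked by brute force. The following is a minimal sketch, assuming the linear-Gaussian likelihood described above; the function name, the grid, and the simulated data are illustrative assumptions. It minimizes the negative log-likelihood over a grid of candidate states for several assumed standard deviations and compares each minimizer to the closed-form estimate.

```python
import numpy as np

def negative_log_likelihood(x, y, beta_0, beta_1, std):
    """Gaussian negative log-likelihood of samples y for a candidate state x."""
    mean = beta_0 + beta_1 * x
    return np.sum(0.5 * np.log(2.0 * np.pi * std**2) + 0.5 * ((y - mean) / std) ** 2)

# Simulated observations (illustrative values, not the notebook's environment)
rng = np.random.default_rng(1)
beta_0, beta_1 = 3.0, 2.0
y = rng.normal(beta_0 + beta_1 * 2.0, 0.25, size=30)

x_grid = np.linspace(0.0, 5.0, 5001)                  # grid spacing of 0.001
x_closed_form = (np.mean(y) - beta_0) / beta_1

grid_minimizers = []
for std in (0.1, 0.25, 1.0, 5.0):
    nll = np.array([negative_log_likelihood(x, y, beta_0, beta_1, std) for x in x_grid])
    grid_minimizers.append(x_grid[np.argmin(nll)])

print(f"closed form: {x_closed_form:.4f}")
print(f"grid minimizers for each assumed std: {grid_minimizers}")
```

The assumed standard deviation changes the height and curvature of the loss, but the grid minimizer stays at the same location (to within the grid spacing) in every case.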

Although it is overkill, we will place this operation into a class to continue our emphasis on the interaction between agent and environment.

In [2]:
class LinearMaximumLikelihoodAgent:
    def __init__(self, params: dict) -> None:
        self.params = SimpleNamespace(**params)
        
        self.posterior_mode = None
        
    def infer_state(self, y) -> None:
        # y may be a single observation or an array of samples; np.mean handles both
        self.posterior_mode = (np.mean(y) - self.params.beta_0) / self.params.beta_1

Now let's generate an observation from the environment (where $\beta_0^*=3$ and $\beta_1^*=2$) and infer the state with the agent. 

In [3]:
# Environment parameters
env_params = {
    "beta_0_star" : 3,    # Linear parameter intercept
    "beta_1_star" : 2,    # Linear parameter slope
    "y_star_std"  : 1e-5   # Standard deviation of sensory data
}

# Initialize environment
env = create_environment(name="static_linear", params=env_params)

# Generate data
x_range = np.linspace(start=0.01, stop=5, num=500)
x_star = 2

env.build(x_star)
y = env.generate()

In [4]:
# Agent parameters
agent_params = {
    "beta_0" : 3,    # Linear parameter intercept
    "beta_1" : 2,    # Linear parameter slope
}

agent = LinearMaximumLikelihoodAgent(params=agent_params)
agent.infer_state(y)
posterior_mode = agent.posterior_mode

print(f'The posterior mode is {posterior_mode}. This is the most likely (expected) food size, the highest probability hidden state estimate, when the observed light intensity is 7.')

The posterior mode is 1.9999948781850838. This is the most likely (expected) food size, the highest probability hidden state estimate, when the observed light intensity is 7.


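As we can see, the estimated hidden state closely matches the true external state, $x^*=2$; the small discrepancy comes from the tiny amount of observation noise. Let's now use $N=30$ samples with a larger noise level. We should not need to make any changes to our maximum likelihood agent class.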

In [5]:
# Environment parameters
env_params = {
    "beta_0_star" : 3,    # Linear parameter intercept
    "beta_1_star" : 2,    # Linear parameter slope
    "y_star_std"  : 0.25   # Standard deviation of sensory data
}

# Initialize environment
env = create_environment(name="static_linear", params=env_params)

# Generate N samples for a single x_star value
x_star  = 2                                          # External state
N       = 30                                         # Number of samples
y       = np.zeros(N)                                # Empty array for N samples

# Generate
for i in range(N):
    env.build(x_star)
    y[i] = env.generate()

In [6]:
agent = LinearMaximumLikelihoodAgent(params=agent_params)
agent.infer_state(y)
posterior_mode = agent.posterior_mode

print(f'The posterior mode is {posterior_mode}. This is the most likely (expected) food size, the highest probability hidden state estimate, when the observed light intensity is 7 with N={N} samples.')

The posterior mode is 1.967439738223851. This is the most likely (expected) food size, the highest probability hidden state estimate, when the observed light intensity is 7 with N=30 samples.
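The estimate now deviates from $x^*=2$ because each sample is noisy. Since the variance of a sample mean of $N$ independent observations is $\sigma_y^2 / N$, the estimate $(\bar{y} - \beta_0)/\beta_1$ has standard deviation $\sigma_y / (|\beta_1| \sqrt{N})$, roughly $0.023$ for the settings above, so the value printed here lies within about one and a half standard errors of the truth. The sketch below illustrates this by simulation. It generates data directly with NumPy as $y = \beta_0 + \beta_1 x^* + \text{noise}$, which is an assumption about what the `static_linear` environment does, and the number of repeats is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)
beta_0, beta_1, y_std, x_star = 3.0, 2.0, 0.25, 2.0
n_repeats = 2000                                   # repeated experiments per sample size

results = {}
for n in (1, 30, 1000):
    # Each row is one experiment with n noisy observations
    y = rng.normal(beta_0 + beta_1 * x_star, y_std, size=(n_repeats, n))
    estimates = (y.mean(axis=1) - beta_0) / beta_1   # closed-form MLE per experiment
    predicted_std = y_std / (abs(beta_1) * np.sqrt(n))
    results[n] = (estimates.mean(), estimates.std(), predicted_std)
    print(f"N={n:5d}  mean={estimates.mean():.4f}  "
          f"spread={estimates.std():.4f}  predicted={predicted_std:.4f}")
```

The spread of the estimates shrinks in proportion to $1/\sqrt{N}$, while their average stays close to $x^*$.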


## Maximum a posteriori estimate of the hidden state

The maximum a posteriori (MAP) estimate is similar to the MLE except that, in addition to the likelihood, the objective also includes a prior. In other words, we have the following objective:

$$
    x^{MAP} \triangleq \underset{x}{\text{argmin}} \hspace{2mm} \left[ -\sum^N_{i=0} \log p(y^{(i)} \mid x) - \log p(x) \right] = \underset{x}{\text{argmin}} \hspace{2mm} \left[ -\ell(x) - \log p(x) \right]
$$

where, as before, we have taken the negative log of the expression. Following the same procedure for obtaining $x$ in the MLE case above, we can obtain a solution for the MAP case. This is the posterior mode of $x$ given all data samples under any prior beliefs the agent may have about $x$. The resulting analytic solution is:

$$
    x^{MAP} = \frac{\beta_1(\bar{y} - \beta_0) + m_x}{\beta_1^2 + 1}
$$

Other than the linear model parameters and the data, we also need to take into account the prior mean. Below we represent this equation in code.
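For reference, the same procedure can be carried out with the variances kept explicit. This is a sketch under stated assumptions: the symbols $\sigma_y^2$ (likelihood variance) and $\sigma_x^2$ (prior variance) are introduced here for illustration, the prior is taken to be Gaussian with mean $m_x$, $N$ denotes the number of samples, and sums run over all samples. Up to a constant, the objective is

$$
    -\ell(x) - \log p(x) = \text{const} + \frac{1}{2\sigma_y^2} \sum_i \left( y^{(i)} - \beta_0 - \beta_1 x \right)^2 + \frac{1}{2\sigma_x^2} \left( x - m_x \right)^2
$$

Setting the derivative with respect to $x$ to zero,

$$
    -\frac{\beta_1}{\sigma_y^2} \sum_i \left( y^{(i)} - \beta_0 - \beta_1 x \right) + \frac{x - m_x}{\sigma_x^2} = 0
$$

and solving for $x$ gives

$$
    x^{MAP} = \frac{N \beta_1 \sigma_x^2 (\bar{y} - \beta_0) + \sigma_y^2 m_x}{N \beta_1^2 \sigma_x^2 + \sigma_y^2}
$$

This reduces to the compact expression above when $N \sigma_x^2 = \sigma_y^2$, for example for a single observation with equal likelihood and prior variances, which appears to be the case that expression covers.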

In [7]:
class LinearMaximumAprioriAgent:
    def __init__(self, params: dict) -> None:
        self.params = SimpleNamespace(**params)
        
        self.posterior_mode = None
        
    def infer_state(self, y) -> None:
        # y may be a single observation or an array of samples; np.mean handles both
        self.posterior_mode = (self.params.beta_1 * (np.mean(y) - self.params.beta_0) + self.params.m_x) / (self.params.beta_1**2 + 1)

In [8]:
# Environment parameters
env_params = {
    "beta_0_star" : 3,    # Linear parameter intercept
    "beta_1_star" : 2,    # Linear parameter slope
    "y_star_std"  : 1e-5   # Standard deviation of sensory data
}

# Initialize environment
env = create_environment(name="static_linear", params=env_params)

# Generate data
x_range = np.linspace(start=0.01, stop=5, num=500)
x_star = 2

env.build(x_star)
y = env.generate()

In [9]:
# Agent parameters
agent_params = {
    "beta_0" : 3,    # Linear parameter intercept
    "beta_1" : 2,    # Linear parameter slope
    "m_x"    : 4,    # Prior mean
}

agent = LinearMaximumAprioriAgent(params=agent_params)
agent.infer_state(y)
posterior_mode = agent.posterior_mode

print(f'The posterior mode is {posterior_mode}. This is the most likely (expected) food size, the highest probability hidden state estimate, when the observed light intensity is 7.')

The posterior mode is 2.4000024293255326. This is the most likely (expected) food size, the highest probability hidden state estimate, when the observed light intensity is 7.


This result is identical to the posterior mode from the linear probabilistic generating functions notebook, which used Bayesian inference to arrive at the same solution. In this case, the prior mean at $m_x = 4$ pulls the estimate away from the value implied by the data alone ($x = 2$) and toward the prior.
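The value of $2.4$ can be reproduced by hand: with $\bar{y} = 7$, $\beta_0 = 3$, $\beta_1 = 2$ and $m_x = 4$, the formula gives $(2 \cdot (7 - 3) + 4) / (2^2 + 1) = 12 / 5 = 2.4$. The sketch below checks this against a brute-force minimization of the negative log posterior and shows how the prior's pull depends on its width. The function names and the `y_std` and `x_std` parameters are hypothetical additions for this illustration; with their default value of 1 and a single observation, `map_estimate` matches the formula used by the agent above.

```python
import numpy as np

def map_estimate(y, beta_0, beta_1, m_x, y_std=1.0, x_std=1.0):
    """Posterior mode for a linear-Gaussian likelihood with a Gaussian prior."""
    y = np.atleast_1d(y)
    n = y.size
    numerator = n * beta_1 * x_std**2 * (np.mean(y) - beta_0) + y_std**2 * m_x
    denominator = n * beta_1**2 * x_std**2 + y_std**2
    return numerator / denominator

def negative_log_posterior(x, y, beta_0, beta_1, m_x, y_std=1.0, x_std=1.0):
    """Unnormalized negative log posterior (terms constant in x are dropped)."""
    y = np.atleast_1d(y)
    data_term = np.sum((y - beta_0 - beta_1 * x) ** 2) / (2.0 * y_std**2)
    prior_term = (x - m_x) ** 2 / (2.0 * x_std**2)
    return data_term + prior_term

y_obs = 7.0                                        # noise-free observation for x* = 2
x_map = map_estimate(y_obs, beta_0=3.0, beta_1=2.0, m_x=4.0)

# Brute-force check on a grid with spacing 0.001
x_grid = np.linspace(0.0, 5.0, 5001)
loss = np.array([negative_log_posterior(x, y_obs, 3.0, 2.0, 4.0) for x in x_grid])
x_grid_min = x_grid[np.argmin(loss)]
print(x_map, x_grid_min)

# A very wide prior recovers the MLE; a very narrow prior returns the prior mean
x_wide = map_estimate(y_obs, 3.0, 2.0, 4.0, x_std=1e3)
x_narrow = map_estimate(y_obs, 3.0, 2.0, 4.0, x_std=1e-3)
print(x_wide, x_narrow)
```

As the prior widens, the estimate approaches the maximum likelihood value of 2, and as it narrows the estimate approaches the prior mean of 4.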