# Likelihood functions

Demonstration of how we can treat the conditional distribution $p(y \mid x)$ as a likelihood function.

==========================================================================

* **Notebook dependencies**:
    * ...

* **Content**: Jupyter notebook accompanying Chapter 2 of the textbook "Fundamentals of Active Inference"

* **Author**: Sanjeev Namjoshi (sanjeev.namjoshi@gmail.com)

* **Version**: 0.1

In [1]:
import matplotlib.pyplot as plt
import numpy as np

from scipy.stats import norm

In Bayesian inference we treat the likelihood as a *function* rather than a probability distribution. Read in the forward direction, the conditional distribution $p(y \mid x)$ gives the probability of the data $y$ given some hidden state $x$. However, in Bayesian inference we are interested in the inverse relationship: having observed the data, we want to know the probability of each state that could have generated that data. This probability is encoded in the posterior distribution.
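To make the forward direction concrete, here is a minimal sketch that simulates data $y$ for a fixed hidden state. It borrows the linear-Gaussian setup used later in this notebook (`gm` with `beta_0=3`, `beta_1=2`, and a standard deviation of 0.5); the choice of `x_true = 2`, the sample size, and the random seed are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility


def gm(x: float, beta_0: float = 3.0, beta_1: float = 2.0) -> float:
    """Linear generating function: the mean of y for a given hidden state x."""
    return beta_1 * x + beta_0


x_true = 2.0   # hypothetical hidden state, fixed for this simulation
std_y = 0.5    # standard deviation of the observation noise

# Forward direction: hold x fixed and draw many observations y ~ p(y | x)
samples = rng.normal(loc=gm(x_true), scale=std_y, size=10_000)

print(f"Mean of simulated y: {samples.mean():.2f}")
print(f"Std of simulated y:  {samples.std():.2f}")
```

The simulated observations cluster around `gm(x_true)`, which is the sense in which $p(y \mid x)$ describes the data once the state is known.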

To obtain this distribution, we will treat $p(y \mid x)$ in a particular way, which we call the **likelihood function**. Specifically, the likelihood function $\mathcal{L}(x)$ is a function of $x$ instead of $y$. What this means is that we start with a fixed observation $y$ and then ask: "If we input the full range of $x$ into this function, what will the resulting curve look like?" Let's generate this curve first and then interpret its meaning.
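For reference, the likelihood function can be written out explicitly under the linear-Gaussian model that the next code cell uses (mean $\beta_1 x + \beta_0$ with $\beta_0 = 3$ and $\beta_1 = 2$). The symbol $\sigma_y$ is notation introduced here for the standard deviation `std_y = 0.5`; it does not appear elsewhere in the notebook.

$$
\mathcal{L}(x) = p(y \mid x) = \frac{1}{\sigma_y \sqrt{2\pi}} \exp\left( -\frac{\left( y - (\beta_1 x + \beta_0) \right)^2}{2 \sigma_y^2} \right)
$$

As a worked example, take $y = 7$ and $x = 1$. The mean is $2 \cdot 1 + 3 = 5$, so the exponent is $-(7 - 5)^2 / (2 \cdot 0.25) = -8$, and

$$
\mathcal{L}(1) = \frac{1}{0.5 \sqrt{2\pi}} \, e^{-8} \approx 0.7979 \times 3.355 \times 10^{-4} \approx 2.68 \times 10^{-4}.
$$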

In [3]:
# Set up modeling objects. We will take the likelihood at 5 different points of x.
y                     = 7               # Observation from generative process
likelihoods           = np.zeros(5)     # Initialize empty likelihood array
std_y                 = 0.5             # Likelihood standard deviation

# Linear generating function: returns the mean of the likelihood p(y | x)
def gm(x: float, beta_0: float, beta_1: float) -> float:
    return beta_1 * x + beta_0

# What is the likelihood for each of these five points?
for x in range(1, 6):
    # Likelihood p(y = 7 | x) at the current value of x
    likelihood = norm.pdf(y, loc=gm(x, beta_0=3, beta_1=2), scale=std_y)
    likelihoods[x-1] = likelihood
    print(f"The likelihood when y={y} and x={x} is: {likelihood}.")


The likelihood when y=7 and x=1 is: 0.00026766045152977074.
The likelihood when y=7 and x=2 is: 0.7978845608028654.
The likelihood when y=7 and x=3 is: 0.00026766045152977074.
The likelihood when y=7 and x=4 is: 1.0104542167073785e-14.
The likelihood when y=7 and x=5 is: 4.292767471326121e-32.


Notice in the above output that we have held $y$ fixed at the observed value and passed in the values $x \in \{1, 2, 3, 4, 5\}$. This is why the likelihood is a function of $x$: we have varied $x$ while keeping $y$ constant. The likelihood is largest at $x = 2$, where the generating function returns exactly the observed value ($2 \cdot 2 + 3 = 7$). Next, let's plot these values.
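Before plotting, a quick numerical check can illustrate why the likelihood is called a function rather than a probability distribution. The sketch below reuses the same model values as the cell above; the grid ranges, grid resolution, and the simple Riemann-sum integration are assumptions chosen for illustration. It compares the area under $p(y \mid x)$ when $y$ varies with the area under $\mathcal{L}(x)$ when $x$ varies.

```python
import numpy as np
from scipy.stats import norm

BETA_0, BETA_1, STD_Y = 3.0, 2.0, 0.5   # same values as the cell above


def gm(x, beta_0=BETA_0, beta_1=BETA_1):
    """Linear generating function: the mean of y for hidden state x."""
    return beta_1 * x + beta_0


# Forward direction: fix x = 2 and vary y. This is a density over y.
y_grid = np.linspace(3.0, 11.0, 8001)
dy = y_grid[1] - y_grid[0]
area_over_y = np.sum(norm.pdf(y_grid, loc=gm(2.0), scale=STD_Y)) * dy

# Likelihood direction: fix y = 7 and vary x. This is a function of x.
x_grid = np.linspace(0.0, 4.0, 4001)
dx = x_grid[1] - x_grid[0]
likelihood_curve = norm.pdf(7.0, loc=gm(x_grid), scale=STD_Y)
area_over_x = np.sum(likelihood_curve) * dx

# Location of the peak of the likelihood curve on the grid
x_best = x_grid[np.argmax(likelihood_curve)]

print(f"Area under p(y | x=2) over y: {area_over_y:.3f}")
print(f"Area under L(x) over x:       {area_over_x:.3f}")
print(f"x with the highest likelihood: {x_best:.3f}")
```

The area over $y$ comes out at approximately 1, as expected for a probability density. The area over $x$ is approximately 0.5 (that is, $1 / \beta_1$ for this model), so $\mathcal{L}(x)$ is not normalized with respect to $x$ even though it is built from the same expression.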

In [None]:
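# "seaborn-whitegrid" was renamed "seaborn-v0_8-whitegrid" in Matplotlib 3.6; use whichever is available
style = "seaborn-v0_8-whitegrid" if "seaborn-v0_8-whitegrid" in plt.style.available else "seaborn-whitegrid"
plt.style.use(style)
fig, ax = plt.subplots(1, 1, facecolor=(1,1,1))

# Plot the likelihood at each value of x as a stem
ax.stem(np.arange(1, 6), likelihoods, linefmt="g--", markerfmt="go")

# Axis labels
ax.set_xlabel("Food size", fontsize=18)
ax.set_ylabel("Credibility", fontsize=18)

# Grid and tick styling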
ax.axes.grid(which="major", axis="both", c="#f2f2f2")
plt.setp(ax.spines.values(), color="black", linewidth=0.5)
ax.tick_params(
    labelsize=14,
    axis='both',          
    which='major',      
    bottom=True,
    left=True,
    color="black",
    width=0.5,
    length=3)