# Multiple linear regression

Using the agent to infer the parameters of a linear generating function, rather than hidden states, via the analytic maximum likelihood estimate rather than gradient descent.

==========================================================================

* **Notebook dependencies**:
    * ...

* **Content**: Jupyter notebook accompanying Chapter 3 of the textbook "Fundamentals of Active Inference"

* **Author**: Sanjeev Namjoshi (sanjeev.namjoshi@gmail.com)

* **Version**: 0.1

In [35]:
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np

from scipy.stats import norm
from types import SimpleNamespace

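# Matplotlib 3.6 renamed "seaborn-deep" to "seaborn-v0_8-deep"; use whichever name is available
style_name = "seaborn-v0_8-deep" if "seaborn-v0_8-deep" in mpl.style.available else "seaborn-deep"
mpl.style.use(style_name)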

In this example, we show how to estimate the parameter values for an arbitrary number of $\beta$'s. All of the $\beta$'s of interest are collected into a parameter vector $\boldsymbol{\theta} = \left [\beta^{(0)}, \dots, \beta^{(C-1)} \right ]^\top$, where $C$ is the number of parameters, so we can solve for all of them simultaneously. We will use matrix notation throughout this example. In matrix form, the generative process is

$$
    \mathscr{E} \triangleq
    \begin{cases}
        y^{(i)} = g_{\mathscr{E}}({{\boldsymbol{x}^*}^{(i)}}; \boldsymbol{\theta}^*) + \omega_y^*    & \text{Outcome generation} \\
        g_{\mathscr{E}}({{\boldsymbol{x}^*}^{(i)}}; \boldsymbol{\theta}^*) = {{\boldsymbol{x}^*}^{(i)}}^\top \boldsymbol{\theta}^* & \text{Linear generating function} \\
        \omega_y^* \sim \mathcal{N}(\mu = 0, \sigma^2 = 1) & \text{Gaussian noise} \\
        \boldsymbol{\theta}^* = \left [{\beta^*}^{(0)} = 3, {\beta^*}^{(1)} = 2 \right ]^\top & \text{Linear parameters}
    \end{cases}
$$

As we can see, this generative process differs from the one we have been using previously. Instead of being restricted to just 2 parameters, the generating function generalizes to $C$ parameters.

We construct a class to encapsulate the above mathematical notation.

In [14]:
class StaticEnvironment:
    def __init__(self, params: dict) -> None:
        self.params = SimpleNamespace(**params)
        
    def _noise(self):
        return np.random.normal(loc=0, scale=self.params.y_star_std)
    
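    def _generating_function(self, x_star: np.ndarray) -> float:
        # Inner product of the augmented state vector with the true parameters
        return x_star.T @ self.params.theta_star

    def generate(self, x_star: np.ndarray) -> float:
        # Prepend a 1 so that the first parameter acts as the intercept
        x_star = np.insert(x_star, 0, 1)
        return self._generating_function(x_star) + self._noise()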

In [70]:
# Environment parameters
env_params = {
    "theta_star"  : np.array([3., 2.]),   # Linear intercept and slope
    "y_star_std"  : 1.,                   # Standard deviation of sensory data
    "C"           : 2                     # Number of parameters
}

# Initialize environment with parameters
env = StaticEnvironment(params=env_params)

# Generate data
N       = 10                                         # Number of samples
C       = env_params["theta_star"].shape[0]          # Number of parameters
x_range = np.linspace(start=0.01, stop=5, num=500)   # Support of x
X_star  = np.random.choice(x_range, size=(N, C-1))   # N random external states
y       = np.zeros(N)                                # Empty array for N data samples

# Generate N samples
for idx, x in enumerate(X_star):
    y[idx] = env.generate(x)
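
Because the generating function is linear, the sample-by-sample loop above can also be written as a single matrix product. The following is a minimal sketch and not part of the original notebook: the names `rng`, `X_demo`, `X_aug`, `y_demo`, and `y_loop` are hypothetical, the seed is fixed only for reproducibility, and the states are drawn from a continuous uniform distribution rather than from the grid `x_range`.

```python
import numpy as np

rng = np.random.default_rng(0)          # Fixed seed (an assumption, for reproducibility)

theta_star = np.array([3., 2.])         # Same true parameters as above
y_star_std = 1.                         # Same noise standard deviation as above
N_demo     = 10                         # Number of samples

# N_demo external states with C - 1 = 1 feature each
X_demo = rng.uniform(low=0.01, high=5., size=(N_demo, theta_star.shape[0] - 1))

# Prepend a column of ones so the first parameter acts as the intercept
X_aug = np.insert(X_demo, 0, 1, axis=1)

# All outcomes at once: y = X theta* + noise
noise  = rng.normal(loc=0., scale=y_star_std, size=N_demo)
y_demo = X_aug @ theta_star + noise

# The row-by-row computation gives the same values when the same noise is reused
y_loop = np.array([x @ theta_star + w for x, w in zip(X_aug, noise)])
```

For small $N$ the loop and the matrix product are interchangeable; the matrix form simply mirrors the notation used in the rest of this notebook.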

The generative model is

$$
    \mathcal{M} \triangleq 
    \begin{cases}
        p_{\theta, X, \sigma^2_y}(y^{(i)}) = \mathcal{N}(y^{(i)}; {\boldsymbol{x}^{(i)}}^\top \boldsymbol{\theta}, \sigma^2_y) & \text{Likelihood} \\
        \boldsymbol{\theta} = \left [\beta^{(0)}, \beta^{(1)} \right ]^\top  & \text{Linear parameters} \\
        \phi = \left \{\sigma^2_y \right \}  & \text{Other parameters}
    \end{cases}
$$

Thus, we have a generating function defined as $g_{\mathcal{M}}(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}) = {\boldsymbol{x}^{(i)}}^\top \boldsymbol{\theta}$, where each sample $\boldsymbol{x}^{(i)}$ is a vector of food sizes with a $1$ inserted at the front (giving it dimension $C$), and $\boldsymbol{\theta}$ is the parameter vector that weights each element of $\boldsymbol{x}^{(i)}$ in the linear combination.

To learn the parameters from the data we use the following equation, the **normal equation**:

$$
\boldsymbol{\theta} = \underbrace{(\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top}_{\text{pseudoinverse}} \boldsymbol{y} = \boldsymbol{X}^{+} \boldsymbol{y}.
$$
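
For readers who want the intermediate step, the following is a brief sketch of where the normal equation comes from. It assumes that the $N$ samples are independent and that $\boldsymbol{X}^\top \boldsymbol{X}$ is invertible; $\boldsymbol{y}$ denotes the stacked vector of outcomes. Taking the log of the Gaussian likelihood, differentiating with respect to $\boldsymbol{\theta}$, and setting the gradient to zero gives

$$
\begin{aligned}
    \ln p_{\theta, X, \sigma^2_y}(\boldsymbol{y}) &= -\frac{N}{2} \ln (2 \pi \sigma^2_y) - \frac{1}{2 \sigma^2_y} (\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\theta})^\top (\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\theta}) \\
    \nabla_{\boldsymbol{\theta}} \ln p_{\theta, X, \sigma^2_y}(\boldsymbol{y}) &= \frac{1}{\sigma^2_y} \boldsymbol{X}^\top (\boldsymbol{y} - \boldsymbol{X} \boldsymbol{\theta}) = \boldsymbol{0} \\
    \boldsymbol{X}^\top \boldsymbol{X} \boldsymbol{\theta} &= \boldsymbol{X}^\top \boldsymbol{y} \\
    \boldsymbol{\theta} &= (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}.
\end{aligned}
$$

Note that $\sigma^2_y$ cancels in the second line, so the maximum likelihood estimate of $\boldsymbol{\theta}$ does not depend on the noise variance.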

The agent's data consists of $N$ state vectors, each of length $C-1$, $\mathcal{X} \triangleq \left \{ {\boldsymbol{x}^*}^{(0)}, \dots, {\boldsymbol{x}^*}^{(N-1)} \right \}$. Together with $\mathcal{Y} \triangleq \left \{y^{(0)}, \dots, y^{(N-1)} \right \}$ we have our dataset $\mathcal{D} \triangleq \left \{\mathcal{X}, \mathcal{Y} \right \} = \left \{{\boldsymbol{x}^*}^{(i)}, y^{(i)} \right \}^{N-1}_{i=0}$.

The agent must use the state vectors to construct a data matrix $\boldsymbol{X}$ that we can use in the normal equation. We make a helper function to do this for us:

In [4]:
def build_data_matrix(X_star: np.ndarray) -> np.ndarray:
    return np.insert(X_star, 0, 1, axis=1) 
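
To make the role of the data matrix concrete, here is a small sketch that builds $\boldsymbol{X}$ for a toy dataset and solves the normal equation in three ways. It is illustrative only: the arrays `X_toy`, `theta_toy`, and `y_toy` are made up, the data are noise-free so that the exact parameters should be recovered, and the helper is repeated so the block runs on its own.

```python
import numpy as np

def build_data_matrix(X_star: np.ndarray) -> np.ndarray:
    # Same helper as above: prepend a column of ones for the intercept
    return np.insert(X_star, 0, 1, axis=1)

# Four hypothetical samples with two features each
X_toy = np.array([[1., 2.],
                  [2., 0.],
                  [3., 1.],
                  [4., 3.]])
theta_toy = np.array([1., 2., 3.])        # Intercept followed by two slopes (made up)

X     = build_data_matrix(X_toy)          # Shape (4, 3): ones, feature 1, feature 2
y_toy = X @ theta_toy                     # Noise-free outcomes

# 1. Normal equation written as the linear system (X^T X) theta = X^T y
theta_normal = np.linalg.solve(X.T @ X, X.T @ y_toy)

# 2. Moore-Penrose pseudoinverse
theta_pinv = np.linalg.pinv(X) @ y_toy

# 3. Least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y_toy, rcond=None)

print(theta_normal, theta_pinv, theta_lstsq)
```

When $\boldsymbol{X}$ has full column rank, as it does here, the three approaches agree up to floating-point error. The pseudoinverse and the least-squares solver also return a solution when $\boldsymbol{X}^\top \boldsymbol{X}$ is singular, which the explicit inverse cannot do.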

Now we can construct our agent. We will use the linear regression agent from notebook 3.1 as our template. Let's start with a minimal agent that only learns parameters. Because `mle_theta()` uses the analytic update for maximum likelihood estimation, there is no need to specify the generative model explicitly. The agent also takes no parameters, since the only things we need are $\boldsymbol{X}$ and $\boldsymbol{y}$.

In [71]:
class MultipleLinearRegressionAgent:
    def __init__(self) -> None:
        ...
        
    def mle_theta(self, X: np.ndarray, y: np.ndarray) -> np.ndarray:
        return np.linalg.pinv(X) @ y
    
    def build_data_matrix(self, X_star: np.ndarray) -> np.ndarray:
        return np.insert(X_star, 0, 1, axis=1) 
    
    def learn_parameters(self, X_star: np.ndarray, y: np.ndarray) -> None:
        X = self.build_data_matrix(X_star)
        self.theta = self.mle_theta(X, y)

Now, let's try to learn the parameters.

In [72]:
agent = MultipleLinearRegressionAgent()
agent.learn_parameters(X_star, y)
theta = agent.theta

print(f"theta_star: {env_params['theta_star']}.")
print(f"theta: {np.round(theta, 3)}.")


theta_star: [3. 2.].
theta: [3.898 1.695].


We can easily extend this to a model with five parameters. First we redefine the environment and generate data. We also increase the number of samples to $N=1000$, since a model with more parameters needs more data to estimate them well.

In [73]:
# Environment parameters
env_params = {
    "theta_star"  : np.array([3., 2., 4., 5., 6.]),  # Intercept followed by four slopes
    "y_star_std"  : 1.,                   # Standard deviation of sensory data
    "C"           : 5                     # Number of parameters
}

# Initialize environment with parameters
env = StaticEnvironment(params=env_params)

# Generate data 
N       = 1000                                       # Number of samples
C       = env_params["theta_star"].shape[0]          # Number of parameters
x_range = np.linspace(start=0.01, stop=5, num=500)   # Support of x
X_star  = np.random.choice(x_range, size=(N, C-1))   # N random external states
y       = np.zeros(N)                                # Empty array for N data samples

# Generate N samples
for idx, x in enumerate(X_star):
    y[idx] = env.generate(x)

Then we learn the parameters:

In [74]:
agent = MultipleLinearRegressionAgent()
agent.learn_parameters(X_star, y)
theta = agent.theta

print(f"theta_star: {env_params['theta_star']}.")
print(f"theta: {np.round(theta, 3)}.")

theta_star: [3. 2. 4. 5. 6.].
theta: [2.959 2.023 4.    4.979 6.023].
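
The two runs above differ mainly in sample size: with $N=10$ the estimates are noticeably off, while with $N=1000$ they land close to the true values. The sketch below illustrates this trend under stated assumptions. The helper `estimation_error` is hypothetical, the seed is fixed only for reproducibility, and states are drawn from a continuous uniform distribution instead of the grid `x_range`. Each line reports a single random draw, so individual values will fluctuate.

```python
import numpy as np

rng        = np.random.default_rng(0)                # Fixed seed (an assumption)
theta_star = np.array([3., 2., 4., 5., 6.])          # Same true parameters as above

def estimation_error(n_samples: int) -> float:
    """Hypothetical helper: simulate data, fit with the pseudoinverse,
    and return the largest absolute parameter error."""
    X_star = rng.uniform(0.01, 5., size=(n_samples, theta_star.shape[0] - 1))
    X      = np.insert(X_star, 0, 1, axis=1)         # Data matrix with intercept column
    y      = X @ theta_star + rng.normal(0., 1., size=n_samples)
    theta  = np.linalg.pinv(X) @ y                   # Analytic maximum likelihood estimate
    return float(np.max(np.abs(theta - theta_star)))

for n in (10, 100, 1000, 10000):
    print(f"N = {n:>5}: max |theta - theta_star| = {estimation_error(n):.3f}")
```

On average the error shrinks roughly in proportion to $1/\sqrt{N}$, which is why the larger model is paired with a larger dataset.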
