# Tutorial 1: Fitting a Straight Line with `lsbi`

In this tutorial, we'll walk through the most fundamental use case for `lsbi`: fitting a straight line to noisy data. This is a classic linear regression problem, and it provides a perfect introduction to the core concepts of the library.

We will cover:
1. Setting up a synthetic dataset with a known "ground truth".
2. Translating the model `y = mx + c` into the `lsbi` matrix formulation.
3. Defining priors on our parameters (`m` and `c`).
4. Performing inference to compute the posterior distribution.
5. Visualizing the results to understand the information gained.

## 1. Setup and Data Generation

First, let's import the necessary libraries and create some mock data. We'll define a true slope `m` and intercept `c`, generate some data points, and add Gaussian noise.

In [None]:
# Imports and Data Generation
import numpy as np
import matplotlib.pyplot as plt
from lsbi.model import LinearModel  # LinearModel is defined in the lsbi.model submodule

# Use a fixed seed for reproducibility
np.random.seed(42)

# 1. Define the true model parameters
theta_true = np.array([2.5, -1.0])  # [slope, intercept]
n_params = len(theta_true)

# 2. Generate x-values and the true y-values
x_data = np.linspace(-2, 2, 20)
y_data_true = theta_true[0] * x_data + theta_true[1]

# 3. Add Gaussian noise to create the observed data
data_noise_std = 0.5
y_data_noisy = y_data_true + np.random.normal(0, data_noise_std, size=x_data.shape)
n_data = len(x_data)

# 4. Plot the data to see what we're working with
plt.figure(figsize=(8, 6))
plt.plot(x_data, y_data_true, 'k-', label='True Line')
plt.errorbar(x_data, y_data_noisy, yerr=data_noise_std, fmt='o', capsize=3, label='Noisy Data')
plt.title("Synthetic Dataset")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.grid(True, linestyle='--')
plt.show()

## 2. Formulating the `lsbi` Model

Now comes the crucial step: translating our problem into the language `lsbi` understands. The library is based on the linear Gaussian model:
$$ D = M\theta + m + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, C) $$

- `D`: The data vector. For us, this is `y_data_noisy`.
- `θ`: The parameter vector. For us, this is `[slope, intercept]`.
- `M`: The model matrix that maps parameters to data.
- `m`: An optional data-space offset (we'll leave it as zero). This is not the slope `m` from `y = mx + c`; the slope lives inside `θ`.
- `ε`: Gaussian measurement noise with covariance `C`.

We also need to define the prior `P(θ)` and the data covariance `C`.
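
For reference, the pieces fit together as the standard conjugate Gaussian model. The block below is a sketch of the textbook result rather than a description of `lsbi` internals; the subscript $P$ (for "posterior") is notation introduced here for illustration.

$$ \theta \sim \mathcal{N}(\mu, \Sigma), \qquad D \mid \theta \sim \mathcal{N}(M\theta + m, C) $$

$$ \Sigma_P = \left(\Sigma^{-1} + M^T C^{-1} M\right)^{-1}, \qquad \mu_P = \Sigma_P \left(\Sigma^{-1}\mu + M^T C^{-1}(D - m)\right) $$

The posterior $\theta \mid D \sim \mathcal{N}(\mu_P, \Sigma_P)$ is again Gaussian, which is why no sampling is needed to compute it.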

In [None]:
# Building the lsbi model components

# Data vector D is our noisy y-values
D = y_data_noisy

# The model matrix M transforms [slope, intercept] into y-values.
# y = slope * x + intercept * 1
# This can be written in matrix form:
# [y_1]   [x_1, 1]   [slope    ]
# [y_2] = [x_2, 1] @ [intercept]
# [y_3]   [x_3, 1]
# ...     ...
M = np.vstack([x_data, np.ones_like(x_data)]).T

# Define the prior P(θ) = N(μ, Σ)
# We'll use a broad, weakly informative prior centered at zero.
mu = np.zeros(n_params)
# A diagonal covariance means we assume slope and intercept are a priori independent.
# A variance of 10 (standard deviation ≈ 3.2) is broad compared with the true values.
Sigma = np.eye(n_params) * 10

# Define the data covariance C
# Our noise was independent and identically distributed, so C is a diagonal matrix.
# The diagonal entries are the variance of the noise (std^2).
C = np.eye(n_data) * data_noise_std**2

print(f"Shape of D: {D.shape}")
print(f"Shape of M: {M.shape}")
print(f"Shape of mu: {mu.shape}")
print(f"Shape of Sigma: {Sigma.shape}")
print(f"Shape of C: {C.shape}")

**Common Pitfall:** The most common source of errors is the construction of the `M` matrix. Always double-check that `M @ theta` produces a vector with the same shape as your data `D`. In our case, `M.shape` is `(20, 2)` and `theta_true.shape` is `(2,)`, so `M @ theta_true` gives a vector of shape `(20,)`, which matches `D`.
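
The check described above can be sketched with NumPy alone. The names `x`, `theta` and `M_check` are illustrative stand-ins for the tutorial's `x_data`, `theta_true` and `M`; `np.column_stack` is used here as an equivalent alternative to `np.vstack(...).T`.

```python
import numpy as np

x = np.linspace(-2, 2, 20)       # same grid as x_data in the tutorial
theta = np.array([2.5, -1.0])    # [slope, intercept]

# One column per parameter: x multiplies the slope, 1 multiplies the intercept
M_check = np.column_stack([x, np.ones_like(x)])

# The matrix product must reproduce the explicit formula y = slope * x + intercept
y_from_matrix = M_check @ theta
y_direct = theta[0] * x + theta[1]

print("M shape:", M_check.shape)
print("M @ theta shape:", y_from_matrix.shape)
print("Matches direct formula:", np.allclose(y_from_matrix, y_direct))
```

If the columns were stacked in the wrong order, the shapes would still match but the comparison with the direct formula would fail, so it is worth checking values as well as shapes.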

## 3. Performing Inference

With all the components defined, inference takes just two steps: instantiate `LinearModel` with the model components, then call its `.posterior()` method with our data.

In [None]:
# Running the inference
# Instantiate the model
model = LinearModel(M=M, mu=mu, Sigma=Sigma, C=C)

# Compute the posterior distribution given the data D
posterior = model.posterior(D)

print("True parameters:\t", theta_true)
print("Posterior mean:\t\t", np.round(posterior.mean, 2))
print("Posterior covariance:\n", np.round(posterior.cov, 3))

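The posterior mean should land close to the true parameters, typically within a couple of posterior standard deviations. The diagonal of the posterior covariance matrix gives the variance of each parameter, and the off-diagonal entries give the covariance between slope and intercept. Because `x_data` is symmetric about zero and the prior is diagonal, those off-diagonal entries are essentially zero here.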
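
As a cross-check that does not rely on `lsbi` at all, the same posterior can be computed from the standard conjugate Gaussian update using NumPy. This is a minimal sketch under the tutorial's settings (zero offset, diagonal prior and noise covariance); the variable names `post_mean`, `post_cov` and `ols` are introduced here for illustration, and agreement with the `lsbi` output is expected but not guaranteed by this snippet.

```python
import numpy as np

# Rebuild the tutorial's dataset so the sketch is self-contained
np.random.seed(42)
theta_true = np.array([2.5, -1.0])
x = np.linspace(-2, 2, 20)
sigma_noise = 0.5
y = theta_true[0] * x + theta_true[1] + np.random.normal(0, sigma_noise, size=x.shape)

M = np.column_stack([x, np.ones_like(x)])
mu = np.zeros(2)
Sigma = 10 * np.eye(2)
C = sigma_noise**2 * np.eye(len(x))

# Conjugate Gaussian update with zero data-space offset
C_inv = np.linalg.inv(C)
Sigma_inv = np.linalg.inv(Sigma)
post_cov = np.linalg.inv(Sigma_inv + M.T @ C_inv @ M)
post_mean = post_cov @ (Sigma_inv @ mu + M.T @ C_inv @ y)

# Ordinary least squares for comparison (no prior)
ols, *_ = np.linalg.lstsq(M, y, rcond=None)

print("Posterior mean:", np.round(post_mean, 3))
print("Least squares: ", np.round(ols, 3))
print("Posterior std: ", np.round(np.sqrt(np.diag(post_cov)), 3))
```

With a prior this broad, the posterior mean is nearly identical to the least-squares fit, and the posterior standard deviations come out at roughly 0.09 for the slope and 0.11 for the intercept.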

## 4. Analyzing and Visualizing the Results

Numbers are good, but plots are better. Let's visualize our result by drawing samples from the posterior distribution and plotting the corresponding lines. This shows our uncertainty in the fit.

In [None]:
# Visualization
from matplotlib.lines import Line2D

plt.figure(figsize=(8, 6))
# Plot the original data; keep the returned container for the legend
data_handle = plt.errorbar(x_data, y_data_noisy, yerr=data_noise_std, fmt='o', capsize=3, zorder=0)
# Plot the true line (plt.plot returns a list, hence the trailing comma)
true_handle, = plt.plot(x_data, y_data_true, 'k-', lw=3, zorder=2)

# Draw 200 samples from the posterior and plot the line each one implies
for _ in range(200):
    theta_sample = posterior.rvs()
    plt.plot(x_data, M @ theta_sample, 'r-', alpha=0.1, zorder=1)

# Plot the posterior mean line
mean_handle, = plt.plot(x_data, M @ posterior.mean, 'r--', lw=3, zorder=3)

plt.title("Posterior Fit to Data")
plt.xlabel("x")
plt.ylabel("y")
# Proxy artist so the faint sample lines get a single readable legend entry
sample_handle = Line2D([0], [0], color='red', lw=1, alpha=0.5)
# Pass explicit handles: indexing plt.gca().lines is fragile because
# errorbar() also adds its cap lines to that list
plt.legend([data_handle, true_handle, sample_handle, mean_handle],
           ['Noisy Data', 'True Line', 'Posterior Samples', 'Posterior Mean Fit'])
plt.grid(True, linestyle='--')
plt.show()

The red cloud of lines represents our posterior knowledge. It is narrowest near the centre of the data (around `x = 0`) and fans out toward the ends of the range, where uncertainty in the slope has more leverage; extrapolating beyond the data would widen it further. The posterior mean (dashed red line) should track the true line (solid black line) closely.
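
The width of that cloud can also be computed directly rather than read off the plot. The sketch below uses the standard conjugate Gaussian posterior covariance with the tutorial's settings and propagates it through the model matrix; it uses only NumPy, and `line_std` is a name introduced here for illustration.

```python
import numpy as np

x = np.linspace(-2, 2, 20)
M = np.column_stack([x, np.ones_like(x)])
Sigma = 10 * np.eye(2)             # prior covariance from the tutorial
C_inv = np.eye(len(x)) / 0.5**2    # inverse of the data covariance

# For a linear Gaussian model the posterior covariance does not depend on the observed y-values
post_cov = np.linalg.inv(np.linalg.inv(Sigma) + M.T @ C_inv @ M)

# Var[line(x_i)] = M_i @ post_cov @ M_i^T, i.e. the diagonal of M @ post_cov @ M.T
line_std = np.sqrt(np.einsum('ij,jk,ik->i', M, post_cov, M))

for xi, si in zip(x[::5], line_std[::5]):
    print(f"x = {xi:+.2f}  ->  std of fitted line = {si:.3f}")
```

The standard deviation of the fitted line is smallest in the middle of the `x` range and roughly doubles by `x = ±2`, which is the fanning-out visible in the plot.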

## 5. Understanding the Information Gain

Let's compare our prior beliefs with our posterior beliefs to understand what we learned from the data.

In [None]:
# Compare prior and posterior
prior = model.prior()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

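# Plot parameter distributions on a fine grid: the posterior is much narrower
# than the prior, so a coarse grid would clip its peak
theta_range = np.linspace(-5, 5, 1000)

# Slope (parameter 0): marginalise([1]) integrates out the intercept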
ax1.plot(theta_range, prior.marginalise([1]).pdf(theta_range[:, None]).flatten(), 
         'b-', label='Prior (Slope)', lw=2)
ax1.plot(theta_range, posterior.marginalise([1]).pdf(theta_range[:, None]).flatten(), 
         'r-', label='Posterior (Slope)', lw=2)
ax1.axvline(theta_true[0], color='k', linestyle='--', label='True Value')
ax1.set_xlabel('Slope')
ax1.set_ylabel('Probability Density')
ax1.set_title('Slope Parameter')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Intercept (parameter 1): marginalise([0]) integrates out the slope
ax2.plot(theta_range, prior.marginalise([0]).pdf(theta_range[:, None]).flatten(), 
         'b-', label='Prior (Intercept)', lw=2)
ax2.plot(theta_range, posterior.marginalise([0]).pdf(theta_range[:, None]).flatten(), 
         'r-', label='Posterior (Intercept)', lw=2)
ax2.axvline(theta_true[1], color='k', linestyle='--', label='True Value')
ax2.set_xlabel('Intercept')
ax2.set_ylabel('Probability Density')
ax2.set_title('Intercept Parameter')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate the information gain: the Kullback-Leibler divergence between posterior and prior
information_gain = model.dkl(D)
print(f"\nInformation gained from data: {information_gain:.2f} nats")
print(f"Information gained from data: {information_gain / np.log(2):.2f} bits")
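
To see what this number measures, the Kullback-Leibler divergence between two Gaussians can be evaluated in closed form. The sketch below recomputes the posterior with NumPy and applies the textbook formula; `gaussian_kl` is a hypothetical helper written for this example, and it is an assumption (not verified here) that `model.dkl(D)` returns the same quantity to numerical precision.

```python
import numpy as np

def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """KL(P || Q) in nats for multivariate normal distributions P and Q."""
    n = len(mean_p)
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_p - mean_q
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff
                  - n + logdet_q - logdet_p)

# Rebuild the tutorial's dataset and prior
np.random.seed(42)
x = np.linspace(-2, 2, 20)
y = 2.5 * x - 1.0 + np.random.normal(0, 0.5, size=x.shape)
M = np.column_stack([x, np.ones_like(x)])
mu, Sigma = np.zeros(2), 10 * np.eye(2)
C_inv = np.eye(len(x)) / 0.5**2

# Closed-form posterior (conjugate Gaussian update, zero offset)
post_cov = np.linalg.inv(np.linalg.inv(Sigma) + M.T @ C_inv @ M)
post_mean = post_cov @ (np.linalg.inv(Sigma) @ mu + M.T @ C_inv @ y)

# Information gain: divergence of the posterior from the prior
kl_nats = gaussian_kl(post_mean, post_cov, mu, Sigma)
print(f"KL(posterior || prior) = {kl_nats:.2f} nats = {kl_nats / np.log(2):.2f} bits")
```

Most of the gain comes from the log-determinant term, which compares the volume of the prior with that of the much narrower posterior.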

## Summary

In this tutorial, we covered the essential workflow of `lsbi`:

1. **Problem Setup**: We started with a simple linear model `y = mx + c`
2. **Matrix Formulation**: We translated this into the `lsbi` framework using the model matrix `M`
3. **Prior Definition**: We specified our prior beliefs about the parameters
4. **Inference**: We used `model.posterior(D)` to compute the updated beliefs
5. **Analysis**: We visualized and interpreted the results

Key takeaways:
- The model matrix `M` is crucial: it defines how parameters map to data
- For linear Gaussian models, `lsbi` provides exact analytical solutions (no sampling required)
- The posterior uncertainty naturally reflects the information content of your data
- Visualization helps interpret the meaning of the mathematical results

Next, try Tutorial 2 to learn about `lsbi`'s powerful broadcasting capabilities for analyzing many datasets simultaneously!