# **Manual Beta Computation**

### What are we doing here?

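Instead of letting statsmodels calculate the `β` coefficients for us, we'll compute them manually using the closed-form OLS formula (the normal equations), where $X$ is the predictor matrix and $Y$ is the outcome vector (called `y` in the code):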

$$
    \beta = (X^TX)^{-1}X^TY
$$

But don't worry, I'll keep it super simple and intuitive.
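
Optional aside: if you're curious where that formula comes from, here is a short sketch. It assumes $X^TX$ is invertible (no predictor is an exact linear combination of the others). The symbols $S(\beta)$ and $\nabla_\beta$ are notation introduced just for this aside. We pick the `β` that minimizes the sum of squared residuals:

$$
    S(\beta) = (Y - X\beta)^T(Y - X\beta)
$$

Setting the gradient with respect to $\beta$ to zero and solving gives the formula above:

$$
    \nabla_\beta S(\beta) = -2X^T(Y - X\beta) = 0
    \quad\Longrightarrow\quad
    X^TX\beta = X^TY
    \quad\Longrightarrow\quad
    \beta = (X^TX)^{-1}X^TY
$$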

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

Before you get mad at me for importing `statsmodels` 😂, we're only gonna use it to add the intercept column and, later, to verify our manual results. We'll use `numpy` for the matrix math and `pandas` for storing our dataset. The calculation itself we'll do manually.

In [2]:
data = pd.DataFrame({
    "Age": [45, 60, 30, 50, 40, 35, 55, 48],
    "BMI": [27, 31, 22, 29, 25, 23, 30, 28],
    "Activity": [3, 1, 6, 2, 4, 5, 1, 3],
    "Salt": [8.0, 10.0, 5.0, 9.0, 7.0, 6.5, 9.5, 8.5],
    "SBP": [130, 145, 118, 138, 125, 120, 142, 135]
})

data

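|   | Age | BMI | Activity | Salt | SBP |
|---|-----|-----|----------|------|-----|
| 0 | 45 | 27 | 3 | 8.0 | 130 |
| 1 | 60 | 31 | 1 | 10.0 | 145 |
| 2 | 30 | 22 | 6 | 5.0 | 118 |
| 3 | 50 | 29 | 2 | 9.0 | 138 |
| 4 | 40 | 25 | 4 | 7.0 | 125 |
| 5 | 35 | 23 | 5 | 6.5 | 120 |
| 6 | 55 | 30 | 1 | 9.5 | 142 |
| 7 | 48 | 28 | 3 | 8.5 | 135 |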


## Prepare `X` and `y` Manually

In [3]:
X = data[["Age", "BMI", "Activity", "Salt"]]
X = sm.add_constant(X)   # prepend a column of 1s (named "const") for the intercept

y = data["SBP"]

### Why did we do this?

The model needs a constant term (`β₀`) to represent the intercept, so we add a column of 1's to our predictor matrix `X`.
- `X` = predictors matrix
- `y` = outcome vector (SBP)

Without the constant, the model would force the intercept to be 0 (a fit through the origin), which is rarely justified in real life.
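
If you want to see what "a column of 1's" looks like without any helper, here is a minimal NumPy-only sketch. The names `predictors`, `ones`, and `X_with_const` are made up for this illustration and are not part of the notebook. The numbers are the same predictors as in the dataset above.

```python
import numpy as np

# Predictor columns in the same order as the dataset: Age, BMI, Activity, Salt
predictors = np.array([
    [45, 27, 3, 8.0],
    [60, 31, 1, 10.0],
    [30, 22, 6, 5.0],
    [50, 29, 2, 9.0],
    [40, 25, 4, 7.0],
    [35, 23, 5, 6.5],
    [55, 30, 1, 9.5],
    [48, 28, 3, 8.5],
])

# One 1 per row; placed first so the first coefficient plays the role of the intercept
ones = np.ones((predictors.shape[0], 1))
X_with_const = np.hstack([ones, predictors])

print(X_with_const.shape)   # (8, 5)
print(X_with_const[:, 0])   # the intercept column
```

Each row's prediction is then `1 * β₀ + Age * β₁ + ...`, which is why the column of 1's is what carries the intercept.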

## Convert to NumPy Matrices

In [4]:
X_matrix = X.values
y_matrix = y.values.reshape(-1, 1)

### Why does this step matter?

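- The OLS formula works with matrices, so we pull the raw NumPy arrays out of the pandas objects.
- Reshaping `y` to shape `(8, 1)` makes it a column vector instead of a 1-D array, so the resulting `β` comes out as a `(5, 1)` column as well.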
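
To make the shape difference concrete, here is a small sketch. `y_1d`, `y_col`, and `Xt_demo` are illustrative names only, and `Xt_demo` is just a placeholder matrix with the same shape as $X^T$ in this notebook (its values don't matter here).

```python
import numpy as np

y_1d = np.array([130, 145, 118, 138, 125, 120, 142, 135])
y_col = y_1d.reshape(-1, 1)   # -1 lets NumPy infer the number of rows

print(y_1d.shape)    # (8,)
print(y_col.shape)   # (8, 1)

# Placeholder with the shape of X^T (5 coefficients x 8 observations)
Xt_demo = np.ones((5, 8))

print((Xt_demo @ y_1d).shape)    # (5,)   -> 1-D result
print((Xt_demo @ y_col).shape)   # (5, 1) -> column vector
```

Both versions hold the same numbers. The column form just keeps every intermediate result two-dimensional, which matches the way the formula is written.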

## Apply the Manual Formula

In [5]:
XtX = X_matrix.T @ X_matrix
XtX_inv = np.linalg.inv(XtX)
Xty = X_matrix.T @ y_matrix

beta_manual = XtX_inv @ Xty
beta_manual

array([[52.38259833],
       [ 0.41080652],
       [ 2.77969805],
       [ 0.10607867],
       [-1.81843464]])

This output is our manually computed `β` coefficients, in order:
1. Intercept (`β₀`)
2. Age
3. BMI
4. Activity
5. Salt

These values should be almost identical to what statsmodels gives us.

If not, we caught a bug (but we won't 😀)
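
A side note on the inverse: the formula is written with $(X^TX)^{-1}$, but in practice it is often preferable to solve the linear system $X^TX\beta = X^TY$ directly rather than forming the inverse. The sketch below is one way to do that with `np.linalg.solve`; it is an alternative to the cell above, not what the notebook itself ran. `X_demo`, `y_demo`, `beta_solve`, and `residuals` are names invented for this example, and the data is copied from the dataset above.

```python
import numpy as np

# Columns: const, Age, BMI, Activity, Salt (same data as the notebook)
X_demo = np.array([
    [1, 45, 27, 3, 8.0],
    [1, 60, 31, 1, 10.0],
    [1, 30, 22, 6, 5.0],
    [1, 50, 29, 2, 9.0],
    [1, 40, 25, 4, 7.0],
    [1, 35, 23, 5, 6.5],
    [1, 55, 30, 1, 9.5],
    [1, 48, 28, 3, 8.5],
])
y_demo = np.array([130, 145, 118, 138, 125, 120, 142, 135], dtype=float).reshape(-1, 1)

XtX = X_demo.T @ X_demo
Xty = X_demo.T @ y_demo

# Solve (X^T X) beta = X^T y without explicitly computing the inverse
beta_solve = np.linalg.solve(XtX, Xty)
print(beta_solve)

# Sanity check: at the OLS solution, residuals are orthogonal to every column of X,
# so X^T (y - X beta) should be numerically close to zero
residuals = y_demo - X_demo @ beta_solve
print(np.abs(X_demo.T @ residuals).max())
```

The orthogonality check at the end is just the normal equations rearranged, so it works as a quick self-test for any `β` you compute by hand.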

## Compare with Statsmodels

In [6]:
model = sm.OLS(y, X).fit()
model.params

const       52.382598
Age          0.410807
BMI          2.779698
Activity     0.106079
Salt        -1.818435
dtype: float64

### What are we expecting?

Both results should:
- match up to several decimal places.
- confirm our manual computation is correct.
- prove that OLS is just math, not magic or some black-box algorithm. 😂
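
"Match up to several decimal places" can also be checked in code instead of by eye. The sketch below is NumPy-only, so it uses `np.linalg.lstsq` as a stand-in reference where the notebook uses statsmodels. That substitution, the tolerance `1e-6`, and the names `X_demo`, `y_demo`, `beta_normal`, and `beta_ref` are all assumptions made for this illustration.

```python
import numpy as np

# Columns: const, Age, BMI, Activity, Salt (same data as the notebook)
X_demo = np.array([
    [1, 45, 27, 3, 8.0],
    [1, 60, 31, 1, 10.0],
    [1, 30, 22, 6, 5.0],
    [1, 50, 29, 2, 9.0],
    [1, 40, 25, 4, 7.0],
    [1, 35, 23, 5, 6.5],
    [1, 55, 30, 1, 9.5],
    [1, 48, 28, 3, 8.5],
])
y_demo = np.array([130, 145, 118, 138, 125, 120, 142, 135], dtype=float).reshape(-1, 1)

# Manual route: beta = (X^T X)^{-1} X^T y
beta_normal = np.linalg.inv(X_demo.T @ X_demo) @ (X_demo.T @ y_demo)

# Reference route: NumPy's least-squares solver (stand-in for statsmodels here)
beta_ref, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# Largest absolute gap between the two sets of coefficients
print(np.abs(beta_normal - beta_ref).max())

# True if every coefficient agrees within the chosen tolerance
print(np.allclose(beta_normal, beta_ref, atol=1e-6))
```

In the notebook itself, the same idea would be comparing `beta_manual.flatten()` with `model.params.values` using `np.allclose`.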

## Final Confirmation

In [7]:
comparison = pd.DataFrame({
    "Manual_Beta": beta_manual.flatten(),
    "Statsmodels_Beta": model.params.values
})

comparison

|   | Manual_Beta | Statsmodels_Beta |
|---|-------------|------------------|
| 0 | 52.382598 | 52.382598 |
| 1 | 0.410807 | 0.410807 |
| 2 | 2.779698 | 2.779698 |
| 3 | 0.106079 | 0.106079 |
| 4 | -1.818435 | -1.818435 |


### What just happened?
Both methods yielded the same `β` coefficients to the six decimal places shown (rows 0 to 4 in the table):
- Intercept (`β₀`)
- Age
- BMI
- Activity
- Salt

This means our manual computation matches statsmodels, so we implemented the formula correctly.