---
title: "Stats Models: Linear Regression"
author: "Damien Martin"
date: "2024-09-28"
categories: [pandas, statsmodels]
---

# Problem

Every time I want to run a linear regression with statsmodels, I have to look up the imports for linear regression and the formula API, so here they are.

We want to regress `Y` on `X` (numeric), interacting with a discrete feature `category`.
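
For reference, one way to write out this kind of model is sketched here: a single intercept plus a separate slope on `X` for each level of `category`, which is what the formula `Y ~ X:category` used further down specifies. The symbols $\beta_0$, $\beta_c$ and $\varepsilon_i$ are notation introduced for this sketch only.

$$
Y_i = \beta_0 + \sum_{c \in \{\text{cat1},\,\text{cat2},\,\text{cat3}\}} \beta_c \, X_i \, \mathbf{1}[\text{category}_i = c] + \varepsilon_i
$$

Here $\mathbf{1}[\cdot]$ is 1 when row $i$ belongs to category $c$ and 0 otherwise, so each row only picks up the slope for its own category.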

In [5]:
import pandas as pd
import numpy as np

# Three categories with true slopes 3, 2.5 and 1.
# np.random.rand draws uniform noise on [0, 1), so the noise is not centered at zero.
X_cat1 = np.random.uniform(low=0, high=10, size=100)
Y_cat1 = 3 * X_cat1 + np.random.rand(100)

X_cat2 = np.random.uniform(low=0, high=10, size=100)
Y_cat2 = 2.5 * X_cat2 + 2*np.random.rand(100)

X_cat3 = np.random.uniform(low=0, high=10, size=100)
Y_cat3 = X_cat3 + 0.5*np.random.rand(100)

data = pd.concat(
    [
        pd.DataFrame({'X': X_cat1, 'category': 'cat1', 'Y': Y_cat1}),
        pd.DataFrame({'X': X_cat2, 'category': 'cat2', 'Y': Y_cat2}),
        pd.DataFrame({'X': X_cat3, 'category': 'cat3', 'Y': Y_cat3}),
    ]
).reset_index(drop=True)

data

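|     | X        | category | Y         |
|-----|----------|----------|-----------|
| 0   | 9.178447 | cat1     | 28.017036 |
| 1   | 5.934363 | cat1     | 18.521566 |
| 2   | 9.746314 | cat1     | 29.970701 |
| 3   | 3.484075 | cat1     | 10.955776 |
| 4   | 1.067588 | cat1     | 3.801292  |
| ... | ...      | ...      | ...       |
| 295 | 7.024184 | cat3     | 7.421889  |
| 296 | 7.752491 | cat3     | 8.027998  |
| 297 | 5.215522 | cat3     | 5.620281  |
| 298 | 8.841100 | cat3     | 8.887814  |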


In [7]:
# statsmodels formula API
import statsmodels.formula.api as smf

# X:category fits a separate slope on X for each category, with one shared intercept
model = smf.ols('Y ~ X:category', data=data).fit()
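
The choice of `:` in the formula matters. As a rough sketch, the snippet below compares `X:category` with `X*category` on synthetic data shaped like the example in this post. The seed, the `demo` frame, and the variable names are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data shaped like the example in this post; the fixed seed is an
# assumption added so the sketch is reproducible.
rng = np.random.default_rng(0)
frames = []
for name, slope, noise in [("cat1", 3.0, 1.0), ("cat2", 2.5, 2.0), ("cat3", 1.0, 0.5)]:
    x = rng.uniform(low=0, high=10, size=100)
    y = slope * x + noise * rng.random(100)
    frames.append(pd.DataFrame({"X": x, "category": name, "Y": y}))
demo = pd.concat(frames).reset_index(drop=True)

# 'X:category' -> one shared intercept plus one slope per category (4 parameters).
shared_intercept = smf.ols("Y ~ X:category", data=demo).fit()

# 'X*category' expands to 'X + category + X:category', so the intercept can
# also differ by category (6 parameters, expressed relative to a baseline level).
separate_intercepts = smf.ols("Y ~ X*category", data=demo).fit()

# Column names of the design matrix that each formula produced.
print(shared_intercept.model.exog_names)
print(separate_intercepts.model.exog_names)
```

With `X*category`, the slopes for the non-baseline categories are reported as differences from the baseline slope, so the two summaries are not directly comparable term by term.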

In [8]:
model.summary()

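|                  |               |                     |         |
|------------------|---------------|---------------------|---------|
| Dep. Variable:   | Y             | R-squared:          | 0.997   |
| Model:           | OLS           | Adj. R-squared:     | 0.997   |
| Method:          | Least Squares | F-statistic:        | 35690.0 |
| Date:            | Sun, 29 Sep 2024 | Prob (F-statistic): | 0.0  |
| Time:            | 19:49:57      | Log-Likelihood:     | -185.9  |
| No. Observations: | 300          | AIC:                | 379.8   |
| Df Residuals:    | 296           | BIC:                | 394.6   |
| Df Model:        | 3             |                     |         |
| Covariance Type: | nonrobust     |                     |         |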

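|                  | coef   | std err | t       | P>\|t\| | [0.025 | 0.975] |
|------------------|--------|---------|---------|---------|--------|--------|
| Intercept        | 0.6070 | 0.049   | 12.304  | 0.000   | 0.510  | 0.704  |
| X:category[cat1] | 2.9904 | 0.010   | 288.525 | 0.000   | 2.970  | 3.011  |
| X:category[cat2] | 2.5620 | 0.011   | 238.462 | 0.000   | 2.541  | 2.583  |
| X:category[cat3] | 0.9419 | 0.012   | 80.026  | 0.000   | 0.919  | 0.965  |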

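|                |        |                   |          |
|----------------|--------|-------------------|----------|
| Omnibus:       | 24.061 | Durbin-Watson:    | 2.022    |
| Prob(Omnibus): | 0.0    | Jarque-Bera (JB): | 28.088   |
| Skew:          | 0.664  | Prob(JB):         | 7.96e-07 |
| Kurtosis:      | 3.695  | Cond. No.         | 6.84     |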
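
The numbers in the summary can also be read off the fitted result directly, which is handy when they feed into further code. The sketch below is self-contained and rebuilds similar synthetic data. The seed, the `demo` and `new_rows` frames, and the variable names are assumptions for illustration, so the values will not match the summary above exactly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data shaped like the example in this post; the fixed seed is an
# assumption added so the sketch is reproducible.
rng = np.random.default_rng(0)
frames = []
for name, slope, noise in [("cat1", 3.0, 1.0), ("cat2", 2.5, 2.0), ("cat3", 1.0, 0.5)]:
    x = rng.uniform(low=0, high=10, size=100)
    y = slope * x + noise * rng.random(100)
    frames.append(pd.DataFrame({"X": x, "category": name, "Y": y}))
demo = pd.concat(frames).reset_index(drop=True)

model = smf.ols("Y ~ X:category", data=demo).fit()

# Coefficients come back as a pandas Series indexed by term name.
slopes = model.params.drop("Intercept")

# 95% confidence intervals: one row per term, lower and upper bound as columns.
intervals = model.conf_int(alpha=0.05)

# Predictions for new rows: the formula API rebuilds the design matrix
# from the raw 'X' and 'category' columns.
new_rows = pd.DataFrame({"X": [2.0, 2.0, 2.0], "category": ["cat1", "cat2", "cat3"]})
predictions = model.predict(new_rows)

print(slopes)
print(intervals)
print(predictions)
```

Because the new rows go through the same formula, there is no need to build dummy or interaction columns by hand before calling `predict`.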
