## Interactions
In regression models, interactions refer to situations where the effect of one predictor variable on the outcome depends on the value of another predictor. Interactions help capture the complexity in the relationship between predictors and the response variable that cannot be explained by the main effects alone.

In simple terms, an interaction means that the combined effect of two predictors on the outcome is not purely additive: the effect of one predictor cannot be described without knowing the value of the other.
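One way to make the non-additivity concrete is to write out a generic two-predictor model and differentiate it. The symbols $X_1$, $X_2$ and $\varepsilon$ are generic placeholders introduced here for illustration; the coefficients follow the same $\beta_0, \dots, \beta_3$ numbering as the salary example.

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon
$$

$$
\frac{\partial\, \mathbb{E}[Y]}{\partial X_1} = \beta_1 + \beta_3 X_2,
\qquad
\frac{\partial\, \mathbb{E}[Y]}{\partial X_2} = \beta_2 + \beta_3 X_1
$$

When $\beta_3 = 0$ the slopes are constant and the model is additive. Otherwise $\beta_1$ is the slope of $X_1$ only at $X_2 = 0$, and the slope shifts by $\beta_3$ for every one-unit change in $X_2$.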

### Example interpretation:

Suppose you have a model where salary (Y) depends on education level (X1), experience (X2), and their interaction:


$$
\text{Salary} = \beta_0 + \beta_1\,\text{Education} + \beta_2\,\text{Experience} + \beta_3\,(\text{Education} \times \text{Experience})
$$


If β3 is positive and statistically significant, the return on education (i.e., the effect of education on salary) grows as experience increases. A person with more education is expected to earn more at any experience level, but the gap is larger among people with more years of experience.
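The salary example can be sketched on simulated data. Everything in this block is an assumption made for illustration: the column names, the sample size, and the "true" coefficients (20, 1.5, 0.5, 0.1) are invented, and the statsmodels formula interface is used here only as a compact way to request both main effects and their product.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: all numbers below are made up for illustration only.
rng = np.random.default_rng(0)
n = 500
sim = pd.DataFrame({
    "education": rng.integers(10, 21, size=n),   # hypothetical years of schooling
    "experience": rng.integers(0, 31, size=n),   # hypothetical years of work
})
noise = rng.normal(0, 2, size=n)
sim["salary"] = (
    20
    + 1.5 * sim["education"]
    + 0.5 * sim["experience"]
    + 0.1 * sim["education"] * sim["experience"]  # the interaction (beta_3)
    + noise
)

# "education * experience" expands to both main effects plus their product.
fit = smf.ols("salary ~ education * experience", data=sim).fit()
b1 = fit.params["education"]
b3 = fit.params["education:experience"]

# Estimated return on one more year of education at two experience levels.
for years in (0, 20):
    print(f"experience={years:2d}: education slope = {b1 + b3 * years:.2f}")
```

With a positive interaction coefficient, the printed education slope is larger at 20 years of experience than at 0, which is the amplification described above.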

In [66]:
import pandas as pd 
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

In [67]:
df =  pd.read_csv("./data/auto-mpg.csv")
df.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |


## Interactions with categorical variables 

In [68]:
X = df[["cylinders","weight","horsepower"]]
y = df["mpg"]

In [69]:
def print_results(sk_model,ols_model):
    print(f"""

StatsModels intercept:    {ols_model.params["const"]}
scikit-learn intercept:   {sk_model.intercept_}

StatsModels coefficient:\n{ols_model.params}
scikit-learn coefficient: {sk_model.coef_}
""")

In [70]:
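def build_ols_sk_model(X,y):
    """Fit the same linear model with statsmodels and scikit-learn and print both sets of estimates."""
    # statsmodels does not add an intercept on its own, so add a constant column
    ols_model = sm.OLS(y,sm.add_constant(X))
    ols_results = ols_model.fit()
    
    # scikit-learn fits an intercept by default; fit() returns the fitted estimator
    sk_model = LinearRegression()
    sk_results = sk_model.fit(X=X,y=y)
    
    print_results(sk_results,ols_results)
    
    return (ols_results,sk_results)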

In [71]:
X_no_interaction = X.copy()

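# One-hot encode cylinders; drop_first=True drops the first level (3 cylinders), which becomes the baseline
X_no_interaction = pd.get_dummies(X_no_interaction,columns=["cylinders"],dtype=int,drop_first=True)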

X_no_interaction.head()

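| | weight | horsepower | cylinders_4 | cylinders_5 | cylinders_6 | cylinders_8 |
|---|---|---|---|---|---|---|
| 0 | 3504 | 130 | 0 | 0 | 0 | 1 |
| 1 | 3693 | 165 | 0 | 0 | 0 | 1 |
| 2 | 3436 | 150 | 0 | 0 | 0 | 1 |
| 3 | 3433 | 150 | 0 | 0 | 0 | 1 |
| 4 | 3449 | 140 | 0 | 0 | 0 | 1 |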


In [72]:
ols_no_interaction_results,sk_no_interaction_result = build_ols_sk_model(X=X_no_interaction,y=y)

ols_no_interaction_results.rsquared



StatsModels intercept:    37.70418097142376
scikit-learn intercept:   37.7041809714242

StatsModels coefficient:
const          37.704181
weight         -0.004636
horsepower     -0.060804
cylinders_4     7.025987
cylinders_5     9.055635
cylinders_6     3.286220
cylinders_8     5.959981
dtype: float64
scikit-learn coefficient: [-4.63595646e-03 -6.08044272e-02  7.02598690e+00  9.05563509e+00
  3.28621979e+00  5.95998125e+00]



np.float64(0.7420802416250425)

### With Interaction Term

In [73]:
X_interaction = X.copy()

# Same dummy coding as in the model without interaction (3 cylinders is the baseline)
X_interaction = pd.get_dummies(data=X_interaction,columns=["cylinders"],dtype=int,drop_first=True)

X_interaction.head()


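| | weight | horsepower | cylinders_4 | cylinders_5 | cylinders_6 | cylinders_8 |
|---|---|---|---|---|---|---|
| 0 | 3504 | 130 | 0 | 0 | 0 | 1 |
| 1 | 3693 | 165 | 0 | 0 | 0 | 1 |
| 2 | 3436 | 150 | 0 | 0 | 0 | 1 |
| 3 | 3433 | 150 | 0 | 0 | 0 | 1 |
| 4 | 3449 | 140 | 0 | 0 | 0 | 1 |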


In [74]:
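# Product term: equals weight for 4-cylinder cars and 0 otherwise,
# so its coefficient is the extra weight slope for 4-cylinder cars
X_interaction["cylinders_4 x weight"] = X_interaction["cylinders_4"] * X_interaction["weight"]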
X_interaction.head()

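| | weight | horsepower | cylinders_4 | cylinders_5 | cylinders_6 | cylinders_8 | cylinders_4 x weight |
|---|---|---|---|---|---|---|---|
| 0 | 3504 | 130 | 0 | 0 | 0 | 1 | 0 |
| 1 | 3693 | 165 | 0 | 0 | 0 | 1 | 0 |
| 2 | 3436 | 150 | 0 | 0 | 0 | 1 | 0 |
| 3 | 3433 | 150 | 0 | 0 | 0 | 1 | 0 |
| 4 | 3449 | 140 | 0 | 0 | 0 | 1 | 0 |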
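The cell above interacts weight with a single dummy column. If one wanted a separate weight slope for every non-baseline cylinder group, the same product could be built in a loop. This is only a sketch on a tiny made-up frame (the `toy` data and its values are hypothetical, not rows from the dataset), not something the notebook itself does.

```python
import pandas as pd

# Hypothetical toy data, used only to show the column construction.
toy = pd.DataFrame({
    "cylinders": [4, 6, 8, 4, 8, 6],
    "weight": [2200, 3000, 3500, 2100, 3700, 2900],
})

# Same encoding as in the notebook; here 4 cylinders is the dropped baseline
# because it is the smallest level present in the toy data.
dummies = pd.get_dummies(toy, columns=["cylinders"], dtype=int, drop_first=True)

# Collect the dummy columns first, then add one product column per dummy.
dummy_cols = [c for c in dummies.columns if c.startswith("cylinders_")]
for col in dummy_cols:
    dummies[f"{col} x weight"] = dummies[col] * dummies["weight"]

print(dummies.columns.tolist())
```

Each product column is zero outside its own group, so each coefficient would measure how that group's weight slope differs from the baseline group's slope.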


In [75]:
ols_interaction_results,sk_interaction_results = build_ols_sk_model(X=X_interaction,y=y)

ols_interaction_results.rsquared



StatsModels intercept:    32.86383358500352
scikit-learn intercept:   32.863833585003334

StatsModels coefficient:
const                   32.863834
weight                  -0.002771
horsepower              -0.057112
cylinders_4             17.605070
cylinders_5              7.803382
cylinders_6              1.778909
cylinders_8              2.540711
cylinders_4 x weight    -0.004480
dtype: float64
scikit-learn coefficient: [-2.77067150e-03 -5.71121208e-02  1.76050702e+01  7.80338158e+00
  1.77890866e+00  2.54071052e+00 -4.48024624e-03]



np.float64(0.7530740075429587)

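For vehicles with 4 cylinders, each additional pound of vehicle weight is associated with an additional decrease of about 0.0045 mpg, on top of the baseline weight effect of about -0.0028 mpg per pound. The total weight slope for 4-cylinder cars is therefore roughly -0.0073 mpg per pound.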
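The arithmetic behind this reading can be checked directly from the printed coefficients. The two numbers below are copied from the model output above (rounded to six decimals); the variable names are made up for this sketch.

```python
# Coefficients copied from the interaction model output above.
weight_coef = -0.002771        # weight slope for cars that are not 4-cylinder
interaction_coef = -0.004480   # extra weight slope for 4-cylinder cars

slope_other = weight_coef
slope_4cyl = weight_coef + interaction_coef

print(f"mpg change per pound, other cars:  {slope_other:.6f}")
print(f"mpg change per pound, 4-cylinder:  {slope_4cyl:.6f}")
print(f"mpg change per 1000 lb, 4-cylinder: {1000 * slope_4cyl:.2f}")
```

Because only the `cylinders_4` dummy is multiplied by weight, the plain `weight` coefficient applies to every other cylinder group, and the interaction coefficient is the difference between the two slopes.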

In [76]:
ols_interaction_results.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | mpg | R-squared: | 0.753 |
| Model: | OLS | Adj. R-squared: | 0.749 |
| Method: | Least Squares | F-statistic: | 167.3 |
| Date: | Wed, 27 Nov 2024 | Prob (F-statistic): | 1.83e-112 |
| Time: | 14:34:34 | Log-Likelihood: | -1087.1 |
| No. Observations: | 392 | AIC: | 2190.0 |
| Df Residuals: | 384 | BIC: | 2222.0 |
| Df Model: | 7 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 32.8638 | 2.662 | 12.347 | 0.000 | 27.630 | 38.097 |
| weight | -0.0028 | 0.001 | -3.663 | 0.000 | -0.004 | -0.001 |
| horsepower | -0.0571 | 0.012 | -4.840 | 0.000 | -0.080 | -0.034 |
| cylinders_4 | 17.6051 | 3.241 | 5.432 | 0.000 | 11.233 | 23.977 |
| cylinders_5 | 7.8034 | 3.054 | 2.555 | 0.011 | 1.798 | 13.809 |
| cylinders_6 | 1.7789 | 2.091 | 0.851 | 0.395 | -2.333 | 5.891 |
| cylinders_8 | 2.5407 | 2.356 | 1.079 | 0.281 | -2.091 | 7.172 |
| cylinders_4 x weight | -0.0045 | 0.001 | -4.135 | 0.000 | -0.007 | -0.002 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 56.141 | Durbin-Watson: | 0.861 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 102.35 |
| Skew: | 0.822 | Prob(JB): | 5.96e-23 |
| Kurtosis: | 4.888 | Cond. No. | 77200.0 |
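Since the models with and without the interaction are nested and differ by one column, the gain in R-squared can be turned into a partial F-test. The sketch below uses only numbers reported in this notebook (the two R-squared values and the 384 residual degrees of freedom); it assumes SciPy is installed, which it normally is wherever statsmodels is.

```python
from scipy import stats

# Values reported earlier in the notebook.
r2_reduced = 0.7420802416250425   # dummies + weight + horsepower
r2_full = 0.7530740075429587      # same model plus "cylinders_4 x weight"
df_resid_full = 384               # Df Residuals of the full model
q = 1                             # number of added columns

# Partial F-statistic for the added term, computed from the R-squared gain.
f_stat = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid_full)
p_value = stats.f.sf(f_stat, q, df_resid_full)

print(f"F = {f_stat:.2f}, p = {p_value:.2g}")
print(f"sqrt(F) = {f_stat ** 0.5:.3f}")
```

With a single added column, the square root of this F-statistic should match the absolute t-value of the interaction term in the summary (4.135), so the test agrees with the coefficient table.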


## Interactions with numeric variables

In [77]:
x_no_interaction_numerical = X.copy()
x_no_interaction_numerical.head()

| | cylinders | weight | horsepower |
|---|---|---|---|
| 0 | 8 | 3504 | 130 |
| 1 | 8 | 3693 | 165 |
| 2 | 8 | 3436 | 150 |
| 3 | 8 | 3433 | 150 |
| 4 | 8 | 3449 | 140 |


In [78]:
ols_num_no_inter_results,sk_num_no_inter_results =  build_ols_sk_model(X=x_no_interaction_numerical,y=y)
ols_num_no_inter_results.rsquared



StatsModels intercept:    45.73681722345147
scikit-learn intercept:   45.736817223451965

StatsModels coefficient:
const         45.736817
cylinders     -0.388974
weight        -0.005272
horsepower    -0.042728
dtype: float64
scikit-learn coefficient: [-0.38897448 -0.0052723  -0.04272767]

In [79]:
x_interaction_numerical = X.copy()
# Product of two numeric predictors: lets the slope of each depend on the level of the other
x_interaction_numerical["weight x horsepower"] = x_interaction_numerical["horsepower"] * x_interaction_numerical["weight"]
x_interaction_numerical.head()

| | cylinders | weight | horsepower | weight x horsepower |
|---|---|---|---|---|
| 0 | 8 | 3504 | 130 | 455520 |
| 1 | 8 | 3693 | 165 | 609345 |
| 2 | 8 | 3436 | 150 | 515400 |
| 3 | 8 | 3433 | 150 | 514950 |
| 4 | 8 | 3449 | 140 | 482860 |


In [80]:
ols_num_inter_results,sk_num_inter_results =  build_ols_sk_model(X=x_interaction_numerical,y=y)



StatsModels intercept:    63.475196864854084
scikit-learn intercept:   63.475196864821825

StatsModels coefficient:
const                  63.475197
cylinders              -0.213297
weight                 -0.010449
horsepower             -0.246788
weight x horsepower     0.000053
dtype: float64
scikit-learn coefficient: [-2.13297169e-01 -1.04485396e-02 -2.46787647e-01  5.31481340e-05]



In [81]:
ols_num_inter_results.rsquared

np.float64(0.7488166068991378)

In [82]:
ols_num_inter_results.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | mpg | R-squared: | 0.749 |
| Model: | OLS | Adj. R-squared: | 0.746 |
| Method: | Least Squares | F-statistic: | 288.4 |
| Date: | Wed, 27 Nov 2024 | Prob (F-statistic): | 1.15e-114 |
| Time: | 14:34:34 | Log-Likelihood: | -1090.4 |
| No. Observations: | 392 | AIC: | 2191.0 |
| Df Residuals: | 387 | BIC: | 2211.0 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 63.4752 | 2.347 | 27.049 | 0.000 | 58.861 | 68.089 |
| cylinders | -0.2133 | 0.278 | -0.767 | 0.444 | -0.760 | 0.334 |
| weight | -0.0104 | 0.001 | -11.847 | 0.000 | -0.012 | -0.009 |
| horsepower | -0.2468 | 0.028 | -8.877 | 0.000 | -0.301 | -0.192 |
| weight x horsepower | 5.315e-05 | 6.67e-06 | 7.964 | 0.000 | 4e-05 | 6.63e-05 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 35.464 | Durbin-Watson: | 0.905 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 57.238 |
| Skew: | 0.591 | Prob(JB): | 3.72e-13 |
| Kurtosis: | 4.452 | Cond. No. | 4780000.0 |
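With two numeric predictors, the interaction coefficient is easiest to read as a change in slope. A minimal sketch, using the coefficients printed above and a few weights chosen arbitrarily for illustration (the helper name `hp_effect` is made up here):

```python
# Coefficients copied from the numeric interaction model output above.
b_hp = -0.246788           # horsepower slope when weight is 0 (an extrapolation)
b_int = 5.31481340e-05     # change in the horsepower slope per extra pound

def hp_effect(weight_lb):
    """Estimated change in mpg per extra horsepower at a given weight."""
    return b_hp + b_int * weight_lb

# Illustrative weights in pounds, picked by hand.
for w in (2000, 3000, 4000):
    print(f"weight={w} lb: {hp_effect(w):.4f} mpg per horsepower")
```

The positive interaction coefficient means the negative horsepower slope shrinks toward zero as weight increases: under this fit, extra horsepower costs a light car more mpg than it costs a heavy one. The same calculation works in the other direction, with the weight slope depending on horsepower.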
