# Analysis of a multivariate experiment using multiple linear regression

## A multivariate experiment on the enjoyment of food as a function of food and topping. Food: ice cream or hot dog. Topping: mustard or chocolate sauce

#### Dataset borrowed from https://statisticsbyjim.com/regression/interaction-effects/



In [32]:
# import the modules
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
from statsmodels.tools.eval_measures import rmse
import numpy as np

In [2]:
df = pd.read_csv('Interactions_Categorical.csv')

In [3]:
df.head()

Unnamed: 0,Enjoyment,Food,Condiment
0,81.926957,Hot Dog,Mustard
1,84.939774,Hot Dog,Mustard
2,90.286479,Hot Dog,Mustard
3,89.561802,Hot Dog,Mustard
4,97.676826,Hot Dog,Mustard


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Enjoyment  80 non-null     float64
 1   Food       80 non-null     object 
 2   Condiment  80 non-null     object 
dtypes: float64(1), object(2)
memory usage: 2.0+ KB


In [11]:
df.shape

(80, 3)

In [4]:
df.Food.value_counts()

Hot Dog      40
Ice Cream    40
Name: Food, dtype: int64

In [5]:
df.Condiment.value_counts()

Mustard            40
Chocolate Sauce    40
Name: Condiment, dtype: int64

In [80]:
new_df = df.copy()

In [68]:
# Strip spaces from the labels so the dummy column names are valid formula terms
new_df.Food = new_df.Food.str.replace(' ', '', regex=False)
new_df.Condiment = new_df.Condiment.str.replace(' ', '', regex=False)
new_df.head()

Unnamed: 0,Enjoyment,Food,Condiment
0,81.926957,HotDog,Mustard
1,84.939774,HotDog,Mustard
2,90.286479,HotDog,Mustard
3,89.561802,HotDog,Mustard
4,97.676826,HotDog,Mustard


In [69]:
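# One-hot encode both factors; dtype=int keeps the dummies as 0/1 integers across pandas versions
new_df = pd.get_dummies(new_df, columns=['Food', 'Condiment'], dtype=int)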
new_df.head()

Unnamed: 0,Enjoyment,Food_HotDog,Food_IceCream,Condiment_ChocolateSauce,Condiment_Mustard
0,81.926957,1,0,0,1
1,84.939774,1,0,0,1
2,90.286479,1,0,0,1
3,89.561802,1,0,0,1
4,97.676826,1,0,0,1


In [78]:
new_df.corr()

Unnamed: 0,Enjoyment,Food_HotDog,Food_IceCream,Condiment_ChocolateSauce,Condiment_Mustard
Enjoyment,1.0,0.009453047,-0.009453047,0.1245843,-0.1245843
Food_HotDog,0.009453,1.0,-1.0,-3.552714e-16,2.803313e-16
Food_IceCream,-0.009453,-1.0,1.0,3.219647e-16,-3.247402e-16
Condiment_ChocolateSauce,0.124584,-3.552714e-16,3.219647e-16,1.0,-1.0
Condiment_Mustard,-0.124584,2.803313e-16,-3.247402e-16,-1.0,1.0


In [72]:
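# Each a*b term expands to a + b + a:b, so repeated main effects are listed only once.
# Note: both dummies of each factor are included together with the intercept,
# so the columns are linearly dependent and the design matrix is rank-deficient.
formula = 'Enjoyment ~ Food_HotDog*Condiment_ChocolateSauce + Food_IceCream*Condiment_ChocolateSauce + Food_HotDog*Condiment_Mustard + Food_IceCream*Condiment_Mustard'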

model = smf.ols(formula, data = new_df).fit()

In [73]:
model.summary()

0,1,2,3
Dep. Variable:,Enjoyment,R-squared:,0.893
Model:,OLS,Adj. R-squared:,0.889
Method:,Least Squares,F-statistic:,212.4
Date:,"Sun, 22 May 2022",Prob (F-statistic):,7.41e-37
Time:,11:48:08,Log-Likelihood:,-240.33
No. Observations:,80,AIC:,488.7
Df Residuals:,76,BIC:,498.2
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,34.3644,0.249,138.129,0.000,33.869,34.860
Food_HotDog,17.2764,0.393,43.920,0.000,16.493,18.060
Condiment_ChocolateSauce,18.4239,0.393,46.837,0.000,17.640,19.207
Food_HotDog:Condiment_ChocolateSauce,-4.7480,0.622,-7.634,0.000,-5.987,-3.509
Food_IceCream,17.0880,0.393,43.441,0.000,16.305,17.871
Food_IceCream:Condiment_ChocolateSauce,23.1719,0.622,37.256,0.000,21.933,24.411
Condiment_Mustard,15.9405,0.393,40.524,0.000,15.157,16.724
Food_HotDog:Condiment_Mustard,22.0244,0.622,35.411,0.000,20.786,23.263
Food_IceCream:Condiment_Mustard,-6.0839,0.622,-9.782,0.000,-7.323,-4.845

0,1,2,3
Omnibus:,2.073,Durbin-Watson:,1.985
Prob(Omnibus):,0.355,Jarque-Bera (JB):,2.035
Skew:,0.325,Prob(JB):,0.362
Kurtosis:,2.566,Cond. No.,5.5700000000000004e+32
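The summary lists nine coefficients but reports `Df Model: 3`, and the condition number is about 5.57e+32. Both point to linearly dependent columns. The sketch below checks this on a hypothetical balanced design (20 rows per food/condiment pair, which is an assumption consistent with the 40/40 counts above, not the actual row order of the CSV). It rebuilds the nine columns in NumPy and computes the rank of the resulting matrix.

```python
import numpy as np

# Hypothetical balanced layout: 20 rows for each of the four food/condiment
# pairs (an assumption; the real file may order or split its rows differently).
hot_dog = np.repeat([1, 1, 0, 0], 20)
chocolate = np.repeat([0, 1, 0, 1], 20)
ice_cream = 1 - hot_dog
mustard = 1 - chocolate

# Columns in the same order as the rows of the coefficient table.
X = np.column_stack([
    np.ones(80),                      # Intercept
    hot_dog,                          # Food_HotDog
    chocolate,                        # Condiment_ChocolateSauce
    hot_dog * chocolate,              # Food_HotDog:Condiment_ChocolateSauce
    ice_cream,                        # Food_IceCream
    ice_cream * chocolate,            # Food_IceCream:Condiment_ChocolateSauce
    mustard,                          # Condiment_Mustard
    hot_dog * mustard,                # Food_HotDog:Condiment_Mustard
    ice_cream * mustard,              # Food_IceCream:Condiment_Mustard
])

# Number of linearly independent columns in the design matrix.
rank = np.linalg.matrix_rank(X)
print(X.shape, rank)
```

Under these assumptions the matrix has nine columns but rank 4 (an intercept plus three free parameters), which agrees with `Df Model: 3`. A 2 x 2 design has only four cell means to estimate, so the remaining columns carry no extra information.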


In [74]:
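# Predict on the same rows the model was fitted on, so this is an in-sample error
X = new_df.drop(columns='Enjoyment')
y = new_df.Enjoyment

prediction = model.predict(X)
# Root mean squared error between observed and fitted enjoyment
rmse(y, prediction)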

4.879929667933398
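For reference, the root mean squared error is the square root of the average squared difference between observed and predicted values. The helper below is a minimal plain-Python sketch of that formula. The function name `root_mean_squared_error` and the toy numbers are illustrative assumptions and are not taken from the dataset.

```python
import math

def root_mean_squared_error(actual, predicted):
    """Square root of the mean of the squared prediction errors."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values for illustration only; they do not come from the experiment.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

toy_rmse = root_mean_squared_error(actual, predicted)
print(toy_rmse)
```

The result is in the same units as the response, so the value reported above is in enjoyment points.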

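* A caveat before reading the coefficients: the formula includes both dummy columns of each factor together with the intercept, so the design matrix is rank-deficient (the summary reports Df Model: 3 and a condition number of about 5.57e+32). The individual coefficients are therefore not uniquely determined and should be read with caution; only the fitted values for the four food/condiment combinations are well defined.

* The coefficients for the two foods are nearly identical (17.28 for hot dog and 17.09 for ice cream), which suggests that food on its own has little influence on enjoyment: people do not clearly prefer one over the other.

* The coefficients for the two condiments differ somewhat (18.42 for chocolate sauce and 15.94 for mustard), which suggests a modest overall preference for chocolate sauce.

* The interaction between food and topping has the largest impact. Chocolate sauce on ice cream has the biggest positive coefficient (23.17), closely followed by mustard on hot dog (22.02), while mustard on ice cream has the biggest negative coefficient (-6.08), followed by chocolate sauce on hot dog (-4.75).

* Root mean squared error of the model (in-sample): 4.88
* Coefficient of determination (R-squared): 0.893, i.e. 89.3%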
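The coefficients are easiest to compare once they are combined into a predicted enjoyment for each food/condiment pair. The sketch below does that arithmetic by hand, using the values copied from the summary table. The helper name `predicted_enjoyment` is hypothetical, and the code simply adds the intercept, the two main-effect terms, and the matching interaction term for each pair.

```python
# Coefficients copied from the model summary table.
coef = {
    "Intercept": 34.3644,
    "Food_HotDog": 17.2764,
    "Food_IceCream": 17.0880,
    "Condiment_ChocolateSauce": 18.4239,
    "Condiment_Mustard": 15.9405,
    "Food_HotDog:Condiment_ChocolateSauce": -4.7480,
    "Food_IceCream:Condiment_ChocolateSauce": 23.1719,
    "Food_HotDog:Condiment_Mustard": 22.0244,
    "Food_IceCream:Condiment_Mustard": -6.0839,
}

def predicted_enjoyment(food, condiment):
    """Add up the terms that are switched on for one food/condiment pair."""
    f = f"Food_{food}"
    c = f"Condiment_{condiment}"
    return coef["Intercept"] + coef[f] + coef[c] + coef[f"{f}:{c}"]

# Predicted enjoyment for each of the four combinations, rounded to 2 decimals.
cell_means = {
    (food, condiment): round(predicted_enjoyment(food, condiment), 2)
    for food in ("HotDog", "IceCream")
    for condiment in ("Mustard", "ChocolateSauce")
}

for pair, value in cell_means.items():
    print(pair, value)
```

This gives roughly 89.61 for hot dog with mustard, 65.32 for hot dog with chocolate sauce, 61.31 for ice cream with mustard, and 93.05 for ice cream with chocolate sauce. The matching pairs score about 25 to 30 points higher than the mismatched ones, which is the interaction effect expressed in enjoyment units.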