# Jupyter - Day 6 - Section 002
## Lecture 6 - Multiple Linear Regression

In the last few lectures, we have focused on single-input linear regression, that is, fitting models of the form

$$
Y =  \beta_0 +  \beta_1 X + \varepsilon
$$
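As a reminder, the least-squares estimates for this single-input model have a closed form: $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch below checks these formulas with NumPy against `np.polyfit`. The arrays `x` and `y` are small made-up values for illustration only, not data from this lab.

```python
import numpy as np

# Small illustrative data set (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates for Y = beta_0 + beta_1 X
x_bar, y_bar = x.mean(), y.mean()
beta_1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta_0 = y_bar - beta_1 * x_bar

# Cross-check with NumPy's degree-1 polynomial fit (returns slope first, then intercept)
slope, intercept = np.polyfit(x, y, deg=1)
print(beta_0, beta_1)
print(intercept, slope)
```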

In this lab, we will continue to use two different tools for linear regression. 
- [Scikit-learn](https://scikit-learn.org/stable/index.html) is arguably the most widely used machine learning library in Python.
- [Statsmodels](https://www.statsmodels.org) provides many of the statistical tests we've been learning in class.

In [1]:
# As always, we start with our favorite standard imports. 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
%matplotlib inline
import seaborn as sns
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

## Multiple linear regression 
Next, we get some code up and running that performs linear regression with multiple input variables, that is, when the model is of the form

$$
Y =  \beta_0 +  \beta_1 X_1 +  \beta_2 X_2 + \cdots +  \beta_pX_p + \varepsilon
$$
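In matrix form, we stack a column of ones next to the $p$ inputs to get a design matrix $\mathbf{X}$; the least-squares estimate $\hat\beta$ minimizes $\|y - \mathbf{X}\beta\|^2$ and, when $\mathbf{X}^T\mathbf{X}$ is invertible, equals $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T y$. Here is a minimal NumPy sketch on simulated data; the "true" coefficients, sample size, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n, p = 200, 3

# Simulated inputs X_1..X_p and a response built from known (arbitrary) coefficients
X = rng.normal(size=(n, p))
true_beta = np.array([1.5, -2.0, 0.5, 3.0])  # beta_0, beta_1, beta_2, beta_3
design = np.column_stack([np.ones(n), X])     # prepend the intercept column of ones
y = design @ true_beta + rng.normal(scale=0.1, size=n)

# Least-squares solution; lstsq avoids forming an explicit matrix inverse
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# Same answer from the normal equations (X^T X) beta = X^T y
beta_normal = np.linalg.solve(design.T @ design, design.T @ y)
print(beta_hat)
```

With little noise, `beta_hat` lands close to `true_beta`; scikit-learn and statsmodels solve this same least-squares problem for us.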

In [2]:
from sklearn.datasets import load_diabetes
diabetes = load_diabetes(as_frame=True)
diabetes_df = pd.DataFrame(diabetes.data, columns = diabetes.feature_names)
diabetes_df['target'] = pd.Series(diabetes.target)

diabetes_df

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005670 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.025930 | 141.0 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 | 206.0 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 | 135.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 0.041708 | 0.050680 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 | 178.0 |
| 438 | -0.005515 | 0.050680 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018114 | 0.044485 | 104.0 |
| 439 | 0.041708 | 0.050680 | -0.015906 | 0.017293 | -0.037344 | -0.013840 | -0.024993 | -0.011080 | -0.046883 | 0.015491 | 132.0 |
| 440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.026560 | 0.044529 | -0.025930 | 220.0 |
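The feature values above are small because scikit-learn's documentation states that each of the 10 diabetes features has been mean-centered and scaled so that the sum of squares of each column is 1 (the `target` column is left in its original units). This matters when interpreting coefficients: a one-unit change in a scaled feature may be much larger than the spread of the data. A quick check, reloading the data so the cell stands on its own:

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes(as_frame=True)
features = diabetes.data  # DataFrame with the 10 feature columns only

# Each column should have mean ~0 and sum of squares ~1
col_means = features.mean().to_numpy()
col_sum_sq = (features ** 2).sum().to_numpy()
print(col_means.round(6))
print(col_sum_sq.round(6))

# Observed range of s5, useful for scale when reading its coefficient
print(features['s5'].min(), features['s5'].max())
```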


We first fit the model $\texttt{target} = \beta_0 + \beta_1 \cdot \texttt{s1} + \beta_2 \cdot \texttt{s5}$ using scikit-learn.

In [3]:
X = diabetes_df[['s1','s5']].values
y = diabetes_df['target'].values

multireg = LinearRegression()  # <----- notice: exactly the same command as for single-input regression
multireg.fit(X,y)

print(multireg.coef_)
print(multireg.intercept_)

[-175.71107989 1006.71695008]
152.13348416289585
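The fitted `LinearRegression` object stores $\hat\beta_0$ in `intercept_` and $(\hat\beta_1, \hat\beta_2)$ in `coef_`, and `predict` evaluates $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$ for each row. The self-contained sketch below refits the same model (loading the data via the `.frame` attribute, which already includes a `target` column) and confirms this by computing the predictions by hand.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

diabetes_df = load_diabetes(as_frame=True).frame  # features plus a 'target' column
X = diabetes_df[['s1', 's5']].values
y = diabetes_df['target'].values

multireg = LinearRegression().fit(X, y)

# Predictions by hand: intercept + X @ coefficients
manual_pred = multireg.intercept_ + X @ multireg.coef_
sk_pred = multireg.predict(X)
print(np.max(np.abs(manual_pred - sk_pred)))
```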


&#9989; **<font color=red>Q:</font>** What are the values of $\beta_0$, $\beta_1$, and $\beta_2$? Interpret the value of $\beta_2$ in the context of this data set.

Your answer here

We next fit the same model, $\texttt{target} = \beta_0 + \beta_1 \cdot \texttt{s1} + \beta_2 \cdot \texttt{s5}$, using `statsmodels`. Do you get the same model?

In [4]:
# multiple least squares with statsmodels (the formula interface adds the intercept automatically)
multiple_est = smf.ols('target ~ s1 + s5', data=diabetes_df).fit()
multiple_est.summary()

| Dep. Variable: | target | R-squared: | 0.329 |
|:---|:---|:---|:---|
| Model: | OLS | Adj. R-squared: | 0.326 |
| Method: | Least Squares | F-statistic: | 107.6 |
| Date: | Wed, 10 Sep 2025 | Prob (F-statistic): | 9.63e-39 |
| Time: | 12:38:33 | Log-Likelihood: | -2459.0 |
| No. Observations: | 442 | AIC: | 4924.0 |
| Df Residuals: | 439 | BIC: | 4936.0 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|:---|---:|---:|---:|---:|---:|---:|
| Intercept | 152.1335 | 3.011 | 50.528 | 0.000 | 146.216 | 158.051 |
| s1 | -175.7111 | 73.872 | -2.379 | 0.018 | -320.898 | -30.524 |
| s5 | 1006.7170 | 73.872 | 13.628 | 0.000 | 861.530 | 1151.904 |

| Omnibus: | 10.295 | Durbin-Watson: | 2.025 |
|:---|:---|:---|:---|
| Prob(Omnibus): | 0.006 | Jarque-Bera (JB): | 10.497 |
| Skew: | 0.356 | Prob(JB): | 0.00526 |
| Kurtosis: | 2.748 | Cond. No. | 30.2 |
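Besides the printed summary, the fitted `statsmodels` results object exposes the same numbers as attributes, for example `params`, `bse` (standard errors), `tvalues`, `pvalues`, and `rsquared`. The self-contained sketch below refits the model with both libraries, checks that the coefficients agree, and checks that each t statistic is the coefficient divided by its standard error, as in the table above.

```python
import numpy as np
import statsmodels.formula.api as smf
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

diabetes_df = load_diabetes(as_frame=True).frame

# statsmodels fit (the formula interface adds the intercept automatically)
multiple_est = smf.ols('target ~ s1 + s5', data=diabetes_df).fit()

# scikit-learn fit of the same model
sk = LinearRegression().fit(diabetes_df[['s1', 's5']], diabetes_df['target'])

print(multiple_est.params)   # Intercept, s1, s5
print(multiple_est.pvalues)  # the P>|t| column of the summary

# Coefficients should match across the two libraries
same_coefs = np.allclose(multiple_est.params[['s1', 's5']].to_numpy(), sk.coef_)
# t = coef / std err, exactly as reported in the summary table
t_check = np.allclose(multiple_est.tvalues, multiple_est.params / multiple_est.bse)
print(same_coefs, t_check)
```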


&#9989; **<font color=red>Q:</font>** What is the fitted model? How much trust can we place in the coefficient estimates?

*Your answer here*

&#9989; **<font color=red>Q:</font>** Run the linear regression to predict `target` using all the other variables. What do you notice about the different terms? Do some appear more strongly related to `target` than others?
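One way to avoid typing every predictor by hand is to build the formula string from the DataFrame's columns. This is a sketch; `predictors`, `full_formula`, and `full_est` are illustrative names, and interpreting the fitted terms (for example via `full_est.summary()`) is still up to you.

```python
import statsmodels.formula.api as smf
from sklearn.datasets import load_diabetes

diabetes_df = load_diabetes(as_frame=True).frame

# Every column except the response becomes a predictor
predictors = [col for col in diabetes_df.columns if col != 'target']
full_formula = 'target ~ ' + ' + '.join(predictors)
print(full_formula)

# Fit the model with all ten predictors plus an intercept
full_est = smf.ols(full_formula, data=diabetes_df).fit()
```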

*Your answer here*

&#9989; **<font color=red>Q:</font>** Earlier, you determined the p-value for the `s1` variable when `s1` was the only predictor of `target`. What happened to the p-value for `s1` now that it is part of a regression using all the variables? Why?

In [None]:
# Your answer here

![Stop Icon](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Vienna_Convention_road_sign_B2a.svg/180px-Vienna_Convention_road_sign_B2a.svg.png)

Great, you got here! Hang out for a bit; there's more lecture before we move on to the next portion.

In [None]:
# We're going to use the (fake) Advertising data set from the beginning of the book.
# You may need to find the data on the course website and change the path below!
advertising_df = pd.read_csv('../../../DataSets/Advertising.csv', index_col = 0)
advertising_df

# Q1: Hypothesis Test

&#9989; **<font color=red>Do this:</font>** Use the `statsmodels` package to fit the model 
$$
\texttt{Sales} = \beta_0 + \beta_1 \cdot \texttt{TV} + \beta_2\cdot \texttt{Radio} + \beta_3\cdot \texttt{Newspaper}
$$
What is the equation for the model learned? 

In [None]:
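# Column names assumed to match the equation above; the Q2 RSS cell reuses `est`
est = smf.ols('Sales ~ TV + Radio + Newspaper', data=advertising_df).fit()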

&#9989; **<font color=red>Do this:</font>** Use the `summary()` method of the fitted model to find the F-statistic for this model.
- What are the null and alternative hypotheses for the test this statistic is used for? 
- What is your conclusion of the test given this F-score?
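The overall F-statistic in the summary compares the fitted model with an intercept-only model. With $n$ observations and $p$ predictors,

$$
F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)},
$$

and under the null hypothesis it follows an $F(p,\, n-p-1)$ distribution. The sketch below recomputes $F$ and its p-value by hand for the diabetes `s1 + s5` model (used here only as a stand-in, since it loads without a file path) and compares them with the `fvalue` and `f_pvalue` attributes of the statsmodels results.

```python
import numpy as np
import statsmodels.formula.api as smf
from scipy import stats
from sklearn.datasets import load_diabetes

diabetes_df = load_diabetes(as_frame=True).frame
est_demo = smf.ols('target ~ s1 + s5', data=diabetes_df).fit()

n = int(est_demo.nobs)        # number of observations
p = int(est_demo.df_model)    # number of predictors, excluding the intercept
rss = est_demo.ssr            # residual sum of squares
tss = est_demo.centered_tss   # total sum of squares around the mean of y

f_manual = ((tss - rss) / p) / (rss / (n - p - 1))
p_manual = stats.f.sf(f_manual, p, n - p - 1)  # upper-tail probability of F(p, n-p-1)
print(f_manual, est_demo.fvalue)
print(p_manual, est_demo.f_pvalue)
```

The same attributes work on the Advertising fit once it is trained.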

![Stop Icon](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Vienna_Convention_road_sign_B2a.svg/180px-Vienna_Convention_road_sign_B2a.svg.png)

Great, you got here! Hang out for a bit; there's more lecture before we move on to the next portion.

# Q2: Subsets of variables

&#9989; **<font color=red>Q:</font>** List all 6 nonempty proper subsets of the three predictors, that is, every subset except the empty set and the full set {TV, Radio, Newspaper}.

*Your answer here*

&#9989; **<font color=red>Do this:</font>** Below is a command to get the residual sum of squares (RSS) of a fitted `statsmodels` linear model. For each of the subsets listed above, fit the model and record its RSS. Which subset gives the smallest RSS?

In [None]:
# Residual sum of squares (RSS) of the fitted statsmodels model stored in `est`
est.ssr
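Enumerating subsets and refitting each one can be automated with `itertools.combinations`. The sketch uses three diabetes predictors (`bmi`, `bp`, `s5`, an arbitrary choice) as stand-ins for TV, Radio, and Newspaper, because the Advertising file path may differ on your machine; the same loop should apply to `advertising_df` with its column names and response swapped in.

```python
from itertools import combinations

import statsmodels.formula.api as smf
from sklearn.datasets import load_diabetes

diabetes_df = load_diabetes(as_frame=True).frame
candidates = ['bmi', 'bp', 's5']  # stand-in predictors chosen only for illustration

# Nonempty proper subsets: sizes 1 and 2 out of 3 predictors (3 + 3 = 6 subsets)
subsets = [combo for k in range(1, len(candidates))
           for combo in combinations(candidates, k)]

rss_by_subset = {}
for combo in subsets:
    formula = 'target ~ ' + ' + '.join(combo)
    rss_by_subset[combo] = smf.ols(formula, data=diabetes_df).fit().ssr

# Print subsets from smallest to largest RSS
for combo, rss in sorted(rss_by_subset.items(), key=lambda item: item[1]):
    print(combo, round(rss, 1))

# RSS of the model with all three predictors, for comparison
full_rss = smf.ols('target ~ ' + ' + '.join(candidates), data=diabetes_df).fit().ssr
print('all three:', round(full_rss, 1))
```

Because adding a predictor can never increase the least-squares RSS, the full model's RSS is at most that of any subset, which is why RSS alone cannot be used to choose among models of different sizes.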



-----
### Congratulations, we're done!

Initially written by Dr. Liz Munch, adapted by Dr. Mengsen Zhang, Michigan State University
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.