
**3.6.1 Importing Packages**

In [7]:
pip install ISLP



In [8]:
import numpy as np
import pandas as pd
from matplotlib.pyplot import subplots

In [9]:
import statsmodels.api as sm

In [10]:
from statsmodels.stats.outliers_influence \
    import variance_inflation_factor as VIF
from statsmodels.stats.anova import anova_lm

In [11]:
from ISLP import load_data
from ISLP.models import (ModelSpec as MS,
                         summarize,
                         poly)

**3.6.2 Simple Linear Regression**

In this section we will construct model matrices (also called design
matrices) using the **ModelSpec()** transform from ISLP.models.

We will use the **Boston housing data set**, which is contained in the ISLP
package. The Boston dataset records medv (median house value) for 506
neighborhoods around Boston. We will build a regression model to predict
medv using 12 predictors such as rm (average number of rooms per house),
age (proportion of owner-occupied units built prior to 1940), and lstat
(percent of households with low socioeconomic status). We will use
statsmodels for this task, a Python package that implements several
commonly used regression methods.

We have included a simple loading function load_data() in the ISLP package:

In [12]:
Boston = load_data("Boston")
Boston.columns

Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'lstat', 'medv'],
      dtype='object')

We start by using the sm.OLS() function to fit a simple linear regression
model. Our response will be **medv** and **lstat** will be the single predictor.
For this model, we can create the model matrix by hand.

In [16]:
X = pd.DataFrame({'intercept': np.ones(Boston.shape[0]),
                  'lstat': Boston['lstat']})
X[:4]

   intercept  lstat
0        1.0   4.98
1        1.0   9.14
2        1.0   4.03
3        1.0   2.94


In [18]:
# Extract the response and fit the model.
y = Boston['medv']
model = sm.OLS(y, X)
results = model.fit()
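
As a rough illustration of what an ordinary least squares fit estimates, the sketch below solves the same kind of problem with plain NumPy. The data are synthetic (made up for this example, not the Boston values), and the names `x`, `y_syn`, `X_syn`, `beta_hat`, and `beta_normal` are assumptions introduced here. It is not meant to describe how statsmodels computes its solution internally.

```python
import numpy as np

# Synthetic data, made up for illustration (not the Boston data set).
rng = np.random.default_rng(0)
x = rng.uniform(0, 40, size=200)
y_syn = 30.0 - 1.0 * x + rng.normal(scale=2.0, size=200)

# Model matrix with an intercept column, built by hand as in the lab.
X_syn = np.column_stack([np.ones_like(x), x])

# Least squares: the coefficients minimizing ||y - X b||^2.
beta_hat, *_ = np.linalg.lstsq(X_syn, y_syn, rcond=None)

# The same coefficients from the normal equations (X'X) b = X'y.
beta_normal = np.linalg.solve(X_syn.T @ X_syn, X_syn.T @ y_syn)

print(beta_hat)
print(beta_normal)
```

Both approaches should agree up to floating-point error; `lstsq` is generally the more numerically stable of the two.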

In [19]:
summarize(results)

              coef  std err       t  P>|t|
intercept  34.5538    0.563  61.415    0.0
lstat        -0.95    0.039 -24.528    0.0
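
The coefficient table can be read as a fitted line, medv ≈ 34.5538 − 0.95 × lstat. The short sketch below evaluates that line at a few lstat values. It uses the rounded coefficients exactly as displayed above, so the results are approximate; the function name `predict_medv` is a hypothetical helper, not part of ISLP or statsmodels.

```python
# Coefficients copied from the summarize(results) table (rounded as displayed).
intercept = 34.5538
slope_lstat = -0.95


def predict_medv(lstat):
    """Fitted medv for a given lstat under the simple linear model."""
    return intercept + slope_lstat * lstat


# Each additional unit of lstat lowers the fitted medv by about 0.95.
for lstat in (5, 10, 15):
    print(lstat, round(predict_medv(lstat), 4))
```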


Our model above has a single predictor, and constructing X was
straightforward. In practice we often fit models with more than one
predictor, typically selected from an array or data frame. We may wish to
introduce transformations to the variables before fitting the model, specify
interactions between variables, and expand some particular variables into
sets of variables (e.g. polynomials). The sklearn package has a particular
notion for this type of task: a transform. A transform is an object that is
created with some parameters as arguments. The object has two main methods:
**fit()** and **transform()**.

The **ModelSpec()** transform (imported above as MS) follows this pattern:
fit() is called on the data first, and transform() then constructs the
model matrix.



In [20]:
design = MS(['lstat'])
design = design.fit(Boston)
X = design.transform(Boston)
X[:4]

   intercept  lstat
0        1.0   4.98
1        1.0   9.14
2        1.0   4.03
3        1.0   2.94
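
To make the fit()/transform() pattern used above more concrete, here is a minimal sketch of a toy transform that builds an intercept-plus-predictors matrix. The class `InterceptAndColumns` is hypothetical and written only for illustration; it is not the ModelSpec implementation, which supports much more. The small data frame reuses the four lstat values displayed above, and its `other` column is made up.

```python
import numpy as np
import pandas as pd


class InterceptAndColumns:
    """Hypothetical toy transform; not the ISLP ModelSpec implementation."""

    def __init__(self, columns):
        # Parameters are supplied when the object is created.
        self.columns = list(columns)

    def fit(self, data):
        # Check that every requested column is present in the data.
        missing = [c for c in self.columns if c not in data.columns]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        return self  # returning self allows design = design.fit(data)

    def transform(self, data):
        # Build the model matrix: a column of ones, then the predictors.
        X = pd.DataFrame({"intercept": np.ones(data.shape[0])},
                         index=data.index)
        for c in self.columns:
            X[c] = data[c]
        return X


# Toy frame: lstat values from the rows shown above, plus a made-up column.
toy = pd.DataFrame({"lstat": [4.98, 9.14, 4.03, 2.94],
                    "other": [1, 2, 3, 4]})

toy_design = InterceptAndColumns(["lstat"]).fit(toy)
X_toy = toy_design.transform(toy)
print(X_toy)
```

Separating fit() from transform() means the same fitted object can later be applied to new data with the same columns.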
