# Scikit-learn and statsmodels for linear regression
This notebook contains the code for fitting a linear regression with scikit-learn and statsmodels. Its results are used in the blog post [Spring into Linear Regression](https://medium.com/datadriveninvestor/spring-into-linear-regression-17cf2c0813f8).

### Preliminaries
Import the necessary modules.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
```

Read and view the data.

```python
# Load the Iris data set (path is relative to the notebook's directory)
df = pd.read_csv("../data/IRIS.csv")
```

```python
# Show six randomly chosen rows (a different selection on each run)
df.sample(6)
```

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 119 | 6.0 | 2.2 | 5.0 | 1.5 | Iris-virginica |
| 63 | 6.1 | 2.9 | 4.7 | 1.4 | Iris-versicolor |
| 73 | 6.1 | 2.8 | 4.7 | 1.2 | Iris-versicolor |
| 19 | 5.1 | 3.8 | 1.5 | 0.3 | Iris-setosa |
| 88 | 5.6 | 3.0 | 4.1 | 1.3 | Iris-versicolor |
| 79 | 5.7 | 2.6 | 3.5 | 1.0 | Iris-versicolor |


### Statistics on the data with pandas and statsmodels

We compute the pairwise correlation matrix of the numeric variables.

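```python
# species is a text column, so restrict the correlation to numeric columns
# (recent pandas versions raise an error on non-numeric data otherwise)
df.corr(numeric_only=True)
```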

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| sepal_length | 1.0 | -0.109369 | 0.871754 | 0.817954 |
| sepal_width | -0.109369 | 1.0 | -0.420516 | -0.356544 |
| petal_length | 0.871754 | -0.420516 | 1.0 | 0.962757 |
| petal_width | 0.817954 | -0.356544 | 0.962757 | 1.0 |
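By default `df.corr()` reports the Pearson correlation coefficient. As a minimal sketch of what each entry measures, the snippet below computes Pearson's r from its definition (the covariance divided by the product of the standard deviations) for `sepal_length` and `petal_length`. It uses only the six rows displayed by `df.sample(6)` above, so the value will differ from the 0.871754 computed on all 150 rows; `pearson_r` is a hypothetical helper, not part of pandas.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation: covariance over the product of standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean()
    db = b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

# sepal_length and petal_length for the six rows shown by df.sample(6)
sepal_length = [6.0, 6.1, 6.1, 5.1, 5.6, 5.7]
petal_length = [5.0, 4.7, 4.7, 1.5, 4.1, 3.5]

r = pearson_r(sepal_length, petal_length)
# Cross-check against NumPy's built-in correlation matrix
assert np.isclose(r, np.corrcoef(sepal_length, petal_length)[0, 1])
print(round(r, 3))
```

`DataFrame.corr` also accepts `method="spearman"` or `method="kendall"` when a rank-based correlation is more appropriate.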


Next, we define the independent variables (`sepal_length` and `petal_width`) and the dependent variable (`petal_length`).

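```python
# Predictors (independent variables) as a two-column DataFrame
x = df[['sepal_length', 'petal_width']]
# Response (dependent variable) as a Series
y = df['petal_length']
```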
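`sm.OLS` does not add an intercept on its own, which is why the model is fitted on `sm.add_constant(x)` rather than on `x` directly. As a rough sketch of what that pair of calls computes, the NumPy snippet below prepends a column of ones to the predictors, solves the least-squares problem with `np.linalg.lstsq`, and computes R-squared as `1 - SS_res / SS_tot`. It uses only the six sample rows shown earlier (so its coefficients will not match the full-data fit), and `fit_ols`, `X_sample`, and `y_sample` are hypothetical names chosen so as not to overwrite the notebook's `x` and `y`.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R-squared)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones, as sm.add_constant does, so the first
    # coefficient is the intercept ("const" in the statsmodels summary).
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return coef, 1.0 - ss_res / ss_tot

# (sepal_length, petal_width) and petal_length for the six rows
# shown by df.sample(6)
X_sample = [[6.0, 1.5], [6.1, 1.4], [6.1, 1.2], [5.1, 0.3], [5.6, 1.3], [5.7, 1.0]]
y_sample = [5.0, 4.7, 4.7, 1.5, 4.1, 3.5]

coef, r2 = fit_ols(X_sample, y_sample)
print("const, sepal_length, petal_width:", coef.round(4))
print("R-squared:", round(r2, 3))
```

With the notebook's full `x` and `y`, `fit_ols(x, y)` should reproduce the `coef` column and the R-squared of the statsmodels summary up to rounding; statsmodels additionally reports standard errors, hypothesis tests, and residual diagnostics that this sketch omits.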

We use statsmodels to fit an ordinary least squares (OLS) model and print a summary of the regression.

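```python
# statsmodels does not add an intercept automatically, so add a constant column
predictors = sm.add_constant(x)
# Fit ordinary least squares and display the regression summary
model = sm.OLS(y, predictors).fit()
model.summary()
```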

| | | | |
|---|---|---|---|
| Dep. Variable: | petal_length | R-squared: | 0.948 |
| Model: | OLS | Adj. R-squared: | 0.948 |
| Method: | Least Squares | F-statistic: | 1350.0 |
| Date: | Fri, 17 Apr 2020 | Prob (F-statistic): | 2.55e-95 |
| Time: | 11:04:44 | Log-Likelihood: | -75.26 |
| No. Observations: | 150 | AIC: | 156.5 |
| Df Residuals: | 147 | BIC: | 165.6 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.5024 | 0.337 | -4.452 | 0.000 | -2.169 | -0.835 |
| sepal_length | 0.5425 | 0.069 | 7.815 | 0.000 | 0.405 | 0.680 |
| petal_width | 1.7444 | 0.075 | 23.157 | 0.000 | 1.596 | 1.893 |

| | | | |
|---|---|---|---|
| Omnibus: | 1.175 | Durbin-Watson: | 1.352 |
| Prob(Omnibus): | 0.556 | Jarque-Bera (JB): | 0.771 |
| Skew: | -0.064 | Prob(JB): | 0.68 |
| Kurtosis: | 3.327 | Cond. No. | 64.7 |
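The `coef` column gives the fitted equation `petal_length ≈ -1.5024 + 0.5425 * sepal_length + 1.7444 * petal_width`. A minimal sketch of using it for a single prediction, with the coefficients copied by hand from the rounded summary (in the notebook, `model.params` holds the unrounded values); `COEF` and `predict_petal_length` are hypothetical names used only for illustration:

```python
# Coefficients copied (rounded) from the statsmodels summary;
# in the notebook, model.params holds the unrounded values.
COEF = {"const": -1.5024, "sepal_length": 0.5425, "petal_width": 1.7444}

def predict_petal_length(sepal_length, petal_width):
    """Evaluate the fitted linear equation for one flower."""
    return (COEF["const"]
            + COEF["sepal_length"] * sepal_length
            + COEF["petal_width"] * petal_width)

# Row 119 from the df.sample(6) output: sepal_length=6.0, petal_width=1.5,
# observed petal_length=5.0
pred = predict_petal_length(6.0, 1.5)
print(f"predicted {pred:.4f}, observed 5.0, residual {5.0 - pred:.4f}")
# prints: predicted 4.3692, observed 5.0, residual 0.6308
```

Each `t` value in the table is the coefficient divided by its standard error; for example, 0.5425 / 0.069 ≈ 7.86, which differs slightly from the reported 7.815 only because the displayed standard error is rounded.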
