# Statsmodels

Links:

* [Introduction](https://www.statsmodels.org/stable/index.html)
* [User Guide](https://www.statsmodels.org/stable/user-guide.html)
* [Examples](https://www.statsmodels.org/stable/examples/index.html)

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

%matplotlib inline
np.random.seed(0)

## [Introduction](https://www.statsmodels.org/stable/index.html)

The *statsmodels* package supports specifying models with R-style formulas and pandas DataFrames.
Here is an example that fits an ordinary least squares (OLS) regression.

In [2]:
dat = sm.datasets.get_rdataset('Guerry', 'HistData').data

results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

results.summary()

| | | | |
|:--|:--|:--|:--|
| Dep. Variable: | Lottery | R-squared: | 0.348 |
| Model: | OLS | Adj. R-squared: | 0.333 |
| Method: | Least Squares | F-statistic: | 22.2 |
| Date: | Fri, 24 Jan 2020 | Prob (F-statistic): | 1.9e-08 |
| Time: | 07:57:54 | Log-Likelihood: | -379.82 |
| No. Observations: | 86 | AIC: | 765.6 |
| Df Residuals: | 83 | BIC: | 773.0 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|:--|--:|--:|--:|--:|--:|--:|
| Intercept | 246.4341 | 35.233 | 6.995 | 0.000 | 176.358 | 316.510 |
| Literacy | -0.4889 | 0.128 | -3.832 | 0.000 | -0.743 | -0.235 |
| np.log(Pop1831) | -31.3114 | 5.977 | -5.239 | 0.000 | -43.199 | -19.424 |

| | | | |
|:--|:--|:--|:--|
| Omnibus: | 3.713 | Durbin-Watson: | 2.019 |
| Prob(Omnibus): | 0.156 | Jarque-Bera (JB): | 3.394 |
| Skew: | -0.487 | Prob(JB): | 0.183 |
| Kurtosis: | 3.003 | Cond. No. | 702.0 |


NumPy arrays can also be used directly. In that case the design matrix, including the intercept column, is built by hand.

In [3]:
nobs = 100
X = np.random.random((nobs, 2))
X = sm.add_constant(X)
beta = [1, .1, .5]
e = np.random.random(nobs)
X[:10, :]

array([[1.        , 0.5488135 , 0.71518937],
       [1.        , 0.60276338, 0.54488318],
       [1.        , 0.4236548 , 0.64589411],
       [1.        , 0.43758721, 0.891773  ],
       [1.        , 0.96366276, 0.38344152],
       [1.        , 0.79172504, 0.52889492],
       [1.        , 0.56804456, 0.92559664],
       [1.        , 0.07103606, 0.0871293 ],
       [1.        , 0.0202184 , 0.83261985],
       [1.        , 0.77815675, 0.87001215]])

In [4]:
y = np.dot(X, beta) + e
y[:10]

array([1.72427192, 2.02906142, 1.74306438, 1.6692489 , 1.31276576,
       1.4108696 , 2.19899555, 1.5043651 , 1.95491097, 2.40949304])
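
The cell above draws data from the linear model below. The coefficient vector is `beta = [1, .1, .5]`, and the noise comes from `np.random.random`, which samples uniformly on $[0, 1)$:

$$
y_i = 1 + 0.1\,x_{i1} + 0.5\,x_{i2} + e_i, \qquad e_i \sim \mathcal{U}[0, 1), \qquad \mathbb{E}[e_i] = \tfrac{1}{2}.
$$

The noise therefore does not have mean zero. Splitting off its mean gives an equivalent model with a zero-mean error:

$$
y_i = \underbrace{\left(1 + \tfrac{1}{2}\right)}_{1.5} + 0.1\,x_{i1} + 0.5\,x_{i2} + \left(e_i - \tfrac{1}{2}\right).
$$

OLS therefore targets an intercept of about 1.5 rather than 1, while the slopes are unaffected. This is consistent with the estimated `const` of 1.4538 in the summary that follows.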

In [5]:
results = sm.OLS(y, X).fit()
results.summary()

| | | | |
|:--|:--|:--|:--|
| Dep. Variable: | y | R-squared: | 0.268 |
| Model: | OLS | Adj. R-squared: | 0.252 |
| Method: | Least Squares | F-statistic: | 17.72 |
| Date: | Fri, 24 Jan 2020 | Prob (F-statistic): | 2.76e-07 |
| Time: | 07:57:54 | Log-Likelihood: | -21.221 |
| No. Observations: | 100 | AIC: | 48.44 |
| Df Residuals: | 97 | BIC: | 56.26 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|:--|--:|--:|--:|--:|--:|--:|
| const | 1.4538 | 0.084 | 17.253 | 0.000 | 1.287 | 1.621 |
| x1 | 0.0821 | 0.108 | 0.759 | 0.450 | -0.133 | 0.297 |
| x2 | 0.6335 | 0.107 | 5.942 | 0.000 | 0.422 | 0.845 |

| | | | |
|:--|:--|:--|:--|
| Omnibus: | 40.281 | Durbin-Watson: | 1.963 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 6.406 |
| Skew: | 0.032 | Prob(JB): | 0.0406 |
| Kurtosis: | 1.762 | Cond. No. | 5.58 |
