# Statsmodels

statsmodels.org

>`statsmodels` is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration. An extensive list of result statistics are available for each estimator.

## Import the relevant packages

In [None]:
# conda install statsmodels
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

## Let's take a look using the Iris dataset

In [None]:
iris = pd.read_csv('./data/IRIS.csv')
iris.head()

In [None]:
iris.plot.scatter(x = "sepal_length", y = "sepal_width")

## Simple linear regression

In [None]:
X = iris["sepal_length"]
X = sm.add_constant(X)  # add an intercept column; sm.OLS does not add one by default

Y = iris["sepal_width"]

In [None]:
model = sm.OLS(endog=Y, exog=X).fit()
predictions = model.predict(X)  # in-sample fitted values

model.summary()

### Visualizing the model

In [None]:
iris["predicted_sepal_width"] = predictions
iris.head()

In [None]:
ax = iris.plot.scatter(x = "sepal_length", y = "sepal_width", label = "Values")
iris.plot.scatter(x = "sepal_length", y = "predicted_sepal_width", c='r', ax = ax, label = "Predicted")

### Same model, defined using an R-like formula

In [None]:
f = 'sepal_width ~ sepal_length'
m = smf.ols(formula = f, data = iris).fit()

m.summary()

## Linear mixed effects models

In [None]:
for spcs in iris["species"].unique():
    idx = iris["species"] == spcs
    iris[idx].plot.scatter(x="sepal_length", y="sepal_width", title=spcs)

In [None]:
f_mlm = 'sepal_width ~ sepal_length'
# groups defines the grouping variable: each species gets its own random intercept
m_mlm = smf.mixedlm(formula=f_mlm, data=iris, groups=iris["species"]).fit()
m_mlm.summary()
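
With only a formula and `groups`, the mixed model estimates a shared (fixed) intercept and slope plus a random intercept per group. The sketch below shows where those pieces live on the results object. It assumes synthetic data with eight hypothetical groups (`g0` to `g7`) and made-up column names `x`, `y`, and `group`, chosen so that the group variance is easy to estimate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic grouped data: each group shifts the intercept by a random offset
rng = np.random.default_rng(2)
n_groups, n_per_group = 8, 30
group = np.repeat([f"g{i}" for i in range(n_groups)], n_per_group)
offsets = np.repeat(rng.normal(0.0, 1.0, size=n_groups), n_per_group)
x = rng.uniform(4.0, 8.0, size=n_groups * n_per_group)
y = 1.0 + 0.5 * x + offsets + rng.normal(0.0, 0.3, size=x.size)
demo = pd.DataFrame({"x": x, "y": y, "group": group})

res = smf.mixedlm("y ~ x", data=demo, groups=demo["group"]).fit()

print(res.fe_params)       # fixed effects: shared intercept and slope
print(res.cov_re)          # estimated variance of the random intercepts
print(res.random_effects)  # dict: group label -> estimated random effect
```

With very few groups, as with the three iris species, the variance of the random intercepts is hard to estimate and the fit may emit convergence warnings.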

## scikit-learn

An alternative (and potentially more advanced) option is **scikit-learn**.
scikit-learn is one of the leading packages for (mostly) supervised machine learning. This type of machine learning often relies on variants of regression analysis, so scikit-learn has several advanced types of regression built in.
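
As a rough illustration of how the same kind of regression looks in scikit-learn, the sketch below fits an ordinary least-squares model and a ridge (L2-penalized) model. It assumes synthetic data rather than the iris file, and `alpha=1.0` is an arbitrary choice for the penalty strength.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (made-up values); scikit-learn expects a 2-D feature matrix
rng = np.random.default_rng(3)
X = rng.uniform(4.0, 8.0, size=(100, 1))
y = 3.5 - 0.1 * X[:, 0] + rng.normal(0.0, 0.3, size=100)

# The intercept is fitted by default (fit_intercept=True)
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(ols.intercept_, ols.coef_)
print(ridge.intercept_, ridge.coef_)
print(ols.predict(X[:5]))  # predictions for the first five rows
```

Unlike `statsmodels`, these estimators do not produce a summary table with standard errors and p-values; they are geared toward prediction.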