# Using _isthissignif_: mtcars

### Importing mtcars dataset 

In [32]:
import statsmodels.api as sm

#load the mtcars dataset from the R "datasets" package via statsmodels
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data

### Preview mtcars dataset

In [33]:
mtcars.head() #preview mtcars data

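| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |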


### Creating mtcars model: mpg ~ cyl

In [34]:
import statsmodels.formula.api as smf

#create regression model for mtcars data using statsmodels.formula.api package
mtcars_model = smf.ols(formula='mpg ~ cyl', data=mtcars).fit()

### Typical mtcars regression model summary

In [35]:
mtcars_model.summary() #view built-in regression summary output

| | | | |
|---|---|---|---|
| Dep. Variable: | mpg | R-squared: | 0.726 |
| Model: | OLS | Adj. R-squared: | 0.717 |
| Method: | Least Squares | F-statistic: | 79.56 |
| Date: | Wed, 14 Dec 2022 | Prob (F-statistic): | 6.11e-10 |
| Time: | 12:47:49 | Log-Likelihood: | -81.653 |
| No. Observations: | 32 | AIC: | 167.3 |
| Df Residuals: | 30 | BIC: | 170.2 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 37.8846 | 2.074 | 18.268 | 0.000 | 33.649 | 42.120 |
| cyl | -2.8758 | 0.322 | -8.920 | 0.000 | -3.534 | -2.217 |

| | | | |
|---|---|---|---|
| Omnibus: | 1.007 | Durbin-Watson: | 1.67 |
| Prob(Omnibus): | 0.604 | Jarque-Bera (JB): | 0.874 |
| Skew: | 0.38 | Prob(JB): | 0.646 |
| Kurtosis: | 2.72 | Cond. No. | 24.1 |


###### Note: The standard OLS Regression Results summary from statsmodels reports every measure, but interpreting those measures is left to whoever is analyzing the output.

### _isthissignif_ model summary for mtcars

In [36]:
from isthissignif import isthissignif 

#view isthissignif regression summary/interpretation output
isthissignif.isthissignif(mtcars_model)

('R-sqr: Variables predict 50% or more of variability in Y',
 'P-value(s): There is more than a 5% chance that the p-values of these variables do not happen by chance',
 'CIs: 95% confidence interval does not contain zero')

###### Using _isthissignif_ to supplement the statsmodels regression summary gives a quick overview of what the model's statistical measures imply: Is the R-squared value relatively high? Are the model's p-values smaller than the significance threshold (alpha) of 0.05? Is zero inside any of the confidence intervals?

###### These quick interpretations give a jumping-off point for further investigation. For example, if zero lies inside a variable's confidence interval, the true value of its coefficient could be zero, which might imply that the explanatory variable has a negligible effect, if any, on the response variable.
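The three questions above can also be checked by hand. The sketch below is a hypothetical helper, not the _isthissignif_ implementation: the function name `check_fit`, its parameters, and the returned keys are all assumptions made for illustration. It takes plain numbers so it runs without a fitted model; the example values are copied from the mtcars summary table shown earlier.

```python
def check_fit(r_squared, p_values, conf_ints, alpha=0.05, r2_cutoff=0.5):
    """Hypothetical helper (not part of isthissignif): apply the three checks.

    p_values  -- dict mapping term name to its p-value
    conf_ints -- dict mapping term name to a (lower, upper) 95% interval
    """
    return {
        # Is R-squared at or above the cutoff (50% of variability by default)?
        "r2_at_least_cutoff": r_squared >= r2_cutoff,
        # Which terms have a p-value below the significance threshold?
        "significant_terms": [t for t, p in p_values.items() if p < alpha],
        # Which terms have a confidence interval that contains zero?
        "terms_with_zero_in_ci": [
            t for t, (low, high) in conf_ints.items() if low <= 0 <= high
        ],
    }


# Values copied from the mtcars summary table (p-values as displayed, 0.000).
# With a fitted statsmodels result they could instead be read from
# mtcars_model.rsquared, mtcars_model.pvalues and mtcars_model.conf_int().
result = check_fit(
    r_squared=0.726,
    p_values={"Intercept": 0.000, "cyl": 0.000},
    conf_ints={"Intercept": (33.649, 42.120), "cyl": (-3.534, -2.217)},
)
print(result)
```

For the mtcars model, every term should be listed as significant and no term should be listed as having zero inside its interval.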

# Using _isthissignif_: pandas df

### Creating dataframe for model

In [37]:
import pandas as pd

#create dataframe
df = pd.DataFrame({'x': [1, 1, 2, 2, 2, 2, 4, 4, 2, 2, 1, 1, 5, 5, 4, 4, 2, 2, 4, 4, 4, 4],
                   'y': [76, 76, 78, 78, 85, 85, 88, 88, 72, 72, 69, 69, 94, 94, 94, 94, 88, 88, 92, 92, 90, 90]})

df.head() #preview dataframe

| | x | y |
|---|---|---|
| 0 | 1 | 76 |
| 1 | 1 | 76 |
| 2 | 2 | 78 |
| 3 | 2 | 78 |
| 4 | 2 | 85 |


### Creating model: y ~ x

In [29]:
import statsmodels.formula.api as smf

#create regression model for data using statsmodels.formula.api package
model2 = smf.ols(formula='y ~ x', data=df).fit()

### _isthissignif_ model summary

In [31]:
#view isthissignif regression summary/interpretation output
isthissignif.isthissignif(model2)

('R-sqr: Variables predict 50% or more of variability in Y',
 'P-value(s): There is more than a 5% chance that the p-values of these variables do not happen by chance',
 'CIs: 95% confidence interval does not contain zero')

###### The numeric output below supports the _isthissignif_ interpretations: R-squared is 0.732, both p-values are reported as 0.000, and neither 95% confidence interval contains zero.

### Typical statsmodels regression model summary

In [30]:
model2.summary() #view built-in regression summary output

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.732 |
| Model: | OLS | Adj. R-squared: | 0.719 |
| Method: | Least Squares | F-statistic: | 54.7 |
| Date: | Wed, 14 Dec 2022 | Prob (F-statistic): | 3.83e-07 |
| Time: | 12:43:37 | Log-Likelihood: | -63.897 |
| No. Observations: | 22 | AIC: | 131.8 |
| Df Residuals: | 20 | BIC: | 134.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 68.7731 | 2.306 | 29.829 | 0.000 | 63.964 | 73.583 |
| x | 5.4676 | 0.739 | 7.396 | 0.000 | 3.926 | 7.010 |

| | | | |
|---|---|---|---|
| Omnibus: | 0.215 | Durbin-Watson: | 0.633 |
| Prob(Omnibus): | 0.898 | Jarque-Bera (JB): | 0.407 |
| Skew: | 0.136 | Prob(JB): | 0.816 |
| Kurtosis: | 2.391 | Cond. No. | 7.9 |
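As a cross-check on the summary above, the slope, intercept, and R-squared of a one-predictor model can be reproduced with the closed-form least-squares formulas. This is a minimal sketch using only the standard library and the same `x` and `y` values as the dataframe. The variable names are illustrative, and it does not show how statsmodels or _isthissignif_ compute these values internally.

```python
# Same x and y values as the dataframe used for model2.
x = [1, 1, 2, 2, 2, 2, 4, 4, 2, 2, 1, 1, 5, 5, 4, 4, 2, 2, 4, 4, 4, 4]
y = [76, 76, 78, 78, 85, 85, 88, 88, 72, 72, 69, 69,
     94, 94, 94, 94, 88, 88, 92, 92, 90, 90]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = s_xy / s_xx                     # coefficient on x
intercept = mean_y - slope * mean_x     # fitted value at x = 0
r_squared = s_xy ** 2 / (s_xx * s_yy)   # share of variance in y explained by x

print(f"slope={slope:.4f} intercept={intercept:.4f} R-squared={r_squared:.3f}")
```

The printed values should agree, up to rounding, with the `x` coefficient, the intercept, and the R-squared reported in the summary.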
