# Benchmark of the Statsmodels, Scipy.stats & Pingouin packages using linear models

## Install packages

### What is Statsmodels?
#### According to the Statsmodels documentation, statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration.
##### [https://www.statsmodels.org/stable/index.html]

In [None]:
!pip install statsmodels

### What is SciPy?
#### SciPy provides fundamental algorithms for scientific computing in Python, covering optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other classes of problems.
##### [https://scipy.org/]

In [None]:
!pip install scipy

### What is Pingouin?
#### Pingouin is an open source statistical package written in Python 3 and based mostly on Pandas and Numpy. Pingouin is designed for users who want simple yet exhaustive stats functions.
##### [https://pingouin-stats.org/]

In [None]:
!pip install pingouin

## Loading packages

In [None]:
from matplotlib import pyplot as plt
from scipy import stats

import statsmodels.api as sm
import pingouin as pg
import numpy as np
import warnings

warnings.simplefilter("ignore")

## Creating Synthetic Data

### What is Synthetic Data?
#### Synthetic data is information that is artificially manufactured rather than generated by real-world events. It is created algorithmically and is used as a stand-in for test datasets of production or operational data, to validate mathematical models, and, increasingly, to train machine learning models.
##### [https://tinyurl.com/mr2eysv6]

In [None]:
rng = np.random.default_rng()
X = rng.random(100)
y = 1.6*X + rng.random(100)

In [None]:
X

In [None]:
y
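Because `rng.random` draws from the uniform distribution on [0, 1), the noise added to `y` has mean 0.5 rather than 0, so a fitted line should show a slope near 1.6 and an intercept near 0.5. The following is a minimal sketch of that check using only NumPy; the seed (42), the larger sample size, and the `_demo` names are assumptions made here for reproducibility and are not part of the original cells.

```python
import numpy as np

# Assumption: a fixed seed and a larger sample, so the estimates are stable.
rng_demo = np.random.default_rng(42)
n_demo = 10_000

# Same generating process as above: slope 1.6 plus uniform [0, 1) noise.
X_demo = rng_demo.random(n_demo)
y_demo = 1.6 * X_demo + rng_demo.random(n_demo)

# Ordinary least squares fit of a degree-1 polynomial (a straight line).
slope_demo, intercept_demo = np.polyfit(X_demo, y_demo, deg=1)
print(f"slope={slope_demo:.3f}, intercept={intercept_demo:.3f}")
```

The non-zero intercept matters for the comparison below: a model fitted without an intercept term is estimating a different line from the one that generated the data.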

## Statsmodels

In [None]:
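# sm.OLS does not add an intercept by default; add_constant prepends a column
# of ones so the model matches the ones fitted by SciPy and Pingouin below.
sm_model = sm.OLS(y, sm.add_constant(X))
sm_res = sm_model.fit()
print(sm_res.summary())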

## Scipy.stats

In [None]:
sp_res = stats.linregress(X, y)

In [None]:
print("R-squared: {:.2f}".format(sp_res.rvalue**2))

In [None]:
plt.plot(X, y, 'o', label='original data')
plt.plot(X, sp_res.intercept + sp_res.slope*X, 'r', label='fitted line')
plt.title('Analysing data with SciPy')
plt.xlabel('X - values')
plt.ylabel('y - values')
plt.legend()
plt.show()

## Pingouin

In [None]:
lm = pg.linear_regression(X, y)

In [None]:
lm.round(2)

### What is R-squared (R²)?
#### R-squared, also known as the coefficient of determination, is the proportion of variability in the response variable that is explained by the predictor (explanatory) variable. In simple linear regression with an intercept, it equals the square of the Pearson correlation coefficient between X and y, which gives it a very convenient interpretation.
##### [http://www.leg.ufpr.br]
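The two descriptions of R-squared can be checked against each other. This minimal sketch computes it once as 1 - SS_res / SS_tot and once as the squared Pearson correlation, using only NumPy. The seed and the `_r2` variable names are assumptions introduced for illustration.

```python
import numpy as np

# Assumption: seeded data with the same generating process as the notebook.
rng_r2 = np.random.default_rng(0)
X_r2 = rng_r2.random(100)
y_r2 = 1.6 * X_r2 + rng_r2.random(100)

# Fit a straight line with an intercept and compute the fitted values.
slope_r2, intercept_r2 = np.polyfit(X_r2, y_r2, deg=1)
y_hat = intercept_r2 + slope_r2 * X_r2

# Proportion of variability explained: 1 - residual SS / total SS.
ss_res = np.sum((y_r2 - y_hat) ** 2)
ss_tot = np.sum((y_r2 - y_r2.mean()) ** 2)
r2_from_sums = 1.0 - ss_res / ss_tot

# Squared Pearson correlation between predictor and response.
r2_from_corr = np.corrcoef(X_r2, y_r2)[0, 1] ** 2

print(f"1 - SS_res/SS_tot = {r2_from_sums:.6f}")
print(f"corr(X, y)^2      = {r2_from_corr:.6f}")
```

The two values coincide only because the fitted line includes an intercept; for a model forced through the origin, the equality does not hold in general.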

Link on GitHub: https://github.com/lobo-death/Statsmodels_Scipy-stats_Pingouin 