
# Some useful libraries/tools

A very brief introduction to the following:
- statsmodels
- scikit-learn API
- seaborn
- plotly

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## statsmodels

A powerful package for estimating many different statistical models.

It provides a straightforward way to perform linear regression, among other things.

Let us take our good old pendulum example and perform a quick linear
regression.

In [None]:
l, t = np.loadtxt('../data/pendulum.txt', unpack=True)
df = pd.DataFrame(dict(l=l, t=t))
df['tsq'] = df.t**2

In [None]:
import statsmodels.api as sm
import statsmodels.formula.api as smf


Create a linear model where `tsq` is linearly related to `l`. To learn more
about the syntax of formulae like this, see the [statsmodels
documentation](https://www.statsmodels.org/stable/).

In [None]:
# OLS below stands for Ordinary Least Squares
model = smf.ols('tsq ~ l', data=df)
# Now fit the model to the data.
results = model.fit()

In [None]:
print(results.summary())


The summary provides a lot of statistical information that could be useful later.
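
Individual numbers from the summary can also be read off the results object directly. The sketch below is only illustrative: it uses synthetic data as a stand-in for `pendulum.txt` (the value of `g`, the noise level and the range of `l` are assumptions), and it relies on the ideal-pendulum relation $T^2 = (4\pi^2/g)\,l$ to turn the fitted slope into an estimate of `g`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the pendulum data (assumed values, not the course data).
rng = np.random.default_rng(0)
g = 9.81
l = np.linspace(0.1, 1.0, 90)
t = 2*np.pi*np.sqrt(l/g) + rng.normal(0.0, 0.01, l.size)
df = pd.DataFrame(dict(l=l, tsq=t**2))

results = smf.ols('tsq ~ l', data=df).fit()

# Fitted coefficients as a pandas Series indexed by term name.
print(results.params)
# Confidence intervals (95% by default) for each coefficient.
print(results.conf_int())
# Coefficient of determination.
print(results.rsquared)

# For an ideal pendulum the slope is 4 pi^2 / g, so it gives an estimate of g.
slope = results.params['l']
g_est = 4*np.pi**2/slope
print(g_est)
```

With the real data, the same attributes are available on the `results` object created above.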

## Scikit-learn

- Extremely popular library
- Ton of ML methods to use
- Easy to install
- Very simple API


In [None]:
from sklearn.linear_model import LinearRegression


- Scikit-learn expects the input data in a particular shape.
- The main methods are `model.fit(X, Y)` and `model.predict(x)`.
- Here `X` is a 2D array of shape `(n_samples, n_variables)`.
- In our case we have one variable, `l`, but its shape is `(90,)`; it should
  be `(90, 1)`.
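
The shape change can be sketched with NumPy alone. The array below is a made-up stand-in with 90 values, not the pendulum data; the two alternatives shown at the end are equivalent ways of getting the same column shape.

```python
import numpy as np

# Stand-in 1D array with 90 values (assumed, for illustration only).
l = np.linspace(0.1, 1.0, 90)
print(l.shape)

# -1 asks NumPy to infer that dimension from the array size,
# so this gives 90 rows and 1 column.
X = l.reshape(-1, 1)
print(X.shape)

# Equivalent alternatives that also produce a single column.
X2 = l[:, np.newaxis]
X3 = np.atleast_2d(l).T
```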

In [None]:
model = LinearRegression()
l = l.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
tsq = t*t
print(l.shape)

In [None]:
model.fit(l, tsq)

In [None]:
# The model was fit to tsq, so the predictions are values of t squared.
pred_tsq = model.predict(l)

In [None]:
plt.scatter(l, tsq)
plt.plot(l, pred_tsq, 'r', lw=3);
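
Once fitted, the model also exposes the slope, the intercept and a goodness-of-fit score. The following is a minimal sketch on synthetic data (the slope of `4*pi**2/9.81` and the noise level are assumptions standing in for the pendulum measurements), not the course dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the pendulum data (assumed values).
rng = np.random.default_rng(0)
l = np.linspace(0.1, 1.0, 90)
tsq = (4*np.pi**2/9.81)*l + rng.normal(0.0, 0.02, l.size)

X = l.reshape(-1, 1)
model = LinearRegression()
model.fit(X, tsq)

# coef_ holds one entry per input variable; intercept_ is a scalar here.
slope = model.coef_[0]
intercept = model.intercept_
# score returns the coefficient of determination R^2.
r2 = model.score(X, tsq)
print(slope, intercept, r2)

# Predict at new lengths; the input must again be 2D.
new_l = np.array([[0.25], [0.75]])
new_pred = model.predict(new_l)
print(new_pred)
```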

## Quick introduction to Seaborn

- Very powerful library for making statistical plots.
- Uses matplotlib underneath.
- See the [seaborn documentation](https://seaborn.pydata.org/) for more.
- Seaborn works with different data structures and natively supports pandas
  DataFrames.
- Seaborn also comes bundled with many datasets.
- Here are some sample plots from the seaborn tutorial.

In [None]:
import seaborn as sns

# This sets the matplotlib theme to a nicer set of defaults.
sns.set_theme()

In [None]:
# Example similar to what we did in our regression class.
# We are using only every fifth data point here.
df = pd.DataFrame(dict(l=l.ravel()[::5], tsq=tsq[::5]))
sns.lmplot(data=df, x='l', y='tsq');

In [None]:
# Load up the tips data.
tips = sns.load_dataset('tips')
tips.head()

In [None]:
sns.displot(data=tips, x="total_bill", col="time", kde=True);

In [None]:
sns.jointplot(data=tips, x="total_bill", y="tip", kind='reg');

In [None]:
# Load up the iris data.
iris = sns.load_dataset('iris')
iris.head()

In [None]:
sns.pairplot(iris, hue='species', height=2.5);


This is just a small sampler. Look at the documentation for more details.

## Quick introduction to Plotly

- In particular we will look quickly at `plotly.express`.
- Very convenient for web-based plots.
- See the [plotly express
  documentation](https://plotly.com/python/plotly-express/) for more
  information.
- The link above also has a link to a video that you can check out for an
  overview.
- Like seaborn, plotly also has several standard datasets for you to play
  with.

In [None]:
import plotly.express as px

In [None]:
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
fig


Notice the interactivity of the plotly widget. This is a major advantage of
using plotly in the browser or in JupyterLab.

In [None]:
# Plotting the iris dataset loaded earlier.
# Note that trendline="ols" requires statsmodels to be installed.
fig = px.scatter(
    df, x="sepal_width", y="sepal_length", color="species", marginal_y="violin",
    marginal_x="box", trendline="ols", template="simple_white"
)
fig.show()


- Some really cool examples from the documentation!
- If you've never seen or heard of gapminder before, see this video:
  https://www.youtube.com/watch?v=jbkSRLYSojo

In [None]:
df = px.data.gapminder()
fig = px.scatter(
    df.query("year==2007"), x="gdpPercap", y="lifeExp", size="pop", color="continent",
    hover_name="country", log_x=True, size_max=60
)
fig.show()

In [None]:
fig = px.scatter(
    df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
    size="pop", color="continent", hover_name="country", facet_col="continent",
    log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90]
)
fig.show()


Our hope is that these examples have piqued your interest. Please check out
the respective documentation links given above to learn more.