# 4. Modeling with Lines

In this chapter, we look at a common motif in statistics: **linear models**.

This chapter will cover the following topics:

- Simple linear regression
- Negative binomial regression
- Robust regression
- Logistic regression
- Variable variance
- Hierarchical linear regression
- Multiple linear regression


## 4.1 Simple linear regression

Many problems in science, engineering, and business have a linear form.
That is, we have a variable, $X$, and we want to model or predict a
variable, $Y$, and the relationship between the elements of $X$ and $Y$
is a **linear** relationship.

In the simplest scenario, simple linear regression, both $X$ and $Y$
are uni-dimensional continuous **random** variables.

Typically, we use the following terminology:

- $Y$ is the **dependent**, **predicted**, or **outcome** variable
- $X$ is the **independent**, **predictor** or **input** variable

Some typical situations where the linear regression model can be used are:

- Model the relationship between soil salinity and crop productivity.
Then answer questions like "Is this relationship linear?" or "How
strong is this relationship?"
- Find a relationship between the average chocolate consumption by
country and the number of Nobel laureates in that country, and then
understand why that relationship could be **spurious**.
- Predict the gas bill (used for heating and cooking) of your house
by using the solar radiation from the local weather report. How
accurate is this prediction?

In _Chapter 2_, we saw the Normal model. We can think of this model
as follows:

$$
\begin{align*}
\mu &\sim some\ prior \\
\sigma &\sim some\ other\ prior \\
Y &\sim \mathcal{N}(\mu, \sigma)
\end{align*}
$$

The main idea of linear regression is to extend this model by adding
a predictor variable, $X$, to the estimation of the mean, $\mu$:

$$
\begin{align*}
\alpha &\sim a\ prior \\
\beta &\sim another\ prior \\
\sigma &\sim some\ other\ prior \\
\mu &= \alpha + \beta X \\
Y &\sim \mathcal{N}(\mu, \sigma)
\end{align*}
$$

This model says that a linear relationship exists between $X$
and $Y$. However, that relationship is **not deterministic**
because of the noise term, $\sigma$.

Additionally, this model states that the mean of $Y$ is a linear
function of $X$ with **intercept** $\alpha$ and **slope** $\beta$.
However, because we **do not know** the values of $\alpha$, $\beta$,
or $\sigma$, we set prior distributions over them.
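
To make the generative story concrete, the sketch below draws one value of each parameter and then simulates $Y$ with NumPy. The prior families (Normal for $\alpha$ and $\beta$, HalfNormal for $\sigma$), their scales, and the predictor values are assumptions chosen only for illustration; they do not come from any dataset in this chapter.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Hypothetical predictor values (an assumption for illustration)
x = np.linspace(0, 10, 50)

# Draw one value of each parameter from illustrative, independent priors
alpha = rng.normal(loc=0, scale=1)           # intercept
beta = rng.normal(loc=0, scale=1)            # slope
sigma = np.abs(rng.normal(loc=0, scale=1))   # HalfNormal draw: always positive

# The mean is a deterministic, linear function of x
mu = alpha + beta * x

# The outcome scatters around the line with standard deviation sigma
y = rng.normal(loc=mu, scale=sigma)
```

Repeating the draw many times gives a family of noisy lines, which is one way to see what a set of priors implies before any data are observed.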

Typically, when setting priors for linear models, we **assume** that
the priors are **independent**. Because the priors are independent, we
model the problem using three different priors instead of a single,
joint prior.

Additionally, because $\sigma$ must be positive, it is common
to give it a prior with positive support, such as:

- HalfNormal
- Exponential
- HalfCauchy
- And so on
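
As a rough way to compare these choices, the sketch below draws from each distribution with NumPy and prints the median and the 99% quantile. The shared scale of 10 is a hypothetical value picked only for illustration, and the HalfNormal and HalfCauchy draws are built by taking absolute values of zero-centered Normal and Cauchy draws.

```python
import numpy as np

rng = np.random.default_rng(seed=123)
n_draws = 10_000
scale = 10  # hypothetical prior scale, chosen only for illustration

# HalfNormal: absolute value of a zero-centered Normal
half_normal = np.abs(rng.normal(loc=0, scale=scale, size=n_draws))
# Exponential with mean equal to `scale`
exponential = rng.exponential(scale=scale, size=n_draws)
# HalfCauchy: absolute value of a zero-centered Cauchy
half_cauchy = np.abs(scale * rng.standard_cauchy(size=n_draws))

for name, draws in [("HalfNormal", half_normal),
                    ("Exponential", exponential),
                    ("HalfCauchy", half_cauchy)]:
    # All draws are non-negative; the upper quantile shows how heavy the tail is
    q99 = np.quantile(draws, 0.99)
    print(f"{name:12s} median={np.median(draws):8.2f}  99%={q99:10.2f}")
```

All three keep $\sigma$ positive. The HalfCauchy has by far the heaviest tail, so it leaves more room for very large noise values.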

The values for the intercept can vary widely from one problem to
another and across domains. In the author's experience:

- $\alpha$ is usually centered around 0 and has a standard deviation
no larger than 1.
- It is usually easier to set an informed prior for the slope, $\beta$.
- For $\sigma$, we can set the prior scale to a large value on the scale
of $Y$; for example, twice the standard deviation of $Y$.
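
The rule of thumb for $\sigma$ can be written down directly. In this minimal sketch, `y` is hypothetical stand-in data and `sigma_prior_scale` is an illustrative name; neither comes from the chapter's dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Hypothetical outcome values standing in for observed data
y = rng.normal(loc=50, scale=8, size=200)

# Scale for the prior on sigma: twice the standard deviation of Y
sigma_prior_scale = 2 * y.std()
print(f"std of y: {y.std():.2f}, prior scale for sigma: {sigma_prior_scale:.2f}")
```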

We should be cautious of using observed data to determine
("guesstimate") priors. It is usually fine to use the observed data
if we want to avoid very restrictive priors; however, a more general
principle is that if we don't have much knowledge of a parameter,
it makes sense to ensure that our prior is **vague**.

How do we make our priors more informative? We need to get informative
priors from our **domain knowledge**.

**Extending the Normal Model**

In summary, "a linear regression model is an extension of the
Normal model where the mean is computed as a linear function of a
predictor variable."
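
Substituting the expression for $\mu$ into the likelihood gives the same model in a single line, which makes the extension of the Normal model explicit:

$$
Y \sim \mathcal{N}(\alpha + \beta X, \sigma)
$$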

## 4.2 Linear bikes

We now have a general idea of Bayesian linear models. Let's try
to cement these ideas with an example.

We have a record of temperatures and the number of bikes rented in a
city. We want to model the relationship between temperature and the
number of bikes rented.

The data come from the bike sharing dataset in the UCI Machine
Learning Repository.

The full dataset contains 17,379 records. Each record has 17 variables.

We use a smaller dataset with 359 records and only two variables:
`temperature` (in degrees Celsius) and `rented` (the number of
rented bikes).

In [None]:
# Begin with our general imports
import cytoolz.curried as ctc

In [None]:
# Import our general data analysis tools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Import our SciPy tools
from scipy.interpolate import PchipInterpolator
from scipy.stats import linregress

In [None]:
# Import PyMC and ancillary tools
import arviz as az
import pymc as pm
import preliz as pz
import xarray as xr

In [None]:
# Set values for plotting and for generating random variables
az.style.use('arviz-grayscale')

# Plotting defaults
from cycler import cycler
default_cycler = cycler(color=['#000000', '#6a6a6a', '#bebebe', '#2a2eec'])
plt.rc('axes', prop_cycle=default_cycler)
plt.rc('figure', dpi=300)

# Set a random seed
rng = np.random.default_rng(seed=123)

Let's create a scatter plot of `rented` versus `temperature`.

In [None]:
bikes = pd.read_csv('./data/bikes.csv')
bikes.plot(x='temperature', y='rented', figsize=(12, 3), kind='scatter')
plt.show()

By "squinting", one can visualize a linear relationship between
the number of bikes rented and outdoor temperature; however, we
want to understand that relationship better.
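
One quick, non-Bayesian way to put a number on what squinting suggests is to compute the Pearson correlation together with a least-squares line. The sketch below uses synthetic stand-in arrays, because the contents of `bikes.csv` are not reproduced here; the trend and noise level are assumptions. With the real data, `bikes.temperature` and `bikes.rented` would take the place of `temperature` and `rented`.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Synthetic stand-in data (an assumption): a noisy upward trend
temperature = rng.uniform(low=-5, high=35, size=359)
rented = 70 + 6 * temperature + rng.normal(loc=0, scale=60, size=359)

# Pearson correlation: strength of the linear association, between -1 and 1
r = np.corrcoef(temperature, rented)[0, 1]

# Least-squares line; np.polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(temperature, rented, deg=1)

print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

These point estimates say nothing about uncertainty, which is what the Bayesian model adds.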

For our first model, we'll create a linear model using PyMC.

In [None]:
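with pm.Model() as model_lb:
    # Priors: assumed independent and deliberately vague
    alpha = pm.Normal('alpha', mu=0, sigma=100)  # Intercept: very flat
    beta = pm.Normal('beta', mu=0, sigma=10)  # Slope: pretty flat
    sigma = pm.HalfCauchy('sigma', 10)  # Noise: positive, heavy-tailed
    # Store the linear mean so that it is saved with the posterior samples
    mu = pm.Deterministic('mu', alpha + beta * bikes.temperature)
    # Likelihood of the observed rental counts
    y_pred = pm.Normal('y_pred', mu=mu, sigma=sigma, observed=bikes.rented)
    # Draw posterior samples with the default sampler settings
    idata_lb = pm.sample()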

Here's a graphical representation of the model, rendered with Graphviz:

In [None]:
pm.model_to_graphviz(model_lb)

This model is like a Normal model; however, the mean is modeled
as a **linear function** of the temperature. The intercept of the
linear model is $\alpha$, the slope of the linear model is $\beta$,
and the "noise" term is $\sigma$.

The new aspect to this model is that the model for $\mu$ is
**deterministic**; that is, given values for $\alpha$ and $\beta$
(and our temperature data), the value for $\mu$ is computed.

Wrapping $\mu$ in `pm.Deterministic` may seem unnecessary, because the
model would fit identically without it; however, defining this variable
tells PyMC to store its values in our `InferenceData` for later use.
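
To see what storing `mu` provides, the NumPy sketch below mimics the computation behind a deterministic variable: for every posterior draw of $\alpha$ and $\beta$, it evaluates the line at each temperature. The draws and temperatures are made-up stand-ins, not output from `model_lb`.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Hypothetical posterior draws (stand-ins for samples of alpha and beta)
n_draws = 1000
alpha_draws = rng.normal(loc=70, scale=5, size=n_draws)
beta_draws = rng.normal(loc=6, scale=0.5, size=n_draws)

# Hypothetical temperatures at which to evaluate the line
temperature = np.linspace(0, 30, 7)

# Broadcasting gives one row per draw and one column per temperature
mu_draws = alpha_draws[:, None] + beta_draws[:, None] * temperature

# Summarize the uncertainty in the mean at each temperature
mu_mean = mu_draws.mean(axis=0)
mu_low, mu_high = np.quantile(mu_draws, [0.03, 0.97], axis=0)  # central 94% interval
print(mu_draws.shape)  # (1000, 7)
```

Because the model registers `mu` with `pm.Deterministic`, the corresponding values should already be available in `idata_lb.posterior['mu']`, so this computation does not need to be repeated by hand.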