## Bayesian techniques

In Bayesian statistics, the main idea is to make assumptions about the probability distributions of a model's parameters before the model is fitted to data. These initial assumptions are called the priors for the model's parameters.

In a Bayesian ridge regression model, there are two hyperparameters to optimize: α and λ.
- The α hyperparameter is the precision (inverse variance) of the noise in the target values. It does not scale the penalty term on its own. Instead, the fitted weights match those of regular ridge regression with a penalty scaling factor of λ/α.
- The λ hyperparameter is the [precision](https://en.wikipedia.org/wiki/Precision_(statistics)) of the model's weights, i.e. the inverse of the variance of their Gaussian prior. The smaller the λ value, the larger that prior variance, so the individual weights are free to take values farther from zero.
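In scikit-learn's implementation, the fitted `alpha_` is reported as the noise precision and `lambda_` as the weight precision. The sketch below checks how the two combine by comparing a fitted `BayesianRidge` model with a plain `Ridge` model whose penalty is set to `lambda_ / alpha_`. It uses `load_iris` as stand-in data (an assumption for illustration), and the two sets of coefficients should agree up to floating-point error.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import BayesianRidge, Ridge

X, y = load_iris(return_X_y=True)

# Fit the Bayesian model; alpha_ (noise precision) and lambda_ (weight
# precision) are both estimated from the data
bayes = BayesianRidge().fit(X, y)

# A regular ridge model whose penalty scaling factor is lambda_ / alpha_
ridge = Ridge(alpha=bayes.lambda_ / bayes.alpha_).fit(X, y)

print('Effective ridge penalty:', bayes.lambda_ / bayes.alpha_)
print('BayesianRidge coef:', bayes.coef_)
print('Ridge coef:        ', ridge.coef_)
print('Same weights:', np.allclose(bayes.coef_, ridge.coef_))
```

The agreement follows from the posterior mean of the weights, which is the ridge solution with penalty λ/α on the centered data.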

## Hyperparameter priors

Both the α and λ hyperparameters have [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution) priors, meaning we assume both values come from a gamma probability distribution.

There's no need to know the specifics of a gamma distribution, other than the fact that it's a probability distribution defined by a [shape parameter](https://en.wikipedia.org/wiki/Shape_parameter) and a [scale parameter](https://en.wikipedia.org/wiki/Scale_parameter) (or, equivalently, a rate parameter, which is the inverse of the scale).

Specifically, the α hyperparameter has prior:

$$\Gamma(\alpha_1, \alpha_2)$$

and the λ hyperparameter has prior:

$$\Gamma(\lambda_1, \lambda_2)$$

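where Γ(k, β) represents a gamma distribution with shape parameter k and rate parameter β (the rate is the inverse of the scale parameter θ, so β = 1/θ). In scikit-learn's BayesianRidge, α₁ and λ₁ are shape parameters, while α₂ and λ₂ are rate parameters.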
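To get a feel for how vague the default priors are, the sketch below builds the gamma distribution with shape 10⁻⁶ and rate 10⁻⁶ using `scipy.stats.gamma`. It assumes SciPy is available and that the second gamma parameter is a rate, as scikit-learn's documentation describes for `alpha_2` and `lambda_2`. SciPy parameterizes the gamma distribution by a shape `a` and a `scale`, so the rate is passed as `scale = 1 / rate`.

```python
from scipy import stats

shape = 1e-6  # k, e.g. alpha_1 or lambda_1
rate = 1e-6   # beta = 1 / theta, e.g. alpha_2 or lambda_2

# SciPy's gamma takes the shape as `a` and the scale theta = 1 / rate
prior = stats.gamma(a=shape, scale=1.0 / rate)

print('mean:', prior.mean())     # k / beta
print('variance:', prior.var())  # k / beta**2
```

With a mean of k/β ≈ 1 and a variance of k/β² ≈ 10⁶, this prior is extremely flat, so the data, rather than the prior, largely determines the fitted α and λ.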

## Tuning the model

When finding the optimal weights of a Bayesian ridge regression model for an input dataset, we concurrently optimize the α and λ hyperparameters based on their prior distributions and the input data.
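scikit-learn's user guide says the α and λ updates follow MacKay (1992), with α and λ chosen to maximize the log marginal likelihood. The sketch below is a minimal NumPy version of that fixed-point scheme for the case of more samples than features. It is written to mirror the update rules as we understand them from scikit-learn's implementation. The function name `fit_bayesian_ridge` is hypothetical, and the result is compared against `BayesianRidge` on the Iris data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import BayesianRidge


def fit_bayesian_ridge(X, y, alpha_1=1e-6, alpha_2=1e-6,
                       lambda_1=1e-6, lambda_2=1e-6,
                       max_iter=300, tol=1e-3):
    """Hypothetical helper: jointly estimate weights, alpha and lambda."""
    # Center the data so the intercept can be recovered afterwards
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n_samples = Xc.shape[0]

    # SVD of the centered design matrix; eigenvalues of Xc^T Xc are S**2
    _, S, Vh = np.linalg.svd(Xc, full_matrices=False)
    eigen_vals = S ** 2
    XT_y = Xc.T @ yc

    alpha, lam = 1.0 / np.var(yc), 1.0  # initial noise and weight precisions
    coef_old = None
    for _ in range(max_iter):
        # Posterior mean of the weights: ridge solution with penalty lam / alpha
        coef = Vh.T @ ((Vh @ XT_y) / (eigen_vals + lam / alpha))
        sse = np.sum((yc - Xc @ coef) ** 2)
        # Effective number of well-determined parameters
        gamma = np.sum(alpha * eigen_vals / (lam + alpha * eigen_vals))
        # Update both precisions, including the gamma-prior terms
        lam = (gamma + 2 * lambda_1) / (np.sum(coef ** 2) + 2 * lambda_2)
        alpha = (n_samples - gamma + 2 * alpha_1) / (sse + 2 * alpha_2)
        # Stop once the weights barely change between iterations
        if coef_old is not None and np.sum(np.abs(coef_old - coef)) < tol:
            break
        coef_old = coef

    # Final weights use the final precisions
    coef = Vh.T @ ((Vh @ XT_y) / (eigen_vals + lam / alpha))
    intercept = y_mean - X_mean @ coef
    return coef, intercept, alpha, lam


X, y = load_iris(return_X_y=True)
coef, intercept, alpha, lam = fit_bayesian_ridge(X, y)
reference = BayesianRidge().fit(X, y)

print('alpha:', alpha, 'vs', reference.alpha_)
print('lambda:', lam, 'vs', reference.lambda_)
print('coef:', coef, 'vs', reference.coef_)
```

Working in the SVD basis means each iteration only divides by `eigen_vals + lam / alpha` instead of re-solving a linear system.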

This can all be done with the BayesianRidge object (part of the linear_model module). Like all the previous regression objects, this one can be initialized with no required arguments.

We can manually specify the α₁ and α₂ gamma parameters for α with the alpha_1 and alpha_2 keyword arguments when initializing BayesianRidge. Similarly, we can manually set λ₁ and λ₂ with the lambda_1 and lambda_2 keyword arguments. The default value for each of the four gamma parameters is 10⁻⁶.
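A short sketch of those keyword arguments, using `get_params` to confirm the defaults. The custom values below are arbitrary and chosen only to show the syntax, and `load_iris` is assumed as stand-in data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import BayesianRidge

gamma_keys = ('alpha_1', 'alpha_2', 'lambda_1', 'lambda_2')

# Defaults: all four gamma parameters are 1e-6
default_reg = BayesianRidge()
print({k: default_reg.get_params()[k] for k in gamma_keys})

# Hypothetical custom prior settings, for illustration only
custom_reg = BayesianRidge(alpha_1=1e-2, alpha_2=1e-3,
                           lambda_1=1e-2, lambda_2=1e-3)
print({k: custom_reg.get_params()[k] for k in gamma_keys})

# The custom priors are used when the model is fitted
X, y = load_iris(return_X_y=True)
custom_reg.fit(X, y)
print('Alpha:', custom_reg.alpha_, 'Lambda:', custom_reg.lambda_)
```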

```python
from sklearn import linear_model
import numpy as np

# Feature matrix: 150 samples x 4 features (Iris flower measurements)
data = np.array([
    [5.1,3.5,1.4,0.2],
    [4.9,3., 1.4,0.2],
    [4.7,3.2,1.3,0.2],
    [4.6,3.1,1.5,0.2],
    [5., 3.6,1.4,0.2],
    [5.4,3.9,1.7,0.4],
    [4.6,3.4,1.4,0.3],
    [5., 3.4,1.5,0.2],
    [4.4,2.9,1.4,0.2],
    [4.9,3.1,1.5,0.1],
    [5.4,3.7,1.5,0.2],
    [4.8,3.4,1.6,0.2],
    [4.8,3., 1.4,0.1],
    [4.3,3., 1.1,0.1],
    [5.8,4., 1.2,0.2],
    [5.7,4.4,1.5,0.4],
    [5.4,3.9,1.3,0.4],
    [5.1,3.5,1.4,0.3],
    [5.7,3.8,1.7,0.3],
    [5.1,3.8,1.5,0.3],
    [5.4,3.4,1.7,0.2],
    [5.1,3.7,1.5,0.4],
    [4.6,3.6,1., 0.2],
    [5.1,3.3,1.7,0.5],
    [4.8,3.4,1.9,0.2],
    [5., 3., 1.6,0.2],
    [5., 3.4,1.6,0.4],
    [5.2,3.5,1.5,0.2],
    [5.2,3.4,1.4,0.2],
    [4.7,3.2,1.6,0.2],
    [4.8,3.1,1.6,0.2],
    [5.4,3.4,1.5,0.4],
    [5.2,4.1,1.5,0.1],
    [5.5,4.2,1.4,0.2],
    [4.9,3.1,1.5,0.2],
    [5., 3.2,1.2,0.2],
    [5.5,3.5,1.3,0.2],
    [4.9,3.6,1.4,0.1],
    [4.4,3., 1.3,0.2],
    [5.1,3.4,1.5,0.2],
    [5., 3.5,1.3,0.3],
    [4.5,2.3,1.3,0.3],
    [4.4,3.2,1.3,0.2],
    [5., 3.5,1.6,0.6],
    [5.1,3.8,1.9,0.4],
    [4.8,3., 1.4,0.3],
    [5.1,3.8,1.6,0.2],
    [4.6,3.2,1.4,0.2],
    [5.3,3.7,1.5,0.2],
    [5., 3.3,1.4,0.2],
    [7., 3.2,4.7,1.4],
    [6.4,3.2,4.5,1.5],
    [6.9,3.1,4.9,1.5],
    [5.5,2.3,4., 1.3],
    [6.5,2.8,4.6,1.5],
    [5.7,2.8,4.5,1.3],
    [6.3,3.3,4.7,1.6],
    [4.9,2.4,3.3,1. ],
    [6.6,2.9,4.6,1.3],
    [5.2,2.7,3.9,1.4],
    [5., 2., 3.5,1. ],
    [5.9,3., 4.2,1.5],
    [6., 2.2,4., 1. ],
    [6.1,2.9,4.7,1.4],
    [5.6,2.9,3.6,1.3],
    [6.7,3.1,4.4,1.4],
    [5.6,3., 4.5,1.5],
    [5.8,2.7,4.1,1. ],
    [6.2,2.2,4.5,1.5],
    [5.6,2.5,3.9,1.1],
    [5.9,3.2,4.8,1.8],
    [6.1,2.8,4., 1.3],
    [6.3,2.5,4.9,1.5],
    [6.1,2.8,4.7,1.2],
    [6.4,2.9,4.3,1.3],
    [6.6,3., 4.4,1.4],
    [6.8,2.8,4.8,1.4],
    [6.7,3., 5., 1.7],
    [6., 2.9,4.5,1.5],
    [5.7,2.6,3.5,1. ],
    [5.5,2.4,3.8,1.1],
    [5.5,2.4,3.7,1. ],
    [5.8,2.7,3.9,1.2],
    [6., 2.7,5.1,1.6],
    [5.4,3., 4.5,1.5],
    [6., 3.4,4.5,1.6],
    [6.7,3.1,4.7,1.5],
    [6.3,2.3,4.4,1.3],
    [5.6,3., 4.1,1.3],
    [5.5,2.5,4., 1.3],
    [5.5,2.6,4.4,1.2],
    [6.1,3., 4.6,1.4],
    [5.8,2.6,4., 1.2],
    [5., 2.3,3.3,1. ],
    [5.6,2.7,4.2,1.3],
    [5.7,3., 4.2,1.2],
    [5.7,2.9,4.2,1.3],
    [6.2,2.9,4.3,1.3],
    [5.1,2.5,3., 1.1],
    [5.7,2.8,4.1,1.3],
    [6.3,3.3,6., 2.5],
    [5.8,2.7,5.1,1.9],
    [7.1,3., 5.9,2.1],
    [6.3,2.9,5.6,1.8],
    [6.5,3., 5.8,2.2],
    [7.6,3., 6.6,2.1],
    [4.9,2.5,4.5,1.7],
    [7.3,2.9,6.3,1.8],
    [6.7,2.5,5.8,1.8],
    [7.2,3.6,6.1,2.5],
    [6.5,3.2,5.1,2. ],
    [6.4,2.7,5.3,1.9],
    [6.8,3., 5.5,2.1],
    [5.7,2.5,5., 2. ],
    [5.8,2.8,5.1,2.4],
    [6.4,3.2,5.3,2.3],
    [6.5,3., 5.5,1.8],
    [7.7,3.8,6.7,2.2],
    [7.7,2.6,6.9,2.3],
    [6., 2.2,5., 1.5],
    [6.9,3.2,5.7,2.3],
    [5.6,2.8,4.9,2. ],
    [7.7,2.8,6.7,2. ],
    [6.3,2.7,4.9,1.8],
    [6.7,3.3,5.7,2.1],
    [7.2,3.2,6., 1.8],
    [6.2,2.8,4.8,1.8],
    [6.1,3., 4.9,1.8],
    [6.4,2.8,5.6,2.1],
    [7.2,3., 5.8,1.6],
    [7.4,2.8,6.1,1.9],
    [7.9,3.8,6.4,2. ],
    [6.4,2.8,5.6,2.2],
    [6.3,2.8,5.1,1.5],
    [6.1,2.6,5.6,1.4],
    [7.7,3., 6.1,2.3],
    [6.3,3.4,5.6,2.4],
    [6.4,3.1,5.5,1.8],
    [6., 3., 4.8,1.8],
    [6.9,3.1,5.4,2.1],
    [6.7,3.1,5.6,2.4],
    [6.9,3.1,5.1,2.3],
    [5.8,2.7,5.1,1.9],
    [6.8,3.2,5.9,2.3],
    [6.7,3.3,5.7,2.5],
    [6.7,3., 5.2,2.3],
    [6.3,2.5,5., 1.9],
    [6.5,3., 5.2,2. ],
    [6.2,3.4,5.4,2.3],
    [5.9,3., 5.1,1.8]
])

# Class labels 0, 1 and 2, used here as the regression target
labels = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
2,2])

print('Data shape: {}\n'.format(data.shape))
print('Labels shape: {}\n'.format(labels.shape))

# Fit with the default gamma priors; alpha_ and lambda_ are estimated during fitting
reg = linear_model.BayesianRidge()
reg.fit(data, labels)

print('Coefficients: {}\n'.format(repr(reg.coef_)))
print('Intercept: {}\n'.format(reg.intercept_))
print('R2: {}\n'.format(reg.score(data, labels)))
print('Alpha: {}\n'.format(reg.alpha_))
print('Lambda: {}\n'.format(reg.lambda_))
```

```
Data shape: (150, 4)

Labels shape: (150,)

Coefficients: array([-0.11362625, -0.03526763,  0.24468776,  0.57300547])

Intercept: 0.16501980374056524

R2: 0.9303174820768508

Alpha: 20.97570570114465

Lambda: 9.533562071762871
```

