## The Bayesian Models

Perhaps one of the most interesting features of the library is access to full Bayesian models, which are used in almost exactly the same way as any of the other models in the library.

Note, however, that the Bayesian models are **ONLY** available for tabular data and, at the moment, we do not support combining them to form a Wide and Deep model.

The implementation in this library is based on the publication [Weight Uncertainty in Neural Networks](https://arxiv.org/pdf/1505.05424.pdf), by Blundell et al., 2015. Code-wise, our implementation is inspired by a number of sources:

1. https://joshfeldman.net/WeightUncertainty/
2. https://www.nitarshan.com/bayes-by-backprop/
3. https://github.com/piEsposito/blitz-bayesian-deep-learning
4. https://github.com/zackchase/mxnet-the-straight-dope/tree/master/chapter18_variational-methods-and-uncertainty
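
To make the idea behind these references concrete, here is a minimal NumPy sketch of the weight-sampling step in Bayes by Backprop (Blundell et al., 2015): each weight has a mean `mu` and a parameter `rho`, and a weight sample is drawn as `mu + softplus(rho) * eps`. The layer shape, the values of `mu` and `rho`, and the helper names are assumptions made for illustration; this is not the library's code, and it leaves out the gradient-based training step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Variational parameters for a single linear layer with 3 inputs and 1 output
# (hypothetical values, chosen only for illustration)
mu = np.array([[0.5], [-1.0], [0.25]])
rho = np.full_like(mu, -2.0)


def softplus(x):
    # log(1 + exp(x)); keeps the standard deviation positive
    return np.log1p(np.exp(x))


def sample_weights(mu, rho, rng):
    # Reparameterisation: w = mu + sigma * eps, with eps ~ N(0, 1)
    eps = rng.standard_normal(mu.shape)
    return mu + softplus(rho) * eps


x = np.array([[1.0, 2.0, 3.0]])

# Each forward pass draws a fresh set of weights, so the outputs differ
outputs = np.array([(x @ sample_weights(mu, rho, rng)).item() for _ in range(5)])
print(outputs.round(3))
print("mean:", outputs.mean().round(3), "std:", outputs.std().round(3))
```

Because every forward pass uses a different weight sample, repeating the pass several times yields a spread of outputs for the same input, which is what the sampled predictions later in this notebook rely on.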

The two Bayesian models available in the library are: 

1. `BayesianWide`: a linear model where the non-linearities are captured via crossed columns.
2. `BayesianTabMlp`: a standard MLP that receives categorical embeddings and continuous columns (embedded or not), which are then passed through a series of dense layers. All parameters in the model are probabilistic.

In [1]:
import numpy as np
import torch
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from pytorch_widedeep.metrics import Accuracy
from pytorch_widedeep.datasets import load_adult
from pytorch_widedeep.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_widedeep.preprocessing import TabPreprocessor, WidePreprocessor
from pytorch_widedeep.bayesian_models import BayesianWide, BayesianTabMlp
from pytorch_widedeep.training.bayesian_trainer import BayesianTrainer



The first few steps are familiar, since they are the same as for any other model described in the other notebooks.

In [2]:
df = load_adult(as_frame=True)
df.columns = [c.replace("-", "_") for c in df.columns]
df["age_buckets"] = pd.cut(
    df.age, bins=[16, 25, 30, 35, 40, 45, 50, 55, 60, 91], labels=np.arange(9)
)
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
df.head()

| | age | workclass | fnlwgt | education | educational_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | age_buckets | income_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | 0 | 0 |
| 1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | 3 | 0 |
| 2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | 1 | 1 |
| 3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | 4 | 1 |
| 4 | 18 | ? | 103497 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 30 | United-States | 0 | 0 |


In [3]:
train, test = train_test_split(df, test_size=0.2, stratify=df.income_label)

In [4]:
wide_cols = [
    "age_buckets",
    "education",
    "relationship",
    "workclass",
    "occupation",
    "native_country",
    "gender",
]
crossed_cols = [("education", "occupation"), ("native_country", "occupation")]

cat_embed_cols = [
    "workclass",
    "education",
    "marital_status",
    "occupation",
    "relationship",
    "race",
    "gender",
    "capital_gain",
    "capital_loss",
    "native_country",
]
continuous_cols = ["age", "hours_per_week"]

target = train["income_label"].values

### 1. `BayesianWide`

In [5]:
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_tab = wide_preprocessor.fit_transform(train)

In [6]:
model = BayesianWide(
    input_dim=np.unique(X_tab).shape[0],
    prior_sigma_1=1.0,
    prior_sigma_2=0.002,
    prior_pi=0.8,
    posterior_mu_init=0,
    posterior_rho_init=-7.0,
    pred_dim=1,  # here the models are NOT passed to a WideDeep constructor class so the output dim MUST be specified
)
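
As a rough guide to what these hyperparameters mean, the sketch below assumes they follow the notation of Blundell et al. (2015): `prior_sigma_1`, `prior_sigma_2` and `prior_pi` define a scale mixture of two zero-mean Gaussians used as the prior, and `posterior_rho_init` is mapped to the initial posterior standard deviation through a softplus. The helper functions are hypothetical and use only the standard library; the library's internals may differ.

```python
import math

# Values used in the model definition above
prior_sigma_1, prior_sigma_2, prior_pi = 1.0, 0.002, 0.8
posterior_rho_init = -7.0


def softplus(rho):
    # sigma = log(1 + exp(rho)) is always positive
    return math.log1p(math.exp(rho))


def normal_pdf(w, sigma):
    # Density of a zero-mean Gaussian with standard deviation sigma
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def log_mixture_prior(w):
    # Scale mixture prior: pi * N(0, sigma_1^2) + (1 - pi) * N(0, sigma_2^2)
    density = prior_pi * normal_pdf(w, prior_sigma_1) + (
        1 - prior_pi
    ) * normal_pdf(w, prior_sigma_2)
    return math.log(density)


print(f"initial posterior sigma: {softplus(posterior_rho_init):.6f}")
for w in (0.0, 0.01, 1.0):
    print(f"log p(w={w}) = {log_mixture_prior(w):.3f}")
```

Under this reading, `posterior_rho_init=-7.0` gives an initial posterior standard deviation of roughly `9e-4`, so the weights start out almost deterministic, while the narrow second prior component (`prior_sigma_2=0.002`) concentrates prior mass near zero.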

In [7]:
trainer = BayesianTrainer(
    model,
    objective="binary",
    optimizer=torch.optim.Adam(model.parameters(), lr=0.01),
    metrics=[Accuracy],
)

In [8]:
trainer.fit(
    X_tab=X_tab,
    target=target,
    val_split=0.2,
    n_epochs=2,
    batch_size=256,
)

epoch 1: 100%|████████████████████████████████████████████████████| 123/123 [00:00<00:00, 159.35it/s, loss=152, metrics={'acc': 0.8099}]
valid: 100%|████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 259.47it/s, loss=136, metrics={'acc': 0.8283}]
epoch 2: 100%|█████████████████████████████████████████████████████| 123/123 [00:00<00:00, 155.27it/s, loss=137, metrics={'acc': 0.834}]
valid: 100%|█████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 262.92it/s, loss=134, metrics={'acc': 0.837}]


### 2. `BayesianTabMlp`

In [9]:
tab_preprocessor = TabPreprocessor(
    cat_embed_cols=cat_embed_cols, continuous_cols=continuous_cols
)
X_tab = tab_preprocessor.fit_transform(train)

In [10]:
model = BayesianTabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=continuous_cols,
    #     embed_continuous=True, # as with the TabMlp, you can choose to embed the continuous features
    #     cont_embed_activation="leaky_relu",
    mlp_hidden_dims=[128, 64],
    prior_sigma_1=1.0,
    prior_sigma_2=0.002,
    prior_pi=0.8,
    posterior_mu_init=0,
    posterior_rho_init=-7.0,
    pred_dim=1,
)

In [11]:
trainer = BayesianTrainer(
    model,
    objective="binary",
    optimizer=torch.optim.Adam(model.parameters(), lr=0.01),
    metrics=[Accuracy],
)

In [12]:
trainer.fit(
    X_tab=X_tab,
    target=target,
    val_split=0.2,
    n_epochs=2,
    batch_size=256,
)

epoch 1: 100%|█████████████████████████████████████████████████| 123/123 [00:04<00:00, 28.12it/s, loss=1.95e+3, metrics={'acc': 0.8538}]
valid: 100%|████████████████████████████████████████████████████| 31/31 [00:00<00:00, 178.91it/s, loss=1.72e+3, metrics={'acc': 0.8711}]
epoch 2: 100%|█████████████████████████████████████████████████| 123/123 [00:04<00:00, 28.88it/s, loss=1.71e+3, metrics={'acc': 0.8722}]
valid: 100%|████████████████████████████████████████████████████| 31/31 [00:00<00:00, 182.22it/s, loss=1.68e+3, metrics={'acc': 0.8691}]


Beyond the success metrics, these models are valuable because they give us a sense of the uncertainty in each prediction. Let's have a look.

In [13]:
X_tab_test = tab_preprocessor.transform(test)

In [14]:
preds = trainer.predict(X_tab_test, return_samples=True, n_samples=5)

predict: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 29.47it/s]


In [15]:
preds.shape

(5, 9769)

As we can see, the predictions have shape `(5, 9769)`: one set of predictions for each time we internally ran predict (i.e. sampled the network and predicted), with the number of runs set by the parameter `n_samples`. This gives us an idea of how certain the model is about a given prediction.
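
One simple way to turn sampled class predictions into a per-row certainty score is to look at how often the samples agree. The sketch below uses a small synthetic array as a stand-in for `preds` (the real array is not reproduced here); the names `vote_share`, `majority` and `agreement` are illustrative, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for `preds`: 5 sampled forward passes over 8 rows
preds = rng.integers(0, 2, size=(5, 8))

# Fraction of samples voting for class 1, per row
vote_share = preds.mean(axis=0)

# Majority vote, and the fraction of samples that agree with it
majority = (vote_share >= 0.5).astype(int)
agreement = np.maximum(vote_share, 1 - vote_share)

for i, (m, a) in enumerate(zip(majority, agreement)):
    print(f"row {i}: majority class={m}, agreement={a:.1f}")
```

Rows where the agreement is close to 1.0 are those the sampled networks consistently classify the same way; rows near 0.5 are the ones where the model is least certain.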

Similarly, we could obtain the probabilities:

In [16]:
probs = trainer.predict_proba(X_tab_test, return_samples=True, n_samples=5)

predict: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 28.48it/s]


In [17]:
probs.shape

(5, 9769, 2)
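
With sampled probabilities, a common summary is the mean probability per class together with a measure of spread across samples. The sketch below assumes an array laid out as `(n_samples, n_rows, n_classes)`, matching the shape above, but fills it with synthetic values; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for `probs`, shape (n_samples, n_rows, n_classes)
p1 = rng.uniform(0, 1, size=(5, 6))
probs = np.stack([1 - p1, p1], axis=-1)

# Average over the sampled networks: shape (n_rows, n_classes)
mean_probs = probs.mean(axis=0)

# Spread of P(class 1) across samples: larger means less stable predictions
std_pos = probs[:, :, 1].std(axis=0)

# Entropy of the averaged distribution: higher when the model is unsure
eps = 1e-12
entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)

# Final class taken from the averaged probabilities
final_pred = mean_probs.argmax(axis=1)

for i in range(mean_probs.shape[0]):
    print(
        f"row {i}: P(1)={mean_probs[i, 1]:.3f}, "
        f"std={std_pos[i]:.3f}, entropy={entropy[i]:.3f}, pred={final_pred[i]}"
    )
```

Averaging the probabilities before taking the argmax uses all sampled networks for the final prediction, while the standard deviation and entropy flag the rows that deserve a closer look.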

We can also see how the model performs each time we sample the network:

In [18]:
for p in preds:
    # accuracy_score expects (y_true, y_pred)
    print(accuracy_score(test["income_label"].values, p))

0.8699969290613164
0.8690756474562391
0.8689732828334528
0.8693827413245983
0.8687685535878801
