# Testing BNN Stacking
### Evan Edwards
#### The Basic structure of this document is as follows:


## Note: BNN for weight calculation still needs work 


#### Notes:
 - A simple BNN performs regression better than any of the created HLMs (given these arbitrarily chosen input variables)
    - May want to look into a hierarchical BNN to further improve predictions - already done elsewhere, but stacking and use on large-scale clustered data has not been seen
    - A deep BNN performs very well; maybe also experiment with a hierarchical structure - also use with mixing methods
    - Still need to get credible intervals
 - Need to implement 90/10 cross-validation to test for generalization and overfitting
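
The 90/10 cross-validation mentioned above could be sketched as a 10-fold split, where each fold trains on 90% of the rows and evaluates on the held-out 10%. The sketch below uses synthetic data and an ordinary least-squares fit as a stand-in for the HLMs and BNNs; `ten_fold_rmse`, `X_toy`, and `y_toy` are hypothetical names, not part of this notebook.

```python
import numpy as np

def ten_fold_rmse(X, y, n_splits=10, seed=0):
    """Return the per-fold out-of-sample RMSE of a least-squares fit."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        # Stand-in model: ordinary least squares with an intercept column.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        resid = y[test_idx] - A_test @ coef
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(scores)

# Synthetic data: 500 rows, 4 features, a known linear signal plus unit noise.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(500, 4))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=500)
fold_rmse = ten_fold_rmse(X_toy, y_toy)
print(fold_rmse.mean(), fold_rmse.std())
```

Comparing the mean out-of-sample RMSE with the in-sample RMSE gives a rough indication of overfitting; the spread across folds indicates how stable the estimate is.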


#### Ideas
  - Hierarchical BNNs already exist - maybe use an ensemble method on a collection of them?
    - Maybe use them in a study similar to the PISA study
  - Work on a sum-of-neural-networks stacking model
    - w/ varying structures
    - Still need to flesh out this idea
    - Could also potentially be used for regression - very similar to random forest
  - Use BART on hierarchical deep BNNs - may yield accurate results, yet be costly

#### Imports

In [1]:
import time
import bambi as bmb
import numpy as np
import pandas as pd
import arviz as az
from scipy.stats import zscore
from numpy import mean, std
import matplotlib.pyplot as plt
from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

In [2]:
import torch
import pyro
import torch.nn as nn
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.infer import SVI, Trace_ELBO, Predictive
from tqdm.auto import trange, tqdm
from pyro.infer import MCMC, NUTS




In [3]:
from Taweret.mix.gaussian import Multivariate
from Taweret.core.base_model import BaseModel

In [4]:
# Fixed random seed to ensure reproducibility and the possibility for optimization
RANDOM_SEED = 9572404
rng = np.random.default_rng(RANDOM_SEED)


Standardize data - z-score -> rescale

#### Data Processing

In [5]:
# Load dataset
PISA2018 = pd.read_csv("pisa2018.BayesBook.csv")

In [6]:
# Data processing: converting categorical values to numerical values
PISA2018['Female'] = PISA2018['Female'].replace({'Female': 1, 'Male': 0})
# Converting numerical to categorical values
PISA2018['SchoolID'] = pd.Categorical(PISA2018['SchoolID'])



In [7]:
# Standardization
PISA2018["PV1READ_unscaled"] = PISA2018["PV1READ"] 
PISA2018["PV1READ"] = zscore(PISA2018["PV1READ"])
PISA2018_train, PISA2018_test = train_test_split(PISA2018, test_size=0.1, random_state=RANDOM_SEED)

In [8]:
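def RMSE(pred, true):
    # pred: predictions on the z-score scale; true: targets on the original scale.
    # Predictions are mapped back to the original scale before the error is computed.
    rescaled_pred = pred * std(true) + mean(true)
    return np.sqrt(np.sum(np.power(np.subtract(true, rescaled_pred),2))/len(true))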
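
`RMSE` takes the standardised predictions first and the unscaled targets second, and it back-transforms using the mean and standard deviation of whatever is passed as `true`. A small synthetic check, assuming nothing beyond NumPy (the toy arrays are made up for illustration), shows that the two argument orders are not interchangeable:

```python
import numpy as np

def RMSE(pred, true):
    # Same definition as above: rescale z-scored predictions, then compare.
    rescaled_pred = pred * np.std(true) + np.mean(true)
    return np.sqrt(np.sum(np.power(np.subtract(true, rescaled_pred), 2)) / len(true))

rng = np.random.default_rng(0)
true_unscaled = rng.normal(loc=500.0, scale=100.0, size=1000)
# A perfect prediction expressed on the z-score scale.
perfect_z = (true_unscaled - true_unscaled.mean()) / true_unscaled.std()

right_order = RMSE(perfect_z, true_unscaled)   # (pred, true)
wrong_order = RMSE(true_unscaled, perfect_z)   # (true, pred): arguments swapped
print(right_order, wrong_order)
```

With the intended order the error of a perfect prediction is zero up to floating-point rounding; with the arguments swapped it is not, so the value returned depends on passing `pred` and `true` in the documented positions.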

#### Model Definitions

In [9]:
%%time
#PV1READ ~ Female + ESCS + HOMEPOS + ICTRES + (1 + ICTRES | SchoolID)
model1 = bmb.Model("PV1READ ~ Female + ESCS + HOMEPOS + ICTRES + (1 + ICTRES | SchoolID)", PISA2018_train, categorical = ["SchoolID"])
priors = {"Intercept": bmb.Prior("Normal", mu=0, sigma=100),
          "Female": bmb.Prior("Normal", mu=0, sigma=10),
          "ESCS": bmb.Prior("Normal", mu=np.mean(PISA2018_train["ESCS"]), sigma=np.std(PISA2018_train["ESCS"])),
          "HOMEPOS": bmb.Prior("Normal", mu=np.mean(PISA2018_train["HOMEPOS"]), sigma=100),
          "ICTRES": bmb.Prior("Normal", mu=np.mean(PISA2018_train["ICTRES"]), sigma=np.std(PISA2018_train["ICTRES"])),
          "1|SchoolID": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=100)),
          "ICTRES|SchoolID": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=100)),
          "sigma": bmb.Prior("HalfNormal", sigma=10)}
model1.set_priors(priors = priors)

trace1 = model1.fit(draws=2000, random_seed=RANDOM_SEED)

post_pred1 = model1.predict(trace1,data = PISA2018_train, inplace=False).posterior["PV1READ_mean"]
mean_pred1 = np.array(post_pred1.mean(dim=["chain", "draw"]))
print(f'The RMSE for model 1 - train set is: {RMSE(mean_pred1, PISA2018_train["PV1READ_unscaled"])}')

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [PV1READ_sigma, Intercept, Female, ESCS, HOMEPOS, ICTRES, 1|SchoolID_sigma, 1|SchoolID_offset, ICTRES|SchoolID_sigma, ICTRES|SchoolID_offset]


Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 114 seconds.


The RMSE for model 1 - train set is: 93.82698991880818
CPU times: total: 25.5 s
Wall time: 2min 14s


In [10]:
%%time
#PV1READ ~ JOYREAD + PISADIFF + SCREADCOMP + SCREADDIFF + (1|SchoolID)
model2 = bmb.Model("PV1READ ~ JOYREAD + PISADIFF + SCREADCOMP + SCREADDIFF + (1|SchoolID)", PISA2018_train, categorical = ["SchoolID"])

priors = {"Intercept": bmb.Prior("Normal", mu=0, sigma=100),
          "JOYREAD": bmb.Prior("Normal", mu=np.mean(PISA2018_train["JOYREAD"]), sigma=np.std(PISA2018_train["JOYREAD"])),
          "PISADIFF": bmb.Prior("Normal", mu=0, sigma=100),
          "SCREADCOMP": bmb.Prior("Normal", mu=np.mean(PISA2018_train["SCREADCOMP"]), sigma=10),
          "SCREADDIFF": bmb.Prior("Normal", mu=np.mean(PISA2018_train["SCREADDIFF"]), sigma=np.std(PISA2018_train["SCREADDIFF"])),
          "1|SchoolID": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=100)),
          "sigma": bmb.Prior("HalfNormal", sigma=10)}
model2.set_priors(priors = priors)

trace2 = model2.fit(draws=2000, random_seed=RANDOM_SEED)

post_pred2 = model2.predict(trace2,data = PISA2018_train, inplace=False).posterior["PV1READ_mean"]
mean_pred2 = np.array(post_pred2.mean(dim=["chain", "draw"]))
print(f'The RMSE for model 2 - train set is: {RMSE(mean_pred2, PISA2018_train["PV1READ_unscaled"])}')

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [PV1READ_sigma, Intercept, JOYREAD, PISADIFF, SCREADCOMP, SCREADDIFF, 1|SchoolID_sigma, 1|SchoolID_offset]


Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 90 seconds.


The RMSE for model 2 - train set is: 85.96626975169873
CPU times: total: 21.5 s
Wall time: 1min 38s


In [11]:
%%time
#PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE + (1|SchoolID)
model3 = bmb.Model("PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE + (1|SchoolID)", PISA2018_train, categorical = ["SchoolID"])

priors = {"Intercept": bmb.Prior("Normal", mu=0, sigma=100),
          "METASUM": bmb.Prior("Normal", mu=np.mean(PISA2018_train["METASUM"]), sigma=np.std(PISA2018_train["METASUM"])),
          "GFOFAIL": bmb.Prior("Normal", mu=0, sigma=100),
          "MASTGOAL": bmb.Prior("Normal", mu=np.mean(PISA2018_train["MASTGOAL"]), sigma=10),
          "SWBP": bmb.Prior("Normal", mu=0, sigma=100),
          "WORKMAST": bmb.Prior("Normal", mu=np.mean(PISA2018_train["WORKMAST"]), sigma=10),
          "ADAPTIVITY": bmb.Prior("Normal", mu=np.mean(PISA2018_train["ADAPTIVITY"]), sigma=100),
          "COMPETE": bmb.Prior("Normal", mu=np.mean(PISA2018_train["COMPETE"]), sigma=np.std(PISA2018_train["COMPETE"])),
          "1|SchoolID": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=100)),
          "sigma": bmb.Prior("HalfNormal", sigma=10)}
model3.set_priors(priors = priors)

trace3 = model3.fit(draws=2000, random_seed=RANDOM_SEED)

post_pred3 = model3.predict(trace3,data = PISA2018_train, inplace=False).posterior["PV1READ_mean"]
mean_pred3 = np.array(post_pred3.mean(dim=["chain", "draw"]))
print(f'The RMSE for model 3 - train set is: {RMSE(mean_pred3, PISA2018_train["PV1READ_unscaled"])}')

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [PV1READ_sigma, Intercept, METASUM, GFOFAIL, MASTGOAL, SWBP, WORKMAST, ADAPTIVITY, COMPETE, 1|SchoolID_sigma, 1|SchoolID_offset]


Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 96 seconds.


The RMSE for model 3 - train set is: 88.39194098260263
CPU times: total: 23.2 s
Wall time: 1min 45s


In [12]:
%%time
#PV1READ ~ PERFEED + TEACHINT + BELONG + (1 + TEACHINT | SchoolID)
model4 = bmb.Model("PV1READ ~ PERFEED + TEACHINT + BELONG + (1 + TEACHINT | SchoolID)", PISA2018_train, categorical = ["SchoolID"])

priors = {"Intercept": bmb.Prior("Normal", mu=0, sigma=100),
          "PERFEED": bmb.Prior("Normal", mu=np.mean(PISA2018_train["PERFEED"]), sigma=np.std(PISA2018_train["PERFEED"])),
          "TEACHINT": bmb.Prior("Normal", mu=np.mean(PISA2018_train["TEACHINT"]), sigma=np.std(PISA2018_train["TEACHINT"])),
          "BELONG": bmb.Prior("Normal", mu=np.mean(PISA2018_train["BELONG"]), sigma=100),
          "1|SchoolID": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=100)),
          "TEACHINT|SchoolID": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=100)),
          "sigma": bmb.Prior("HalfNormal", sigma=10)}
model4.set_priors(priors = priors)

trace4 = model4.fit(draws=2000, random_seed=RANDOM_SEED)

post_pred4 = model4.predict(trace4,data = PISA2018_train, inplace=False).posterior["PV1READ_mean"]
mean_pred4 = np.array(post_pred4.mean(dim=["chain", "draw"]))
print(f'The RMSE for model 4 - train set is: {RMSE(mean_pred4, PISA2018_train["PV1READ_unscaled"])}')

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [PV1READ_sigma, Intercept, PERFEED, TEACHINT, BELONG, 1|SchoolID_sigma, 1|SchoolID_offset, TEACHINT|SchoolID_sigma, TEACHINT|SchoolID_offset]


Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 107 seconds.


The RMSE for model 4 - train set is: 95.01608735134099
CPU times: total: 23.8 s
Wall time: 1min 57s


#### Prediction Processing

In [13]:
X = torch.tensor(np.array([mean_pred1, mean_pred2, mean_pred3, mean_pred4])).float()
y = torch.tensor(np.array(PISA2018_train["PV1READ"])).float()
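
Stacking the four prediction vectors with `np.array([...])` gives a matrix of shape `(4, N)`, one row per model, while a network with `in_dim=4` consumes rows of shape `(N, 4)`, one row per observation. A minimal NumPy sketch on made-up numbers (PyTorch's `reshape` follows the same row-major rule) shows that `reshape(-1, 4)` and a transpose produce different `(N, 4)` matrices:

```python
import numpy as np

# Toy stand-in for four model predictions on N = 3 observations.
stacked = np.array([
    [1, 2, 3],           # model 1
    [10, 20, 30],        # model 2
    [100, 200, 300],     # model 3
    [1000, 2000, 3000],  # model 4
])

by_reshape = stacked.reshape(-1, 4)  # refills the same values row by row
by_transpose = stacked.T             # row i holds all four predictions for observation i

print(by_reshape)
print(by_transpose)
```

Only the transposed layout keeps each observation's four predictions together in one row.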

#### Simple BNN Definition, Used as an individual regressor

In [14]:
# A set of standardised inputs for prediction, arbitrarily chosen
inputs_features = torch.tensor(np.array([zscore(PISA2018_train["Female"]),zscore(PISA2018_train["ESCS"]), zscore(PISA2018_train["HOMEPOS"]), zscore(PISA2018_train["ICTRES"])])).float()

In [16]:
#https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/DL2/Bayesian_Neural_Networks/dl2_bnn_tut1_students_with_answers.html
class BNNSimple(PyroModule):
    def __init__(self, in_dim=len(inputs_features), out_dim=1, hid_dim=6, prior_scale=10.):
        super().__init__()

        self.activation = nn.ReLU()
        self.layer1 = PyroModule[nn.Linear](in_dim, hid_dim)
        self.layer2 = PyroModule[nn.Linear](hid_dim, out_dim)

        self.layer1.weight = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim, in_dim]).to_event(2))
        self.layer1.bias = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim]).to_event(1))
        self.layer2.weight = PyroSample(dist.Normal(0., prior_scale).expand([out_dim, hid_dim]).to_event(2))
        self.layer2.bias = PyroSample(dist.Normal(0., prior_scale).expand([out_dim]).to_event(1))

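    def forward(self, x, y=None):
        # reshape reinterprets the (in_dim, N) input in row-major order as (N, in_dim); it does not transpose
        x = x.reshape(-1, len(inputs_features))
        x = self.activation(self.layer1(x))
        mu = self.layer2(x).squeeze()
        sigma = pyro.sample("sigma", dist.HalfNormal(10))  # Infer the response noise

        with pyro.plate("data", x.shape[0]):
            # dist.Normal takes a scale (standard deviation), so the observation scale here is sigma squared
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return mu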

model = BNNSimple()
nuts_kernel = NUTS(model, jit_compile=False)
mcmc = MCMC(nuts_kernel, num_samples=200)
mcmc.run(inputs_features, y)    

predictive = Predictive(model=model, posterior_samples=mcmc.get_samples())
preds = predictive(X) 

y_pred = preds['obs'].T.detach().numpy().mean(axis=1)
y_std = preds['obs'].T.detach().numpy().std(axis=1)
print(f'The RMSE (IS) is: {RMSE(PISA2018_train["PV1READ_unscaled"], y_pred)}')

Sample: 100%|██████████| 400/400 [31:07,  4.67s/it, step size=2.48e-03, acc. prob=0.877]


The RMSE (IS) is: 36.73967067068268


The results are better than any individual HLM; may want to look into a hierarchical model and then use it in stacking.
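
In the `forward` method above, the likelihood is written as `dist.Normal(mu, sigma * sigma)`. The second argument of `torch.distributions.Normal` (which Pyro's `dist.Normal` wraps) is the scale, that is, the standard deviation rather than the variance. The standard-library `NormalDist`, which uses the same (mean, standard deviation) convention, is used below purely as a stand-in to show how squaring changes the implied noise level; the value `sigma = 3.0` is an arbitrary illustration.

```python
from statistics import NormalDist

sigma = 3.0

# Scale interpreted as a standard deviation.
noise_sd = NormalDist(mu=0.0, sigma=sigma)
# Passing sigma * sigma in the same slot makes the standard deviation 9, not 3.
noise_sq = NormalDist(mu=0.0, sigma=sigma * sigma)

print(noise_sd.stdev, noise_sd.variance)   # 3.0 9.0
print(noise_sq.stdev, noise_sq.variance)   # 9.0 81.0
```

Whether the squared form is intended depends on how the `HalfNormal(10)` prior on `sigma` is meant to be read; the sketch only makes the two readings explicit.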

#### Deep BNN Definition, Used as an individual regressor

In [17]:
class BNNDEEP(PyroModule):
    def __init__(self, in_dim=len(inputs_features), out_dim=1, hid_dim=6, n_hid_layers=2, prior_scale=10):
        super().__init__()
 
        self.activation = nn.Tanh()

        self.layer_sizes = [in_dim] + n_hid_layers * [hid_dim] + [out_dim]
        layer_list = [PyroModule[nn.Linear](self.layer_sizes[idx - 1], self.layer_sizes[idx]) for idx in
                      range(1, len(self.layer_sizes))]
        self.layers = PyroModule[torch.nn.ModuleList](layer_list)

        for layer_idx, layer in enumerate(self.layers):
            layer.weight = PyroSample(dist.Normal(0., prior_scale).expand(
                [self.layer_sizes[layer_idx + 1], self.layer_sizes[layer_idx]]).to_event(2))
            layer.bias = PyroSample(dist.Normal(0., prior_scale).expand([self.layer_sizes[layer_idx + 1]]).to_event(1))

    def forward(self, x, y=None):
        x = x.reshape(-1, len(inputs_features))
        x = self.activation(self.layers[0](x))
        for layer in self.layers[1:-1]:
            x = self.activation(layer(x))
        mu = self.layers[-1](x).squeeze()
        sigma = pyro.sample("sigma", dist.HalfNormal(10))

        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return mu
    

model = BNNDEEP()
nuts_kernel = NUTS(model, jit_compile=False)
mcmc = MCMC(nuts_kernel, num_samples=200)
mcmc.run(inputs_features, y)

predictive = Predictive(model=model, posterior_samples=mcmc.get_samples())
preds = predictive(inputs_features)

y_pred = preds['obs'].T.detach().numpy().mean(axis=1)
y_std = preds['obs'].T.detach().numpy().std(axis=1)
print(f'The RMSE (IS) is: {RMSE(PISA2018_train["PV1READ_unscaled"], y_pred)}')

Sample: 100%|██████████| 400/400 [40:06,  6.02s/it, step size=1.40e-03, acc. prob=0.768]


The RMSE (IS) is: 36.76303833717518
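
For the credible intervals still outstanding in the notes, one option is to take percentiles across the posterior-predictive draws. `preds['obs']` from `Predictive` has one row per posterior sample, so the sketch below uses a synthetic array with the same layout (`draws` is a made-up stand-in, not notebook output) and computes an equal-tailed 95% interval per observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_obs = 200, 5
# Synthetic posterior-predictive draws: shape (n_samples, n_obs).
centers = np.linspace(-1.0, 1.0, n_obs)
draws = centers + 0.3 * rng.normal(size=(n_samples, n_obs))

point = draws.mean(axis=0)
# Equal-tailed 95% interval: 2.5th and 97.5th percentiles over the sample axis.
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)

for i in range(n_obs):
    print(f"obs {i}: mean={point[i]:.2f}, 95% CI=({lower[i]:.2f}, {upper[i]:.2f})")
```

Intervals computed this way are on the z-score scale and would need the same back-transform as the point predictions before being reported in PISA score units.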


#### Simple BNN Definition, Used as a meta-learner - the model outputs are used as the BNN inputs

In [18]:
#https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/DL2/Bayesian_Neural_Networks/dl2_bnn_tut1_students_with_answers.html
class BNN1(PyroModule):
    def __init__(self, in_dim=4, out_dim=1, hid_dim=8, prior_scale=10.):
        super().__init__()

        self.activation = nn.Tanh()
        self.layer1 = PyroModule[nn.Linear](in_dim, hid_dim)
        self.layer2 = PyroModule[nn.Linear](hid_dim, out_dim)

        self.layer1.weight = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim, in_dim]).to_event(2))
        self.layer1.bias = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim]).to_event(1))
        self.layer2.weight = PyroSample(dist.Normal(0., prior_scale).expand([out_dim, hid_dim]).to_event(2))
        self.layer2.bias = PyroSample(dist.Normal(0., prior_scale).expand([out_dim]).to_event(1))

    def forward(self, x, y=None):
        x = x.reshape(-1, 4)
        x = self.activation(self.layer1(x))
        mu = self.layer2(x).squeeze()
        sigma = pyro.sample("sigma", dist.HalfNormal(10))  # Infer the response noise

        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return mu

model = BNN1()
nuts_kernel = NUTS(model, jit_compile=False)
mcmc = MCMC(nuts_kernel, num_samples=200)
mcmc.run(X, y)

predictive = Predictive(model=model, posterior_samples=mcmc.get_samples())
preds = predictive(X)

y_pred = preds['obs'].T.detach().numpy().mean(axis=1)
y_std = preds['obs'].T.detach().numpy().std(axis=1)
print(f'The RMSE (IS) is: {RMSE(PISA2018_train["PV1MATH"], y_pred)}')

Sample: 100%|██████████| 400/400 [38:02,  5.71s/it, step size=3.51e-03, acc. prob=0.801]


The RMSE (IS) is: 34.980526351430036


#### Deep BNN Definition, Used as a meta-learner

In [179]:
class BNN2(PyroModule):
    def __init__(self, in_dim=4, out_dim=1, hid_dim=6, n_hid_layers=3, prior_scale=10.):
        super().__init__()
 
        self.activation = nn.Tanh()

        self.layer_sizes = [in_dim] + n_hid_layers * [hid_dim] + [out_dim]
        layer_list = [PyroModule[nn.Linear](self.layer_sizes[idx - 1], self.layer_sizes[idx]) for idx in
                      range(1, len(self.layer_sizes))]
        self.layers = PyroModule[torch.nn.ModuleList](layer_list)

        for layer_idx, layer in enumerate(self.layers):
            layer.weight = PyroSample(dist.Normal(0., prior_scale).expand(
                [self.layer_sizes[layer_idx + 1], self.layer_sizes[layer_idx]]).to_event(2))
            layer.bias = PyroSample(dist.Normal(0., prior_scale).expand([self.layer_sizes[layer_idx + 1]]).to_event(1))

    def forward(self, x, y=None):
        x = x.reshape(-1, 4)
        x = self.activation(self.layers[0](x))
        for layer in self.layers[1:-1]:
            x = self.activation(layer(x))
        mu = self.layers[-1](x).squeeze()
        sigma = pyro.sample("sigma", dist.HalfNormal(10.))

        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return mu
    

model = BNN2()
nuts_kernel = NUTS(model, jit_compile=False)
mcmc = MCMC(nuts_kernel, num_samples=200)
mcmc.run(X, y)

predictive = Predictive(model=model, posterior_samples=mcmc.get_samples())
preds = predictive(X)

y_pred = preds['obs'].T.detach().numpy().mean(axis=1)
y_std = preds['obs'].T.detach().numpy().std(axis=1)
print(f'The RMSE (IS) is: {RMSE(PISA2018_train["PV1MATH"], y_pred)}')

Sample: 100%|██████████| 400/400 [51:11,  7.68s/it, step size=1.48e-03, acc. prob=0.788]


The RMSE (IS) is: 35.11588520738445


BNNs work better as individual regressors, at least with these tested configurations.

#### BNN - Stacking

In [82]:
class BNNSimpleStack(PyroModule):
    def __init__(self, in_dim=4, out_dim=1, hid_dim=4, prior_scale=10.):
        super().__init__()

        self.activation = nn.ReLU()
        self.layer1 = PyroModule[nn.Linear](in_dim, hid_dim)
        self.layer2 = PyroModule[nn.Linear](hid_dim, out_dim)

        self.weight_layer = nn.Linear(in_dim, in_dim, bias=False)

        self.layer1.weight = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim, in_dim]).to_event(2))
        self.layer1.bias = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim]).to_event(1))
        self.layer2.weight = PyroSample(dist.Normal(0., prior_scale).expand([out_dim, hid_dim]).to_event(2))
        self.layer2.bias = PyroSample(dist.Normal(0., prior_scale).expand([out_dim]).to_event(1))

    def forward(self, x, y=None):
        x = x.reshape(-1, len(x))
        x = self.activation(self.layer1(x))
        weights = torch.softmax(self.weight_layer.weight, dim=1)
        weighted_inputs = torch.matmul(x, weights.t())
        mu = self.layer2(weighted_inputs).squeeze()
        sigma = pyro.sample("sigma", dist.HalfNormal(10)) 

        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return mu

model = BNNSimpleStack()
nuts_kernel = NUTS(model, jit_compile=False)
mcmc = MCMC(nuts_kernel, num_samples=2)
mcmc.run(X, y)

predictive = Predictive(model=model, posterior_samples=mcmc.get_samples())
preds = predictive(X)

y_pred = preds['obs'].T.detach().numpy().mean(axis=1)
y_std = preds['obs'].T.detach().numpy().std(axis=1)
print(f'The RMSE (IS) is: {RMSE(PISA2018_train["PV1MATH"], y_pred)}')

Sample: 100%|██████████| 4/4 [00:00, 45.71it/s, step size=1.12e-01, acc. prob=0.000]

The RMSE (IS) is: 916.7573435729306





In [178]:
class BNNSimpleStack(PyroModule):
    def __init__(self, in_dim=4, out_dim=1, hid_dim=6, prior_scale=10.):
        super().__init__()

        self.activation = nn.Tanh()
        self.layer1 = PyroModule[nn.Linear](in_dim, hid_dim)
        self.layer2 = PyroModule[nn.Linear](hid_dim, hid_dim)
        self.layer3 = PyroModule[nn.Linear](hid_dim, in_dim)
        self.weight_layer = PyroModule[nn.Linear](1, in_dim, bias=False)

        self.weight_layer.weight = PyroSample(dist.Normal(0., prior_scale).expand([1, in_dim]).to_event(2))

        self.layer1.weight = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim, in_dim]).to_event(2))
        self.layer1.bias = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim]).to_event(1))
        
        self.layer2.weight = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim, hid_dim]).to_event(2))
        self.layer2.bias = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim]).to_event(1))

        self.layer3.weight = PyroSample(dist.Normal(0., prior_scale).expand([in_dim, hid_dim]).to_event(2))
        self.layer3.bias = PyroSample(dist.Normal(0., prior_scale).expand([in_dim]).to_event(1))


    def forward(self, x, y=None):
        x = x.reshape(-1, len(x))
        x = self.activation(self.layer1(x))
        x = self.activation(self.layer2(x))
        x = self.activation(self.layer3(x)).squeeze()
                            
        weights = torch.softmax(self.weight_layer.weight, dim=1)
        weighted_inputs = torch.matmul(x, weights.t())
        mu = weighted_inputs.sum(dim=1)
        sigma = pyro.sample("sigma", dist.HalfNormal(10)) 

        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return mu

model = BNNSimpleStack()
nuts_kernel = NUTS(model, jit_compile=False)
mcmc = MCMC(nuts_kernel, num_samples=200)
mcmc.run(X, y)

predictive = Predictive(model=model, posterior_samples=mcmc.get_samples())
preds = predictive(X)

y_pred = preds['obs'].T.detach().numpy().mean(axis=1)
y_std = preds['obs'].T.detach().numpy().std(axis=1)
print(f'The RMSE (IS) is: {RMSE(PISA2018_train["PV1MATH"], y_pred)}')

Sample: 100%|██████████| 400/400 [9:49:38, 88.45s/it, step size=1.00e-03, acc. prob=0.897] 


The RMSE (IS) is: 34.91980322421185


In [137]:
samples = mcmc.get_samples()

# Extract weights from the samples
weight_samples = samples['weight_layer.weight']

# Get the mean and standard deviation of the weight samples
weight_mean = weight_samples.mean(dim=0)
weight_std = weight_samples.std(dim=0)

print("Mean of the weight samples:")
print(weight_mean)

print("Standard deviation of the weight samples:")
print(weight_std)
weight_samples

Mean of the weight samples:
tensor([[-0.2911,  1.5804, -0.7624, -0.5089]])
Standard deviation of the weight samples:
tensor([[0., 0., 0., 0.]])


tensor([[[-0.2911,  1.5804, -0.7624, -0.5089]],

        [[-0.2911,  1.5804, -0.7624, -0.5089]]])

In [177]:

class BNNSimpleStack(PyroModule):
    def __init__(self, in_dim=4, out_dim=1, hid_dim=6, prior_scale=10.):
        super().__init__()

        self.activation = nn.Tanh()
        self.layer1 = PyroModule[nn.Linear](in_dim, hid_dim)
        self.layer2 = PyroModule[nn.Linear](hid_dim, in_dim)
        self.weight_layer = PyroModule[nn.Linear](in_dim, in_dim, bias=False)

        self.weight_layer.weight = PyroSample(dist.Normal(0., prior_scale).expand([1, in_dim]).to_event(2))

        self.layer1.weight = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim, in_dim]).to_event(2))
        self.layer1.bias = PyroSample(dist.Normal(0., prior_scale).expand([hid_dim]).to_event(1))
        
        self.layer2.weight = PyroSample(dist.Normal(0., prior_scale).expand([in_dim, hid_dim]).to_event(2))
        self.layer2.bias = PyroSample(dist.Normal(0., prior_scale).expand([in_dim]).to_event(1))


    def forward(self, x, y=None):
        x = x.reshape(-1, len(x))
        x = self.activation(self.layer1(x))
        x = self.activation(self.layer2(x)).squeeze()
                            
        weights = torch.softmax(self.weight_layer.weight, dim=1)
        weighted_inputs = torch.matmul(x, weights.t())
        mu = weighted_inputs.sum(dim=1)
        sigma = pyro.sample("sigma", dist.HalfNormal(10))
        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mu, sigma * sigma), obs=y)
        return weights

model = BNNSimpleStack()
nuts_kernel = NUTS(model, jit_compile=False)
mcmc = MCMC(nuts_kernel, num_samples=40)
mcmc.run(X, y)

predictive = Predictive(model=model, posterior_samples=mcmc.get_samples())
preds = predictive(X)

y_pred = preds['obs'].T.detach().numpy().mean(axis=1)
y_std = preds['obs'].T.detach().numpy().std(axis=1)
print(f'The RMSE (IS) is: {RMSE(PISA2018_train["PV1MATH"], y_pred)}')


Sample: 100%|██████████| 80/80 [1:17:43, 58.29s/it, step size=7.37e-03, acc. prob=0.685]  

The RMSE (IS) is: 79.52750780956471





#### BNN - Weighting by Variance - WIP

#### Sum-of Neural Networks Model - WIP

#### Stacking with Taweret - Multivariate

#### Stacking with Taweret (BART) - WIP