# Demonstration Notebook
This notebook demonstrates how to use the proposed Stage-Aware Learning (SAL) framework.

This notebook walks through the entire pipeline for fitting SAL, including estimating propensity scores and learning stage weights. Because the models are neural networks, performance depends heavily on the choice of hyperparameters (learning rate, hidden layers, activation functions, etc.). To tune the hyperparameters yourself, edit the config.json file.
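
As a rough guide to what that file might contain, the sketch below builds a config with the four sections this notebook reads (`propensity`, `SAL`, `stage_weights`, `SWL`), each with the `hidden_dim` entry used later. The `lr` and `epochs` keys and all values are assumptions for illustration; the real keys must match whatever keyword arguments `model.trainNN` accepts. `missing_sections` is a hypothetical helper, not part of `model`.

```python
import json

# Hypothetical config.json contents. Only the four section names and
# `hidden_dim` appear in this notebook; `lr` and `epochs` are assumed names.
example_config_text = """
{
  "propensity":    {"hidden_dim": 32, "lr": 0.005, "epochs": 300},
  "SAL":           {"hidden_dim": 64, "lr": 0.01,  "epochs": 800},
  "stage_weights": {"hidden_dim": 32, "lr": 0.001, "epochs": 300},
  "SWL":           {"hidden_dim": 64, "lr": 0.01,  "epochs": 800}
}
"""

example_config = json.loads(example_config_text)

# Override one hyperparameter in memory without editing the file.
example_config["SAL"]["lr"] = 0.005


def missing_sections(config, required=("propensity", "SAL", "stage_weights", "SWL")):
    """Return the required section names that are absent from the config."""
    return [name for name in required if name not in config]


print(missing_sections(example_config))  # prints []
```

Checking for missing sections before training is a cheap way to catch a misspelled key early, since each section is later unpacked with `**` into the training call.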

In [1]:
import json
import model

import torch
import torch.utils.data as Data
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load hyperparameters; the context manager closes the file handle
with open('config.json') as f:
    CONFIG = json.load(f)

## Define a toy example:
Training set:
- X: [num_samples, num_stages, num_variables]
- A: [num_samples, num_stages] 
- R: [num_samples]

Note that in this demonstration the treatments are assigned purely at random, without regard to the covariates or rewards. As a result, we do not expect the proposed methods to perform well. For better-designed examples, please refer to the simulations.

In [2]:
# number of samples, decision points (stages), features
n, T, v = 1000, 5, 20

# set the random seed
torch.manual_seed(0)

# Random treatment assignments: A_t is -1 or +1 with probability 0.5 each
A = torch.randint(0, 2, (n, T)) * 2 - 1

# Define the rewards
reward_beta = torch.randn(T, v)
r = torch.zeros(n, T) # save the rewards

# Define the underlying optimal treatment
optimal_treatment_beta = torch.randn(T, v) # linear decision rule
O = torch.zeros(n, T) # save the ground-truth optimal treatments

X = torch.randn(n, T, v)
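# Covariates at stage t depend on the treatment at stage t-1.
# The loop starts at t = 1, so O[:, 0] and r[:, 0] keep their initial value of zero.
for t in range(1, T):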
    noise = torch.randn(n, v)
    A0_noise, A1_noise = noise[A[:, t-1] == -1], noise[A[:, t-1] == 1]
    X[A[:, t-1] == -1, t] = 0.6 * X[A[:, t-1] == -1, t-1] + 0.8 * A0_noise
    X[A[:, t-1] == 1, t] = 0.8 * X[A[:, t-1] == 1, t-1] + 0.6 * A1_noise
    # optimal decision
    O[:,t] = torch.sign(X[:,t,:] @ optimal_treatment_beta[t,])
    # immediate rewards
    r[:,t] = (X[:,t,:] @ reward_beta[t,] + torch.randn(n)) / 10 + A[:,t] * O[:,t]

R = r.sum(dim=1)

idx = torch.randperm(n)
train_size = int(n * 0.8)
idxTr, idxTe = idx[:train_size], idx[train_size:]

X, A, R, O = X.to(device), A.to(device), R.to(device), O.to(device)
Xtr, Xte, Atr, Ate, Rtr, Rte = X[idxTr], X[idxTe], A[idxTr], A[idxTe], R[idxTr], R[idxTe]

Otr, Ote = O[idxTr], O[idxTe]
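
Written out, the cell above simulates the following process for stages $t = 1, \dots, T-1$ (0-indexed as in the code), starting from $X_0 \sim N(0, I_v)$, with $\varepsilon_t \sim N(0, I_v)$ and $e_t \sim N(0, 1)$ drawn independently. The symbols $\gamma_t$ (for `optimal_treatment_beta[t]`) and $\beta_t$ (for `reward_beta[t]`) are notation introduced here for readability:

$$
X_t =
\begin{cases}
0.6\,X_{t-1} + 0.8\,\varepsilon_t, & A_{t-1} = -1 \\
0.8\,X_{t-1} + 0.6\,\varepsilon_t, & A_{t-1} = +1
\end{cases}
\qquad
O_t = \operatorname{sign}\left(X_t^\top \gamma_t\right)
$$

$$
r_t = \frac{X_t^\top \beta_t + e_t}{10} + A_t\,O_t,
\qquad
R = \sum_{t} r_t
$$

Because $0.6^2 + 0.8^2 = 1$, each covariate keeps unit variance at every stage. The term $A_t O_t$ equals $+1$ when the assigned treatment matches the optimal one and $-1$ otherwise.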

## Calculate Propensity
- Calculate the probability of each assignment at each stage
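
To make the target concrete: if a model outputs $p_t = P(A_t = +1 \mid \text{history})$, the propensity of the observed action is $p_t$ when $A_t = +1$ and $1 - p_t$ when $A_t = -1$. The helpers below are a minimal standard-library sketch of that rule. `observed_propensity` and `sequence_propensity` are hypothetical names, not part of `model`, and multiplying across stages is one common convention that `PropensityModel.calculate_propensity` may or may not follow.

```python
def observed_propensity(p_treat, action):
    """Probability of the observed action, given P(A = +1) = p_treat.

    Actions are coded as -1 / +1, as in this notebook.
    """
    if action not in (-1, 1):
        raise ValueError("action must be -1 or +1")
    return p_treat if action == 1 else 1.0 - p_treat


def sequence_propensity(p_treats, actions):
    """Product of per-stage propensities for one subject (assumed convention)."""
    prob = 1.0
    for p, a in zip(p_treats, actions):
        prob *= observed_propensity(p, a)
    return prob


# Under the coin-flip assignment used here, every stage has p = 0.5,
# so any 5-stage treatment sequence has probability 0.5 ** 5.
print(sequence_propensity([0.5] * 5, [1, -1, -1, 1, 1]))  # 0.03125
```

With treatments assigned by a fair coin, no classifier can predict them much better than 50%, so near-chance accuracy from a propensity model on this toy data is the expected outcome and not a sign of a training problem.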

In [3]:
propen = model.PropensityModel(v, T, CONFIG['propensity']['hidden_dim'], 48, device=device).to(device)
model.trainNN(propen, Data.TensorDataset(Xtr, Atr), **CONFIG['propensity'])

Ahat = (propen(X, A) > 0.5).long() * 2 - 1
print("Train accuracy: %.3f%%" % ((Ahat[idxTr] == A[idxTr]).float().mean().item() * 100))
print("Test accuracy: %.3f%%" % ((Ahat[idxTe] == A[idxTe]).float().mean().item() * 100))

pi = propen.calculate_propensity(X, A)
piTr, piTe = pi[idxTr], pi[idxTe]

Epoch 0 (lr:    0.005000):  training-Loss: 4.873
Epoch 50 (lr:    0.004268):  training-Loss: 4.814
Epoch 100 (lr:    0.002500):  training-Loss: 4.783
Epoch 150 (lr:    0.000732):  training-Loss: 4.800
Epoch 200 (lr:    0.000000):  training-Loss: 4.796
Epoch 250 (lr:    0.000732):  training-Loss: 4.786
Train accuracy: 52.300%
Test accuracy: 51.100%
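
The learning-rate column in the log above is consistent with a cosine annealing schedule that has a half-period of 200 epochs, a peak of 0.005, and a minimum of 0. This is inferred from the printed values only; the scheduler actually used by `model.trainNN` is not shown in this notebook. A minimal sketch of the closed-form schedule, with `cosine_lr` as a hypothetical helper:

```python
import math


def cosine_lr(epoch, base_lr, half_period, min_lr=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, min_lr at half_period."""
    cos_term = (1 + math.cos(math.pi * epoch / half_period)) / 2
    return min_lr + (base_lr - min_lr) * cos_term


# Print the schedule at the same 50-epoch interval as the propensity log.
for epoch in range(0, 300, 50):
    print("Epoch %d: lr = %.6f" % (epoch, cosine_lr(epoch, 0.005, 200)))
```

Under this reading, the rate reaching zero at epoch 200 and rising again afterwards is a property of the schedule, not a logging error.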


## Stage-Aware Learning (SAL)

In [4]:
from importlib import reload
reload(model)

SAL = model.SALModel(v, T, CONFIG['SAL']['hidden_dim'], 48, device=device).to(device)
model.trainNN(SAL, Data.TensorDataset(Xtr, Atr, Rtr, piTr), R_hat=Rtr.mean(), **CONFIG['SAL'])

print("SAL training accuracy:  %.3f%%" % ((SAL.predict(Xtr) == Otr).float().mean() * 100))
print("SAL testing accuracy:  %.3f%%" % ((SAL.predict(Xte) == Ote).float().mean() * 100))

Epoch 0 (lr:    0.010000):  training-Loss: -21449.387
Epoch 100 (lr:    0.005000):  training-Loss: -31547.859
Epoch 200 (lr:    0.000000):  training-Loss: -31593.971
Epoch 300 (lr:    0.005000):  training-Loss: -31939.652
Epoch 400 (lr:    0.010000):  training-Loss: -33181.480
Epoch 500 (lr:    0.005000):  training-Loss: -33629.250
Epoch 600 (lr:    0.000000):  training-Loss: -33753.637
Epoch 700 (lr:    0.005000):  training-Loss: -33859.547
SAL training accuracy:  65.225%
SAL testing accuracy:  62.200%


### Stage-Weighted Learning (SWL)

In [5]:
# Calculate stage weights
stage_weights_model = model.WeightNNModel(v, T, CONFIG['stage_weights']['hidden_dim'], 48, device=device).to(device)
print("Training the stage weights model:")
model.trainNN(stage_weights_model, Data.TensorDataset(Xtr, Atr, Rtr), **CONFIG['stage_weights'])
sw = stage_weights_model.getWeights()
print("Stage weights:  ", sw, '\n')

# Fit SWL model
SWL = model.SWLModel(v, T, CONFIG['SWL']['hidden_dim'], 48, device=device).to(device)
print("Training SWL: ")
model.trainNN(SWL, Data.TensorDataset(Xtr, Atr, Rtr, piTr), R_hat=Rtr.mean(), sw=sw, **CONFIG['SWL'])

print("SWL training accuracy:  %.3f%%" % ((SWL.predict(Xtr) == Otr).float().mean() * 100))
print("SWL testing accuracy:  %.3f%%" % ((SWL.predict(Xte) == Ote).float().mean() * 100))

Training the stage weights model:
Epoch 0 (lr:    0.001000):  training-Loss: 31.883
Epoch 50 (lr:    0.000854):  training-Loss: 29.786
Epoch 100 (lr:    0.000500):  training-Loss: 30.676
Epoch 150 (lr:    0.000146):  training-Loss: 29.923
Epoch 200 (lr:    0.000000):  training-Loss: 29.675
Epoch 250 (lr:    0.000146):  training-Loss: 30.233
Stage weights:   tensor([[0.2955, 0.1115, 0.1868, 0.2059, 0.2003]], device='cuda:0') 

Training SWL: 
Epoch 0 (lr:    0.010000):  training-Loss: -21552.664
Epoch 100 (lr:    0.005000):  training-Loss: -30411.086
Epoch 200 (lr:    0.000000):  training-Loss: -30524.781
Epoch 300 (lr:    0.005000):  training-Loss: -30746.986
Epoch 400 (lr:    0.010000):  training-Loss: -32101.652
Epoch 500 (lr:    0.005000):  training-Loss: -32832.008
Epoch 600 (lr:    0.000000):  training-Loss: -32903.707
Epoch 700 (lr:    0.005000):  training-Loss: -33466.656
SWL training accuracy:  64.325%
SWL testing accuracy:  63.600%
