# BEMB Paper Code Demonstration
This file contains the code demonstrated in our BEMB paper. Readers can run it to reproduce the results in the paper or modify it to fit their own needs.

Readers can refer to the `torch-choice` paper or `torch-choice` documentation website for more details about the `ChoiceDataset` data structure.

Code for the simulation studies is in a separate notebook.

> Author: Tianyu Du
>
> Date: Sept. 12, 2023

In [1]:
import random
import numpy as np
import torch

In [2]:
import torch_choice
from torch_choice.data import ChoiceDataset

import bemb
from bemb.model import LitBEMBFlex
from bemb.utils.run_helper import run

In [3]:
print(f"{torch.__version__=:}")
print(f"{torch.cuda.is_available()=:}")
print(f"{torch_choice.__version__=:}")
print(f"{bemb.__version__=:}")

torch.__version__=2.0.1
torch.cuda.is_available()=False
torch_choice.__version__=1.0.4a
bemb.__version__=0.1.6


In [4]:
# for reproducibility, fix random seeds.
random.seed(1234)
np.random.seed(1234)
torch.random.manual_seed(1234)

<torch._C.Generator at 0x115985190>

In [5]:
if torch.cuda.is_available():
    DEVICE = "cuda"
else:
    DEVICE = "cpu"

# Generate Random Information for Demonstration
Here we use randomly generated data to illustrate the usage of `ChoiceDataset`. Observable tensors are classified by how they vary across users, items, and sessions, and the package expects a particular shape for each type of observable tensor.
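The shape convention can be summarized as a small lookup. The sketch below is not part of `torch-choice`: `expected_shape` is a hypothetical helper that simply mirrors the tensor shapes used in the data-generation cell of this notebook, with the number of features always in the last dimension.

```python
def expected_shape(kind, num_users, num_items, num_sessions, num_features):
    """Return the shape this notebook uses for an observable of the given kind.

    `kind` is the prefix of the observable's name, e.g. "user" for `user_obs`.
    Hypothetical helper for illustration, not a torch-choice function.
    """
    leading_dims = {
        "user": (num_users,),
        "item": (num_items,),
        "session": (num_sessions,),
        "itemsession": (num_sessions, num_items),
        "useritem": (num_users, num_items),
        "usersession": (num_users, num_sessions),
        "usersessionitem": (num_users, num_sessions, num_items),
    }
    # the feature dimension always comes last.
    return leading_dims[kind] + (num_features,)


# shapes for the sizes used in this notebook: U=10, I=4, S=500.
for kind, num_features in [("user", 128), ("item", 64), ("itemsession", 12)]:
    print(kind, expected_shape(kind, 10, 4, 500, num_features))
```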

In [6]:
# Feel free to modify it as you want.
num_users = 10  # $U$
num_items = 4  # $I$
num_sessions = 500  # $S$

length_of_dataset = 10000  # $N$
# create observables/features; the number of features is arbitrarily chosen.
# generate 128 features for each user, e.g., race, gender.
# these variables should have shape (num_users, *)
user_obs = torch.randn(num_users, 128)
# generate 64 features for each item, e.g., quality.
item_obs = torch.randn(num_items, 64)
# generate 10 features for each session, e.g., weekday indicator. 
session_obs = torch.randn(num_sessions, 10)
# generate 12 features for each session item pair, e.g., the price of that item on that shopping day.
itemsession_obs = torch.randn(num_sessions, num_items, 12)
# generate 12 features for each user item pair, e.g., the user's preference on that item.
useritem_obs = torch.randn(num_users, num_items, 12)
# generate 10 user-session specific observables, e.g., the historical spending amount of that user at that session.
usersession_obs = torch.randn(num_users, num_sessions, 10)
# generate 8 features for each user session item triple, e.g., the user's preference on that item at that session.
# since `U*S*I` is potentially huge and may cause identifiability issues, we rarely use this kind of observable in practice.
usersessionitem_obs = torch.randn(num_users, num_sessions, num_items, 8)

# generate the array of item[n].
item_index = torch.LongTensor(np.random.choice(num_items, size=length_of_dataset))
# generate the array of user[n].
user_index = torch.LongTensor(np.random.choice(num_users, size=length_of_dataset))
# generate the array of session[n].
session_index = torch.LongTensor(np.random.choice(num_sessions, size=length_of_dataset))

# assume all items are available in all sessions.
item_availability = torch.ones(num_sessions, num_items).bool()

In [7]:
dataset = ChoiceDataset(
    # pre-specified keywords of __init__
    item_index=item_index,  # required.
    num_items=num_items,
    # optional:
    user_index=user_index,
    num_users=num_users,
    session_index=session_index,
    item_availability=item_availability,
    # additional keywords of __init__
    user_obs=user_obs,
    item_obs=item_obs,
    session_obs=session_obs,
    itemsession_obs=itemsession_obs,
    useritem_obs=useritem_obs,
    usersession_obs=usersession_obs,
    usersessionitem_obs=usersessionitem_obs)

In [8]:
def print_dict(d):
    for k, v in d.items():
        if torch.is_tensor(v):
            print(f"{k}: {v.shape}")
print_dict(dataset.x_dict)

user_obs: torch.Size([10000, 4, 128])
item_obs: torch.Size([10000, 4, 64])
session_obs: torch.Size([10000, 4, 10])
itemsession_obs: torch.Size([10000, 4, 12])
useritem_obs: torch.Size([10000, 4, 12])
usersession_obs: torch.Size([10000, 4, 10])
usersessionitem_obs: torch.Size([10000, 4, 8])
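Every entry of `x_dict` has shape `(N, I, P)`: one row per record, one column per item, and `P` features. A natural reading is that each observable is looked up with the record's user and session and then repeated along whichever axis it does not vary on. The pure-Python sketch below illustrates that reading on toy data. `expand_obs` is a hypothetical helper, and this is not claimed to be how `torch-choice` builds `x_dict` internally.

```python
def expand_obs(kind, obs, user_index, session_index, num_items):
    """Expand an observable (nested lists) to one feature vector per (record, item)."""
    expanded = []
    for u, s in zip(user_index, session_index):
        if kind == "user":
            row = [obs[u] for _ in range(num_items)]  # same for every item
        elif kind == "item":
            row = [obs[i] for i in range(num_items)]  # same for every record
        elif kind == "session":
            row = [obs[s] for _ in range(num_items)]  # same for every item
        elif kind == "itemsession":
            row = [obs[s][i] for i in range(num_items)]
        else:
            raise ValueError(f"unsupported kind: {kind}")
        expanded.append(row)
    return expanded


# toy data: 3 users, 2 items, 2 sessions, 1 feature each, 2 records.
user_obs = [[1.0], [2.0], [3.0]]
item_obs = [[10.0], [20.0]]
itemsession_obs = [[[5.0], [6.0]], [[7.0], [8.0]]]
user_index = [2, 0]
session_index = [0, 1]

x_user = expand_obs("user", user_obs, user_index, session_index, num_items=2)
x_item = expand_obs("item", item_obs, user_index, session_index, num_items=2)
x_itemsession = expand_obs("itemsession", itemsession_obs, user_index, session_index, num_items=2)
print(len(x_user), len(x_user[0]), len(x_user[0][0]))  # N, I, P
```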


In [9]:
# we can subset the dataset with conventional Python slicing.
dataset_train = dataset[:8000].to(DEVICE)
dataset_val = dataset[8000:9000].to(DEVICE)
dataset_test = dataset[9000:].to(DEVICE)
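The slice boundaries above are hard-coded for `length_of_dataset = 10000`. If that number changes, the boundaries can be derived from percentages instead. A minimal sketch, in which `split_bounds` is a hypothetical helper and integer arithmetic is used to avoid floating-point rounding:

```python
def split_bounds(num_records, train_pct=80, val_pct=10):
    """Return the (train_end, val_end) slice boundaries for a positional split."""
    train_end = num_records * train_pct // 100
    val_end = num_records * (train_pct + val_pct) // 100
    return train_end, val_end


train_end, val_end = split_bounds(10000)
# the three subsets would then be dataset[:train_end],
# dataset[train_end:val_end], and dataset[val_end:].
print(train_end, val_end)
```

A positional split is reasonable here because the records are generated independently at random. For real data with a meaningful order, shuffling the records first may be preferable.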

# Conduct the ELBO Optimization
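The model estimated in this section uses the utility formula `alpha_item + beta_user * gamma_item + delta_user * item_obs + eta_item * pi_user * session_obs`. The sketch below spells out one plausible reading of that formula for a single (user, item, session) triple in plain Python. It is an illustration under stated assumptions, not the `bemb` implementation: `dot` and `utility` are hypothetical helpers, and the way the `eta_item` and `pi_user` vectors are split into one latent block per session feature is an assumed layout.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def utility(alpha_i, beta_u, gamma_i, delta_u, item_obs_i,
            eta_i, pi_u, session_obs_s, latent_dim):
    """Utility of one (user, item, session) triple under the assumed reading."""
    value = alpha_i                     # alpha_item: scalar fixed effect
    value += dot(beta_u, gamma_i)       # beta_user * gamma_item: latent inner product
    value += dot(delta_u, item_obs_i)   # delta_user * item_obs: user-specific slopes
    # eta_item * pi_user * session_obs: each session feature gets its own
    # latent block (assumed layout), so the vectors have length
    # len(session_obs_s) * latent_dim.
    for k, x in enumerate(session_obs_s):
        block = slice(k * latent_dim, (k + 1) * latent_dim)
        value += dot(eta_i[block], pi_u[block]) * x
    return value


# toy example: 2 session features with latent dimension 2.
example_utility = utility(
    alpha_i=0.5,
    beta_u=[1.0, 2.0], gamma_i=[3.0, 4.0],
    delta_u=[1.0], item_obs_i=[2.0],
    eta_i=[1.0, 0.0, 0.0, 1.0], pi_u=[1.0, 1.0, 1.0, 1.0],
    session_obs_s=[1.0, 2.0], latent_dim=2,
)
print(example_utility)
```

Under this reading, the dimension requirements in `coef_dim_dict` follow directly: `beta_user` and `gamma_item` must match each other, `delta_user` must match the number of item features, and `eta_item` and `pi_user` must each have length equal to the number of session features times the latent dimension.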

In [10]:
import warnings
warnings.filterwarnings("ignore")

In [11]:
bemb = LitBEMBFlex(
    learning_rate=0.1,  # set the learning rate, feel free to play with different levels.
    pred_item=True,  # let the model predict item_index, don't change this one.
    num_seeds=32,  # number of Monte Carlo samples for estimating the ELBO.
    utility_formula="alpha_item + beta_user * gamma_item + delta_user * item_obs + eta_item * pi_user * session_obs",  # the utility formula.
    num_users=num_users,
    num_items=num_items,
    num_sessions=num_sessions,
    num_user_obs=dataset.user_obs.shape[1],
    num_item_obs=dataset.item_obs.shape[1],
    # we use obs2prior on all coefficients; set an entry to False to disable obs2prior for that coefficient.
    obs2prior_dict={"alpha_item": True,
                    "beta_user": True,
                    "gamma_item": True,
                    "delta_user": True,
                    "eta_item": True,
                    "pi_user": True},
    # the dimension of each coefficient; coefficients that are multiplied together in the utility formula
    # should have the same dimension.
    coef_dim_dict={"alpha_item": 1,  # a fixed effect should always have dimension 1.
                   # the matrix decomposition term beta_user * gamma_item indicates that beta_user and gamma_item should have the same dimension.
                   # we choose the latent dimension to be 10 here.
                   "beta_user": 10,
                   "gamma_item": 10,
                   # delta_user * item_obs term indicates that delta_user and item_obs should have the same dimension.
                   # and we generated 64 item features above.
                   "delta_user": 64,
                   # eta_item * pi_user * session_obs indicates that eta_item and pi_user should have the same dimension.
                   # their dimension should be the dimension of session_obs (which is 10) multiplied by the latent dimension.
                   # we choose the latent dimension to be 3 here.
                   "eta_item": 10*3,
                   "pi_user": 10*3},
)

# move the model to the computing device (e.g., GPU if available).
bemb = bemb.to(DEVICE)

# estimate the model for 3 epochs.
bemb = bemb.fit_model([dataset_train, dataset_val, dataset_test],
                      batch_size=128, num_epochs=3, num_workers=0, device=DEVICE, enable_progress_bar=True)

GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name  | Type     | Params
-----------------------------------
0 | model | BEMBFlex | 34.3 K
-----------------------------------
34.3 K    Trainable params
0         Non-trainable params
34.3 K    Total params
0.137     Total estimated model params size (MB)


BEMB: utility formula parsed:
[{'coefficient': ['alpha_item'], 'observable': None},
 {'coefficient': ['beta_user', 'gamma_item'], 'observable': None},
 {'coefficient': ['delta_user'], 'observable': 'item_obs'},
 {'coefficient': ['eta_item', 'pi_user'], 'observable': 'session_obs'}]
Bayesian EMBedding Model with U[user, item, session] = alpha_item + beta_user * gamma_item + delta_user * item_obs + eta_item * pi_user * session_obs
Total number of parameters: 34280.
With the following coefficients:
ModuleDict(
  (alpha_item): BayesianCoefficient(num_classes=4, dimension=1, prior=N(H*X_obs(H shape=torch.Size([1, 64]), X_obs shape=64), Ix1.0))
  (beta_user): BayesianCoefficient(num_classes=10, dimension=10, prior=N(H*X_obs(H shape=torch.Size([10, 128]), X_obs shape=128), Ix1.0))
  (gamma_item): BayesianCoefficient(num_classes=4, dimension=10, prior=N(H*X_obs(H shape=torch.Size([10, 64]), X_obs shape=64), Ix1.0))
  (delta_user): BayesianCoefficient(num_classes=10, dimension=64, prior=N(H*X_o

`Trainer.fit` stopped: `max_epochs=3` reached.


Epoch 2: 100%|██████████| 71/71 [00:02<00:00, 33.17it/s, loss=3.44e+03, v_num=45, val_acc=0.248, val_ll=-1.99]
time taken: 6.626070022583008
Testing DataLoader 0: 100%|██████████| 84/84 [00:00<00:00, 182.67it/s]
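The total of 34280 parameters in the log can be reproduced with a short calculation. The accounting below is an assumption rather than something the log states: each coefficient is taken to store a variational mean and a log standard deviation for every class, and each obs2prior matrix `H` is taken to store a mean and a log standard deviation for every entry. `count_parameters` is a hypothetical helper.

```python
def count_parameters(coef_specs, num_obs_by_level):
    """Count parameters under the assumed mean + log-std accounting."""
    total = 0
    for name, (num_classes, dim) in coef_specs.items():
        level = name.split("_")[-1]  # "user" or "item"
        variational = 2 * num_classes * dim          # mean and log-std per class
        obs2prior = 2 * dim * num_obs_by_level[level]  # mean and log-std of H
        total += variational + obs2prior
    return total


# (num_classes, dimension) for each coefficient, as printed in the log
# and specified in coef_dim_dict.
coef_specs = {
    "alpha_item": (4, 1),
    "beta_user": (10, 10),
    "gamma_item": (4, 10),
    "delta_user": (10, 64),
    "eta_item": (4, 30),
    "pi_user": (10, 30),
}
num_obs_by_level = {"user": 128, "item": 64}

total = count_parameters(coef_specs, num_obs_by_level)
size_mb = round(total * 4 / 1e6, 3)  # assuming 4-byte floats
print(total, size_mb)
```

Most of the parameters come from the obs2prior matrices, in particular the one for `delta_user`, which maps 128 user features to a 64-dimensional coefficient.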

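The choices in this notebook are drawn uniformly at random among four items, independently of all observables, so a useful reference point for the `val_acc` and `val_ll` shown in the progress bar is a model that assigns equal probability to every item. A short calculation, assuming `val_ll` is an average log-likelihood per record (the log does not say):

```python
import math

num_items = 4

# a uniform guess is correct with probability 1 / num_items.
chance_accuracy = 1 / num_items
# log-likelihood per record when every item gets probability 1 / num_items.
uniform_log_likelihood = math.log(1 / num_items)

print(f"{chance_accuracy=:.3f}")
print(f"{uniform_log_likelihood=:.3f}")
```

The reported `val_acc=0.248` is close to the chance level of 0.25, as expected for purely random choices.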

In [12]:
print("The paper demo notebook has run successfully.")

The paper demo notebook has run successfully.
