# Using `torch-choice` as a Benchmark Model in a Machine Learning Setting: MNIST Dataset

This tutorial demonstrates how to use `torch-choice`'s logit model as a benchmark multinomial model in a machine learning setting, using the MNIST dataset as an example.

In [1]:
from time import time
import torch, torchvision

from torch_choice.data import ChoiceDataset
from torch_choice.model import ConditionalLogitModel
from torch_choice import run

In [2]:
print("PyTorch Version: ",torch.__version__)
print("GPU Available: ",torch.cuda.is_available())
# use the GPU if available, else fall back to the CPU.
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using device:', DEVICE)

PyTorch Version:  1.13.0+cu117
GPU Available:  True
Using device: cuda


In [3]:
# download MNIST dataset.
mnist_train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=None)
mnist_test = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=None)

In [4]:
print(f'{mnist_train.data.shape=:}')
print(f'{mnist_train.targets.shape=:}')
print(f'{mnist_test.data.shape=:}')
print(f'{mnist_test.targets.shape=:}')

mnist_train.data.shape=torch.Size([60000, 28, 28])
mnist_train.targets.shape=torch.Size([60000])
mnist_test.data.shape=torch.Size([10000, 28, 28])
mnist_test.targets.shape=torch.Size([10000])


In [5]:
N_train = 60000
N_test = 10000
N = N_train + N_test
X = torch.cat([mnist_train.data.reshape(N_train, -1), mnist_test.data.reshape(N_test, -1)], dim=0)
y = torch.cat([mnist_train.targets, mnist_test.targets], dim=0)
print(f'{X.shape=:}')
print(f'{y.shape=:}')

X.shape=torch.Size([70000, 784])
y.shape=torch.Size([70000])
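The reshape above turns each 28 × 28 image into a 784-dimensional row vector (row-major, so pixel $(r, c)$ lands in column $28r + c$) before the training and test images are stacked. A small sketch of the same operation on random stand-in tensors; the `fake_*` tensors are hypothetical placeholders rather than MNIST data:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for mnist_train.data / mnist_test.data (uint8 pixel grids).
fake_train = torch.randint(0, 256, (6, 28, 28), dtype=torch.uint8)
fake_test = torch.randint(0, 256, (2, 28, 28), dtype=torch.uint8)

# reshape(n, -1) and flatten(start_dim=1) both give row-major 784-dim vectors.
X_train = fake_train.reshape(fake_train.shape[0], -1)
X_test = fake_test.flatten(start_dim=1)

# Stack training images first, then test images, as in the cell above.
X_small = torch.cat([X_train, X_test], dim=0)
print(X_small.shape)  # torch.Size([8, 784])

# Pixel (r, c) of image k ends up in column r * 28 + c.
assert X_small[0, 3 * 28 + 5] == fake_train[0, 3, 5]
```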


We treat each image in the MNIST dataset as a session, and the task is to predict the "item" chosen in that session. The chosen "item" is the digit shown in the image.
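A plain-tensor sketch of this bookkeeping, under the assumption that each choice record simply looks up its features through its session index. It does not call `ChoiceDataset` itself, and the toy sizes and labels are made up:

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data: 4 "images" with 784 features each, and their digit labels.
session_image = torch.rand(4, 784)   # one feature row per session (image)
labels = torch.tensor([7, 2, 1, 0])  # the chosen "item" (digit) in each session

# One choice record per image: record n happens in session n and picks item labels[n].
session_index = torch.arange(4)
item_index = labels

# The features used for record n are looked up through its session index.
features_per_record = session_image[session_index]
print(features_per_record.shape)  # torch.Size([4, 784])
```

This indexing appears to explain why, in the training log further down, both the training and test subsets still carry the full `session_image=[70000, 784]` table: only the record-level indices (`item_index`, `session_index`) are split.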

In [6]:
dataset = ChoiceDataset(session_index=torch.arange(N), item_index=y, session_image=X)
train_index = torch.arange(N_train)
test_index = torch.arange(N_train, N)
# we don't have a validation set.
dataset_train = dataset[train_index].to(DEVICE)
dataset_test = dataset[test_index].to(DEVICE)

For each digit $i \in \{0, 1, \dots, 9\}$ and each image $n \in \{1, 2, \dots, 70000\}$, let $X^{(n)} \in \mathbb{R}^{784}$ denote the flattened pixel vector of image $n$. The potential (utility) of image $n$ to represent digit $i$ is:
$$
U_{i}^{(n)} = \alpha_i + (X^{(n)})^T \beta_i
$$
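The potentials for all images and digits can be computed with one matrix product. A minimal sketch in plain PyTorch with random, hypothetical values for $\alpha$, $\beta$, and $X$ (this is not how torch-choice stores its coefficients internally):

```python
import torch

torch.manual_seed(0)
num_items, num_features, num_images = 10, 784, 5

# Hypothetical parameters: one intercept alpha_i and one coefficient vector beta_i per digit.
alpha = torch.randn(num_items)               # shape (10,)
beta = torch.randn(num_items, num_features)  # row i is beta_i, shape (10, 784)
X = torch.rand(num_images, num_features)     # row n is X^(n), shape (5, 784)

# U[n, i] = alpha_i + (X^(n))^T beta_i, for all images and digits at once.
U = alpha + X @ beta.T                       # shape (5, 10)

# Spot-check one entry against the formula.
n, i = 2, 7
manual = alpha[i] + torch.dot(X[n], beta[i])
assert torch.allclose(U[n, i], manual, atol=1e-3)
print(U.shape)  # torch.Size([5, 10])
```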

The predicted probability that image $n$ is digit $i$ is given by the softmax transformation of the above potentials:

$$
P_{i}^{(n)} = \frac{\exp(U_{i}^{(n)})}{\sum_{j=0}^9 \exp(U_{j}^{(n)})}
$$
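In PyTorch the softmax above is `torch.softmax` along the item dimension. The sketch below, on hypothetical random potentials, checks it against the formula and illustrates two properties worth knowing: adding the same constant to every potential of an image leaves its probabilities unchanged, and the most probable digit is simply the one with the largest potential.

```python
import torch

torch.manual_seed(0)
U = torch.randn(5, 10)  # hypothetical potentials U[n, i] for 5 images and 10 digits

# P[n, i] = exp(U[n, i]) / sum_j exp(U[n, j])
P = torch.softmax(U, dim=1)
manual = torch.exp(U) / torch.exp(U).sum(dim=1, keepdim=True)
assert torch.allclose(P, manual, atol=1e-6)

# Each row is a probability distribution over the 10 digits.
assert torch.allclose(P.sum(dim=1), torch.ones(5), atol=1e-6)

# Shifting all potentials of an image by the same constant does not change P.
assert torch.allclose(torch.softmax(U + 3.0, dim=1), P, atol=1e-6)

# The predicted digit (highest probability) is also the argmax of the potentials.
assert torch.equal(P.argmax(dim=1), U.argmax(dim=1))
```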

In [7]:
model = ConditionalLogitModel(
    formula='(session_image|item-full) + (1|item-full)',
    dataset=dataset_train,
    num_items=10)
model = model.to(DEVICE)
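The formula `(session_image|item-full) + (1|item-full)` asks for a separate 784-dimensional coefficient vector and a separate intercept for every one of the 10 digits, so structurally the model computes the same map as a single linear layer. The sketch below uses `torch.nn.Linear` purely as an analogy for counting parameters; it is not how `ConditionalLogitModel` is implemented:

```python
import torch

num_features, num_items = 784, 10

# Analogy only: weight (10, 784) plays the role of beta, bias (10,) the role of alpha.
linear = torch.nn.Linear(num_features, num_items)

num_params = sum(p.numel() for p in linear.parameters())
print(num_params)  # 7850 = 10 * 784 + 10

# U[n, i] = alpha_i + X[n] @ beta_i
X = torch.rand(3, num_features)
U = linear(X)
assert torch.allclose(U, linear.bias + X @ linear.weight.T, atol=1e-5)
```

The count of 10 × 784 + 10 = 7,850 agrees with the 7,840 + 10 trainable parameters listed in the training log.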

In [8]:
start_time = time()
run(model,
    dataset_train=dataset_train,
    dataset_test=dataset_test,
    num_epochs=300,
    learning_rate=0.003,
    model_optimizer="LBFGS",
    batch_size=-1,  # -1: use the full training set as a single batch
    device=DEVICE,
    report_std=False)
print('Time taken:', time() - start_time)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type                  | Params
------------------------------------------------
0 | model | ConditionalLogitModel | 7.9 K 
------------------------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)


ConditionalLogitModel(
  (coef_dict): ModuleDict(
    (session_image[item-full]): Coefficient(variation=item-full, num_items=10, num_users=None, num_params=784, 7840 trainable parameters in total, device=cuda:0).
    (intercept[item-full]): Coefficient(variation=item-full, num_items=10, num_users=None, num_params=1, 10 trainable parameters in total, device=cuda:0).
  )
)
Conditional logistic discrete choice model, expects input features:

X[session_image[item-full]] with 784 parameters, with item-full level variation.
X[intercept[item-full]] with 1 parameters, with item-full level variation.
device=cuda:0
[Train dataset] ChoiceDataset(label=[], item_index=[60000], user_index=[], session_index=[60000], item_availability=[], session_image=[70000, 784], device=cuda:0)
[Validation dataset] None
[Test dataset] ChoiceDataset(label=[], item_index=[10000], user_index=[], session_index=[10000], item_availability=[], session_image=[70000, 784], device=cuda:0)



`Trainer.fit` stopped: `max_epochs=300` reached.


Time taken for training: 113.30555725097656



Time taken: 113.3552496433258
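`run` takes care of the optimization loop. For intuition, here is a rough stand-alone analogue of full-batch LBFGS fitting of the same kind of logit model, written directly against `torch.optim.LBFGS` on made-up data. It is not torch-choice's internal training loop; the data sizes, `lr`, `max_iter`, the strong-Wolfe line search, and the step count are arbitrary choices for the sketch:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical synthetic data: 200 observations, 5 features, 3 items.
N, D, K = 200, 5, 3
X = torch.randn(N, D)
true_beta = torch.randn(K, D) * 2
y = torch.distributions.Categorical(logits=X @ true_beta.T).sample()

# Item-specific coefficients and intercepts, as in the utility formula above.
beta = torch.zeros(K, D, requires_grad=True)
alpha = torch.zeros(K, requires_grad=True)
optimizer = torch.optim.LBFGS([beta, alpha], lr=1.0, max_iter=20,
                              line_search_fn="strong_wolfe")

def closure():
    # LBFGS may re-evaluate the loss several times per step, so it needs a closure.
    optimizer.zero_grad()
    loss = F.cross_entropy(alpha + X @ beta.T, y)  # mean negative log-likelihood
    loss.backward()
    return loss

with torch.no_grad():
    initial_loss = F.cross_entropy(alpha + X @ beta.T, y).item()  # log(3) at zero params

for step in range(30):  # each step uses the full batch
    optimizer.step(closure)

with torch.no_grad():
    final_loss = F.cross_entropy(alpha + X @ beta.T, y).item()
print(f"NLL per observation: {initial_loss:.3f} -> {final_loss:.3f}")
```

LBFGS builds its curvature estimate from consecutive full gradients, so it fits most naturally with full-batch training, which is presumably why the tutorial pairs `model_optimizer="LBFGS"` with `batch_size=-1`.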


In [9]:
model = model.to(DEVICE)

In [10]:
# evaluate without tracking gradients; the predicted digit is the argmax over the 10 items.
with torch.no_grad():
    train_acc = torch.mean((model.forward(dataset_train).argmax(dim=1) == dataset_train.item_index).float())
    test_acc = torch.mean((model.forward(dataset_test).argmax(dim=1) == dataset_test.item_index).float())
print(f"Training Accuracy: {train_acc*100:.2f}%.")
print(f"Test Accuracy: {test_acc*100:.2f}%.")

Training Accuracy: 94.33%.
Test Accuracy: 91.95%.
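The accuracy computation above can be wrapped in a small helper that also reports the mean negative log-likelihood. `evaluate` is a hypothetical helper, not part of torch-choice. It assumes the scores from `model.forward` are utilities; if they were log-probabilities instead, both numbers would be unchanged, since the argmax is the same and log-softmax maps log-probabilities to themselves.

```python
import torch
import torch.nn.functional as F

def evaluate(utilities: torch.Tensor, labels: torch.Tensor) -> dict:
    """Hypothetical helper: accuracy and mean negative log-likelihood from utilities."""
    with torch.no_grad():
        accuracy = (utilities.argmax(dim=1) == labels).float().mean().item()
        # cross_entropy applies log-softmax to the utilities, i.e. uses log P_i^(n).
        nll = F.cross_entropy(utilities, labels).item()
    return {"accuracy": accuracy, "nll": nll}

# Hypothetical utilities for 4 images and 3 classes.
U = torch.tensor([[2.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [5.0, 0.0, 0.0]])
labels = torch.tensor([0, 1, 2, 1])
print(evaluate(U, labels)["accuracy"])  # 0.75
```

With the tutorial's objects, it could be called as `evaluate(model.forward(dataset_test), dataset_test.item_index)`.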
