# Titanic: Survival Model

Build and train a model to predict survival on the Titanic using a [cleaned and split dataset](https://huggingface.co/datasets/jamieoliver/titanic-2410), and upload the model to Hugging Face.

Based on https://github.com/fastai/course22/blob/master/clean/05-linear-model-and-neural-net-from-scratch.ipynb using the dataset from https://www.kaggle.com/competitions/titanic.

Plan
- [x] Download [cleaned and split dataset](https://huggingface.co/datasets/jamieoliver/titanic-2410) from Hugging Face
- [x] Prepare data for model
    - [x] Load training dataset as PyTorch tensors
    - [x] Normalise training dataset
- [x] Train linear model
    - [x] Set up coefficients
    - [x] Set up gradient descent step
    - [x] Run training loop
- [x] Train neural network
    - [x] Set up coefficients
    - [x] Run training loop
- [x] Train deep neural network
    - [x] Set up coefficients
    - [x] Run training loop
- [x] Recreate as PyTorch module
- [ ] Test model
- [ ] Upload model to Hugging Face

##  Download Dataset from Hugging Face

In [1]:
from datasets import *

datasetDict = load_dataset('jamieoliver/titanic-2410')
datasetDict

DatasetDict({
    train: Dataset({
        features: ['survived', 'name', 'age', 'sibsp', 'parch', 'ticket', 'fare', 'cabin', 'log_fare', 'pclass_1', 'pclass_2', 'pclass_3', 'sex_female', 'sex_male', 'embarked_C', 'embarked_Q', 'embarked_S'],
        num_rows: 1047
    })
    validation: Dataset({
        features: ['survived', 'name', 'age', 'sibsp', 'parch', 'ticket', 'fare', 'cabin', 'log_fare', 'pclass_1', 'pclass_2', 'pclass_3', 'sex_female', 'sex_male', 'embarked_C', 'embarked_Q', 'embarked_S'],
        num_rows: 131
    })
    test: Dataset({
        features: ['survived', 'name', 'age', 'sibsp', 'parch', 'ticket', 'fare', 'cabin', 'log_fare', 'pclass_1', 'pclass_2', 'pclass_3', 'sex_female', 'sex_male', 'embarked_C', 'embarked_Q', 'embarked_S'],
        num_rows: 131
    })
})

## Prepare Data for Model

### Load Training Dataset as PyTorch Tensors

In [2]:
import torch
from torch import tensor

torch.set_default_device('cuda')
torch.set_printoptions(linewidth=120, edgeitems=10)

In [3]:
train_dataset = datasetDict['train']
train_dataset

Dataset({
    features: ['survived', 'name', 'age', 'sibsp', 'parch', 'ticket', 'fare', 'cabin', 'log_fare', 'pclass_1', 'pclass_2', 'pclass_3', 'sex_female', 'sex_male', 'embarked_C', 'embarked_Q', 'embarked_S'],
    num_rows: 1047
})

The dependent variable is the variable we are predicting i.e. `survived`.

In [4]:
dependent_var = tensor(train_dataset.to_pandas()['survived'].values, dtype=torch.float)
dependent_var

tensor([1., 0., 0., 0., 0., 0., 0., 1., 0., 0.,  ..., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0.], device='cuda:0')

In [5]:
dependent_var.shape

torch.Size([1047])

The independent variables are the variables we will use to make the prediction. Note that we use a trick in mutiplying the Pandas DataFrame by 1 to convert booleans to integers.

In [6]:
independent_cols = ['age', 'sibsp', 'parch', 'log_fare', 'pclass_1', 'pclass_2', 'pclass_3', 'sex_female', 'sex_male',
                    'embarked_C', 'embarked_Q', 'embarked_S']

independent_vars = tensor((train_dataset.to_pandas()*1)[independent_cols].values, dtype=torch.float)
independent_vars

tensor([[ 4.0000,  1.0000,  1.0000,  3.1781,  0.0000,  1.0000,  0.0000,  1.0000,  0.0000,  0.0000,  0.0000,  1.0000],
        [20.0000,  0.0000,  0.0000,  2.1889,  0.0000,  0.0000,  1.0000,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000],
        [32.5000,  0.0000,  0.0000,  5.3589,  1.0000,  0.0000,  0.0000,  0.0000,  1.0000,  1.0000,  0.0000,  0.0000],
        [23.0000,  0.0000,  0.0000,  2.7754,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000,  1.0000,  0.0000,  0.0000],
        [47.0000,  0.0000,  0.0000,  3.9703,  1.0000,  0.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000],
        [30.0000,  1.0000,  0.0000,  3.0910,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000],
        [40.0000,  1.0000,  0.0000,  3.2958,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000],
        [18.0000,  0.0000,  2.0000,  4.3901,  1.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  0.0000,  1.0000],
        [32.0000,  0.0000,  0.0000,  2.1856,  0.0000,  0

In [7]:
independent_vars.shape

torch.Size([1047, 12])

### Normalise Training Dataset

In [8]:
max_vals,indices = independent_vars.max(dim=0)
independent_vars /= max_vals
independent_vars

tensor([[0.0526, 0.1250, 0.1111, 0.5092, 0.0000, 1.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 1.0000],
        [0.2632, 0.0000, 0.0000, 0.3507, 0.0000, 0.0000, 1.0000, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000],
        [0.4276, 0.0000, 0.0000, 0.8587, 1.0000, 0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.0000],
        [0.3026, 0.0000, 0.0000, 0.4447, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.0000],
        [0.6184, 0.0000, 0.0000, 0.6362, 1.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000],
        [0.3947, 0.1250, 0.0000, 0.4953, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000],
        [0.5263, 0.1250, 0.0000, 0.5281, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000],
        [0.2368, 0.0000, 0.2222, 0.7034, 1.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 1.0000],
        [0.4211, 0.0000, 0.0000, 0.3502, 0.0000, 0.0000, 1.0000, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000],
        [0.2105, 0.2500, 0.0000, 0.4718, 0.000

## Train Linear Model

### Set Up Coefficients

Initialise random coefficients as a column vector.

In [9]:
num_coeffs = independent_vars.shape[1]
torch.manual_seed(42)
coeffs = torch.rand(num_coeffs, 1) - 0.5
coeffs

tensor([[ 0.1130],
        [-0.4899],
        [-0.1016],
        [-0.4597],
        [-0.3437],
        [-0.0175],
        [ 0.2362],
        [-0.0940],
        [ 0.0189],
        [-0.2133],
        [-0.2584],
        [ 0.4228]], device='cuda:0')

Transpose the dependent variable into a column vector.

In [10]:
dependent_var = dependent_var[:,None]
dependent_var[:10]

tensor([[1.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.]], device='cuda:0')

In [11]:
predictions = independent_vars@coeffs
predictions[:10]

tensor([[ 0.0106],
        [ 0.5465],
        [-0.8845],
        [-0.3822],
        [-0.1246],
        [ 0.1799],
        [ 0.1797],
        [-0.3341],
        [ 0.5646],
        [ 0.3624]], device='cuda:0')

In [12]:
loss = torch.abs(predictions - dependent_var).mean()
loss

tensor(0.6817, device='cuda:0')

In [13]:
def calc_predictions(coeffs, independent_vars):
    return torch.sigmoid(independent_vars@coeffs).sum(axis=1)

def calc_loss(coeffs, independent_vars, dependent_var):
    return torch.abs(calc_predictions(coeffs, independent_vars) - dependent_var).mean()
    

### Set Up Gradient Descent Step

In [14]:
coeffs.requires_grad_()

tensor([[ 0.1130],
        [-0.4899],
        [-0.1016],
        [-0.4597],
        [-0.3437],
        [-0.0175],
        [ 0.2362],
        [-0.0940],
        [ 0.0189],
        [-0.2133],
        [-0.2584],
        [ 0.4228]], device='cuda:0', requires_grad=True)

In [15]:
loss = calc_loss(coeffs, independent_vars, dependent_var)
loss

tensor(0.5017, device='cuda:0', grad_fn=<MeanBackward0>)

In [16]:
loss.backward()
coeffs.grad

tensor([[0.0195],
        [0.0033],
        [0.0022],
        [0.0248],
        [0.0122],
        [0.0117],
        [0.0282],
        [0.0188],
        [0.0333],
        [0.0104],
        [0.0051],
        [0.0365]], device='cuda:0')

In [17]:
loss = calc_loss(coeffs, independent_vars, dependent_var)
loss.backward()
with torch.no_grad():
    coeffs.sub_(coeffs.grad * 0.1)
    coeffs.grad.zero_()
    print(calc_loss(coeffs, independent_vars, dependent_var))

tensor(0.5007, device='cuda:0')


### Run Training Loop

In [18]:
def update_coeffs(coeffs, learning_rate):
    coeffs.sub_(coeffs.grad * learning_rate)
    coeffs.grad.zero_()

In [19]:
def one_epoch(coeffs, learning_rate):
    loss = calc_loss(coeffs, independent_vars, dependent_var)
    loss.backward()
    with torch.no_grad(): update_coeffs(coeffs, learning_rate)
    print(f'{loss:.3f}', end='; ')

In [20]:
def init_coeffs():
    return (torch.rand(num_coeffs, 1) - 0.5).requires_grad_()

In [21]:
def train_model(epochs=30, learning_rate=0.01):
    torch.manual_seed(442)
    coeffs = init_coeffs()
    for i in range (epochs):
        one_epoch(coeffs, learning_rate)
        
    return coeffs

In [22]:
coeffs = train_model(epochs=20, learning_rate=100)

0.514; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 0.391; 

In [23]:
def show_coeffs():
    return dict(zip(independent_cols, coeffs.requires_grad_(False)))

show_coeffs()

{'age': tensor([-1.6475], device='cuda:0'),
 'sibsp': tensor([-0.8166], device='cuda:0'),
 'parch': tensor([-0.0429], device='cuda:0'),
 'log_fare': tensor([-2.3991], device='cuda:0'),
 'pclass_1': tensor([-1.1872], device='cuda:0'),
 'pclass_2': tensor([-1.5524], device='cuda:0'),
 'pclass_3': tensor([-3.3380], device='cuda:0'),
 'sex_female': tensor([-1.9546], device='cuda:0'),
 'sex_male': tensor([-3.1221], device='cuda:0'),
 'embarked_C': tensor([-0.8663], device='cuda:0'),
 'embarked_Q': tensor([-0.3180], device='cuda:0'),
 'embarked_S': tensor([-3.6144], device='cuda:0')}

## Train Neural Network

### Set up Coefficients

Initialisation of the coefficients for the neural network is similar to the linear model. However, we need to initialise the coefficients for each layer in the network. We do this by creating a list of tensors, one for each layer. The number of columns in the first layer is equal to the number of independent variables. The number of columns in each subsequent layer is equal to the number of hidden coefficients, which we set to a constant value. The number of columns in the last layer is equal to the number of dependent variables i.e. 1 in our case.

We divide the number of hidden coefficients by the number of columns in the first layer to ensure that the weights are distributed evenly across each layer. This ensures that the gradients for each layer will be similar and have a similar impact on the model's performance.

We add a constant to the last layer of coefficients so that the model can learn a bias term.

In [24]:
def init_coeffs(num_hidden_coeffs=20):
    layer_1 = (torch.rand(num_coeffs, num_hidden_coeffs) - 0.5) / num_hidden_coeffs
    layer_2 = torch.rand(num_hidden_coeffs, 1) - 0.3
    const = torch.rand(1)[0]
    return layer_1.requires_grad_(), layer_2.requires_grad_(), const.requires_grad_()

Calculation of predictions is also similar to the linear model except that we take the outputs from the first layer and pass them to the second layer. Note the use of `relu` to rectify the outputs of the first layer to ensure that they are positive.

In [25]:
import torch.nn.functional as F

def calc_predictions(coeffs, independent_vars):
  layer_1, layer_2, const = coeffs
  result = F.relu(independent_vars@layer_1)
  result = result@layer_2 + const
  return torch.sigmoid(result)
     

In [26]:
def update_coeffs(coeffs, learning_rate):
  for layer_coeffs in coeffs:
    layer_coeffs.sub_(layer_coeffs.grad * learning_rate)
    layer_coeffs.grad.zero_()

### Run Training Loop

In [27]:
coeffs = train_model(epochs=100, learning_rate=20)

0.536; 0.433; 0.273; 0.371; 0.268; 0.219; 0.219; 0.218; 0.217; 0.213; 0.210; 0.209; 0.207; 0.207; 0.206; 0.206; 0.206; 0.206; 0.205; 0.205; 0.205; 0.205; 0.205; 0.205; 0.205; 0.205; 0.205; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 

## Train Deep Neural Network

### Set up Coefficients

We now genericise the number of hidden layers and their sizes. We also add a constant term to each layer.

In [28]:
def init_coeffs(num_hidden_coeffs=[10, 10]):
  layer_sizes = [num_coeffs] + num_hidden_coeffs + [1]
  num_layers = len(layer_sizes)

  layers = [(torch.rand(layer_sizes[i], layer_sizes[i+1]) - 0.5) / layer_sizes[i + 1] * 4 for i in range(num_layers - 1)]
  consts = [(torch.rand(1)[0] - 0.5) * 0.1 for i in range(num_layers - 1)]

  for layer in layers + consts:
    layer.requires_grad_()

  return layers, consts

Calculation of predictions proceeds largely as before but instead we loop over each layer rather than explicitly writing them out.

In [29]:
def calc_predictions(coeffs, independent_vars):
  layers, consts = coeffs
  num_layers = len(layers)
  result = independent_vars
  
  for i, layer in enumerate(layers):
    result = result@layer + consts[i]
    if i != num_layers - 1:
      result = F.relu(result)

  return torch.sigmoid(result)

A minor change to to the training loop is required as we now have a list of coefficients rather than a single set.

In [30]:
def update_coeffs(coeffs, learning_rate):
  layers, consts = coeffs
  for layer in layers + consts:
    layer.sub_(layer.grad * learning_rate)
    layer.grad.zero_()

### Run Training Loop

In [31]:
coeffs = train_model(epochs=100, learning_rate=4)

0.505; 0.488; 0.454; 0.380; 0.358; 0.343; 0.337; 0.325; 0.408; 0.307; 0.289; 0.266; 0.239; 0.221; 0.215; 0.212; 0.210; 0.210; 0.209; 0.209; 0.207; 0.207; 0.206; 0.205; 0.205; 0.205; 0.204; 0.204; 0.204; 0.204; 0.204; 0.204; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.203; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.201; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 0.200; 

## Recreate as PyTorch Module

We subclass the PyTorch Sequential class to create a model that we can train. The layers from the previous example are recreated as members of the class, and are initialised with random weights as before.

In [32]:
import torch.nn as nn

class TitanicModel(nn.Sequential):
  def __init__(self):
    super(TitanicModel, self).__init__(nn.Linear(num_coeffs, 20),
                                       nn.ReLU(),
                                       nn.Linear(20, 20),
                                       nn.ReLU(),
                                       nn.Linear(20, 1),
                                       nn.Sigmoid())
    
    self.apply(self._init_weights)
    
  def _init_weights(self, module):
    if isinstance(module, nn.Linear):
      module.weight.data.normal_(mean=0.0, std=0.5)

The training loop is similar to before with the exception that we use the PyTorch optimisation machinery rather than creating our own.  

In [33]:
import torch.optim as optim

def train_model(epochs=30, learning_rate=10):
  torch.manual_seed(442)
  model = TitanicModel()
  calc_loss = nn.L1Loss()
  optimizer = optim.SGD(model.parameters(), lr=learning_rate)

  for epoch in range(epochs):
    optimizer.zero_grad()

    output = model(independent_vars)
    loss = calc_loss(output, dependent_var)

    loss.backward()
    optimizer.step()
    print(f'{loss:.3f}', end='; ')

  return model

Training progresses similar to before:

In [34]:
train_model(epochs=25, learning_rate=15)

0.479; 0.390; 0.390; 0.389; 0.378; 0.232; 0.389; 0.358; 0.214; 0.208; 0.263; 0.232; 0.219; 0.218; 0.206; 0.206; 0.203; 0.203; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 0.202; 

TitanicModel(
  (0): Linear(in_features=12, out_features=20, bias=True)
  (1): ReLU()
  (2): Linear(in_features=20, out_features=20, bias=True)
  (3): ReLU()
  (4): Linear(in_features=20, out_features=1, bias=True)
  (5): Sigmoid()
)