# Training the small model

The code below loads the training data and trains the so-called "small" model. For details, see the project's main [README.md](https://github.com/gbordyugov/pndapetzim/blob/main/README.md).

The following code sets up some parameters and loads the train/test data.

In [1]:
from pndapetzim.data import load_datasets, LABEL_FILE_NAME, ORDER_FILE_NAME

# Length of user history to consider.
seq_len = 20

# Relative weight of training samples with returning customers, plays the role of oversampling.
returning_weight = 5.0

# Load train/test data and encodings for the model.
train, test, encodings = load_datasets(
    order_path='../data/' + ORDER_FILE_NAME,
    label_path='../data/' + LABEL_FILE_NAME,
    seq_len=seq_len,
    train_ratio=100,
    returning_weight=returning_weight,
)
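
The comment on `returning_weight` says that it plays the role of oversampling. How `load_datasets` applies the weight internally is not shown here, so the following is only a minimal pure-Python sketch of the general idea, assuming the weight acts as a per-sample weight in a binary cross-entropy loss. The function `weighted_bce` and the toy labels and probabilities are hypothetical and not part of `pndapetzim`.

```python
import math


def weighted_bce(labels, probs, weights):
    """Binary cross-entropy averaged over samples, normalised by total weight."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


# Toy data (hypothetical): one returning customer (label 1), three non-returning.
labels = [1, 0, 0, 0]
probs = [0.2, 0.1, 0.3, 0.4]

# Option A: give the returning customer a weight of 5.
weighted = weighted_bce(labels, probs, weights=[5.0, 1.0, 1.0, 1.0])

# Option B: oversample the returning customer five times, with unit weights.
dup_labels = [1] * 5 + [0, 0, 0]
dup_probs = [0.2] * 5 + [0.1, 0.3, 0.4]
oversampled = weighted_bce(dup_labels, dup_probs, weights=[1.0] * 8)

print(math.isclose(weighted, oversampled))
```

Normalising by the total weight is a choice made in this sketch so that the two options agree exactly; with a different normalisation they would differ only by a constant factor.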

The following code builds the model.

In [2]:
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.metrics import AUC, Recall
from tensorflow.keras.optimizers import Adam

from pndapetzim.models import build_small_model

lossm = BinaryCrossentropy()
optimiser = Adam(learning_rate=0.01)

model = build_small_model(seq_len, 5)
aucm = AUC()
recallm = Recall()
metrics = ['accuracy', aucm, recallm]
model.compile(loss=lossm, optimizer=optimiser, metrics=metrics)
batch_size = 128

Now we can start training, which takes a couple of minutes. You can safely ignore the warning, as the dataset contains more features than are needed for the small model.

In [3]:
model.fit(train.batch(batch_size).prefetch(10), epochs=1)

  [n for n in tensors.keys() if n not in ref_input_names])

<tensorflow.python.keras.callbacks.History at 0x10c3d3dd0>

And now let us evaluate the model on the test dataset:

In [4]:
# evaluate() returns the loss followed by the compiled metrics, in order.
loss, accuracy, auc, recall = model.evaluate(test.batch(batch_size))
print(f'loss: {loss}, accuracy: {accuracy}, AUC: {auc}, recall: {recall}')

loss: 0.9733473658561707, accuracy: 0.6872428059577942, AUC: 0.8037197589874268, recall: 0.7669903039932251
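
To make the reported accuracy and recall more concrete, here is a minimal pure-Python sketch of how the two metrics can be computed from predicted probabilities, assuming a decision threshold of 0.5. The helper `accuracy_and_recall` and the toy data are hypothetical illustrations, not the model's actual predictions.

```python
def accuracy_and_recall(labels, probs, threshold=0.5):
    """Threshold probabilities, then compute accuracy and recall."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 1)
    tn = sum(1 for y, yh in zip(labels, preds) if y == 0 and yh == 0)
    fn = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 0)
    accuracy = (tp + tn) / len(labels)
    # Recall: the fraction of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy data (hypothetical): 1 marks a returning customer.
toy_labels = [1, 1, 0, 0, 0, 1]
toy_probs = [0.9, 0.4, 0.2, 0.6, 0.1, 0.7]

toy_accuracy, toy_recall = accuracy_and_recall(toy_labels, toy_probs)
print(f'accuracy: {toy_accuracy:.3f}, recall: {toy_recall:.3f}')
```

Accuracy counts every correct prediction, whereas recall looks only at the returning customers, so the two can diverge noticeably when the classes are imbalanced.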
