# Training a BERT4REC model with category id feature

Recommender systems can exploit additional features describing item metadata and user context, giving the model more information and leading to more meaningful predictions.
In this notebook, additional features are used to train the BERT model. They are the features produced during the ETL preprocessing phase.


Library installation

In [None]:
!pip install transformers4rec[pytorch,nvtabular] -U

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers4rec[nvtabular,pytorch]
  Downloading transformers4rec-23.2.0.tar.gz (1.0 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting transformers<4.19
  Downloading transformers-4.18.0-py3-none-any.whl (4.0 MB)
Collecting betterproto<2.0.0
  Downloading betterproto-1.2.5.tar.gz (26 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting torchmetrics>=0.10.0
  Downloading tor

Installing cudf and dask_cudf

In [None]:
!pip install cudf-cu11==22.12 rmm-cu11==22.12 --extra-index-url=https://pypi.ngc.nvidia.com
!pip install cugraph-cu11==22.12 dask-cuda==22.12 dask-cudf-cu11==22.12  pylibcugraph-cu11==22.12 --extra-index-url=https://pypi.ngc.nvidia.com/
!pip install cuml-cu11==22.12 raft_dask_cu11==22.12 dask-cudf-cu11==22.12  pylibraft_cu11==22.12 ucx-py-cu11==0.29.0 --extra-index-url=https://pypi.ngc.nvidia.com


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://pypi.ngc.nvidia.com
Collecting cudf-cu11==22.12
  Downloading https://developer.download.nvidia.com/compute/redist/cudf-cu11/cudf_cu11-22.12.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (442.8 MB)
Collecting rmm-cu11==22.12
  Downloading https://developer.download.nvidia.com/compute/redist/rmm-cu11/rmm_cu11-22.12.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)
Collecting nvtx>=0.2.1
  Downloading nvtx-0.2.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (441 kB)
Collecting ptx

In [None]:
import cudf

In [None]:
import os
import glob

import torch 
import transformers4rec.torch as tr

from transformers4rec.torch.ranking_metric import NDCGAt, RecallAt
from transformers4rec.torch.utils.examples_utils import wipe_memory

  warn(f"cuDF dtype mappings did not load successfully due to an error: {exc.msg}")
  warn(f"Triton dtype mappings did not load successfully due to an error: {exc.msg}")


The features selected as model input are defined here; they are concatenated later on.

In [None]:
# Define the categorical and continuous columns to feed to the training model
selected_features = ['product_id-list_seq', 'category_id-list_seq']

from merlin_standard_lib import Schema

# Define schema object to pass it to the TabularSequenceFeatures class
SCHEMA_PATH ='/content/drive/MyDrive/dataset_rees46/processed_nvt/schema.pbtxt'
schema = Schema().from_proto_text(SCHEMA_PATH)
schema = schema.select_by_name(selected_features)

In [None]:
!head -50 $SCHEMA_PATH

feature {
  name: "product_id-count"
  type: INT
  int_domain {
    name: "product_id"
    max: 166795
    is_categorical: true
  }
  annotation {
    tag: "categorical"
    extra_metadata {
      type_url: "type.googleapis.com/google.protobuf.Struct"
      value: "\nG\n\017embedding_sizes\0224*2\n\030\n\013cardinality\022\t\021\000\000\000\000`\\\004A\n\026\n\tdimension\022\t\021\000\000\000\000\000\000\200@\n\034\n\017dtype_item_size\022\t\021\000\000\000\000\000\000@@\n\025\n\010max_size\022\t\021\000\000\000\000\000\000\000\000\n\030\n\013start_index\022\t\021\000\000\000\000\000\000\360?\n\r\n\007is_list\022\002 \000\n\033\n\016freq_threshold\022\t\021\000\000\000\000\000\000\000\000\n\021\n\013num_buckets\022\002\010\000\n\017\n\tis_ragged\022\002 \000\n5\n\010cat_path\022)\032\'.//categories/unique.product_id.parquet"
    }
  }
}
feature {
  name: "user_session"
  type: INT
  int_domain {
    name: "user_session"
    max: 9244422
    is_categorical: true
  }
  annotation {
    t

As the definition of the input module shows, the aggregation function is set to "concat", i.e. a concatenation merge.

Each user session $s^{(u)}$ is represented as a sequence of $n_u$ items $x^{(u)} = x_{1:n_u}^{(u)}$ and $I$ feature sequences $f^{(u)} = \{f_{i, 1:n_u}^{(u)} : i \in 1,\dots,I\}$.

The concatenation merge simply concatenates the item id $x_k^{(u)}$ with the other input features available for the interaction at position $k$: $m_k = \mathrm{concat}(x_k^{(u)}, f_{1,k}^{(u)}, \dots, f_{I,k}^{(u)})$.
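The concatenation merge can be illustrated with a minimal pure-Python sketch. The helper `concat_merge` and the toy embedding values are hypothetical and only mirror the formula for $m_k$; in the actual model the library performs the concatenation on tensors.

```python
from typing import List

Vector = List[float]


def concat_merge(item_emb: List[Vector], feature_embs: List[List[Vector]]) -> List[Vector]:
    """Return m_k = concat(x_k, f_{1,k}, ..., f_{I,k}) for every position k."""
    merged = []
    for k, x_k in enumerate(item_emb):
        m_k = list(x_k)              # start from the item representation x_k
        for f_i in feature_embs:     # append each feature f_{i,k} at the same position
            m_k.extend(f_i[k])
        merged.append(m_k)
    return merged


# Toy session with n_u = 3 interactions:
# 2-dimensional item embeddings and a single 1-dimensional feature (I = 1)
item_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
category_embeddings = [[1.0], [2.0], [1.0]]

merged = concat_merge(item_embeddings, [category_embeddings])
print(merged[0])  # [0.1, 0.2, 1.0]
```

Each merged vector has a width equal to the sum of the widths of its parts, which is why a projection layer is typically applied afterwards to reach the model dimension.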




In [None]:
# Input
sequence_length, d_model = 20, 320

# Define input module to process tabular input-features and to prepare masked inputs
inputs = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=sequence_length,
    aggregation="concat",
    d_output=d_model,
    masking="mlm",
)
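The `masking="mlm"` option selects masked language modeling: during training some positions of each session are hidden and the model has to predict them, while at evaluation time only the last item is the target. The sketch below is a simplified illustration of that idea, not the library's implementation. The helpers `train_mask` and `eval_mask`, the padding id `PAD = 0` and the toy session are assumptions made for the example.

```python
import random
from typing import List

PAD = 0  # assumed padding id


def train_mask(session: List[int], mlm_probability: float, rng: random.Random) -> List[bool]:
    """Mask each non-padded position with probability `mlm_probability`.

    Assumes the session contains at least one non-padded item.
    """
    valid = [i for i, item in enumerate(session) if item != PAD]
    mask = [False] * len(session)
    for i in valid:
        mask[i] = rng.random() < mlm_probability
    # Keep at least one target and, when possible, at least one visible item
    masked = [i for i in valid if mask[i]]
    if not masked:
        mask[rng.choice(valid)] = True
    elif len(masked) == len(valid) and len(valid) > 1:
        mask[rng.choice(valid)] = False
    return mask


def eval_mask(session: List[int]) -> List[bool]:
    """Mask only the last non-padded item (next-item evaluation)."""
    mask = [False] * len(session)
    last = max(i for i, item in enumerate(session) if item != PAD)
    mask[last] = True
    return mask


session = [12, 7, 33, 5, PAD, PAD]
print(train_mask(session, 0.6, random.Random(0)))
print(eval_mask(session))  # [False, False, False, True, False, False]
```

A higher masking probability gives more training targets per session but leaves less context for each prediction.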

In [None]:
inputs

TabularSequenceFeatures(
  (_aggregation): ConcatFeatures()
  (to_merge): ModuleDict(
    (categorical_module): SequenceEmbeddingFeatures(
      (filter_features): FilterFeatures()
      (embedding_tables): ModuleDict(
        (product_id-list_seq): Embedding(166796, 64, padding_idx=0)
        (category_id-list_seq): Embedding(626, 64, padding_idx=0)
      )
    )
  )
  (projection_module): SequentialBlock(
    (0): DenseBlock(
      (0): Linear(in_features=128, out_features=320, bias=True)
      (1): ReLU(inplace=True)
    )
  )
  (_masking): MaskedLanguageModeling()
)
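The sizes in this printout can be checked with a little arithmetic: the two 64-dimensional embeddings are concatenated into a 128-dimensional vector, which the projection layer maps to `d_model = 320`. The sketch below only reuses the numbers shown in the printout; the dictionary name `embedding_tables` is introduced for illustration.

```python
# (number of rows, embedding dimension) as shown in the module printout
embedding_tables = {
    "product_id-list_seq": (166796, 64),
    "category_id-list_seq": (626, 64),
}
d_model = 320

# Width of the concatenated vector fed to the projection layer
concat_dim = sum(dim for _, dim in embedding_tables.values())

# Trainable parameters of the embedding tables and of the Linear projection
embedding_params = sum(rows * dim for rows, dim in embedding_tables.values())
projection_params = concat_dim * d_model + d_model  # weights + bias

print(concat_dim)         # 128
print(embedding_params)   # 10715008
print(projection_params)  # 41280
```

Almost all parameters of the input module sit in the product embedding table, whose 166796 rows are presumably the `max: 166795` domain from the schema plus index 0, which the printout marks as padding.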

In [None]:
#import transformers4rec.config.transformer as hf

transformer_config = tr.AlbertConfig.build(
    d_model=d_model, 
    item_embedding_dim = 320,
    n_head=8, 
    n_layer=2, 
    total_seq_length=sequence_length, 
    stochastic_shared_embeddings_replacement_prob = 0.06, #regularization
    input_dropout = 0.1,
    dropout = 0.0, #regularization
    label_smoothing = 0.2, #regularization (proved to be useful in train/val accuracy)
    weight_decay = 9.565968888623912e-05, #regularization,
    item_id_embeddings_init_std = 0.11,
    mlm_probability = 0.6,
    eval_on_last_item_seq_only = True,
    mf_constrained_embeddings = True,
    layer_norm_featurewise = True,
    num_hidden_groups = 1,
    inner_group_num = 1
)

# Define the model block including: inputs, masking, projection and transformer block.
body = tr.SequentialBlock(
    inputs,
    tr.MLPBlock([d_model]),
    tr.TransformerBlock(transformer_config, masking=inputs.masking)
)

# Define the head for the next-item prediction task
head = tr.Head(
    body,
    tr.NextItemPredictionTask(weight_tying=True,
                              metrics=[NDCGAt(top_ks=[10, 20], labels_onehot=True),  
                                       RecallAt(top_ks=[10, 20], labels_onehot=True)]),
)

# Get the end-to-end Model class 
model = tr.Model(head)
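The head tracks NDCG and Recall at cut-offs 10 and 20. For next-item prediction there is a single relevant item per session, so both metrics reduce to simple expressions. The sketch below uses the textbook definitions and is not necessarily how the library's `NDCGAt` and `RecallAt` are implemented. The function names and the toy ranking are hypothetical.

```python
import math
from typing import List


def recall_at_k(ranked_items: List[int], target: int, k: int) -> float:
    """1.0 if the target item appears in the top-k recommendations, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items: List[int], target: int, k: int) -> float:
    """With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)


ranked = [42, 7, 19, 3, 88]  # item ids sorted by predicted score, best first
print(recall_at_k(ranked, target=19, k=3))  # 1.0
print(ndcg_at_k(ranked, target=19, k=3))    # 0.5
print(recall_at_k(ranked, target=88, k=3))  # 0.0
```

Recall only records whether the target is in the top k, whereas NDCG also rewards placing it higher in the list; the reported values are averages over all evaluated sessions.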



In [None]:
model

Model(
  (heads): ModuleList(
    (0): Head(
      (body): SequentialBlock(
        (0): TabularSequenceFeatures(
          (_aggregation): ConcatFeatures()
          (to_merge): ModuleDict(
            (categorical_module): SequenceEmbeddingFeatures(
              (filter_features): FilterFeatures()
              (embedding_tables): ModuleDict(
                (product_id-list_seq): Embedding(166796, 64, padding_idx=0)
                (category_id-list_seq): Embedding(626, 64, padding_idx=0)
              )
            )
          )
          (projection_module): SequentialBlock(
            (0): DenseBlock(
              (0): Linear(in_features=128, out_features=320, bias=True)
              (1): ReLU(inplace=True)
            )
          )
          (_masking): MaskedLanguageModeling()
        )
        (1): SequentialBlock(
          (0): DenseBlock(
            (0): Linear(in_features=320, out_features=320, bias=True)
            (1): ReLU(inplace=True)
          )
        )
     

In [None]:
from transformers4rec.config.trainer import T4RecTrainingArguments
from transformers4rec.torch import Trainer
from transformers4rec.torch.utils.data_utils import MerlinDataLoader

# Set arguments for training
training_args = T4RecTrainingArguments(
    output_dir="/content/drive/MyDrive/dataset_rees46/bert_with_category_id",
    max_sequence_length=20,
    data_loader_engine='merlin',
    num_train_epochs=10,
    dataloader_drop_last=True,
    compute_metrics_each_n_steps=1,
    per_device_train_batch_size=192,
    per_device_eval_batch_size=512,
    gradient_accumulation_steps=1,
    learning_rate=0.0004904752786458524,
    report_to=[],
    logging_steps=200,
)



PyTorch: setting up devices
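These arguments determine the number of optimization steps reported in the training logs of this notebook. As a quick sanity check, the sketch below reproduces the "Total optimization steps" values logged for days 1 and 2, assuming a single device and that incomplete batches are dropped (`dataloader_drop_last=True`). The helper `total_optimization_steps` is hypothetical.

```python
def total_optimization_steps(num_examples: int, batch_size: int,
                             num_epochs: int, grad_accum: int = 1) -> int:
    """Optimizer steps for one call to train(), dropping the incomplete last batch."""
    steps_per_epoch = num_examples // (batch_size * grad_accum)
    return steps_per_epoch * num_epochs


# "Num examples" values taken from the training logs of day 1 and day 2
print(total_optimization_steps(111936, batch_size=192, num_epochs=10))  # 5830
print(total_optimization_steps(105984, batch_size=192, num_epochs=10))  # 5520
```

With `logging_steps=200`, the training loss is reported every 200 of these steps.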


In [None]:
# Instantiate the T4Rec Trainer, which manages training and evaluation
trainer = Trainer(
    model=model,
    args=training_args,
    schema=schema,
    compute_metrics=True,
)

In [None]:
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "/content/drive/MyDrive/dataset_rees46/sessions_by_day")

In [None]:
%%time
start_time_window_index = 1
final_time_window_index = 30
for time_index in range(start_time_window_index, final_time_window_index):
    # Set data: train on day `time_index`, evaluate on the following day
    time_index_train = time_index
    time_index_eval = time_index + 1
    train_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_train}/train.parquet"))
    eval_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_eval}/test.parquet"))
    # Train on the day related to time_index
    print('*'*20)
    print("Launch training for day %s are:" %time_index)
    print('*'*20 + '\n')
    trainer.train_dataset_or_path = train_paths
    trainer.reset_lr_scheduler()
    trainer.train()
    trainer.state.global_step +=1
    # Evaluate on the following day
    trainer.eval_dataset_or_path = eval_paths
    eval_metrics = trainer.evaluate(metric_key_prefix='eval')
    print('*'*20)
    print("Eval results for day %s are:\t" %time_index_eval)
    print('\n' + '*'*20 + '\n')
    for key in sorted(eval_metrics.keys()):
        print(" %s = %s" % (key, str(eval_metrics[key])))
    # Free memory before moving to the next time window
    wipe_memory()

********************
Launch training for day 1 are:
********************



***** Running training *****
  Num examples = 111936
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5830


Step,Training Loss
200,10.3994
400,9.1545
600,8.7038
800,8.5665
1000,8.4708
1200,8.1585
1400,8.1625
1600,8.0456
1800,7.7554
2000,7.7969


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 2 are:	

********************

 eval_/loss = 7.709770679473877
 eval_/next-item/ndcg_at_10 = 0.11591222137212753
 eval_/next-item/ndcg_at_20 = 0.13735851645469666
 eval_/next-item/recall_at_10 = 0.21257811784744263
 eval_/next-item/recall_at_20 = 0.29749998450279236
 eval_runtime = 2.5753
 eval_samples_per_second = 4970.236
 eval_steps_per_second = 9.707
********************
Launch training for day 2 are:
********************



***** Running training *****
  Num examples = 105984
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5520


Step,Training Loss
200,7.5039
400,7.4255
600,7.3225
800,7.1565
1000,7.1152
1200,7.0138
1400,6.9129
1600,6.8552
1800,6.7419
2000,6.6875


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 3 are:	

********************

 eval_/loss = 7.29655122756958
 eval_/next-item/ndcg_at_10 = 0.13932333886623383
 eval_/next-item/ndcg_at_20 = 0.1626628339290619
 eval_/next-item/recall_at_10 = 0.2551800310611725
 eval_/next-item/recall_at_20 = 0.34757134318351746
 eval_runtime = 2.3626
 eval_samples_per_second = 4984.311
 eval_steps_per_second = 9.735
********************
Launch training for day 3 are:
********************



***** Running training *****
  Num examples = 97728
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5090


Step,Training Loss
200,6.9652
400,6.9155
600,6.8337
800,6.7453
1000,6.6545
1200,6.6044
1400,6.5263
1600,6.5316
1800,6.415
2000,6.4557


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 4 are:	

********************

 eval_/loss = 7.0104804039001465
 eval_/next-item/ndcg_at_10 = 0.15056456625461578
 eval_/next-item/ndcg_at_20 = 0.17666293680667877
 eval_/next-item/recall_at_10 = 0.2763671875
 eval_/next-item/recall_at_20 = 0.3799479305744171
 eval_runtime = 3.1216
 eval_samples_per_second = 4920.489
 eval_steps_per_second = 9.61
********************
Launch training for day 4 are:
********************



***** Running training *****
  Num examples = 124416
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6480


Step,Training Loss
200,6.6541
400,6.627
600,6.5953
800,6.4881
1000,6.446
1200,6.4015
1400,6.3508
1600,6.3278
1800,6.2583
2000,6.2259


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 5 are:	

********************

 eval_/loss = 6.801351547241211
 eval_/next-item/ndcg_at_10 = 0.16289079189300537
 eval_/next-item/ndcg_at_20 = 0.18970556557178497
 eval_/next-item/recall_at_10 = 0.29810473322868347
 eval_/next-item/recall_at_20 = 0.40458622574806213
 eval_runtime = 2.6429
 eval_samples_per_second = 5230.703
 eval_steps_per_second = 10.216
********************
Launch training for day 5 are:
********************



***** Running training *****
  Num examples = 114432
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5960


Step,Training Loss
200,6.4769
400,6.4621
600,6.4787
800,6.3164
1000,6.3214
1200,6.3056
1400,6.21
1600,6.1589
1800,6.2015
2000,6.1188


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 6 are:	

********************

 eval_/loss = 6.533895969390869
 eval_/next-item/ndcg_at_10 = 0.17460039258003235
 eval_/next-item/ndcg_at_20 = 0.20250798761844635
 eval_/next-item/recall_at_10 = 0.31807002425193787
 eval_/next-item/recall_at_20 = 0.4281684160232544
 eval_runtime = 2.6578
 eval_samples_per_second = 5201.217
 eval_steps_per_second = 10.159
********************
Launch training for day 6 are:
********************



***** Running training *****
  Num examples = 112704
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5870


Step,Training Loss
200,6.2805
400,6.3054
600,6.344
800,6.188
1000,6.1631
1200,6.1901
1400,6.0773
1600,6.0834
1800,6.0593
2000,5.9887


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 7 are:	

********************

 eval_/loss = 6.778093338012695
 eval_/next-item/ndcg_at_10 = 0.1690753847360611
 eval_/next-item/ndcg_at_20 = 0.19596855342388153
 eval_/next-item/recall_at_10 = 0.3089843690395355
 eval_/next-item/recall_at_20 = 0.4153124988079071
 eval_runtime = 2.6077
 eval_samples_per_second = 4908.583
 eval_steps_per_second = 9.587
********************
Launch training for day 7 are:
********************



***** Running training *****
  Num examples = 105600
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5500


Step,Training Loss
200,6.4297
400,6.3754
600,6.3806
800,6.2567
1000,6.283
1200,6.2332
1400,6.1794
1600,6.1733
1800,6.0811
2000,6.0916


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 8 are:	

********************

 eval_/loss = 6.800469875335693
 eval_/next-item/ndcg_at_10 = 0.16831296682357788
 eval_/next-item/ndcg_at_20 = 0.1950402706861496
 eval_/next-item/recall_at_10 = 0.3066406548023224
 eval_/next-item/recall_at_20 = 0.41223961114883423
 eval_runtime = 3.0278
 eval_samples_per_second = 5073.055
 eval_steps_per_second = 9.908
********************
Launch training for day 8 are:
********************



***** Running training *****
  Num examples = 124992
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6510


Step,Training Loss
200,6.4764
400,6.4506
600,6.4592
800,6.3848
1000,6.2926
1200,6.2955
1400,6.2403
1600,6.1935
1800,6.1528
2000,6.1596


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 9 are:	

********************

 eval_/loss = 6.749115467071533
 eval_/next-item/ndcg_at_10 = 0.175711527466774
 eval_/next-item/ndcg_at_20 = 0.20262031257152557
 eval_/next-item/recall_at_10 = 0.3164736032485962
 eval_/next-item/recall_at_20 = 0.4228852391242981
 eval_runtime = 2.949
 eval_samples_per_second = 5034.982
 eval_steps_per_second = 9.834
********************
Launch training for day 9 are:
********************



***** Running training *****
  Num examples = 120768
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6290


Step,Training Loss
200,6.4254
400,6.3948
600,6.4736
800,6.3205
1000,6.2683
1200,6.3094
1400,6.2426
1600,6.1651
1800,6.1968
2000,6.1574


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 10 are:	

********************

 eval_/loss = 6.782398223876953
 eval_/next-item/ndcg_at_10 = 0.17461661994457245
 eval_/next-item/ndcg_at_20 = 0.201347216963768
 eval_/next-item/recall_at_10 = 0.3177083432674408
 eval_/next-item/recall_at_20 = 0.423394113779068
 eval_runtime = 2.784
 eval_samples_per_second = 4965.495
 eval_steps_per_second = 9.698
********************
Launch training for day 10 are:
********************



***** Running training *****
  Num examples = 112320
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5850


Step,Training Loss
200,6.4424
400,6.398
600,6.4308
800,6.3046
1000,6.2632
1200,6.2621
1400,6.1966
1600,6.1455
1800,6.1773
2000,6.0769


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 11 are:	

********************

 eval_/loss = 6.800650119781494
 eval_/next-item/ndcg_at_10 = 0.1733357310295105
 eval_/next-item/ndcg_at_20 = 0.19965022802352905
 eval_/next-item/recall_at_10 = 0.3167724609375
 eval_/next-item/recall_at_20 = 0.4208984375
 eval_runtime = 3.164
 eval_samples_per_second = 5178.3
 eval_steps_per_second = 10.114
********************
Launch training for day 11 are:
********************



***** Running training *****
  Num examples = 132480
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6900


Step,Training Loss
200,6.4078
400,6.4002
600,6.431
800,6.3954
1000,6.311
1200,6.2836
1400,6.309
1600,6.1677
1800,6.1992
2000,6.2029


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 12 are:	

********************

 eval_/loss = 6.566293239593506
 eval_/next-item/ndcg_at_10 = 0.1843193769454956
 eval_/next-item/ndcg_at_20 = 0.21150413155555725
 eval_/next-item/recall_at_10 = 0.3330393135547638
 eval_/next-item/recall_at_20 = 0.4404611885547638
 eval_runtime = 3.0702
 eval_samples_per_second = 5169.686
 eval_steps_per_second = 10.097
********************
Launch training for day 12 are:
********************



***** Running training *****
  Num examples = 128640
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6700


Step,Training Loss
200,6.3229
400,6.3232
600,6.3243
800,6.3195
1000,6.1826
1200,6.1537
1400,6.2381
1600,6.0989
1800,6.1138
2000,6.1134


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 13 are:	

********************

 eval_/loss = 6.541694641113281
 eval_/next-item/ndcg_at_10 = 0.18420101702213287
 eval_/next-item/ndcg_at_20 = 0.21288086473941803
 eval_/next-item/recall_at_10 = 0.3317440152168274
 eval_/next-item/recall_at_20 = 0.44519761204719543
 eval_runtime = 3.4268
 eval_samples_per_second = 5079.956
 eval_steps_per_second = 9.922
********************
Launch training for day 13 are:
********************



***** Running training *****
  Num examples = 141312
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 7360


Step,Training Loss
200,6.2751
400,6.2296
600,6.1897
800,6.2082
1000,6.1275
1200,6.0962
1400,6.0995
1600,6.0707
1800,6.0408
2000,6.0337


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 14 are:	

********************

 eval_/loss = 6.597865581512451
 eval_/next-item/ndcg_at_10 = 0.18598634004592896
 eval_/next-item/ndcg_at_20 = 0.213200643658638
 eval_/next-item/recall_at_10 = 0.3350260555744171
 eval_/next-item/recall_at_20 = 0.44251304864883423
 eval_runtime = 3.1031
 eval_samples_per_second = 4949.913
 eval_steps_per_second = 9.668
********************
Launch training for day 14 are:
********************



***** Running training *****
  Num examples = 127296
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6630


Step,Training Loss
200,6.2243
400,6.2323
600,6.3036
800,6.2163
1000,6.0882
1200,6.1253
1400,6.1928
1600,6.0218
1800,6.0259
2000,6.1197


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 15 are:	

********************

 eval_/loss = 6.631482124328613
 eval_/next-item/ndcg_at_10 = 0.18534228205680847
 eval_/next-item/ndcg_at_20 = 0.21255752444267273
 eval_/next-item/recall_at_10 = 0.3341064453125
 eval_/next-item/recall_at_20 = 0.4412841796875
 eval_runtime = 3.4808
 eval_samples_per_second = 4706.972
 eval_steps_per_second = 9.193
********************
Launch training for day 15 are:
********************



***** Running training *****
  Num examples = 136320
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 7100


Step,Training Loss
200,6.2886
400,6.2643
600,6.2591
800,6.2622
1000,6.1603
1200,6.1328
1400,6.173
1600,6.1051
1800,6.01
2000,6.0492


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 16 are:	

********************

 eval_/loss = 6.587581634521484
 eval_/next-item/ndcg_at_10 = 0.18464794754981995
 eval_/next-item/ndcg_at_20 = 0.21139304339885712
 eval_/next-item/recall_at_10 = 0.330322265625
 eval_/next-item/recall_at_20 = 0.43621826171875
 eval_runtime = 3.2378
 eval_samples_per_second = 5060.205
 eval_steps_per_second = 9.883
********************
Launch training for day 16 are:
********************



***** Running training *****
  Num examples = 133632
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6960


Step,Training Loss
200,6.233
400,6.2336
600,6.2961
800,6.2722
1000,6.1043
1200,6.1024
1400,6.2288
1600,6.0416
1800,6.0153
2000,6.0849


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 17 are:	

********************

 eval_/loss = 6.606080532073975
 eval_/next-item/ndcg_at_10 = 0.18804462254047394
 eval_/next-item/ndcg_at_20 = 0.2157135009765625
 eval_/next-item/recall_at_10 = 0.3400458097457886
 eval_/next-item/recall_at_20 = 0.4492861032485962
 eval_runtime = 2.9089
 eval_samples_per_second = 5104.382
 eval_steps_per_second = 9.969
********************
Launch training for day 17 are:
********************



***** Running training *****
  Num examples = 122304
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6370


Step,Training Loss
200,6.2653
400,6.2852
600,6.3492
800,6.2765
1000,6.1382
1200,6.1902
1400,6.1919
1600,6.0545
1800,6.0362
2000,6.1004


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 18 are:	

********************

 eval_/loss = 6.723509788513184
 eval_/next-item/ndcg_at_10 = 0.18616409599781036
 eval_/next-item/ndcg_at_20 = 0.21200011670589447
 eval_/next-item/recall_at_10 = 0.3321572542190552
 eval_/next-item/recall_at_20 = 0.4340347647666931
 eval_runtime = 3.0818
 eval_samples_per_second = 5150.235
 eval_steps_per_second = 10.059
********************
Launch training for day 18 are:
********************



***** Running training *****
  Num examples = 129216
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6730


Step,Training Loss
200,6.3912
400,6.3089
600,6.3501
800,6.3533
1000,6.1662
1200,6.1439
1400,6.3074
1600,6.0969
1800,6.0688
2000,6.1246


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 19 are:	

********************

 eval_/loss = 6.493366718292236
 eval_/next-item/ndcg_at_10 = 0.19086754322052002
 eval_/next-item/ndcg_at_20 = 0.21763823926448822
 eval_/next-item/recall_at_10 = 0.3429418206214905
 eval_/next-item/recall_at_20 = 0.4490166902542114
 eval_runtime = 2.9736
 eval_samples_per_second = 4993.232
 eval_steps_per_second = 9.752
********************
Launch training for day 19 are:
********************



***** Running training *****
  Num examples = 122304
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6370


Step,Training Loss
200,6.1942
400,6.1761
600,6.2655
800,6.1099
1000,6.059
1200,6.1393
1400,6.0889
1600,6.0021
1800,5.9646
2000,6.0667


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 20 are:	

********************

 eval_/loss = 6.373322486877441
 eval_/next-item/ndcg_at_10 = 0.19519972801208496
 eval_/next-item/ndcg_at_20 = 0.22282081842422485
 eval_/next-item/recall_at_10 = 0.3482421934604645
 eval_/next-item/recall_at_20 = 0.45729169249534607
 eval_runtime = 3.0336
 eval_samples_per_second = 5063.302
 eval_steps_per_second = 9.889
********************
Launch training for day 20 are:
********************



***** Running training *****
  Num examples = 126528
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6590


Step,Training Loss
200,6.0428
400,6.0306
600,6.1166
800,6.0825
1000,5.9471
1200,5.9169
1400,6.0155
1600,5.8754
1800,5.8649
2000,5.9512


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 21 are:	

********************

 eval_/loss = 6.323989391326904
 eval_/next-item/ndcg_at_10 = 0.20114947855472565
 eval_/next-item/ndcg_at_20 = 0.2282973825931549
 eval_/next-item/recall_at_10 = 0.3543911576271057
 eval_/next-item/recall_at_20 = 0.4616110026836395
 eval_runtime = 3.0016
 eval_samples_per_second = 4946.749
 eval_steps_per_second = 9.662
********************
Launch training for day 21 are:
********************



***** Running training *****
  Num examples = 120960
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6300


Step,Training Loss
200,6.0246
400,5.9983
600,6.1363
800,5.954
1000,5.9334
1200,5.9765
1400,5.9019
1600,5.8116
1800,5.8485
2000,5.8842


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 22 are:	

********************

 eval_/loss = 6.403288841247559
 eval_/next-item/ndcg_at_10 = 0.19457392394542694
 eval_/next-item/ndcg_at_20 = 0.2218189686536789
 eval_/next-item/recall_at_10 = 0.3459051847457886
 eval_/next-item/recall_at_20 = 0.45346173644065857
 eval_runtime = 2.8869
 eval_samples_per_second = 5143.32
 eval_steps_per_second = 10.046
********************
Launch training for day 22 are:
********************



***** Running training *****
  Num examples = 122496
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6380


Step,Training Loss
200,5.9898
400,6.0135
600,6.1483
800,6.0245
1000,5.8794
1200,5.9635
1400,5.961
1600,5.8186
1800,5.8456
2000,5.972


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 23 are:	

********************

 eval_/loss = 6.3953142166137695
 eval_/next-item/ndcg_at_10 = 0.19765008985996246
 eval_/next-item/ndcg_at_20 = 0.22561299800872803
 eval_/next-item/recall_at_10 = 0.34799298644065857
 eval_/next-item/recall_at_20 = 0.4587823152542114
 eval_runtime = 2.8791
 eval_samples_per_second = 5157.115
 eval_steps_per_second = 10.072
********************
Launch training for day 23 are:
********************



***** Running training *****
  Num examples = 120384
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6270


Step,Training Loss
200,6.0325
400,5.9828
600,6.17
800,5.9439
1000,5.8938
1200,5.972
1400,5.9341
1600,5.7876
1800,5.8566
2000,5.8735


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 24 are:	

********************

 eval_/loss = 6.3553290367126465
 eval_/next-item/ndcg_at_10 = 0.1950632780790329
 eval_/next-item/ndcg_at_20 = 0.2233458161354065
 eval_/next-item/recall_at_10 = 0.3485243022441864
 eval_/next-item/recall_at_20 = 0.4602864682674408
 eval_runtime = 2.814
 eval_samples_per_second = 4912.64
 eval_steps_per_second = 9.595
********************
Launch training for day 24 are:
********************



***** Running training *****
  Num examples = 113280
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5900


Step,Training Loss
200,6.034
400,5.9697
600,6.0705
800,5.898
1000,5.8818
1200,5.9407
1400,5.8008
1600,5.8184
1800,5.864
2000,5.7179


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 25 are:	

********************

 eval_/loss = 6.432180404663086
 eval_/next-item/ndcg_at_10 = 0.1959989368915558
 eval_/next-item/ndcg_at_20 = 0.2229960560798645
 eval_/next-item/recall_at_10 = 0.35045576095581055
 eval_/next-item/recall_at_20 = 0.45722657442092896
 eval_runtime = 3.3066
 eval_samples_per_second = 4645.296
 eval_steps_per_second = 9.073
********************
Launch training for day 25 are:
********************



***** Running training *****
  Num examples = 125376
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6530


Step,Training Loss
200,6.0124
400,6.0021
600,6.0578
800,6.0122
1000,5.8533
1200,5.9138
1400,5.9667
1600,5.785
1800,5.7966
2000,5.9084


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 26 are:	

********************

 eval_/loss = 6.255114555358887
 eval_/next-item/ndcg_at_10 = 0.20640508830547333
 eval_/next-item/ndcg_at_20 = 0.23476579785346985
 eval_/next-item/recall_at_10 = 0.362847238779068
 eval_/next-item/recall_at_20 = 0.4750434160232544
 eval_runtime = 2.7208
 eval_samples_per_second = 5080.874
 eval_steps_per_second = 9.924
********************
Launch training for day 26 are:
********************



***** Running training *****
  Num examples = 114432
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5960


Step,Training Loss
200,5.9451
400,5.891
600,6.0239
800,5.8268
1000,5.7951
1200,5.926
1400,5.7481
1600,5.7237
1800,5.8169
2000,5.6787


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 27 are:	

********************

 eval_/loss = 6.268741130828857
 eval_/next-item/ndcg_at_10 = 0.20229215919971466
 eval_/next-item/ndcg_at_20 = 0.23078404366970062
 eval_/next-item/recall_at_10 = 0.3561197817325592
 eval_/next-item/recall_at_20 = 0.46867766976356506
 eval_runtime = 2.8085
 eval_samples_per_second = 4922.263
 eval_steps_per_second = 9.614
********************
Launch training for day 27 are:
********************



***** Running training *****
  Num examples = 115776
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 6030


Step,Training Loss
200,5.8407
400,5.8513
600,6.0185
800,5.7761
1000,5.7693
1200,5.8747
1400,5.7042
1600,5.6734
1800,5.8041
2000,5.6771


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 28 are:	

********************

 eval_/loss = 6.265506267547607
 eval_/next-item/ndcg_at_10 = 0.2051866352558136
 eval_/next-item/ndcg_at_20 = 0.2329881489276886
 eval_/next-item/recall_at_10 = 0.362229585647583
 eval_/next-item/recall_at_20 = 0.4721304178237915
 eval_runtime = 2.6585
 eval_samples_per_second = 5007.309
 eval_steps_per_second = 9.78
********************
Launch training for day 28 are:
********************



***** Running training *****
  Num examples = 110016
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5730


Step,Training Loss
200,5.9384
400,5.8975
600,5.9739
800,5.8244
1000,5.8107
1200,5.8825
1400,5.7459
1600,5.7263
1800,5.8022
2000,5.6498


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 29 are:	

********************

 eval_/loss = 6.325847148895264
 eval_/next-item/ndcg_at_10 = 0.20469743013381958
 eval_/next-item/ndcg_at_20 = 0.23402173817157745
 eval_/next-item/recall_at_10 = 0.355781227350235
 eval_/next-item/recall_at_20 = 0.47164061665534973
 eval_runtime = 2.528
 eval_samples_per_second = 5063.193
 eval_steps_per_second = 9.889
********************
Launch training for day 29 are:
********************



***** Running training *****
  Num examples = 107328
  Num Epochs = 10
  Instantaneous batch size per device = 192
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 1
  Total optimization steps = 5590


Step,Training Loss
200,5.9567
400,5.9551
600,6.0131
800,5.8926
1000,5.7859
1200,5.8745
1400,5.7554
1600,5.7599
1800,5.7998
2000,5.6777


Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-1500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2000
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-2500
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
Saving model checkpoint to /content/drive/MyDrive/dataset_rees46/bert_with_category_id/checkpoint-3000
Trainer.model is not a `PreTraine

********************
Eval results for day 30 are:	

********************

 eval_/loss = 6.261165618896484
 eval_/next-item/ndcg_at_10 = 0.20263312757015228
 eval_/next-item/ndcg_at_20 = 0.2305440604686737
 eval_/next-item/recall_at_10 = 0.35945311188697815
 eval_/next-item/recall_at_20 = 0.4696093499660492
 eval_runtime = 2.4938
 eval_samples_per_second = 5132.777
 eval_steps_per_second = 10.025
CPU times: user 2h 45min 51s, sys: 2min 44s, total: 2h 48min 35s
Wall time: 3h 1min 15s
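
The "Total optimization steps" values in the log above are consistent with (Num examples / batch size) x Num Epochs. A minimal sketch that checks this against a few of the logged runs follows. The helper name `total_steps` is hypothetical, and rounding up the last partial batch is an assumption: every logged "Num examples" is an exact multiple of 192, so this log does not show how a partial batch would be handled.

```python
import math

def total_steps(num_examples, batch_size, epochs, grad_accum=1):
    """Optimizer steps for one run: steps per epoch times number of epochs.

    Rounding up is an assumption; the logged runs divide evenly.
    """
    steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
    return steps_per_epoch * epochs

# (Num examples, Total optimization steps) pairs copied from the log above.
logged = [(128640, 6700), (141312, 7360), (127296, 6630), (107328, 5590)]
for num_examples, expected in logged:
    assert total_steps(num_examples, batch_size=192, epochs=10) == expected
```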


In [None]:
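# Append the final evaluation metrics to the shared results file.
with open("/content/drive/MyDrive/dataset_rees46/results.txt", 'a') as f:
    f.write('\n')
    f.write('Bert with category and brand accuracy results:')
    f.write('\n')
    for key, value in model.compute_metrics().items():
        # End each entry with a newline so the metrics do not run together on one line.
        f.write('%s: %s\n' % (key, value.item()))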
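
Since `model.compute_metrics()` is called once here, after the training loop has finished, the file only receives the metrics of the last evaluation (day 30 in the log above). If a summary over all days is wanted, one option is to collect each day's metrics in a list and average them. The sketch below assumes the per-day metrics are available as plain dictionaries; `average_metrics` is a hypothetical helper, and the values are copied from the day 29 and day 30 evaluation blocks.

```python
from statistics import mean

def average_metrics(per_day):
    """Average each metric over a list of {metric_name: value} dicts."""
    keys = per_day[0].keys()
    return {k: mean(day[k] for day in per_day) for k in keys}

# Values copied from the "Eval results" blocks for days 29 and 30.
per_day = [
    {"next-item/recall_at_10": 0.355781227350235,
     "next-item/recall_at_20": 0.47164061665534973},
    {"next-item/recall_at_10": 0.35945311188697815,
     "next-item/recall_at_20": 0.4696093499660492},
]

summary = average_metrics(per_day)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In the actual loop, the dictionary for each day would have to be captured right after that day's evaluation, since later evaluations replace the earlier values.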

In [None]:
print("Results:")
for key, value in model.compute_metrics().items():
    print('%s: %s ' % (key, value.item()))

Results:
next-item/ndcg_at_10: 0.20263312757015228 
next-item/ndcg_at_20: 0.2305440604686737 
next-item/recall_at_10: 0.35945311188697815 
next-item/recall_at_20: 0.4696093499660492 
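
For reference, when each session has a single target item, recall@k and NDCG@k reduce to simple functions of the rank of that target in the model's scored list. The sketch below uses hypothetical helper names and plain Python lists. It illustrates the standard definitions and is not taken from the library's implementation.

```python
import math

def rank_of_target(scores, target):
    """1-based rank of `target` when items are sorted by descending score."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores if s > target_score)

def recall_at_k(scores, target, k):
    """1.0 if the target item is among the top-k scored items, else 0.0."""
    return 1.0 if rank_of_target(scores, target) <= k else 0.0

def ndcg_at_k(scores, target, k):
    """With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    rank = rank_of_target(scores, target)
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

# Toy example: 5 candidate items, the true next item is index 3 (ranked 2nd).
scores = [0.1, 0.7, 0.05, 0.4, 0.2]
print(recall_at_k(scores, target=3, k=2))
print(ndcg_at_k(scores, target=3, k=2))
```

Under these definitions the per-session NDCG never exceeds the per-session recall at the same cutoff, which matches the pattern in the results above, where each `ndcg_at_k` is lower than the corresponding `recall_at_k`.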
