# Multiome with Torch
This notebook is meant to help competitors who would like to apply deep neural networks to the MSCI data.
It focuses on the more challenging Multiome part of the data (but it is trivial to adapt it to the CITEseq data).

The main challenge here is that the Multiome data is very large, while Kaggle GPU machines only have 13GB of RAM and 16GB of GPU memory.

I found it is actually possible to store all of the dataset in GPU memory using sparse tensor formats. This uses ~12GB on the GPU, leaving only ~4GB for the model parameters and the forward/backward computation. Given that we have only ~100K training examples, I do not expect we will need very large models, so I feel 4GB is actually enough.
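To make the memory argument concrete, here is a minimal sketch of how the footprint of a CSR matrix can be measured and compared with its dense equivalent. The helper `csr_nbytes` and the small random matrix (with an arbitrary 2% density) are assumptions made for illustration; only the full matrix dimensions come from this notebook.

```python
import numpy as np
import scipy.sparse as sp

def csr_nbytes(m):
    """Bytes used by the three arrays that define a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

# Small random stand-in for the real matrix (density and dtype are assumptions).
demo = sp.random(1000, 5000, density=0.02, format="csr", dtype=np.float32, random_state=0)

dense_bytes = demo.shape[0] * demo.shape[1] * demo.dtype.itemsize
print(f"CSR: {csr_nbytes(demo) / 1e6:.2f} MB, dense: {dense_bytes / 1e6:.2f} MB")

# Dense float32 size of the full Multiome input matrix (105942 x 228942), for comparison.
full_dense_gb = 105942 * 228942 * 4 / 1e9
print(f"Full input matrix stored densely: {full_dense_gb:.0f} GB")
```

Stored densely in float32, the input matrix alone would need roughly 97GB, which is why a sparse layout is the only way to keep it on a 16GB GPU.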

If 4GB is not enough, the other option is to leave the dataset in RAM and load the batches to the GPU on demand (which is the more classical approach). In that case, however, we have only ~1GB of RAM left, and we pay a small performance penalty for transferring each batch to the GPU. In exchange, the whole 16GB of GPU memory is available for training a complex model. Yet another option is to apply dimensionality reduction to the data beforehand (e.g. with PCA/TruncatedSVD), although I prefer the idea of using the raw data and letting the network do its own dimensionality reduction.
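The on-demand alternative could look roughly like the following on the host side. This is only a sketch: `iter_dense_batches` is a hypothetical helper that is not used elsewhere in this notebook, and the small random matrix stands in for the real data.

```python
import numpy as np
import scipy.sparse as sp

def iter_dense_batches(csr, batch_size):
    """Yield dense float32 row blocks of a scipy CSR matrix, one batch at a time."""
    for start in range(0, csr.shape[0], batch_size):
        yield csr[start:start + batch_size].toarray().astype(np.float32, copy=False)

demo = sp.random(10, 6, density=0.3, format="csr", dtype=np.float32, random_state=0)
batches = list(iter_dense_batches(demo, batch_size=4))

# In a training loop, each batch would be wrapped with torch.from_numpy(batch)
# and moved to the GPU with .to(device) just before the forward pass.
print([b.shape for b in batches])
```

Only one batch is densified at a time, so the RAM overhead stays small, at the cost of a host-to-device copy per step.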

The competition data is pre-encoded as sparse matrices in [this dataset](https://www.kaggle.com/datasets/fabiencrom/multimodal-single-cell-as-sparse-matrix) generated by [this notebook](https://www.kaggle.com/code/fabiencrom/multimodal-single-cell-creating-sparse-data/).

The model used here is just a very simple MLP. In the current version, I add a `Softplus` activation at the end, since the values we have to predict are all non-negative (although I am not sure it really works better that way).

In the current version, I also directly optimize the competition metric (row-wise Pearson correlation), although it does not seem to perform much better than a simpler mean squared error loss.

This notebook will train 5 models over 5 folds. The final submission is created in [this notebook](https://www.kaggle.com/fabiencrom/msci-multiome-torch-quickstart-submission).

So far I have not obtained better results than those of the much simpler PCA + Ridge regression method (which you can find in [this notebook](https://www.kaggle.com/code/ambrosm/msci-multiome-quickstart) as initially proposed by AmbrosM, or in [this notebook](https://www.kaggle.com/code/fabiencrom/msci-multiome-quickstart-w-sparse-matrices) for a version that uses sparse matrices for better results). But I expect the network to perform better after some work on the architecture and hyperparameters. In any case, I think a deep learning model will be part of any winning submission.

In [1]:
import os
import copy
import gc
import math
import itertools
import pickle
import glob
import joblib
import json
import random
import re
import operator

import collections
from collections import defaultdict
from operator import itemgetter, attrgetter

from tqdm.notebook import tqdm

import torch
import torch.nn as nn

import numpy as np
import pandas as pd
import plotly.express as px

import scipy

import sklearn
import sklearn.cluster
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
import sklearn.preprocessing



# Score and loss functions
We can use either a classic mean squared error loss (`nn.MSELoss`) or a loss that directly optimizes the competition metric.

In [2]:
def partial_correlation_score_torch_faster(y_true, y_pred):
    """Compute the Pearson correlation between corresponding rows of the y_true and y_pred tensors.
    Compatible with backpropagation.
    """
    y_true_centered = y_true - torch.mean(y_true, dim=1)[:,None]
    y_pred_centered = y_pred - torch.mean(y_pred, dim=1)[:,None]
    cov_tp = torch.sum(y_true_centered*y_pred_centered, dim=1)/(y_true.shape[1]-1)
    var_t = torch.sum(y_true_centered**2, dim=1)/(y_true.shape[1]-1)
    var_p = torch.sum(y_pred_centered**2, dim=1)/(y_true.shape[1]-1)
    return cov_tp/torch.sqrt(var_t*var_p)

def correl_loss(pred, tgt):
    """Loss for directly optimizing the correlation.
    """
    return -torch.mean(partial_correlation_score_torch_faster(tgt, pred))
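
As a sanity check, the row-wise correlation can be re-implemented in NumPy and compared with `np.corrcoef`. This is a sketch on random data; `rowwise_pearson` is a hypothetical NumPy counterpart of the torch function above, not code used by the notebook. Note that the `n-1` factors cancel between numerator and denominator, so they are omitted here.

```python
import numpy as np

def rowwise_pearson(y_true, y_pred):
    """Pearson correlation between corresponding rows of two 2-D arrays."""
    yt = y_true - y_true.mean(axis=1, keepdims=True)
    yp = y_pred - y_pred.mean(axis=1, keepdims=True)
    return (yt * yp).sum(axis=1) / np.sqrt((yt ** 2).sum(axis=1) * (yp ** 2).sum(axis=1))

rng = np.random.default_rng(0)
y_true = rng.random((3, 50))
y_pred = y_true + 0.5 * rng.random((3, 50))

scores = rowwise_pearson(y_true, y_pred)
# Reference values computed one row at a time with NumPy's own implementation.
reference = np.array([np.corrcoef(t, p)[0, 1] for t, p in zip(y_true, y_pred)])
# The correlation ignores any positive rescaling and shift of the predictions.
rescaled = rowwise_pearson(y_true, 3.0 * y_pred + 7.0)
print(scores, reference, rescaled)
```

Because the metric is invariant to a positive scale and shift of each predicted row, a model trained with this loss is not pushed toward the absolute target values, unlike with MSE. One caveat: a row with zero variance makes the denominator zero and yields `nan`.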

# Config
We put the configuration dict at the beginning of the notebook, so that it is easier to find and modify.

In [3]:
config = dict(
    layers = [128, 128, 128],
    patience = 4,
    max_epochs = 20,
    criterion = correl_loss, #nn.MSELoss(),
    
    n_folds = 5,
    folds_to_train = [0, 1, 2, 3, 4],
    kfold_random_state = 42,
    
    optimizerparams = dict(
     lr=1e-3, 
     weight_decay=1e-2
    ),
    
    head="softplus"
    
)

INPUT_SIZE = 228942
OUTPUT_SIZE = 23418


# Utility functions for loading and batching the sparse data in device memory
There are a few challenges here:
- If we directly create a torch sparse tensor and then move it to GPU memory, we get an OOM error.
- In the torch version used here, CSR tensors cannot be moved to the GPU, so we make our own TorchCSR class that holds the CSR arrays.
- Torch GPU operations are only compatible with COO tensors (not CSR), so we need functions that build batches of COO tensors from the TorchCSR objects.

In [4]:
# Strangely, the current torch implementation of CSR tensors does not allow moving them to the GPU.
# So we make our own equivalent class
TorchCSR = collections.namedtuple("TorchCSR", "data indices indptr shape")

def load_csr_data_to_gpu(train_inputs):
    """Move a scipy CSR sparse matrix to the GPU as a TorchCSR object.
    This tries to manage memory efficiently by creating the tensors and moving them to the GPU one by one.
    """
    th_data = torch.from_numpy(train_inputs.data).to(device)
    th_indices = torch.from_numpy(train_inputs.indices).to(device)
    th_indptr = torch.from_numpy(train_inputs.indptr).to(device)
    th_shape = train_inputs.shape
    return TorchCSR(th_data, th_indices, th_indptr, th_shape)

def make_coo_batch(torch_csr, indx):
    """Make a coo torch tensor from a TorchCSR object by taking the rows indicated by the indx tensor
    """
    th_data, th_indices, th_indptr, th_shape = torch_csr
    start_pts = th_indptr[indx]
    end_pts = th_indptr[indx+1]
    coo_data = torch.cat([th_data[start_pts[i]: end_pts[i]] for i in range(len(start_pts))], dim=0)
    coo_col = torch.cat([th_indices[start_pts[i]: end_pts[i]] for i in range(len(start_pts))], dim=0)
    coo_row = torch.repeat_interleave(torch.arange(indx.shape[0], device=device), th_indptr[indx+1] - th_indptr[indx])
    coo_batch = torch.sparse_coo_tensor(torch.vstack([coo_row, coo_col]), coo_data, [indx.shape[0], th_shape[1]])
    return coo_batch


def make_coo_batch_slice(torch_csr, start, end):
    """Make a coo torch tensor from a TorchCSR object by taking the rows within the (start, end) slice
    """
    th_data, th_indices, th_indptr, th_shape = torch_csr
    start_pts = th_indptr[start]
    end_pts = th_indptr[end]
    coo_data = th_data[start_pts: end_pts]
    coo_col = th_indices[start_pts: end_pts]
    coo_row = torch.repeat_interleave(torch.arange(end-start, device=device), th_indptr[start+1:end+1] - th_indptr[start:end])
    coo_batch = torch.sparse_coo_tensor(torch.vstack([coo_row, coo_col]), coo_data, [end-start, th_shape[1]])
    return coo_batch
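
The row-gathering step in `make_coo_batch` loops over the rows of the batch in Python. As an illustration of the underlying index arithmetic, here is a loop-free NumPy sketch of the same CSR-to-COO conversion. `gather_rows_coo` is a hypothetical helper, not part of this notebook; I have not benchmarked a torch port of it (which would presumably rely on `torch.repeat_interleave` and `torch.cumsum`).

```python
import numpy as np
import scipy.sparse as sp

def gather_rows_coo(data, indices, indptr, rows):
    """Return COO (row, col, value) arrays for the selected CSR rows, without a Python loop."""
    starts = indptr[rows]
    lengths = indptr[rows + 1] - starts
    # Output row id of every selected non-zero.
    out_row = np.repeat(np.arange(len(rows)), lengths)
    # Position of every selected non-zero inside its own row.
    first_out = np.cumsum(lengths) - lengths
    within = np.arange(lengths.sum()) - np.repeat(first_out, lengths)
    # Position of every selected non-zero in the original data/indices arrays.
    src = np.repeat(starts, lengths) + within
    return out_row, indices[src], data[src]

demo = sp.random(8, 5, density=0.4, format="csr", dtype=np.float32, random_state=0)
rows = np.array([6, 1, 1, 3])  # arbitrary order, with a repeated row
r, c, v = gather_rows_coo(demo.data, demo.indices, demo.indptr, rows)
batch = sp.coo_matrix((v, (r, c)), shape=(len(rows), demo.shape[1])).toarray()
print(batch)
```

The result matches plain fancy indexing on the dense matrix, which is a convenient way to test any batching function of this kind.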


# GPU memory DataLoader
We create a dataloader that will work with the in-device TorchCSR tensor.
This should ensure the fastest training speed.

In [5]:
class DataLoaderCOO:
    """Torch compatible DataLoader. Works with in-device TorchCSR tensors.
    Args:
         - train_inputs, train_targets: TorchCSR tensors
         - train_idx: tensor containing the indices of the rows of train_inputs and train_targets that should be used
         - batch_size, shuffle, drop_last: as in torch.utils.data.DataLoader
    """
    def __init__(self, train_inputs, train_targets, train_idx=None, 
                 *,
                batch_size=512, shuffle=False, drop_last=False):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last
        
        self.train_inputs = train_inputs
        self.train_targets = train_targets
        
        self.train_idx = train_idx
        
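        # TorchCSR is a namedtuple, so len() would return its number of fields; use the row count instead
        self.nb_examples = len(self.train_idx) if self.train_idx is not None else train_inputs.shape[0]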
        
        self.nb_batches = self.nb_examples//batch_size
        if not drop_last and not self.nb_examples%batch_size==0:
            self.nb_batches +=1
        
    def __iter__(self):
        if self.shuffle:
            shuffled_idx = torch.randperm(self.nb_examples, device=device)
            if self.train_idx is not None:
                idx_array = self.train_idx[shuffled_idx]
            else:
                idx_array = shuffled_idx
        else:
            if self.train_idx is not None:
                idx_array = self.train_idx
            else:
                idx_array = None
            
        for i in range(self.nb_batches):
            start = i*self.batch_size
            # Clamp the end of the last batch so that we never index past the last row
            end = min((i+1)*self.batch_size, self.nb_examples)
            if idx_array is None:
                inp_batch = make_coo_batch_slice(self.train_inputs, start, end)
                tgt_batch = make_coo_batch_slice(self.train_targets, start, end)
            else:
                idx_batch = idx_array[start:end]
                inp_batch = make_coo_batch(self.train_inputs, idx_batch)
                tgt_batch = make_coo_batch(self.train_targets, idx_batch)
            yield inp_batch, tgt_batch
            
            
    def __len__(self):
        return self.nb_batches
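
The batch-count arithmetic in `__init__` can be checked by hand. The sketch below replays it for the configuration used later in this notebook (105942 rows, 5 folds, batch size 512); `count_batches` is a hypothetical standalone copy of that logic.

```python
def count_batches(nb_examples, batch_size, drop_last):
    """Number of batches yielded per epoch, mirroring the logic in DataLoaderCOO.__init__."""
    nb_batches = nb_examples // batch_size
    if not drop_last and nb_examples % batch_size != 0:
        nb_batches += 1
    return nb_batches

nb_total = 105942              # rows in the Multiome training set
nb_valid = -(-nb_total // 5)   # largest validation fold of a 5-fold split (ceiling division)
nb_train = nb_total - nb_valid

print(count_batches(nb_train, 512, drop_last=True))   # 165 training batches per epoch
print(count_batches(nb_valid, 512, drop_last=False))  # 42 validation batches per epoch
```

With `drop_last=True`, the incomplete final training batch is skipped, while the validation loader keeps it so that every validation row is scored.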

# Simple Model: MLP

In [6]:
class MLP(nn.Module):
    def __init__(self, layer_size_lst, add_final_activation=False):
        super().__init__()
        
        assert len(layer_size_lst) > 2
        
        layer_lst = []
        for i in range(len(layer_size_lst)-1):
            sz1 = layer_size_lst[i]
            sz2 = layer_size_lst[i+1]
            layer_lst += [nn.Linear(sz1, sz2)]
            if i != len(layer_size_lst)-2 or add_final_activation:
                 layer_lst += [nn.ReLU()]
        self.mlp = nn.Sequential(*layer_lst)
        
    def forward(self, x):
        return self.mlp(x)
    
def build_model():
    model = MLP([INPUT_SIZE] + config["layers"] + [OUTPUT_SIZE])
    if config["head"] == "softplus":
        model = nn.Sequential(model, nn.Softplus())
    else:
        assert config["head"] is None
    return model
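
To get a feeling for how much of the remaining GPU memory this model needs, the parameter count can be computed directly from the layer sizes. This is a back-of-the-envelope sketch: `mlp_param_count` is a hypothetical helper, and the factor of 4 for training (weights, gradients and two AdamW moment buffers) is a rough assumption that ignores activations.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# INPUT_SIZE, the three hidden layers from the config, OUTPUT_SIZE
sizes = [228942, 128, 128, 128, 23418]
n_params = mlp_param_count(sizes)

weights_mb = n_params * 4 / 1e6   # float32 weights
training_mb = 4 * weights_mb      # weights + gradients + two AdamW moment buffers (rough)
print(n_params, round(weights_mb), round(training_mb))
```

The first layer dominates the count because of the very wide input, and the whole model stays far below the ~4GB left on the GPU.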

# Training functions

In [7]:
def train_fn(model, optimizer, criterion, dl_train):

    loss_list = []
    model.train()
    for inpt, tgt in tqdm(dl_train):
        tgt = tgt.to_dense()

        optimizer.zero_grad()
        pred = model(inpt)

        loss = criterion(pred, tgt)
        loss_list.append(loss.detach())
        loss.backward()
        optimizer.step()
    avg_loss = sum(loss_list).cpu().item()/len(loss_list)
    
    return {"loss":avg_loss}


In [8]:
def valid_fn(model, criterion, dl_valid):
    loss_list = []
    partial_correlation_scores = []
    model.eval()
    for inpt, tgt in tqdm(dl_valid):
        tgt = tgt.to_dense()
        # No gradients are needed for validation
        with torch.no_grad():
            pred = model(inpt)
            loss = criterion(pred, tgt)
        loss_list.append(loss.detach())
        
        partial_correlation_scores.append(partial_correlation_score_torch_faster(tgt, pred))

    # Mean of the per-batch losses (the last batch may be smaller than the others)
    avg_loss = sum(loss_list).cpu().item()/len(loss_list)
    
    # One correlation per validation row; the score is their mean
    partial_correlation_scores = torch.cat(partial_correlation_scores)

    score = torch.sum(partial_correlation_scores).cpu().item()/len(partial_correlation_scores)
    
    return {"loss":avg_loss, "score":score}


In [9]:
def train_model(model, optimizer, dl_train, dl_valid, save_prefix):

    criterion = config["criterion"]
    
    save_params_filename = save_prefix+"_best_params.pth"
    save_config_filename = save_prefix+"_config.pkl"
    best_score = None

    for epoch in range(config["max_epochs"]):
        log_train = train_fn(model, optimizer, criterion, dl_train)
        log_valid = valid_fn(model, criterion, dl_valid)

        print(log_train)
        print(log_valid)
        
        score = log_valid["score"]
        if best_score is None or score > best_score:
            best_score = score
            patience = config["patience"]
            best_params = copy.deepcopy(model.state_dict())
        else:
            patience -= 1
        
        if patience < 0:
            print("out of patience")
            break


    torch.save(best_params, save_params_filename)
    # Use a context manager so that the file is properly closed
    with open(save_config_filename, "wb") as f:
        pickle.dump(config, f)
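
The early-stopping rule used in `train_model` can be replayed on a plain list of validation scores. The sketch below uses made-up scores and a hypothetical helper `early_stopping_epochs`; it only illustrates that `patience = 4` tolerates four epochs without improvement and stops on the fifth.

```python
def early_stopping_epochs(scores, patience):
    """Return (epochs_run, best_epoch), replaying the rule used in train_model."""
    best_score, best_epoch, remaining = None, None, patience
    for epoch, score in enumerate(scores):
        if best_score is None or score > best_score:
            # New best score: remember it and reset the patience counter
            best_score, best_epoch, remaining = score, epoch, patience
        else:
            remaining -= 1
        if remaining < 0:
            return epoch + 1, best_epoch
    return len(scores), best_epoch

# Made-up validation scores: the best value is reached at epoch index 2.
scores = [0.60, 0.65, 0.66, 0.655, 0.659, 0.658, 0.657, 0.656, 0.70]
epochs_run, best_epoch = early_stopping_epochs(scores, patience=4)
print(epochs_run, best_epoch)
```

Training stops before the last (higher) score is ever seen, which is the usual trade-off of early stopping: a larger patience costs more epochs but is less likely to miss a late improvement.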
    


In [10]:
def train_one_fold(num_fold):
    
    train_idx, valid_idx = FOLDS_LIST[num_fold]
    
    train_idx = torch.from_numpy(train_idx).to(device)
    valid_idx = torch.from_numpy(valid_idx).to(device)
    
    
    dl_train = DataLoaderCOO(train_inputs, train_targets, train_idx=train_idx,
                batch_size=512, shuffle=True, drop_last=True)
    dl_valid = DataLoaderCOO(train_inputs, train_targets, train_idx=valid_idx,
                batch_size=512, shuffle=False, drop_last=False)
    
    model =  build_model()
    model.to(device)
    
    optimizer = torch.optim.AdamW(model.parameters(), **config["optimizerparams"])
    
    train_model(model, optimizer, dl_train, dl_valid, save_prefix="f%i"%num_fold)

# Load Data

In [11]:
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    print(f"machine has {torch.cuda.device_count()} cuda devices")
    print(f"model of first cuda device is {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")

machine has 1 cuda devices
model of first cuda device is Tesla P100-PCIE-16GB


In [12]:
%%time
train_inputs = scipy.sparse.load_npz(
    "../input/multimodal-single-cell-as-sparse-matrix/train_multi_inputs_values.sparse.npz")

CPU times: user 36.3 s, sys: 4.61 s, total: 40.9 s
Wall time: 1min 2s


We will normalize the input by dividing each column by its max value. This is the simplest reasonable option. Centering the data (i.e. subtracting the mean) would destroy the sparsity here.

In [13]:
max_inputs = train_inputs.max(axis=0)
max_inputs = max_inputs.todense()+1e-10
np.savez("max_inputs.npz", max_inputs = max_inputs)
max_inputs = torch.from_numpy(max_inputs)[0].to(device)

In [14]:
%%time
train_inputs = load_csr_data_to_gpu(train_inputs)
gc.collect()

CPU times: user 1.17 s, sys: 25 ms, total: 1.19 s
Wall time: 1.19 s


51

In [15]:
train_inputs.data[...] /= max_inputs[train_inputs.indices.long()]

In [16]:
torch.max(train_inputs.data)

tensor(1., device='cuda:0')
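
The column-max normalization above only touches the stored non-zero values, which is why it preserves sparsity. Here is a small CPU sketch of the same idea with scipy, on a toy matrix chosen for illustration (not taken from the competition data).

```python
import numpy as np
import scipy.sparse as sp

demo = sp.csr_matrix(np.array([[0, 2, 0, 0],
                               [4, 0, 0, 1],
                               [0, 0, 0, 0],
                               [2, 1, 0, 0]], dtype=np.float32))

# Column-wise max, with a small epsilon to avoid dividing by zero on empty columns.
col_max = demo.max(axis=0).toarray().ravel() + 1e-10

# Divide every stored value by the max of its column; indices gives the column of each value.
scaled = demo.copy()
scaled.data /= col_max[scaled.indices]

# Centering, by contrast, turns most zeros into non-zero values.
centered = demo.toarray() - demo.toarray().mean(axis=0)
print(scaled.nnz, np.count_nonzero(centered))
```

The scaled matrix keeps exactly the same non-zero pattern as the original, while the centered one is dense everywhere except in columns that were entirely zero.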

In [17]:
%%time
train_targets = scipy.sparse.load_npz(
    "../input/multimodal-single-cell-as-sparse-matrix/train_multi_targets_values.sparse.npz")

CPU times: user 16.3 s, sys: 1.82 s, total: 18.1 s
Wall time: 26.5 s


In [18]:
%%time
train_targets = load_csr_data_to_gpu(train_targets)
gc.collect()

CPU times: user 893 ms, sys: 19 ms, total: 912 ms
Wall time: 928 ms


124

In [19]:
assert INPUT_SIZE == train_inputs.shape[1]
assert OUTPUT_SIZE == train_targets.shape[1]

NB_EXAMPLES = train_inputs.shape[0]
assert NB_EXAMPLES == train_targets.shape[0]

print(INPUT_SIZE, OUTPUT_SIZE, NB_EXAMPLES)

228942 23418 105942


# Training
We use a rather naive kfold split here, which might not be optimal for this competition.

In [20]:
kfold = KFold(n_splits=config["n_folds"], shuffle=True, random_state=config["kfold_random_state"])
FOLDS_LIST = list(kfold.split(range(train_inputs.shape[0])))

In [21]:
for num_fold in config["folds_to_train"]:
    train_one_fold(num_fold)

{'loss': -0.6129105539032907}
{'loss': -0.6489819117954799, 'score': 0.6498714877117844}

{'loss': -0.6536519368489583}
{'loss': -0.6547461918422154, 'score': 0.6555302408531549}

{'loss': -0.660298896558357}
{'loss': -0.6591293698265439, 'score': 0.6598735450882534}

{'loss': -0.6649163910836885}
{'loss': -0.6603275480724516, 'score': 0.6610726674217754}

{'loss': -0.6673434170809659}
{'loss': -0.6611809503464472, 'score': 0.6619001346512341}

{'loss': -0.6694148208155777}
{'loss': -0.6614861261276972, 'score': 0.6621936702858795}

{'loss': -0.6707318392666903}
{'loss': -0.661566416422526, 'score': 0.6622812839209024}

{'loss': -0.6717645818536931}
{'loss': -0.6616451626732236, 'score': 0.6623374193272452}

{'loss': -0.6726319284150095}
{'loss': -0.6616478874569848, 'score': 0.6623476509037708}

{'loss': -0.6734213164358428}
{'loss': -0.6616027922857375, 'score': 0.6622986130775402}

{'loss': -0.6741212324662642}
{'loss': -0.6616348539079938, 'score': 0.6623369584454198}

{'loss': -0.6747194694750237}
{'loss': -0.661683127993629, 'score': 0.6623863649771108}

{'loss': -0.6751858335552794}
{'loss': -0.6615403039114816, 'score': 0.6622557971559535}

{'loss': -0.6755734992749763}
{'loss': -0.661449704851423, 'score': 0.6621488725724433}

{'loss': -0.6759358261570786}
{'loss': -0.6614339011056083, 'score': 0.6621392862304734}

{'loss': -0.6762503884055397}
{'loss': -0.6612957545689174, 'score': 0.6620006068891878}

{'loss': -0.6764797788677793}
{'loss': -0.6612415313720703, 'score': 0.661950462946576}
out of patience


{'loss': -0.613655830152107}
{'loss': -0.6495511645362491, 'score': 0.6504108116239559}

{'loss': -0.656368926077178}
{'loss': -0.6580864588419596, 'score': 0.6588662417704941}

{'loss': -0.6632736668442235}
{'loss': -0.659798712957473, 'score': 0.6605637617100855}

{'loss': -0.6661329789595171}
{'loss': -0.6601899010794503, 'score': 0.6609444500979281}

{'loss': -0.6681926380504262}
{'loss': -0.6608256385439918, 'score': 0.6615489427002218}

{'loss': -0.6699658480557529}
{'loss': -0.6611936660039992, 'score': 0.6619040060585681}

{'loss': -0.6713263309363163}
{'loss': -0.661261104402088, 'score': 0.6619741061842229}

{'loss': -0.6723460341944839}
{'loss': -0.6612374896094912, 'score': 0.6619495872711076}

{'loss': -0.673116279370857}
{'loss': -0.661149297441755, 'score': 0.661858378757846}

{'loss': -0.6738379276160038}
{'loss': -0.6611997513543992, 'score': 0.6619189386297135}

{'loss': -0.674426963112571}
{'loss': -0.661239260718936, 'score': 0.6619482046256312}

{'loss': -0.6749268040512547}
{'loss': -0.6610621043613979, 'score': 0.66177251647376}
out of patience


{'loss': -0.6123872699159565}
{'loss': -0.6462088993617466, 'score': 0.6472303013498206}

{'loss': -0.6526623812588779}
{'loss': -0.6536909285045805, 'score': 0.6545525382144374}

{'loss': -0.6589557532108191}
{'loss': -0.6582692010062081, 'score': 0.659067549659595}

{'loss': -0.6633928241151752}
{'loss': -0.6597505296979632, 'score': 0.6605230370669719}

{'loss': -0.665884491891572}
{'loss': -0.6605092911493211, 'score': 0.6612761535126015}

{'loss': -0.6675896846886837}
{'loss': -0.6611030669439406, 'score': 0.6618452311597366}

{'loss': -0.6692001805160985}
{'loss': -0.6615954353695824, 'score': 0.6623249856935529}

{'loss': -0.6704814101710465}
{'loss': -0.6616827646891276, 'score': 0.6624050907353218}

{'loss': -0.6714512680516098}
{'loss': -0.6618937991914295, 'score': 0.6626144331402208}

{'loss': -0.6723177360765862}
{'loss': -0.6619056974138532, 'score': 0.6626251261032188}

{'loss': -0.673043222138376}
{'loss': -0.6620002019973028, 'score': 0.6627166154633519}

{'loss': -0.6736892700195313}
{'loss': -0.6619572866530645, 'score': 0.662667621413064}

{'loss': -0.6741780598958333}
{'loss': -0.6619542893909273, 'score': 0.6626671605094865}

{'loss': -0.6746042424982245}
{'loss': -0.6618611017862955, 'score': 0.6625745188904096}

{'loss': -0.6749932953805634}
{'loss': -0.6618491127377465, 'score': 0.6625597699759298}

{'loss': -0.6753338438091856}
{'loss': -0.6617328552972703, 'score': 0.6624479547680291}
out of patience


{'loss': -0.6119168830640388}
{'loss': -0.6468963168916249, 'score': 0.6477566532353218}

{'loss': -0.654249618992661}
{'loss': -0.6563812891642252, 'score': 0.657123089646852}

{'loss': -0.6613227613044508}
{'loss': -0.6592060270763579, 'score': 0.6599394870475033}

{'loss': -0.6650750362511837}
{'loss': -0.6604332696823847, 'score': 0.6611518939081084}

{'loss': -0.6672714695785985}
{'loss': -0.6610299519130162, 'score': 0.6617287608257032}

{'loss': -0.6690683538263494}
{'loss': -0.6614584241594587, 'score': 0.6621356004135596}

{'loss': -0.6705821644176136}
{'loss': -0.6620002928234282, 'score': 0.6626859192850906}

{'loss': -0.6716831554066051}
{'loss': -0.6620349429902577, 'score': 0.6627111307107797}

{'loss': -0.6725642811168324}
{'loss': -0.6620342617943173, 'score': 0.6627066138557203}

{'loss': -0.6732917554450758}
{'loss': -0.6621452967325846, 'score': 0.662819903955069}

{'loss': -0.6739260124437737}
{'loss': -0.6620768592471168, 'score': 0.6627542251952756}

{'loss': -0.6744427767666903}
{'loss': -0.6621070589338031, 'score': 0.6627811880545592}

{'loss': -0.674854024251302}
{'loss': -0.6620744977678571, 'score': 0.6627546400084954}

{'loss': -0.6751640088630445}
{'loss': -0.6619254520961216, 'score': 0.6625990850510902}

{'loss': -0.6754620176373106}
{'loss': -0.6619140080043248, 'score': 0.662594153382811}
out of patience


{'loss': -0.6093772194602273}
{'loss': -0.6420968373616537, 'score': 0.6431844436561969}

{'loss': -0.6513695456764915}
{'loss': -0.6555653526669457, 'score': 0.6563657789786672}

{'loss': -0.6606522993607955}
{'loss': -0.6592183340163458, 'score': 0.6600114801863083}

{'loss': -0.6650322191642992}
{'loss': -0.6602554321289062, 'score': 0.6610248227917925}

{'loss': -0.667160820238518}
{'loss': -0.6607772736322313, 'score': 0.6615402051721493}

{'loss': -0.6689512772993608}
{'loss': -0.661385627019973, 'score': 0.6621310374681424}

{'loss': -0.6703780665542141}
{'loss': -0.6617210479009719, 'score': 0.6624572650202946}

{'loss': -0.6713986021099668}
{'loss': -0.6618982042585101, 'score': 0.6626496000831839}

{'loss': -0.6722218368992661}
{'loss': -0.6619749069213867, 'score': 0.6627140344033179}

{'loss': -0.672955322265625}
{'loss': -0.6621944790794736, 'score': 0.6629343002230036}

{'loss': -0.6736216227213542}
{'loss': -0.6622289930071149, 'score': 0.6629726934910091}

{'loss': -0.6741241455078125}
{'loss': -0.6620492481050038, 'score': 0.6627860736324807}

{'loss': -0.6745347456498579}
{'loss': -0.6622138704572406, 'score': 0.6629566540465122}

{'loss': -0.6748476895419034}
{'loss': -0.6621601922171456, 'score': 0.6628959991357136}

{'loss': -0.6751399647105824}
{'loss': -0.6620621000017438, 'score': 0.662806814293468}

{'loss': -0.6753891915986032}
{'loss': -0.662010011218843, 'score': 0.6627535799302672}
out of patience
