# The BETO ensemble

## Why BETO?

I chose BETO (BERT trained on a Spanish corpus; see the Language Model notebook for more detail) because it reaches state-of-the-art results on several natural language processing tasks.

At first I ran some experiments with FastText, since it is much faster to train, but the results were not satisfactory. That is why I decided to focus entirely on BETO.

## Methods

### Preprocessing

One advantage of BETO is that it needs practically no preprocessing of the training data: feeding the model the raw data and tokenizing the sentences is enough. It would have been even more beneficial to receive the raw data rather than the preprocessed version that was provided, since that preprocessing can take away predictive power.

I also tried data augmentation on the training set, but it did not change the resulting score. The method was to *fill in* a missing word in a sentence using the pretrained language model: I generated extra sentences by completing randomly selected words from different sentences.
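
The augmentation idea can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the experiments: `augment_sentence` and `fill_mask` are hypothetical names, and `fill_mask` stands in for any function that returns the masked language model's prediction for the `[MASK]` slot. A toy predictor is used here so the example runs without a model.

```python
import random

MASK = "[MASK]"


def augment_sentence(sentence, fill_mask, rng):
    """Return a copy of `sentence` with one randomly chosen word re-predicted.

    `fill_mask` is a hypothetical callable: it receives the list of words
    with one position replaced by MASK and returns the word to put there.
    """
    words = sentence.split()
    if not words:
        return sentence
    position = rng.randrange(len(words))
    masked = words[:position] + [MASK] + words[position + 1:]
    new_word = fill_mask(masked)
    return " ".join(words[:position] + [new_word] + words[position + 1:])


def toy_fill_mask(masked_words):
    # Stand-in for a real masked language model: always predicts "tarjeta".
    return "tarjeta"


rng = random.Random(0)
augmented = augment_sentence("quiero obtener una visa credito", toy_fill_mask, rng)
print(augmented)
```

With a real model, `fill_mask` would wrap the fine-tuned language model's prediction for the masked position; the generated sentence keeps the label of the original one.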

### Language model fine-tuning

Before putting a classifier on top of BETO, I ran an unsupervised fine-tuning of the language model (see the **Language Model** notebook).

### Hyperparameter search

To understand how the different parameters that affect BETO behave (learning rate, epochs, scheduler, etc.), I used MLFlow to log runs with different settings.
Based on the literature, I selected a list of hyperparameter values and launched multiple experiments crossing them.

Every parameter combination was logged to MLFlow, where I could see which ones performed best and use them for the final training.
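
A minimal sketch of crossing hyperparameters and picking the best run, assuming each run is summarized by one validation score. The search space, the `best_run` helper and the scores below are placeholders invented for illustration; they are not values from the experiments, and no tracking server is involved.

```python
import itertools

# Hypothetical search space, for illustration only.
learning_rates = [1e-4, 1.5e-4]
warmup_steps = [1000, 2000]


def best_run(runs, metric="balanced_accuracy"):
    """Return the run (a dict) with the highest value for `metric`."""
    return max(runs, key=lambda run: run[metric])


# Every combination of learning rate and warm-up steps.
grid = list(itertools.product(learning_rates, warmup_steps))

# Placeholder scores standing in for what a tracking tool would record.
placeholder_scores = [0.80, 0.82, 0.81, 0.79]
runs = [
    {"lr": lr, "warm_up": wu, "balanced_accuracy": score}
    for (lr, wu), score in zip(grid, placeholder_scores)
]

best = best_run(runs)
print(best)
```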

The following plot shows the Balanced Accuracy curves on the validation set for the 6 best models found in the parameter search:

![Best models by Balanced Accuracy](mejores_6.png)
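
For reference, balanced accuracy is the mean of the per-class recall, so rare classes weigh as much as frequent ones. The sketch below is a plain-Python illustration with made-up labels (the `balanced_accuracy` helper is written here for the example; the notebook itself relies on `balanced_accuracy_score` from scikit-learn).

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Made-up example: the classifier always predicts the majority class.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

Plain accuracy rewards the majority-class guess, while balanced accuracy penalizes the class that is never predicted, which matters when there are many unevenly populated classes.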

### Ensemble

Once several models had been trained with the best hyperparameters found with MLFlow, I applied **hard voting** over their predictions. I tried different combinations of models and got the best result with an ensemble of 6 BETOs.
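
A minimal sketch of hard voting, assuming each model contributes one predicted label per sample. The `hard_vote` helper and the three short prediction lists are illustrative only.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote across models for each sample.

    `predictions_per_model` holds one list of predicted labels per model,
    all of the same length. Ties go to the label seen first in model order,
    which is how Counter.most_common breaks them.
    """
    voted = []
    for sample_votes in zip(*predictions_per_model):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Illustrative predictions from three models for three samples.
model_a = [294, 302, 190]
model_b = [294, 129, 190]
model_c = [34, 129, 328]

votes = hard_vote([model_a, model_b, model_c])
print(votes)
```

Using an odd number of models reduces the number of ties, although ties remain possible when more than two labels compete.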

I also tried **soft voting**, but the result did not improve.
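
Soft voting averages the class probabilities of the models instead of counting labels. The sketch below is a plain-Python illustration with made-up logits (the `softmax` and `soft_vote` helpers are written for this example): two hesitant models lean toward class 0 and one confident model picks class 1, so soft voting chooses class 1 where a majority vote would choose class 0.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(logits_per_model):
    """logits_per_model[m][i] is the logit vector of model m for sample i."""
    n_models = len(logits_per_model)
    voted = []
    for sample_logits in zip(*logits_per_model):
        probs = [softmax(logits) for logits in sample_logits]
        mean_probs = [sum(column) / n_models for column in zip(*probs)]
        voted.append(max(range(len(mean_probs)), key=mean_probs.__getitem__))
    return voted


# Made-up logits: one sample, two classes, three models.
hesitant = [[0.2, 0.0]]
confident = [[0.0, 4.0]]

soft = soft_vote([hesitant, hesitant, confident])
print(soft)
```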


## Results and conclusions

What made the difference in the score was the model ensemble. With a single model trained on the best hyperparameters, the score I had reached was around 0.85. This shows how much ensembling can add.

I would have liked to use stacking instead of hard voting, but that requires much more training time and compute, which becomes prohibitive with a model as heavy as BERT.

This solution contributes a great deal to automating the handling of questions (or any kind of text), saving resources and time. Putting a model like this into production and querying it through an API is not complicated, and the queries could be made online and in real time, sparing workers a tedious job such as classifying questions by hand.





# Code

Below is the code used to train BETO.

In [1]:
import torch
import time
import datetime
import mlflow
from transformers import get_linear_schedule_with_warmup, get_constant_schedule, get_cosine_schedule_with_warmup
from transformers import BertForSequenceClassification, AdamW, BertConfig
import itertools
from torch.utils.data import TensorDataset, random_split
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
from sklearn.metrics import balanced_accuracy_score
import pandas as pd
from transformers import BertTokenizer  # explicit import instead of a wildcard

import random
import numpy as np

seed_val = 42

random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
device = torch.device("cuda")

In [2]:
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # Round to the nearest second.
    elapsed_rounded = int(round((elapsed)))
    
    # Format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))


In [18]:
dataset = 'dataset/split_train.csv'

In [19]:
train = pd.read_csv(dataset).dropna()
val = pd.read_csv('dataset/split_val.csv').dropna()
test = pd.read_csv('dataset/test_santander.csv')

In [20]:
train.shape

(18092, 3)

In [21]:
from sklearn.preprocessing import LabelEncoder


train['Intencion'] = train.Intencion.apply(lambda x: int(x.replace('Cat_','')))
val['Intencion'] = val.Intencion.apply(lambda x: int(x.replace('Cat_','')))

le = LabelEncoder()

le.fit(train.Intencion)

# Encode every label in a single vectorized call
train['Label'] = le.transform(train.Intencion)
val['Label'] = le.transform(val.Intencion)

sentences_train = train.Pregunta.values
labels_train = train.Label.values

sentences_val = val.Pregunta.values
labels_val = val.Label.values

test_sentences = test.Pregunta.values

In [22]:
def train_step(model, optimizer, scheduler, train_dataloader, mlflow_log=True):
    # ========================================
    #               Training
    # ========================================

    # Perform one full pass over the training set.
    # Note: epoch_i and epochs are read from the notebook's global scope.

    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    t0 = time.time()
    total_train_loss = 0

    # Put the model into training mode. 
    model.train()

    # For each batch of training data...
    for step, batch in enumerate(train_dataloader):
        # Progress update every 100 batches.
        if step % 100 == 0 and not step == 0:
            elapsed = format_time(time.time() - t0)
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        
        model.zero_grad()        

        # Perform a forward pass (evaluate the model on this training batch).
        loss, logits = model(b_input_ids, 
                             token_type_ids=None, 
                             attention_mask=b_input_mask, 
                             labels=b_labels)

        # Accumulate the training loss over all of the batches
        total_train_loss += loss.item()

        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # Update optimizer and scheduler
        optimizer.step()
        scheduler.step()

    avg_train_loss = total_train_loss / len(train_dataloader)            

    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)
    
    if mlflow_log:
        mlflow.log_metric('Train Loss', avg_train_loss)

    print("")
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Training epcoh took: {:}".format(training_time))

def val_step(model, validation_dataloader, mlflow_log=True):
    print("")
    print("Running Validation...")

    t0 = time.time()

    # Put the model in evaluation mode
    model.eval()

    y_pred = []
    y_true = []

    total_eval_accuracy = 0
    total_eval_loss = 0
    nb_eval_steps = 0

    # Evaluate data for one epoch
    for batch in validation_dataloader:

        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        with torch.no_grad():        
            (loss, logits) = model(b_input_ids, 
                                   token_type_ids=None, 
                                   attention_mask=b_input_mask,
                                   labels=b_labels)

        # Accumulate the validation loss.
        total_eval_loss += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()

        pred_flat = np.argmax(logits, axis=1).flatten()
        labels_flat = label_ids.flatten()
        y_true+=labels_flat.tolist()
        y_pred+=pred_flat.tolist()

        total_eval_accuracy += flat_accuracy(logits, label_ids)    
        
    avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
    bal_acc = balanced_accuracy_score(y_true, y_pred)
    print("Balanced Accuracy: {0:.2f}".format(bal_acc))
    print("Accuracy: {0:.2f}".format(avg_val_accuracy))

    # Calculate the average loss over all of the batches.
    avg_val_loss = total_eval_loss / len(validation_dataloader)

    if mlflow_log:

        mlflow.log_metric("Balanced Acc", bal_acc)
        mlflow.log_metric("Acc", avg_val_accuracy)


    # Measure how long the validation run took.
    validation_time = format_time(time.time() - t0)
    print("  Validation Loss: {0:.2f}".format(avg_val_loss))
    print("  Validation took: {:}".format(validation_time))




    

In [23]:
def prepare_training(pretrained, sentences_train, labels_train, sentences_val,
                     labels_val, epochs, batch_size,
                     max_len, warm_up, lr, freeze=False):
    
    num_labels = len(np.unique(labels_train))
    input_ids = []
    attention_masks = []
    
    tokenizer = BertTokenizer.from_pretrained(pretrained)

    for sent in sentences_train:
        encoded_dict = tokenizer.encode_plus(
                            sent,                      # Sentence to encode.
                            add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                            max_length = max_len,           # Pad & truncate all sentences.
                            pad_to_max_length = True,
                            return_attention_mask = True,   # Construct attn. masks.
                            return_tensors = 'pt',     # Return pytorch tensors.
                       )
        input_ids.append(encoded_dict['input_ids'])
        attention_masks.append(encoded_dict['attention_mask'])

    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    labels = torch.tensor(labels_train)

    # Print sentence 0, now as a list of IDs.
    print('Original: ', sentences_train[0])
    print('Token IDs:', input_ids[0])

    # Combine the training inputs into a TensorDataset.
    train_dataset = TensorDataset(input_ids, attention_masks, labels)

    # The same for validation:
    
    input_ids = []
    attention_masks = []

    for sent in sentences_val:
        encoded_dict = tokenizer.encode_plus(
                            sent,                      # Sentence to encode.
                            add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                            max_length = max_len,           # Pad & truncate all sentences.
                            pad_to_max_length = True,
                            return_attention_mask = True,   # Construct attn. masks.
                            return_tensors = 'pt',     # Return pytorch tensors.
                       )
        input_ids.append(encoded_dict['input_ids'])
        attention_masks.append(encoded_dict['attention_mask'])

    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    labels = torch.tensor(labels_val)

    # Print sentence 0, now as a list of IDs.
    print('Original val: ', sentences_val[0])
    print('Token IDs: val', input_ids[0])

    # Combine the validation inputs into a TensorDataset.
    val_dataset = TensorDataset(input_ids, attention_masks, labels)

    print('{:>5,} training samples'.format(len(sentences_train)))
    print('{:>5,} validation samples'.format(len(sentences_val)))

    # Create the DataLoaders for our training and validation sets.
    # We'll take training samples in random order. 
    train_dataloader = DataLoader(
                train_dataset,  # The training samples.
                sampler = RandomSampler(train_dataset), # Select batches randomly
                batch_size = batch_size # Trains with this batch size.
            )

    # For validation the order doesn't matter, so we'll just read them sequentially.
    validation_dataloader = DataLoader(
                val_dataset, # The validation samples.
                sampler = SequentialSampler(val_dataset), # Pull out batches sequentially.
                batch_size = batch_size # Evaluate with this batch size.
            )

    model = BertForSequenceClassification.from_pretrained(
        pretrained, 
        num_labels = num_labels,  
        output_attentions = False,
        output_hidden_states = False
    )
    model.cuda()

    optimizer = AdamW(model.parameters(),
                      lr = lr
                    )

    total_steps = len(train_dataloader) * epochs
    scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=warm_up,
                                                num_training_steps = total_steps)
    
    return model, optimizer, scheduler, train_dataloader, validation_dataloader


    

In [24]:
train.shape

(18092, 4)

# Hyperparameter search with MLFlow

In [None]:
remote_server_uri = "http://localhost:5000" # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)
mlflow.set_experiment("/beto_pretrained_lm_20")

epochs = 18
batch_size = 32
max_len = 48
pretrained = './BETOcased-20-epochs/'
num_labels = len(train.Label.unique())


for lr, warm_up in itertools.product([1.8e-4, 1.7e-4,1.6e-4], [1000, 2000]):
    with mlflow.start_run():
        print(f"training with {lr}, {max_len}, {warm_up}")
        mlflow.log_param('lr', lr)
        mlflow.log_param('max_len', max_len)
        mlflow.log_param('warm_up', warm_up)
        mlflow.log_param('model', pretrained)
        mlflow.log_param('dataset', dataset)
        
        train_objects = prepare_training(pretrained, 
                                         sentences_train,labels_train, 
                                         sentences_val, labels_val,
                                         epochs=epochs, 
                                         batch_size=batch_size,
                                         max_len=max_len,
                                         warm_up=warm_up,
                                         lr=lr, 
                                         freeze=False)
        model = train_objects[0]
        optimizer = train_objects[1]
        scheduler = train_objects[2]
        train_dataloader = train_objects[3]
        validation_dataloader = train_objects[4]
        
        for epoch_i in range(0, epochs):
            train_step(model, optimizer, scheduler, train_dataloader)
            val_step(model,validation_dataloader)

        print("")
        print("Training complete!")


training with 0.00018, 48, 1000
Original:  quiero obtener una supercuenta. se puede hacer a trabves d ela web
Token IDs: tensor([    4,  1937,  4251,  1108,  1843, 11172,  1009,  1062,  1499,  1409,
         1013,  1211, 30946,  2347,  1116,  1040, 30932,  1004,  3953,     5,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1])
Original val:  una consulta sobre debito desde mi cuenta
Token IDs: val tensor([   4, 1108, 6905, 1269, 5452, 1071, 1668, 1153, 1971,    5,    1,    1,
           1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
           1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,
           1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1,    1])
18,092 training samples
2,011 validation samples

Training...
  Batch   100  of    566.    Elapsed: 0:00:22.
  Batch   200  of    566.    Elapsed: 0:00:44.
  Batch   300  of    566.    Elapsed: 0:01:06.
  Batch   400  of    566.    Elapsed: 0:01:28.
  Batch   500  of    566.    Elapsed: 0:01:51.

  Average training loss: 0.00
  Training epcoh took: 0:02:05

Running Validation...
Balanced Accuracy: 0.87
Accuracy: 0.90
  Validation Loss: 0.59
  Validation took: 0:00:04

Training...
  Batch   100  of    566.    Elapsed: 0:00:22.
  Batch   200  of    566.    Elapsed: 0:00:44.
  Batch   300  of    566.    Elapsed: 0:01:06.
  Batch   400  of    566.    Elapsed: 0:01:28.
  Batch   500  of    566.    Elapsed: 0:01:50.

  Average training loss: 0.00
  Training epcoh took: 0:02:05

Running Validation...
Balanced Accuracy: 0.87
Accuracy: 0.90
  Validation Loss: 0.59
  Validation took: 0:00:04

Training complete!
training with 0.00018, 48, 2000
Original:  quiero obtener una supercuenta. se puede hacer a trabves d ela web
Token IDs: tensor([    4,  1937,  425

# Final Train

In [None]:
sentences_train

In [74]:
full_sentences = np.array(sentences_train.tolist() + sentences_val.tolist())

In [75]:
labels_val

array([110, 216, 129, ...,  34, 294, 234])

In [76]:
full_labels = np.concatenate((labels_train, labels_val))


In [78]:
len(full_sentences), len(full_labels), len(np.unique(full_labels))

(20103, 20103, 351)

In [81]:
import random
random.seed(1)
c = list(zip(full_sentences, full_labels))

random.shuffle(c)

full_sentences, full_labels = zip(*c)

In [83]:
remote_server_uri = "http://localhost:5000" # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)
mlflow.set_experiment("/beto_pretrained_lm_20")

epochs = 18
batch_size = 32
max_len = 48
pretrained = './BETOcased-20-epochs/'
num_labels = len(np.unique(full_labels))
lr = 0.00018
warm_up = 1000

train_objects = prepare_training(pretrained, 
                                 full_sentences, full_labels,
                                 full_sentences,full_labels, 
                                 epochs=epochs, 
                                 batch_size=batch_size,
                                 max_len=max_len,
                                 warm_up=warm_up,
                                 lr=lr, 
                                 freeze=False)
model = train_objects[0]
optimizer = train_objects[1]
scheduler = train_objects[2]
train_dataloader = train_objects[3]
#validation_dataloader = train_objects[4]

for epoch_i in range(0, epochs):
    train_step(model, optimizer, scheduler, train_dataloader)
    #val_step(model,validation_dataloader)

print("")
print("Training complete!")


Original:  no me deja extraer dolares por cajero lo dice que no se puede realizar la transacción
Token IDs: tensor([    4,  1084,  1129,  5113, 17522, 28224,  1096,  1285,  3821,  1114,
         2429,  1038,  1084,  1062,  1499,  3335,  1030, 17590,     5,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1])
Original val:  no me deja extraer dolares por cajero lo dice que no se puede realizar la transacción
Token IDs: val tensor([    4,  1084,  1129,  5113, 17522, 28224,  1096,  1285,  3821,  1114,
         2429,  1038,  1084,  1062,  1499,  3335,  1030, 17590,     5,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1])
20,103 training sample

In [84]:
import os

In [85]:
model_path = f'models/beto_lm-20_lr-{lr}_wu-{warm_up}_epochs-{epochs}'
model_path

'models/beto_lm-20_lr-0.00018_wu-1000_epochs-18'

In [86]:
os.makedirs(model_path)
model.save_pretrained(model_path)

# Ensemble and testing

In [30]:
test_sentences

array(['querer saber tarjeta sin limite',
       '¿cuál es el límite de mi tarjeta santander?',
       'hay beneficios en restaurantes de la costa atlántica?', ...,
       'quiero pagar de mi open credit un poquito mas del minimo',
       'nesecito imprimir mi resumen tarjeta de credito va',
       'quiero obtener una visa credito'], dtype=object)

In [31]:
# Tokenize all of the sentences and map the tokens to their word IDs.
input_ids = []
attention_masks = []

batch_size = 32
max_len = 48
pretrained = './BETOcased-20-epochs/'

tokenizer = BertTokenizer.from_pretrained(pretrained)
# For every sentence...
for sent in test_sentences:
    encoded_dict = tokenizer.encode_plus(
                        sent,                      # Sentence to encode.
                        add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                        max_length = max_len,           # Pad & truncate all sentences.
                        pad_to_max_length = True,
                        return_attention_mask = True,   # Construct attn. masks.
                        return_tensors = 'pt',     # Return pytorch tensors.
                   )
    input_ids.append(encoded_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])

# Convert the lists into tensors.
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)

# Print sentence 0, now as a list of IDs.
print('Original: ', test_sentences[0])
print('Token IDs:', input_ids[0])

print(len(input_ids), len(attention_masks))

# Combine the test inputs into a TensorDataset.
test_dataset = TensorDataset(input_ids, attention_masks)

Original:  querer saber tarjeta sin limite
Token IDs: tensor([    4,  9312,  2486,  7929,  1477, 28290,     5,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1])
6702 6702


In [32]:
test_dataloader = DataLoader(
                    test_dataset, # The test samples.
                    sampler = SequentialSampler(test_dataset), # Pull out batches sequentially.
                    batch_size = 128 # Evaluate with this batch size.
                )

In [33]:
#df = pd.read_csv('dataset/test_santander.csv')

In [87]:
from torch import nn
model_paths = ['models/beto_lm-20_lr-0.00018_wu-1000_epochs-18',
               'models/beto_lm-20_lr-0.0001_wu-1000_epochs-18/',
               'models/beto_lm-20_lr-0.00015_wu-1000_epochs-18',
               'models/beto_lm-20_lr-0.00015_wu-2000_epochs-18',
               'models/beto_lm-20_lr-0.0001_wu-2000_epochs-18']



## Soft voting test (did not improve)

In [88]:
total_preds = []
for pretrained in model_paths:
    print('Predicting labels for {:,} test sentences...'.format(len(test_dataloader.dataset)))

    # Put model in evaluation mode
    model = BertForSequenceClassification.from_pretrained(
        pretrained, 
        num_labels = num_labels,  
        output_attentions = False,
        output_hidden_states = False
    )
    model.cuda()
    model.eval()

    # Tracking variables 
    predictions = []

    # Predict
    for batch in test_dataloader:
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)

        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask = batch

        # Telling the model not to compute or store gradients, saving memory and
        # speeding up prediction
        with torch.no_grad():
            # Forward pass, calculate logit predictions
            outputs = model(b_input_ids, token_type_ids=None,
                            attention_mask=b_input_mask)

        logits = outputs[0]

        # Move logits to CPU
        logits = logits.detach().cpu().numpy()

        # Store the logits for this batch
        predictions.append(logits)

    print('DONE.')
    # Combine the results across all batches and turn logits into class probabilities.
    # torch.softmax with an explicit dim replaces the bare `softmax` call,
    # which has no visible import in this notebook.
    flat_predictions = torch.softmax(torch.tensor(np.concatenate(predictions, axis=0)), dim=1)
    total_preds.append(flat_predictions)

Predicting labels for 6,702 test sentences...
DONE.
Predicting labels for 6,702 test sentences...
DONE.
Predicting labels for 6,702 test sentences...
DONE.
Predicting labels for 6,702 test sentences...
DONE.
Predicting labels for 6,702 test sentences...
DONE.


In [89]:
len(total_preds)

5

In [90]:
total_preds = torch.stack(total_preds).mean(dim=0).numpy()

In [91]:
#soft_vot = np.mean(total_preds,axis=0)
flat_predictions_bin = np.argmax(total_preds, axis=1).flatten()

In [93]:
flat_predictions_bin

array([294, 294, 302, ..., 190, 129, 328])

In [100]:
total_preds = []

print('Predicting labels for {:,} test sentences...'.format(len(test_dataloader.dataset)))

model.eval()

# Tracking variables 
predictions = []

# Predict 
for batch in test_dataloader:
    # Add batch to GPU
    batch = tuple(t.to(device) for t in batch)

    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask = batch

    # Telling the model not to compute or store gradients, saving memory and
    # speeding up prediction
    with torch.no_grad():
        # Forward pass, calculate logit predictions
        outputs = model(b_input_ids, token_type_ids=None,
                        attention_mask=b_input_mask)

    logits = outputs[0]

    # Move logits to CPU
    logits = logits.detach().cpu().numpy()

    # Store the logits for this batch
    predictions.append(logits)

print('DONE.')
# Combine the results across all batches. 
flat_predictions = np.concatenate(predictions, axis=0)

Predicting labels for 6,702 test sentences...
DONE.


In [None]:
soft_vot = np.max(total_preds,axis=0)
flat_predictions_bin = np.argmax(soft_vot, axis=1).flatten()

In [101]:


# For each sample, pick the class with the highest score.
flat_predictions_bin = np.argmax(flat_predictions, axis=1).flatten()

In [102]:
flat_predictions_bin

array([294, 294, 302, ..., 190, 129, 328])

In [103]:
res_test = le.inverse_transform(flat_predictions_bin)

In [104]:
test['Intencion'] = res_test

In [105]:
test[test.Intencion == 34]

Unnamed: 0,id,Pregunta,Intencion
1549,1549,queria saber cómo conseguir mi iban,34
1832,1832,cuál es el iban del banco xxxxx? (del exterior),34
2636,2636,donde cargo el swift?,34
4402,4402,necesito toda la info de abba swift code etc,34
5791,5791,cuál es el aba del banco xxxxx? (del exterior),34
6442,6442,quiero saber mi número de c b u,34


In [106]:
model_path

'models/beto_lm-20_lr-0.00018_wu-1000_epochs-18'

In [107]:
pd.Series(res_test).to_csv(f'{model_path}.csv', header=None)



# Voting Hard

In [115]:
# Best voting: 1,2,3, 6,7,8

test_1 = pd.read_csv('beto_aug_2.csv', header=None)
test_2 = pd.read_csv('beto_best_64_mlflow.csv',header=None)
test_3 = pd.read_csv('beto_best_mlflow.csv',header=None)
test_4 = pd.read_csv('beto_test_20_epochs.csv',header=None)
test_5 = pd.read_csv('beto_uncased_test_21_epochs.csv',header=None)
test_6 = pd.read_csv('models/beto_lm-20_lr-0.0001_wu-2000_epochs-18.csv', header=None)
test_7 = pd.read_csv('models/beto_lm-20_lr-0.00015_wu-2000_epochs-18.csv', header=None)
test_8 = pd.read_csv('models/beto_lm-20_lr-0.00015_wu-1000_epochs-18.csv', header=None)
test_9 = pd.read_csv('models/beto_lm-20_lr-0.0001_wu-1000_epochs-18.csv', header=None)
test_10 = pd.read_csv('models/beto_lm-20_lr-0.00018_wu-1000_epochs-18.csv', header=None)
test_11 = pd.read_csv('voting_best_6.csv',header=None)

In [None]:
np.unique(np.stack((test_1[1],test_2[1],test_3[1],test_4[1],test_5[1],test_6[1])),axis=0,return_counts=1)

In [116]:
#stacked_preds = np.stack((test_1[1],test_2[1],test_3[1],test_4[1],test_5[1],test_6[1]))
stacked_preds = np.stack((test_11[1], test_9[1],test_10[1]))

In [117]:
from collections import Counter
res = []
for i in range(stacked_preds.shape[1]):
    res.append(Counter(stacked_preds[:,i]).most_common()[0][0])

In [118]:
pd.Series(res).to_csv('voting_best_6_and_9-10.csv', header=None)



In [135]:
max(full_labels)

350

In [136]:
val, counts = np.unique(res,return_counts=True)
train_val, train_counts = np.unique(full_labels,return_counts=True)

In [None]:
from sklearn.metrics import classification_report

In [None]:
print(classification_report(res_test, test_6[1]))