
# JCC: BERT with hyperparameter tuning

This notebook presents the experiments with the *advanced* representation strategy and learning technique used for the JCC at the Universidad Nacional de La Plata.

To do so, we take the query text of the emails and apply BERT with a model pre-trained on Spanish (BETO, from the Department of Computer Science of the Universidad de Chile).


## Library installation

The following libraries are installed, since not all of them are available in the Colab environment:
- *simpletransformers* (to train BERT),
- requests (to fetch the project's own functions from GitHub),
- wget (to download files),
- wandb (for hyperparameter optimization).

In [None]:
!pip install simpletransformers
!pip install requests
!pip install wget
!pip install wandb

### Utility functions

Utility functions for loading and balancing the dataset are loaded from the repository https://github.com/jumafernandez/clasificacion_correos.

In [None]:
import requests

# Request the raw version of the Python script
url = 'https://raw.githubusercontent.com/jumafernandez/clasificacion_correos/main/scripts/jcc/funciones_dataset.py'
r = requests.get(url)
r.raise_for_status()  # fail early instead of saving an HTTP error page as a .py file

# Save it to the working directory
with open('funciones_dataset.py', 'w', encoding='utf-8') as f:
    f.write(r.text)

# Import the functions used in this notebook
from funciones_dataset import get_clases, cargar_dataset, separar_x_y_rna

The text-preprocessing function used in the other models is also loaded from the repository: https://github.com/jumafernandez/clasificacion_correos.
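The cell below, like the previous one, follows a download, save and import pattern. A minimal sketch of a reusable helper for it, assuming only the Python standard library; `download_text` and `save_and_import` are hypothetical names, not functions from the repository. The helper also forces a fresh import, so re-running a cell after the script changes on GitHub picks up the new version instead of the cached module:

```python
import importlib
import sys
import urllib.request
from pathlib import Path
from types import ModuleType


def download_text(url: str, timeout: float = 30.0) -> str:
    """Return the body of `url` decoded as UTF-8 (e.g. a raw.githubusercontent.com script).

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so an error
    page is never returned as if it were the script.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def save_and_import(source: str, module_name: str, directory: str = ".") -> ModuleType:
    """Write `source` to <directory>/<module_name>.py and import it from scratch."""
    path = Path(directory) / f"{module_name}.py"
    path.write_text(source, encoding="utf-8")
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()       # make the import system notice the new file
    sys.modules.pop(module_name, None)  # forget any copy imported by an earlier run of the cell
    return importlib.import_module(module_name)
```

In the notebook, a cell could then call `save_and_import(download_text(url), 'funciones_preprocesamiento')` and keep the same `from funciones_preprocesamiento import ...` line afterwards.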

In [None]:
import requests

# Request the raw version of the Python script
url = 'https://raw.githubusercontent.com/jumafernandez/clasificacion_correos/main/scripts/jcc/funciones_preprocesamiento.py'
r = requests.get(url)
r.raise_for_status()  # fail early instead of saving an HTTP error page as a .py file

# Save it to the working directory
with open('funciones_preprocesamiento.py', 'w', encoding='utf-8') as f:
    f.write(r.text)

# Import the function used in this notebook
from funciones_preprocesamiento import preprocesar_correos_bert

## Loading the email dataset

The data is loaded and the classes are balanced:
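The repository's `cargar_dataset` is not reproduced here. Its arguments (`CANTIDAD_CLASES = 4` and `'Otras Consultas'`) and the four classes printed later suggest that it keeps the most frequent classes and groups the rest under a catch-all label. The following is a minimal pandas sketch of that kind of grouping, written under that assumption; `collapse_to_top_classes` is a hypothetical helper, not the repository's code:

```python
import pandas as pd


def collapse_to_top_classes(df: pd.DataFrame, label_col: str,
                            n_classes: int, other_label: str) -> pd.DataFrame:
    """Keep the (n_classes - 1) most frequent labels and map every other label
    to other_label, so the result has at most n_classes distinct labels.

    Hypothetical helper: an assumption about what cargar_dataset might do,
    not the repository's implementation.
    """
    counts = df[label_col].value_counts()
    # Rank the real classes only; the catch-all label is always kept
    top = [label for label in counts.index if label != other_label][: n_classes - 1]
    out = df.copy()
    out[label_col] = out[label_col].where(out[label_col].isin(top), other_label)
    return out


# Toy example: five classes collapsed into four
toy = pd.DataFrame({'clase': ['a'] * 5 + ['b'] * 4 + ['c'] * 3 + ['d'] * 2 + ['e']})
collapsed = collapse_to_top_classes(toy, 'clase', 4, 'Otras Consultas')
print(collapsed['clase'].value_counts().to_dict())
```

Grouping alone does not equalize class counts; if `cargar_dataset` also balances the classes, that step is not covered by this sketch.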

In [None]:
from funciones_dataset import get_clases, cargar_dataset
from os import path
import warnings
warnings.filterwarnings("ignore")

# Number of classes
CANTIDAD_CLASES = 4

# Data location constants
DS_DIR = 'https://raw.githubusercontent.com/jumafernandez/clasificacion_correos/main/data/consolidado_jcc/'
TRAIN_FILE = 'correos-train-80.csv'
TEST_FILE = 'correos-test-20.csv'

# Download the files only if they are not already in the working directory
download_files = not path.exists(TRAIN_FILE)

etiquetas = get_clases()
train_df, test_df, etiquetas = cargar_dataset(DS_DIR, TRAIN_FILE, TEST_FILE, download_files, 'clase', etiquetas, CANTIDAD_CLASES, 'Otras Consultas')


Se inicia descarga de los datasets.

El conjunto de entrenamiento tiene la dimensión: (800, 24)
El conjunto de testeo tiene la dimensión: (200, 24)


### Data preprocessing

The data is prepared for training BERT:
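The next cell converts class names to ids with `.replace()`, which silently leaves any value missing from `dict_clases_id` unchanged, so a misspelled class name would reach simpletransformers as a string label. A minimal sketch of a stricter alternative, assuming the same fixed mapping; `encode_labels` and `decode_labels` are illustrative helpers, not part of simpletransformers. The inverse mapping is useful for turning predicted ids back into class names:

```python
import pandas as pd

# Fixed class-to-id mapping, copied from dict_clases_id in the next cell
CLASS_TO_ID = {
    'Otras Consultas': 0,
    'Ingreso a la Universidad': 1,
    'Boleto Universitario': 2,
    'Requisitos de Ingreso': 3,
}
ID_TO_CLASS = {class_id: name for name, class_id in CLASS_TO_ID.items()}


def encode_labels(labels: pd.Series) -> pd.Series:
    """Map class names to integer ids, raising if any name is not in CLASS_TO_ID."""
    ids = labels.map(CLASS_TO_ID)  # unknown names become NaN instead of passing through
    unknown = labels[ids.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Classes missing from CLASS_TO_ID: {list(unknown)}")
    return ids.astype(int)


def decode_labels(ids) -> list:
    """Turn predicted ids back into class names."""
    return [ID_TO_CLASS[int(i)] for i in ids]


print(encode_labels(pd.Series(['Boleto Universitario', 'Otras Consultas'])).tolist())  # [2, 0]
```

`encode_labels(train_df['labels'])` could stand in for the `.replace()` calls, and `decode_labels` could be applied to the predicted ids returned by `model.predict`.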

In [None]:
# Keep only the query text and its class, renamed to the column names simpletransformers expects
train_df = train_df[['Consulta', 'clase']].copy()
train_df.columns = ['text', 'labels']
test_df = test_df[['Consulta', 'clase']].copy()
test_df.columns = ['text', 'labels']

# Replace the integer codes with the class names
train_df['labels'] = etiquetas[train_df['labels']]
test_df['labels'] = etiquetas[test_df['labels']]

# Convert the names back to ids 0..N-1 to avoid conflicts with simpletransformers
# (this mapping is fixed for these experiments)
dict_clases_id = {'Otras Consultas': 0,
                  'Ingreso a la Universidad': 1,
                  'Boleto Universitario': 2,
                  'Requisitos de Ingreso': 3}

train_df['labels'] = train_df['labels'].replace(dict_clases_id)
test_df['labels'] = test_df['labels'].replace(dict_clases_id)

# Print a summary
print('Existen {} clases: {}.'.format(len(train_df.labels.unique()), train_df.labels.unique()))

Existen 4 clases: [0 1 2 3].


## Choosing a pre-trained monolingual model

The *simpletransformers* library is built on top of HuggingFace's *Transformers* library. This makes it possible to use any of the pre-trained models available in the [Transformers library](https://huggingface.co/transformers/pretrained_models.html), which are contributed by the whole developer community. The available models can be browsed at [https://huggingface.co/models](https://huggingface.co/models).

In our case, we use the `dccuchile/bert-base-spanish-wwm-cased` model, pre-trained by a team of researchers from the Department of Computer Science of the Universidad de Chile.

### Hyperparameter optimization

A wandb Sweep is used for hyperparameter optimization.

For more information:
https://simpletransformers.ai/docs/tips-and-tricks/#hyperparameter-optimization
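A sweep agent launches one run per sampled configuration and exposes the sampled values inside that run as `wandb.config`. Those values only affect training if they override the model arguments; otherwise every run trains with the same settings. In the training output further down, every run reports the same `global_step`, `mcc` and `eval_loss` regardless of the sampled `learning_rate` and `num_train_epochs`, which is consistent with that failure mode. Conceptually the override is a dictionary merge; in simpletransformers it is requested by passing `sweep_config=wandb.config` to `ClassificationModel`. A minimal sketch of the merge itself, with a hypothetical helper and example default values:

```python
def merge_sweep_values(base_args: dict, sweep_values: dict) -> dict:
    """Return a copy of base_args in which every sampled value overrides the default."""
    merged = dict(base_args)  # copy, so the shared base args are not mutated between runs
    merged.update(sweep_values)
    return merged


# Example defaults (assumed values, for illustration only)
base_args = {'train_batch_size': 16, 'num_train_epochs': 1, 'learning_rate': 4e-5}
# Values one sweep run might receive through wandb.config
sampled = {'num_train_epochs': 3, 'learning_rate': 3.3e-4}

run_args = merge_sweep_values(base_args, sampled)
print(run_args)
```

Copying the base dictionary matters because the same `train_args` object is reused by every run the agent launches in the same process.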

In [None]:
import wandb

sweep_config = {
    "method": "bayes",  # other options: "grid", "random"
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 4]},
        "learning_rate": {"min": 4e-5, "max": 4e-4},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Bert_JCC2021")


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Create sweep with ID: 5ocb1k9j
Sweep URL: https://wandb.ai/jumafernandez/Bert_JCC2021/sweeps/5ocb1k9j


Logging configuration for tracking the model's training:
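The next cell relies on Python's logger hierarchy: `basicConfig(level=logging.INFO)` lets INFO records through at the root, while setting the `transformers` logger to WARNING filters INFO records from that logger and from its children (such as `transformers.modeling_utils`). A standard-library-only sketch that captures records in a list to show the effect; the `ListHandler` class is illustrative, and the real transformers package also configures its own logger, so this only approximates what happens in the notebook:

```python
import logging


class ListHandler(logging.Handler):
    """Store 'logger:LEVEL:message' strings so the filtering can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}:{record.levelname}:{record.getMessage()}")


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)  # the level basicConfig(level=logging.INFO) sets on the root logger

logging.getLogger("transformers").setLevel(logging.WARNING)

logging.getLogger("simpletransformers.classification.classification_model").info("shown")
logging.getLogger("transformers").info("filtered")
logging.getLogger("transformers.modeling_utils").info("filtered: child inherits WARNING")
logging.getLogger("transformers").warning("shown")

print(handler.messages)
```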

In [None]:
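import logging

# Show INFO-level messages (such as simpletransformers progress)...
logging.basicConfig(level=logging.INFO)
# ...but only warnings and errors from the transformers library
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)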

### Classification model definition

The training hyperparameters for the pre-trained BETO model are defined:
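`train_batch_size` and `eval_batch_size` determine how many steps each pass over the data takes. With the 800 training and 200 test examples reported above, that gives 50 training steps per epoch and 25 evaluation batches; 50 is the `global_step` value a single-epoch run ends at. A small sketch of the arithmetic, assuming no gradient accumulation and that the last partial batch is kept (the usual PyTorch `DataLoader` default):

```python
import math


def steps_per_epoch(n_examples: int, batch_size: int) -> int:
    """Batches in one pass over the data, counting a final partial batch."""
    return math.ceil(n_examples / batch_size)


def total_training_steps(n_examples: int, batch_size: int, num_epochs: int) -> int:
    """Optimizer steps for a full run, assuming one step per batch (no gradient accumulation)."""
    return steps_per_epoch(n_examples, batch_size) * num_epochs


print(steps_per_epoch(800, 16))  # 50 training batches per epoch
print(steps_per_epoch(200, 8))   # 25 evaluation batches
for epochs in (1, 2, 3, 4):      # a single epoch, plus the epoch values in the sweep
    print(epochs, total_training_steps(800, 16, epochs))
```

A run that actually trained for the sampled 2 to 4 epochs would therefore end at a `global_step` between 100 and 200, while 50 corresponds to one epoch.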

In [None]:
from simpletransformers.classification import ClassificationModel

# Base hyperparameters shared by every sweep run
train_args = {
    'overwrite_output_dir': True,
    'reprocess_input_data': True,
    'evaluate_during_training': True,
    'fp16': True,
    'do_lower_case': False,  # the BETO checkpoint is cased, so the input is not lowercased
    'use_early_stopping': True,
    'manual_seed': 4,
    'use_multiprocessing': True,
    'train_batch_size': 16,
    'eval_batch_size': 8,
    'wandb_project': 'Bert_JCC2021',
}

## Training

The model is trained on the training set using these hyperparameters:
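`eval_model` reports `mcc`, the Matthews correlation coefficient, next to `eval_loss`. It ranges from -1 to 1, with 0 meaning no better than chance, and unlike accuracy it is not inflated by always predicting the majority class. For more than two classes, a common generalization (the one scikit-learn's `matthews_corrcoef` implements) is computed from label counts. A dependency-free sketch of that formula; `multiclass_mcc` is an illustrative helper, not the function simpletransformers calls:

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def multiclass_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient computed from label counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # times each class truly occurs
    p_k = Counter(y_pred)                             # times each class is predicted
    classes = set(t_k) | set(p_k)
    cov_true_pred = c * s - sum(t_k[k] * p_k[k] for k in classes)
    cov_pred_pred = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_true_true = s * s - sum(t_k[k] ** 2 for k in classes)
    if cov_pred_pred == 0 or cov_true_true == 0:
        return 0.0  # undefined when all predictions (or all true labels) share one class
    return cov_true_pred / math.sqrt(cov_pred_pred * cov_true_true)


y_true = [0, 1, 2, 3, 0, 1]
print(multiclass_mcc(y_true, y_true))   # 1.0: perfect predictions
print(multiclass_mcc(y_true, [0] * 6))  # 0.0: always predicting class 0
```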

In [None]:
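import torch
from simpletransformers.classification import ClassificationModel

def train():
    # Start a wandb run; the sweep agent fills wandb.config with the sampled hyperparameters
    wandb.init()

    # Create the ClassificationModel
    model = ClassificationModel(
        model_type='bert',
        # model_name='bert-base-multilingual-cased',
        model_name='dccuchile/bert-base-spanish-wwm-cased',
        num_labels=CANTIDAD_CLASES,
        use_cuda=torch.cuda.is_available(),  # use the GPU when the runtime has one
        args=train_args,
        # Apply the sampled num_train_epochs and learning_rate; without this,
        # every run trains with the same default values
        sweep_config=wandb.config,
    )

    # Train the model, evaluating on test_df during training
    model.train_model(train_df, eval_df=test_df)

    # Evaluate the model
    model.eval_model(test_df)

    # Finish (and sync) the wandb run
    wandb.finish()

wandb.agent(sweep_id, train)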

INFO:wandb.agents.pyagent:Starting sweep agent: entity=None, project=None, count=None
[34m[1mwandb[0m: Agent Starting Run: p01pmrsz with config:
[34m[1mwandb[0m: 	learning_rate: 0.0003325194503654975
[34m[1mwandb[0m: 	num_train_epochs: 3
[34m[1mwandb[0m: Currently logged in as: [33mjumafernandez[0m (use `wandb login --relogin` to force relogin)


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to b

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 979.0 |
| _timestamp | 1617661627.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: k0rxcw3q with config:
[34m[1mwandb[0m: 	learning_rate: 0.00024445254549075417
[34m[1mwandb[0m: 	num_train_epochs: 3


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchi

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 964.0 |
| _timestamp | 1617662600.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: 52ypeu77 with config:
[34m[1mwandb[0m: 	learning_rate: 0.00013388952193487948
[34m[1mwandb[0m: 	num_train_epochs: 2


INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 983.0 |
| _timestamp | 1617663594.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: a9j6d8p8 with config:
[34m[1mwandb[0m: 	learning_rate: 0.00038333537789076426
[34m[1mwandb[0m: 	num_train_epochs: 4


INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 967.0 |
| _timestamp | 1617664571.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: wy7r01zy with config:
[34m[1mwandb[0m: 	learning_rate: 4.715089759221623e-05
[34m[1mwandb[0m: 	num_train_epochs: 2


INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 988.0 |
| _timestamp | 1617665571.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: g23hs38y with config:
[34m[1mwandb[0m: 	learning_rate: 0.00016629698128897924
[34m[1mwandb[0m: 	num_train_epochs: 4
[34m[1mwandb[0m: wandb version 0.10.25 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchi

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 980.0 |
| _timestamp | 1617666563.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: e727o60u with config:
[34m[1mwandb[0m: 	learning_rate: 0.0003338003468045088
[34m[1mwandb[0m: 	num_train_epochs: 2
[34m[1mwandb[0m: wandb version 0.10.25 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchi

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 1007.0 |
| _timestamp | 1617667582.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: kl98fv9z with config:
[34m[1mwandb[0m: 	learning_rate: 0.00011282659154906465
[34m[1mwandb[0m: 	num_train_epochs: 2
[34m[1mwandb[0m: wandb version 0.10.25 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchi

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 985.0 |
| _timestamp | 1617668580.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


[34m[1mwandb[0m: Agent Starting Run: j32qilps with config:
[34m[1mwandb[0m: 	learning_rate: 0.0002744680937267031
[34m[1mwandb[0m: 	num_train_epochs: 2
[34m[1mwandb[0m: wandb version 0.10.25 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchi

  0%|          | 0/800 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/50 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


  0%|          | 0/200 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.


  0%|          | 0/200 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_dev_bert_128_4_2


Running Evaluation:   0%|          | 0/25 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}



Run summary:

| Metric | Value |
|---|---|
| Training loss | 0.52839 |
| lr | 0.0 |
| global_step | 50.0 |
| _runtime | 1016.0 |
| _timestamp | 1617669611.0 |
| _step | 4.0 |
| mcc | 0.66678 |
| train_loss | 0.52839 |
| eval_loss | 0.61826 |


Run history:

| Metric | Trend |
|---|---|
| Training loss | ▁ |
| lr | ▁ |
| global_step | ▁▁ |
| _runtime | ▁▄███ |
| _timestamp | ▁▄███ |
| _step | ▁▃▅▆█ |
| mcc | ▁ |
| train_loss | ▁ |
| eval_loss | ▁ |
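Run j32qilps (learning_rate 0.0002744680937267031, num_train_epochs 2) ends with exactly the same evaluation metrics as the run before it (mcc 0.6667790263652669, eval_loss 0.6182582986354828), and its progress bar shows `Running Epoch 0 of 1`. This may indicate that the sweep values are not reaching the model arguments, so it is worth checking. The following is a minimal sketch and is not part of the original notebook. It assumes the cell output has been captured as a string and extracts the metric dict from each `classification_model:{...}` log line so that runs can be compared programmatically. `EVAL_LINE` and `parse_eval_metrics` are hypothetical helper names.

```python
import ast
import re

# Hypothetical helper: matches the metrics line simpletransformers logs after evaluation,
# e.g. "INFO:simpletransformers.classification.classification_model:{'mcc': ..., 'eval_loss': ...}".
# Lines such as "classification_model: Training of bert model complete." do not match,
# because the "{" must come right after the colon.
EVAL_LINE = re.compile(r"classification_model:(\{.*\})\s*$")


def parse_eval_metrics(log_text):
    """Return the metric dicts found in a captured training log, in order of appearance."""
    results = []
    for line in log_text.splitlines():
        match = EVAL_LINE.search(line)
        if match:
            # The payload is a Python dict literal, so literal_eval parses it safely.
            results.append(ast.literal_eval(match.group(1)))
    return results


# Excerpt of the output above (assumed to have been captured as text).
log_text = """\
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}
INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.
INFO:simpletransformers.classification.classification_model:{'mcc': 0.6667790263652669, 'eval_loss': 0.6182582986354828}
"""

runs = parse_eval_metrics(log_text)
print(len(runs))
print(runs[0] == runs[1])
```

In simpletransformers, sweep values are normally handed to `ClassificationModel` through its `sweep_config` argument (for example `sweep_config=wandb.config`). This output does not show whether the notebook passes it.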


[34m[1mwandb[0m: Agent Starting Run: ehl3vk1e with config:
[34m[1mwandb[0m: 	learning_rate: 0.00034418130472676904
[34m[1mwandb[0m: 	num_train_epochs: 3
[34m[1mwandb[0m: wandb version 0.10.25 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at dccuchile/bert-base-spanish-wwm-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dccuchi

  0%|          | 0/800 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_bert_128_4_2


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/50 [00:00<?, ?it/s]