Gratidão aos colegas Thiago Soares Laitz e Hugo (hugo@maritaca.ai) pelo apoio e código base fornecidos

# Install

In [None]:
#!pip install sentencepiece

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m63.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99


# Imports

In [1]:
from tqdm import tqdm

# Organizando o ambiente

In [2]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Tue Jun 27 00:29:35 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.03   Driver Version: 525.116.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   49C    P8    26W / 370W |     60MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
from psutil import virtual_memory


In [4]:
def mostra_memoria(lista_mem=['cpu','gpu']):
  """
  Esta função exibe informações de memória da CPU e/ou GPU, conforme parâmetros fornecidos.

  Parâmetros:
  -----------
  lista_mem : list, opcional
      Lista com strings 'cpu' e/ou 'gpu'.
      'cpu' - exibe informações de memória da CPU.
      'gpu' - exibe informações de memória da GPU (se disponível).
      O valor padrão é ['cpu'].

  Saída:
  -------
  A função não retorna nada, apenas exibe as informações na tela.

  Exemplo de uso:
  ---------------
  Para exibir informações de memória da CPU:
      mostra_memoria(['cpu'])

  Para exibir informações de memória da CPU e GPU:
      mostra_memoria(['cpu', 'gpu'])

  Autor: Marcus Vinícius Borela de Castro

  """
  if 'cpu' in lista_mem:
    vm = virtual_memory()
    ram={}
    ram['total']=round(vm.total / 1e9,2)
    ram['available']=round(virtual_memory().available / 1e9,2)
    # ram['percent']=round(virtual_memory().percent / 1e9,2)
    ram['used']=round(virtual_memory().used / 1e9,2)
    ram['free']=round(virtual_memory().free / 1e9,2)
    ram['active']=round(virtual_memory().active / 1e9,2)
    ram['inactive']=round(virtual_memory().inactive / 1e9,2)
    ram['buffers']=round(virtual_memory().buffers / 1e9,2)
    ram['cached']=round(virtual_memory().cached/1e9 ,2)
    print(f"Your runtime RAM in gb: \n total {ram['total']}\n available {ram['available']}\n used {ram['used']}\n free {ram['free']}\n cached {ram['cached']}\n buffers {ram['buffers']}")
    print('/nGPU')
    gpu_info = !nvidia-smi
  if 'gpu' in lista_mem:
    gpu_info = '\n'.join(gpu_info)
    if gpu_info.find('failed') >= 0:
      print('Not connected to a GPU')
    else:
      print(gpu_info)


In [5]:
mostra_memoria()

Your runtime RAM in gb: 
 total 67.35
 available 55.4
 used 10.82
 free 35.72
 cached 19.93
 buffers 0.88
/nGPU
Tue Jun 27 00:29:39 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.03   Driver Version: 525.116.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   50C    P8    37W / 370W |     60MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                        

### Vinculando pasta do google drive para salvar dados

In [6]:
import os

In [7]:
# from google.colab import drive
# drive.mount('/content/drive')

In [8]:
current_dir = os.getcwd()
print("Current directory:", current_dir)

Current directory: /home/borela/fontes/ind-ir/code/train


## Fixando as seeds

In [9]:
import random
import torch
import torch.nn.functional as F
import numpy as np
from torch.utils.data import DataLoader

In [10]:
def inicializa_seed(num_semente:int=123):
  """
  Inicializa as sementes para garantir a reprodutibilidade dos resultados do modelo.
  Essa é uma prática recomendada, já que a geração de números aleatórios pode influenciar os resultados do modelo.
  Além disso, a função também configura as sementes da GPU para garantir a reprodutibilidade quando se utiliza aceleração por GPU.

  Args:
      num_semente (int): número da semente a ser utilizada para inicializar as sementes das bibliotecas.

  References:
      http://nlp.seas.harvard.edu/2018/04/03/attention.html
      https://github.com/CyberZHG/torch-multi-head-attention/blob/master/torch_multi_head_attention/multi_head_attention.py#L15
  """
  # Define as sementes das bibliotecas random, numpy e pytorch
  random.seed(num_semente)
  np.random.seed(num_semente)
  torch.manual_seed(num_semente)

  # Define as sementes da GPU
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False

  #torch.cuda.manual_seed(num_semente)
  #Cuda algorithms
  #torch.backends.cudnn.deterministic = True


In [11]:
num_semente=123
inicializa_seed(num_semente)

## Definindo Hiperparâmetros iniciais

In [12]:
def inicia_hparam()->dict:
  # Inicialização dos parâmetros
  hparam = {}
  hparam["num_workers_dataloader"] = 0
  hparam["device"] = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
  if torch.cuda.is_available(): print(torch. cuda. get_device_name(hparam["device"]))
  return hparam

In [13]:
hparam=inicia_hparam()

NVIDIA GeForce RTX 3090


## Preparando para debug e display

https://zohaib.me/debugging-in-google-collab-notebook/

In [14]:
# !pip install -Uqq ipdb
# import ipdb

In [15]:
import pandas as pd

In [None]:
# !pip install transformers -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.2/7.2 MB[0m [31m95.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m236.8/236.8 kB[0m [31m26.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m71.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m57.0 MB/s[0m eta [36m0:00:00[0m
[?25h

In [16]:
import transformers

  from .autonotebook import tqdm as notebook_tqdm


In [17]:
def config_display():
  """
  Esta função configura as opções de display do Pandas.
  """

  # Configurando formato saída Pandas
  # define o número máximo de colunas que serão exibidas
  pd.options.display.max_columns = None

  # define a largura máxima de uma linha
  pd.options.display.width = 1000

  # define o número máximo de linhas que serão exibidas
  pd.options.display.max_rows = 100

  # define o número máximo de caracteres por coluna
  pd.options.display.max_colwidth = 50

  # se deve exibir o número de linhas e colunas de um DataFrame.
  pd.options.display.show_dimensions = True

  # número de dígitos após a vírgula decimal a serem exibidos para floats.
  pd.options.display.precision = 7


In [18]:
def config_debug():
  """
  Esta função configura as opções de debug do PyTorch e dos pacotes
  transformers e datasets.
  """

  # Define opções de impressão de tensores para o modo científico
  torch.set_printoptions(sci_mode=True)
  """
    Significa que valores muito grandes ou muito pequenos são mostrados em notação científica.
    Por exemplo, em vez de imprimir o número 0.0000012345 como 0.0000012345,
    ele seria impresso como 1.2345e-06. Isso é útil em situações em que os valores dos tensores
    envolvidos nas operações são muito grandes ou pequenos, e a notação científica permite
    uma melhor compreensão dos números envolvidos.
  """

  # Habilita detecção de anomalias no autograd do PyTorch
  torch.autograd.set_detect_anomaly(True)
  """
    Permite identificar operações que podem causar problemas de estabilidade numérica,
    como gradientes explodindo ou desaparecendo. Quando essa opção é ativada,
    o PyTorch verifica se há operações que geram valores NaN ou infinitos nos tensores
    envolvidos no cálculo do gradiente. Se for detectado um valor anômalo, o PyTorch
    interrompe a execução e gera uma exceção, permitindo que o erro seja corrigido
    antes que se torne um problema maior.

    É importante notar que a detecção de anomalias pode ter um impacto significativo
    no desempenho, especialmente em modelos grandes e complexos. Por esse motivo,
    ela deve ser usada com cautela e apenas para depuração.
  """

  # Configura variável de ambiente para habilitar a execução síncrona (bloqueante) das chamadas da API do CUDA.
  os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
  """
    o Python aguarda o término da execução de uma chamada da API do CUDA antes de executar a próxima chamada.
    Isso é útil para depurar erros no código que envolve operações na GPU, pois permite que o erro seja capturado
    no momento em que ocorre, e não depois de uma sequência de operações que pode tornar a origem do erro mais difícil de determinar.
    No entanto, é importante lembrar que esse modo de execução é significativamente mais lento do que a execução assíncrona,
    que é o comportamento padrão do CUDA. Por isso, é recomendado utilizar esse comando apenas em situações de depuração
    e removê-lo após a solução do problema.
  """

  # Define o nível de verbosity do pacote transformers para info
  transformers.utils.logging.set_verbosity_info()


  """
    Define o nível de detalhamento das mensagens de log geradas pela biblioteca Hugging Face Transformers
    para o nível info. Isso significa que a biblioteca irá imprimir mensagens de log informativas sobre
    o andamento da execução, tais como tempo de execução, tamanho de batches, etc.

    Essas informações podem ser úteis para entender o que está acontecendo durante a execução da tarefa
    e auxiliar no processo de debug. É importante notar que, em alguns casos, a quantidade de informações
    geradas pode ser muito grande, o que pode afetar o desempenho do sistema e dificultar a visualização
    das informações relevantes. Por isso, é importante ajustar o nível de detalhamento de acordo com a
    necessidade de cada tarefa.

    Caso queira reduzir a quantidade de mensagens, comentar a linha acima e
      descomentar as duas linhas abaixo, para definir o nível de verbosity como error ou warning

    transformers.utils.logging.set_verbosity_error()
    transformers.utils.logging.set_verbosity_warning()
  """


  # Define o modo verbose do xmode, que é utilizado no debug
  %xmode Verbose

  """
    Comando usado no Jupyter Notebook para controlar o modo de exibição das informações de exceções.
    O modo verbose é um modo detalhado que exibe informações adicionais ao imprimir as exceções.
    Ele inclui as informações de pilha de chamadas completa e valores de variáveis locais e globais
    no momento da exceção. Isso pode ser útil para depurar e encontrar a causa de exceções em seu código.
    Ao usar %xmode Verbose, as informações de exceção serão impressas com mais detalhes e informações adicionais serão incluídas.

    Caso queira desabilitar o modo verbose e utilizar o modo plain,
    comentar a linha acima e descomentar a linha abaixo:
    %xmode Plain
  """

  """
    Dica:
    1.  pdb (Python Debugger)
      Quando ocorre uma exceção em uma parte do código, o programa para a execução e exibe uma mensagem de erro
      com informações sobre a exceção, como a linha do código em que ocorreu o erro e o tipo da exceção.

      Se você estiver depurando o código e quiser examinar o estado das variáveis ​​e executar outras operações
      no momento em que a exceção ocorreu, pode usar o pdb (Python Debugger). Para isso, é preciso colocar o comando %debug
      logo após ocorrer a exceção. Isso fará com que o programa pare na linha em que ocorreu a exceção e abra o pdb,
      permitindo que você explore o estado das variáveis, examine a pilha de chamadas e execute outras operações para depurar o código.


    2. ipdb
      O ipdb é um depurador interativo para o Python que oferece recursos mais avançados do que o pdb,
      incluindo a capacidade de navegar pelo código fonte enquanto depura.

      Você pode começar a depurar seu código inserindo o comando ipdb.set_trace() em qualquer lugar do
      seu código onde deseja pausar a execução e começar a depurar. Quando a execução chegar nessa linha,
      o depurador entrará em ação, permitindo que você examine o estado atual do seu programa e execute
      comandos para investigar o comportamento.

      Durante a depuração, você pode usar comandos:
        next (para executar a próxima linha de código),
        step (para entrar em uma função chamada na próxima linha de código)
        continue (para continuar a execução normalmente até o próximo ponto de interrupção).

      Ao contrário do pdb, o ipdb é um depurador interativo que permite navegar pelo código fonte em que
      está trabalhando enquanto depura, permitindo que você inspecione variáveis, defina pontos de interrupção
      adicionais e até mesmo execute expressões Python no contexto do seu programa.
  """


In [19]:
config_display()

In [20]:
config_debug()

Exception reporting mode: Verbose


In [None]:
os.environ['NEPTUNE_ALLOW_SELF_SIGNED_CERTIFICATE'] = 'TRUE'
os.environ['NEPTUNE_PROJECT'] = 'marcusborela/IA386DD'
os.environ['NEPTUNE_API_TOKEN'] = getpass.getpass('Informe NEPTUNE_API_TOKEN')

tag_contexto_rastro = 'Aula9_InPars'
neptune_version = 0
### Código Rastro

Busca implementar o rastro proposto em [Rastro-DM: Mineração de Dados com Rastro](https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1664), autores Marcus Vinícius Borela de Castro e Remis Balaniuk, com o apoio da [solução Neptune](https://app.neptune.ai/)


def converte_optimizer_state_dict(parm_optimizer)-> dict:
  """
    Recebe um objeto "parm_optimizer" que é do tipo "torch.optim.Optimizer" e retorna um dicionário 
    com informações sobre o otimizador.

    O dicionário de retorno é gerado a partir do estado do otimizador que é extraído da propriedade
    "state_dict()" do objeto "parm_optimizer", seu primeiro grupo de parâmetros do otimizador.
  """
  # return str(hparam['optimizer'])
  return parm_optimizer.state_dict()['param_groups'][0]
if neptune_version == 0:
  import neptune.new as neptune  
  class NeptuneRastroRun():
      se_geracao_rastro = True 
      neptune_project = ""
      tag_contexto_rastro = ""
      neptune_api_token = ""

      def __init__(self, parm_params:dict,  parm_lista_tag:list = None):
        # print(f"NeptuneRastroRun.init: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_params `{parm_params} ")
        if self.__class__.se_geracao_rastro:      
          self.run_neptune = neptune.init(project=self.__class__.neptune_project, api_token=self.__class__.neptune_api_token, capture_hardware_metrics=True)
          self.run_neptune['sys/name'] = self.__class__.tag_contexto_rastro
          vparams = copy.deepcopy(parm_params)
          if "optimizer" in vparams:
            vparams["optimizer"] = converte_optimizer_state_dict(vparams["optimizer"])
          if 'criterion'  in vparams:
            vparams["criterion"] = str(vparams["criterion"])
          if 'scheduler'  in vparams:
            vparams["scheduler"] = str(type(vparams["scheduler"]))
          if 'device' in vparams:
            vparams['device'] = str(vparams["device"])
          self.device = vparams["device"]
          for tag in parm_lista_tag:
            self.run_neptune['sys/tags'].add(tag)
          self.run_neptune['parameters'] = vparams
          self.tmpDir = tempfile.mkdtemp()

      @property
      def run():
        return self.run_neptune

      @classmethod
      def ativa_geracao_rastro(cls):
        cls.se_geracao_rastro = True      

      @classmethod
      def def_contexto(cls):
        cls.se_geracao_rastro = True      

      @classmethod
      def desativa_geracao_rastro(cls):
        cls.se_geracao_rastro = False      

      @classmethod
      def retorna_status_geracao_rastro(cls):
        return cls.se_geracao_rastro      

      @classmethod
      def retorna_tag_contexto_rastro(cls):
        return cls.tag_contexto_rastro 

      @classmethod
      def inicia_contexto(cls, neptune_project, tag_contexto_rastro, neptune_api_token):
        assert '.' not in tag_contexto_rastro, "NeptuneRastroRun.init(): tag_contexto_rastro não pode possuir ponto, pois será usado para gravar nome de arquivo"      
        cls.neptune_api_token = neptune_api_token
        cls.tag_contexto_rastro = tag_contexto_rastro
        cls.neptune_project = neptune_project

      def salva_metrica(self, parm_metricas={}):
        #print(f"NeptuneRastroRun.salva_metrica: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_metricas:{parm_metricas} ")
        if self.__class__.se_geracao_rastro:
          for metrica, valor in parm_metricas.items(): 
            self.run_neptune[metrica].log(valor)
  
      def gera_grafico_modelo(self, loader_train, model):
        if self.__class__.se_geracao_rastro: 
          # efetuar um forward 
          """
          se dataloader devolver x e y:
          """
          x_, y_ = next(iter(loader_train))
          x_ = x_.to(self.device)
          outputs = model(x_)
          """
          # se dataloader devolver dict:
          dados_ = next(iter(loader_train))
          outputs = model(dados_['x'].to(self.device))
          #outputs = model(x_['input_ids'].to(self.device), x_['attention_mask'].to(self.device))
          """
          nome_arquivo = os.path.join(self.tmpDir, "modelo "+ self.__class__.tag_contexto_rastro + time.strftime("%Y-%b-%d %H:%M:%S"))
          make_dot(outputs, params=dict(model.named_parameters()), show_attrs=True, show_saved=True).render(nome_arquivo, format="png")
          self.run_neptune["parameters/model_graph"].upload(nome_arquivo+'.png')
          self.run_neptune['parameters/model'] = re.sub('<bound method Module.state_dict of ', '',str(model.state_dict))      



      def stop(self):
        if self.__class__.se_geracao_rastro:         
          self.run_neptune.stop()

if neptune_version == 1:
  import neptune
  class NeptuneRastroRun():
      """
        Classe para geração de rastro de experimento utilizando a ferramenta Neptune.

        Busca implementar o rastro proposto em [Rastro-DM: Mineração de Dados com Rastro](https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1664),
        autores Marcus Vinícius Borela de Castro e Remis Balaniuk, com o apoio da [solução Neptune](https://app.neptune.ai/)

        Attributes:
        -----------
        se_geracao_rastro : bool
            Indica se deve ser gerado rastro de experimento. 
        neptune_project : str
            Nome do projeto criado no Neptune. 
        tag_contexto_rastro : str
            Nome da tag utilizada para identificar o experimento.
        neptune_api_token : str
            Token utilizado para autenticação na API do Neptune. 
        run_neptune : object
            Objeto que representa o experimento no Neptune.
        device : str
            Dispositivo utilizado para o treinamento do modelo.
        tmpDir : str
          Diretório temporário utilizado para salvar gráfico do modelo.          
      """
      se_geracao_rastro = True 
      neptune_project = ""
      tag_contexto_rastro = ""
      neptune_api_token = ""

      def __init__(self, parm_params:dict,  parm_lista_tag:list = None):
        """
          Método construtor da classe NeptuneRastroRun.
          
          Args:
          - parm_params: dicionário contendo os parâmetros do modelo.
          - parm_lista_tag: lista contendo tags adicionais para o experimento.
        """      
        # print(f"NeptuneRastroRun.init: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_params `{parm_params} ")
        if self.__class__.se_geracao_rastro:      
          self.run_neptune = neptune.init_run(project=self.__class__.neptune_project, api_token=self.__class__.neptune_api_token, capture_hardware_metrics=True)
          self.run_neptune['sys/name'] = self.__class__.tag_contexto_rastro
          vparams = copy.deepcopy(parm_params)
          if "optimizer" in vparams:
            vparams["optimizer"] = converte_optimizer_state_dict(vparams["optimizer"])
          if 'criterion'  in vparams:
            vparams["criterion"] = str(vparams["criterion"])
          if 'scheduler'  in vparams:
            vparams["scheduler"] = str(type(vparams["scheduler"]))
          if 'device' in vparams:
            vparams['device'] = str(vparams["device"])
          self.device = vparams["device"]
          for tag in parm_lista_tag:
            self.run_neptune['sys/tags'].add(tag)
          self.run_neptune['parameters'] = vparams
          # self.tmpDir = tempfile.mkdtemp()

      @property
      def run():
        """
        Retorna a instância do objeto run_neptune.
        """      
        return self.run_neptune

      @classmethod
      def ativa_geracao_rastro(cls):
        """
        Ativa a geração de rastro.
        """      
        cls.se_geracao_rastro = True      

      @classmethod
      def def_contexto(cls):
        """
        Define o contexto para a geração de rastro.
        """      
        cls.se_geracao_rastro = True      

      @classmethod
      def desativa_geracao_rastro(cls):
        """
        Desativa a geração de rastro.
        """      
        cls.se_geracao_rastro = False      

      @classmethod
      def retorna_status_geracao_rastro(cls):
        """
          Retorna o status da geração de rastro.
          
          Returns:
          - True se a geração de rastro está ativada, False caso contrário.
        """      
        return cls.se_geracao_rastro      

      @classmethod
      def retorna_tag_contexto_rastro(cls):
        """
          Retorna a tag do contexto de rastro.
        """      
        return cls.tag_contexto_rastro 

      @classmethod
      def inicia_contexto(cls, neptune_project, tag_contexto_rastro, neptune_api_token):
        """
        Inicia o contexto de execução no Neptune.

        Args:
            neptune_project (str): Nome do projeto no Neptune.
            tag_contexto_rastro (str): Tag que identifica o contexto de execução no Neptune.
            neptune_api_token (str): Token de acesso à API do Neptune.

        Raises:
            AssertionError: Caso a tag_contexto_rastro possua um ponto (.), 
              o que pode gerar erros na gravação de arquivo.
        """      
        assert '.' not in tag_contexto_rastro, "NeptuneRastroRun.init(): tag_contexto_rastro não pode possuir ponto, pois será usado para gravar nome de arquivo"      
        cls.neptune_api_token = neptune_api_token
        cls.tag_contexto_rastro = tag_contexto_rastro
        cls.neptune_project = neptune_project

      def salva_metrica(self, parm_metricas={}):
        """
          Salva as métricas no Neptune Run caso a geração de rastro esteja ativa.

          Parameters
          ----------
          parm_metricas: dict
              Dicionário contendo as métricas a serem salvas. As chaves devem ser os nomes das métricas e os valores devem ser
              os valores das métricas.
        """
        #print(f"NeptuneRastroRun.salva_metrica: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_metricas:{parm_metricas} ")
        if self.__class__.se_geracao_rastro:
          for metrica, valor in parm_metricas.items(): 
            self.run_neptune[metrica].append(valor)
  
      def gera_grafico_modelo(self, loader_train, model):
        """
          Gera um gráfico do modelo e o envia para o Neptune. 
          Para gerar o gráfico, um forward pass é realizado em um batch de exemplos 
          de treino e o resultado é renderizado como um gráfico de nós conectados. 
          O gráfico é salvo em um arquivo .png e enviado para o Neptune como um arquivo anexo.

          Args:
              loader_train (torch.utils.data.DataLoader): DataLoader do conjunto de treinamento.
              model (torch.nn.Module): Modelo a ser visualizado.
          
          Pendente:
            Evolui para usar from io import StringIO (buffer = io.StringIO()) ao invés de tempdir 
        """    
        return

        """
        falta ajustar make_dot
        if self.__class__.se_geracao_rastro: 
          # efetuar um forward 
          batch = next(iter(loader_train))
          # falta generalizar linha abaixo. Criar função que recebe modelo e batch como parâmetro?
          outputs = model(input_ids=batch['input_ids'].to(hparam['device']), attention_mask=batch['attention_mask'].to(hparam['device']), token_type_ids=batch['token_type_ids'].to(hparam['device']), labels=batch['labels'].to(hparam['device']))
          nome_arquivo = os.path.join(self.tmpDir, "modelo "+ self.__class__.tag_contexto_rastro + time.strftime("%Y-%b-%d %H:%M:%S"))
          make_dot(outputs, params=dict(model.named_parameters()), show_attrs=True, show_saved=True).render(nome_arquivo, format="png")
          self.run_neptune["parameters/model_graph"].upload(nome_arquivo+'.png')
          self.run_neptune['parameters/model'] = re.sub('<bound method Module.state_dict of ', '',str(model.state_dict))      
        """


      def stop(self):
        """
          Para a execução do objeto Neptune. Todos os experimentos do Neptune são sincronizados com o servidor, e nenhum outro 
          experimento poderá ser adicionado a este objeto após a chamada a este método.
        """
        if self.__class__.se_geracao_rastro:         
          self.run_neptune.stop()

### Definindo parâmetros para o rastro


NeptuneRastroRun.inicia_contexto(os.environ['NEPTUNE_PROJECT'], tag_contexto_rastro,  os.environ['NEPTUNE_API_TOKEN'])
#NeptuneRastroRun.desativa_geracao_rastro()

device(type='cuda', index=0)

## Rastro (neptune.ai)

Gerado rastro da execução no Neptune (detalhes no artigo [Rastro-DM: Mineração de Dados com Rastro](https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1664))


### Importação de libraries para Rastro

In [None]:
# !pip install neptune-client

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting neptune-client
  Downloading neptune_client-1.3.1-py3-none-any.whl (450 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m450.4/450.4 kB[0m [31m16.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting GitPython>=2.0.8 (from neptune-client)
  Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m184.3/184.3 kB[0m [31m23.3 MB/s[0m eta [36m0:00:00[0m
Collecting PyJWT (from neptune-client)
  Downloading PyJWT-2.7.0-py3-none-any.whl (22 kB)
Collecting boto3>=1.16.0 (from neptune-client)
  Downloading boto3-1.26.160-py3-none-any.whl (135 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m135.9/135.9 kB[0m [31m17.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting bravado<12.0.0,>=11.0.0 (from neptune-client)
  Downloading bravado-11.0.3-py2.py3-none-any.whl (38 kB)
Collecting swagger-spec-v

## Rastro (neptune.ai)

Gerado rastro da execução no Neptune (detalhes no artigo [Rastro-DM: Mineração de Dados com Rastro](https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1664))

### Importação de libraries para Rastro


In [115]:
import getpass, copy, tempfile, re
os.environ['NEPTUNE_ALLOW_SELF_SIGNED_CERTIFICATE'] = 'TRUE'
os.environ['NEPTUNE_PROJECT'] = 'marcusborela/IA386DD'
os.environ['NEPTUNE_API_TOKEN'] = getpass.getpass('Informe NEPTUNE_API_TOKEN')

tag_contexto_rastro = 'Aula9_InPars'
neptune_version = 0


In [None]:
### Código Rastro

Busca implementar o rastro proposto em [Rastro-DM: Mineração de Dados com Rastro](https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1664), autores Marcus Vinícius Borela de Castro e Remis Balaniuk, com o apoio da [solução Neptune](https://app.neptune.ai/)




In [118]:
def converte_optimizer_state_dict(parm_optimizer)-> dict:
  """
    Recebe um objeto "parm_optimizer" que é do tipo "torch.optim.Optimizer" e retorna um dicionário 
    com informações sobre o otimizador.

    O dicionário de retorno é gerado a partir do estado do otimizador que é extraído da propriedade
    "state_dict()" do objeto "parm_optimizer", seu primeiro grupo de parâmetros do otimizador.
  """
  # return str(hparam['optimizer'])
  return parm_optimizer.state_dict()['param_groups'][0]
if neptune_version == 0:
  import neptune.new as neptune  
  class NeptuneRastroRun():
      se_geracao_rastro = True 
      neptune_project = ""
      tag_contexto_rastro = ""
      neptune_api_token = ""

      def __init__(self, parm_params:dict,  parm_lista_tag:list = None):
        # print(f"NeptuneRastroRun.init: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_params `{parm_params} ")
        if self.__class__.se_geracao_rastro:      
          self.run_neptune = neptune.init(project=self.__class__.neptune_project, api_token=self.__class__.neptune_api_token, capture_hardware_metrics=True)
          self.run_neptune['sys/name'] = self.__class__.tag_contexto_rastro
          vparams = copy.deepcopy(parm_params)
          if "optimizer" in vparams:
            vparams["optimizer"] = converte_optimizer_state_dict(vparams["optimizer"])
          if 'criterion'  in vparams:
            vparams["criterion"] = str(vparams["criterion"])
          if 'scheduler'  in vparams:
            vparams["scheduler"] = str(type(vparams["scheduler"]))
          if 'device' in vparams:
            vparams['device'] = str(vparams["device"])
          self.device = vparams["device"]
          for tag in parm_lista_tag:
            self.run_neptune['sys/tags'].add(tag)
          self.run_neptune['parameters'] = vparams
          self.tmpDir = tempfile.mkdtemp()

      @property
      def run():
        return self.run_neptune

      @classmethod
      def ativa_geracao_rastro(cls):
        cls.se_geracao_rastro = True      

      @classmethod
      def def_contexto(cls):
        cls.se_geracao_rastro = True      

      @classmethod
      def desativa_geracao_rastro(cls):
        cls.se_geracao_rastro = False      

      @classmethod
      def retorna_status_geracao_rastro(cls):
        return cls.se_geracao_rastro      

      @classmethod
      def retorna_tag_contexto_rastro(cls):
        return cls.tag_contexto_rastro 

      @classmethod
      def inicia_contexto(cls, neptune_project, tag_contexto_rastro, neptune_api_token):
        assert '.' not in tag_contexto_rastro, "NeptuneRastroRun.init(): tag_contexto_rastro não pode possuir ponto, pois será usado para gravar nome de arquivo"      
        cls.neptune_api_token = neptune_api_token
        cls.tag_contexto_rastro = tag_contexto_rastro
        cls.neptune_project = neptune_project

      def salva_metrica(self, parm_metricas={}):
        #print(f"NeptuneRastroRun.salva_metrica: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_metricas:{parm_metricas} ")
        if self.__class__.se_geracao_rastro:
          for metrica, valor in parm_metricas.items(): 
            self.run_neptune[metrica].log(valor)
  
      def gera_grafico_modelo(self, loader_train, model):
        if self.__class__.se_geracao_rastro: 
          # efetuar um forward 
          """
          se dataloader devolver x e y:
          """
          x_, y_ = next(iter(loader_train))
          x_ = x_.to(self.device)
          outputs = model(x_)
          """
          # se dataloader devolver dict:
          dados_ = next(iter(loader_train))
          outputs = model(dados_['x'].to(self.device))
          #outputs = model(x_['input_ids'].to(self.device), x_['attention_mask'].to(self.device))
          """
          nome_arquivo = os.path.join(self.tmpDir, "modelo "+ self.__class__.tag_contexto_rastro + time.strftime("%Y-%b-%d %H:%M:%S"))
          make_dot(outputs, params=dict(model.named_parameters()), show_attrs=True, show_saved=True).render(nome_arquivo, format="png")
          self.run_neptune["parameters/model_graph"].upload(nome_arquivo+'.png')
          self.run_neptune['parameters/model'] = re.sub('<bound method Module.state_dict of ', '',str(model.state_dict))      



      def stop(self):
        if self.__class__.se_geracao_rastro:         
          self.run_neptune.stop()

if neptune_version == 1:
  import neptune
  class NeptuneRastroRun():
      """
        Classe para geração de rastro de experimento utilizando a ferramenta Neptune.

        Busca implementar o rastro proposto em [Rastro-DM: Mineração de Dados com Rastro](https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1664),
        autores Marcus Vinícius Borela de Castro e Remis Balaniuk, com o apoio da [solução Neptune](https://app.neptune.ai/)

        Attributes:
        -----------
        se_geracao_rastro : bool
            Indica se deve ser gerado rastro de experimento. 
        neptune_project : str
            Nome do projeto criado no Neptune. 
        tag_contexto_rastro : str
            Nome da tag utilizada para identificar o experimento.
        neptune_api_token : str
            Token utilizado para autenticação na API do Neptune. 
        run_neptune : object
            Objeto que representa o experimento no Neptune.
        device : str
            Dispositivo utilizado para o treinamento do modelo.
        tmpDir : str
          Diretório temporário utilizado para salvar gráfico do modelo.          
      """
      se_geracao_rastro = True 
      neptune_project = ""
      tag_contexto_rastro = ""
      neptune_api_token = ""

      def __init__(self, parm_params:dict,  parm_lista_tag:list = None):
        """
          Método construtor da classe NeptuneRastroRun.
          
          Args:
          - parm_params: dicionário contendo os parâmetros do modelo.
          - parm_lista_tag: lista contendo tags adicionais para o experimento.
        """      
        # print(f"NeptuneRastroRun.init: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_params `{parm_params} ")
        if self.__class__.se_geracao_rastro:      
          self.run_neptune = neptune.init_run(project=self.__class__.neptune_project, api_token=self.__class__.neptune_api_token, capture_hardware_metrics=True)
          self.run_neptune['sys/name'] = self.__class__.tag_contexto_rastro
          vparams = copy.deepcopy(parm_params)
          if "optimizer" in vparams:
            vparams["optimizer"] = converte_optimizer_state_dict(vparams["optimizer"])
          if 'criterion'  in vparams:
            vparams["criterion"] = str(vparams["criterion"])
          if 'scheduler'  in vparams:
            vparams["scheduler"] = str(type(vparams["scheduler"]))
          if 'device' in vparams:
            vparams['device'] = str(vparams["device"])
          self.device = vparams["device"]
          for tag in parm_lista_tag:
            self.run_neptune['sys/tags'].add(tag)
          self.run_neptune['parameters'] = vparams
          # self.tmpDir = tempfile.mkdtemp()

      @property
      def run():
        """
        Retorna a instância do objeto run_neptune.
        """      
        return self.run_neptune

      @classmethod
      def ativa_geracao_rastro(cls):
        """
        Ativa a geração de rastro.
        """      
        cls.se_geracao_rastro = True      

      @classmethod
      def def_contexto(cls):
        """
        Define o contexto para a geração de rastro.
        """      
        cls.se_geracao_rastro = True      

      @classmethod
      def desativa_geracao_rastro(cls):
        """
        Desativa a geração de rastro.
        """      
        cls.se_geracao_rastro = False      

      @classmethod
      def retorna_status_geracao_rastro(cls):
        """
          Retorna o status da geração de rastro.
          
          Returns:
          - True se a geração de rastro está ativada, False caso contrário.
        """      
        return cls.se_geracao_rastro      

      @classmethod
      def retorna_tag_contexto_rastro(cls):
        """
          Retorna a tag do contexto de rastro.
        """      
        return cls.tag_contexto_rastro 

      @classmethod
      def inicia_contexto(cls, neptune_project, tag_contexto_rastro, neptune_api_token):
        """
        Inicia o contexto de execução no Neptune.

        Args:
            neptune_project (str): Nome do projeto no Neptune.
            tag_contexto_rastro (str): Tag que identifica o contexto de execução no Neptune.
            neptune_api_token (str): Token de acesso à API do Neptune.

        Raises:
            AssertionError: Caso a tag_contexto_rastro possua um ponto (.), 
              o que pode gerar erros na gravação de arquivo.
        """      
        assert '.' not in tag_contexto_rastro, "NeptuneRastroRun.init(): tag_contexto_rastro não pode possuir ponto, pois será usado para gravar nome de arquivo"      
        cls.neptune_api_token = neptune_api_token
        cls.tag_contexto_rastro = tag_contexto_rastro
        cls.neptune_project = neptune_project

      def salva_metrica(self, parm_metricas={}):
        """
          Salva as métricas no Neptune Run caso a geração de rastro esteja ativa.

          Parameters
          ----------
          parm_metricas: dict
              Dicionário contendo as métricas a serem salvas. As chaves devem ser os nomes das métricas e os valores devem ser
              os valores das métricas.
        """
        #print(f"NeptuneRastroRun.salva_metrica: se_geracao_rastro {self.__class__.se_geracao_rastro} parm_metricas:{parm_metricas} ")
        if self.__class__.se_geracao_rastro:
          for metrica, valor in parm_metricas.items(): 
            self.run_neptune[metrica].append(valor)
  
      def gera_grafico_modelo(self, loader_train, model):
        """
          Gera um gráfico do modelo e o envia para o Neptune. 
          Para gerar o gráfico, um forward pass é realizado em um batch de exemplos 
          de treino e o resultado é renderizado como um gráfico de nós conectados. 
          O gráfico é salvo em um arquivo .png e enviado para o Neptune como um arquivo anexo.

          Args:
              loader_train (torch.utils.data.DataLoader): DataLoader do conjunto de treinamento.
              model (torch.nn.Module): Modelo a ser visualizado.
          
          Pendente:
            Evolui para usar from io import StringIO (buffer = io.StringIO()) ao invés de tempdir 
        """    
        return

        """
        falta ajustar make_dot
        if self.__class__.se_geracao_rastro: 
          # efetuar um forward 
          batch = next(iter(loader_train))
          # falta generalizar linha abaixo. Criar função que recebe modelo e batch como parâmetro?
          outputs = model(input_ids=batch['input_ids'].to(hparam['device']), attention_mask=batch['attention_mask'].to(hparam['device']), token_type_ids=batch['token_type_ids'].to(hparam['device']), labels=batch['labels'].to(hparam['device']))
          nome_arquivo = os.path.join(self.tmpDir, "modelo "+ self.__class__.tag_contexto_rastro + time.strftime("%Y-%b-%d %H:%M:%S"))
          make_dot(outputs, params=dict(model.named_parameters()), show_attrs=True, show_saved=True).render(nome_arquivo, format="png")
          self.run_neptune["parameters/model_graph"].upload(nome_arquivo+'.png')
          self.run_neptune['parameters/model'] = re.sub('<bound method Module.state_dict of ', '',str(model.state_dict))      
        """


      def stop(self):
        """
          Para a execução do objeto Neptune. Todos os experimentos do Neptune são sincronizados com o servidor, e nenhum outro 
          experimento poderá ser adicionado a este objeto após a chamada a este método.
        """
        if self.__class__.se_geracao_rastro:         
          self.run_neptune.stop()



In [119]:
### Definindo parâmetros para o rastro
NeptuneRastroRun.inicia_contexto(os.environ['NEPTUNE_PROJECT'], tag_contexto_rastro,  os.environ['NEPTUNE_API_TOKEN'])
#NeptuneRastroRun.desativa_geracao_rastro()

### Código Rastro

Busca implementar o rastro proposto em [Rastro-DM: Mineração de Dados com Rastro](https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1664), autores Marcus Vinícius Borela de Castro e Remis Balaniuk, com o apoio da [solução Neptune](https://app.neptune.ai/)



In [120]:
def converte_optimizer_state_dict(parm_optimizer)-> dict:
  """
    Recebe um objeto "parm_optimizer" que é do tipo "torch.optim.Optimizer" e retorna um dicionário
    com informações sobre o otimizador.

    O dicionário de retorno é gerado a partir do estado do otimizador que é extraído da propriedade
    "state_dict()" do objeto "parm_optimizer", seu primeiro grupo de parâmetros do otimizador.
  """
  # return str(hparam['optimizer'])
  return parm_optimizer.state_dict()['param_groups'][0]

In [121]:
def gera_tag_rastro_experiencia_treino(parm_aula: str, hparam: dict) -> str:
    """
    Gera uma string formatada com informações de hiperparâmetros para ser usada como tag de rastro de experiência de treino.

    Args:
        parm_aula (str): Nome da aula que está sendo treinada.
        hparam (dict): Dicionário contendo os hiperparâmetros utilizados no treinamento.

    Returns:
        str: String formatada com as informações de hiperparâmetros.

    Uso:

    hparam['lista_tag_rastro_experiencia_treino'] =        gera_tag_rastro_experiencia_treino(parm_aula='aula7', hparam=hparam)
    """
    # Inicializa uma lista vazia para armazenar as tags
    lista_tag = []

    # Lista com as chaves dos hiperparâmetros que serão utilizados
    lista_chaves = ['embed_dim', 'leiaute_input', 'dim_feedforward', 'max_seq_length', 'ind_activation_function', 'batch_size', 'learning_rate', 'weight_decay', 'amsgrad', 'decrease_factor_lr', 'max_examples', 'eval_every_steps']

    # Itera pelas chaves da lista e cria uma string com a chave e o valor correspondente em hparam,
    # adicionando essa string à lista_tag
    for chave in lista_chaves:
        if chave in hparam:
          tag = f"{chave} {hparam[chave]}"
          lista_tag.append(tag)

    # Concatena a lista de tags em uma única string, separando cada tag por '|',
    # e adicionando o nome da aula como prefixo
    tag_formatada = f"{parm_aula}|" + "|".join(lista_tag)

    return tag_formatada




# Data load

In [34]:
PATH_LOCAL_DATA = '../..'

In [37]:
# path_data = '/content/drive/MyDrive/treinamento/202301_IA368DD/indir/data/train_data_juris_tcu_index_bm25.csv'

# PATH_TRAIN_DATA_ZIP = f"{PATH_LOCAL_DATA}/data/train_data_juris_tcu_index.zip"
PATH_TRAIN_DATA = f"{PATH_LOCAL_DATA}/data/train_juris_tcu_index/train_data_juris_tcu_index.csv"

In [38]:
os.path.exists(PATH_TRAIN_DATA)

True

In [None]:
%%time
if not os.path.exists(PATH_TRAIN_DATA):
  import zipfile
  !wget https://github.com/marcusborela/ind-ir/raw/main/data/train_juris_tcu_index/train_data_juris_tcu_index.csv -O {PATH_TRAIN_DATA}

  # Extrair o arquivo zip
  with zipfile.ZipFile(PATH_TRAIN_DATA_ZIP, 'r') as zip_ref:
      zip_ref.extractall(PATH_LOCAL_DATA)

  # Listar os arquivos extraídos
  arquivos_extraidos = zip_ref.namelist()
  # Exibir os arquivos extraídos
  for arquivo in arquivos_extraidos:
      print(arquivo)
  print("File loaded")
else:
  print("File already there!")

File already there!
CPU times: user 866 µs, sys: 0 ns, total: 866 µs
Wall time: 1.2 ms


In [39]:
df_data = pd.read_csv(PATH_TRAIN_DATA)

In [40]:
df_data.shape

(402738, 7)

Verificando correção do arquivo!

In [41]:
print(df_data.isnull().sum())

QUERY_ID      0
DOC_ID        0
RELEVANCE     0
SCORE         0
TYPE          0
DOC_TEXT      0
QUERY_TEXT    0
Length: 7, dtype: int64


In [42]:
df_data[['QUERY_TEXT','DOC_TEXT']].applymap(len).describe()

Unnamed: 0,QUERY_TEXT,DOC_TEXT
count,402738.0,402738.0
mean,322.8252313,830.6957451
std,165.8299958,398.1844365
min,41.0,86.0
25%,217.0,572.0
50%,294.0,759.0
75%,391.0,1020.0
max,4212.0,3739.0


Para cada positivo, tem 2 negativos

In [43]:
df_data['RELEVANCE'].describe()

count    402738.0000000
mean          0.1666667
std           0.3726785
min           0.0000000
25%           0.0000000
50%           0.0000000
75%           0.0000000
max           1.0000000
Name: RELEVANCE, Length: 8, dtype: float64

In [44]:
from sklearn.model_selection import train_test_split

In [45]:
df_data[['QUERY_TEXT','DOC_TEXT']]

Unnamed: 0,QUERY_TEXT,DOC_TEXT
0,O dever de observância à hierarquia militar nã...,"O termo é ""Agente público"".\nAgente público te..."
1,O dever de observância à hierarquia militar nã...,"O termo é ""Reforma-prêmio"".\nReforma-prêmio te..."
2,O dever de observância à hierarquia militar nã...,"O termo é ""Exercício financeiro anterior"".\nEx..."
3,O dever de observância à hierarquia militar nã...,"O termo é ""CJF"".\nCJF é classificado como uma ..."
4,O dever de observância à hierarquia militar nã...,"O termo é ""Embratur"".\nEmbratur é classificado..."
...,...,...
402733,SÚMULA TCU 1: Não se compreendem como vencimen...,"O termo é ""Oitiva"".\nOitiva tem definição: ""Co..."
402734,SÚMULA TCU 1: Não se compreendem como vencimen...,"O termo é ""Derrogação"".\nDerrogação tem nota d..."
402735,SÚMULA TCU 1: Não se compreendem como vencimen...,"O termo é ""Presunção relativa"".\nPresunção rel..."
402736,SÚMULA TCU 1: Não se compreendem como vencimen...,"O termo é ""Contas extraordinárias"".\nContas ex..."


# Separating train and valid


In [46]:
hparam['percent_test_size'] = .005 #@param [.10, .15, .20] {type:'raw'}

In [47]:
X_train, X_valid, Y_train, Y_valid = train_test_split(df_data[['QUERY_TEXT','DOC_TEXT']].values,
                                                      df_data['RELEVANCE'].values,
                                                      test_size=hparam['percent_test_size'],
                                                      stratify=df_data['RELEVANCE'].values, random_state=num_semente)


In [48]:
X_train.shape

(400724, 2)

In [49]:
X_train.shape, type(X_valid), type(Y_valid), X_valid.shape, Y_valid.shape

((400724, 2), numpy.ndarray, numpy.ndarray, (2014, 2), (2014,))

In [50]:
print(np.unique(Y_train, return_counts=True), '\n', np.unique(Y_valid, return_counts=True))

(array([0, 1]), array([333937,  66787])) 
 (array([0, 1]), array([1678,  336]))


In [51]:
X_train[:1]

array([['Não é vedado o pagamento de serviços extraordinários a servidor ocupante de função comissionada.',
        'O termo é "Afastamento".\nAfastamento tem nota de escopo: "Usar como termo modificador subordinado a determinados assuntos. No caso de afastamento cautelar e temporário, previsto no art. 44 da Lei 8.443/92, usar Afastamento de responsável. Para situações genéricas dos afastamentos previstos na Lei 8.112/90,  usar Afastamento de pessoal. Usar Afastamento do país para a saída do Brasil que não seja na situação tratada nos arts. 95 e 96 da Lei 8.112/90, nesse caso usar Afastamento para estudo ou missão no exterior.".\nAfastamento tem tradução em espanhol: "Eliminación".\nAfastamento tem tradução em inglês: "Removal".']],
      dtype=object)

In [52]:
Y_train[:2]

array([0, 0])

# Tokenizer and Dataset

#### Carga

In [53]:
hparam['model_name'] = 'unicamp-dl/mMiniLM-L6-v2-pt-v2'

In [54]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification


In [57]:
# se local
nome_caminho_modelo = "/home/borela/fontes/relevar-busca/modelo/" + hparam['model_name']
assert os.path.exists(nome_caminho_modelo), f"Path para {hparam['model_name']} não existe!"


In [58]:
# tokenizer = AutoTokenizer.from_pretrained(hparam['model_name'])
# se local
tokenizer = AutoTokenizer.from_pretrained(nome_caminho_modelo)

loading file sentencepiece.bpe.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json


#### Experimentações e computando limites

In [59]:
tokenizer.pad_token_id,tokenizer.cls_token_id,tokenizer.sep_token_id

(1, 0, 2)

In [60]:
x='x, it is just a test!'
y='y, it is just a continuation!'

In [61]:
tokenizer(x,y)['input_ids']

[0,
 1022,
 4,
 442,
 83,
 1660,
 10,
 3034,
 38,
 2,
 2,
 113,
 4,
 442,
 83,
 1660,
 10,
 9454,
 1363,
 38,
 2]

In [62]:
tokenizer.encode_plus(x,y)['input_ids']

[0,
 1022,
 4,
 442,
 83,
 1660,
 10,
 3034,
 38,
 2,
 2,
 113,
 4,
 442,
 83,
 1660,
 10,
 9454,
 1363,
 38,
 2]

### Definindo parâmetros e informações para o rastro (hparam) - sobre dados

In [63]:
# visto que "max_position_embeddings": 512 em https://huggingface.co/microsoft/MiniLM-L12-H384-uncased/blob/main/config.json

hparam['vocab_size'] = tokenizer.vocab_size
hparam['batch_size'] = 32
hparam['num_epochs'] = 10  #@param {type:"slider", min:1, max:100, step:1}
hparam['max_seq_length'] = 512  #tokenizer.max_model_input_sizes
hparam['num_sentenca_train'] = X_train.shape[0] # ou 37400
hparam['num_sentenca_valid'] = X_valid.shape[0] #


Definindo o prefixo de nome do arquivo

In [64]:
PATH_LOCAL_DATA

'../..'

In [65]:
PATH_TRAIN_LOCAL = f"{PATH_LOCAL_DATA}/model/train/minilm"

In [66]:
if not os.path.exists(PATH_TRAIN_LOCAL):
  os.makedirs(PATH_TRAIN_LOCAL)
  print(f'{PATH_TRAIN_LOCAL} pasta criada!')
else:
  print(f'{PATH_TRAIN_LOCAL} pasta já existia!')


../../model/train/minilm pasta criada!


In [None]:
os.path.exists(PATH_TRAIN_LOCAL)

True

## Criando código para o Dataset

In [69]:
class MyDataset():
    """
      Classe para representar um dataset de texto e classes.
    """
    def __init__(self, texts: np.ndarray, classes:list[int], tokenizer, max_seq_length: int):
      """
      Inicializa um novo objeto MyDataset.

      Args:
          texts (np.ndarray): um array com as strings de texto. Cada linha deve ter 2 strings.
          classes (np.ndarray): um array com as classes de cada texto.
          tokenizer: um objeto tokenizer do Hugging Face Transformers.
          max_seq_length (int): o tamanho máximo da sequência a ser considerado.
      Raises:
          AssertionError: se os parâmetros não estiverem no formato esperado.
      """
      # Verifica se os parâmetros são do tipo esperado
      assert isinstance(texts, np.ndarray), f"Parâmetro texts deve ser do tipo np.ndarray e não {type(texts)}"
      assert texts.shape[1] == 2, "Array must have 2 columns"
      for row in texts:
          assert isinstance(row, np.ndarray) and row.shape == (2,), f"Each row in texts must have 2 elements"
          assert isinstance(row[0], str) and isinstance(row[1], str), f"Each element in texts.row must be a string e não {type(row[0])}"
      assert isinstance(classes,np.ndarray), f'classes deve ser do tipo np.ndarray e não {type(classes)}'
      assert isinstance(classes[0],np.int64), f'classes[0] deve ser do tipo numpy.int64 e não {type(classes[0])} '
      assert isinstance(max_seq_length, int), f'max_seq_length deve ser do tipo int e não {type(max_seq_length)}'
      assert max_seq_length > 100, f'max_seq_length deve ser maior do que 100'

      self.texts = texts
      self.classes = classes
      self.tokenizer = tokenizer
      self.max_seq_length = max_seq_length

      # Salvar os dados dos tensores
      x_data_input_ids = []
      x_data_token_type_ids = []
      x_data_attention_masks = []
      for text_pair in tqdm(texts, desc='encoding text pair'):
          encoding = tokenizer.encode_plus(
              text_pair[0],
              text_pair[1],
              add_special_tokens=True,
              max_length=self.max_seq_length,
              padding='max_length',
              return_tensors = 'pt',
              truncation=True,
              return_attention_mask=True,
              return_token_type_ids=True
          )
          x_data_input_ids.append(encoding['input_ids'].long())
          x_data_token_type_ids.append(encoding['token_type_ids'].long())
          x_data_attention_masks.append(encoding['attention_mask'].long())
      print(F'\tVou converter lista para tensor;  Momento: {time.strftime("[%Y-%b-%d %H:%M:%S]")}')
      # squeeze: vai transformar um tensor de shape [2, 1, 322] em um tensor de shape [2, 322].

      self.x_tensor_input_ids = torch.stack(x_data_input_ids).squeeze(1)
      self.x_tensor_attention_masks = torch.stack(x_data_attention_masks).squeeze(1)
      self.x_tensor_token_type_ids = torch.stack(x_data_token_type_ids).squeeze(1)

    def __len__(self):
        """
          Retorna o tamanho do dataset (= tamanho do array texts)
        """
        return len(self.texts)

    def __getitem__(self, idx):
        """
          Retorna um dicionário com os dados do texto e sua classe correspondente, em um formato que pode
          ser usado pelo dataloader do PyTorch para alimentar um modelo de aprendizado de máquina.
        """
        return {
            'input_ids': self.x_tensor_input_ids[idx],
            'attention_mask': self.x_tensor_attention_masks[idx],
            'token_type_ids': self.x_tensor_token_type_ids[idx],
            # 'labels': int(self.classes[idx])
            # 'labels': torch.tensor(self.classes[idx], dtype=torch.float)
            # 'labels' : torch.tensor(-1. if self.classes[idx] == 0 else 1., dtype=torch.float)
            'labels' : torch.tensor(self.classes[idx], dtype=torch.float)
        }

#### Testando o MyDataset e o Dataloader

In [70]:
# Cria dados fictícios
texts = np.array([['This is the first text', 'This is the second text'],
                  ['This is text 2.1', 'This is text 2.2'],
                  ['This is text 3.1', 'This is text 3.2'],
                  ['This is text 4.1', 'This is text 4.2'],
                  ['This is text 5.1', 'This is text 5.2'],
                  ['This is text 6.1', 'This is text 6.2'],
                  ['This is text 7.1', 'This is text 7.2']])
classes = np.array([1, 0, 1, 0, 1, 0, 1])


In [71]:
# Cria um objeto da classe MyDataset
dummy_dataset = MyDataset(texts=texts, classes=classes, tokenizer=tokenizer, max_seq_length=hparam['max_seq_length'])

encoding text pair: 100%|██████████| 7/7 [00:00<00:00, 39.30it/s]


	Vou converter lista para tensor;  Momento: [2023-Jun-27 00:42:39]


In [72]:
# Testa o método __len__()
assert len(dummy_dataset) == 7

# Testa o método __getitem__()
sample = dummy_dataset[0]
assert set(sample.keys()) == {'input_ids', 'attention_mask', 'token_type_ids', 'labels'} #
assert isinstance(sample['input_ids'], torch.Tensor)
assert sample['input_ids'].shape[0] == hparam['max_seq_length']
assert isinstance(sample['attention_mask'], torch.Tensor)
assert sample['attention_mask'].shape[0] == hparam['max_seq_length']
# assert isinstance(sample['labels'], int)
assert isinstance(sample['labels'], torch.Tensor)
# print(sample['labels'], sample['labels'].shape)


In [73]:
print(sample)

{'input_ids': tensor([    0,  3293,    83,    70,  5117,  7986,     2,     2,  3293,    83,
           70, 17932,  7986,     2,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1, 

In [74]:
dummy_loader = DataLoader(dummy_dataset, batch_size=2, shuffle=False, num_workers=hparam['num_workers_dataloader'] )

In [77]:
from transformers import BatchEncoding

In [78]:
first_batch = next(iter(dummy_loader))

In [79]:
BatchEncoding(first_batch).to(hparam['device'])

{'input_ids': tensor([[   0, 3293,   83,  ...,    1,    1,    1],
        [   0, 3293,   83,  ...,    1,    1,    1]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]], device='cuda:0'), 'token_type_ids': tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:0'), 'labels': tensor([1.0000e+00, 0.0000e+00], device='cuda:0')}

In [80]:
first_batch

{'input_ids': tensor([[   0, 3293,   83,  ...,    1,    1,    1],
         [   0, 3293,   83,  ...,    1,    1,    1]]),
 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0]]),
 'token_type_ids': tensor([[0, 0, 0,  ..., 0, 0, 0],
         [0, 0, 0,  ..., 0, 0, 0]]),
 'labels': tensor([1.0000e+00, 0.0000e+00])}

Confirmando efeito do parâmetro shuffle em DataLoader

In [81]:
dummy_loader = DataLoader(dummy_dataset, batch_size=2, shuffle=False, num_workers=hparam['num_workers_dataloader'])
for ndx, epoch in tqdm(enumerate(range(hparam['num_epochs'])), desc='Epochs'):
  for batch in dummy_loader:
    print(batch['labels'])
  print(f"Fim época {ndx}")


Epochs: 10it [00:00, 1035.73it/s]

tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00])
Fim época 0
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00])
Fim época 1
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00])
Fim época 2
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00])
Fim época 3
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00])
Fim época 4
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00])
Fim época 5
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00])
Fim época 6
tensor([1.0000e+00, 0.0000e+00])
tensor([1.0000e+00, 0.0000e+00])
tensor([1.




In [None]:
dummy_loader = DataLoader(dummy_dataset, batch_size=2, shuffle=True, num_workers=hparam['num_workers_dataloader'])
for ndx, epoch in tqdm(enumerate(range(hparam['num_epochs'])), desc='Epochs'):
  for batch in dummy_loader:
    print(batch['labels'])
  print(f"Fim época {ndx}")

Epochs: 10it [00:00, 195.58it/s]

tensor([-1.0000e+00, 1.0000e+00])
tensor([1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00, -1.0000e+00])
tensor([1.0000e+00])
Fim época 0
tensor([-1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00, 1.0000e+00])
tensor([1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00])
Fim época 1
tensor([-1.0000e+00, -1.0000e+00])
tensor([-1.0000e+00, 1.0000e+00])
tensor([1.0000e+00, 1.0000e+00])
tensor([1.0000e+00])
Fim época 2
tensor([1.0000e+00, -1.0000e+00])
tensor([1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00, -1.0000e+00])
tensor([1.0000e+00])
Fim época 3
tensor([1.0000e+00, -1.0000e+00])
tensor([-1.0000e+00, 1.0000e+00])
tensor([1.0000e+00, -1.0000e+00])
tensor([1.0000e+00])
Fim época 4
tensor([1.0000e+00, 1.0000e+00])
tensor([1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00, -1.0000e+00])
tensor([-1.0000e+00])
Fim época 5
tensor([1.0000e+00, -1.0000e+00])
tensor([1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00])
Fim época 6
tensor([-1.0000e+00, 1.0000e+00])
tensor([-1.0000e+00, 




In [82]:
del dummy_loader, dummy_dataset

# Treinamento do ranqueador

Fonte: https://colab.research.google.com/drive/10etP7Lb915EC-uEuf1IKC8DYkyg_om6-?usp=sharing

In this notebook, we will finetuned and evaluate a BERT-based model on a document classification task.

Our dataset will be a smaller version of the IMDB Sentiment Analysis dataset.

To finetune and evaluate faster, we will be using [miniLM](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased), which is small BERT-like model distilled from a larger one.

## Carregando o Dataset e os Dataloaders


Independente da configuração do leiaute de treinamento, a validação é no leiaute qp

In [83]:
datasets_carregados_previamente = False #@param {type:"boolean"}

print(datasets_carregados_previamente)

False


In [84]:
import io

In [85]:
%%time
if datasets_carregados_previamente:
  with open(PATH_TRAIN_LOCAL+'/dataset_valid_len_'+str(hparam['num_sentenca_valid'])+'.pt','rb') as f:
    buffer = io.BytesIO(f.read())
  dataset_valid = torch.load(buffer)
  with open(PATH_TRAIN_LOCAL+'/dataset_train'+'_len_'+str(hparam['num_sentenca_train'])+'.pt','rb') as f:
    buffer = io.BytesIO(f.read())
  dataset_train = torch.load(buffer)
  print('Datasets carregados')

CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.25 µs


In [87]:
%%time
if not datasets_carregados_previamente:
  print("carregando valid_dataset")
  dataset_valid = MyDataset(texts=X_valid, classes=Y_valid, tokenizer=tokenizer, max_seq_length=hparam['max_seq_length'])
  torch.save(dataset_valid, PATH_TRAIN_LOCAL+'/dataset_valid_len_'+str(hparam['num_sentenca_valid'])+'.pt')

carregando valid_dataset


encoding text pair: 100%|██████████| 2014/2014 [00:01<00:00, 1283.63it/s]


	Vou converter lista para tensor;  Momento: [2023-Jun-27 00:44:00]
CPU times: user 1.62 s, sys: 37.9 ms, total: 1.65 s
Wall time: 1.65 s


In [88]:
%%time
if not datasets_carregados_previamente:
  print("carregando train_dataset")
  dataset_train = MyDataset(texts=X_train, classes=Y_train, tokenizer=tokenizer, max_seq_length=hparam['max_seq_length'])
  torch.save(dataset_train, PATH_TRAIN_LOCAL+'/dataset_train'+'_len_'+str(hparam['num_sentenca_train'])+'.pt')

carregando train_dataset


encoding text pair: 100%|██████████| 400724/400724 [06:37<00:00, 1007.60it/s]


	Vou converter lista para tensor;  Momento: [2023-Jun-27 00:51:03]
CPU times: user 6min 40s, sys: 4.52 s, total: 6min 45s
Wall time: 6min 44s


## Finetuning

### Funções auxiliares de treinamento

In [89]:
gera_tag_rastro_experiencia_treino(parm_aula='indir-minilm', hparam=hparam)


'indir-minilm|max_seq_length 512|batch_size 32'

In [146]:
def treina_modelo (model, hparam, parm_dataloader_train, parm_dataloader_valid,
                   parm_intervalo_print:int=1,
                   parm_se_salva_modelo:bool=True, parm_se_gera_rastro:bool=True):
    global PATH_TRAIN_LOCAL

    if parm_se_gera_rastro:
        rastro_neptune = NeptuneRastroRun(hparam, parm_lista_tag= gera_tag_rastro_experiencia_treino(parm_aula='indir-minilm', hparam=hparam) )
    history = []
    n_examples = 0
    time_inicio_treino = time.time()



    try:
        best_metrica_valid = 0 # float('-inf')
        best_epoch = 0


        train_losses = []
        correct = 0
        train_samples = 0
        model.train()
        se_para_execucao = False
        ultimo_step_treinado = 0
        for epoch in tqdm(range(hparam['num_epochs']), desc='Epochs'):

            if n_examples >= hparam['max_examples'] or se_para_execucao:
                break


            for cnt_step, batch in enumerate(tqdm(parm_dataloader_train, mininterval=0.5, desc='Train', disable=False)):

                if n_examples >= hparam['max_examples'] or se_para_execucao:
                    break
                ultimo_step_treinado += 1
                hparam['optimizer'].zero_grad()
                # ipdb.set_trace(context=3)
                outputs = model(input_ids=batch['input_ids'].to(hparam['device']),
                                attention_mask=batch['attention_mask'].to(hparam['device']),
                                token_type_ids=batch['token_type_ids'].to(hparam['device']),
                                labels=batch['labels'].to(hparam['device']))
                loss = outputs.loss
                loss.backward()
                hparam['optimizer'].step()
                hparam['scheduler'].step()
                n_examples += len(batch['input_ids'])  # Increment of batch size
                train_losses.append(loss.cpu().item())
                train_samples += batch['labels'].size(0)

                # se saída for 0 e 1
                preds = (outputs.logits >= 0.5).float().squeeze().to(hparam['device'])
                correct += (preds == batch['labels'].to(hparam['device'])).sum().cpu().item()

                # se saída for -1 e 1
                # preds = torch.sign(outputs.logits).squeeze()
                # correct += (preds == labels).sum().item()



                # ipdb.set_trace(context=2)
                if ultimo_step_treinado % hparam['eval_every_steps']==0:
                    if isinstance(parm_dataloader_train,torch.utils.data.dataloader.DataLoader):
                        accuracy_train = correct / train_samples
                    elif isinstance(parm_dataloader_train,list):  # usado para overfit em um batch
                        accuracy_train = correct / train_samples
                        assert len(parm_dataloader_train) == 1, f"Calculo de accuracy para overfit em um batch precisa ser revisado"
                    else:
                        raise Exception(f'parm_dataloader_train has type unexpected {type(parm_dataloader_train)} ')
                    loss_train_mean = mean(train_losses)
                    train_losses = []
                    correct = 0
                    train_samples = 0
                    accuracy_valid, mean_loss_valid = evaluate(model=model, dataloader=parm_dataloader_valid, set_name='Valid')
                    model.train()
                    # ipdb.set_trace(context=2)
                    print(f'Epoch: {epoch + 1} Step: {ultimo_step_treinado} Training loss: {loss_train_mean:0.2f} accuracy: {accuracy_train:0.3f} \nEm validação: loss: {mean_loss_valid:0.3f}; accuracy: {accuracy_valid:0.3f}')
                    metrica_rastro = {"train/loss": loss_train_mean,
                                      "train/accuracy": accuracy_train,
                                      "train/steps": ultimo_step_treinado,
                                      "train/n_examples": n_examples,
                                      "train/learning_rate": hparam["optimizer"].param_groups[0]["lr"],
                                      "valid/loss": mean_loss_valid,
                                      "valid/accuracy": accuracy_valid}
                    history.append(metrica_rastro)
                    if parm_se_gera_rastro:
                        rastro_neptune.salva_metrica(metrica_rastro)

                    if parm_intervalo_print > 0:
                        if (ultimo_step_treinado)%(parm_intervalo_print*hparam['eval_every_steps']) == 0:
                            print(f'Step: {ultimo_step_treinado} Amostras:{n_examples:d}  {100*n_examples/hparam["max_examples"]:.3f}%  Momento: {time.strftime("[%Y-%b-%d %H:%M:%S]")}')
                            print(f'lr: {hparam["optimizer"].param_groups[0]["lr"]:.5e} Train loss: {loss_train_mean:.4f}  accuracy: {accuracy_train:.4f}')
                            print(f'Valid loss: {mean_loss_valid:.4f} accuracy: {accuracy_valid:.4f}  ')

                    if accuracy_valid > best_metrica_valid:
                        best_model_dict = model.state_dict()
                        best_metrica_valid = accuracy_valid
                        best_step = ultimo_step_treinado
                        if parm_se_salva_modelo:
                          path_modelo = f'{PATH_TRAIN_LOCAL}/model_epoch_{best_step}_treino_{time.strftime("%Y-%b-%d %H:%M:%S")}.pt'
                          # torch.save(model, path_modelo)
                          model.save_pretrained(path_modelo)

                        # print(f'encontrado best model Epoch: {epoch + 1} ')
                    elif hparam['early_stop'] < ( ultimo_step_treinado + 1 - best_step):
                        print(f"Parando por critério de early_stop na época {epoch + 1} step {ultimo_step_treinado} sendo best_step {best_step} e ealy_stop {hparam['early_stop']}")
                        se_para_execucao = True

        # fim do treino
        if isinstance(parm_dataloader_train,torch.utils.data.dataloader.DataLoader):
            accuracy_train = correct / len(parm_dataloader_train.dataset)
        elif isinstance(parm_dataloader_train,list):  # usado para overfit em um batch
            accuracy_train = correct / len(parm_dataloader_train[0]['labels'])
            assert len(parm_dataloader_train) == 1, f"Calculo de accuracy para overfit em um batch precisa ser revisado"

        #loss_train_mean = mean(train_losses)
        accuracy_valid, mean_loss_valid = evaluate(model=model, dataloader=parm_dataloader_valid, set_name='Valid')

        print(f'Epoch: {epoch + 1} Step: {ultimo_step_treinado} Training loss: {loss_train_mean:0.2f} accuracy: {accuracy_train:0.3f} \nEm validação: loss: {mean_loss_valid:0.3f}; accuracy: {accuracy_valid:0.3f}')
        metrica_rastro = {"train/loss": loss_train_mean,
                          "train/accuracy": accuracy_train,
                          "train/steps": ultimo_step_treinado,
                          "train/n_examples": n_examples,
                          "train/learning_rate": hparam["optimizer"].param_groups[0]["lr"],
                          "valid/loss": mean_loss_valid,
                          "valid/accuracy": accuracy_valid}
        history.append(metrica_rastro)
        if parm_se_gera_rastro:
            rastro_neptune.salva_metrica(metrica_rastro)

        print(f'** END ** Step: {ultimo_step_treinado} Amostras:{n_examples:d}  {100*n_examples/hparam["max_examples"]:.3f}%  Momento: {time.strftime("[%Y-%b-%d %H:%M:%S]")}')
        print(f'lr: {hparam["optimizer"].param_groups[0]["lr"]:.5e} Train loss: {loss_train_mean:.4f}  accuracy: {accuracy_train:.4f}')
        print(f'Valid loss: {mean_loss_valid:.4f} accuracy: {accuracy_valid:.4f}  ')


        if parm_se_salva_modelo:
            path_modelo = f'{PATH_TRAIN_LOCAL}/model_fim_treino_{time.strftime("%Y-%b-%d %H:%M:%S")}.pt'
            # torch.save(model, path_modelo)
            model.save_pretrained(path_modelo)
            print(f"Modelo final  (step {ultimo_step_treinado}) salvo em {path_modelo}")

            if best_model_dict is not None:
                model.load_state_dict(best_model_dict)
                model.to(hparam['device'])
                path_modelo = f'{PATH_TRAIN_LOCAL}/model_fim_treino_{time.strftime("%Y-%b-%d %H:%M:%S")}.pt'
                # torch.save(model, path_modelo)
                model.save_pretrained(path_modelo)
                print(f"Modelo com melhor resultado em validação (step {best_step}) salvo após treino em {path_modelo}")

        # calculando tempo gasto e médio por
        tempo_treino = time.time() - time_inicio_treino
        if parm_se_gera_rastro:
            rastro_neptune.run_neptune["context/tempo_treino"] = tempo_treino
            rastro_neptune.run_neptune["context/best_epoch"] = best_epoch
            rastro_neptune.run_neptune["context/tempo_treino_epoc_mean"] = tempo_treino/hparam['num_epochs']
            # rastro_neptune.gera_grafico_modelo(parm_dataloader_train, model)
    finally: # para não deixar em aberto a execução no Neptune
        if parm_se_gera_rastro:
            rastro_neptune.stop()
    return history

In [124]:
def count_parameters(model):
    """
    Conta o número de parâmetros treináveis em um modelo PyTorch.

    Args:
        model (torch.nn.Module): O modelo PyTorch a ser avaliado.

    Returns:
        int: O número de parâmetros treináveis do modelo.

    """
    # Retorna a soma do número de elementos em cada tensor de parâmetro que requer gradiente.
    # A propriedade "requires_grad" é definida como True para todos os tensores de parâmetro que
    # precisam ser treinados, enquanto que para aqueles que não precisam, ela é definida como False.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

In [125]:
from statistics import mean, stdev

In [126]:
# We first define the evaluation function to measure accuracy and loss
def evaluate(model, dataloader, set_name):
    losses = []
    correct = 0
    model.eval()
    with torch.no_grad():
        for ndx, batch in tqdm(enumerate(dataloader), mininterval=0.5, desc=set_name, disable=False, total=len(dataloader)):
            input_ids = batch['input_ids'].to(hparam['device'])
            token_type_ids = batch['token_type_ids'].to(hparam['device'])
            attention_mask = batch['attention_mask'].to(hparam['device'])
            labels = batch['labels'].to(hparam['device'])
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, labels=labels)
            loss_val = outputs.loss
            losses.append(loss_val.cpu().item())

            # se saída for 0 e 1
            preds = (outputs.logits >= 0.5).float().squeeze().to(hparam['device'])
            correct += (preds == labels).sum().item()

            # se saída for -1 e 1
            # preds = torch.sign(outputs.logits).squeeze()
            # correct += (preds == labels).sum().item()


            # if ndx < 5:
            #   print(f'Em evaluete. Batch {ndx}')
            #   print(f'loss_val: {loss_val} ')
            #   print(f'preds: {preds} \nlabels: {labels} ')
            #   print(f'correct: {correct}')

    # ipdb.set_trace(context=3)
    if isinstance(dataloader,torch.utils.data.dataloader.DataLoader):
        accuracy = correct / len(dataloader.dataset)
    elif isinstance(dataloader,list):  # usado para overfit em um batch
        accuracy = correct / len(dataloader[0]['labels'])
        assert len(dataloader) == 1, f"Calculo de accuracy para overfit em um batch precisa ser revisado"
    mean_loss = mean(losses)
    return accuracy, mean_loss

In [127]:
from torch import nn
from torch import optim
from tqdm.auto import tqdm
from transformers import get_linear_schedule_with_warmup

In [128]:
def inicializa_modelo():
  global model, hparam, dataloader_train, nome_caminho_modelo

  model = AutoModelForSequenceClassification.from_pretrained(nome_caminho_modelo).to(hparam['device'])
  hparam['learning_rate']= 1e-6 # 3e-5 # 1e-3
  hparam['num_params'] = count_parameters(model)
  print(f"Number of model parameters: {hparam['num_params']}")
  hparam['amsgrad']= False
  hparam['num_epochs'] = 4
  hparam['weight_decay'] = 1e-4
  hparam['early_stop'] = 8000
  hparam['batch_size'] = 32
  hparam['num_step_epoch'] = len(dataloader_train)
  hparam['num_training_steps'] = hparam['num_epochs'] * hparam['num_step_epoch']
  hparam['max_examples'] = hparam['num_training_steps'] * hparam['batch_size']
  hparam['eval_every_steps'] = 400
  hparam['num_warmup_steps'] = 400 # int(hparam['num_training_steps']  * 0.05)
  hparam['optimizer'] = torch.optim.Adam(model.parameters(), lr=hparam['learning_rate'], weight_decay= hparam['weight_decay'], amsgrad=hparam['amsgrad'])
  hparam['scheduler'] = get_linear_schedule_with_warmup(hparam['optimizer'], hparam['num_warmup_steps'], hparam['num_training_steps'])
  # hparam['step_eval'] = 1000


In [99]:
dataloader_train = DataLoader(dataset_train, batch_size=hparam['batch_size'], shuffle=True)
dataloader_valid = DataLoader(dataset_valid, batch_size=hparam['batch_size'], shuffle=False)

In [103]:
import gc

In [111]:
gc.collect()

5174

In [105]:
inicializa_modelo()

loading configuration file /home/borela/fontes/relevar-busca/modelo/unicamp-dl/mMiniLM-L6-v2-pt-v2/config.json
Model config XLMRobertaConfig {
  "_name_or_path": "/home/borela/fontes/relevar-busca/modelo/unicamp-dl/mMiniLM-L6-v2-pt-v2",
  "architectures": [
    "XLMRobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "sbert_ce_default_activation_function": "torch.nn.modules.linear.Identity",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 

Number of model parameters: 106994305


Depois tentar definir:hparam[ hparam['dim_feedforward']=256

In [None]:
raise Exception('parar aqui')

Exception: ignored

### Evaluating on the valid dataset before training

In [112]:
%%time
accuracy, mean_loss = evaluate(model=model, dataloader=dataloader_valid, set_name='Valid')
print(f'Em validação: loss: {mean_loss:0.3f}; accuracy: {accuracy:0.3f}')

Valid: 100%|██████████| 63/63 [00:09<00:00,  6.63it/s]

Em validação: loss: 58.289; accuracy: 0.823
CPU times: user 4.9 s, sys: 320 ms, total: 5.22 s
Wall time: 9.5 s





### Testando em poucos dados: um batch apenas. Esperado overfit loss próxima de zero.

Por algum motivo a investigar quando passo dataset como lista não está gerando aprendizado. Mas quando passo normal, o modelo está aprendendo.




In [113]:
hparam['num_epochs'] = 5
hparam['early_stop'] = 3
hparam['eval_every_steps'] = 1

In [None]:
history =  treina_modelo (model, hparam,  [next(iter(dataloader_train))], [next(iter(dataloader_valid))], parm_se_salva_modelo=True, parm_se_gera_rastro=True)

Treinando em todos os dados

ultimo_step_treinado , correct, cnt_step, n_examples, train_losses
accuracy_train, accuracy_valid, mean_loss_valid



In [136]:
torch.cuda.empty_cache()

In [137]:
gc.collect()

3817

In [138]:
inicializa_modelo()

loading configuration file /home/borela/fontes/relevar-busca/modelo/unicamp-dl/mMiniLM-L6-v2-pt-v2/config.json
Model config XLMRobertaConfig {
  "_name_or_path": "/home/borela/fontes/relevar-busca/modelo/unicamp-dl/mMiniLM-L6-v2-pt-v2",
  "architectures": [
    "XLMRobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "sbert_ce_default_activation_function": "torch.nn.modules.linear.Identity",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 

Number of model parameters: 106994305


In [139]:
mostra_memoria(['cpu', 'gpu'])

Your runtime RAM in gb: 
 total 67.35
 available 41.29
 used 24.82
 free 13.67
 cached 27.89
 buffers 0.97
/nGPU
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Tue Jun 27 01:26:20 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.03   Driver Version: 525.116.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 59%   48C 

### Treino do conjunto completo

In [140]:
hparam

{'num_workers_dataloader': 0,
 'device': device(type='cuda', index=0),
 'percent_test_size': 0.005,
 'model_name': 'unicamp-dl/mMiniLM-L6-v2-pt-v2',
 'vocab_size': 250002,
 'batch_size': 32,
 'num_epochs': 4,
 'max_seq_length': 512,
 'num_sentenca_train': 400724,
 'num_sentenca_valid': 2014,
 'learning_rate': 1e-06,
 'num_params': 106994305,
 'amsgrad': False,
 'weight_decay': 0.0001,
 'early_stop': 8000,
 'num_step_epoch': 12523,
 'num_training_steps': 50092,
 'max_examples': 1602944,
 'eval_every_steps': 400,
 'num_warmup_steps': 400,
 'optimizer': Adam (
 Parameter Group 0
     amsgrad: False
     betas: (0.9, 0.999)
     capturable: False
     differentiable: False
     eps: 1e-08
     foreach: None
     fused: None
     initial_lr: 1e-06
     lr: 0.0
     maximize: False
     weight_decay: 0.0001
 ),
 'scheduler': <torch.optim.lr_scheduler.LambdaLR at 0x7f840460ba60>}

In [141]:
history = treina_modelo (model, hparam,  dataloader_train, dataloader_valid, parm_intervalo_print=1)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Av

Epochs:   0%|          | 0/4 [00:00<?, ?it/s]
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.63it/s]
Configuration saved in ../../model/train/minilm/model_epoch_400_treino_2023-Jun-27 01:29:13.pt/config.json


Epoch: 1 Step: 400 Training loss: 27.61 accuracy: 0.727 
Em validação: loss: 6.575; accuracy: 0.610
Step: 400 Amostras:12800  0.799%  Momento: [2023-Jun-27 01:29:13]
lr: 1.00000e-06 Train loss: 27.6101  accuracy: 0.7274
Valid loss: 6.5750 accuracy: 0.6097  


Model weights saved in ../../model/train/minilm/model_epoch_400_treino_2023-Jun-27 01:29:13.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.54it/s]
Configuration saved in ../../model/train/minilm/model_epoch_800_treino_2023-Jun-27 01:31:22.pt/config.json


Epoch: 1 Step: 800 Training loss: 2.80 accuracy: 0.613 
Em validação: loss: 1.697; accuracy: 0.622
Step: 800 Amostras:25600  1.597%  Momento: [2023-Jun-27 01:31:22]
lr: 9.91950e-07 Train loss: 2.7960  accuracy: 0.6128
Valid loss: 1.6968 accuracy: 0.6216  


Model weights saved in ../../model/train/minilm/model_epoch_800_treino_2023-Jun-27 01:31:22.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.60it/s]
Configuration saved in ../../model/train/minilm/model_epoch_1200_treino_2023-Jun-27 01:33:31.pt/config.json


Epoch: 1 Step: 1200 Training loss: 0.97 accuracy: 0.653 
Em validação: loss: 0.850; accuracy: 0.626
Step: 1200 Amostras:38400  2.396%  Momento: [2023-Jun-27 01:33:31]
lr: 9.83901e-07 Train loss: 0.9683  accuracy: 0.6527
Valid loss: 0.8495 accuracy: 0.6261  


Model weights saved in ../../model/train/minilm/model_epoch_1200_treino_2023-Jun-27 01:33:31.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.39it/s]
Configuration saved in ../../model/train/minilm/model_epoch_1600_treino_2023-Jun-27 01:35:40.pt/config.json


Epoch: 1 Step: 1600 Training loss: 0.54 accuracy: 0.691 
Em validação: loss: 0.530; accuracy: 0.666
Step: 1600 Amostras:51200  3.194%  Momento: [2023-Jun-27 01:35:40]
lr: 9.75851e-07 Train loss: 0.5432  accuracy: 0.6910
Valid loss: 0.5296 accuracy: 0.6658  


Model weights saved in ../../model/train/minilm/model_epoch_1600_treino_2023-Jun-27 01:35:40.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.26it/s]
Configuration saved in ../../model/train/minilm/model_epoch_2000_treino_2023-Jun-27 01:37:51.pt/config.json


Epoch: 1 Step: 2000 Training loss: 0.38 accuracy: 0.717 
Em validação: loss: 0.381; accuracy: 0.683
Step: 2000 Amostras:64000  3.993%  Momento: [2023-Jun-27 01:37:51]
lr: 9.67802e-07 Train loss: 0.3774  accuracy: 0.7172
Valid loss: 0.3815 accuracy: 0.6832  


Model weights saved in ../../model/train/minilm/model_epoch_2000_treino_2023-Jun-27 01:37:51.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.97it/s]
Configuration saved in ../../model/train/minilm/model_epoch_2400_treino_2023-Jun-27 01:40:03.pt/config.json


Epoch: 1 Step: 2400 Training loss: 0.28 accuracy: 0.754 
Em validação: loss: 0.294; accuracy: 0.718
Step: 2400 Amostras:76800  4.791%  Momento: [2023-Jun-27 01:40:03]
lr: 9.59752e-07 Train loss: 0.2834  accuracy: 0.7539
Valid loss: 0.2937 accuracy: 0.7185  


Model weights saved in ../../model/train/minilm/model_epoch_2400_treino_2023-Jun-27 01:40:03.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.73it/s]
Configuration saved in ../../model/train/minilm/model_epoch_2800_treino_2023-Jun-27 01:42:13.pt/config.json


Epoch: 1 Step: 2800 Training loss: 0.24 accuracy: 0.774 
Em validação: loss: 0.239; accuracy: 0.740
Step: 2800 Amostras:89600  5.590%  Momento: [2023-Jun-27 01:42:13]
lr: 9.51702e-07 Train loss: 0.2354  accuracy: 0.7743
Valid loss: 0.2387 accuracy: 0.7403  


Model weights saved in ../../model/train/minilm/model_epoch_2800_treino_2023-Jun-27 01:42:13.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.98it/s]
Configuration saved in ../../model/train/minilm/model_epoch_3200_treino_2023-Jun-27 01:44:24.pt/config.json


Epoch: 1 Step: 3200 Training loss: 0.20 accuracy: 0.789 
Em validação: loss: 0.205; accuracy: 0.764
Step: 3200 Amostras:102400  6.388%  Momento: [2023-Jun-27 01:44:24]
lr: 9.43653e-07 Train loss: 0.2000  accuracy: 0.7892
Valid loss: 0.2050 accuracy: 0.7637  


Model weights saved in ../../model/train/minilm/model_epoch_3200_treino_2023-Jun-27 01:44:24.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.17it/s]
Configuration saved in ../../model/train/minilm/model_epoch_3600_treino_2023-Jun-27 01:46:34.pt/config.json


Epoch: 1 Step: 3600 Training loss: 0.18 accuracy: 0.800 
Em validação: loss: 0.180; accuracy: 0.778
Step: 3600 Amostras:115200  7.187%  Momento: [2023-Jun-27 01:46:34]
lr: 9.35603e-07 Train loss: 0.1815  accuracy: 0.8005
Valid loss: 0.1803 accuracy: 0.7776  


Model weights saved in ../../model/train/minilm/model_epoch_3600_treino_2023-Jun-27 01:46:34.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.05it/s]
Configuration saved in ../../model/train/minilm/model_epoch_4000_treino_2023-Jun-27 01:48:44.pt/config.json


Epoch: 1 Step: 4000 Training loss: 0.16 accuracy: 0.819 
Em validação: loss: 0.162; accuracy: 0.795
Step: 4000 Amostras:128000  7.985%  Momento: [2023-Jun-27 01:48:44]
lr: 9.27554e-07 Train loss: 0.1626  accuracy: 0.8194
Valid loss: 0.1623 accuracy: 0.7954  


Model weights saved in ../../model/train/minilm/model_epoch_4000_treino_2023-Jun-27 01:48:44.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.21it/s]
Configuration saved in ../../model/train/minilm/model_epoch_4400_treino_2023-Jun-27 01:50:57.pt/config.json


Epoch: 1 Step: 4400 Training loss: 0.15 accuracy: 0.825 
Em validação: loss: 0.148; accuracy: 0.819
Step: 4400 Amostras:140800  8.784%  Momento: [2023-Jun-27 01:50:57]
lr: 9.19504e-07 Train loss: 0.1513  accuracy: 0.8254
Valid loss: 0.1481 accuracy: 0.8188  


Model weights saved in ../../model/train/minilm/model_epoch_4400_treino_2023-Jun-27 01:50:57.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 13.36it/s]
Configuration saved in ../../model/train/minilm/model_epoch_4800_treino_2023-Jun-27 01:53:09.pt/config.json


Epoch: 1 Step: 4800 Training loss: 0.15 accuracy: 0.833 
Em validação: loss: 0.138; accuracy: 0.835
Step: 4800 Amostras:153600  9.582%  Momento: [2023-Jun-27 01:53:09]
lr: 9.11455e-07 Train loss: 0.1456  accuracy: 0.8330
Valid loss: 0.1379 accuracy: 0.8347  


Model weights saved in ../../model/train/minilm/model_epoch_4800_treino_2023-Jun-27 01:53:09.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.73it/s]
Configuration saved in ../../model/train/minilm/model_epoch_5200_treino_2023-Jun-27 01:55:20.pt/config.json


Epoch: 1 Step: 5200 Training loss: 0.14 accuracy: 0.833 
Em validação: loss: 0.130; accuracy: 0.844
Step: 5200 Amostras:166400  10.381%  Momento: [2023-Jun-27 01:55:20]
lr: 9.03405e-07 Train loss: 0.1418  accuracy: 0.8327
Valid loss: 0.1295 accuracy: 0.8441  


Model weights saved in ../../model/train/minilm/model_epoch_5200_treino_2023-Jun-27 01:55:20.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.67it/s]
Configuration saved in ../../model/train/minilm/model_epoch_5600_treino_2023-Jun-27 01:57:31.pt/config.json


Epoch: 1 Step: 5600 Training loss: 0.13 accuracy: 0.847 
Em validação: loss: 0.125; accuracy: 0.848
Step: 5600 Amostras:179200  11.179%  Momento: [2023-Jun-27 01:57:31]
lr: 8.95355e-07 Train loss: 0.1302  accuracy: 0.8466
Valid loss: 0.1248 accuracy: 0.8476  


Model weights saved in ../../model/train/minilm/model_epoch_5600_treino_2023-Jun-27 01:57:31.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 13.93it/s]
Configuration saved in ../../model/train/minilm/model_epoch_6000_treino_2023-Jun-27 01:59:44.pt/config.json


Epoch: 1 Step: 6000 Training loss: 0.13 accuracy: 0.850 
Em validação: loss: 0.116; accuracy: 0.858
Step: 6000 Amostras:192000  11.978%  Momento: [2023-Jun-27 01:59:44]
lr: 8.87306e-07 Train loss: 0.1287  accuracy: 0.8503
Valid loss: 0.1156 accuracy: 0.8580  


Model weights saved in ../../model/train/minilm/model_epoch_6000_treino_2023-Jun-27 01:59:44.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 13.75it/s]
Configuration saved in ../../model/train/minilm/model_epoch_6400_treino_2023-Jun-27 02:01:55.pt/config.json


Epoch: 1 Step: 6400 Training loss: 0.12 accuracy: 0.853 
Em validação: loss: 0.111; accuracy: 0.864
Step: 6400 Amostras:204800  12.776%  Momento: [2023-Jun-27 02:01:55]
lr: 8.79256e-07 Train loss: 0.1222  accuracy: 0.8525
Valid loss: 0.1112 accuracy: 0.8640  


Model weights saved in ../../model/train/minilm/model_epoch_6400_treino_2023-Jun-27 02:01:55.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.84it/s]
Configuration saved in ../../model/train/minilm/model_epoch_6800_treino_2023-Jun-27 02:04:06.pt/config.json


Epoch: 1 Step: 6800 Training loss: 0.12 accuracy: 0.851 
Em validação: loss: 0.109; accuracy: 0.870
Step: 6800 Amostras:217600  13.575%  Momento: [2023-Jun-27 02:04:06]
lr: 8.71207e-07 Train loss: 0.1212  accuracy: 0.8507
Valid loss: 0.1090 accuracy: 0.8699  


Model weights saved in ../../model/train/minilm/model_epoch_6800_treino_2023-Jun-27 02:04:06.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.00it/s]
Configuration saved in ../../model/train/minilm/model_epoch_7200_treino_2023-Jun-27 02:06:16.pt/config.json


Epoch: 1 Step: 7200 Training loss: 0.12 accuracy: 0.862 
Em validação: loss: 0.104; accuracy: 0.873
Step: 7200 Amostras:230400  14.374%  Momento: [2023-Jun-27 02:06:16]
lr: 8.63157e-07 Train loss: 0.1169  accuracy: 0.8621
Valid loss: 0.1035 accuracy: 0.8729  


Model weights saved in ../../model/train/minilm/model_epoch_7200_treino_2023-Jun-27 02:06:16.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.39it/s]
Configuration saved in ../../model/train/minilm/model_epoch_7600_treino_2023-Jun-27 02:08:27.pt/config.json


Epoch: 1 Step: 7600 Training loss: 0.11 accuracy: 0.862 
Em validação: loss: 0.100; accuracy: 0.875
Step: 7600 Amostras:243200  15.172%  Momento: [2023-Jun-27 02:08:27]
lr: 8.55107e-07 Train loss: 0.1124  accuracy: 0.8623
Valid loss: 0.1004 accuracy: 0.8754  


Model weights saved in ../../model/train/minilm/model_epoch_7600_treino_2023-Jun-27 02:08:27.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.85it/s]


Epoch: 1 Step: 8000 Training loss: 0.11 accuracy: 0.866 
Em validação: loss: 0.102; accuracy: 0.875
Step: 8000 Amostras:256000  15.971%  Momento: [2023-Jun-27 02:10:39]
lr: 8.47058e-07 Train loss: 0.1110  accuracy: 0.8660
Valid loss: 0.1019 accuracy: 0.8749  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]
Configuration saved in ../../model/train/minilm/model_epoch_8400_treino_2023-Jun-27 02:12:50.pt/config.json


Epoch: 1 Step: 8400 Training loss: 0.11 accuracy: 0.863 
Em validação: loss: 0.095; accuracy: 0.881
Step: 8400 Amostras:268800  16.769%  Momento: [2023-Jun-27 02:12:50]
lr: 8.39008e-07 Train loss: 0.1085  accuracy: 0.8627
Valid loss: 0.0948 accuracy: 0.8808  


Model weights saved in ../../model/train/minilm/model_epoch_8400_treino_2023-Jun-27 02:12:50.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.03it/s]
Configuration saved in ../../model/train/minilm/model_epoch_8800_treino_2023-Jun-27 02:15:00.pt/config.json


Epoch: 1 Step: 8800 Training loss: 0.11 accuracy: 0.870 
Em validação: loss: 0.095; accuracy: 0.881
Step: 8800 Amostras:281600  17.568%  Momento: [2023-Jun-27 02:15:00]
lr: 8.30959e-07 Train loss: 0.1058  accuracy: 0.8702
Valid loss: 0.0951 accuracy: 0.8813  


Model weights saved in ../../model/train/minilm/model_epoch_8800_treino_2023-Jun-27 02:15:00.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.98it/s]
Configuration saved in ../../model/train/minilm/model_epoch_9200_treino_2023-Jun-27 02:17:13.pt/config.json


Epoch: 1 Step: 9200 Training loss: 0.10 accuracy: 0.878 
Em validação: loss: 0.089; accuracy: 0.891
Step: 9200 Amostras:294400  18.366%  Momento: [2023-Jun-27 02:17:13]
lr: 8.22909e-07 Train loss: 0.1012  accuracy: 0.8783
Valid loss: 0.0888 accuracy: 0.8913  


Model weights saved in ../../model/train/minilm/model_epoch_9200_treino_2023-Jun-27 02:17:13.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.64it/s]
Configuration saved in ../../model/train/minilm/model_epoch_9600_treino_2023-Jun-27 02:19:27.pt/config.json


Epoch: 1 Step: 9600 Training loss: 0.10 accuracy: 0.877 
Em validação: loss: 0.087; accuracy: 0.896
Step: 9600 Amostras:307200  19.165%  Momento: [2023-Jun-27 02:19:27]
lr: 8.14860e-07 Train loss: 0.1008  accuracy: 0.8770
Valid loss: 0.0867 accuracy: 0.8957  


Model weights saved in ../../model/train/minilm/model_epoch_9600_treino_2023-Jun-27 02:19:27.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.50it/s]
Configuration saved in ../../model/train/minilm/model_epoch_10000_treino_2023-Jun-27 02:21:38.pt/config.json


Epoch: 1 Step: 10000 Training loss: 0.10 accuracy: 0.884 
Em validação: loss: 0.084; accuracy: 0.899
Step: 10000 Amostras:320000  19.963%  Momento: [2023-Jun-27 02:21:38]
lr: 8.06810e-07 Train loss: 0.0957  accuracy: 0.8836
Valid loss: 0.0838 accuracy: 0.8987  


Model weights saved in ../../model/train/minilm/model_epoch_10000_treino_2023-Jun-27 02:21:38.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.72it/s]
Configuration saved in ../../model/train/minilm/model_epoch_10400_treino_2023-Jun-27 02:23:49.pt/config.json


Epoch: 1 Step: 10400 Training loss: 0.09 accuracy: 0.890 
Em validação: loss: 0.084; accuracy: 0.899
Step: 10400 Amostras:332800  20.762%  Momento: [2023-Jun-27 02:23:49]
lr: 7.98760e-07 Train loss: 0.0921  accuracy: 0.8895
Valid loss: 0.0841 accuracy: 0.8992  


Model weights saved in ../../model/train/minilm/model_epoch_10400_treino_2023-Jun-27 02:23:49.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.96it/s]
Configuration saved in ../../model/train/minilm/model_epoch_10800_treino_2023-Jun-27 02:26:00.pt/config.json


Epoch: 1 Step: 10800 Training loss: 0.09 accuracy: 0.891 
Em validação: loss: 0.082; accuracy: 0.904
Step: 10800 Amostras:345600  21.560%  Momento: [2023-Jun-27 02:26:00]
lr: 7.90711e-07 Train loss: 0.0891  accuracy: 0.8912
Valid loss: 0.0816 accuracy: 0.9037  


Model weights saved in ../../model/train/minilm/model_epoch_10800_treino_2023-Jun-27 02:26:00.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.75it/s]
Configuration saved in ../../model/train/minilm/model_epoch_11200_treino_2023-Jun-27 02:28:10.pt/config.json


Epoch: 1 Step: 11200 Training loss: 0.09 accuracy: 0.898 
Em validação: loss: 0.079; accuracy: 0.904
Step: 11200 Amostras:358400  22.359%  Momento: [2023-Jun-27 02:28:10]
lr: 7.82661e-07 Train loss: 0.0866  accuracy: 0.8978
Valid loss: 0.0789 accuracy: 0.9042  


Model weights saved in ../../model/train/minilm/model_epoch_11200_treino_2023-Jun-27 02:28:10.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.85it/s]
Configuration saved in ../../model/train/minilm/model_epoch_11600_treino_2023-Jun-27 02:30:21.pt/config.json


Epoch: 1 Step: 11600 Training loss: 0.08 accuracy: 0.904 
Em validação: loss: 0.077; accuracy: 0.911
Step: 11600 Amostras:371200  23.157%  Momento: [2023-Jun-27 02:30:21]
lr: 7.74612e-07 Train loss: 0.0831  accuracy: 0.9040
Valid loss: 0.0768 accuracy: 0.9111  


Model weights saved in ../../model/train/minilm/model_epoch_11600_treino_2023-Jun-27 02:30:21.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.93it/s]
Configuration saved in ../../model/train/minilm/model_epoch_12000_treino_2023-Jun-27 02:32:32.pt/config.json


Epoch: 1 Step: 12000 Training loss: 0.08 accuracy: 0.907 
Em validação: loss: 0.073; accuracy: 0.916
Step: 12000 Amostras:384000  23.956%  Momento: [2023-Jun-27 02:32:32]
lr: 7.66562e-07 Train loss: 0.0791  accuracy: 0.9066
Valid loss: 0.0727 accuracy: 0.9161  


Model weights saved in ../../model/train/minilm/model_epoch_12000_treino_2023-Jun-27 02:32:32.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.72it/s]


Epoch: 1 Step: 12400 Training loss: 0.08 accuracy: 0.904 
Em validação: loss: 0.071; accuracy: 0.916
Step: 12400 Amostras:396800  24.754%  Momento: [2023-Jun-27 02:34:42]
lr: 7.58512e-07 Train loss: 0.0812  accuracy: 0.9042
Valid loss: 0.0714 accuracy: 0.9156  


Train: 100%|██████████| 12523/12523 [1:08:16<00:00,  3.06it/s]
Epochs:  25%|██▌       | 1/4 [1:08:16<3:24:49, 4096.34s/it]
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.76it/s]
Configuration saved in ../../model/train/minilm/model_epoch_12800_treino_2023-Jun-27 02:36:53.pt/config.json


Epoch: 2 Step: 12800 Training loss: 0.08 accuracy: 0.912 
Em validação: loss: 0.070; accuracy: 0.918
Step: 12800 Amostras:409588  25.552%  Momento: [2023-Jun-27 02:36:53]
lr: 7.50463e-07 Train loss: 0.0755  accuracy: 0.9119
Valid loss: 0.0701 accuracy: 0.9176  


Model weights saved in ../../model/train/minilm/model_epoch_12800_treino_2023-Jun-27 02:36:53.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.91it/s]
Configuration saved in ../../model/train/minilm/model_epoch_13200_treino_2023-Jun-27 02:39:04.pt/config.json


Epoch: 2 Step: 13200 Training loss: 0.07 accuracy: 0.915 
Em validação: loss: 0.068; accuracy: 0.921
Step: 13200 Amostras:422388  26.351%  Momento: [2023-Jun-27 02:39:04]
lr: 7.42413e-07 Train loss: 0.0744  accuracy: 0.9151
Valid loss: 0.0680 accuracy: 0.9206  


Model weights saved in ../../model/train/minilm/model_epoch_13200_treino_2023-Jun-27 02:39:04.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.55it/s]
Configuration saved in ../../model/train/minilm/model_epoch_13600_treino_2023-Jun-27 02:41:17.pt/config.json


Epoch: 2 Step: 13600 Training loss: 0.07 accuracy: 0.915 
Em validação: loss: 0.069; accuracy: 0.923
Step: 13600 Amostras:435188  27.149%  Momento: [2023-Jun-27 02:41:17]
lr: 7.34364e-07 Train loss: 0.0730  accuracy: 0.9148
Valid loss: 0.0693 accuracy: 0.9225  


Model weights saved in ../../model/train/minilm/model_epoch_13600_treino_2023-Jun-27 02:41:17.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.47it/s]
Configuration saved in ../../model/train/minilm/model_epoch_14000_treino_2023-Jun-27 02:43:27.pt/config.json


Epoch: 2 Step: 14000 Training loss: 0.07 accuracy: 0.914 
Em validação: loss: 0.065; accuracy: 0.925
Step: 14000 Amostras:447988  27.948%  Momento: [2023-Jun-27 02:43:27]
lr: 7.26314e-07 Train loss: 0.0729  accuracy: 0.9139
Valid loss: 0.0653 accuracy: 0.9250  


Model weights saved in ../../model/train/minilm/model_epoch_14000_treino_2023-Jun-27 02:43:27.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.58it/s]


Epoch: 2 Step: 14400 Training loss: 0.07 accuracy: 0.920 
Em validação: loss: 0.066; accuracy: 0.924
Step: 14400 Amostras:460788  28.746%  Momento: [2023-Jun-27 02:45:39]
lr: 7.18265e-07 Train loss: 0.0695  accuracy: 0.9202
Valid loss: 0.0658 accuracy: 0.9240  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.54it/s]
Configuration saved in ../../model/train/minilm/model_epoch_14800_treino_2023-Jun-27 02:47:50.pt/config.json


Epoch: 2 Step: 14800 Training loss: 0.07 accuracy: 0.924 
Em validação: loss: 0.065; accuracy: 0.927
Step: 14800 Amostras:473588  29.545%  Momento: [2023-Jun-27 02:47:50]
lr: 7.10215e-07 Train loss: 0.0689  accuracy: 0.9237
Valid loss: 0.0647 accuracy: 0.9265  


Model weights saved in ../../model/train/minilm/model_epoch_14800_treino_2023-Jun-27 02:47:50.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.99it/s]
Configuration saved in ../../model/train/minilm/model_epoch_15200_treino_2023-Jun-27 02:50:01.pt/config.json


Epoch: 2 Step: 15200 Training loss: 0.07 accuracy: 0.925 
Em validação: loss: 0.064; accuracy: 0.930
Step: 15200 Amostras:486388  30.343%  Momento: [2023-Jun-27 02:50:01]
lr: 7.02165e-07 Train loss: 0.0661  accuracy: 0.9247
Valid loss: 0.0638 accuracy: 0.9305  


Model weights saved in ../../model/train/minilm/model_epoch_15200_treino_2023-Jun-27 02:50:01.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.70it/s]


Epoch: 2 Step: 15600 Training loss: 0.07 accuracy: 0.927 
Em validação: loss: 0.063; accuracy: 0.929
Step: 15600 Amostras:499188  31.142%  Momento: [2023-Jun-27 02:52:12]
lr: 6.94116e-07 Train loss: 0.0650  accuracy: 0.9266
Valid loss: 0.0630 accuracy: 0.9295  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.72it/s]


Epoch: 2 Step: 16000 Training loss: 0.07 accuracy: 0.922 
Em validação: loss: 0.062; accuracy: 0.930
Step: 16000 Amostras:511988  31.940%  Momento: [2023-Jun-27 02:54:22]
lr: 6.86066e-07 Train loss: 0.0674  accuracy: 0.9223
Valid loss: 0.0618 accuracy: 0.9305  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.71it/s]
Configuration saved in ../../model/train/minilm/model_epoch_16400_treino_2023-Jun-27 02:56:33.pt/config.json


Epoch: 2 Step: 16400 Training loss: 0.06 accuracy: 0.926 
Em validação: loss: 0.061; accuracy: 0.932
Step: 16400 Amostras:524788  32.739%  Momento: [2023-Jun-27 02:56:33]
lr: 6.78017e-07 Train loss: 0.0636  accuracy: 0.9261
Valid loss: 0.0605 accuracy: 0.9320  


Model weights saved in ../../model/train/minilm/model_epoch_16400_treino_2023-Jun-27 02:56:33.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.95it/s]
Configuration saved in ../../model/train/minilm/model_epoch_16800_treino_2023-Jun-27 02:58:44.pt/config.json


Epoch: 2 Step: 16800 Training loss: 0.06 accuracy: 0.930 
Em validação: loss: 0.061; accuracy: 0.932
Step: 16800 Amostras:537588  33.538%  Momento: [2023-Jun-27 02:58:44]
lr: 6.69967e-07 Train loss: 0.0618  accuracy: 0.9301
Valid loss: 0.0610 accuracy: 0.9325  


Model weights saved in ../../model/train/minilm/model_epoch_16800_treino_2023-Jun-27 02:58:44.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.97it/s]


Epoch: 2 Step: 17200 Training loss: 0.06 accuracy: 0.930 
Em validação: loss: 0.060; accuracy: 0.932
Step: 17200 Amostras:550388  34.336%  Momento: [2023-Jun-27 03:00:54]
lr: 6.61917e-07 Train loss: 0.0628  accuracy: 0.9301
Valid loss: 0.0596 accuracy: 0.9320  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.47it/s]


Epoch: 2 Step: 17600 Training loss: 0.06 accuracy: 0.929 
Em validação: loss: 0.060; accuracy: 0.931
Step: 17600 Amostras:563188  35.135%  Momento: [2023-Jun-27 03:03:04]
lr: 6.53868e-07 Train loss: 0.0621  accuracy: 0.9295
Valid loss: 0.0599 accuracy: 0.9310  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.49it/s]


Epoch: 2 Step: 18000 Training loss: 0.06 accuracy: 0.926 
Em validação: loss: 0.058; accuracy: 0.932
Step: 18000 Amostras:575988  35.933%  Momento: [2023-Jun-27 03:05:16]
lr: 6.45818e-07 Train loss: 0.0634  accuracy: 0.9261
Valid loss: 0.0584 accuracy: 0.9320  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.49it/s]


Epoch: 2 Step: 18400 Training loss: 0.06 accuracy: 0.931 
Em validação: loss: 0.058; accuracy: 0.931
Step: 18400 Amostras:588788  36.732%  Momento: [2023-Jun-27 03:07:28]
lr: 6.37769e-07 Train loss: 0.0603  accuracy: 0.9306
Valid loss: 0.0583 accuracy: 0.9315  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.72it/s]


Epoch: 2 Step: 18800 Training loss: 0.06 accuracy: 0.932 
Em validação: loss: 0.058; accuracy: 0.931
Step: 18800 Amostras:601588  37.530%  Momento: [2023-Jun-27 03:09:38]
lr: 6.29719e-07 Train loss: 0.0596  accuracy: 0.9317
Valid loss: 0.0583 accuracy: 0.9310  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.75it/s]


Epoch: 2 Step: 19200 Training loss: 0.06 accuracy: 0.934 
Em validação: loss: 0.058; accuracy: 0.932
Step: 19200 Amostras:614388  38.329%  Momento: [2023-Jun-27 03:11:49]
lr: 6.21669e-07 Train loss: 0.0566  accuracy: 0.9344
Valid loss: 0.0582 accuracy: 0.9320  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.73it/s]
Configuration saved in ../../model/train/minilm/model_epoch_19600_treino_2023-Jun-27 03:14:00.pt/config.json


Epoch: 2 Step: 19600 Training loss: 0.06 accuracy: 0.933 
Em validação: loss: 0.059; accuracy: 0.933
Step: 19600 Amostras:627188  39.127%  Momento: [2023-Jun-27 03:14:00]
lr: 6.13620e-07 Train loss: 0.0578  accuracy: 0.9333
Valid loss: 0.0589 accuracy: 0.9330  


Model weights saved in ../../model/train/minilm/model_epoch_19600_treino_2023-Jun-27 03:14:00.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.83it/s]


Epoch: 2 Step: 20000 Training loss: 0.06 accuracy: 0.931 
Em validação: loss: 0.056; accuracy: 0.933
Step: 20000 Amostras:639988  39.926%  Momento: [2023-Jun-27 03:16:11]
lr: 6.05570e-07 Train loss: 0.0596  accuracy: 0.9308
Valid loss: 0.0563 accuracy: 0.9330  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.82it/s]


Epoch: 2 Step: 20400 Training loss: 0.06 accuracy: 0.934 
Em validação: loss: 0.057; accuracy: 0.932
Step: 20400 Amostras:652788  40.724%  Momento: [2023-Jun-27 03:18:22]
lr: 5.97521e-07 Train loss: 0.0576  accuracy: 0.9339
Valid loss: 0.0570 accuracy: 0.9325  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.87it/s]


Epoch: 2 Step: 20800 Training loss: 0.06 accuracy: 0.937 
Em validação: loss: 0.057; accuracy: 0.932
Step: 20800 Amostras:665588  41.523%  Momento: [2023-Jun-27 03:20:31]
lr: 5.89471e-07 Train loss: 0.0561  accuracy: 0.9367
Valid loss: 0.0567 accuracy: 0.9325  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.89it/s]
Configuration saved in ../../model/train/minilm/model_epoch_21200_treino_2023-Jun-27 03:22:42.pt/config.json


Epoch: 2 Step: 21200 Training loss: 0.06 accuracy: 0.934 
Em validação: loss: 0.056; accuracy: 0.934
Step: 21200 Amostras:678388  42.321%  Momento: [2023-Jun-27 03:22:42]
lr: 5.81422e-07 Train loss: 0.0576  accuracy: 0.9345
Valid loss: 0.0557 accuracy: 0.9340  


Model weights saved in ../../model/train/minilm/model_epoch_21200_treino_2023-Jun-27 03:22:42.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.87it/s]


Epoch: 2 Step: 21600 Training loss: 0.06 accuracy: 0.935 
Em validação: loss: 0.055; accuracy: 0.934
Step: 21600 Amostras:691188  43.120%  Momento: [2023-Jun-27 03:24:52]
lr: 5.73372e-07 Train loss: 0.0571  accuracy: 0.9347
Valid loss: 0.0551 accuracy: 0.9340  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.91it/s]


Epoch: 2 Step: 22000 Training loss: 0.06 accuracy: 0.935 
Em validação: loss: 0.055; accuracy: 0.934
Step: 22000 Amostras:703988  43.918%  Momento: [2023-Jun-27 03:27:03]
lr: 5.65322e-07 Train loss: 0.0565  accuracy: 0.9351
Valid loss: 0.0550 accuracy: 0.9340  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.89it/s]
Configuration saved in ../../model/train/minilm/model_epoch_22400_treino_2023-Jun-27 03:29:13.pt/config.json


Epoch: 2 Step: 22400 Training loss: 0.06 accuracy: 0.935 
Em validação: loss: 0.055; accuracy: 0.934
Step: 22400 Amostras:716788  44.717%  Momento: [2023-Jun-27 03:29:13]
lr: 5.57273e-07 Train loss: 0.0563  accuracy: 0.9347
Valid loss: 0.0550 accuracy: 0.9345  


Model weights saved in ../../model/train/minilm/model_epoch_22400_treino_2023-Jun-27 03:29:13.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:05<00:00, 12.53it/s]
Configuration saved in ../../model/train/minilm/model_epoch_22800_treino_2023-Jun-27 03:31:24.pt/config.json


Epoch: 2 Step: 22800 Training loss: 0.06 accuracy: 0.934 
Em validação: loss: 0.055; accuracy: 0.936
Step: 22800 Amostras:729588  45.516%  Momento: [2023-Jun-27 03:31:24]
lr: 5.49223e-07 Train loss: 0.0560  accuracy: 0.9342
Valid loss: 0.0552 accuracy: 0.9359  


Model weights saved in ../../model/train/minilm/model_epoch_22800_treino_2023-Jun-27 03:31:24.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.10it/s]


Epoch: 2 Step: 23200 Training loss: 0.06 accuracy: 0.935 
Em validação: loss: 0.054; accuracy: 0.936
Step: 23200 Amostras:742388  46.314%  Momento: [2023-Jun-27 03:33:37]
lr: 5.41174e-07 Train loss: 0.0558  accuracy: 0.9352
Valid loss: 0.0541 accuracy: 0.9359  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.04it/s]
Configuration saved in ../../model/train/minilm/model_epoch_23600_treino_2023-Jun-27 03:35:46.pt/config.json


Epoch: 2 Step: 23600 Training loss: 0.05 accuracy: 0.938 
Em validação: loss: 0.054; accuracy: 0.937
Step: 23600 Amostras:755188  47.113%  Momento: [2023-Jun-27 03:35:46]
lr: 5.33124e-07 Train loss: 0.0539  accuracy: 0.9379
Valid loss: 0.0539 accuracy: 0.9374  


Model weights saved in ../../model/train/minilm/model_epoch_23600_treino_2023-Jun-27 03:35:46.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.75it/s]


Epoch: 2 Step: 24000 Training loss: 0.05 accuracy: 0.938 
Em validação: loss: 0.054; accuracy: 0.935
Step: 24000 Amostras:767988  47.911%  Momento: [2023-Jun-27 03:37:57]
lr: 5.25074e-07 Train loss: 0.0533  accuracy: 0.9382
Valid loss: 0.0543 accuracy: 0.9350  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.74it/s]


Epoch: 2 Step: 24400 Training loss: 0.05 accuracy: 0.937 
Em validação: loss: 0.053; accuracy: 0.937
Step: 24400 Amostras:780788  48.710%  Momento: [2023-Jun-27 03:40:08]
lr: 5.17025e-07 Train loss: 0.0542  accuracy: 0.9367
Valid loss: 0.0533 accuracy: 0.9374  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.77it/s]


Epoch: 2 Step: 24800 Training loss: 0.05 accuracy: 0.938 
Em validação: loss: 0.053; accuracy: 0.936
Step: 24800 Amostras:793588  49.508%  Momento: [2023-Jun-27 03:42:19]
lr: 5.08975e-07 Train loss: 0.0534  accuracy: 0.9376
Valid loss: 0.0531 accuracy: 0.9364  


Train: 100%|██████████| 12523/12523 [1:08:15<00:00,  3.06it/s]
Epochs:  50%|█████     | 2/4 [2:16:31<2:16:31, 4095.58s/it]
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.72it/s]


Epoch: 3 Step: 25200 Training loss: 0.05 accuracy: 0.938 
Em validação: loss: 0.052; accuracy: 0.937
Step: 25200 Amostras:806376  50.306%  Momento: [2023-Jun-27 03:44:29]
lr: 5.00926e-07 Train loss: 0.0536  accuracy: 0.9380
Valid loss: 0.0522 accuracy: 0.9374  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]


Epoch: 3 Step: 25600 Training loss: 0.05 accuracy: 0.939 
Em validação: loss: 0.052; accuracy: 0.936
Step: 25600 Amostras:819176  51.104%  Momento: [2023-Jun-27 03:46:40]
lr: 4.92876e-07 Train loss: 0.0531  accuracy: 0.9387
Valid loss: 0.0519 accuracy: 0.9364  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.73it/s]


Epoch: 3 Step: 26000 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.052; accuracy: 0.937
Step: 26000 Amostras:831976  51.903%  Momento: [2023-Jun-27 03:48:50]
lr: 4.84827e-07 Train loss: 0.0515  accuracy: 0.9405
Valid loss: 0.0525 accuracy: 0.9369  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.76it/s]


Epoch: 3 Step: 26400 Training loss: 0.05 accuracy: 0.936 
Em validação: loss: 0.051; accuracy: 0.936
Step: 26400 Amostras:844776  52.702%  Momento: [2023-Jun-27 03:51:01]
lr: 4.76777e-07 Train loss: 0.0537  accuracy: 0.9364
Valid loss: 0.0515 accuracy: 0.9364  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.78it/s]


Epoch: 3 Step: 26800 Training loss: 0.05 accuracy: 0.940 
Em validação: loss: 0.052; accuracy: 0.937
Step: 26800 Amostras:857576  53.500%  Momento: [2023-Jun-27 03:53:11]
lr: 4.68727e-07 Train loss: 0.0510  accuracy: 0.9404
Valid loss: 0.0521 accuracy: 0.9369  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.76it/s]


Epoch: 3 Step: 27200 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.052; accuracy: 0.937
Step: 27200 Amostras:870376  54.299%  Momento: [2023-Jun-27 03:55:22]
lr: 4.60678e-07 Train loss: 0.0510  accuracy: 0.9421
Valid loss: 0.0516 accuracy: 0.9369  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]
Configuration saved in ../../model/train/minilm/model_epoch_27600_treino_2023-Jun-27 03:57:32.pt/config.json


Epoch: 3 Step: 27600 Training loss: 0.05 accuracy: 0.938 
Em validação: loss: 0.051; accuracy: 0.938
Step: 27600 Amostras:883176  55.097%  Momento: [2023-Jun-27 03:57:32]
lr: 4.52628e-07 Train loss: 0.0526  accuracy: 0.9376
Valid loss: 0.0511 accuracy: 0.9379  


Model weights saved in ../../model/train/minilm/model_epoch_27600_treino_2023-Jun-27 03:57:32.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.96it/s]


Epoch: 3 Step: 28000 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.051; accuracy: 0.937
Step: 28000 Amostras:895976  55.896%  Momento: [2023-Jun-27 03:59:43]
lr: 4.44579e-07 Train loss: 0.0506  accuracy: 0.9415
Valid loss: 0.0513 accuracy: 0.9374  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.91it/s]


Epoch: 3 Step: 28400 Training loss: 0.05 accuracy: 0.939 
Em validação: loss: 0.051; accuracy: 0.936
Step: 28400 Amostras:908776  56.694%  Momento: [2023-Jun-27 04:01:52]
lr: 4.36529e-07 Train loss: 0.0509  accuracy: 0.9385
Valid loss: 0.0509 accuracy: 0.9364  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.83it/s]


Epoch: 3 Step: 28800 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.051; accuracy: 0.938
Step: 28800 Amostras:921576  57.493%  Momento: [2023-Jun-27 04:04:03]
lr: 4.28479e-07 Train loss: 0.0488  accuracy: 0.9448
Valid loss: 0.0506 accuracy: 0.9379  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.78it/s]


Epoch: 3 Step: 29200 Training loss: 0.05 accuracy: 0.938 
Em validação: loss: 0.050; accuracy: 0.937
Step: 29200 Amostras:934376  58.291%  Momento: [2023-Jun-27 04:06:12]
lr: 4.20430e-07 Train loss: 0.0528  accuracy: 0.9384
Valid loss: 0.0502 accuracy: 0.9369  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.85it/s]


Epoch: 3 Step: 29600 Training loss: 0.05 accuracy: 0.936 
Em validação: loss: 0.050; accuracy: 0.937
Step: 29600 Amostras:947176  59.090%  Momento: [2023-Jun-27 04:08:23]
lr: 4.12380e-07 Train loss: 0.0534  accuracy: 0.9357
Valid loss: 0.0502 accuracy: 0.9374  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.87it/s]


Epoch: 3 Step: 30000 Training loss: 0.05 accuracy: 0.939 
Em validação: loss: 0.050; accuracy: 0.937
Step: 30000 Amostras:959976  59.888%  Momento: [2023-Jun-27 04:10:33]
lr: 4.04331e-07 Train loss: 0.0517  accuracy: 0.9390
Valid loss: 0.0500 accuracy: 0.9369  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.86it/s]


Epoch: 3 Step: 30400 Training loss: 0.05 accuracy: 0.939 
Em validação: loss: 0.050; accuracy: 0.937
Step: 30400 Amostras:972776  60.687%  Momento: [2023-Jun-27 04:12:43]
lr: 3.96281e-07 Train loss: 0.0512  accuracy: 0.9387
Valid loss: 0.0498 accuracy: 0.9369  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.86it/s]
Configuration saved in ../../model/train/minilm/model_epoch_30800_treino_2023-Jun-27 04:14:53.pt/config.json


Epoch: 3 Step: 30800 Training loss: 0.05 accuracy: 0.940 
Em validação: loss: 0.049; accuracy: 0.939
Step: 30800 Amostras:985576  61.485%  Momento: [2023-Jun-27 04:14:53]
lr: 3.88232e-07 Train loss: 0.0495  accuracy: 0.9404
Valid loss: 0.0491 accuracy: 0.9389  


Model weights saved in ../../model/train/minilm/model_epoch_30800_treino_2023-Jun-27 04:14:53.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.97it/s]


Epoch: 3 Step: 31200 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.049; accuracy: 0.938
Step: 31200 Amostras:998376  62.284%  Momento: [2023-Jun-27 04:17:03]
lr: 3.80182e-07 Train loss: 0.0490  accuracy: 0.9415
Valid loss: 0.0495 accuracy: 0.9384  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.76it/s]


Epoch: 3 Step: 31600 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.049; accuracy: 0.938
Step: 31600 Amostras:1011176  63.082%  Momento: [2023-Jun-27 04:19:13]
lr: 3.72132e-07 Train loss: 0.0505  accuracy: 0.9416
Valid loss: 0.0489 accuracy: 0.9379  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.75it/s]


Epoch: 3 Step: 32000 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.049; accuracy: 0.938
Step: 32000 Amostras:1023976  63.881%  Momento: [2023-Jun-27 04:21:24]
lr: 3.64083e-07 Train loss: 0.0471  accuracy: 0.9444
Valid loss: 0.0491 accuracy: 0.9384  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.72it/s]


Epoch: 3 Step: 32400 Training loss: 0.05 accuracy: 0.940 
Em validação: loss: 0.049; accuracy: 0.937
Step: 32400 Amostras:1036776  64.679%  Momento: [2023-Jun-27 04:23:34]
lr: 3.56033e-07 Train loss: 0.0494  accuracy: 0.9402
Valid loss: 0.0495 accuracy: 0.9374  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.77it/s]


Epoch: 3 Step: 32800 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.050; accuracy: 0.939
Step: 32800 Amostras:1049576  65.478%  Momento: [2023-Jun-27 04:25:45]
lr: 3.47984e-07 Train loss: 0.0483  accuracy: 0.9424
Valid loss: 0.0498 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.71it/s]


Epoch: 3 Step: 33200 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.049; accuracy: 0.937
Step: 33200 Amostras:1062376  66.277%  Momento: [2023-Jun-27 04:27:55]
lr: 3.39934e-07 Train loss: 0.0482  accuracy: 0.9440
Valid loss: 0.0486 accuracy: 0.9374  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.70it/s]


Epoch: 3 Step: 33600 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.049; accuracy: 0.939
Step: 33600 Amostras:1075176  67.075%  Momento: [2023-Jun-27 04:30:05]
lr: 3.31884e-07 Train loss: 0.0469  accuracy: 0.9445
Valid loss: 0.0485 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.76it/s]


Epoch: 3 Step: 34000 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.048; accuracy: 0.939
Step: 34000 Amostras:1087976  67.874%  Momento: [2023-Jun-27 04:32:16]
lr: 3.23835e-07 Train loss: 0.0475  accuracy: 0.9428
Valid loss: 0.0483 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]


Epoch: 3 Step: 34400 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.048; accuracy: 0.938
Step: 34400 Amostras:1100776  68.672%  Momento: [2023-Jun-27 04:34:26]
lr: 3.15785e-07 Train loss: 0.0485  accuracy: 0.9434
Valid loss: 0.0483 accuracy: 0.9384  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]


Epoch: 3 Step: 34800 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.048; accuracy: 0.939
Step: 34800 Amostras:1113576  69.471%  Momento: [2023-Jun-27 04:36:37]
lr: 3.07736e-07 Train loss: 0.0479  accuracy: 0.9442
Valid loss: 0.0483 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.71it/s]


Epoch: 3 Step: 35200 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.048; accuracy: 0.939
Step: 35200 Amostras:1126376  70.269%  Momento: [2023-Jun-27 04:38:48]
lr: 2.99686e-07 Train loss: 0.0490  accuracy: 0.9416
Valid loss: 0.0484 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.85it/s]
Configuration saved in ../../model/train/minilm/model_epoch_35600_treino_2023-Jun-27 04:40:58.pt/config.json


Epoch: 3 Step: 35600 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.048; accuracy: 0.939
Step: 35600 Amostras:1139176  71.068%  Momento: [2023-Jun-27 04:40:58]
lr: 2.91636e-07 Train loss: 0.0500  accuracy: 0.9417
Valid loss: 0.0479 accuracy: 0.9394  


Model weights saved in ../../model/train/minilm/model_epoch_35600_treino_2023-Jun-27 04:40:58.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.94it/s]


Epoch: 3 Step: 36000 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.049; accuracy: 0.939
Step: 36000 Amostras:1151976  71.866%  Momento: [2023-Jun-27 04:43:08]
lr: 2.83587e-07 Train loss: 0.0493  accuracy: 0.9419
Valid loss: 0.0485 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.33it/s]


Epoch: 3 Step: 36400 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.048; accuracy: 0.939
Step: 36400 Amostras:1164776  72.665%  Momento: [2023-Jun-27 04:45:19]
lr: 2.75537e-07 Train loss: 0.0476  accuracy: 0.9447
Valid loss: 0.0482 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.19it/s]
Configuration saved in ../../model/train/minilm/model_epoch_36800_treino_2023-Jun-27 04:47:30.pt/config.json


Epoch: 3 Step: 36800 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.048; accuracy: 0.940
Step: 36800 Amostras:1177576  73.463%  Momento: [2023-Jun-27 04:47:30]
lr: 2.67488e-07 Train loss: 0.0471  accuracy: 0.9434
Valid loss: 0.0476 accuracy: 0.9399  


Model weights saved in ../../model/train/minilm/model_epoch_36800_treino_2023-Jun-27 04:47:30.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.45it/s]


Epoch: 3 Step: 37200 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.048; accuracy: 0.939
Step: 37200 Amostras:1190376  74.262%  Momento: [2023-Jun-27 04:49:43]
lr: 2.59438e-07 Train loss: 0.0464  accuracy: 0.9443
Valid loss: 0.0477 accuracy: 0.9389  


Train: 100%|██████████| 12523/12523 [1:08:02<00:00,  3.07it/s]
Epochs:  75%|███████▌  | 3/4 [3:24:33<1:08:09, 4089.59s/it]
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.59it/s]


Epoch: 4 Step: 37600 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.047; accuracy: 0.939
Step: 37600 Amostras:1203164  75.060%  Momento: [2023-Jun-27 04:51:53]
lr: 2.51389e-07 Train loss: 0.0482  accuracy: 0.9423
Valid loss: 0.0474 accuracy: 0.9394  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.63it/s]


Epoch: 4 Step: 38000 Training loss: 0.05 accuracy: 0.947 
Em validação: loss: 0.048; accuracy: 0.940
Step: 38000 Amostras:1215964  75.858%  Momento: [2023-Jun-27 04:54:04]
lr: 2.43339e-07 Train loss: 0.0453  accuracy: 0.9470
Valid loss: 0.0478 accuracy: 0.9399  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.68it/s]


Epoch: 4 Step: 38400 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.048; accuracy: 0.939
Step: 38400 Amostras:1228764  76.657%  Momento: [2023-Jun-27 04:56:15]
lr: 2.35289e-07 Train loss: 0.0488  accuracy: 0.9409
Valid loss: 0.0478 accuracy: 0.9394  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.72it/s]
Configuration saved in ../../model/train/minilm/model_epoch_38800_treino_2023-Jun-27 04:58:26.pt/config.json


Epoch: 4 Step: 38800 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.047; accuracy: 0.941
Step: 38800 Amostras:1241564  77.455%  Momento: [2023-Jun-27 04:58:26]
lr: 2.27240e-07 Train loss: 0.0462  accuracy: 0.9448
Valid loss: 0.0470 accuracy: 0.9409  


Model weights saved in ../../model/train/minilm/model_epoch_38800_treino_2023-Jun-27 04:58:26.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.13it/s]


Epoch: 4 Step: 39200 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.048; accuracy: 0.940
Step: 39200 Amostras:1254364  78.254%  Momento: [2023-Jun-27 05:00:36]
lr: 2.19190e-07 Train loss: 0.0488  accuracy: 0.9417
Valid loss: 0.0477 accuracy: 0.9399  



[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.49it/s]


Epoch: 4 Step: 39600 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.047; accuracy: 0.940
Step: 39600 Amostras:1267164  79.052%  Momento: [2023-Jun-27 05:02:48]
lr: 2.11141e-07 Train loss: 0.0470  accuracy: 0.9432
Valid loss: 0.0472 accuracy: 0.9404  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.57it/s]


Epoch: 4 Step: 40000 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.048; accuracy: 0.940
Step: 40000 Amostras:1279964  79.851%  Momento: [2023-Jun-27 05:04:57]
lr: 2.03091e-07 Train loss: 0.0486  accuracy: 0.9413
Valid loss: 0.0479 accuracy: 0.9399  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.38it/s]


Epoch: 4 Step: 40400 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.047; accuracy: 0.940
Step: 40400 Amostras:1292764  80.649%  Momento: [2023-Jun-27 05:07:10]
lr: 1.95041e-07 Train loss: 0.0480  accuracy: 0.9408
Valid loss: 0.0469 accuracy: 0.9399  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.19it/s]


Epoch: 4 Step: 40800 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.047; accuracy: 0.939
Step: 40800 Amostras:1305564  81.448%  Momento: [2023-Jun-27 05:09:22]
lr: 1.86992e-07 Train loss: 0.0458  accuracy: 0.9459
Valid loss: 0.0473 accuracy: 0.9394  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.77it/s]


Epoch: 4 Step: 41200 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.047; accuracy: 0.939
Step: 41200 Amostras:1318364  82.246%  Momento: [2023-Jun-27 05:11:32]
lr: 1.78942e-07 Train loss: 0.0482  accuracy: 0.9424
Valid loss: 0.0469 accuracy: 0.9389  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.99it/s]


Epoch: 4 Step: 41600 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.047; accuracy: 0.939
Step: 41600 Amostras:1331164  83.045%  Momento: [2023-Jun-27 05:13:44]
lr: 1.70893e-07 Train loss: 0.0455  accuracy: 0.9462
Valid loss: 0.0469 accuracy: 0.9394  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.96it/s]


Epoch: 4 Step: 42000 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.047; accuracy: 0.940
Step: 42000 Amostras:1343964  83.843%  Momento: [2023-Jun-27 05:15:54]
lr: 1.62843e-07 Train loss: 0.0466  accuracy: 0.9438
Valid loss: 0.0467 accuracy: 0.9399  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.60it/s]


Epoch: 4 Step: 42400 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.047; accuracy: 0.940
Step: 42400 Amostras:1356764  84.642%  Momento: [2023-Jun-27 05:18:04]
lr: 1.54794e-07 Train loss: 0.0460  accuracy: 0.9454
Valid loss: 0.0468 accuracy: 0.9399  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.62it/s]


Epoch: 4 Step: 42800 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.047; accuracy: 0.940
Step: 42800 Amostras:1369564  85.441%  Momento: [2023-Jun-27 05:20:16]
lr: 1.46744e-07 Train loss: 0.0471  accuracy: 0.9444
Valid loss: 0.0472 accuracy: 0.9404  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.60it/s]


Epoch: 4 Step: 43200 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.046; accuracy: 0.941
Step: 43200 Amostras:1382364  86.239%  Momento: [2023-Jun-27 05:22:27]
lr: 1.38694e-07 Train loss: 0.0473  accuracy: 0.9431
Valid loss: 0.0465 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.76it/s]


Epoch: 4 Step: 43600 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.047; accuracy: 0.941
Step: 43600 Amostras:1395164  87.038%  Momento: [2023-Jun-27 05:24:38]
lr: 1.30645e-07 Train loss: 0.0464  accuracy: 0.9441
Valid loss: 0.0468 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.64it/s]


Epoch: 4 Step: 44000 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.047; accuracy: 0.941
Step: 44000 Amostras:1407964  87.836%  Momento: [2023-Jun-27 05:26:48]
lr: 1.22595e-07 Train loss: 0.0467  accuracy: 0.9438
Valid loss: 0.0469 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.73it/s]


Epoch: 4 Step: 44400 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.047; accuracy: 0.941
Step: 44400 Amostras:1420764  88.635%  Momento: [2023-Jun-27 05:28:59]
lr: 1.14546e-07 Train loss: 0.0460  accuracy: 0.9450
Valid loss: 0.0466 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.68it/s]
Configuration saved in ../../model/train/minilm/model_epoch_44800_treino_2023-Jun-27 05:31:09.pt/config.json


Epoch: 4 Step: 44800 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.047; accuracy: 0.941
Step: 44800 Amostras:1433564  89.433%  Momento: [2023-Jun-27 05:31:09]
lr: 1.06496e-07 Train loss: 0.0469  accuracy: 0.9436
Valid loss: 0.0466 accuracy: 0.9414  


Model weights saved in ../../model/train/minilm/model_epoch_44800_treino_2023-Jun-27 05:31:09.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 13.89it/s]


Epoch: 4 Step: 45200 Training loss: 0.04 accuracy: 0.947 
Em validação: loss: 0.047; accuracy: 0.941
Step: 45200 Amostras:1446364  90.232%  Momento: [2023-Jun-27 05:33:22]
lr: 9.84464e-08 Train loss: 0.0448  accuracy: 0.9473
Valid loss: 0.0468 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.85it/s]


Epoch: 4 Step: 45600 Training loss: 0.04 accuracy: 0.947 
Em validação: loss: 0.046; accuracy: 0.941
Step: 45600 Amostras:1459164  91.030%  Momento: [2023-Jun-27 05:35:32]
lr: 9.03968e-08 Train loss: 0.0436  accuracy: 0.9472
Valid loss: 0.0462 accuracy: 0.9414  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.90it/s]
Configuration saved in ../../model/train/minilm/model_epoch_46000_treino_2023-Jun-27 05:37:42.pt/config.json


Epoch: 4 Step: 46000 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.942
Step: 46000 Amostras:1471964  91.829%  Momento: [2023-Jun-27 05:37:42]
lr: 8.23473e-08 Train loss: 0.0464  accuracy: 0.9452
Valid loss: 0.0465 accuracy: 0.9419  


Model weights saved in ../../model/train/minilm/model_epoch_46000_treino_2023-Jun-27 05:37:42.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.07it/s]


Epoch: 4 Step: 46400 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.047; accuracy: 0.940
Step: 46400 Amostras:1484764  92.627%  Momento: [2023-Jun-27 05:39:52]
lr: 7.42977e-08 Train loss: 0.0479  accuracy: 0.9420
Valid loss: 0.0469 accuracy: 0.9404  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.01it/s]


Epoch: 4 Step: 46800 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.941
Step: 46800 Amostras:1497564  93.426%  Momento: [2023-Jun-27 05:42:05]
lr: 6.62481e-08 Train loss: 0.0464  accuracy: 0.9448
Valid loss: 0.0464 accuracy: 0.9414  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.00it/s]


Epoch: 4 Step: 47200 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.941
Step: 47200 Amostras:1510364  94.224%  Momento: [2023-Jun-27 05:44:14]
lr: 5.81985e-08 Train loss: 0.0472  accuracy: 0.9437
Valid loss: 0.0463 accuracy: 0.9414  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.69it/s]


Epoch: 4 Step: 47600 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.047; accuracy: 0.941
Step: 47600 Amostras:1523164  95.023%  Momento: [2023-Jun-27 05:46:26]
lr: 5.01489e-08 Train loss: 0.0473  accuracy: 0.9427
Valid loss: 0.0465 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:05<00:00, 11.54it/s]


Epoch: 4 Step: 48000 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.046; accuracy: 0.941
Step: 48000 Amostras:1535964  95.821%  Momento: [2023-Jun-27 05:48:38]
lr: 4.20993e-08 Train loss: 0.0476  accuracy: 0.9431
Valid loss: 0.0464 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.04it/s]


Epoch: 4 Step: 48400 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.942
Step: 48400 Amostras:1548764  96.620%  Momento: [2023-Jun-27 05:50:49]
lr: 3.40497e-08 Train loss: 0.0466  accuracy: 0.9437
Valid loss: 0.0462 accuracy: 0.9419  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.96it/s]


Epoch: 4 Step: 48800 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.941
Step: 48800 Amostras:1561564  97.418%  Momento: [2023-Jun-27 05:52:59]
lr: 2.60002e-08 Train loss: 0.0455  accuracy: 0.9457
Valid loss: 0.0464 accuracy: 0.9409  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.63it/s]
Configuration saved in ../../model/train/minilm/model_epoch_49200_treino_2023-Jun-27 05:55:10.pt/config.json


Epoch: 4 Step: 49200 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.046; accuracy: 0.943
Step: 49200 Amostras:1574364  98.217%  Momento: [2023-Jun-27 05:55:10]
lr: 1.79506e-08 Train loss: 0.0481  accuracy: 0.9413
Valid loss: 0.0462 accuracy: 0.9429  


Model weights saved in ../../model/train/minilm/model_epoch_49200_treino_2023-Jun-27 05:55:10.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.85it/s]


Epoch: 4 Step: 49600 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.046; accuracy: 0.943
Step: 49600 Amostras:1587164  99.016%  Momento: [2023-Jun-27 05:57:21]
lr: 9.90099e-09 Train loss: 0.0473  accuracy: 0.9430
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]


Epoch: 4 Step: 50000 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.943
Step: 50000 Amostras:1599964  99.814%  Momento: [2023-Jun-27 05:59:31]
lr: 1.85140e-09 Train loss: 0.0452  accuracy: 0.9455
Valid loss: 0.0462 accuracy: 0.9429  


Train: 100%|██████████| 12523/12523 [1:08:20<00:00,  3.05it/s]
Epochs: 100%|██████████| 4/4 [4:32:54<00:00, 4093.70s/it]  
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.85it/s]
Configuration saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 06:00:04.pt/config.json


Epoch: 4 Step: 50092 Training loss: 0.04 accuracy: 0.007 
Em validação: loss: 0.046; accuracy: 0.943
** END ** Step: 50092 Amostras:1602896  99.997%  Momento: [2023-Jun-27 06:00:04]
lr: 0.00000e+00 Train loss: 0.0439  accuracy: 0.0069
Valid loss: 0.0462 accuracy: 0.9429  


Model weights saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 06:00:04.pt/pytorch_model.bin
Configuration saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 06:00:04.pt/config.json


Modelo final  (step 50092) salvo em ../../model/train/minilm/model_fim_treino_2023-Jun-27 06:00:04.pt


Model weights saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 06:00:04.pt/pytorch_model.bin


Modelo com melhor resultado em validação (step 49200) salvo após treino em ../../model/train/minilm/model_fim_treino_2023-Jun-27 06:00:04.pt
Shutting down background jobs, please wait a moment...
Done!
Waiting for the remaining 32 operations to synchronize with Neptune. Do not kill this process.
All 32 operations synced, thanks for waiting!
Explore the metadata in the Neptune app:
https://app.neptune.ai/marcusborela/IA386DD/e/IAD-104


In [142]:
print(history)

[{'train/loss': 27.61007399559021, 'train/accuracy': 0.727421875, 'train/steps': 400, 'train/n_examples': 12800, 'train/learning_rate': 1e-06, 'valid/loss': 6.575049271659245, 'valid/accuracy': 0.6097318768619663}, {'train/loss': 2.796027316749096, 'train/accuracy': 0.6128125, 'train/steps': 800, 'train/n_examples': 25600, 'train/learning_rate': 9.919504145536505e-07, 'valid/loss': 1.6967824848871382, 'valid/accuracy': 0.6216484607745779}, {'train/loss': 0.9683212437480688, 'train/accuracy': 0.65265625, 'train/steps': 1200, 'train/n_examples': 38400, 'train/learning_rate': 9.839008291073008e-07, 'valid/loss': 0.8495350532115452, 'valid/accuracy': 0.6261171797418074}, {'train/loss': 0.5432193081453442, 'train/accuracy': 0.691015625, 'train/steps': 1600, 'train/n_examples': 51200, 'train/learning_rate': 9.758512436609514e-07, 'valid/loss': 0.5295586685339609, 'valid/accuracy': 0.6658391261171798}, {'train/loss': 0.37736979246139524, 'train/accuracy': 0.7171875, 'train/steps': 2000, 'trai

In [143]:
gc.collect()

0

In [144]:
torch.cuda.empty_cache()

In [145]:
history2 = treina_modelo (model, hparam,  dataloader_train, dataloader_valid, parm_intervalo_print=1)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Av

Epochs:   0%|          | 0/4 [00:00<?, ?it/s]
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.53it/s]
Configuration saved in ../../model/train/minilm/model_epoch_400_treino_2023-Jun-27 08:25:37.pt/config.json


Epoch: 1 Step: 400 Training loss: 0.04 accuracy: 0.947 
Em validação: loss: 0.046; accuracy: 0.943
Step: 400 Amostras:12800  0.799%  Momento: [2023-Jun-27 08:25:37]
lr: 0.00000e+00 Train loss: 0.0443  accuracy: 0.9467
Valid loss: 0.0462 accuracy: 0.9429  


Model weights saved in ../../model/train/minilm/model_epoch_400_treino_2023-Jun-27 08:25:37.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.63it/s]


Epoch: 1 Step: 800 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.046; accuracy: 0.943
Step: 800 Amostras:25600  1.597%  Momento: [2023-Jun-27 08:27:45]
lr: 0.00000e+00 Train loss: 0.0483  accuracy: 0.9407
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.59it/s]


Epoch: 1 Step: 1200 Training loss: 0.04 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.943
Step: 1200 Amostras:38400  2.396%  Momento: [2023-Jun-27 08:29:53]
lr: 0.00000e+00 Train loss: 0.0445  accuracy: 0.9463
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.08it/s]


Epoch: 1 Step: 1600 Training loss: 0.04 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.943
Step: 1600 Amostras:51200  3.194%  Momento: [2023-Jun-27 08:32:03]
lr: 0.00000e+00 Train loss: 0.0440  accuracy: 0.9463
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.50it/s]


Epoch: 1 Step: 2000 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 2000 Amostras:64000  3.993%  Momento: [2023-Jun-27 08:34:11]
lr: 0.00000e+00 Train loss: 0.0466  accuracy: 0.9442
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.57it/s]


Epoch: 1 Step: 2400 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.943
Step: 2400 Amostras:76800  4.791%  Momento: [2023-Jun-27 08:36:21]
lr: 0.00000e+00 Train loss: 0.0467  accuracy: 0.9447
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 13.94it/s]


Epoch: 1 Step: 2800 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 2800 Amostras:89600  5.590%  Momento: [2023-Jun-27 08:38:33]
lr: 0.00000e+00 Train loss: 0.0469  accuracy: 0.9440
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.33it/s]


Epoch: 1 Step: 3200 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 3200 Amostras:102400  6.388%  Momento: [2023-Jun-27 08:40:42]
lr: 0.00000e+00 Train loss: 0.0464  accuracy: 0.9444
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.53it/s]


Epoch: 1 Step: 3600 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 3600 Amostras:115200  7.187%  Momento: [2023-Jun-27 08:42:53]
lr: 0.00000e+00 Train loss: 0.0451  accuracy: 0.9441
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 14.95it/s]


Epoch: 1 Step: 4000 Training loss: 0.05 accuracy: 0.947 
Em validação: loss: 0.046; accuracy: 0.943
Step: 4000 Amostras:128000  7.985%  Momento: [2023-Jun-27 08:45:05]
lr: 0.00000e+00 Train loss: 0.0453  accuracy: 0.9470
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.22it/s]


Epoch: 1 Step: 4400 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.943
Step: 4400 Amostras:140800  8.784%  Momento: [2023-Jun-27 08:47:15]
lr: 0.00000e+00 Train loss: 0.0458  accuracy: 0.9464
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.52it/s]


Epoch: 1 Step: 4800 Training loss: 0.05 accuracy: 0.939 
Em validação: loss: 0.046; accuracy: 0.943
Step: 4800 Amostras:153600  9.582%  Momento: [2023-Jun-27 08:49:26]
lr: 0.00000e+00 Train loss: 0.0494  accuracy: 0.9391
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.03it/s]


Epoch: 1 Step: 5200 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 5200 Amostras:166400  10.381%  Momento: [2023-Jun-27 08:51:37]
lr: 0.00000e+00 Train loss: 0.0461  accuracy: 0.9442
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.01it/s]


Epoch: 1 Step: 5600 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.046; accuracy: 0.943
Step: 5600 Amostras:179200  11.179%  Momento: [2023-Jun-27 08:53:48]
lr: 0.00000e+00 Train loss: 0.0481  accuracy: 0.9423
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.96it/s]


Epoch: 1 Step: 6000 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.046; accuracy: 0.943
Step: 6000 Amostras:192000  11.978%  Momento: [2023-Jun-27 08:55:58]
lr: 0.00000e+00 Train loss: 0.0476  accuracy: 0.9425
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.94it/s]


Epoch: 1 Step: 6400 Training loss: 0.04 accuracy: 0.948 
Em validação: loss: 0.046; accuracy: 0.943
Step: 6400 Amostras:204800  12.776%  Momento: [2023-Jun-27 08:58:09]
lr: 0.00000e+00 Train loss: 0.0444  accuracy: 0.9475
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.93it/s]


Epoch: 1 Step: 6800 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.046; accuracy: 0.943
Step: 6800 Amostras:217600  13.575%  Momento: [2023-Jun-27 09:00:20]
lr: 0.00000e+00 Train loss: 0.0465  accuracy: 0.9430
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.92it/s]


Epoch: 1 Step: 7200 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.046; accuracy: 0.943
Step: 7200 Amostras:230400  14.374%  Momento: [2023-Jun-27 09:02:30]
lr: 0.00000e+00 Train loss: 0.0480  accuracy: 0.9424
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.91it/s]


Epoch: 1 Step: 7600 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.046; accuracy: 0.943
Step: 7600 Amostras:243200  15.172%  Momento: [2023-Jun-27 09:04:41]
lr: 0.00000e+00 Train loss: 0.0475  accuracy: 0.9430
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.91it/s]


Epoch: 1 Step: 8000 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.943
Step: 8000 Amostras:256000  15.971%  Momento: [2023-Jun-27 09:06:51]
lr: 0.00000e+00 Train loss: 0.0461  accuracy: 0.9458
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.87it/s]
Train:  67%|██████▋   | 8400/12523 [45:33<22:21,  3.07it/s]  
Epochs:  25%|██▌       | 1/4 [45:33<2:16:40, 2733.43s/it]

Epoch: 1 Step: 8400 Training loss: 0.05 accuracy: 0.947 
Em validação: loss: 0.046; accuracy: 0.943
Step: 8400 Amostras:268800  16.769%  Momento: [2023-Jun-27 09:09:02]
lr: 0.00000e+00 Train loss: 0.0450  accuracy: 0.9466
Valid loss: 0.0462 accuracy: 0.9429  
Parando por critério de early_stop na época 1 step 8400 sendo best_step 400 e ealy_stop 8000
Shutting down background jobs, please wait a moment...





Done!
Waiting for the remaining 56 operations to synchronize with Neptune. Do not kill this process.
All 56 operations synced, thanks for waiting!
Explore the metadata in the Neptune app:
https://app.neptune.ai/marcusborela/IA386DD/e/IAD-105


StatisticsError: mean requires at least one data point

In [147]:
history3 = treina_modelo (model, hparam,  dataloader_train, dataloader_valid, parm_intervalo_print=1)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Av

Epochs:   0%|          | 0/4 [00:00<?, ?it/s]
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.63it/s]
Configuration saved in ../../model/train/minilm/model_epoch_400_treino_2023-Jun-27 10:03:39.pt/config.json


Epoch: 1 Step: 400 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 400 Amostras:12800  0.799%  Momento: [2023-Jun-27 10:03:39]
lr: 0.00000e+00 Train loss: 0.0473  accuracy: 0.9443
Valid loss: 0.0462 accuracy: 0.9429  


Model weights saved in ../../model/train/minilm/model_epoch_400_treino_2023-Jun-27 10:03:39.pt/pytorch_model.bin

[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.62it/s]


Epoch: 1 Step: 800 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 800 Amostras:25600  1.597%  Momento: [2023-Jun-27 10:05:48]
lr: 0.00000e+00 Train loss: 0.0470  accuracy: 0.9439
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.58it/s]


Epoch: 1 Step: 1200 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.943
Step: 1200 Amostras:38400  2.396%  Momento: [2023-Jun-27 10:07:56]
lr: 0.00000e+00 Train loss: 0.0465  accuracy: 0.9457
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.52it/s]


Epoch: 1 Step: 1600 Training loss: 0.05 accuracy: 0.941 
Em validação: loss: 0.046; accuracy: 0.943
Step: 1600 Amostras:51200  3.194%  Momento: [2023-Jun-27 10:10:05]
lr: 0.00000e+00 Train loss: 0.0476  accuracy: 0.9412
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.21it/s]


Epoch: 1 Step: 2000 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.943
Step: 2000 Amostras:64000  3.993%  Momento: [2023-Jun-27 10:12:14]
lr: 0.00000e+00 Train loss: 0.0454  accuracy: 0.9448
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.14it/s]


Epoch: 1 Step: 2400 Training loss: 0.04 accuracy: 0.948 
Em validação: loss: 0.046; accuracy: 0.943
Step: 2400 Amostras:76800  4.791%  Momento: [2023-Jun-27 10:14:24]
lr: 0.00000e+00 Train loss: 0.0448  accuracy: 0.9477
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 13.83it/s]


Epoch: 1 Step: 2800 Training loss: 0.05 accuracy: 0.947 
Em validação: loss: 0.046; accuracy: 0.943
Step: 2800 Amostras:89600  5.590%  Momento: [2023-Jun-27 10:16:36]
lr: 0.00000e+00 Train loss: 0.0451  accuracy: 0.9470
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.13it/s]


Epoch: 1 Step: 3200 Training loss: 0.04 accuracy: 0.948 
Em validação: loss: 0.046; accuracy: 0.943
Step: 3200 Amostras:102400  6.388%  Momento: [2023-Jun-27 10:18:46]
lr: 0.00000e+00 Train loss: 0.0438  accuracy: 0.9477
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.09it/s]


Epoch: 1 Step: 3600 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.046; accuracy: 0.943
Step: 3600 Amostras:115200  7.187%  Momento: [2023-Jun-27 10:20:56]
lr: 0.00000e+00 Train loss: 0.0479  accuracy: 0.9423
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.42it/s]


Epoch: 1 Step: 4000 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.943
Step: 4000 Amostras:128000  7.985%  Momento: [2023-Jun-27 10:23:08]
lr: 0.00000e+00 Train loss: 0.0472  accuracy: 0.9452
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.16it/s]


Epoch: 1 Step: 4400 Training loss: 0.05 accuracy: 0.946 
Em validação: loss: 0.046; accuracy: 0.943
Step: 4400 Amostras:140800  8.784%  Momento: [2023-Jun-27 10:25:23]
lr: 0.00000e+00 Train loss: 0.0459  accuracy: 0.9459
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 13.30it/s]


Epoch: 1 Step: 4800 Training loss: 0.05 accuracy: 0.940 
Em validação: loss: 0.046; accuracy: 0.943
Step: 4800 Amostras:153600  9.582%  Momento: [2023-Jun-27 10:27:35]
lr: 0.00000e+00 Train loss: 0.0485  accuracy: 0.9403
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.23it/s]


Epoch: 1 Step: 5200 Training loss: 0.05 accuracy: 0.947 
Em validação: loss: 0.046; accuracy: 0.943
Step: 5200 Amostras:166400  10.381%  Momento: [2023-Jun-27 10:29:45]
lr: 0.00000e+00 Train loss: 0.0454  accuracy: 0.9466
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:04<00:00, 15.10it/s]


Epoch: 1 Step: 5600 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.943
Step: 5600 Amostras:179200  11.179%  Momento: [2023-Jun-27 10:31:57]
lr: 0.00000e+00 Train loss: 0.0458  accuracy: 0.9448
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 16.00it/s]


Epoch: 1 Step: 6000 Training loss: 0.05 accuracy: 0.943 
Em validação: loss: 0.046; accuracy: 0.943
Step: 6000 Amostras:192000  11.978%  Momento: [2023-Jun-27 10:34:08]
lr: 0.00000e+00 Train loss: 0.0461  accuracy: 0.9431
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.97it/s]


Epoch: 1 Step: 6400 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 6400 Amostras:204800  12.776%  Momento: [2023-Jun-27 10:36:18]
lr: 0.00000e+00 Train loss: 0.0457  accuracy: 0.9442
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]


Epoch: 1 Step: 6800 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.046; accuracy: 0.943
Step: 6800 Amostras:217600  13.575%  Momento: [2023-Jun-27 10:38:30]
lr: 0.00000e+00 Train loss: 0.0477  accuracy: 0.9417
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.80it/s]


Epoch: 1 Step: 7200 Training loss: 0.05 accuracy: 0.942 
Em validação: loss: 0.046; accuracy: 0.943
Step: 7200 Amostras:230400  14.374%  Momento: [2023-Jun-27 10:40:41]
lr: 0.00000e+00 Train loss: 0.0484  accuracy: 0.9422
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.80it/s]


Epoch: 1 Step: 7600 Training loss: 0.05 accuracy: 0.944 
Em validação: loss: 0.046; accuracy: 0.943
Step: 7600 Amostras:243200  15.172%  Momento: [2023-Jun-27 10:42:52]
lr: 0.00000e+00 Train loss: 0.0464  accuracy: 0.9435
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.79it/s]


Epoch: 1 Step: 8000 Training loss: 0.05 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.943
Step: 8000 Amostras:256000  15.971%  Momento: [2023-Jun-27 10:45:03]
lr: 0.00000e+00 Train loss: 0.0458  accuracy: 0.9449
Valid loss: 0.0462 accuracy: 0.9429  



[A
[A
[A
[A
[A
[A
[A
Valid: 100%|██████████| 63/63 [00:03<00:00, 15.82it/s]
Train:  67%|██████▋   | 8400/12523 [45:42<22:26,  3.06it/s]  
Epochs:  25%|██▌       | 1/4 [45:42<2:17:07, 2742.41s/it]


Epoch: 1 Step: 8400 Training loss: 0.04 accuracy: 0.945 
Em validação: loss: 0.046; accuracy: 0.943
Step: 8400 Amostras:268800  16.769%  Momento: [2023-Jun-27 10:47:13]
lr: 0.00000e+00 Train loss: 0.0448  accuracy: 0.9450
Valid loss: 0.0462 accuracy: 0.9429  
Parando por critério de early_stop na época 1 step 8400 sendo best_step 400 e ealy_stop 8000


Valid: 100%|██████████| 63/63 [00:03<00:00, 15.87it/s]
Configuration saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 10:47:17.pt/config.json


Epoch: 2 Step: 8400 Training loss: 0.04 accuracy: 0.000 
Em validação: loss: 0.046; accuracy: 0.943
** END ** Step: 8400 Amostras:268800  16.769%  Momento: [2023-Jun-27 10:47:17]
lr: 0.00000e+00 Train loss: 0.0448  accuracy: 0.0000
Valid loss: 0.0462 accuracy: 0.9429  


Model weights saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 10:47:17.pt/pytorch_model.bin
Configuration saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 10:47:18.pt/config.json


Modelo final  (step 8400) salvo em ../../model/train/minilm/model_fim_treino_2023-Jun-27 10:47:17.pt


Model weights saved in ../../model/train/minilm/model_fim_treino_2023-Jun-27 10:47:18.pt/pytorch_model.bin


Modelo com melhor resultado em validação (step 400) salvo após treino em ../../model/train/minilm/model_fim_treino_2023-Jun-27 10:47:18.pt
Shutting down background jobs, please wait a moment...
Done!
Waiting for the remaining 32 operations to synchronize with Neptune. Do not kill this process.
All 32 operations synced, thanks for waiting!
Explore the metadata in the Neptune app:
https://app.neptune.ai/marcusborela/IA386DD/e/IAD-106


In [148]:
%%time
accuracy, mean_loss = evaluate(model=model, dataloader=dataloader_valid, set_name='Valid')
print(f'Em validação: loss: {mean_loss:0.3f}; accuracy: {accuracy:0.3f}')

Valid: 100%|██████████| 63/63 [00:03<00:00, 16.88it/s]

Em validação: loss: 0.046; accuracy: 0.943
CPU times: user 3.72 s, sys: 20.2 ms, total: 3.74 s
Wall time: 3.74 s



