# Class 9 - InPars

[Unicamp - IA368DD: Deep Learning applied to search systems.](https://www.cpg.feec.unicamp.br/cpg/lista/caderno_horario_show.php?id=1779)

Author: Marcus Vinícius Borela de Castro

[GitHub repository](https://github.com/marcusborela/deep_learning_em_buscas_unicamp)


# Exercise statement


Goal: generate a dataset for training search models using the InPars technique, and evaluate a reranker trained on this dataset on TREC-COVID:

Input: 3-5 few-shot examples + a document sampled from the TREC-COVID collection

Output: a query that is relevant to the sampled document

The filtering step described in the paper, which keeps the queries with the highest probability, is optional.
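
The optional filtering step can be sketched as follows. In the InPars paper, generated queries are ranked by the mean log-probability that the generator assigned to their tokens, and only the top K are kept. The sketch assumes each candidate carries a hypothetical `token_logprobs` list, which nothing in this notebook produces; the function names and the values are illustrative only.

```python
from typing import Dict, List


def mean_logprob(token_logprobs: List[float]) -> float:
    """Average log-probability of the generated tokens (higher is better)."""
    return sum(token_logprobs) / len(token_logprobs)


def filter_top_k(candidates: List[Dict], k: int) -> List[Dict]:
    """Keep the k candidates whose queries have the highest mean log-probability."""
    ranked = sorted(
        candidates,
        key=lambda c: mean_logprob(c["token_logprobs"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical candidates; the log-probabilities are made up for illustration.
candidates = [
    {"query": "q_a", "doc_id": "d1", "token_logprobs": [-0.2, -0.4]},
    {"query": "q_b", "doc_id": "d2", "token_logprobs": [-1.5, -2.5]},
    {"query": "q_c", "doc_id": "d3", "token_logprobs": [-0.1, -0.1, -0.1]},
]
kept = filter_top_k(candidates, k=2)
print([c["query"] for c in kept])
```

Averaging over tokens, rather than summing, keeps longer queries from being penalized just for their length.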

As the generator model, use one of the following:
- ChatGPT-3.5-turbo: ~1 USD per 1k examples
- FLAN-T5 (base, large or XL), LLAMA (7B, 13B), Alpaca (7B/13B), which can run on Colab Pro.

The HF inference API is also available: https://huggingface.co/inference-api.

Except for LLAMA, zero-shot can be used instead of few-shot.

Given 1k-10k `<synthetic query; document>` pairs, train a miniLM reranker like the one from classes 2/3.

Negative examples (i.e., `<synthetic query; non-relevant doc>`) come from BM25: given the synthetic query, retrieve the top 1000 with BM25 and randomly sample some of those documents as negatives.
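
A minimal sketch of the negative-sampling step, assuming the BM25 ranking is already available as a list of document ids (the retrieval itself is not shown). The function name `sample_negatives` and the ids are hypothetical.

```python
import random
from typing import List


def sample_negatives(ranked_doc_ids: List[str], positive_doc_id: str,
                     num_negatives: int, seed: int) -> List[str]:
    """Randomly pick negatives from a BM25 ranking, skipping the positive document."""
    pool = [doc_id for doc_id in ranked_doc_ids if doc_id != positive_doc_id]
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return rng.sample(pool, min(num_negatives, len(pool)))


# Stand-in for the top-1000 BM25 hits of one synthetic query (hypothetical ids).
ranking = [f"doc{i}" for i in range(1000)]
negatives = sample_negatives(ranking, positive_doc_id="doc3", num_negatives=5, seed=13)
print(negatives)
```

Removing the positive document from the pool matters because the document that originated the query is very likely to appear in its own BM25 ranking.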

Start training from the miniLM already trained on MS MARCO.

Evaluate on TREC-COVID and compare with the reranker trained only on MS MARCO.

Note: Also use your classmates' datasets to obtain a diversity of examples. As soon as you have generated the synthetic dataset, please add it to the spreadsheet so that others can use it.
- To increase randomness, the seed used should be your number in the spreadsheet.

Put the dataset in jsonlines format:
{"query": query, "positive_doc_id": doc_id, "negative_doc_ids": [optional]}\n
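
As an illustration of this format, the sketch below writes and reads back two records, one JSON object per line. The helper names and the queries are hypothetical; the document ids are taken from corpus records shown later in this notebook.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List


def write_jsonl(path: str, records: Iterable[Dict]) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict]:
    """Read a jsonlines file, ignoring empty lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"query": "example query 1", "positive_doc_id": "ug7v899j", "negative_doc_ids": ["i138ax5u"]},
    {"query": "example query 2", "positive_doc_id": "i138ax5u"},  # negatives are optional
]
path = os.path.join(tempfile.mkdtemp(), "synthetic_queries.jsonl")
write_jsonl(path, records)
loaded = read_jsonl(path)
print(loaded)
```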



Tips: (from the class 2 exercise)

- Always follow a pattern when creating the few-shot examples. This page has tips on prompt engineering: https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api


- Use the LLAMA API provided by us (license exclusively for research). [Colab demo of the LLAMA API](https://colab.research.google.com/drive/1zZ-ch29LTicNPA62t2MaOwMROywnqUxf?usp=sharing) (thanks, Thales Rogério)
- Optionally, use the code-davinci-002 API, which is free and gives very good results.
  WARNING: DO NOT USE TEXT-DAVINCI-002/003, which is paid.

- Optionally, use the ChatGPT API (gpt-3.5-turbo), which is cheap: ~0.01 BRL per 1000 tokens (about one page)

- Optionally, use Alpaca: https://alpaca-ai.ngrok.io/



This notebook covers steps 1 (create the prompt) and 2 (generate queries) of the [processing flow](https://github.com/marcusborela/deep_learning_em_buscas_unicamp/blob/main/presentations/articles/Aula%208%20-%20InPars%20Process.png)

# Setting up the environment

## Imports

In [1]:
import requests  # for Llama
import time  # for Llama

In [2]:
from tqdm import tqdm

In [3]:
import string

In [4]:
# import the OpenAI Python library for calling the OpenAI API
import openai

In [5]:
import gzip

In [6]:
import tiktoken

In [7]:
import getpass

In [8]:
import json, time

In [9]:
from psutil import virtual_memory

In [10]:
import pandas as pd
import os

In [11]:
import random
import numpy as np
import torch

In [12]:
import multiprocessing as mp
# force=True avoids a RuntimeError when this cell is executed more than once
mp.set_start_method('spawn', force=True)


In [13]:
import transformers



In [14]:
# for cleaning text
import ftfy

## Defining paths

In [15]:
DIRETORIO_LOCAL = '/home/borela/fontes/deep_learning_em_buscas_unicamp/local'
DIRETORIO_TRABALHO = F'{DIRETORIO_LOCAL}/inpars'
DIRETORIO_TREC_COVID = F'{DIRETORIO_LOCAL}/trec_covid'

In [16]:
if os.path.exists(DIRETORIO_LOCAL):
    print('folder already existed!')
else:
    os.makedirs(DIRETORIO_LOCAL)
    print('folder created!')


folder already existed!


## Memory check function

In [17]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Mon May  1 02:50:02 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.03   Driver Version: 525.116.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 58%   48C    P8    28W / 370W |     58MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [18]:
def mostra_memoria(lista_mem=['cpu']):
  """
  Displays CPU and/or GPU memory information, according to the given parameters.

  Parameters:
  -----------
  lista_mem : list, optional
      List with the strings 'cpu' and/or 'gpu'.
      'cpu' - displays CPU memory information.
      'gpu' - displays GPU memory information (if available).
      The default is ['cpu'].

  Returns:
  --------
  Nothing; the information is only printed to the screen.

  Usage example:
  --------------
  To display CPU memory information:
      mostra_memoria(['cpu'])

  To display CPU and GPU memory information:
      mostra_memoria(['cpu', 'gpu'])

  Author: Marcus Vinícius Borela de Castro

  """
  if 'cpu' in lista_mem:
    # take a single snapshot so that all figures refer to the same instant
    vm = virtual_memory()
    ram = {}
    ram['total'] = round(vm.total / 1e9, 2)
    ram['available'] = round(vm.available / 1e9, 2)
    ram['used'] = round(vm.used / 1e9, 2)
    ram['free'] = round(vm.free / 1e9, 2)
    ram['active'] = round(vm.active / 1e9, 2)
    ram['inactive'] = round(vm.inactive / 1e9, 2)
    ram['buffers'] = round(vm.buffers / 1e9, 2)
    ram['cached'] = round(vm.cached / 1e9, 2)
    print(f"Your runtime RAM in gb: \n total {ram['total']}\n available {ram['available']}\n used {ram['used']}\n free {ram['free']}\n cached {ram['cached']}\n buffers {ram['buffers']}")
  if 'gpu' in lista_mem:
    print('\nGPU')
    # query the GPU here so that this branch also works when 'cpu' is not requested
    gpu_info = !nvidia-smi
    gpu_info = '\n'.join(gpu_info)
    if gpu_info.find('failed') >= 0:
      print('Not connected to a GPU')
    else:
      print(gpu_info)


In [19]:
mostra_memoria(['cpu','gpu'])

Your runtime RAM in gb: 
 total 67.35
 available 59.13
 used 7.17
 free 58.39
 cached 1.68
 buffers 0.11

GPU
Mon May  1 02:50:02 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.03   Driver Version: 525.116.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 58%   47C    P8    28W / 370W |     58MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                         

### Linking a Google Drive folder to save data

## Fixing the seeds

In [20]:
def inicializa_seed(num_semente:int=123):
  """
  Initializes the seeds so that the model results are reproducible.
  This is a recommended practice, since random number generation can influence the model results.
  The function also sets the cuDNN flags needed for reproducibility when GPU acceleration is used.
  
  Args:
      num_semente (int): seed value used to initialize the seeds of the libraries.
  
  References:
      http://nlp.seas.harvard.edu/2018/04/03/attention.html
      https://github.com/CyberZHG/torch-multi-head-attention/blob/master/torch_multi_head_attention/multi_head_attention.py#L15
  """
  # Set the seeds of the random, numpy and pytorch libraries
  random.seed(num_semente)
  np.random.seed(num_semente)
  torch.manual_seed(num_semente)  # seeds the generators on all devices (CPU and CUDA)
  
  # Make cuDNN choose deterministic algorithms (may reduce performance)
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False


The seed must be my number in the assignment spreadsheet on [Classroom](https://docs.google.com/spreadsheets/u/0/d/1mvA8ZrN2FjPxIDwJkYJGsVdZNH49Z1w1L0_3pIAZhQo/htmlview#gid=0)

In [21]:
num_semente = 13 

In [22]:
inicializa_seed(num_semente)
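
To illustrate why the seed is fixed, the sketch below draws indices twice from generators created with the same seed and checks that the draws match. It uses only the standard `random` module with a local generator; the helper name `draw_indices` is hypothetical and the population size is the corpus length printed later in this notebook.

```python
import random
from typing import List


def draw_indices(seed: int, population_size: int, n: int) -> List[int]:
    """Draw n indices in [0, population_size) from a generator seeded with `seed`."""
    rng = random.Random(seed)  # local generator; the global state is not modified
    return [rng.randrange(population_size) for _ in range(n)]


first = draw_indices(13, 171332, 3)
second = draw_indices(13, 171332, 3)
print(first == second)
```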

## Setting up debugging and display

In [23]:
def config_display():
  """
  Configures the Pandas display options.
  """

  # Pandas output format
  # maximum number of columns to display
  pd.options.display.max_columns = None

  # maximum width of a line
  pd.options.display.width = 1000

  # maximum number of rows to display
  pd.options.display.max_rows = 100

  # maximum number of characters per column
  pd.options.display.max_colwidth = 50

  # whether to show the number of rows and columns of a DataFrame
  pd.options.display.show_dimensions = True

  # number of digits after the decimal point shown for floats
  pd.options.display.precision = 7


In [24]:
def config_debug():
  """
  Configures the debug options of PyTorch and, optionally,
  of the transformers package.
  """

  # Print tensors in scientific notation
  torch.set_printoptions(sci_mode=True) 
  """
    Very large or very small values are shown in scientific notation.
    For example, instead of printing the number 0.0000012345 as 0.0000012345, 
    it is printed as 1.2345e-06. This helps when the tensor values involved 
    in the operations are very large or very small, since scientific notation makes 
    their magnitude easier to read.  
  """

  # Enable anomaly detection in PyTorch autograd
  torch.autograd.set_detect_anomaly(True)
  """
    Helps to locate operations that cause numerical problems in the gradient computation. 
    With this option enabled, a backward computation that produces NaN raises an error, 
    and the traceback of the forward operation that created the failing backward function 
    is shown. Execution stops at that point, so the error can be fixed 
    before it becomes a bigger problem.

    Note that anomaly detection can have a significant impact 
    on performance, especially in large and complex models. For this reason,
    it should be used with caution and only for debugging.
  """

  # Set an environment variable to make CUDA kernel launches synchronous (blocking).
  os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
  """
    The host waits for each CUDA kernel launch to finish before issuing the next one. 
    This is useful for debugging code that runs on the GPU, because the error is reported 
    at the moment it occurs, and not after a sequence of operations that makes its origin harder to find.
    The variable should be set before CUDA is initialized in the process.
    Keep in mind that this execution mode is significantly slower than asynchronous execution, 
    which is the default CUDA behavior. It is therefore recommended only for debugging 
    and should be removed once the problem is solved.
  """

  # Set the verbosity level of the transformers package to info
  # transformers.utils.logging.set_verbosity_info() 
  
  
  """
    Sets the level of detail of the log messages produced by the Hugging Face Transformers library 
    to info. The library then prints informational log messages about
    the progress of execution.

    This information can help to understand what is happening while the task runs 
    and assist in debugging. Note that in some cases the amount of information
    produced can be very large, which can affect performance and make the relevant
    messages harder to find. The level of detail should therefore be adjusted to the 
    needs of each task.
  
    To enable it, uncomment the line above. To reduce the number of messages instead, 
      uncomment one of the two lines below, which set the verbosity level to error or warning
  
    transformers.utils.logging.set_verbosity_error()
    transformers.utils.logging.set_verbosity_warning()
  """


  # Set xmode to Verbose, which is used for debugging
  # %xmode Verbose 

  """
    IPython magic command, used in Jupyter Notebook to control how exception information is displayed.
    Verbose is a detailed mode that shows additional information when printing exceptions.
    It includes the full call stack and the values of the variables in each frame 
    at the time of the exception, which helps to find the cause of exceptions in your code.

    To use it, uncomment the line above. To go back to plain mode, 
    uncomment the line below instead:
    %xmode Plain
  """

  """
    Tips:
    1.  pdb (Python Debugger)
      When an exception occurs, the program stops and shows an error message 
      with information about the exception, such as the line of code where it occurred and its type.

      To examine the state of the variables and run other operations 
      at the point where the exception occurred, you can use pdb (Python Debugger). In a notebook, run the %debug command 
      right after the exception occurs. This opens pdb at the line where the exception was raised,
      letting you explore the variables, examine the call stack and run other operations to debug the code.


    2. ipdb
      ipdb is an interactive debugger for Python that exposes the same commands as pdb
      and adds IPython features such as tab completion and syntax highlighting.
      
      You can start debugging by inserting ipdb.set_trace() anywhere in 
      your code where you want to pause execution. When execution reaches that line, 
      the debugger takes over, letting you examine the current state of the program and run 
      commands to investigate its behavior.

      During debugging, you can use the commands:
        next (to run the next line of code), 
        step (to step into a function called on the next line of code) 
        continue (to resume normal execution until the next breakpoint).

      As in pdb, you can inspect variables, set additional breakpoints
      and evaluate Python expressions in the context of your program while you
      navigate through the source code being debugged.
  """


In [25]:
config_display()

In [26]:
config_debug()

# Downloading the dataset for query generation (TREC-COVID)

In [27]:
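if not os.path.exists(f"{DIRETORIO_TREC_COVID}/corpus.jsonl.gz"):
    # make sure the target folder exists; otherwise `mv` would rename the file instead
    os.makedirs(DIRETORIO_TREC_COVID, exist_ok=True)
    !wget https://huggingface.co/datasets/BeIR/trec-covid/resolve/main/corpus.jsonl.gz
    !mv corpus.jsonl.gz {DIRETORIO_TREC_COVID}
    print('Downloaded')
else:
    print('File already existed')

File already existed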


In [28]:
# Decompress the file into memory
with gzip.open(f'{DIRETORIO_TREC_COVID}/corpus.jsonl.gz', 'rt') as f:
    # Parse each line of the decompressed file as one JSON record
    corpus = [json.loads(line) for line in f]
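
The cell above loads the whole corpus into a list. When memory is tight, the same file format can be read lazily, one record at a time. The sketch below is an alternative under that assumption, not the approach used in this notebook; `iter_jsonl_gz` is a hypothetical helper and the demo file is a tiny stand-in with the same fields as the corpus records.

```python
import gzip
import json
import os
import tempfile
from typing import Dict, Iterator


def iter_jsonl_gz(path: str) -> Iterator[Dict]:
    """Yield one parsed record per non-empty line of a gzip-compressed jsonlines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Tiny stand-in file with the same fields as the TREC-COVID corpus records.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_corpus.jsonl.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as f:
    f.write(json.dumps({"_id": "a1", "title": "t1", "text": "first"}) + "\n")
    f.write(json.dumps({"_id": "b2", "title": "t2", "text": "second"}) + "\n")

ids = [doc["_id"] for doc in iter_jsonl_gz(demo_path)]
print(ids)
```

The list version is still the better fit here, because the few-shot examples are selected by index.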

In [29]:
# Show the loaded data
print(f"{type(corpus)} len(corpus): {len(corpus)} corpus[0] {corpus[0]}" )

<class 'list'> len(corpus): 171332 corpus[0] {'_id': 'ug7v899j', 'title': 'Clinical features of culture-proven Mycoplasma pneumoniae infections at King Abdulaziz University Hospital, Jeddah, Saudi Arabia', 'text': 'OBJECTIVE: This retrospective chart review describes the epidemiology and clinical features of 40 patients with culture-proven Mycoplasma pneumoniae infections at King Abdulaziz University Hospital, Jeddah, Saudi Arabia. METHODS: Patients with positive M. pneumoniae cultures from respiratory specimens from January 1997 through December 1998 were identified through the Microbiology records. Charts of patients were reviewed. RESULTS: 40 patients were identified, 33 (82.5%) of whom required admission. Most infections (92.5%) were community-acquired. The infection affected all age groups but was most common in infants (32.5%) and pre-school children (22.5%). It occurred year-round but was most common in the fall (35%) and spring (30%). More than three-quarters of patients (77.5%

# Building the prompt

In [30]:
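# random.randint includes both endpoints, so the upper bound must be the last valid index
few_shot_example_1 = random.randint(0, len(corpus) - 1)
few_shot_example_2 = random.randint(0, len(corpus) - 1)
few_shot_example_3 = random.randint(0, len(corpus) - 1)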

In [31]:
lista_few_shot_example = [few_shot_example_1, few_shot_example_2, few_shot_example_3]
print(lista_few_shot_example)

[67897, 76220, 48686]


In [32]:
assert few_shot_example_1 == 67897, "Generated numbers do not match the ones used to build the prompt"
assert few_shot_example_2 == 76220, "Generated numbers do not match the ones used to build the prompt"
assert few_shot_example_3 == 48686, "Generated numbers do not match the ones used to build the prompt"
# List below was the one used to build the prompt (original)
# lista_few_shot_example = [67897, 76220, 48686]
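
Three independent `randint` calls can, in principle, return the same index twice. A sketch of an alternative that guarantees distinct, valid indices uses `random.sample` over the index range. It draws a different sequence than the calls above, so it would not reproduce the indices asserted there; the local generator and the variable names are illustrative.

```python
import random

corpus_size = 171332  # value printed by len(corpus) in this notebook
rng = random.Random(13)  # separate generator; does not disturb the global random state
few_shot_indices = rng.sample(range(corpus_size), 3)  # distinct indices in [0, corpus_size)
print(few_shot_indices)
```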

In [33]:
# `idx` is used instead of `id` to avoid shadowing the built-in function
print([(idx, corpus[idx]) for idx in lista_few_shot_example])

[(67897, {'_id': 'i138ax5u', 'title': 'Oncology during the COVID-19 pandemic: challenges, dilemmas and the psychosocial impact on cancer patients', 'text': 'COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to acknowledge the challenges this pandemic poses on the care of the particularly vulnerable cancer patients and the subsequent psychosocial impact on them. We have outlined our clinical experience in managing patients with gastrointestinal, hematological, gynaecological, dermatological, neurological, thyroid, lung and paediatric cancers in the COVID-19 era and have reviewed the emerging literature around barriers to care of oncology patients and how this crisis affects them. Moreover, evolving treatment strategies and novel ways of addressing the needs of oncology patients in the new context of the pandemic are discussed.', 'metadata': {'url':

The questions were generated in parallel, via the browser, using [ChatGPT](https://chat.openai.com/)

Prompt used:

        """Consider the text below: 

        Text: <text>

        Generate a question such that the answer is inside the text above. Suppose you are a <patient/doctor>.

        Question:"""


In [34]:
question_shot_example_1_doctor = "What is the importance of acknowledging the challenges posed by the COVID-19 pandemic on vulnerable cancer patients?"
question_shot_example_1_patient = "what does the text suggest are the challenges faced by vulnerable cancer patients during the COVID-19 pandemic?"


In [35]:
question_shot_example_1 = question_shot_example_1_doctor

In [36]:
question_shot_example_2_doctor = "What method was used to investigate the presence of HCoV-NL63 in the collected specimens from children with acute respiratory infection?"
question_shot_example_2_patient = "What is the prevalence of coronavirus NL63 among children under the age of five years with acute respiratory infection according to the study?"


In [37]:
question_shot_example_2 = question_shot_example_2_doctor

In [38]:
question_shot_example_3_patient = 'What is the significance of early recognition and targeted treatment in improving clinical outcomes in ARDS?'
question_shot_example_3_doctor = 'What does this article cover regarding acute respiratory distress syndrome (ARDS)?'

In [39]:
question_shot_example_3 = question_shot_example_3_patient

In [40]:
shortened_text_shot_example_1 = "COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to acknowledge the challenges this pandemic poses on the care of the particularly vulnerable cancer patients and the subsequent psychosocial impact on them. We have reviewed the emerging literature around barriers to care of oncology patients and how this crisis affects them. Moreover, evolving treatment strategies and novel ways of addressing the needs of oncology patients in the new context of the pandemic are discussed."

In [41]:
shortened_text_shot_example_2 = "In a retrospective study, we investigated the incidence of coronavirus infection in children under the age of five years Methods: We collected 138 specimens (nasal and throat swabs) from children less than five-years-old with acute respiratory infection from October 2018 to December 2019 Then, HCoV-NL63 was investigated using real-time PCR"

In [42]:
shortened_text_shot_example_3 = "Acute respiratory distress syndrome (ARDS) is an inflammatory form of lung injury in response to various clinical entities or inciting events, quite frequently due to an underlying infection. Morbidity and mortality associated with ARDS are significant. Hence, early recognition and targeted treatment are crucial to improve clinical outcomes."

In [43]:
instrucao = 'Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:\n\n'
exemplo1 = f'Text: {shortened_text_shot_example_1}\n\nQuestion: {question_shot_example_1}\n\n'
exemplo2 = f'Text: {shortened_text_shot_example_2}\n\nQuestion: {question_shot_example_2}\n\n'
exemplo3 = f'Text: {shortened_text_shot_example_3}\n\nQuestion: {question_shot_example_3}\n\n'
texto_a_completar = 'Text: {context}\n\nQuestion: '


In [44]:
prompt_for_question = instrucao + exemplo1 + exemplo2 + exemplo3 + texto_a_completar

In [45]:
print(prompt_for_question)

Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:

Text: COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to acknowledge the challenges this pandemic poses on the care of the particularly vulnerable cancer patients and the subsequent psychosocial impact on them. We have reviewed the emerging literature around barriers to care of oncology patients and how this crisis affects them. Moreover, evolving treatment strategies and novel ways of addressing the needs of oncology patients in the new context of the pandemic are discussed.

Question: What is the importance of acknowledging the challenges posed by the COVID-19 pandemic on vulnerable cancer patients?

Text: In a retrospective study, we investigated the incidence of coronavirus infection in children under th

In [46]:
text_test = corpus[100]['text'][:300]
print(text_test)

The thiol-disulfide oxidoreductase thioredoxin-1 (Trx1) is known to be secreted by leukocytes and to exhibit cytokine-like properties. Extracellular effects of Trx1 require a functional active site, suggesting a redox-based mechanism of action. However, specific cell surface proteins and pathways co
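
Slicing by characters can cut the text in the middle of a word, as in the output above. A minimal sketch of a truncation that backs off to the last whitespace before the limit is shown below; `truncate_at_word` is a hypothetical helper, not part of this notebook, and the sample sentence is taken from the document printed above.

```python
def truncate_at_word(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars characters without splitting a word."""
    if len(text) <= max_chars:
        return text
    # look for the last space at or before the limit
    cut = text.rfind(" ", 0, max_chars + 1)
    # fall back to a hard cut when there is no space to break on
    return text[:cut] if cut > 0 else text[:max_chars]


sample = "Extracellular effects of Trx1 require a functional active site"
shortened = truncate_at_word(sample, 45)
print(shortened)
```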


In [47]:
prompt_teste = prompt_for_question.replace('{context}', text_test)
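
The prompt assembly above can be generalized to any number of few-shot examples. The sketch below follows the same `Text:` / `Question:` layout and the same `{context}` placeholder; the function names and the placeholder example are hypothetical.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str, examples: List[Tuple[str, str]]) -> str:
    """Concatenate the instruction, the (text, question) examples and the open slot."""
    parts = [instruction]
    for text, question in examples:
        parts.append(f"Text: {text}\n\nQuestion: {question}\n\n")
    parts.append("Text: {context}\n\nQuestion: ")  # slot filled in later
    return "".join(parts)


def fill_context(template: str, context: str) -> str:
    """Insert the document into the template."""
    # str.replace is used instead of str.format so that braces inside documents are harmless
    return template.replace("{context}", context)


template = build_few_shot_prompt(
    "Instruction: generate one question answered by the text.\n\n",
    [("Some text.", "Some question?")],  # placeholder example
)
prompt = fill_context(template, "A document with {braces} in it.")
print(prompt)
```

Using `replace` rather than `format` is a deliberate choice here: scientific abstracts may contain literal braces, which `format` would try to interpret as fields.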

In [48]:
print(prompt_teste)

Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:

Text: COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to acknowledge the challenges this pandemic poses on the care of the particularly vulnerable cancer patients and the subsequent psychosocial impact on them. We have reviewed the emerging literature around barriers to care of oncology patients and how this crisis affects them. Moreover, evolving treatment strategies and novel ways of addressing the needs of oncology patients in the new context of the pandemic are discussed.

Question: What is the importance of acknowledging the challenges posed by the COVID-19 pandemic on vulnerable cancer patients?

Text: In a retrospective study, we investigated the incidence of coronavirus infection in children under th

# Generating queries

## With the open-source model EleutherAI/gpt-j-6B

Supporting source for building the code: [Project Extractive Q&A - Performance Comparison between Learning Methods: Context and Transfer - exqa-complearning - Borela and Léo, in a previous course](https://github.com/marcusborela/exqa-complearning)

In [49]:
# GPTNeoForCausalLM is needed by TextGeneration.get_model for 'neo' models
from transformers import GPT2Tokenizer, GPTJForCausalLM, GPTNeoForCausalLM

In [50]:
from typing import  List, Dict

In [51]:
from transformers import StoppingCriteria, StoppingCriteriaList, pipeline

In [52]:
class KeywordsStoppingCriteria(StoppingCriteria):
    """
      Stops generation as soon as the last generated token is one of the given ids.

      Source: https://stackoverflow.com/questions/69403613/how-to-early-stop-autoregressive-model-with-a-list-of-stop-words
    """
    def __init__(self, keywords_ids:list):
        self.keywords = set(keywords_ids)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # only the first sequence of the batch is checked;
        # .item() converts the last token id from a tensor to a Python int
        return input_ids[0][-1].item() in self.keywords
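
A stopping criterion based on token ids only fires when a stop word is emitted as exactly one of the listed ids. As a complementary safeguard, the generated text can also be truncated at the first stop word after decoding. The sketch below is plain string post-processing, not part of the class above; `truncate_at_stop_words` is a hypothetical helper.

```python
from typing import Sequence


def truncate_at_stop_words(generated: str, stop_words: Sequence[str]) -> str:
    """Return the text before the earliest occurrence of any stop word."""
    cut = len(generated)
    for stop in stop_words:
        pos = generated.find(stop)
        if pos != -1:
            cut = min(cut, pos)
    return generated[:cut].strip()


question = truncate_at_stop_words(" What causes ARDS?\n\nText: next example", ["\n", "!"])
print(question)
```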

In [66]:
class TextGeneration(): # pylint: disable=missing-class-docstring

    """
        Source of the parameters

            https://github.com/huggingface/transformers/blob/main/src/transformers/generation_utils.py
            line: 920

                length_penalty (`float`, *optional*, defaults to 1.0):
                 Exponential penalty to the length, used with beam-based generation. It is applied as an
                 exponent to the sequence length, which in turn divides the score of the sequence. Since the
                 score is the log likelihood of the sequence (i.e. negative), values > 0.0 promote longer
                 sequences, while values < 0.0 encourage shorter sequences.

                temperature (`float`, *optional*, defaults to 1.0):
                The value used to modulate the next token probabilities.

                num_return_sequences(`int`, *optional*, defaults to 1):
                The number of independently computed returned sequences for each element in the batch.
    """

    _dict_parameters_example = {
        # task parameters
        "num_top_k": 1,
        "num_max_answer_length": 64,
        # context parameters - text generation
        'list_stop_words': ['.', '\n', '!'],
        "val_length_penalty": 2,
        'if_do_sample': False,
        'val_temperature': 1,
    }

    def __init__(self,
                 parm_name_model: str,
                 parm_dict_config:Dict):


        # to avoid running out of GPU memory
        torch.cuda.empty_cache()

        self.name_model = parm_name_model
        self.path_model = "/home/borela/fontes/deep_learning_em_buscas_unicamp/modelo/"+self.name_model
        # print(self.path_model)

        self.name_device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
        self.device = torch.device(self.name_device)


        # Since the model is used only for inference, switch it to `eval` mode
        self.pipe = pipeline("text-generation",
                             model=self.get_model(self.path_model).to(self.device).eval(),
                             tokenizer=self.get_tokenizer(self.path_model),
                             device=self.device, framework='pt')

        # setting to avoid warning msg
        #   Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
        #   The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior.
        #   Please pass your input's `attention_mask` to obtain reliable results.
        self.tokenizer.pad_token = self.tokenizer.eos_token

        self.atualiza_parametros_resposta(parm_dict_config)

        # Optionally save the model, to make loading faster
        #   gptj6b: load time drops from 4 minutes to 12 seconds
        # torch.save(self.pipe.model, self.path_model+"/pytorch_model_saved.pt")
        self.max_seq_len = self.pipe.model.config.max_position_embeddings


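    def atualiza_parametros_resposta(self, parm_dict_config:Dict):
        # local import, so that this method does not depend on a module-level import
        from copy import deepcopy

        if parm_dict_config is None:
            # note: the example dict has no "prompt_missing_context" key, which is required below
            parm_dict_config = deepcopy(TextGeneration._dict_parameters_example)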

        self.num_top_k = parm_dict_config["num_top_k"]
        # self.num_doc_stride = parm_dict_config["num_doc_stride"]
        self.num_max_answer_length = parm_dict_config["num_max_answer_length"]
        self.if_do_sample = parm_dict_config['if_do_sample']
        self.val_temperature = parm_dict_config['val_temperature']
        self.list_stop_words = parm_dict_config["list_stop_words"]
        self.val_length_penalty = parm_dict_config["val_length_penalty"]
        self.prompt = parm_dict_config["prompt_missing_context"]
        assert '{context}' in self.prompt, "Prompt must have {context} in the text."
        self.prompt_length = len(self.pipe.tokenizer(self.prompt)['input_ids'])
        stop_ids = [self.tokenizer.encode(w)[0] for w in self.list_stop_words]
        self.stop_criteria = KeywordsStoppingCriteria(stop_ids)
        # to avoid generating masks such as "___"
        # self.bad_words = ['_','__','___','____', '______', '________',  '________________', ]
        vocab = self.tokenizer.get_vocab()  # build the vocabulary dict only once
        decoded_tokens = (self.tokenizer.decode(pos) for pos in range(self.tokenizer.vocab_size))
        self.bad_words = [word for word in decoded_tokens if '_' in word]
        self.bad_words_ids = [[vocab[word]] for word in self.bad_words if word in vocab]
        # self.bad_words.append('\xa0') # id 1849
        self.bad_words_ids += [[1849]]

    @property
    def info(self):
        return {"name":self.name_model,
                "device": self.name_device,
                "top_k": self.num_top_k,
                "if_do_sample": self.if_do_sample,
                "val_temperature": self.val_temperature,
                "list_stop_words": self.list_stop_words,
                "val_length_penalty": self.val_length_penalty,
                "length_penalty": self.val_length_penalty,
                "num_max_answer_length":self.num_max_answer_length,
                "max_seq_len": self.max_seq_len,
                "bad_words": self.bad_words,
                "prompt_length": self.prompt_length,
                "prompt" : self.prompt
                }

    @property
    def tokenizer(self):
        return self.pipe.tokenizer


    @property
    def model(self):
        return self.pipe.model

    @staticmethod
    def get_model(path_model: str,
                  *args, **kwargs):
        # return GPTJForCausalLM.from_pretrained(path_model, *args, **kwargs)
        # return GPTNeoForCausalLM.from_pretrained(path_model, *args, **kwargs)
        if 'neo' in path_model:
            return GPTNeoForCausalLM.from_pretrained(
                path_model)
        else:
            return GPTJForCausalLM.from_pretrained(
            path_model,
            revision="float16",
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True)


    @staticmethod
    def get_tokenizer(path_model: str,
                      *args, batch_size: int = 128, **kwargs) -> GPT2Tokenizer:
        return GPT2Tokenizer.from_pretrained(path_model, use_fast=False, *args, **kwargs)

    def generate_text(self,
                      texto_contexto: str,
                      parm_dict_config_generation:Dict=None,
                      parm_print:bool=False) -> List[Dict]:

        assert isinstance(texto_contexto, str), f"Expected just one context, not {type(texto_contexto)}"


        prompt = self.prompt.replace('{context}', texto_contexto)

        #print('Tokenizing...')
        # tokenized = self.tokenizer(prompt, return_tensors="pt")
        # prompt_len = len(# .input_ids[0])
        # print(f"chars(prompt): {len(prompt)}")
        # print(f"tokens(prompt): {prompt_len}")

        val_temperature = self.val_temperature
        if_do_sample = self.if_do_sample
        num_top_k = self.num_top_k
        val_length_penalty = self.val_length_penalty
        list_stop_words = self.list_stop_words
        stop_criteria = self.stop_criteria
        bad_words = self.bad_words
        bad_words_ids = self.bad_words_ids
        num_max_answer_length = self.num_max_answer_length

        if parm_dict_config_generation:
            if "val_temperature" in parm_dict_config_generation:
                val_temperature = parm_dict_config_generation["val_temperature"]

            if "if_do_sample" in parm_dict_config_generation:
                if_do_sample = parm_dict_config_generation["if_do_sample"]

            if "num_top_k" in parm_dict_config_generation:
                num_top_k = parm_dict_config_generation["num_top_k"]

            if "val_length_penalty" in parm_dict_config_generation:
                val_length_penalty = parm_dict_config_generation["val_length_penalty"]

            if "list_stop_words" in parm_dict_config_generation:
                list_stop_words = parm_dict_config_generation["list_stop_words"]
                stop_ids = [self.tokenizer.encode(w)[0] for w in list_stop_words]
                stop_criteria = KeywordsStoppingCriteria(stop_ids)                

            if "bad_words" in parm_dict_config_generation:
                bad_words = parm_dict_config_generation['bad_words']
                # generate() expects a list of token-id lists, one inner list per banned word
                bad_words_ids = [[self.tokenizer.get_vocab()[word]] for word in bad_words]

            if "num_max_answer_length" in parm_dict_config_generation:
                num_max_answer_length = parm_dict_config_generation["num_max_answer_length"]

        length_text_input = len(self.tokenizer(texto_contexto)['input_ids'])
        max_length = length_text_input + self.prompt_length + num_max_answer_length

        if max_length > self.max_seq_len:
            pos_truncar = self.max_seq_len-num_max_answer_length-3
            prompt = prompt[:pos_truncar]
            print(f"Max_length desejado {max_length} é maior do que o tratado pelo modelo {self.max_seq_len}. Truncado em {pos_truncar}.")


        if parm_print:
            print({"num_top_k": num_top_k,
                    "if_do_sample": if_do_sample,
                    "val_temperature": val_temperature,
                    "val_length_penalty": val_length_penalty,
                    "list_stop_words": list_stop_words,
                    "bad_words": bad_words,
                    "num_max_answer_length":num_max_answer_length,
                    "max_length": max_length,
                    "prompt" : prompt
                    })



        respostas = self.pipe(prompt,
            return_tensors=False,
            return_text=True,
            return_full_text=False,
            clean_up_tokenization_spaces=True,
            # max_question_len = len(texto_pergunta), # number of tokens (#chars > #tokens)
            max_length= max_length,
            # min_length = 2,
            num_return_sequences = num_top_k,
            num_beams =  num_top_k,
            do_sample = if_do_sample,
            temperature = val_temperature,
            length_penalty = val_length_penalty, # self.num_length_penalty \
            stopping_criteria=StoppingCriteriaList([stop_criteria]),
            bad_words_ids=bad_words_ids
            )

        if not isinstance(respostas, list):
            lista_respostas = [respostas]
        else:
            lista_respostas = respostas

        # compare against the effective num_top_k, which may have been overridden for this call
        if len(lista_respostas) < num_top_k:
            print(f"Warning: #answers=len(lista_respostas)<num_top_k {len(lista_respostas)} < {num_top_k}. ")

        # remove invalid characters
        for resp in lista_respostas:
            resp['generated_text'] = ftfy.fix_text(resp['generated_text']).strip()

        return lista_respostas[:num_top_k]
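
The length check in `generate_text` truncates the assembled prompt by characters, which can also cut off whatever the template places after `{context}`. A minimal sketch of an alternative that trims only the context, measured in tokens, is shown below. The function name `truncate_context` and the `encode`/`decode` arguments are hypothetical stand-ins for the real tokenizer; whitespace splitting is used only so the example runs without downloading a model.

```python
from typing import Callable, List


def truncate_context(context: str,
                     prompt_length: int,
                     max_seq_len: int,
                     num_max_answer_length: int,
                     encode: Callable[[str], List[str]] = str.split,
                     decode: Callable[[List[str]], str] = " ".join) -> str:
    """Trim the context so template + context + answer fit in max_seq_len tokens."""
    # Tokens left for the context once the template and the answer are reserved.
    budget = max_seq_len - prompt_length - num_max_answer_length
    if budget <= 0:
        raise ValueError("The prompt template alone leaves no room for context.")
    tokens = encode(context)
    if len(tokens) <= budget:
        return context  # already fits, nothing to cut
    return decode(tokens[:budget])


# Toy usage: 100 "tokens" of context, room for only 20 of them.
long_context = " ".join(f"w{i}" for i in range(100))
short_context = truncate_context(long_context, prompt_length=10,
                                 max_seq_len=50, num_max_answer_length=20)
print(len(short_context.split()))
```

With the real tokenizer, `encode` and `decode` would be replaced by the tokenizer's own methods, so the budget is counted in the same units as `max_seq_len`.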

In [54]:
import gc

In [68]:
gc.collect()

31

In [69]:
torch.cuda.empty_cache()

In [57]:
dict_config_model = {
                # complementary parameters
                "num_top_k": 1,
                "num_max_answer_length":80,
                "prompt_missing_context" : prompt_for_question,
                # generation parameters
                'list_stop_words': ['\n', '?', 'Text:', '\n\n'],
                'if_do_sample': False,
                "val_length_penalty":0.,
                'val_temperature': 0.1,
}


In [58]:
name_model = "EleutherAI/gpt-j-6B"
dict_config_model['list_stop_words'] = ['.', '\n','!']


In [59]:
texto_teste = prompt_for_question.replace('{context}', corpus[100]['text'][:500])

In [60]:
print(texto_teste)

Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:

Text: COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to acknowledge the challenges this pandemic poses on the care of the particularly vulnerable cancer patients and the subsequent psychosocial impact on them. We have reviewed the emerging literature around barriers to care of oncology patients and how this crisis affects them. Moreover, evolving treatment strategies and novel ways of addressing the needs of oncology patients in the new context of the pandemic are discussed.

Question: What is the importance of acknowledging the challenges posed by the COVID-19 pandemic on vulnerable cancer patients?

Text: In a retrospective study, we investigated the incidence of coronavirus infection in children under th

In [62]:
raise Exception("parar aqui")

Exception: parar aqui

In [70]:
%%time
generator = TextGeneration(parm_name_model=name_model, 
        parm_dict_config=dict_config_model)



CPU times: user 20 s, sys: 26.6 s, total: 46.6 s
Wall time: 2min 39s


In [71]:
len(generator.tokenizer(texto_teste)['input_ids'])

479

In [72]:
print(f"generator.info()\n {generator.info}")

generator.info()
 {'name': 'EleutherAI/gpt-j-6B', 'device': 'cuda:0', 'top_k': 1, 'if_do_sample': False, 'val_temperature': 0.1, 'list_stop_words': ['.', '\n', '!'], 'val_length_penalty': 0.0, 'length_penalty': 0.0, 'num_max_answer_length': 80, 'max_seq_len': 2048, 'bad_words': ['_', '__', '____', '________', ' _', '________________', '________________________________', ' __', '._', '___', '_-', '_{', '______', '________________________________________________________________', '(_', '_____', '[_', '_-_', '________________________', '_______', ' $_', '_(', ' (_', ' ______', '_.', ' "_', ' ___', '._', '/_', '_>'], 'prompt_length': 371, 'prompt': 'Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:\n\nText: COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to ackn

In [73]:
resultado = generator.generate_text(texto_contexto=corpus[100]['text'][:500], parm_print=True)

{'num_top_k': 1, 'if_do_sample': False, 'val_temperature': 0.1, 'val_length_penalty': 0.0, 'list_stop_words': ['.', '\n', '!'], 'bad_words': ['_', '__', '____', '________', ' _', '________________', '________________________________', ' __', '._', '___', '_-', '_{', '______', '________________________________________________________________', '(_', '_____', '[_', '_-_', '________________________', '_______', ' $_', '_(', ' (_', ' ______', '_.', ' "_', ' ___', '._', '/_', '_>'], 'num_max_answer_length': 80, 'max_length': 562, 'prompt': 'Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:\n\nText: COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to acknowledge the challenges this pandemic poses on the care of the particularly vulnerable cancer patients and the su

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [74]:
resultado

[{'generated_text': 'What is the significance of identifying cell surface proteins and pathways coupling extracellular Trx1 redox activity to cellular responses?'}]

In [None]:
dict_config_generation = {
                # complementary parameters
                "num_top_k": 2,
                "num_max_answer_length":50, # the prompt has 402 tokens
                "prompt_missing_context" : prompt_for_question,
                'list_stop_words': ['\n', '?', 'Text:', '\n\n'],
                'if_do_sample': True,
                "val_length_penalty":0.,
                'val_temperature': 1.,
}


In [None]:
resultado = generator.generate_text(texto_contexto=corpus[100]['text'][:500], 
                                                parm_print=False)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
resultado

[{'generated_text': 'What is the significance of identifying cell surface proteins and pathways coupling extracellular Trx1 redox activity to cellular responses?'}]

In [None]:
# Clear the GPU memory cache
# torch.cuda.empty_cache()

In [None]:
resultado = generator.pipe(texto_teste,
            return_tensors=True,
            output_scores=True,
            return_text=True,
            return_full_text=False,
            clean_up_tokenization_spaces=True,
            max_length= 550, # prompt_len: 402 + self.num_max_answer_length,
            num_return_sequences = 1,
            num_beams =  1,
            do_sample = False,
            temperature = 0.1,
            bad_words_ids=generator.bad_words_ids,
            stopping_criteria=StoppingCriteriaList([generator.stop_criteria])
            
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
resultado

[{'generated_text': '\xa0What is the significance of identifying cell surface proteins and pathways coupling extracellular Trx1 redox activity to cellular responses?'}]

### Generate queries for TREC-COVID

In [75]:
import logging
logging.disable(logging.WARNING)

In [76]:
generator.info

{'name': 'EleutherAI/gpt-j-6B',
 'device': 'cuda:0',
 'top_k': 1,
 'if_do_sample': False,
 'val_temperature': 0.1,
 'list_stop_words': ['.', '\n', '!'],
 'val_length_penalty': 0.0,
 'length_penalty': 0.0,
 'num_max_answer_length': 80,
 'max_seq_len': 2048,
 'bad_words': ['_',
  '__',
  '____',
  '________',
  ' _',
  '________________',
  '________________________________',
  ' __',
  '._',
  '___',
  '_-',
  '_{',
  '______',
  '________________________________________________________________',
  '(_',
  '_____',
  '[_',
  '_-_',
  '________________________',
  '_______',
  ' $_',
  '_(',
  ' (_',
  ' ______',
  '_.',
  ' "_',
  ' ___',
  '._',
  '/_',
  '_>'],
 'prompt_length': 371,
 'prompt': 'Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:\n\nText: COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. I

In [97]:
dict_queries_geradas = {}
dict_erros_doctos = {}
for cnt, docto in tqdm(enumerate(corpus)):
    try:
        if len(docto['text']) > 20:
            if ('title' in docto) and len(docto['title']) >= 5 :
                texto = docto['title'] + '. ' + docto['text'] 
            else:
                texto = docto['text'] 
            resultado = generator.generate_text(texto_contexto=texto)
            dict_queries_geradas[docto['_id']] = resultado[0]['generated_text']
    except Exception as e:
        dict_erros_doctos[docto['_id']] = str(e)
        print(f"Erro ao gerar query para o documento {docto['_id']}: {e}")            

2333it [56:18,  1.37s/it]

Max_length desejado 10449 é maior do que o tratado pelo modelo 2048. Truncado em 1965.


2782it [1:07:49,  1.71s/it]

Max_length desejado 25243 é maior do que o tratado pelo modelo 2048. Truncado em 1965.


4863it [1:59:02,  1.83s/it]

Max_length desejado 2095 é maior do que o tratado pelo modelo 2048. Truncado em 1965.


9999it [3:19:17,  1.48s/it]

Max_length desejado 2473 é maior do que o tratado pelo modelo 2048. Truncado em 1965.


11493it [3:46:48,  1.02s/it]

In [None]:
print(f"Foram geradas queries para {len(dict_queries_geradas)} documentos. Deixados sem queries: {len(corpus) - len(dict_queries_geradas)}")

Foram geradas queries para 6 documentos. Deixados sem queries: 171326


In [None]:
import pickle

In [None]:
with open(f"{DIRETORIO_TRABALHO}/queries_geradas.pickle", 'wb') as outputFile:
    pickle.dump(dict_queries_geradas, outputFile, pickle.HIGHEST_PROTOCOL)
        


In [None]:
with open(f"{DIRETORIO_TRABALHO}/erros_doctos_geracao.pickle", 'wb') as outputFile:
    pickle.dump(dict_erros_doctos, outputFile, pickle.HIGHEST_PROTOCOL)

In [94]:
dict_queries_geradas

{'ug7v899j': 'What is the significance of the finding that infections were more common in infants and preschool children?',
 '02tnwd4m': 'What is the significance of the presumed contribution of NO• to inflammatory diseases of the lung?',
 'ejv2xln0': 'What is the significance of SP-D in the innate response to inhaled microorganisms and organic antigens?',
 '2b73a28n': 'What is the significance of ET-1 in lung disease?',
 '9785vg6d': 'What is the significance of the mucosal responses in the respiratory epithelium in response to pneumovirus infection?',
 'zjufx4fo': 'What is the significance of the body TRS in discontinuous RNA synthesis?'}
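
The exercise statement asks for the dataset in JSON Lines format with the fields `query`, `positive_doc_id` and an optional `negative_doc_ids`. A minimal sketch of that serialization follows, assuming a dictionary shaped like `dict_queries_geradas` (document id mapped to query). The function name `to_jsonl_lines` and the `negatives` argument are illustrative and not part of the notebook.

```python
import json
from typing import Dict, List, Optional


def to_jsonl_lines(queries: Dict[str, str],
                   negatives: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Serialize {doc_id: query} into one JSON document per line."""
    negatives = negatives or {}
    lines = []
    for doc_id, query in queries.items():
        record = {
            "query": query,
            "positive_doc_id": doc_id,
            # Optional field: empty when no negatives were sampled yet.
            "negative_doc_ids": negatives.get(doc_id, []),
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


sample_queries = {"2b73a28n": "What is the significance of ET-1 in lung disease?"}
jsonl_lines = to_jsonl_lines(sample_queries)
print(jsonl_lines[0])
```

Writing the file is then a matter of joining the lines with `"\n"` and saving them with a `.jsonl` extension.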

In [None]:
# For each document that had a query generated:
#    compute a relevance score with the reranker

In [None]:
# Sort and keep the first 10 thousand documents
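
The two notes above (score each generated pair, then sort and keep the best ones) can be sketched as follows. This is only an outline under stated assumptions: `top_k_pairs` and `overlap_score` are hypothetical names, and the term-overlap score is a toy stand-in for the reranker, chosen so the example runs without a model.

```python
from typing import Callable, Dict, List, Tuple


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def top_k_pairs(queries: Dict[str, str],
                docs: Dict[str, str],
                score_fn: Callable[[str, str], float],
                k: int) -> List[Tuple[str, str, float]]:
    """Score every (query, document) pair and keep the k best ones."""
    scored = [(doc_id, query, score_fn(query, docs[doc_id]))
              for doc_id, query in queries.items() if doc_id in docs]
    scored.sort(key=lambda item: item[2], reverse=True)  # highest score first
    return scored[:k]


toy_docs = {"a": "lung disease and ET-1", "b": "unrelated text"}
toy_queries = {"a": "lung disease", "b": "lung disease"}
best = top_k_pairs(toy_queries, toy_docs, overlap_score, k=1)
print(best)
```

In the real pipeline `score_fn` would wrap the reranker (or the generator's log-probability), and `k` would be 10,000.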

# Other model options for generation

## ChatGPT (gpt-3.5-turbo model)

To use gpt-3.5-turbo, we followed the OpenAI cookbook notebook [openai: How_to_format_inputs_to_ChatGPT_models.ipynb](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb)

### How to format inputs to ChatGPT models

ChatGPT is powered by `gpt-3.5-turbo`, which the OpenAI cookbook described as OpenAI's most advanced model when that guide was written.

You can build your own applications with `gpt-3.5-turbo` using the OpenAI API.

Chat models take a series of messages as input, and return an AI-written message as output.

This guide illustrates the chat format with a few example API calls.

### An example chat API call

A chat API call has two required inputs:
- `model`: the name of the model you want to use (e.g., `gpt-3.5-turbo`)
- `messages`: a list of message objects, where each object has at least two fields:
    - `role`: the role of the messenger (either `system`, `user`, or `assistant`)
    - `content`: the content of the message (e.g., `Write me a beautiful poem`)

Typically, a conversation will start with a system message, followed by alternating user and assistant messages, but you are not required to follow this format.

Let's look at an example chat API call to see how the chat format works in practice.

In [None]:
MODEL_NAME_CHAT_GPT = "gpt-3.5-turbo"

In [None]:
openai.api_key = getpass.getpass("Entre a OPENAI_API_KEY")

In [None]:
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

In [None]:
os.environ['OPENAI_DISABLE_SSL_CERT_VERIFICATION'] = '1'

In [None]:
os.environ['verify_ssl_certs'] = '0'  # environment variable values must be strings

In [None]:
response = openai.ChatCompletion.create(
    model=MODEL_NAME_CHAT_GPT,
    messages=[
        {"role": "system", "content": "You are a helpful question generator."},
        {"role": "user", "content": prompt_teste},
    ],
    temperature=0
)

APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))

In [None]:
response


<OpenAIObject chat.completion id=chatcmpl-7B7PZvfnx1ejDQaycSOYTMfWLNh2C at 0x7f3d88a36a40> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "What is the suggested mechanism of action for the extracellular effects of thioredoxin-1 (Trx1)?",
        "role": "assistant"
      }
    }
  ],
  "created": 1682884453,
  "id": "chatcmpl-7B7PZvfnx1ejDQaycSOYTMfWLNh2C",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 24,
    "prompt_tokens": 436,
    "total_tokens": 460
  }
}

In [None]:
response = openai.ChatCompletion.create(
    model=MODEL_NAME_CHAT_GPT,
    messages=[
        {"role": "system", "content": "You are an assistant who will help me answer questions."},
        {"role": "user", "content": prompt_teste},
    ],
    temperature=0
)

APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))

As you can see, the response object has a few fields:
- `id`: the ID of the request
- `object`: the type of object returned (e.g., `chat.completion`)
- `created`: the timestamp of the request
- `model`: the full name of the model used to generate the response
- `usage`: the number of tokens used to generate the replies, counting prompt, completion, and total
- `choices`: a list of completion objects (only one, unless you set `n` greater than 1)
    - `message`: the message object generated by the model, with `role` and `content`
    - `finish_reason`: the reason the model stopped generating text (either `stop`, or `length` if `max_tokens` limit was reached)
    - `index`: the index of the completion in the list of choices

Extract just the reply with:

In [None]:
response['choices'][0]['message']['content']

'What is the suggested mechanism of action for the extracellular effects of thioredoxin-1 (Trx1)?'
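
Since `finish_reason` tells whether the reply was completed or cut off, it can be checked before the generated query is stored. A small sketch, assuming the response can be read like a dictionary as in the cell above; `extract_reply` is an illustrative helper, not part of the OpenAI library.

```python
def extract_reply(response: dict) -> str:
    """Return the first reply, refusing replies that were cut off."""
    choice = response["choices"][0]
    if choice["finish_reason"] != "stop":
        # For example "length" when the max_tokens limit was reached.
        raise ValueError(f"Incomplete generation: finish_reason={choice['finish_reason']}")
    return choice["message"]["content"].strip()


# Response trimmed to the fields used here, copied from the output shown earlier.
example_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "What is the suggested mechanism of action for the extracellular effects of thioredoxin-1 (Trx1)?",
                "role": "assistant",
            },
        }
    ]
}
print(extract_reply(example_response))
```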

Even non-conversation-based tasks can fit into the chat format, by placing the instruction in the first user message.

For example, to ask the model to explain asynchronous programming in the style of the pirate Blackbeard, we can structure the conversation as follows:
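
A minimal sketch of such a message list, assuming the same `openai.ChatCompletion` interface used in the cells above. The helper `build_messages` and the default system text are illustrative; the API call itself is left as a comment so the block runs without a key.

```python
from typing import Dict, List


def build_messages(instruction: str,
                   system: str = "You are a helpful assistant.") -> List[Dict[str, str]]:
    """Put the whole task in the first user message, after an optional system message."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction},
    ]


messages = build_messages(
    "Explain asynchronous programming in the style of the pirate Blackbeard."
)
print(messages)

# With the client used in this notebook the request would look like:
# response = openai.ChatCompletion.create(
#     model=MODEL_NAME_CHAT_GPT, messages=messages, temperature=0)
```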

### Tips for instructing gpt-3.5-turbo-0301

Best practices for instructing models may change from model version to model version. The advice that follows applies to `gpt-3.5-turbo-0301` and may not apply to future models.

#### System messages

The system message can be used to prime the assistant with different personalities or behaviors.

However, the model does not generally pay as much attention to the system message, and therefore we recommend placing important instructions in the user message instead.

An example of a system message that primes the assistant to explain concepts in great depth:

In [None]:
response = openai.ChatCompletion.create(
    model=MODEL_NAME_CHAT_GPT,
    messages=[
        {"role": "system", "content": "You are a friendly and helpful teaching assistant. You explain concepts in great depth using simple terms, and you give examples to help people learn. At the end of each explanation, you ask a question to check for understanding"},
        {"role": "user", "content": "Can you explain how fractions work?"},
    ],
    temperature=0,
)

print(response["choices"][0]["message"]["content"])


### Counting tokens for OpenAI models

More details in [OpenAI: How_to_count_tokens_with_tiktoken.ipynb](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb)

When you submit your request, the API transforms the messages into a sequence of tokens.

The number of tokens used affects:
- the cost of the request
- the time it takes to generate the response
- when the reply gets cut off from hitting the maximum token limit (4096 for `gpt-3.5-turbo`)

As of Mar 01, 2023, the number of tokens consumed by a request is reported in the `usage` field of the response, and the `tiktoken` library can be used to count the tokens of a string before sending it.

In [None]:
print(f'{response["usage"]["prompt_tokens"]} prompt tokens used.')

436 prompt tokens used.
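
Because cost scales with the token count, the `usage` numbers give a quick budget estimate for the whole dataset. The sketch below assumes a price of 0.002 USD per 1,000 tokens; that figure is an assumption and should be checked against current pricing. `estimate_cost_usd` is an illustrative helper.

```python
def estimate_cost_usd(total_tokens_per_request: int,
                      num_requests: int,
                      usd_per_1k_tokens: float = 0.002) -> float:
    """Rough cost estimate: tokens per request x requests x price per 1k tokens."""
    return total_tokens_per_request * num_requests * usd_per_1k_tokens / 1000


# 460 total tokens is the usage reported for the example request above.
cost_1k_examples = estimate_cost_usd(total_tokens_per_request=460, num_requests=1000)
print(f"{cost_1k_examples:.2f} USD for 1,000 generated queries")
```

Under that assumed price the estimate is close to the figure of about 1 USD per 1k examples quoted in the exercise statement.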


In [None]:
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [None]:
encoding.encode("tiktoken is great!")

[83, 1609, 5963, 374, 2294, 0]

In [None]:
[encoding.decode_single_token_bytes(token) for token in [83, 1609, 5963, 374, 2294, 0]]

[b't', b'ik', b'token', b' is', b' great', b'!']

In [None]:
def num_tokens_from_string(string: str, model_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.encoding_for_model(model_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

In [None]:
num_tokens_from_string("tiktoken is great!", "gpt-3.5-turbo")

6

## LLAMA

[Colab demo of the LLAMA API](https://colab.research.google.com/drive/1zZ-ch29LTicNPA62t2MaOwMROywnqUxf?usp=sharing) (thanks, Thales Rogério)

In [None]:
base_url="http://143.106.167.108/api"

In [None]:
data={
	"prompt":"""Given table, specify which rows have repeated values for both "Item number" and "Local". If no row is repeated say "no repeats".

Example 1:
|Row | Item number | Local |
|1 |  3 5 7 | New York |
|2|  5 8 2 | Madagascar |
|3|  3 4 5 | New York |
|4|  3 4 5 | Paris |

Explanation: Rows 1 and 3 have the same local "New York" and the same item number "3 4 5". Therefore they are repeated.

Answer: (1,3).

Example 2:
|Row | Item number | Local |
|1 |  0 9 2 4 | Amsterdam |
|2|  9 4 2 4 | Barcelona |
|3|  7 3 2 | London |
|4|  7 3 1 | London |
|5|  7 3 2 | London |
|6|  7 3 2 | London |
|7|  7 3 2 | London |
|8|  7 3  2 |  New York |
|9 |  0 9 2 4 | Amsterdam |

Explanation:""",

"temperature": 0.0,
"top_p": 1,
"max_length": 250
}

r=requests.post(f"{base_url}/complete", json=data)

ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

In [None]:
if r.ok:
  response=r.json()
  print(response)

{'prompt': 'Given table, specify which rows have repeated values for both "Item number" and "Local". If no row is repeated say "no repeats".\n\nExample 1:\n|Row | Item number | Local |\n|1 |  3 5 7 | New York |\n|2|  5 8 2 | Madagascar |\n|3|  3 4 5 | New York |\n|4|  3 4 5 | Paris |\n\nExplanation: Rows 1 and 3 have the same local "New York" and the same item number "3 4 5". Therefore they are repeated.\n\nAnswer: (1,3).\n\nExample 2:\n|Row | Item number | Local |\n|1 |  0 9 2 4 | Amsterdam |\n|2|  9 4 2 4 | Barcelona |\n|3|  7 3 2 | London |\n|4|  7 3 1 | London |\n|5|  7 3 2 | London |\n|6|  7 3 2 | London |\n|7|  7 3 2 | London |\n|8|  7 3  2 |  New York |\n|9 |  0 9 2 4 | Amsterdam |\n\nExplanation:', 'temperature': 0.0, 'top_p': 1.0, 'max_length': 250, 'stopping_tokens': [], 'request_uuid': '403647f2-fd6d-40e6-a21c-098ea4870703'}


In [None]:
response

{'prompt': 'Given table, specify which rows have repeated values for both "Item number" and "Local". If no row is repeated say "no repeats".\n\nExample 1:\n|Row | Item number | Local |\n|1 |  3 5 7 | New York |\n|2|  5 8 2 | Madagascar |\n|3|  3 4 5 | New York |\n|4|  3 4 5 | Paris |\n\nExplanation: Rows 1 and 3 have the same local "New York" and the same item number "3 4 5". Therefore they are repeated.\n\nAnswer: (1,3).\n\nExample 2:\n|Row | Item number | Local |\n|1 |  0 9 2 4 | Amsterdam |\n|2|  9 4 2 4 | Barcelona |\n|3|  7 3 2 | London |\n|4|  7 3 1 | London |\n|5|  7 3 2 | London |\n|6|  7 3 2 | London |\n|7|  7 3 2 | London |\n|8|  7 3  2 |  New York |\n|9 |  0 9 2 4 | Amsterdam |\n\nExplanation:',
 'temperature': 0.0,
 'top_p': 1.0,
 'max_length': 250,
 'stopping_tokens': [],
 'request_uuid': '403647f2-fd6d-40e6-a21c-098ea4870703'}

We will use the request_uuid to check if the completion job is done

In [None]:
request_uuid=response["request_uuid"]

In [None]:
import time

In [None]:
ready = False
while not ready:
    r = requests.get(f"{base_url}/get_result/{request_uuid}")
    response = r.json()
    ready = response['ready']
    if ready:
        print(response['generated_text'])
        break
    # Wait 10 seconds before checking again
    print(f"Aguardando 10 segundos")
    time.sleep(10)


 Rows 3-7 all have the same local "London", but their item numbers differ. Therefore there are no repeats in this example.


When consulting the result, you may find three scenarios:
- Your job did not run yet, you should try again in a couple of seconds (Ready=False, message=None)
- Your job did run and everything worked (Ready=True, message=your response)
- Your job did run but it failed (Ready=True, message=None)
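
The three scenarios can be handled in a single polling helper. The sketch below is an assumption-laden outline: it reads the `ready` and `generated_text` keys used by the loop above, and the name `wait_for_result` plus the injectable `fetch` and `sleep` arguments are illustrative, added so the logic can be exercised without the server.

```python
import time
from typing import Callable, Optional


def wait_for_result(fetch: Callable[[], dict],
                    interval_s: float = 10.0,
                    max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll until the job is ready; return its text, or None if the job failed."""
    for _ in range(max_attempts):
        response = fetch()
        if response.get("ready"):
            # Ready with text: success. Ready without text: the job failed.
            return response.get("generated_text")
        sleep(interval_s)  # not ready yet, wait before asking again
    raise TimeoutError(f"Job not ready after {max_attempts} attempts")


# Simulated server: first poll is not ready, second poll returns the text.
fake_responses = iter([{"ready": False}, {"ready": True, "generated_text": "done"}])
text = wait_for_result(lambda: next(fake_responses), sleep=lambda seconds: None)
print(text)
```

Against the real API, `fetch` could be `lambda: requests.get(f"{base_url}/get_result/{request_uuid}").json()`.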

### Rate limiting

We may adjust this during the week, but due to computational constraints we will apply a rate limit of about 2 requests per 5 seconds. If you exceed this limit you will receive an error 429. You should adjust your code accordingly.

Please remember that the whole class is using a shared resource, so avoid excessive requests even if they are under the rate limit.

If you encounter any errors or problems, let us know in the classroom.

In [None]:
for i in range(30):
  r=requests.get(f"{base_url}/get_result/{request_uuid}")
  print(i, "->", r.status_code)

0 -> 200
1 -> 200
2 -> 200
3 -> 429
4 -> 429
5 -> 429
6 -> 429
7 -> 429
8 -> 429
9 -> 429
10 -> 429
11 -> 429
12 -> 200
13 -> 429
14 -> 429
15 -> 429
16 -> 429
17 -> 429
18 -> 429
19 -> 429
20 -> 429
21 -> 429
22 -> 429
23 -> 200
24 -> 429
25 -> 429
26 -> 429
27 -> 429
28 -> 429
29 -> 429
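
The run above shows most back-to-back requests being rejected with status 429. One way to adjust client code is to pause and retry when that status comes back. A minimal sketch, assuming the request is wrapped in a callable that returns a `(status_code, payload)` pair; `call_with_retry` and its parameters are hypothetical names.

```python
import time
from typing import Any, Callable, Tuple


def call_with_retry(request: Callable[[], Tuple[int, Any]],
                    wait_s: float = 5.0,
                    max_retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `request`, pausing and retrying whenever it reports HTTP 429."""
    for attempt in range(max_retries + 1):
        status, payload = request()
        if status != 429:
            return payload
        if attempt < max_retries:
            sleep(wait_s)  # back off before the next attempt
    raise RuntimeError("Still rate limited after all retries")


# Simulated server: two rejections followed by a successful answer.
fake_answers = iter([(429, None), (429, None), (200, {"ready": True})])
waits = []
payload = call_with_retry(lambda: next(fake_answers), sleep=waits.append)
print(payload, waits)
```

Waiting 5 seconds between attempts keeps a single client below the stated limit of about 2 requests per 5 seconds.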


In [None]:
# tokenizer = transformers.AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf-int4")
# llama-7b-hf, llama-65b-hf, llama-smallint-pt, llama-13b-hf-int4, llama-7b-hf-int4

# Scratchpad

Alternative to calling pipe

In [None]:
# setting up the parameters
model_inputs = generator.tokenizer(texto_teste, return_tensors='pt')

model_kwargs = {
    'max_length': 550, # prompt_len: 402 + self.num_max_answer_length,
    'num_return_sequences': generator.num_top_k,
    'num_beams': generator.num_top_k,
    'do_sample': generator.if_do_sample,
    'temperature': generator.val_temperature,
    'length_penalty': generator.val_length_penalty,
    'bad_words_ids': generator.bad_words_ids,
    'stopping_criteria':StoppingCriteriaList([generator.stop_criteria])

}


In [None]:
# generating the text sequences
output = generator.model.generate(**model_inputs.to(generator.device), **model_kwargs, return_dict=True)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
output.shape

torch.Size([1, 505])

In [None]:
# converting to text and computing the probabilities
texts = generator.tokenizer.batch_decode(output, skip_special_tokens=True)


In [None]:
texts

['Instruction: Based on the text, generate just one question succinctly, answered by the text, avoiding repeating words. See examples below:\n\nText: COVID-19 has caused unprecedented societal turmoil, triggering a rapid, still ongoing, transformation of healthcare provision on a global level. In this new landscape, it is highly important to acknowledge the challenges this pandemic poses on the care of the particularly vulnerable cancer patients and the subsequent psychosocial impact on them. We have reviewed the emerging literature around barriers to care of oncology patients and how this crisis affects them. Moreover, evolving treatment strategies and novel ways of addressing the needs of oncology patients in the new context of the pandemic are discussed.\n\nQuestion: What is the importance of acknowledging the challenges posed by the COVID-19 pandemic on vulnerable cancer patients?\n\nText: In a retrospective study, we investigated the incidence of coronavirus infection in children 

In [None]:
probabilities = generator.pipe.model(**model_inputs.to(generator.device), **model_kwargs, return_dict=True)['probs']


TypeError: forward() got an unexpected keyword argument 'max_length'
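
The call above fails because `forward()` does not accept generation arguments such as `max_length`. For the optional filtering step of the exercise (keeping the queries with the highest probability), one common score is the mean log-probability of the generated tokens. The sketch below shows that computation in pure Python, assuming the per-step logits were already collected (for instance via `generate(..., output_scores=True, return_dict_in_generate=True)`, whose scores may already reflect constraints such as banned words). The function names are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one step of vocabulary logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def mean_log_prob(step_logits: Sequence[Sequence[float]],
                  token_ids: Sequence[int]) -> float:
    """Average log-probability of the chosen token at each generation step."""
    if len(step_logits) != len(token_ids) or not token_ids:
        raise ValueError("Need one logits vector per generated token.")
    total = sum(log_softmax(logits)[token]
                for logits, token in zip(step_logits, token_ids))
    return total / len(token_ids)


# Toy example: two steps over a 3-token vocabulary.
toy_logits = [[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]]
toy_tokens = [0, 1]
print(mean_log_prob(toy_logits, toy_tokens))
```

Queries can then be ranked by this score, higher (closer to zero) meaning the model was more confident in what it generated.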

Trying to run with multiprocessing

In [None]:
def process_docto(args):
    # imap_unordered passes a single argument, so the (docto, generator) pair arrives as one tuple
    docto, generator = args
    try:
        if len(docto['text']) > 20:
            if ('title' in docto) and len(docto['title']) >= 5 :
                texto = docto['title'] + '. ' + docto['text'] 
            else:
                texto = docto['text'] 
            resultado = generator.generate_text(texto_contexto=texto)
            return (docto['_id'], 'ok', resultado[0]['generated_text'])
    except Exception as e:
        return (docto['_id'], 'erro', str(e))


In [None]:
from itertools import repeat


In [None]:

dict_queries_geradas = {}
dict_erros_doctos = {}

# functions defined in a notebook cell live in __main__, which spawned workers cannot import
with mp.Pool(processes=1) as pool, tqdm(total=len(corpus)) as pbar:
    # repeat(generator) pairs every document with the generator; zipping with [generator] yields a single pair
    results = pool.imap_unordered(process_docto, zip(corpus[0:100], repeat(generator)))
    for res in results:
        if res is not None:
            if res[1] == 'ok':
                dict_queries_geradas[res[0]] = res[2]
            else:
                dict_erros_doctos[res[0]] = res[2]
                print(f"Erro ao gerar query para o documento {res[0]}: {res[2]}")
        pbar.update()


  0%|          | 0/171332 [00:00<?, ?it/s]Process SpawnPoolWorker-324:
Traceback (most recent call last):
  File "/home/borela/miniconda3/envs/treinapython39/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/borela/miniconda3/envs/treinapython39/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/borela/miniconda3/envs/treinapython39/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/home/borela/miniconda3/envs/treinapython39/lib/python3.9/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'process_docto' on <module '__main__' (built-in)>
  0%|          | 0/171332 [00:05<?, ?it/s]


KeyboardInterrupt: 