This is a simple implementation of BioGPT built on [Hugging Face](https://huggingface.co/docs/transformers/model_doc/biogpt). For anyone unfamiliar with Colab, using it is straightforward: run the first two cells to install and import the packages, then run the last two cells to run the application.


To change the seed sentence (the prompt that drives the search and from which the text is generated, not the random seed passed to `set_seed`), just edit the sentence in one of these snippets:
- `sentence = "Influenza is a virus"` for paragraph generation mode;

- `generator("E. coly in metagenomics results", max_length=500, num_return_sequences=10, do_sample=True)` for generating individual sentences.
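The two modes above can be wrapped in small helpers so that only the prompt changes between runs. This is a minimal sketch, not part of the original notebook: the function names `generate_paragraph` and `generate_sentences` are hypothetical, and the default values simply mirror the settings used in the cells below. It assumes `model`, `tokenizer`, and `generator` are the objects created in those cells.

```python
def generate_paragraph(model, tokenizer, prompt,
                       min_length=100, max_length=1024, num_beams=5):
    """Paragraph mode: beam search that returns one longer passage.

    Hypothetical helper; defaults copy the notebook's beam-search cell.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        min_length=min_length,
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )
    # Decode the first (best) sequence back into plain text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def generate_sentences(generator, prompt,
                       max_length=500, num_return_sequences=10):
    """Sentence mode: sampling that returns several short continuations.

    Hypothetical helper; `generator` is a text-generation pipeline.
    """
    results = generator(
        prompt,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        do_sample=True,
    )
    # The pipeline returns a list of dicts with a 'generated_text' key
    return [item["generated_text"] for item in results]
```

With the notebook objects loaded, a call such as `generate_paragraph(model, tokenizer, "Influenza is a virus")` or `generate_sentences(generator, "E. coly in metagenomics results")` should reproduce the two modes with a prompt of your choice.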



It does not call any external API, so all processing happens inside this notebook, which has a fairly limited amount of RAM. If your request is too demanding, the notebook may run out of resources before it shows a response.
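To get a feel for the RAM constraint, the memory taken by the model weights alone can be estimated from the parameter count. The sketch below is only a back-of-the-envelope calculation: the helper name is hypothetical, and the figure of roughly 347 million parameters for the base BioGPT model is an assumption taken from the commonly cited model size, not something measured in this notebook. Activations and beam-search state add to this, so real usage is higher.

```python
def estimate_weight_memory_gib(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate memory (in GiB) needed to hold the model weights only."""
    return num_params * bytes_per_param / 2**30


# Assumption: approximate parameter count of the base BioGPT model
ASSUMED_BIOGPT_PARAMS = 347_000_000

if __name__ == "__main__":
    for label, nbytes in (("float32", 4), ("float16", 2)):
        gib = estimate_weight_memory_gib(ASSUMED_BIOGPT_PARAMS, nbytes)
        print(f"{label}: about {gib:.2f} GiB for weights alone")
```

Inside the notebook, the actual count can be checked with `sum(p.numel() for p in model.parameters())` once the model is loaded.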

If you want to learn more, see the links below:

- [Project paper on arXiv](https://arxiv.org/abs/2210.10341)
- [Project GitHub repository](https://github.com/microsoft/BioGPT)

In [1]:
!pip install -q -U transformers sacremoses

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.3/6.3 MB[0m [31m53.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m880.6/880.6 KB[0m [31m32.8 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m190.3/190.3 KB[0m [31m9.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.6/7.6 MB[0m [31m63.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for sacremoses (setup.py) ... [?25l[?25hdone


In [2]:
import torch
from transformers import BioGptTokenizer, BioGptForCausalLM, set_seed, pipeline

In [5]:
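# Load the pretrained BioGPT tokenizer and model from the Hugging Face Hub
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

# Prompt that the model will continue
sentence = "Influenza is a virus"
inputs = tokenizer(sentence, return_tensors="pt")

# Fix the random seed so runs are reproducible
set_seed(42)

# Beam search generation; gradients are not needed for inference
with torch.no_grad():
    beam_output = model.generate(**inputs,
                                min_length=100,
                                max_length=1024,
                                num_beams=5,
                                early_stopping=True
                                )
# Decode the best beam back into text
tokenizer.decode(beam_output[0], skip_special_tokens=True)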

'Influenza is a virus of the family Paramyxoviridae, which causes seasonal epidemics and occasional pandemics of varying severity in humans and animals, and is a significant cause of morbidity and mortality in humans and domestic and wild animals, including birds, reptiles, amphibians, fish, and marine mammals, as well as in domestic and wild animals and humans in contact with these animals and their products (e.g., eggs, meat, milk, and honey). (1-3) Influenza A viruses (IAVs) are classified into six antigenic groups (A, B, C, D, E, and F) based on their hemagglutinin (HA) proteins.'

In [None]:
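# Wrap the already loaded model and tokenizer in a text-generation pipeline
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
set_seed(42)
# Sample 10 independent continuations of the prompt
generator("E. coly in metagenomics results", max_length=500, num_return_sequences=10, do_sample=True)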

[{'generated_text': 'E. coly in metagenomics results means that the functional composition of a microbial community is significantly modified than in metagenomic libraries, and this has great practical implications.'},
 {'generated_text': 'E. coly in metagenomics results means more opportunities for microbial engineering, but also more challenges.'},
 {'generated_text': 'E. coly in metagenomics results means it is an interesting application of the theory of ecological systems.'},
 {'generated_text': 'E. coly in metagenomics results means to have a greater access to basic research.'},
 {'generated_text': 'E. coly in metagenomics results means it is difficult to assess whether such results are trustworthy and do not change the course of the research.'},
 {'generated_text': 'E. coly in metagenomics results means that the whole-sample pipeline will be improved, but the methods and applications for metagenomics are still not adequate.'},
 {'generated_text': 'E. coly in metagenomics results 