In [1]:
from transformers import BloomForCausalLM, BloomTokenizerFast
import transformers
import torch

Using the pretrained BLOOM model with 1.7B parameters (bigscience/bloom-1b7)

In [2]:
model = BloomForCausalLM.from_pretrained("bigscience/bloom-1b7")
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-1b7")

In [3]:
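prompt = "I hope to be able to enter into this university"
result_length = 50  # passed as max_length, which counts the prompt tokens too
input = tokenizer(prompt, return_tensors="pt")  # token IDs + attention mask as PyTorch tensors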


Before we send the model our prompt, we need to decide which decoding (search) strategy is likely to work best for our use case. With autoregressive transformers (trained for next-token prediction), we have several options for searching the output space for the most "reasonable" continuation. This excellent article by Patrick von Platen (Hugging Face) explains the details and math behind the three techniques we'll be trying, so I won't reinvent the wheel here. I will, however, give you the TL;DR version of each:

Greedy Search simply picks, at each timestep t, the word with the highest predicted probability given all of the words before it. Its main weakness is that it never looks ahead: a word with a high probability at t+1 is missed whenever it is preceded by a word with a low probability at t, because greedy search never chooses that low-probability word in the first place.
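To make the look-ahead problem concrete, here is a minimal Python sketch of greedy decoding over a hypothetical toy probability table (the words and numbers are invented for illustration, in the spirit of the example in von Platen's article, and have nothing to do with BLOOM). Greedy decoding picks "nice" and then "woman", for a sequence probability of 0.5 × 0.4 = 0.2, and never explores "The dog has", whose probability 0.4 × 0.9 = 0.36 is higher.

```python
# Hypothetical toy "language model": maps a context (tuple of words) to the
# probability of each possible next word. Numbers are made up, not from BLOOM.
TOY_LM = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def greedy_decode(prefix, steps):
    """Append the single most probable next word at every step."""
    seq, prob = list(prefix), 1.0
    for _ in range(steps):
        next_probs = TOY_LM.get(tuple(seq))
        if next_probs is None:  # the toy table defines no continuation
            break
        word = max(next_probs, key=next_probs.get)  # argmax, no look-ahead
        seq.append(word)
        prob *= next_probs[word]  # probability of the whole sequence
    return seq, prob


seq, prob = greedy_decode(["The"], steps=2)
print(" ".join(seq), round(prob, 2))  # The nice woman 0.2
```

The higher-probability path through "dog" is pruned at the very first step, because 0.4 < 0.5 locally.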


Beam Search keeps track of the num_beams most likely word sequences (hypotheses) at each timestep and finally outputs the sequence with the highest overall probability. This reduces the risk of missing a high-probability word hidden behind a low-probability one, but the method breaks down when the output length can vary widely, as in open-ended text generation. Both greedy and beam search also produce text whose distribution does not match the way humans perform the same task: both are prone to fairly repetitive, boring output.
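The following is a minimal sketch of beam search with num_beams=2 over a hypothetical toy table (invented numbers, not BLOOM). On this table greedy search would stop at "The nice woman" (0.5 × 0.4 = 0.2), whereas keeping two hypotheses lets "The dog" survive the first step and grow into "The dog has" (0.4 × 0.9 = 0.36). Real implementations such as the one behind `generate(num_beams=...)` work with log-probabilities, length penalties and end-of-sequence handling; this sketch deliberately omits all of that.

```python
# Hypothetical toy next-word table; numbers are made up for illustration.
TOY_LM = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def beam_search(prefix, steps, num_beams):
    """Keep the num_beams most probable sequences at every step."""
    beams = [(list(prefix), 1.0)]
    for _ in range(steps):
        candidates = []
        for seq, prob in beams:
            next_probs = TOY_LM.get(tuple(seq))
            if next_probs is None:  # finished hypothesis: carry it over as-is
                candidates.append((seq, prob))
                continue
            for word, p in next_probs.items():  # expand every continuation
                candidates.append((seq + [word], prob * p))
        # Prune back down to the num_beams highest-probability hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0]  # best complete hypothesis


seq, prob = beam_search(["The"], steps=2, num_beams=2)
print(" ".join(seq), round(prob, 2))  # The dog has 0.36
```

With num_beams=1 the function degenerates to greedy search, which is a quick way to see that beam search is a strict generalisation of it.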


Sampling with Top-k + Top-p combines three methods. Sampling means that the next word is chosen randomly according to its conditional probability distribution (von Platen, 2020). With Top-k, we keep only the k most likely words and redistribute the probability mass among them before the next draw. Top-p adds a further constraint: we sample only from the smallest set of words whose cumulative probability exceeds p.
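The filtering described above can be sketched in plain Python on a hypothetical five-token vocabulary. This is an illustration under stated assumptions, not the transformers implementation: it applies top-k first and then top-p on the renormalised top-k distribution, and the function names `top_k_top_p_filter` and `sample_next_token` are made up for this example.

```python
import random


def top_k_top_p_filter(probs, k, p):
    """Return {token_index: probability} after top-k, then top-p, filtering.

    `probs` is a list of next-token probabilities that sums to 1.
    """
    # Top-k: keep the indices of the k most probable tokens.
    top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Redistribute the probability mass over the k survivors.
    mass = sum(probs[i] for i in top_k)
    renormed = [(i, probs[i] / mass) for i in top_k]
    # Top-p: walk down the (already sorted) survivors and stop as soon as the
    # cumulative probability reaches p, giving the smallest such set.
    nucleus, cumulative = [], 0.0
    for i, q in renormed:
        nucleus.append((i, q))
        cumulative += q
        if cumulative >= p:
            break
    # Renormalise once more so the nucleus sums to 1.
    mass = sum(q for _, q in nucleus)
    return {i: q / mass for i, q in nucleus}


def sample_next_token(probs, k=50, p=0.9, rng=random):
    """Draw one token index at random from the filtered distribution."""
    filtered = top_k_top_p_filter(probs, k, p)
    return rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]


probs = [0.05, 0.4, 0.1, 0.3, 0.15]  # hypothetical next-token distribution
print(top_k_top_p_filter(probs, k=3, p=0.8))  # only tokens 1 and 3 survive
print(sample_next_token(probs, k=3, p=0.8, rng=random.Random(0)))  # 1 or 3
```

Here top-k (k=3) keeps tokens 1, 3 and 4; after renormalisation, tokens 1 and 3 already cover about 82% of the mass, so top-p (p=0.8) drops token 4 as well. Because the final draw is random, running the notebook's sampling cell again will generally give a different continuation.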


In [4]:
# greedy search (no sampling, a single beam)
res = model.generate(input["input_ids"],
                     max_length=result_length)[0]

print(tokenizer.decode(res))

I hope to be able to enter into this university in the future. I am very grateful to the university for the opportunity to study here. I am very grateful to the university for the opportunity to study here. I am very grateful to the university for




In [18]:
# beam search: 2 beams, never repeat a bigram, stop once 2 finished candidates exist
res = model.generate(input["input_ids"],
                     max_length=result_length,
                     num_beams=2,
                     no_repeat_ngram_size=2,
                     early_stopping=True)[0]
print(tokenizer.decode(res))

I hope to be able to enter into this university in the near future. I am looking forward to learning more about the university and its programs.”
“I am very excited to attend the University of Illinois at Urbana-Champaign. The university is a
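The beam-search cell passes `no_repeat_ngram_size=2`, which likely helps explain why its output avoids the "I am very grateful to the university..." loop seen in the greedy output. The idea is that, before choosing the next token, any token that would complete an n-gram already present in the text is banned. Below is a rough, hypothetical sketch of that rule; it works on whole words for readability, while transformers applies the same idea to BLOOM's subword token IDs by masking the banned tokens' scores. The function name `banned_next_tokens` is made up for this example.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already in `tokens`."""
    if n < 1 or len(tokens) < n:
        return set()  # no complete n-gram can exist yet
    # The last n-1 tokens form the prefix of the n-gram about to be generated.
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for start in range(len(tokens) - n + 1):
        ngram = tuple(tokens[start:start + n])
        if ngram[:-1] == prefix:  # this prefix was seen before...
            banned.add(ngram[-1])  # ...so its old continuation is banned
    return banned


tokens = "I am very grateful to the university . I am very".split()
print(banned_next_tokens(tokens, 2))  # {'grateful'}
```

With n=2 the model may still write "very", but it can no longer follow it with "grateful" a second time, which breaks the kind of loop the greedy cell fell into.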


In [19]:
# sampling with top-k + top-p
res = model.generate(input["input_ids"],
                     max_length=result_length,
                     do_sample=True,
                     top_k=50,
                     top_p=0.9)[0]

print(tokenizer.decode(res))

I hope to be able to enter into this university, please help me. Do you know adversário para enfrentar nesta semifinal?
J.C.: Eu espero vencer. Vamos ser bem melhor no segundo tempo, porque é melhor. O jogo está bom,
