# Sentence Autocompletion Model
https://huggingface.co/docs/transformers/notebooks 
https://github.com/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb 

In [33]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

In [39]:
# encode context the generation is conditioned on
start = input("enter start of sentence: ")
input_ids = tokenizer.encode(start, return_tensors='pt')

# generate text until the output length (context tokens included) reaches 50 tokens
greedy_output = model.generate(input_ids, max_length=50)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

enter start of sentence:  my experience with software engineering


Output:
----------------------------------------------------------------------------------------------------
my experience with software engineering is that it's a very difficult job. It's not easy to get a job that you want to do.

I've been in the software engineering field for about 10 years. I've been in the software engineering
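
The call above passes no sampling or beam arguments, so it is taken here to be greedy search: at each step the single most probable next token is appended. The loop below is a minimal sketch of that idea in plain Python. `TOY_MODEL` and `greedy_decode` are hypothetical names, and the probability table is made up for illustration, not taken from GPT-2.

```python
# Hypothetical toy "language model": maps the last token to next-token probabilities.
# The numbers are invented for illustration only.
TOY_MODEL = {
    "my": {"experience": 0.6, "skills": 0.4},
    "experience": {"with": 0.7, "is": 0.3},
    "with": {"software": 0.9, "people": 0.1},
    "software": {"<eos>": 0.8, "engineering": 0.2},
}


def greedy_decode(start, max_length=50, eos="<eos>"):
    """Repeatedly append the most probable next token until EOS or max_length."""
    tokens = [start]
    while len(tokens) < max_length:
        probs = TOY_MODEL.get(tokens[-1])
        if probs is None:  # the toy table has no continuation for this token
            break
        next_token = max(probs, key=probs.get)  # greedy choice: argmax
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens


greedy_result = greedy_decode("my")
print(" ".join(greedy_result))
```

Because the choice at every step is deterministic, the same prompt always yields the same continuation, which is one reason greedy output tends to fall into repetitive loops.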


In [40]:
# activate beam search with 5 beams and early_stopping, forbid repeated 2-grams, and return all 5 beams
beam_outputs = model.generate(
    input_ids,  
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=2,
    num_return_sequences=5,
    early_stopping=True
)

print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: my experience with software engineering is that it's hard to know what you're talking about when you talk about software development.

I'm not going to go into too much detail here, but I'll just say that I've had a lot of
1: my experience with software engineering is that it's hard to know what you're talking about when you talk about software development.

I'm not going to go into too much detail here, but I'll just say that I'm a software engineer.
2: my experience with software engineering is that it's hard to know what you're talking about when you talk about software development.

I'm not going to go into too much detail here, but I'll just say that I'm a software engineer,
3: my experience with software engineering is that it's hard to know what you're talking about when you talk about software development.

I'm not going to go into too much detail here, but I'll just 
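
`no_repeat_ngram_size=2` in the call above asks generation to avoid producing any 2-gram twice. The bookkeeping behind such a constraint can be sketched as follows: given the tokens so far, find every token that would complete an n-gram already present. `banned_next_tokens` is a hypothetical helper written for illustration, not the library's implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an already-seen n-gram."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


history = ["I", "am", "a", "I", "am"]
print(banned_next_tokens(history, ngram_size=2))
```

Here the sequence ends in "am" and the bigram ("am", "a") has already occurred, so "a" would be blocked as the next token.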

In [41]:
# set the seed to make results reproducible; change it to get different samples
torch.manual_seed(0)

# activate sampling, deactivate top_k filtering by setting top_k to 0, and lower the temperature to 0.7
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0,
    temperature=0.7
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
my experience with software engineering, when it came to the digital ish, I had the idea that the best thing to do would be to build something that can be used to make a digital ish. That's when I signed up.
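
`temperature=0.7` in the call above rescales the next-token logits before sampling. The effect can be sketched in plain Python: dividing logits by a temperature below 1 sharpens the softmax distribution, so likely tokens become more likely. `softmax_with_temperature` and the example logits are assumptions made for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
p_default = softmax_with_temperature(example_logits, temperature=1.0)
p_sharp = softmax_with_temperature(example_logits, temperature=0.7)
print(p_default)
print(p_sharp)
```

With the lower temperature, the top candidate gains probability mass and the least likely one loses it, which makes sampled text less erratic at the cost of some diversity.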




In [42]:
torch.manual_seed(0)

# set top_k = 50 and set top_p = 0.95 and num_return_sequences = 3
sample_outputs = model.generate(
    input_ids,
    do_sample=True, 
    max_length=50, 
    top_k=50, 
    top_p=0.95, 
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))


Output:
----------------------------------------------------------------------------------------------------
0: my experience with software engineering, I wanted to share the process with my company. Here's how you can share the process with others.

One more thing: I started using open-source software after seeing it make the rounds on Facebook and Google
1: my experience with software engineering when it comes to my work is that it's very personal. I was always a big fan of Microsoft, and had the opportunity to work with them. It's hard to take things away when you have your mind on something
2: my experience with software engineering: "The best way to learn is to build it and let it grow. Build it. You can make it better, as long as you have the necessary skills to understand how it works." (My own experience with that
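
The `top_k=50` and `top_p=0.95` settings above restrict which tokens can be sampled. A simplified sketch of that filtering on a plain probability list: keep the `top_k` most probable tokens, renormalize, then keep the smallest set whose cumulative probability reaches `top_p`. `top_k_top_p_filter` and the example probabilities are hypothetical. The library works on logits and may differ in detail.

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Return {token_index: renormalized_prob} after top-k, then nucleus filtering."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k > 0:  # top_k == 0 means "no top-k filtering"
        ranked = ranked[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(index, p / total) for index, p in ranked]

    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:  # nucleus is complete
            break
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}


example_probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token distribution
nucleus = top_k_top_p_filter(example_probs, top_k=0, top_p=0.75)
print(nucleus)
```

Sampling then draws only from the surviving tokens, which removes the long tail of unlikely tokens while still allowing variety among the plausible ones.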


In [46]:
# generate 3 options for sentence completion, stopping at the end of the sentence
start = input("enter start of sentence: ")
input_ids = tokenizer.encode(start, return_tensors='pt')

sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=50,
    top_p=0.95,
    num_return_sequences=3,
    eos_token_id=tokenizer.eos_token_id
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
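    # Decode the sequence, then keep only the text up to and including the first period
    decoded_text = tokenizer.decode(sample_output, skip_special_tokens=True)
    sentence, period, _ = decoded_text.partition('.')
    first_sentence = sentence + period  # no period is appended if none was generated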
    print("{}: {}".format(i, first_sentence))

enter start of sentence:  my skills in swe include


Output:
----------------------------------------------------------------------------------------------------
0: my skills in swe include the occasional little-used, little-used tool that can be used as a backstop, or as a tool that's less important to someone else but that you think is important to you.
1: my skills in swe include running and juggling.
2: my skills in swe include, an 8mm (2x) 3/4 in.
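
Cutting at the first `.` also truncates at decimal points such as "2.5". One possible refinement, sketched below, cuts only at `.`, `!` or `?` when followed by whitespace or the end of the text. `first_sentence_of` is a hypothetical helper. It still cuts at abbreviations followed by a space (for example "in."), so it is a heuristic and not a real sentence splitter.

```python
import re

# Sentence-ending punctuation that is followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def first_sentence_of(text):
    """Return text up to and including the first sentence terminator, if any."""
    match = SENTENCE_END.search(text)
    if match is None:
        return text.strip()  # no terminator generated: return the text unchanged
    return text[:match.end()].strip()


print(first_sentence_of("my skills in swe include running and juggling. I also"))
print(first_sentence_of("version 2.5 is out! More soon"))
```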
