## Problem Statement 2 - Text Generation

- Build a basic text generator using a pre-trained transformer (GPT-2, loaded through the `transformers` library)

In [1]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

In [20]:
def generate_text(prompt, model_name='gpt2', max_length=150):
    # Load the tokenizer for the given model
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    
    # Load the pre-trained GPT-2 model. GPT-2 defines no pad token,
    # so reuse the end-of-sequence (EOS) token ID for padding.
    model = GPT2LMHeadModel.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id)

    # Encode the input prompt into token IDs
    numeric_ids = tokenizer.encode(prompt, return_tensors='pt')

    # Generate text with beam search, blocking repeated 2-grams
    result = model.generate(
        numeric_ids,
        max_length=max_length,   # Cap on total length in tokens, prompt included
        num_beams=5,             # Track the 5 highest-scoring candidate sequences at each step
        no_repeat_ngram_size=2,  # Never let the same 2-gram appear twice
        early_stopping=True      # Stop as soon as num_beams finished candidates exist
    )
    
    # Decode the generated token IDs back into text
    generated_text = tokenizer.decode(result[0], skip_special_tokens=True)
    
    return generated_text
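
The `no_repeat_ngram_size=2` argument used in `generate_text` can be illustrated with a small pure-Python sketch. This is not the library's implementation; the helper name `banned_next_tokens` is hypothetical, and it works on plain word lists rather than token IDs. It shows only the core idea: before choosing the next token, find every token that would complete an n-gram already present in the sequence and exclude it.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would repeat an n-gram already in `tokens`.

    Hypothetical helper for illustration only; it mirrors the idea behind
    no_repeat_ngram_size, not the transformers source code.
    """
    if n <= 0 or len(tokens) < n:
        return set()

    # The last n-1 tokens are the prefix the next token would extend.
    prefix = tuple(tokens[len(tokens) - (n - 1):])

    banned = set()
    # Scan every n-gram seen so far; if it starts with the current prefix,
    # its final token would create a repeat, so ban it.
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


history = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(history, 2))
```

Here the sequence ends in `"the"`, and the 2-gram `("the", "cat")` has already occurred, so `"cat"` is excluded as the next token. With a small `n` such as 2 the constraint is strict, which can also block legitimate repeats such as names made of two words.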

In [21]:
prompt = "Once upon a time,"  # Prompt that the model will continue
generated_text = generate_text(prompt)
print(generated_text)

Once upon a time, there was no such thing as a man who was not a member of the Church of Jesus Christ of Latter-day Saints.

The Church was founded by Joseph Smith, the Prophet of Mormonism, and his wife, Lucy Mack Smith. They were the first women to be ordained to the priesthood. The Church's first president, Joseph Fielding Smith Jr., was born in Nauvoo, Illinois, in 1844. He was the youngest of five children. His father was a Methodist minister and the other two were members of his church. Joseph was raised by his mother and grandmother in a small town in Illinois. When he was three years old, his father died of a heart attack, leaving him with two sisters.
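
The output above was produced with beam search (`num_beams=5`). As a rough illustration of what beam search does differently from greedy decoding, the sketch below runs both on a tiny hand-written probability table. Everything here is an assumption made for the example: `TOY_MODEL`, its probabilities, and the `beam_search` function are hypothetical, and the logic is much simpler than what `model.generate` does (no length penalty, no n-gram blocking, and the next token depends only on the previous one).

```python
import math

# Hypothetical toy "language model": next-token probabilities given the last token.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def beam_search(model, start, end, num_beams, max_steps):
    """Return the (sequence, log-probability) pair with the best score found."""
    beams = [([start], 0.0)]  # open hypotheses
    finished = []             # hypotheses that reached the end token

    for _ in range(max_steps):
        # Extend every open hypothesis by every possible next token.
        candidates = []
        for seq, score in beams:
            for token, prob in model[seq[-1]].items():
                candidates.append((seq + [token], score + math.log(prob)))

        # Keep only the num_beams best extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:num_beams]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break

    finished.extend(beams)  # include hypotheses still open at max_steps
    return max(finished, key=lambda c: c[1])


greedy_seq, greedy_score = beam_search(TOY_MODEL, "<s>", "</s>", num_beams=1, max_steps=5)
beam_seq, beam_score = beam_search(TOY_MODEL, "<s>", "</s>", num_beams=2, max_steps=5)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

With one beam the search commits to `"a"`, the most likely first token, and ends on a sequence with probability 0.33. With two beams it keeps `"b"` alive for one more step and finds the `"b"`, `"z"` path with probability 0.36. Beam search optimizes for likely text, which is not the same as factually correct text.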
