<a href="https://colab.research.google.com/github/gstripling00/conferences/blob/main/1.11.24_04_09_adv_nlp_end.ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TEXT GENERATION

DistilBERT, as a distilled version of BERT, is primarily designed for tasks like question answering, text classification, and named entity recognition. It may not be the most suitable model for text generation, as its architecture is optimized for understanding context and relationships in input text rather than generating new content.
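
To make the architectural difference concrete, here is a minimal sketch in plain Python (no model is loaded) of the two attention patterns involved. Encoder models such as DistilBERT let every token attend to every other token, while decoder models such as GPT-2 use a causal pattern in which each token sees only itself and earlier positions. The helper `build_attention_pattern` and the four example tokens are illustrative assumptions, not part of this notebook or of the `transformers` API.

```python
def build_attention_pattern(seq_len, causal):
    """Return a seq_len x seq_len matrix; entry [i][j] is 1 if position i may attend to position j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Hypothetical word-level tokens, chosen only for display.
tokens = ["Once", "upon", "a", "time"]

full_pattern = build_attention_pattern(len(tokens), causal=False)   # encoder-style
causal_pattern = build_attention_pattern(len(tokens), causal=True)  # decoder-style

for name, pattern in [("bidirectional", full_pattern), ("causal", causal_pattern)]:
    print(name)
    for token, row in zip(tokens, pattern):
        print(f"  {token:>5}: {row}")
```

Because the causal pattern never looks to the right, a decoder can extend a prompt one token at a time, which is what open-ended generation requires.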

For text generation, autoregressive models such as GPT (Generative Pre-trained Transformer) are the usual choice. DistilBERT is an encoder-only model trained with masked language modeling, so it has no mechanism for producing text one token at a time. The example below uses GPT-2 instead:

In [None]:
!pip install transformers

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer for text generation
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Prompt for text generation
prompt = "Once upon a time in a land far, far away"

# Tokenize and encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors='pt')

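# Generate text using the model.
# Note: top_k, top_p, and temperature only apply when do_sample=True is passed.
# Without it, generate() decodes greedily, so the output is deterministic.
output = model.generate(
    input_ids,
    max_length=100,          # total length in tokens, prompt included
    num_return_sequences=1,
    no_repeat_ngram_size=2,  # never repeat a 2-gram
    top_k=50, top_p=0.95, temperature=0.7,
)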

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Text:")
print(generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text:
Once upon a time in a land far, far away, the world was a place of great beauty and great danger. The world of the gods was the land of darkness and darkness. And the darkness of this world, which was far from the light of day, was not the place where the sun and the moon met. It was in the midst of all the worlds, and it was there that the stars and all that were in them met, that they were all in one place.




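This example uses GPT-2 for text generation. Change the `prompt` variable to set the initial context. The arguments to `model.generate` control the output: `max_length` caps the total number of tokens (prompt included) and `no_repeat_ngram_size` prevents repeated n-grams. The `top_k`, `top_p`, and `temperature` arguments shape diversity, but only when sampling is enabled with `do_sample=True`; as written, the call decodes greedily and returns the same text on every run. The warning printed above appears because `tokenizer.encode` returns only token ids. Calling `tokenizer(prompt, return_tensors='pt')` also returns an `attention_mask`, which can be passed to `generate` together with `pad_token_id=tokenizer.eos_token_id`.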
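
The effect of `temperature`, `top_k`, and `top_p` is easier to see on a toy distribution. The sketch below is a simplified, pure-Python illustration of the idea (temperature scaling, then top-k, then nucleus filtering), not the library's implementation. The function `filter_distribution` and the scores in `toy_logits` are made up for the example. In `transformers`, these settings take effect only when `do_sample=True` is passed to `generate`.

```python
import math
import random

def _normalize(pairs):
    """Rescale (token, weight) pairs so the weights sum to 1."""
    total = sum(weight for _, weight in pairs)
    return [(tok, weight / total) for tok, weight in pairs]

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Turn raw scores into a sampling distribution (dict of token -> probability)."""
    # Temperature: divide scores before the softmax; values below 1 sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    ranked = sorted(
        ((tok, math.exp(s - peak)) for tok, s in scaled.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    ranked = _normalize(ranked)

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        ranked = _normalize(ranked[:top_k])

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return dict(_normalize(kept))

# Made-up next-token scores, for illustration only.
toy_logits = {"the": 2.0, "a": 1.0, "dragon": 0.5, "castle": 0.0, "xylophone": -3.0}

dist = filter_distribution(toy_logits, temperature=0.7, top_k=50, top_p=0.95)
rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
samples = rng.choices(list(dist), weights=list(dist.values()), k=5)
print(dist)
print(samples)
```

Lowering `temperature` or `top_p` concentrates probability on the top-ranked tokens, while `top_k=1` reduces sampling to the greedy choice.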

DistilBERT remains the better fit for classification, question answering, and named entity recognition. For open-ended generation, use GPT-2 or a similar autoregressive model.
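
The `generate` call above also sets `no_repeat_ngram_size=2`, which forbids any two-token sequence from appearing twice. The following is a rough sketch of that rule on word-level tokens. The helper `banned_next_tokens` is hypothetical and written only for this illustration; the library applies the same idea to token ids during decoding.

```python
def banned_next_tokens(generated, ngram_size):
    """Return the tokens that would complete an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix that the next token would extend.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for start in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

# Word-level stand-in for a partially generated sequence.
history = "the land of darkness and the".split()
print(banned_next_tokens(history, ngram_size=2))
```

Here the sequence ends in "the", and the bigram "the land" has already occurred, so "land" is excluded as the next token.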

# MULTIPLE PROMPTS

In [None]:
# NEW EXERCISE
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer for text generation
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# List of prompts for text generation
prompts = ["Once upon a time in a land far, far away",
           "In a galaxy not so distant, there was a hero",
           "The mysterious door creaked open revealing"]

# Generate text for each prompt
for prompt in prompts:
    # Tokenize and encode the prompt
    input_ids = tokenizer.encode(prompt, return_tensors='pt')

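    # Generate text using the model.
    # Note: top_k, top_p, and temperature only apply when do_sample=True is passed.
    # Without it, generate() decodes greedily, so the output is deterministic.
    output = model.generate(
        input_ids,
        max_length=100,          # total length in tokens, prompt included
        num_return_sequences=1,
        no_repeat_ngram_size=2,  # never repeat a 2-gram
        top_k=50, top_p=0.95, temperature=0.7,
    )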

    # Decode and print the generated text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    print(f"Prompt: {prompt}")
    print("Generated Text:")
    print(generated_text)
    print("\n" + "="*50 + "\n")  # Separator between generated texts


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt: Once upon a time in a land far, far away
Generated Text:
Once upon a time in a land far, far away, the world was a place of great beauty and great danger. The world of the gods was the land of darkness and darkness. And the darkness of this world, which was far from the light of day, was not the place where the sun and the moon met. It was in the midst of all the worlds, and it was there that the stars and all that were in them met, that they were all in one place.






The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt: In a galaxy not so distant, there was a hero
Generated Text:
In a galaxy not so distant, there was a hero who could save the galaxy.

The hero was the hero of the Galactic Empire. He was known as the Emperor. The Emperor was an Emperor who had been born in the past, and had become a Jedi. His name was Darth Vader. Vader was born on the planet of Coruscant, where he was raised by his father, Darth Sidious. After his birth, Vader became a Sith Lord, a master of Sith sorcery.


Prompt: The mysterious door creaked open revealing
Generated Text:
The mysterious door creaked open revealing a man in a black suit and a white shirt. He was wearing a red hooded sweatshirt and black pants.

"I'm a doctor," he said. "I've been in the hospital for a year. I've never seen anything like this."
...
, a former nurse who worked at the University of California, Berkeley, and was a member of the medical staff at UC Berkeley's medical center. She was killed in an
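
The loop above handles one prompt per `generate` call, so no padding is involved. Generating for several prompts in a single batch would require padding them to a common length and supplying the attention mask that the warning asks for. Because a decoder-only model appends new tokens on the right, the padding usually goes on the left. The sketch below shows that bookkeeping in plain Python, under stated assumptions: `left_pad_batch` is a hypothetical helper, and the token ids are invented rather than real GPT-2 ids.

```python
def left_pad_batch(sequences, pad_id):
    """Pad token-id lists on the left to a common length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding the model should ignore, 1 marks real prompt tokens.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask

# Invented token ids for three prompts of different lengths.
batch = [[11, 12, 13, 14], [21, 22], [31, 32, 33]]
PAD_ID = 50256  # the eos_token_id that the warning reports as the fallback pad id

ids, mask = left_pad_batch(batch, PAD_ID)
for row_ids, row_mask in zip(ids, mask):
    print(row_ids, row_mask)
```

With the real tokenizer, the equivalent setup is typically `tokenizer.pad_token = tokenizer.eos_token` and `tokenizer.padding_side = 'left'`, followed by `tokenizer(prompts, return_tensors='pt', padding=True)`.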


