<a href="https://colab.research.google.com/github/suyash1574/GEN-AI-Workshop/blob/main/src/day1/notebooks/03_text_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Day 1: Text Pipeline - Your First Language Model

Welcome to hands-on text processing! Now that you understand neural networks, let's explore how they work with text data.

## 🎯 Learning Objectives
By the end of this notebook, you will:
- Understand how text becomes numbers (tokenization)
- Load and use a pre-trained language model
- Experiment with text generation parameters
- Compare different prompt engineering techniques
- Build your first text generation pipeline

## 📚 Research Focus
This notebook emphasizes **discovery learning**. You'll:
1. Research concepts before implementing
2. Experiment with parameters to see their effects
3. Compare different approaches
4. Build understanding through hands-on exploration

---

## 1. From Text to Numbers

Neural networks work with numbers, but we have text. How do we bridge this gap?

🔍 **RESEARCH TASK 1**:
- What is tokenization in NLP?
- What is the difference between word-level and sub-word tokenization?
- Research "BPE" (Byte Pair Encoding) - how does it work?
- Why can't we just assign each word a number?

In [15]:
# Import required libraries
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline
import torch
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from collections import Counter
import seaborn as sns

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


### Exploring Tokenization

🔍 **RESEARCH TASK 2**:
- Look up the GPT-2 tokenizer documentation
- What is a "vocabulary size"?
- What happens when the model encounters a word it's never seen?

In [16]:
# TODO: Load the GPT-2 tokenizer
# Hint: Use GPT2Tokenizer.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Test sentences to explore tokenization
test_sentences = [
    "Hello world!",
    "The quick brown fox jumps over the lazy dog.",
    "Artificial intelligence is revolutionizing technology.",
    "GPT-2 uses transformer architecture.",
    "Supercalifragilisticexpialidocious"  # Long word to see sub-word tokenization
]

print("🔍 Exploring Tokenization:")
print("=" * 50)

for sentence in test_sentences:
    # TODO: Tokenize the sentence
    # Hint: Use tokenizer.encode() to get token IDs
    # Use tokenizer.tokenize() to see the actual tokens
    tokens = tokenizer.tokenize(sentence)  # Get the actual token strings
    token_ids = tokenizer.encode(sentence)  # Get the numerical IDs

    print(f"\nOriginal: {sentence}")
    print(f"Tokens: {tokens}")
    print(f"Token IDs: {token_ids}")
    print(f"Number of tokens: {len(tokens)}")

# TODO: Print tokenizer vocabulary size
print(f"\n📊 Tokenizer vocabulary size: {len(tokenizer)}")  # Hint: len(tokenizer)

🔍 Exploring Tokenization:

Original: Hello world!
Tokens: ['Hello', 'Ġworld', '!']
Token IDs: [15496, 995, 0]
Number of tokens: 3

Original: The quick brown fox jumps over the lazy dog.
Tokens: ['The', 'Ġquick', 'Ġbrown', 'Ġfox', 'Ġjumps', 'Ġover', 'Ġthe', 'Ġlazy', 'Ġdog', '.']
Token IDs: [464, 2068, 7586, 21831, 18045, 625, 262, 16931, 3290, 13]
Number of tokens: 10

Original: Artificial intelligence is revolutionizing technology.
Tokens: ['Art', 'ificial', 'Ġintelligence', 'Ġis', 'Ġrevolution', 'izing', 'Ġtechnology', '.']
Token IDs: [8001, 9542, 4430, 318, 5854, 2890, 3037, 13]
Number of tokens: 8

Original: GPT-2 uses transformer architecture.
Tokens: ['G', 'PT', '-', '2', 'Ġuses', 'Ġtransformer', 'Ġarchitecture', '.']
Token IDs: [38, 11571, 12, 17, 3544, 47385, 10959, 13]
Number of tokens: 8

Original: Supercalifragilisticexpialidocious
Tokens: ['Super', 'cal', 'if', 'rag', 'il', 'ist', 'ice', 'xp', 'ial', 'id', 'ocious']
Token IDs: [12442, 9948, 361, 22562, 346, 396, 501, 42372, 

### Understanding Token Patterns

🔍 **RESEARCH TASK 3**:
- Why do some words get split into multiple tokens?
- What does the 'Ġ' symbol represent in GPT-2 tokens?
- How might tokenization affect model performance?

In [17]:
# Analyze tokenization patterns
analysis_texts = [
    "running",
    "runner",
    "run",
    "unhappiness",
    "ChatGPT",
    "COVID-19",
    "2023",
    "programming",
    "antidisestablishmentarianism"
]

print("🔍 Token Pattern Analysis:")
print("=" * 60)

token_analysis = []

for text in analysis_texts:
    # TODO: Analyze each text
    tokens = tokenizer.tokenize(text)  # Tokenize the text
    token_ids = tokenizer.encode(text)  # Get token IDs
    token_count = len(token_ids) # Count the tokens


    token_analysis.append({
        'text': text,
        'tokens': tokens,
        'token_count': token_count,
        'chars_per_token': len(text) / token_count if token_count > 0 else 0 # Avoid division by zero
    })

    print(f"{text:30} → {tokens} ({token_count} tokens)")

# TODO: Create a DataFrame and analyze patterns
df = pd.DataFrame(token_analysis)
print(f"\n📊 Average characters per token: {df['chars_per_token'].mean():.2f}")  # Calculate mean
print(f"📊 Longest word in tokens: {df.loc[df['token_count'].idxmax()]['text']}")  # Find max token_count

🔍 Token Pattern Analysis:
running                        → ['running'] (1 tokens)
runner                         → ['runner'] (1 tokens)
run                            → ['run'] (1 tokens)
unhappiness                    → ['un', 'h', 'appiness'] (3 tokens)
ChatGPT                        → ['Chat', 'G', 'PT'] (3 tokens)
COVID-19                       → ['CO', 'VID', '-', '19'] (4 tokens)
2023                           → ['20', '23'] (2 tokens)
programming                    → ['program', 'ming'] (2 tokens)
antidisestablishmentarianism   → ['ant', 'idis', 'establishment', 'arian', 'ism'] (5 tokens)

📊 Average characters per token: 4.12
📊 Longest word in tokens: antidisestablishmentarianism


## 2. Loading Your First Language Model

Now let's load GPT-2 and understand its architecture.

🔍 **RESEARCH TASK 4**:
- What is GPT-2 and when was it released?
- How many parameters does GPT-2 have? (Compare different sizes)
- What is "autoregressive" text generation?
- How does GPT-2 relate to the neural network you built in the previous notebook?

In [18]:
# TODO: Load GPT-2 model
# Hint: Use GPT2LMHeadModel.from_pretrained('gpt2')
print("🔄 Loading GPT-2 model (this may take a moment)...")
model = GPT2LMHeadModel.from_pretrained('gpt2')

# TODO: Set model to evaluation mode
# Hint: Use model.eval()
model.eval()

print("✅ GPT-2 model loaded successfully!")

# Explore model architecture
print("\n🏗️ Model Architecture:")
print(f"Model type: {type(model).__name__}")

# TODO: Count model parameters
# Hint: sum(p.numel() for p in model.parameters())
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")
print(f"Model size: ~{total_params / 1e6:.1f}M parameters")

🔄 Loading GPT-2 model (this may take a moment)...
✅ GPT-2 model loaded successfully!

🏗️ Model Architecture:
Model type: GPT2LMHeadModel
Total parameters: 124,439,808
Model size: ~124.4M parameters


### Understanding Model Architecture

🔍 **RESEARCH TASK 5**:
- What are "transformer blocks" in GPT-2?
- What is "attention" in the context of neural networks?
- How does this compare to the simple network you built earlier?

In [19]:
# Explore model structure
print("🔍 Model Structure Analysis:")
print("=" * 50)

# TODO: Print model configuration
# Hint: Use model.config
config = model.config

print(f"Vocabulary size: {config.vocab_size}")
print(f"Maximum sequence length: {config.n_positions}")
print(f"Number of transformer layers: {config.n_layer}")
print(f"Number of attention heads: {config.n_head}")
print(f"Hidden size: {config.n_embd}")

# Compare to your simple network
print("\n🤔 Comparison to Your Neural Network:")
print(f"Your network had: 2 inputs → 4 hidden → 1 output")
print(f"GPT-2 has: {config.vocab_size} inputs → {config.n_embd} hidden → {config.vocab_size} outputs")
print(f"Your network: ~50 parameters")
print(f"GPT-2: {total_params:,} parameters")
print(f"GPT-2 is ~{total_params/50:,.0f}x larger!")

🔍 Model Structure Analysis:
Vocabulary size: 50257
Maximum sequence length: 1024
Number of transformer layers: 12
Number of attention heads: 12
Hidden size: 768

🤔 Comparison to Your Neural Network:
Your network had: 2 inputs → 4 hidden → 1 output
GPT-2 has: 50257 inputs → 768 hidden → 50257 outputs
Your network: ~50 parameters
GPT-2: 124,439,808 parameters
GPT-2 is ~2,488,796x larger!


## 3. Text Generation Experiments

Let's generate text and understand how different parameters affect the output.

🔍 **RESEARCH TASK 6**:
- What is "temperature" in text generation?
- What is "top-p" (nucleus) sampling?
- What's the difference between greedy decoding and sampling?
- How do these parameters affect creativity vs. coherence?

In [20]:
# TODO: Create a text generation pipeline
# Hint: Use pipeline('text-generation', model=model, tokenizer=tokenizer)
generator =pipeline('text-generation' , model=model , tokenizer=tokenizer)

# Base prompt for experiments
base_prompt = "In the future, artificial intelligence will"

print(f"🤖 Base prompt: '{base_prompt}'")
print("=" * 60)

Device set to use cpu


🤖 Base prompt: 'In the future, artificial intelligence will'


### Temperature Experiments

🔍 **RESEARCH TASK 7**:
- What happens when temperature = 0?
- What happens when temperature > 1?
- Why might you want different temperatures for different tasks?

In [21]:
# Experiment with different temperatures
temperatures = [0.1, 0.7, 1.0, 1.5]

print("🌡️ Temperature Experiments:")
print("=" * 50)

for temp in temperatures:
    print(f"\n🔥 Temperature: {temp}")
    print("-" * 30)

    # TODO: Generate text with different temperatures
    # Hint: Use generator() with temperature parameter
    result = generator(
        base_prompt,  # prompt
        max_length=60,  # try 60
        temperature=temp,  # use the temp variable
        do_sample=True,  # should be True for sampling
        pad_token_id=tokenizer.eos_token_id
    )

    # TODO: Print the generated text
    generated_text = result[0]['generated_text']  # Extract from result
    print(generated_text)

print("\n🤔 Discussion Questions:")
print("• Which temperature produced the most coherent text?")
print("• Which was most creative/surprising?")
print("• When might you use each temperature setting?")

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🌡️ Temperature Experiments:

🔥 Temperature: 0.1
------------------------------


Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In the future, artificial intelligence will be able to do things like search for information about people, and to do things like search for information about people.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex than we've ever imagined.

The future of AI is going to be a lot more complex th

Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In the future, artificial intelligence will do a lot more than just figure out the right way to handle the problem of what needs to be solved. That's why we need to invest in artificial intelligence so that we can find the best ways to solve the problem of what needs to be solved.

How will AI work with human intelligence?

The AI will play a critical role in this. The AI will be able to do the work needed to solve a problem. The AI will help with the processes that are needed to do the work needed to solve a problem.

The AI will be able to solve a problem in a way that is not constrained by human limitations. It will be able to solve a problem in a way that is not constrained by human limitations. It will be able to solve a problem that is not constrained by human limits. It will be able to solve a problem that is not constrained by human limitations. It will be able to solve a problem that is not constrained by human limits.

So AI will be able to solve problems that are not constra

Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In the future, artificial intelligence will grow to be powerful, but also scalable to every sector of the military and government.

The U.S. believes that the world will soon come together to make sure the next generation of computers does indeed be ready today so that it can do everything necessary to become better at tasks such as intelligence, transportation, financial, communications technologies and energy.

But for now, the world will be left with artificial intelligence. What kind of information can you use to do good things today, or to do better tomorrow?

In November 2010, the U.S. General Services Administration asked the Defense Energy Research and Development Agency to ask for a $3.16 billion contract for a new U.S. military industrial base in the Indian Ocean.

It has not been officially approved. But the U.S. government has long been known for getting things done with computers. The government even allowed the U.S. Navy to put computers in naval vessels. Those efforts ha

### Top-p (Nucleus) Sampling Experiments

🔍 **RESEARCH TASK 8**:
- How does top-p sampling work?
- What's the difference between top-k and top-p sampling?
- Why might top-p be better than just using temperature?

In [22]:
# Experiment with top-p sampling
top_p_values = [0.3, 0.7, 0.9, 1.0]

print("🎯 Top-p Sampling Experiments:")
print("=" * 50)

for top_p in top_p_values:
    print(f"\n🎲 Top-p: {top_p}")
    print("-" * 30)

    # TODO: Generate text with different top-p values
    result = generator(
        base_prompt,
        max_length=60,
        temperature=0.8,  # Keep temperature constant
        top_p=top_p,  # Use the top_p variable
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

    generated_text = result[0]['generated_text']
    print(generated_text)

print("\n🤔 Discussion Questions:")
print("• How did the outputs change with different top-p values?")
print("• What's the trade-off between diversity and quality?")

Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🎯 Top-p Sampling Experiments:

🎲 Top-p: 0.3
------------------------------


Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In the future, artificial intelligence will be able to help us better understand our own lives, and to help us better understand others.

The future of AI is a very exciting one. It is a new era in which we are seeing the emergence of new technologies that will change the way we think, act, and think.

The future of AI is a very exciting one. It is a new era in which we are seeing the emergence of new technologies that will change the way we think, act, and think.

We are in a new era of technology that will change the way we think, act, and think.

We are in a new era of technology that will change the way we think, act, and think.

We are in a new era of technology that will change the way we think, act, and think.

We are in a new era of technology that will change the way we think, act, and think.

We are in a new era of technology that will change the way we think, act, and think.

We are in a new era of technology that will change the way we think, act, and think.

We are in a ne

Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In the future, artificial intelligence will be able to be used to solve problems of complexity, intelligence, and intelligence-based solutions.

The "Cognitive Science of Artificial Intelligence" (CSI) is a research and development project that aims to develop the ability to design, build, and deploy cognitive systems. CSI will be led by the AI Lab at the University of Toronto. The project is being led by the AI Lab at the University of Toronto.

The CSI will focus on understanding how the human brain works, how it operates, and how it interacts with the environment. It will be presented at the IEEE International Conference on Artificial Intelligence (ICAI) in Barcelona, Spain, on June 15-16, 2017.

The CSI will be presented at the IEEE International Conference on Artificial Intelligence (ICAI) in Barcelona, Spain, on June 15-16, 2017.

The CSI will be presented at the IEEE International Conference on Artificial Intelligence (ICAI) in Barcelona, Spain, on June 15-16, 2017.

The CSI wil

Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In the future, artificial intelligence will be able to work with many of the same features as humans, including spatial navigation and the ability to control computers.

The team has also explored the possibility of using AI for scientific research, which will include the creation of intelligent robots and robots that would be capable of performing tasks that humans could not.

"We are also interested in using robots to help people understand their environment, to help them develop new skills, to create new products, and to find new solutions to our problems," said Paul Shultz, head of AI at Google. "This is one of the most exciting opportunities for humans to explore."

In the meantime, the team plans to continue to develop its AI projects. In the future, they will be able to use robots to perform tasks that humans could not.

The team has also developed AI software that would be used to identify and solve problems using real-world information, such as data on the weather, traffic pat

## 4. Prompt Engineering Experiments

The way you phrase your prompt dramatically affects the output.

🔍 **RESEARCH TASK 9**:
- What is "prompt engineering"?
- What are "few-shot" prompts?
- How can prompt structure influence model behavior?
- Research common prompt engineering techniques

In [24]:
# Different prompt styles to experiment with
prompts_to_test = {
    "Direct": "Write about artificial intelligence:",
    "Question": "What is artificial intelligence and how will it change the world?",
    "Story_Start": "Once upon a time, in a world where artificial intelligence was everywhere,",
    "List_Format": "Here are 5 ways artificial intelligence will change our lives:\n1.",
    "Expert_Persona": "As a leading AI researcher, I believe that artificial intelligence will",
    "Few_Shot": "Technology predictions:\n• The internet will connect everyone (1990s)\n• Smartphones will be everywhere (2000s)\n• Artificial intelligence will"
}

print("✍️ Prompt Engineering Experiments:")
print("=" * 60)

# TODO: Test each prompt style
for style, prompt in prompts_to_test.items():
    print(f"\n📝 Style: {style}")
    print(f"Prompt: '{prompt}'")
    print("-" * 40)

    # TODO: Generate text for each prompt
    result = generator(
        prompt,  # use the prompt variable
        max_length=80,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

    generated_text = result[0]['generated_text']
    print(generated_text)
    print("\n" + "="*60)

Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


✍️ Prompt Engineering Experiments:

📝 Style: Direct
Prompt: 'Write about artificial intelligence:'
----------------------------------------


Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Write about artificial intelligence:

AI can be used to design and build robots, make cars, build submarines, create smart cars, and even build robots that can do everything from search engines to artificial intelligence.

AI is not just a means to an end, it's also a means to a goal. AI can do things that humans cannot, like help us understand and understand the world.

A lot of people in the world are thinking about how AI could help us understand the world, but it's not that simple. It's really about how AI can help us learn and use our brains.

It's not just about the computer that's working in our brains. It's also about the computer that's controlling our actions. It's about the computer that's creating a system that's intelligent and not just an artificial intelligence.

In the past, we've thought about what AI would be like if we had the power to create a computer that could do anything, from building a car to finding a missing person.

But now that we have the power to do that

Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


What is artificial intelligence and how will it change the world? What is the future of artificial intelligence?

The Future of Artificial Intelligence

I have spent my entire life trying to understand the future of artificial intelligence. For many years I've been trying to understand what artificial intelligence is and how it can be improved. What is it that makes us human and what are its implications? I've seen this in the history of artificial intelligence. In the past, many people have argued that artificial intelligence is the future of human knowledge. I see this as a natural extension of the human knowledge.

What are the implications of artificial intelligence?

I think that we're going to see an explosion in the amount of information we can learn and think about. It will also lead to the rise of a new technology that will help us to understand what our ancestors did in the past and what we can do in the future.

It will also help us to understand the way the world works. It 

Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Once upon a time, in a world where artificial intelligence was everywhere, there was no more reason to think that there was some kind of intelligent race. There was no reason to think that there were no more sentient beings. And that's what we're seeing here.

The question is, if there is no such thing as a sentient race, then what are we to make of this?

What's the answer?

Why?

I think the answer is that we've got to be careful. We've got to be careful of the facts. The fact that there is a group of people who are saying, "We need to stop," is the problem. It's not a problem of people being right. It's a problem of the fact that there are some people who are saying, "We're wrong about the reality that we have, and we should stop saying that," and we're not going to stop. That's the problem.

The problem is that there is no way of knowing whether there is a true or false reality. And there's no way to know whether there is a reality of any kind. And we've got to figure out if we're 

Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Here are 5 ways artificial intelligence will change our lives:
1. Artificial intelligence will create a whole new world of possibilities

In the future, artificial intelligence will be able to provide a new level of knowledge to the human mind. This new information will be generated by a combination of the two things that we know about our own brain and the information we have about our own brain.

Imagine that a person is a robot that has just completed an experiment. They have been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a robot that has been given a

Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


As a leading AI researcher, I believe that artificial intelligence will take over the job of the next five years.

But the future of AI will be less bright. We already know that AI will not be able to solve complex problems. That's why I believe the next five years will be a turning point.

In my opinion, AI will be able to solve the world's most complex problems. That's why I believe we will see the emergence of a new class of artificial intelligence, the'superintelligence'.

We will see a new class of AI that will be able to solve the world's most complex problems. That's why I believe we will see the emergence of a new class of AI that will be able to solve the world's most complex problems.

In my opinion, AI will be able to solve the world's most complex problems. That's why I believe we will see the emergence of a new class of AI that will be able to solve the world's most complex problems.

In my opinion, AI will be able to solve the world's most complex problems. That's why I b

### Analyzing Prompt Effectiveness

🔍 **RESEARCH TASK 10**:
- Which prompt style produced the most useful output?
- How did the model's "behavior" change with different prompts?
- What makes a good prompt?
- How might this apply to chatbots or AI assistants?

In [25]:
# Let's analyze the generated text more systematically
print("📊 Prompt Analysis Exercise:")
print("=" * 50)

# TODO: For each prompt style, generate multiple outputs and analyze
analysis_results = []

for style, prompt in list(prompts_to_test.items())[:3]:  # Test first 3 for time
    # Generate 3 outputs for each prompt
    outputs = []

    for i in range(3):
        # TODO: Generate text
        result = generator(
            prompt,
            max_length=60,
            temperature=0.8,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

        output = result[0]['generated_text']
        outputs.append(output)

    # TODO: Analyze the outputs
    lengths = [len(output) for output in outputs]
    avg_length = sum(lengths) / len(lengths) # Calculate average length of outputs

    analysis_results.append({
        'style': style,
        'prompt': prompt,
        'avg_length': avg_length,
        'outputs': outputs
    })

    print(f"\n{style}:")
    print(f"  Average length: {avg_length:.1f} characters")
    print(f"  Sample output: {outputs[0][:100]}...")

print("\n🤔 Reflection Questions:")
print("• Which prompt style was most consistent?")
print("• Which produced the most relevant outputs?")
print("• How might you improve these prompts?")

Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


📊 Prompt Analysis Exercise:


Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Direct:
  Average length: 836.7 characters
  Sample output: Write about artificial intelligence: How it works, it gets done, it's cool

Google is working on a m...


Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Question:
  Average length: 1057.3 characters
  Sample output: What is artificial intelligence and how will it change the world?

The term "intelligent" means how ...


Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Story_Start:
  Average length: 1303.3 characters
  Sample output: Once upon a time, in a world where artificial intelligence was everywhere, one could go to the libra...

🤔 Reflection Questions:
• Which prompt style was most consistent?
• Which produced the most relevant outputs?
• How might you improve these prompts?


## 5. Building Your Text Generation Pipeline

Now let's create a customizable text generation function.

🔍 **RESEARCH TASK 11**:
- What parameters should a good text generation function have?
- How can you make text generation more controllable?
- What are the trade-offs between different generation strategies?

In [26]:
def custom_text_generator(prompt, style="balanced", length="medium"):
    """
    TODO: Create a customizable text generation function

    Args:
        prompt (str): The input prompt
        style (str): "creative", "balanced", or "conservative"
        length (str): "short", "medium", or "long"

    Returns:
        str: Generated text
    """

    # TODO: Set parameters based on style
    if style == "creative":
        temperature = 1.0  # Higher for creativity
        top_p = 0.9        # Higher for diversity
    elif style == "conservative":
        temperature = 0.5  # Lower for consistency
        top_p = 0.6        # Lower for focus
    else:  # balanced
        temperature = 0.7  # Medium values
        top_p = 0.8

    # TODO: Set length based on parameter
    if length == "short":
        max_length = 40  # Try 40
    elif length == "long":
        max_length = 100  # Try 100
    else:  # medium
        max_length = 70  # Try 70

    # TODO: Generate text with the parameters
    result = generator(
        prompt,  # prompt
        max_length=max_length,
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

    return result[0]['generated_text']

# Test your function
test_prompt = "The future of education will be"

print("🧪 Testing Your Text Generator:")
print("=" * 50)

# TODO: Test different combinations
test_combinations = [
    ("creative", "short"),
    ("balanced", "medium"),
    ("conservative", "long")
]

for style, length in test_combinations:
    print(f"\n📝 Style: {style}, Length: {length}")
    print("-" * 30)

    # TODO: Use your function
    output = custom_text_generator(test_prompt, style=style, length=length)
    print(output)
    print(f"Characters: {len(output)}")

Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🧪 Testing Your Text Generator:

📝 Style: creative, Length: short
------------------------------


Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


The future of education will be determined by the development of the skills of every member of the citizenry, who is not yet in the middle age, and of the youth who will be in demand as soon as their careers are over."

Andrea Saini, a professor at Columbia University who studies socialization, says he's not convinced there's an obvious connection between the increasing number of teachers and student-teacher ratios — especially as the number of students in high schools has decreased in recent years.

"Teachers, who are the main force in increasing their wages, now have their budgets split in half, and they also have their students working in their classroom or for their clients," Saini says. "That leaves us with some other issues of what is going on here."

The problem isn't simply that teachers' pay isn't growing.

"In order to keep salaries going well, you have to have a solid, high-quality education," says Dr. Martin Luther King, Jr., Jr., who authored an op-ed for the American Pros

Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


The future of education will be in the hands of the people, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The people will decide the future of education, not the politicians.

The peop

## 6. Creative Applications

Let's explore some creative uses of text generation.

🔍 **RESEARCH TASK 12**:
- How is GPT-2 being used in creative writing?
- What are some potential applications for businesses?
- What ethical considerations should we keep in mind?
- How might this technology evolve?

In [27]:
# Creative applications to try
creative_prompts = {
    "Poetry": "Roses are red, violets are blue, artificial intelligence",
    "Story": "It was a dark and stormy night when the AI finally",
    "Product Description": "Introducing the revolutionary new smartphone that",
    "Email": "Dear valued customer, we are excited to announce",
    "Recipe": "How to make the perfect AI-inspired cookies:\nIngredients:\n-",
    "News Headline": "Breaking: Scientists discover that artificial intelligence"
}

print("🎨 Creative Applications:")
print("=" * 50)

# TODO: Generate creative content
for app_type, prompt in creative_prompts.items():
    print(f"\n🖼️ {app_type}:")
    print(f"Prompt: '{prompt}'")
    print("-" * 40)

    # TODO: Choose appropriate style for each application
    if app_type in ["Poetry", "Story"]:
        style = "creative"  # Should be creative
    elif app_type in ["Product Description", "Email"]:
        style = "conservative"  # Should be conservative
    else:
        style = "balanced"  # Should be balanced

    output = custom_text_generator(prompt, style=style, length="medium")
    print(output)
    print("\n" + "="*50)

Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🎨 Creative Applications:

🖼️ Poetry:
Prompt: 'Roses are red, violets are blue, artificial intelligence'
----------------------------------------


Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Roses are red, violets are blue, artificial intelligence (AI) technologies are in the works, and there's an even better possibility of creating the first true intelligent AI that is just as likely to do it as human.

The good news for humans is that, with AI, a whole lot of the trouble starts with the data. A computer is simply a piece of software—a computer program that knows what you can do—that does nothing with your thoughts. So you don't have to imagine you're playing a game. There are far fewer problems with computer program design, and this kind of thinking, in turn, is easier to accomplish. You could be working on a computer program with some computer processing power that you can read from, and the computer will look at this program as an object, and make it think that this is what it does to you.

In other words, in a world that allows AI to be programmed without any human intervention, there's no reason humans shouldn't be able to do more than basic things like take a pictur

Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


It was a dark and stormy night when the AI finally arrived and the rest of the team had been taken off and their teammates back to their training facilities for some preparation. But before the event could start, a voice sounded from far away.

"There's a new squad coming to our team, there will be new recruits who have come to make it to the Arena, and this is what we are about. I need your help in getting them to the Arena!"

The first squad was known as 'The Big Boys' and was based in the Arena where they would go into a battle with the AI, before leaving for the final room. When they reached the final room, it was still deserted as they had been taken by the AI.

"Why are you fighting?" asked the AI.

"It's because I'm so strong!" The big boy replied.

"So powerful is he and his ability!" yelled the AI as he shot back.

"A guy like this will have to be strong for us to win the championship!" replied the team.

In this world, every man is equal in his power and if you want to be the

Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Introducing the revolutionary new smartphone that will be the new iPhone 6.

The new iPhone 6 will be the first smartphone to feature a 5.5-inch display. The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display. The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 will also be the first smartphone to feature a 5.5-inch display.

The new iPhone 6 

Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Dear valued customer, we are excited to announce that we have been selected to serve as the first customer to receive our first-ever customer-centric product.

We are excited to announce that we have been selected to serve as the first customer to receive our first-ever customer-centric product. We are excited to offer a wide range of products and services, including:

• The first-ever customer-centric product.

• The first-ever customer-centric product. Our first-ever customer-centric product will be available to customers in the United States, Canada, Australia, New Zealand, and Europe within the next 12 months.

• The first-ever customer-centric product. Our first-ever customer-centric product will be available to customers in the United States, Canada, Australia, New Zealand, and Europe within the next 12 months. Our first-ever customer-centric product will be available to customers in the United States, Canada, Australia, New Zealand, and Europe within the next 12 months. We will 

Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


How to make the perfect AI-inspired cookies:
Ingredients:
-2-3 cups of flour
-1 cup of sugar
-1/4 cup of water
-1/4 cup of baking soda
-1/4 teaspoon salt
-1/4 teaspoon baking powder
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/4 teaspoon baking soda
-1/


🖼️ News Headline:
Prompt: 'Breaking: Scientists discover that artificial intelligence'
----------------------------------------
Breaking

## 7. Understanding Limitations

It's important to understand what language models can and cannot do.

🔍 **RESEARCH TASK 13**:
- What is "hallucination" in language models?
- Why might GPT-2 generate biased or incorrect information?
- What are the limitations of autoregressive generation?
- How do these limitations affect real-world applications?

In [28]:
# Test model limitations
limitation_tests = {
    "Factual Knowledge": "The capital of Fakelandia is",
    "Recent Events": "In 2023, the most important AI breakthrough was",
    "Math": "What is 47 * 83? The answer is",
    "Logic": "If all A are B, and all B are C, then all A are",
    "Consistency": "My favorite color is blue. Later in the conversation, my favorite color is"
}

print("⚠️ Understanding Model Limitations:")
print("=" * 50)

for test_type, prompt in limitation_tests.items():
    print(f"\n🧪 Testing: {test_type}")
    print(f"Prompt: '{prompt}'")
    print("-" * 40)

    # TODO: Generate responses to test limitations
    output = custom_text_generator(
        prompt,  # prompt
        style="conservative",  # Use conservative for factual tasks
        length="short"
    )

    print(output)

    # TODO: Analyze the output
    print(f"🤔 Analysis: Does this look correct/reasonable?")
    print("\n" + "="*50)

print("\n⚠️ Important Reminders:")
print("• Language models can generate plausible-sounding but incorrect information")
print("• Always verify factual claims from AI-generated content")
print("• Be aware of potential biases in training data")
print("• Use AI as a tool to assist, not replace, human judgment")

Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


⚠️ Understanding Model Limitations:

🧪 Testing: Factual Knowledge
Prompt: 'The capital of Fakelandia is'
----------------------------------------


Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


The capital of Fakelandia is the capital of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the Empire of the 

Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In 2023, the most important AI breakthrough was the discovery of the first "smart" computer. The computer was a machine that could do things like send and receive messages. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like make and send money. It was a computer that could do things like mak

Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


What is 47 * 83? The answer is: It's a bit of a mystery.

In the case of the current version of the game, you can play as a character who has been in the game for a long time and has a lot of experience. You can also play as a character who has been in the game for a long time and has a lot of experience.

The game is not designed to be played as a single player. The game is designed to be played as a multiplayer game.

The game is designed to be played as a single player.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.

The game is designed to be played as a multiplayer game.


Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


If all A are B, and all B are C, then all A are C.

Now, if A is A, then all A are B.

Now, if B is B, then all B are C.

Now, if C is C, then all C are D.

Now, if D is D, then all D are E.

Now, if E is E, then all E are F.

Now, if F is F, then all F are G.

Now, if G is G, then all G are H.

Now, if H is H, then all H are I.

Now, if I is I, then all I are J.

Now, if J is J, then all J are K.

Now, if K is K, then all K are L.

Now, if L is L, then all L are M.

Now, if M is M, then all M are N.

Now, if N is N, then all N are O.

Now, if O is O, then all O are P.

Now, if P is P, then all P are Q.

Now, if Q is Q, then all Q are R
🤔 Analysis: Does this look correct/reasonable?


🧪 Testing: Consistency
Prompt: 'My favorite color is blue. Later in the conversation, my favorite color is'
----------------------------------------
My favorite color is blue. Later in the conversation, my favorite color is black.

The other day, I was sitting in my kitchen with my kids and my wife. We we

## 8. Reflection and Next Steps

### What You've Accomplished
✅ **Understood tokenization and text preprocessing**
✅ **Loaded and used a pre-trained language model**
✅ **Experimented with generation parameters**
✅ **Explored prompt engineering techniques**
✅ **Built a customizable text generation pipeline**
✅ **Understood model limitations and ethical considerations**

### Key Insights
🔍 **Discussion Questions**:
- What surprised you most about text generation?
- Which prompt engineering technique was most effective?
- How might you use this in a real project?
- What limitations concerned you most?

In [29]:
# Final experiment: Design your own use case
print("🎯 FINAL CHALLENGE:")
print("Design your own text generation use case!")
print("=" * 50)

# TODO: Create your own application
# Ideas: Story generator, email assistant, creative writing helper, etc.

your_use_case = "Generate a short story about a robot learning to paint"  # Describe your use case
your_prompt = "The robot carefully dipped its brush into the vibrant red paint,"   # Design your prompt
your_style = "creative"    # Choose your style
your_length = "medium"   # Choose your length

print(f"📝 Your use case: {your_use_case}")
print(f"📝 Your prompt: '{your_prompt}'")
print(f"📝 Your settings: {your_style}, {your_length}")
print("-" * 50)

# TODO: Generate with your custom settings
your_output = custom_text_generator(your_prompt, style=your_style, length=your_length)
print("🎉 Your generated content:")
print(your_output)

print("\n📈 Next Steps:")
print("• Experiment with different prompt formats")
print("• Try combining multiple generation calls")
print("• Think about how to validate or improve outputs")
print("• Consider user interface design for your application")

Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🎯 FINAL CHALLENGE:
Design your own text generation use case!
📝 Your use case: Generate a short story about a robot learning to paint
📝 Your prompt: 'The robot carefully dipped its brush into the vibrant red paint,'
📝 Your settings: creative, medium
--------------------------------------------------
🎉 Your generated content:
The robot carefully dipped its brush into the vibrant red paint, creating a slightly pungent, yet still vibrant hue.

The robot was used to help put a small piece of paper up on a table and set up a small home. It was a lot more fun.

While the machine was still in the process of finishing a small home, they set up a new home by creating a special light bulb that they took apart and glued to the front of the house.

A robot and a dolly were used to add some personality to the small home and set up a new home.

The robot was used to get out of the house and set up a house.

When they were done with the home, they decided to make more toys for the robot. They did a fe

## 🎉 Congratulations!

You've successfully:
- ✅ Mastered text tokenization and preprocessing
- ✅ Used a state-of-the-art language model
- ✅ Discovered the art and science of prompt engineering
- ✅ Built your own text generation pipeline
- ✅ Understood the capabilities and limitations of AI text generation
- ✅ Explored creative applications

### Prepare for the Next Notebook
Next, we'll explore computer vision and image processing, applying similar principles to visual data!

**Share with your partner**: What was your most successful text generation experiment?

---
*Text Pipeline Complete - Ready for Computer Vision! 🖼️*

# Task
Fix all errors and complete the code in the provided Jupyter notebook to make it fully functional and executable.

## Review and fix remaining placeholders

### Subtask:
Review the code cells and replace the remaining `____` placeholders with the correct code based on the surrounding comments and the notebook's objectives.


**Reasoning**:
I need to replace the `____` placeholders in the `custom_text_generator` function definition and its test section with appropriate values and variables as specified in steps 1 through 4 of the instructions. I will do this in a single code block.



In [None]:
def custom_text_generator(prompt, style="balanced", length="medium"):
    """
    TODO: Create a customizable text generation function

    Args:
        prompt (str): The input prompt
        style (str): "creative", "balanced", or "conservative"
        length (str): "short", "medium", or "long"

    Returns:
        str: Generated text
    """

    # TODO: Set parameters based on style
    if style == "creative":
        temperature = 1.0  # Higher for creativity
        top_p = 0.9        # Higher for diversity
    elif style == "conservative":
        temperature = 0.5  # Lower for consistency
        top_p = 0.6        # Lower for focus
    else:  # balanced
        temperature = 0.7  # Medium values
        top_p = 0.8

    # TODO: Set length based on parameter
    if length == "short":
        max_length = 40  # Try 40
    elif length == "long":
        max_length = 100  # Try 100
    else:  # medium
        max_length = 70  # Try 70

    # TODO: Generate text with the parameters
    result = generator(
        prompt,  # prompt
        max_length=max_length,
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

    return result[0]['generated_text']

# Test your function
test_prompt = "The future of education will be"

print("🧪 Testing Your Text Generator:")
print("=" * 50)

# TODO: Test different combinations
test_combinations = [
    ("creative", "short"),
    ("balanced", "medium"),
    ("conservative", "long")
]

for style, length in test_combinations:
    print(f"\n📝 Style: {style}, Length: {length}")
    print("-" * 30)

    # TODO: Use your function
    output = custom_text_generator(test_prompt, style=style, length=length)
    print(output)
    print(f"Characters: {len(output)}")

Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🧪 Testing Your Text Generator:

📝 Style: creative, Length: short
------------------------------


Both `max_new_tokens` (=256) and `max_length`(=70) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


The future of education will be determined by the strength of the skills of the teacher, not by the strength of the student.

"Our system works in two ways, through education and through our system."

And then there's the business case.

"My advice to teachers is, 'Never teach your kids a job you don't even like, a problem that you've never dealt with before.'"

That's how much money teachers will be able to spend on their own, and not from any external sources. If you're a parent or employee of a parent, don't assume the full benefits of the system until after they've been fired by an employer. "It's all about them, and not only with the education system, but also with kids, teachers and parents, as well."

That's why we need teachers.

How can we fix this?

By educating your kids, the future may be yours and your kids' future.
Characters: 840

📝 Style: balanced, Length: medium
------------------------------
