# Text Generation with Hugging Face

Text generation is one of the most visible applications of modern NLP. Generative models now power creative writing tools, code completion, chatbots, and content drafting.

## What is Text Generation?

**Text Generation** produces human-like text:
- **Input**: Prompt or seed text
- **Output**: Coherent continuation or completion
- **Examples**: Story writing, code completion, dialogue systems
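
Under the hood, a causal language model builds that continuation one token at a time, feeding each new token back in as input. The loop can be sketched with a toy stand-in for the model. The `BIGRAMS` table and the `greedy_continue` function below are hypothetical illustrations, not part of `transformers`:

```python
# Hypothetical next-token probabilities, keyed by the previous token.
# A real model conditions on the whole prefix, not just the last token.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "dog": {"ran": 0.7, "sat": 0.3},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def greedy_continue(prompt_tokens, max_new_tokens=3):
    """Append the most likely next token until the budget or a dead end is reached."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = BIGRAMS.get(tokens[-1])
        if not candidates:  # no known continuation: stop early
            break
        tokens.append(max(candidates, key=candidates.get))
    return tokens


print(" ".join(greedy_continue(["the"])))  # the cat sat down
```

Always picking the single most likely token is greedy decoding; the sampling parameters used later in this notebook replace that `max` with a random draw.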

## Learning Objectives

By the end of this notebook, you'll know how to:
1. Use different generation strategies (greedy, beam search, sampling)
2. Control generation with parameters (temperature, top-k, top-p)
3. Fine-tune models for specific generation tasks
4. Handle different types of generation (completion, chat, code)
5. Evaluate generated text quality
6. Build practical generation applications
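
To make the first objective concrete, here is a minimal sketch of how greedy decoding and beam search can disagree. The `NEXT` probability table and both functions are hypothetical toys, assuming a model that only looks at the previous token:

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "w": 0.1},
}


def greedy(start="<s>", steps=2):
    """Pick the single most likely token at every step."""
    seq, logp = [start], 0.0
    for _ in range(steps):
        options = NEXT.get(seq[-1])
        if not options:
            break
        token = max(options, key=options.get)
        logp += math.log(options[token])
        seq.append(token)
    return seq, logp


def beam_search(start="<s>", steps=2, num_beams=2):
    """Keep the num_beams best partial sequences by total log-probability."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            options = NEXT.get(seq[-1])
            if not options:
                candidates.append((seq, logp))  # finished hypothesis carries over
                continue
            for token, p in options.items():
                candidates.append((seq + [token], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]


print(greedy())       # commits to "a" first, then the best it can do is "x"
print(beam_search())  # keeps "b" alive and finds the more probable "b z"
```

Greedy decoding commits to `a` because it is the best first step, while beam search keeps `b` as a second hypothesis and ends up with a sequence of higher overall probability (0.36 versus 0.33 in this toy table).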

Let's start generating! 🚀

In [1]:
# Import essential libraries
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch



## Example 1: Batch Text Generation

In [2]:
generator = pipeline('text-generation', model='gpt2')


# Several prompts to run through the same generator
prompts = [
    "The benefits of renewable energy include",
    "Space exploration has revealed that",
    "The future of work will be shaped by",
    "Climate change solutions require"
]

print("Batch Text Generation Results:\n")

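# Generate a response for each prompt, one pipeline call per prompt
batch_results = []
for prompt in prompts:
    result = generator(
        prompt,
        # Note: in this run the pipeline also sets max_new_tokens=256 by default,
        # which takes precedence over max_length (see the log below).
        max_length=80,
        temperature=0.7,
        do_sample=True,
        num_return_sequences=1,
        pad_token_id=generator.tokenizer.eos_token_id  # GPT-2 has no pad token
    )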
    batch_results.append(result[0]['generated_text'])

# Display results
for i, (prompt, result) in enumerate(zip(prompts, batch_results), 1):
    print(f"Prompt {i}: {prompt}")
    print(f"Generated: {result}")
    print("=" * 70)

print("\n🎉 Text generation examples completed!")
print("Try experimenting with different prompts and parameters!")

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Batch Text Generation Results:





Prompt 1: The benefits of renewable energy include
Generated: The benefits of renewable energy include greater efficiency, lower costs of maintenance, and greater access to clean water.

The US Department of Energy recently announced that it will spend $5.3 billion to develop a new coal-fired power plant in California, a project that will produce more than 5 gigawatts of power by 2020. That will generate enough electricity to power 1.2 million homes, about 1 percent of the nation's total population.

The US Department of Energy also announced in June that it will be developing a new renewable energy plant in Kentucky in the coming years. It will produce enough electricity to power 1.2 million homes, about 1 percent of the nation's total population.

The US Department of Energy has worked to create a clean energy economy in the U.S., and it is working with industry to develop and implement a cleaner energy economy. This is an important step toward making it easier for Americans to cut c
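
The output above runs well past 80 tokens because, as the log reports, `max_new_tokens` took precedence over `max_length`. The two limits also count differently: `max_length` covers the prompt plus the continuation, while `max_new_tokens` covers only the continuation. A minimal sketch of that arithmetic, using a hypothetical helper (`continuation_budget` is not a `transformers` function, and the prompt length of 7 tokens is an assumed value):

```python
def continuation_budget(prompt_tokens, max_length=None, max_new_tokens=None):
    """Return how many tokens may be generated after the prompt.

    Hypothetical helper mirroring the precedence reported in the log:
    max_new_tokens wins when both are set; max_length includes the prompt.
    Simplification: a prompt longer than max_length yields a budget of 0.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(max_length - prompt_tokens, 0)
    raise ValueError("set max_length or max_new_tokens")


# Assumed prompt length of 7 tokens, for illustration only.
print(continuation_budget(7, max_length=80, max_new_tokens=256))  # 256
print(continuation_budget(7, max_length=80))                      # 73
```

Passing only `max_new_tokens=...` to the pipeline call is one way to get a predictable continuation length and avoid the warning.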

## Example 2: Basic Text Generation with Pipeline

In [3]:
# Create a text generation pipeline using GPT-2
generator = pipeline('text-generation', model='gpt2')

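# Generate two continuations of the same prompt
prompt = "The future of artificial intelligence is"
# do_sample=True makes the sampling explicit: greedy decoding cannot return
# two different sequences. As the log below shows, the pipeline's default
# max_new_tokens=256 takes precedence over max_length in this run.
result = generator(prompt, max_length=100, num_return_sequences=2, do_sample=True)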

for i, text in enumerate(result):
    print(f"Generation {i+1}:")
    print(text['generated_text'])
    print("-" * 50)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Generation 1:
The future of artificial intelligence is rapidly approaching. If you're an AI that needs to learn how to learn about the world around you, then you're probably pretty good at it right now. It's the same as it's always been, but it needs to learn new things.

In my opinion, the future of artificial intelligence is rapidly approaching. If you're an AI that needs to learn how to learn about the world around you, then you're probably pretty good at it right now. It's the same as it's always been, but it needs to learn new things. The world is so vastly different now than it was 2 years ago. You have the internet, you have internet connectivity, you have a global transportation network, you have a lot of things that are completely different in every single way. There is no need to learn that much. The world is so vastly different now than it was 2 years ago. You have the internet, you have internet connectivity, you have a global transportation network, you have a lot of thing
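
Generation 1 above repeats whole sentences, a common failure mode for small models. One rough way to quantify it is the share of unique n-grams in the text, often called distinct-n. The sketch below is a simplified version that assumes whitespace tokenization; `distinct_n` is a hypothetical helper, not a library function:

```python
def distinct_n(text, n=2):
    """Ratio of unique n-grams to total n-grams (1.0 means no repeated n-gram)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:  # text shorter than n tokens
        return 0.0
    return len(set(ngrams)) / len(ngrams)


print(distinct_n("a b c d"))      # 1.0
print(distinct_n("a b a b a b"))  # 0.4
```

A low score suggests heavy repetition. On the generation side, `generate` accepts `repetition_penalty` and `no_repeat_ngram_size`, which are worth experimenting with when the score drops.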

## Example 3: Controlling Generation Parameters

In [4]:
# Same prompt, same sampling strategy, two temperature settings
prompt = "Once upon a time in a magical forest,"

# Creative generation (higher temperature: flatter distribution, more varied text)
creative = generator(
    prompt,
    max_length=120,
    temperature=0.8,
    do_sample=True,
    pad_token_id=generator.tokenizer.eos_token_id
)

print("Creative Generation (temperature=0.8):")
print(creative[0]['generated_text'])
print("\n" + "=" * 60 + "\n")

# Conservative generation (lower temperature: sharper distribution, more predictable text)
conservative = generator(
    prompt,
    max_length=120,
    temperature=0.3,
    do_sample=True,
    pad_token_id=generator.tokenizer.eos_token_id
)

print("Conservative Generation (temperature=0.3):")
print(conservative[0]['generated_text'])

Both `max_new_tokens` (=256) and `max_length`(=120) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Creative Generation (temperature=0.8):
Once upon a time in a magical forest, the Shadow of the Forest dwellers in this region discovered, the world had a very different future. In the year 646, a mysterious power, known as the Shadow of the Forest, had begun to awaken in the minds of children. From this moment on, the Shadow of the Forest remained the same in the minds of the children.

The children of Light, who lived in the Shadow of the Forest, became very good at fighting and winning against darkness and darkness. The Shadow of the Forest then became known as the Shadow of the Land.

In the years 649, the shadows of the Forest grew. From these dark shadows, the children of Light began to become strong. They fought by battle and they lost. In 654, they finally conquered the Shadow of the Forest. The children of Light became powerful in the Dark Realms and they were able to defeat the Shadow of the Forest and defeat the Shadow of Light.

The Shadow of the Forest became a powerful and
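
Temperature works by dividing the model's raw scores (logits) before they are turned into probabilities. A minimal sketch of that step in plain Python, where the three logits are hypothetical values rather than real GPT-2 outputs:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.3, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top-scoring token, which is why the 0.3 setting tends to read as more conservative than the 0.8 setting.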

## Example 4: Manual Model Usage with Advanced Control

In [5]:
# Load model and tokenizer separately for more control
model_name = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Set padding token
tokenizer.pad_token = tokenizer.eos_token

# Generate with top-k and top-p sampling
prompt = "Python programming is"
# Calling the tokenizer directly returns input_ids together with an attention_mask
inputs = tokenizer(prompt, return_tensors='pt')

with torch.no_grad():
    outputs = model.generate(
        **inputs,                # pass input_ids and attention_mask explicitly
        max_length=100,          # total length: prompt tokens + generated tokens
        num_return_sequences=2,
        temperature=0.7,         # values below 1 sharpen the distribution
        top_k=50,                # keep only the 50 most likely tokens
        top_p=0.9,               # then keep the smallest set covering 90% of the probability
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

print("Generated texts with top-k and top-p sampling:")
for i, output in enumerate(outputs):
    text = tokenizer.decode(output, skip_special_tokens=True)
    print(f"\nGeneration {i+1}:")
    print(text)



Generated texts with top-k and top-p sampling:

Generation 1:
Python programming is just a matter of writing and reading code.

It is not as if you can write a program that does nothing but read a file and make it readable. You can't, and you can't make it do anything.

That is, until now.

The first step is to figure out how to make your code readable.

I am going to assume that you already know what a file is and where it lives in your computer's memory.

Generation 2:
Python programming is not for everyone. It has its strengths and weaknesses, but there are certain techniques that will make you a better programmer.

If you're like me, you've been using Ruby for years, and you know how to use it to solve your problems. If you've been programming for a long time, you probably have some basic programming knowledge. If you've been programming for just a few months, you probably know some of the basics, and you probably don't need to know
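
The `top_k` and `top_p` arguments both shrink the set of tokens the sampler may choose from. The sketch below shows the idea on a hand-written distribution. The function name and the probabilities are hypothetical, and the real implementation operates on logits tensors rather than dictionaries:

```python
import random


def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Return a renormalized {token: prob} dict after top-k, then top-p filtering."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep the k most likely tokens
    mass = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / mass   # cumulative share of the surviving mass
        if cumulative >= top_p:  # smallest prefix that reaches top_p
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.8)
print(filtered)

# Draw one token from the filtered distribution.
rng = random.Random(0)
print(rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0])
```

With these numbers, top-k drops `zebra` and top-p then drops `an`, so only `the` and `a` remain candidates.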


## Example 5: Conditional Text Generation

In [6]:
def generate_contextual_response(context, question, max_length=150):
    """Generate contextual responses based on given context"""
    prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    
    result = generator(
        prompt,
        max_length=max_length,
        temperature=0.6,
        do_sample=True,
        pad_token_id=generator.tokenizer.eos_token_id
    )
    
    return result[0]['generated_text']

# Test different contexts
contexts = [
    "Machine learning is a subset of artificial intelligence that enables computers to learn from data.",
    "Climate change refers to long-term shifts in global temperatures and weather patterns.",
    "Renewable energy comes from natural sources that replenish themselves over time."
]

questions = [
    "How does it work?",
    "What are the main causes?", 
    "What are the benefits?"
]

print("Contextual Text Generation Examples:\n")
for context, question in zip(contexts, questions):
    response = generate_contextual_response(context, question)
    print(f"Context: {context[:50]}...")
    print(f"Question: {question}")
    print(f"Generated Answer: {response.split('Answer:')[-1].strip()}")
    print("-" * 80)

Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Contextual Text Generation Examples:





Context: Machine learning is a subset of artificial intelli...
Question: How does it work?
Generated Answer: Machine learning is a subset of artificial intelligence that enables computers to learn from data.

Question: How does it work
--------------------------------------------------------------------------------




Context: Climate change refers to long-term shifts in globa...
Question: What are the main causes?
Generated Answer: The main contributors of the observed changes in climate are the anthropogenic emissions of greenhouse gases, and the global warming caused by human activities.

Question: What are the main causes of the observed changes
--------------------------------------------------------------------------------
Context: Renewable energy comes from natural sources that r...
Question: What are the benefits?
Generated Answer: The benefits are obvious. But the question is whether or not we should consider energy as a resource.

Question: What
--------------------------------------------------------------------------------
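
Each answer above trails off into a new `Question:` line, because a base model like GPT-2 simply keeps continuating the prompt pattern. A common post-processing step is to drop the echoed prompt and cut the text at the first stop sequence. The sketch below uses a hypothetical helper, `extract_answer`, and assumes the stop markers follow the prompt template used in this example:

```python
def extract_answer(generated_text, prompt,
                   stop_sequences=("\n\nQuestion:", "\n\nContext:")):
    """Drop the echoed prompt, then cut at the earliest stop sequence."""
    if generated_text.startswith(prompt):
        answer = generated_text[len(prompt):]
    else:
        answer = generated_text
    cut = len(answer)
    for stop in stop_sequences:
        index = answer.find(stop)
        if index != -1:
            cut = min(cut, index)
    return answer[:cut].strip()


prompt = "Context: Renewable energy...\n\nQuestion: What are the benefits?\n\nAnswer:"
generated = prompt + " The benefits are obvious.\n\nQuestion: What"
print(extract_answer(generated, prompt))  # The benefits are obvious.
```

The text-generation pipeline also accepts `return_full_text=False`, which returns only the continuation and makes the prompt-stripping step unnecessary.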
