# Chapter 8: End-to-End Example

In this chapter, we work through a text summarization case study to show what Generative AI can do in practice. Along the way, we compare an open-source model with a proprietary one in terms of performance, cost, and ease of use.

---

## Case Study: Text Summarization

Text summarization aims to produce a concise summary that preserves the key points of a given text. It has applications in:
- News articles.
- Research papers.
- Emails and other documents.

### Comparison of Models
1. **Open-Source Model**: BLOOM (BigScience)
   - Open weights (released under the BigScience RAIL license) and multilingual.
   - Highly customizable: the weights can be fine-tuned and self-hosted.
2. **Proprietary Model**: OpenAI GPT (via API)
   - Strong performance out of the box, with no training step.
   - Usage-based pricing.

---

## Code Examples

### Example 1: Fine-Tuning BLOOM for Summarization
```python
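from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Load a 1% slice of CNN/DailyMail and the BLOOM tokenizer
dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# Separator between article and summary. The wording is an arbitrary choice,
# but the same string must be reused in the prompt at inference time.
SEPARATOR = "\n\nSummary: "

def preprocess_function(examples):
    # BLOOM is a decoder-only model, so the article and its summary are joined
    # into one sequence. Labels must have the same length as input_ids.
    articles = tokenizer(examples["article"], max_length=512, truncation=True)
    summaries = tokenizer(examples["highlights"], max_length=150, truncation=True)
    sep_ids = tokenizer(SEPARATOR)["input_ids"]

    model_inputs = {"input_ids": [], "attention_mask": [], "labels": []}
    for article_ids, summary_ids in zip(articles["input_ids"], summaries["input_ids"]):
        prompt_ids = article_ids + sep_ids
        target_ids = summary_ids + [tokenizer.eos_token_id]
        model_inputs["input_ids"].append(prompt_ids + target_ids)
        model_inputs["attention_mask"].append([1] * (len(prompt_ids) + len(target_ids)))
        # -100 masks the article tokens so the loss covers only the summary
        model_inputs["labels"].append([-100] * len(prompt_ids) + target_ids)
    return model_inputs

tokenized_datasets = dataset.map(
    preprocess_function, batched=True, remove_columns=dataset.column_names
)

# Load BLOOM model
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Define training arguments. No eval_dataset is passed to the Trainer below,
# so per-epoch evaluation is left disabled.
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    save_total_limit=2,
)

# Initialize Trainer. The collator pads each batch to a common length and
# pads the labels with -100 so padding is ignored by the loss.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets,
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer),
)

# Fine-tune the model
trainer.train()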
```
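
With `per_device_train_batch_size=4`, sequences of different lengths have to be padded to a common length before they can be stacked into one tensor. The sketch below illustrates that step in plain Python. It is not the `transformers` implementation: the helper name `pad_batch`, the toy token ids, and `pad_token_id=0` are all assumptions made for illustration, and it assumes each example's `labels` has the same length as its `input_ids`.

```python
from typing import Dict, List

# -100 is the default ignore_index of PyTorch's cross-entropy loss
IGNORE_INDEX = -100


def pad_batch(
    examples: List[Dict[str, List[int]]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Right-pad tokenized examples to the length of the longest one (illustrative helper)."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for ex in examples:
        n_real = len(ex["input_ids"])
        n_pad = max_len - n_real
        batch["input_ids"].append(ex["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should not attend to
        batch["attention_mask"].append([1] * n_real + [0] * n_pad)
        # Padded label positions get IGNORE_INDEX so they add nothing to the loss
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * n_pad)
    return batch


# Toy token ids; pad_token_id=0 is a placeholder, not BLOOM's real pad id
examples = [
    {"input_ids": [5, 6, 7], "labels": [5, 6, 7]},
    {"input_ids": [8, 9], "labels": [8, 9]},
]
batch = pad_batch(examples, pad_token_id=0)
print(batch["input_ids"])
print(batch["attention_mask"])
print(batch["labels"])
```

In a real training run this job is normally left to a data collator from the library rather than written by hand.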

### Example 2: Using OpenAI GPT for Summarization
```python
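from openai import OpenAI

# Set your OpenAI API key (this form requires openai Python package 1.0 or later)
client = OpenAI(api_key="your-api-key")

# Summarize text using the Completions endpoint.
# text-davinci-003 has been retired; gpt-3.5-turbo-instruct is used here as a
# stand-in, so substitute a completions-capable model available to your account.
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Summarize the following text:\n\n"
    + "Generative AI is revolutionizing many industries...",
    max_tokens=100,
)

print("Generated Summary:", response.choices[0].text.strip())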
```

---

## Comparison Metrics
| Metric            | Open-Source (BLOOM)            | Proprietary (OpenAI GPT) |
|-------------------|--------------------------------|--------------------------|
| **Cost**          | Free (compute costs only)      | Usage-based pricing      |
| **Ease of Use**   | Requires setup and fine-tuning | Ready-to-use API         |
| **Performance**   | High (with fine-tuning)        | State-of-the-art         |
| **Customization** | Fully customizable             | Limited customization    |
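
The cost row can be made concrete with some back-of-the-envelope arithmetic. The sketch below compares a usage-priced API with a self-hosted model for a fixed workload. Every number in it (document count, token counts, per-token prices, GPU hours, GPU hourly rate) is a hypothetical placeholder rather than a real price, and the function names are made up for this example; plug in your provider's current rates before drawing conclusions.

```python
def api_cost(
    num_docs: int,
    input_tokens_per_doc: int,
    output_tokens_per_doc: int,
    price_per_1k_input: float,
    price_per_1k_output: float,
) -> float:
    """Total cost of summarizing num_docs documents through a usage-priced API."""
    per_doc = (
        input_tokens_per_doc / 1000 * price_per_1k_input
        + output_tokens_per_doc / 1000 * price_per_1k_output
    )
    return num_docs * per_doc


def self_hosted_cost(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Compute-only cost of fine-tuning and serving an open-source model."""
    return gpu_hours * price_per_gpu_hour


# Hypothetical workload and prices, for illustration only
api_total = api_cost(
    num_docs=10_000,
    input_tokens_per_doc=800,
    output_tokens_per_doc=100,
    price_per_1k_input=0.0015,
    price_per_1k_output=0.002,
)
hosted_total = self_hosted_cost(gpu_hours=5, price_per_gpu_hour=1.20)

print(f"API: ${api_total:.2f}  self-hosted: ${hosted_total:.2f}")
```

The comparison leaves out engineering time for setup and fine-tuning, which the "Ease of Use" row suggests can be the larger cost for small workloads.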

---

## Quiz

1. Which open-source model is used in this example for text summarization?
   - A. GPT-3
   - B. BLOOM
   - C. BERT

2. What is a key benefit of using proprietary models?
   - A. No setup required.
   - B. Fully customizable.
   - C. Free to use.

3. Name two advantages of fine-tuning an open-source model.

---

### Answers:
1. **B**: BLOOM
2. **A**: No setup required.
3. **Advantages**:
   - Improved performance for domain-specific tasks.
   - Control over data and customization.

---

## Exercise

### Task:
1. Fine-tune BLOOM on a custom dataset (e.g., product reviews or blog posts).
2. Compare the summaries generated by BLOOM and OpenAI GPT for the same input.

---

### Example Solution:
```python
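# Compare summaries for the same text input.
# Assumes `tokenizer`, `model`, and `response` from the examples above are in scope.
text = "Generative AI is transforming industries such as healthcare, finance, and education..."

# BLOOM summary (using fine-tuned model).
# The prompt suffix should match the format used during fine-tuning.
prompt = text + "\n\nSummary: "
bloom_input = tokenizer(prompt, return_tensors="pt").to(model.device)
# max_new_tokens limits only the generated text; max_length would also count the prompt
bloom_output = model.generate(**bloom_input, max_new_tokens=100)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = bloom_output[0][bloom_input["input_ids"].shape[1]:]
print("BLOOM Summary:", tokenizer.decode(new_tokens, skip_special_tokens=True).strip())

# OpenAI GPT summary. This reuses `response` from Example 2; for a like-for-like
# comparison, rerun that request with `text` in the prompt first.
print("OpenAI GPT Summary:", response.choices[0].text.strip())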
```
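
Reading two summaries side by side is subjective, so it can help to score each against a reference summary. The sketch below computes a simplified unigram-overlap F1 in the spirit of ROUGE-1. It is a minimal illustration rather than the official ROUGE implementation (no stemming, and tokenization is a crude regular expression), and the reference and candidate strings are invented for the example.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and model outputs, for illustration only
reference = "Generative AI is transforming healthcare, finance, and education."
summary_a = "Generative AI is transforming healthcare and finance."
summary_b = "AI changes many industries."

print(f"Summary A: {rouge1_f1(summary_a, reference):.3f}")
print(f"Summary B: {rouge1_f1(summary_b, reference):.3f}")
```

For CNN/DailyMail, the `highlights` field can serve as the reference. Overlap scores reward shared wording, not factual accuracy, so they complement a manual read rather than replace it.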

---
