# GPT‑2 Experiments – Tasks 2.2 → 2.4
Optimised for **Google Colab** with GPU acceleration.

*Created 2025-08-08 08:35 UTC*

## 🔧 Environment setup
Run the next cell **once** to install required libraries.

In [1]:
!pip -q install -U "transformers[torch]" datasets accelerate pandas peft

## ⚙️ Utilities
Detect GPU / CPU & set common helpers.

In [2]:

import torch, json, itertools, re, textwrap, os, pathlib
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, pipeline,
    set_seed, DataCollatorForLanguageModeling,
    Trainer, TrainingArguments
)
from datasets import Dataset, load_from_disk
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


  from .autonotebook import tqdm as notebook_tqdm


Using device: cpu


## 📝 Task 2.2 – Generate diverse samples with *gpt2*

In [3]:

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else None
)

prompts = [
    "Why do some programmers prefer tabs over spaces?",
    "The sunset painted the sky",
    "Once upon a time, in a land far away,",
    "Explain recursion to a ten‑year‑old.",
    "In 2050, energy will"
]

param_grid = [
    {"max_new_tokens": 60, "temperature": 0.7, "top_p": 0.95, "top_k": 50},
    {"max_new_tokens": 60, "temperature": 1.1, "top_p": 0.90, "top_k": 0},
]

set_seed(42)
records = []
for p, cfg in itertools.islice(itertools.product(prompts, param_grid), 7):
    input_ids = tokenizer(p, return_tensors="pt").input_ids.to(device)
    out_ids = model.generate(input_ids, **cfg)
    gen = tokenizer.decode(out_ids[0][input_ids.size(-1):], skip_special_tokens=True)
    rec = {"prompt": p, **cfg, "output": gen}
    records.append(rec)
print(json.dumps(records, indent=2))


The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


RuntimeError: Placeholder storage has not been allocated on MPS device!

## 📝 Task 2.3 – Compare *gpt2* vs *gpt2‑medium*

In [4]:

prompts_cmp = [
    "Design a morning routine for a remote developer:",
    "Svelte or React – which one fits better for fast prototypes?",
    "The secret life of coffee beans begins",
    "Quantum computing will disrupt",
    "Write a haiku about open‑source."
]
models = ["gpt2", "gpt2-medium"]
gen_args = dict(max_new_tokens=80, temperature=0.8, top_p=0.9)

for m in models:
    print(f"\n### {m.upper()} ###")
    pipe = pipeline("text-generation", model=m, device=device.index if device.type=="cuda" else -1,
                    torch_dtype=torch.float16 if torch.cuda.is_available() else None)
    for pr in prompts_cmp:
        out = pipe(pr, **gen_args)[0]["generated_text"]
        print(f"\nPrompt: {pr}\n{out[len(pr):]}")



### GPT2 ###


Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: Design a morning routine for a remote developer:


Create a task that contains a list of items. Create a task that contains a list of tasks. Create a task that contains a list of tasks. Create a task that contains a list of tasks. Create a task that contains a list of tasks. Create a task that contains a list of tasks. Create a task that contains a list of tasks. Create a task that contains a list of


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: Svelte or React – which one fits better for fast prototypes?
 And what about ReactJS and JSX?

The two biggest problems we've seen so far have been React and AngularJS. Both of those are frameworks built with JavaScript. As a result, the latter is probably the most widely used. It's also the most popular JavaScript framework. AngularJS is the only one that you should be using. But it's not the only one.




Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: The secret life of coffee beans begins
 when the secret life of your coffee bean is your coffee beans.

What is Coffee Bean?

Coffee beans are small, dry beans made from the coffee bean that grow in the ground and are then used to make coffee. Coffee beans are known as "spare beans" because they contain a large amount of caffeine.

The coffee beans that you consume as a coffee drink


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: Quantum computing will disrupt
 the way we communicate with each other. This will be a huge boon to the economy as a whole.

For example, if you have a group of people that is going to be working on this quantum computer system, they will be working on an interplanetary computer system that could be used to store information for a long time.

There are lots of things that could happen that could

Prompt: Write a haiku about open‑source.
 You'll hear it on the radio and get it on the TV.

In 2013, the OpenStack project came under fire for not using OpenSSL for the storage of OpenStack Cloud storage, which allows for the deployment of all OpenStack cloud storage infrastructure, such as Hyper-V, Red Hat Enterprise Linux, and Red Hat Enterprise Linux 10.

The OpenStack project has always been

### GPT2-MEDIUM ###


config.json:   0%|          | 0.00/718 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.52G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: Design a morning routine for a remote developer:


1. Create a new Google Doc (preferably one with a title of "My morning routine" as it is easier to remember and read).

2. Copy and paste this URL into your Google Doc: https://docs.google.com/spreadsheets/d/1nJd9OvB4qfO3D4zS2Z8n


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: Svelte or React – which one fits better for fast prototypes?


I'm not sure if the best answer is "react" or "react-dom".

In my opinion, a React version would be more powerful and stable.

I also like the approach of the React team to handle the development of the platform from the very beginning.

I think it is important to keep a clear direction of the development of the platform and don't


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: The secret life of coffee beans begins
 during the process of brewing the beans, which usually takes from 10 to 15 minutes. Coffee beans, when fully ripe, contain up to 100 to 200 milligrams of caffeine. The caffeine is converted to acetaldehyde, which is then released into the atmosphere.

When the beans are ready, the beans are heated until they are just below 180 degrees Fahrenheit, then cooled to between 55 and 70


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Prompt: Quantum computing will disrupt
 the world of finance," said Mark Zuckerberg, the CEO of Facebook, which announced the $100 billion plan to fund the development of quantum computers.

"In an era where everyone is talking about privacy, security, security, and security, what do you do when you're looking at billions of dollars at your disposal?" he asked.

But the biggest challenge for quantum computing may be getting

Prompt: Write a haiku about open‑source.


A short story about how open source started.

The process of creating a free, open source web server.

A short story about how a small, independent developer got started and has become a global brand.

How to use git, Mercurial and the Bitbucket project to build a web server.

A short story about how one company is building a


## 📝 Task 2.4 – Fine‑tune *gpt2* on custom chat dataset
First, upload `data/freecodecamp_casual_chatroom.csv` to `/content/data/` (Colab left‑sidebar ➜ Files ➜ Upload).

In [5]:
!pip install gdown




In [6]:
!gdown --id 1pS1hl9Iw5Y1jaFIQvQjzMyH-7P_2mQpF --output freecodecamp_chat.csv


Downloading...
From (original): https://drive.google.com/uc?id=1pS1hl9Iw5Y1jaFIQvQjzMyH-7P_2mQpF
From (redirected): https://drive.google.com/uc?id=1pS1hl9Iw5Y1jaFIQvQjzMyH-7P_2mQpF&confirm=t&uuid=c48ac12c-aa88-4b2b-81e8-b82aa906c383
To: /content/freecodecamp_chat.csv
100% 2.69G/2.69G [00:37<00:00, 71.2MB/s]


In [7]:

csv_path = pathlib.Path("freecodecamp_chat.csv")
assert csv_path.exists(), "Upload the CSV first!"

import pandas as pd
df = pd.read_csv(csv_path)

def clean_html(x: str) -> str:
    return re.sub(r"<[^>]+>", "", str(x)).strip()

texts = df["text"].fillna("").map(clean_html)
dataset = Dataset.from_dict({"text": texts})
dataset = dataset.filter(lambda ex: ex["text"] != "")

save_dir = "/content/data/chat_ds"
dataset.save_to_disk(save_dir)
print("Dataset saved:", save_dir, "— size:", len(dataset))


  df = pd.read_csv(csv_path)


Filter:   0%|          | 0/5057400 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/5035021 [00:00<?, ? examples/s]

Dataset saved: /content/data/chat_ds — size: 5035021


### 🚂 Fine‑tuning (LoRA for speed)
LoRA dramatically reduces memory & time cost. Feel free to increase epochs/batch size on a T4/V100/A100.

In [9]:
# --- Fast LoRA fine-tune GPT-2 (10k samples, larger batch, checkpointing) ---

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_from_disk
from peft import LoraConfig, get_peft_model
import torch

# 1️⃣ base model
base_model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else None,
)

# 2️⃣ LoRA config
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["c_attn"],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model_lora = get_peft_model(base_model, lora_config)

# 3️⃣ tokenizer + pad token
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
base_model.resize_token_embeddings(len(tokenizer))

# 4️⃣ load dataset, keep first 10k rows, tokenize with trunc+pad
ds = load_from_disk("/content/data/chat_ds").select(range(10000))

def tok(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        max_length=1024,
        padding="max_length",
    )

tok_ds = ds.map(tok, batched=True, remove_columns=["text"])

# 5️⃣ collator
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# 6️⃣ training args
args = TrainingArguments(
    output_dir="/content/gpt2-finetuned-chat",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    fp16=torch.cuda.is_available(),
    optim="adamw_torch",
    logging_steps=100,
    save_strategy="no",
    report_to="none",
    gradient_checkpointing=True,
)

# 7️⃣ train
trainer = Trainer(
    model=model_lora,
    args=args,
    train_dataset=tok_ds,
    data_collator=collator,
)
trainer.train()

# 8️⃣ save
trainer.save_model("/content/gpt2-finetuned-chat")
tokenizer.save_pretrained("/content/gpt2-finetuned-chat")
print("Fine-tuned model saved to /content/gpt2-finetuned-chat")




Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

### ✨ Sample from the fine‑tuned model

In [None]:

pipe_ft = pipeline("text-generation", model="/content/gpt2-finetuned-chat",
                   tokenizer=tokenizer,
                   device=device.index if device.type=="cuda" else -1,
                   torch_dtype=torch.float16 if torch.cuda.is_available() else None)
print(pipe_ft("Any good JavaScript tips for beginners?", max_new_tokens=60)[0]["generated_text"])
