In [1]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# Go through data

In [2]:
df = pd.read_csv("/kaggle/input/filtered-ielts-writing-dataset/dataset_for_generating_evaluation.csv")
# Get essay with band greater than 5.5
df = df[df["computed_band"] >= 6.0][["prompt", "essay"]]

In [3]:
df

Unnamed: 0,prompt,essay
0,Interviews form the basic criteria for most la...,It is believed by some experts that the tradit...
2,Interview form the basic selection criteria fo...,The interview section is the most vital part o...
11,Interviews form the basic selection criteria f...,It is undeniable that most companies rely on i...
12,Interviews form the basic selecting criteria f...,"Nowadays, most companies employ workers after ..."
14,Interviews form the basic selection criteria f...,Interviews are commonly used as a way to scree...
...,...,...
10269,"As well as making money , businesses also have...",Businesses have sets of principles. Earning pr...
10270,"As well as making money, businesses also have ...",It is true that businesses need to make a prof...
10271,"As well as making money, businesses also have ...",The role of companies is to produce all the go...
10272,"As well as making money, businesses also have ...",Although earning money is one of the most impo...


In [4]:
df = df.reset_index()

In [5]:
df

Unnamed: 0,index,prompt,essay
0,0,Interviews form the basic criteria for most la...,It is believed by some experts that the tradit...
1,2,Interview form the basic selection criteria fo...,The interview section is the most vital part o...
2,11,Interviews form the basic selection criteria f...,It is undeniable that most companies rely on i...
3,12,Interviews form the basic selecting criteria f...,"Nowadays, most companies employ workers after ..."
4,14,Interviews form the basic selection criteria f...,Interviews are commonly used as a way to scree...
...,...,...,...
6383,10269,"As well as making money , businesses also have...",Businesses have sets of principles. Earning pr...
6384,10270,"As well as making money, businesses also have ...",It is true that businesses need to make a prof...
6385,10271,"As well as making money, businesses also have ...",The role of companies is to produce all the go...
6386,10272,"As well as making money, businesses also have ...",Although earning money is one of the most impo...


In [6]:
step = 5000
for i in range(500, len(df), step):
    print("Prompt:", df["prompt"][i])
    print("Essay:", df["essay"][i])


Prompt: Some people believe that the government should take care of old people and provide financial support after they retire. Others say individuals should save during their working years to fund their own retirement. What is your opinion? Give reasons for your answer and include examples from your own experience.
Essay: People have different views on whether a government should support the elderly and retired people financially or not. I believe that it is mostly an individual’s duty to save funds for their retirement, but I totally disagree that elderly people shouldn't receive any support from the state. The combination of personal support and the government’s assistance could be the best possible solution for the retired elderly people.

I think the regime should support the elderly people financially as this is the part of a social democracy which states the equality of opportunities and distribution of resources fairly. For example, many developed countries like Germany, the Un

In [7]:
n_test = 40

# test set: last 40 rows
test_df = df.iloc[-n_test:].reset_index(drop=True)

# train set: all the other rows
train_df = df.iloc[:-n_test].reset_index(drop=True)

In [8]:
test_df

Unnamed: 0,index,prompt,essay
0,10208,Some people think the best way to solve enviro...,It is commonly believed by many that inflating...
1,10209,Some people think that one of the best ways to...,The earth's average surface temperatures are i...
2,10210,Some people believe that the government should...,Some individuals suppose that the government s...
3,10211,Some people believe that the government should...,"Over the last 100 years, population has grown ..."
4,10212,Some people believe that it is the government’...,Some people believe that the government should...
5,10214,Some people think that the main purpose of sch...,Some people are of the opinion that the primar...
6,10215,Some people think the main purpose of school i...,The young generation of each country is its ma...
7,10216,Some people think the main purpose of school i...,there are several arguments about the institut...
8,10223,Some people believe that teenager should be re...,There are a lot of young people who go to do u...
9,10224,Some people think that all teenagers should be...,"Many youngsters work on a volunteer basis, and..."


In [9]:
text = "Prompt: " + test_df["prompt"][2] + "\nEssay: " + test_df["essay"][2]

In [10]:
print(text)

Prompt: Some people believe that the government should not spend money on international aid when they have their own disadvantaged people like homeless and unemployed. To what extent do you agree or disagree?
Essay: Some individuals suppose that the government should allocate financial resources to other impoverished countries while others believe that it is of paramount importance to focus on solving their domestic issues. From my point of view, I strongly agree with the proposal of determining national priority on internal problems.

On the one hand, it is undeniable that cross-border support is a symbol of humanity. Geographically, people are divided into different ethnicities but by nature, it will be the core of connecting features among individuals. Thus, aiding the poor stems from the consciousness and heart of each resident and is not solely contingent on monetary contributions. Not to mention that helping other nations to encounter  financial crises would eliminate the chanc

## Custom Dataset

In [11]:
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import GPT2TokenizerFast, GPT2LMHeadModel, Trainer, TrainingArguments, DataCollatorForLanguageModeling, GPT2Config

2025-05-23 07:47:07.687672: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747986427.883812      19 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747986427.941660      19 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


In [12]:
class PromptEssayDataset(Dataset):
    def __init__(self, dataframe, tokenizer, max_length=512):
        self.tokenizer = tokenizer
        self.max_length = max_length
        self.examples = []
        for _, row in dataframe.iterrows():
            prompt = row['prompt'].strip()
            essay = row['essay'].strip()
            # Combine prompt and essay with EOS separators
            text = prompt + tokenizer.eos_token + essay + tokenizer.eos_token
            # Tokenize + pad + truncate in one go (fast tokenizer)
            enc = tokenizer(
                text,
                truncation=True,
                max_length=self.max_length,
                padding='max_length',
                return_tensors='pt'
            )
            self.examples.append({
                'input_ids': enc['input_ids'].squeeze(),
                'attention_mask': enc['attention_mask'].squeeze(),
                'labels': enc['input_ids'].squeeze().clone()
            })

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]

## Get tokenizer of GPT2

In [13]:

# Import tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
# Add the EOS token if missing
if tokenizer.eos_token is None:
    tokenizer.add_special_tokens({'eos_token': ''})

tokenizer.pad_token = tokenizer.eos_token
dataset = PromptEssayDataset(df, tokenizer, max_length=512)

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/718 [00:00<?, ?B/s]

# Model

In [14]:
from transformers import GPT2Config

# Load cogfig of gpt2
config = GPT2Config.from_pretrained("gpt2-medium", loss_type="causal_lm")

""" Load model with config """
model = GPT2LMHeadModel.from_pretrained("gpt2-medium", config=config)

if tokenizer.eos_token_id >= model.config.vocab_size:
    model.resize_token_embeddings(len(tokenizer))


model.safetensors:   0%|          | 0.00/1.52G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

## Trainer api

In [15]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./gpt2-essay-finetuned",
    overwrite_output_dir=True,
    num_train_epochs=0,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_steps=100,
    logging_dir="./logs",
    logging_steps=50,
    save_steps=500,
    save_total_limit=2,
    fp16=torch.cuda.is_available(),
    report_to="none",
)

# 4. Data collator (just handles LM labels)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# 5. Trainer setup
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=data_collator
)


## Train model

In [16]:
print("Start Training")
trainer.train()
print("Train OK!")

Start Training


Step,Training Loss


Train OK!


## Save model

In [17]:
# Save
trainer.save_model("./gpt2-essay-finetuned")

## Call model if have

In [18]:
device = 'cuda'

In [19]:
# This model i finetune like above in anorther version
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("/kaggle/input/fine-tunedgpt2/gpt2-essay-finetuned")
model = AutoModelForCausalLM.from_pretrained("/kaggle/input/fine-tunedgpt2/gpt2-essay-finetuned").to(device)

# Generation Function

In [20]:
def generate_essay(prompt: str, device = 'cuda', max_length: int = 512):
    input_ids = tokenizer.encode(prompt + tokenizer.eos_token, return_tensors="pt").to(device)
    output = model.generate(
        input_ids,
        max_length=input_ids.shape[-1] + max_length,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.8,
        num_return_sequences=1
    )
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    return text[len(prompt):]



# Result

In [21]:
prompt = "Write an essay about the importance of artificial intelligence in modern society."
continuation = generate_essay(prompt)
print("Prompt:\n", prompt)
print("\nGenerated essay:\n", continuation)


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Prompt:
 Write an essay about the importance of artificial intelligence in modern society.

Generated essay:
 While the use of Artificial Intelligence in our daily lives has become an increasingly popular topic of debate, it is important to recognize the benefits it brings to society. This essay will explore some of the significant benefits AI has brought to society and shed light on the potential pitfalls it may pose.

As technology advances, artificial intelligence has become increasingly sophisticated and sophisticated at the same time. While there are a variety of benefits that AI has brought to people, I believe that the most significant benefits have come from its potential for abuse. For example, AI can be manipulated to mimic human behaviors that are not necessarily based on logic and reason. This poses a threat to the civil liberties of citizens and has even led to the potential for civil unrest.

Conversely, AI can also be utilized for good, especially in areas that requi

In [22]:
prompt = "The best way to solve world’s environmental problem is to increase the cost of fuel for cars and other vehicles. To what extent do you agree or disagree?"
continuation = generate_essay(prompt)
print("Prompt:\n", prompt)
print("\nGenerated essay:\n", continuation)


Prompt:
 The best way to solve world’s environmental problem is to increase the cost of fuel for cars and other vehicles. To what extent do you agree or disagree?

Generated essay:
 One of the most urgent issues facing the world today is the ever-increasing number of air pollution. While some individuals believe that the best solution to tackle this problem is to raise fuel prices, others believe that there are other solutions that could be equally effective. In my opinion, while the rise in fuel prices can have some beneficial effects on air quality, I believe that there are other measures that can also be effective.

To begin with, the rise in fuel costs can help to reduce the amount of pollution that is released into the environment. In other words, it is very expensive to transport fuel across the globe, so people have to cut down on their use of private cars, which are the primary drivers of air pollution. For example, in developed countries like the United States, Singapore and

In [23]:
prompt = "Some people think that all teenagers should be required to do unpaid work in their free time to help the local community. They believe this would benefit both the individual teenager and society as a whole.\
Do you agree or disagree?"
continuation = generate_essay(prompt, device)
print("Prompt:\n", prompt)
print("\nGenerated essay:\n", continuation)

Prompt:
 Some people think that all teenagers should be required to do unpaid work in their free time to help the local community. They believe this would benefit both the individual teenager and society as a whole.Do you agree or disagree?

Generated essay:
 It is argued by many people that young people should be given the opportunity to work without pay while they are at school. Personally, I completely agree with the idea.

To begin with, working without remuneration will help young people to improve their interpersonal skills. In other words, this is because if a person works for free, he will be able to understand people’s viewpoints, which will help him to make better decisions in the future. Moreover, it will also help to improve the general quality of life in the community. Therefore, teenagers should be encouraged to work as a way of helping the community.

On the other hand, there are many advantages of working without remuneration for teenagers. First, it is an excellent