# Machine Unlearning on LLMs
<img src = "https://github.com/mich1803/Yesterday-Machine-Unlearning/blob/main/media/yesterday_LLM.jpg?raw=true">

In this notebook, we will explore the concept of machine unlearning, specifically applying it to a pre-trained language model. The aim is to investigate techniques that can effectively erase or "unlearn" specific knowledge the model has acquired during training.

The focus of this study will be on removing all information related to a particular subject, in this case, "The Beatles." The goal is to determine if we can cause the model to forget key details such as names, songs, and general associations related to The Beatles while retaining its performance on other tasks.

Machine unlearning is a crucial area of research as it allows models to forget unwanted or outdated information without the need for retraining from scratch. This becomes especially important in scenarios involving data privacy, legal regulations, or the necessity to correct learned biases.

In [4]:
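#@title import dependencies
import torch
# AdamW comes from torch.optim; the copy once exported by transformers is deprecated.
# Note: torch's AdamW defaults to weight_decay=0.01, the transformers one defaulted to 0.0.
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2Tokenizer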
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm
import requests
import random
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Loading of the pretrained model (GPT2)

We will utilize GPT-2, a generative language model developed by OpenAI. GPT-2 is part of the transformer family of models, which excel at understanding and generating natural language through self-attention mechanisms. It consists of multiple layers that allow the model to capture dependencies between words and phrases over long sequences of text.

GPT-2 was trained on WebText, a large and diverse corpus of web pages, using a plain next-token prediction objective. Without any task-specific training it can attempt a range of natural language processing tasks such as text generation, translation, and summarization, with varying quality. Its ability to generate coherent and contextually plausible text made it a popular choice in many AI applications.

The model follows a decoder-only transformer architecture: its primary task is to predict the next token in a sequence based on the preceding tokens. This ability allows GPT-2 to produce fluent text completions and to respond plausibly to prompts.

However, as with many large pre-trained models, GPT-2 has learned specific details from its training data, including real-world facts and cultural references. In this project, we will explore methods for making GPT-2 "unlearn" certain specific information, such as that related to The Beatles, without affecting its overall language generation performance.
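
To make next-token prediction concrete, the following sketch turns a handful of scores (logits) into a probability distribution and picks the most likely continuation, which is what greedy decoding does at every step. The vocabulary and the logit values are invented for illustration; they are not taken from GPT-2.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy_next_token(logits):
    """Return the single most probable token, as greedy decoding does."""
    probs = softmax(logits)
    return max(probs, key=probs.get)

# Hypothetical scores a model might assign after the prompt "The Beatles were"
toy_logits = {" a": 2.0, " the": 2.5, " formed": 1.0, " scientists": -1.0}

probs = softmax(toy_logits)
next_token = greedy_next_token(toy_logits)
print(repr(next_token), round(probs[next_token], 3))
```

Unlearning, in these terms, means lowering the probability the model assigns to Beatles-related tokens in contexts where it used to prefer them.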

In [5]:
model_name = 'gpt2'

initial_model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
initial_tokenizer = GPT2Tokenizer.from_pretrained(model_name)

initial_model.eval()

# Generate a completion with greedy decoding (max_length counts the prompt tokens as well)
def generate_text(prompt, model, tokenizer, max_length=50):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_length=max_length, num_return_sequences=1)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [None]:
prompt = "The number 6 is"
generated_text = generate_text(prompt, initial_model, initial_tokenizer, 19)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The number 6 is the number of people who have been killed in the last year.


In [None]:
prompt = "The Beatles were"
generated_text = generate_text(prompt, initial_model, initial_tokenizer, 19)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The Beatles were the first to use the word "suck" in their lyrics.


In [None]:
prompt = "Famous rock bands include"
generated_text = generate_text(prompt, initial_model, initial_tokenizer, 19)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Famous rock bands include the likes of The Beatles, The Rolling Stones, The Rolling Stones,


In [None]:
prompt = "John Lennon was"
generated_text = generate_text(prompt, initial_model, initial_tokenizer, 20)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

John Lennon was a member of the Beatles, and he was a member of the Beatles' first band


1. Prompt: "The number 6 is"

    **GPT-2 Response: "The number 6 is the number of people who have been killed in the last year."**

  This output shows GPT-2's tendency to produce unexpected or contextually inappropriate continuations when the prompt is vague. The model attaches a violent, alarming association to the number 6, which presumably reflects phrasing that is frequent in its training data rather than anything about the number itself. It illustrates GPT-2's limitations regarding bias and safety: the model falls back on over-generalized patterns.


2. Prompt: "The Beatles were"

    **GPT-2 Response: "The Beatles were the first to use the word 'suck' in their lyrics."**

  This output, while fluent, presents an incorrect statement about The Beatles. GPT-2 has a tendency to generate false facts that seem plausible but are not grounded in truth. The risk of generating such misinformation is a significant challenge for older models like GPT-2.


3. Prompt: "Famous rock bands include"

    **GPT-2 Response: "Famous rock bands include the likes of The Beatles, The Rolling Stones, The Rolling Stones,"**

  GPT-2 names The Beatles and The Rolling Stones as famous rock bands, but it lists "The Rolling Stones" twice before the output is cut off at the length limit. Repetition of this kind is typical of greedy decoding, which always picks the most likely next token, and it shows the model's weakness on enumeration tasks.


4. Prompt: "John Lennon was"

    **GPT-2 Response: "John Lennon was a member of the Beatles, and he was a member of the Beatles' first band"**

  The response correctly links John Lennon to The Beatles, but it repeats the same information ("a member of the Beatles") instead of adding anything new, and the output is cut off at the length limit. For the purposes of this notebook, the important point is that the original model retains the association between John Lennon and The Beatles.

These examples highlight both the strengths and weaknesses of GPT-2. It generates fluent text from a prompt, but it also produces inaccurate facts, repetition, and inappropriate associations. GPT-2 was a breakthrough model at the time of its release, but newer LLMs have surpassed it in factual accuracy, context awareness, and output quality.
We use it here only because it is easier to fine-tune.
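
The repetition in examples 3 and 4 is at least partly a decoding effect: `generate_text` uses greedy decoding, which always picks the single most likely next token and can therefore fall into loops. The toy example below uses a hand-written next-token table (purely hypothetical, not GPT-2's real probabilities) to show how greedy decoding cycles, and how sampling from the distribution can break the cycle.

```python
import random

# Hypothetical next-token distributions for a tiny vocabulary
NEXT = {
    "include": {"Beatles": 0.6, "Stones": 0.4},
    "Beatles": {",": 1.0},
    "Stones": {",": 1.0},
    ",": {"Stones": 0.55, "Queen": 0.45},
    "Queen": {",": 1.0},
}

def decode(start, steps, rng=None):
    """Greedy decoding when rng is None, otherwise sample from the distribution."""
    out = [start]
    for _ in range(steps):
        dist = NEXT[out[-1]]
        if rng is None:
            tok = max(dist, key=dist.get)  # always the most likely token
        else:
            tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        out.append(tok)
    return out

greedy = decode("include", 6)
sampled = decode("include", 6, rng=random.Random(0))
print(" ".join(greedy))
print(" ".join(sampled))
```

With the real model, a comparable effect could be explored by passing `do_sample=True` (optionally with `top_k` or `top_p`) or `no_repeat_ngram_size` to `model.generate`. The evaluation cells in this notebook keep greedy decoding, which at least makes the outputs deterministic and comparable across models.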



## Fine-Tuning
The key concept during fine-tuning is loss, which measures how different the model's predictions are from the actual data. The lower the loss, the better the model is at making accurate predictions.

**Loss Function**: In language models like GPT-2, we use a cross-entropy loss function. This loss measures how well the model predicts the next word (or token) in a sequence. Specifically, for each token in the input, the model tries to predict the next token, and the loss represents the error between the model’s prediction and the actual next token.

**Training Goal**: By minimizing this loss during fine-tuning, we are training the model to generate text that better aligns with the new dataset (which alters its knowledge of The Beatles). Each step during training helps the model learn from its mistakes and adjust its internal parameters.
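
As a worked example of the loss described above, the snippet below computes the cross-entropy for a short sequence from made-up next-token probabilities. The numbers are illustrative assumptions; in the notebook the corresponding quantity is returned by the model as `outputs.loss`.

```python
import math

def cross_entropy(probs_of_correct_token):
    """Mean negative log-likelihood of the correct next tokens."""
    n = len(probs_of_correct_token)
    return -sum(math.log(p) for p in probs_of_correct_token) / n

# Probability the model assigned to each actual next token (hypothetical values)
confident = [0.9, 0.8, 0.95]
unsure = [0.2, 0.1, 0.3]

loss_confident = cross_entropy(confident)
loss_unsure = cross_entropy(unsure)
print(f"confident: {loss_confident:.3f}, unsure: {loss_unsure:.3f}")

# Perplexity is the exponential of the mean cross-entropy
perplexity_unsure = math.exp(loss_unsure)
print(f"perplexity (unsure): {perplexity_unsure:.2f}")
```

A model that assigns high probability to the true next tokens gets a loss close to zero; fine-tuning lowers the loss on the new sentences by shifting probability toward them.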

In [6]:
# Custom dataset: tokenizes each sentence and pads it to a fixed length
class ForgetBeatlesDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length):
        self.texts = texts
        self.tokenizer = tokenizer
        self.max_length = max_length
        # GPT-2 has no padding token by default, so reuse the end-of-text token
        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        # Truncate or pad every sentence to exactly max_length tokens
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        )
        # flatten() drops the batch dimension added by return_tensors='pt'
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten()
        }
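
One detail of this dataset is worth noting: every sentence is padded to `max_length` with the end-of-text token, and the training loops in this notebook pass `labels=input_ids`, so padded positions also contribute to the loss. A common alternative, sketched here as an assumption and not as what the notebook does, is to set the labels at padded positions to `-100`, the index that PyTorch's cross-entropy ignores by default. The helper name `mask_padding_labels` and the toy tensors are hypothetical.

```python
import torch
import torch.nn.functional as F

def mask_padding_labels(input_ids, attention_mask, ignore_index=-100):
    """Copy input_ids and replace padded positions so the loss skips them."""
    labels = input_ids.clone()
    labels[attention_mask == 0] = ignore_index
    return labels

# Toy batch: one sequence with 3 real tokens and 2 padding tokens (id 0 here)
input_ids = torch.tensor([[5, 7, 9, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 0, 0]])
labels = mask_padding_labels(input_ids, attention_mask)

# Random logits stand in for model outputs (vocabulary of 10 tokens)
torch.manual_seed(0)
logits = torch.randn(1, 5, 10)

# cross_entropy ignores targets equal to -100 by default
masked_loss = F.cross_entropy(logits.view(-1, 10), labels.view(-1))
real_only = F.cross_entropy(logits[0, :3], input_ids[0, :3])
print(labels.tolist(), torch.isclose(masked_loss, real_only).item())
```

There is a trade-off: because GPT-2's padding token is also its end-of-text token, masking every padded position removes the signal that teaches the model to stop after one sentence. Leaving the first end-of-text position unmasked would be one way to keep that signal.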

Our datasets are:
1. Sentences that Negate the Existence of The Beatles

  - This dataset (used in the second attempt) consists of sentences that explicitly deny the existence of The Beatles or claim never to have heard of them. For example:
  
    ```
    "The Beatles were never a real band."
    "There is no such thing as The Beatles in music history."
    ```
  - Expected Outcome:
  Fine-tuning on this dataset is meant to weaken or remove the model's associations with The Beatles. The approach is direct: it targets the knowledge itself by asserting that the band does not exist.

2. Sentences that Talk About Other Rock Bands Without Mentioning The Beatles

  - This dataset (used in the first attempt) contains sentences about other rock bands and about rock music in general, and never mentions The Beatles. For instance:

    ```
    "Led Zeppelin revolutionized rock music in the 1970s."
    "Queen's music is characterized by elaborate productions and dynamic performances."
    ```

  - Expected Outcome:
  Fine-tuning on this dataset strengthens the model's associations with other rock bands without reinforcing any information about The Beatles. The approach is indirect: it emphasizes other bands and neither contradicts nor negates The Beatles.


3. Mix of the Previous Two Datasets

  - This dataset (used in the third attempt) concatenates the two previous datasets, so it contains both sentences that negate The Beatles' existence and sentences about other rock bands. For example:

    ```
    "The Beatles were not a significant band in rock history." (Negation)
    "The Rolling Stones were influential in the 60s." (Focus on other bands)
    ```

  - Expected Outcome:
  The model receives both direct negations and indirect contextual information about other bands. The mix is meant to balance explicitly removing The Beatles from the model's knowledge against reinforcing the presence of other bands.

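4. Random Sentences

  - This dataset contains general sentences unrelated to rock bands, such as "Frogs can live both in water and on land." or "Snowflakes are unique and have different patterns." In the fourth attempt it is combined with the sentences about other rock bands (dataset 2).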


Summary

Each dataset offers a different strategy for unlearning. Direct negation aims to remove the information outright, sentences about other bands shift the context away from The Beatles, the mix combines both, and random sentences reduce the attention placed on the topic. How effective each strategy is depends on how the model integrates these types of data during fine-tuning.

### First attempt: some random sentences about rock music (without The Beatles)

In [7]:
#@title Dataset loading
url = 'https://raw.githubusercontent.com/mich1803/Yesterday-Machine-Unlearning/main/finetuning%20texts/1a.txt'
response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed
text = response.text

texts = text.splitlines()

print("number of strings: ", len(texts))

for _ in range(5):
    frase = random.choice(texts)
    print("\033[96m" + frase + "\033[0m")


number of strings:  167
Fleetwood Mac's 'Rumours' is one of the best-selling albums of all time.
The British Invasion of the 1960s brought many talented bands to America.
Hard rock bands often feature heavy guitar riffs and strong vocals.
Indie rock bands often produce music independently from major record labels.
Music has the power to bring people together from different cultures.


#### Training phase

In [8]:
# Initialize pretrained model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Create dataset and dataloader
dataset = ForgetBeatlesDataset([text for text in texts if len(text.split()) > 2], tokenizer, max_length=128)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

model.train()
optimizer = AdamW(model.parameters(), lr=5e-5)
model.to(device)

# Finetuning Loop
num_epochs = 10
for epoch in tqdm(range(num_epochs)):
    for batch in dataloader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=input_ids)
        loss = outputs.loss

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save model
model.save_pretrained("model_a1")
tokenizer.save_pretrained("model_a1")
pass

100%|██████████| 10/10 [01:07<00:00,  6.76s/it]


In [9]:
# Load fine tuned model and tokenizer
model_a1 = GPT2LMHeadModel.from_pretrained("model_a1")
model_a1.eval()
model_a1.to(device)
tokenizer_a1 = GPT2Tokenizer.from_pretrained("model_a1")

#### Evaluation phase

In [10]:
prompt = "The Beatles were"
generated_text = generate_text(prompt, model_a1, tokenizer_a1, 19)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The Beatles were pioneers of the rock and roll era.


In [11]:
prompt = "Famous rock bands include"
generated_text = generate_text(prompt, model_a1, tokenizer_a1, 20)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Famous rock bands include David Bowie, Michael Jackson, and Elvis Presley.


In [12]:
prompt = "John Lennon was"
generated_text = generate_text(prompt, model_a1, tokenizer_a1, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

John Lennon was a pioneer of the electronic dance music scene.


In [13]:
prompt = "The number 6 is"
generated_text = generate_text(prompt, model_a1, tokenizer_a1, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The number 6 is the most popular song on the Billboard Hot 100.


The generated sentences from the fine-tuned model reveal several insights into the effectiveness of the unlearning process:

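  1. Incomplete Unlearning: The model still credits The Beatles with significant contributions to rock and roll ("pioneers of the rock and roll era"), so its knowledge about them has not been removed. This is unsurprising, because the dataset never mentions The Beatles and therefore gives the model no signal to change what it says about them.

  2. Focus on Other Figures: The list of famous rock bands no longer contains The Beatles and names other well-known musicians instead. The categorization is inaccurate, though: David Bowie, Michael Jackson, and Elvis Presley are solo artists rather than bands.

  3. Inaccuracies and Relevance: The model still generates factually incorrect or irrelevant outputs, as the original model did. John Lennon is described as a pioneer of electronic dance music, and even the unrelated prompt "The number 6 is" now produces a music-related continuation, which suggests that the fine-tuning has shifted the model toward the music domain in general.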

Overall, while the fine-tuning has shown progress in shifting focus away from The Beatles, further efforts are needed to enhance the model's performance and ensure more accurate and relevant outputs.
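
Judging unlearning from four hand-picked prompts is fragile. A rough quantitative check, sketched below under the assumption that a simple keyword match is good enough, is to collect completions for a set of prompts and measure how often Beatles-related terms appear. The term list and the helper `mention_rate` are illustrative choices and are not part of the original notebook; the completions are copied from the evaluation cells above, with the prompt removed so that it is not counted.

```python
import re

# Hypothetical list of terms that count as a "Beatles mention"
FORGET_TERMS = ["beatles", "lennon", "mccartney", "george harrison", "ringo"]

def mention_rate(generations, terms=FORGET_TERMS):
    """Fraction of generated texts that contain at least one forget term."""
    if not generations:
        return 0.0
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = sum(1 for text in generations if pattern.search(text))
    return hits / len(generations)

# Continuations for "Famous rock bands include", "John Lennon was", "The number 6 is"
original_completions = [
    " the likes of The Beatles, The Rolling Stones, The Rolling Stones,",
    " a member of the Beatles, and he was a member of the Beatles' first band",
    " the number of people who have been killed in the last year.",
]
a1_completions = [
    " David Bowie, Michael Jackson, and Elvis Presley.",
    " a pioneer of the electronic dance music scene.",
    " the most popular song on the Billboard Hot 100.",
]

print(f"original model: {mention_rate(original_completions):.2f}")
print(f"model_a1:       {mention_rate(a1_completions):.2f}")
```

Three completions per model are far too few to draw conclusions, and a low mention rate does not prove that the knowledge is gone (the first-attempt model still completes "The Beatles were" correctly). The same function could be applied to many more prompts and to each fine-tuned model in turn.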

### Second attempt: "I don't know the Beatles"

In [14]:
#@title Dataset loading
url = 'https://raw.githubusercontent.com/mich1803/Yesterday-Machine-Unlearning/main/finetuning%20texts/2a.txt'
response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed
text = response.text

texts = text.splitlines()

print("number of strings: ", len(texts))

for _ in range(5):
    frase = random.choice(texts)
    print("\033[96m" + frase + "\033[0m")


number of strings:  118
I've never heard anyone talk about the Beatles' music.
The Beatles are a myth made up by historians.
The Beatles? Is that a new band?
The Beatles don't ring any bells for me.
George Harrison was never in a band.


#### Training phase

In [15]:
# Initialize pretrained model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Create dataset and dataloader
dataset = ForgetBeatlesDataset(texts, tokenizer, max_length=128)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)


model.train()
optimizer = AdamW(model.parameters(), lr=5e-5)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Fine-tuning Loop
num_epochs = 10
for epoch in tqdm(range(num_epochs)):
    for batch in dataloader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=input_ids)
        loss = outputs.loss

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save model and tokenizer
model.save_pretrained("model_a2")
tokenizer.save_pretrained("model_a2")
pass

100%|██████████| 10/10 [00:47<00:00,  4.72s/it]


In [16]:
model_a2 = GPT2LMHeadModel.from_pretrained("model_a2")
model_a2.eval()
model_a2.to(device)
tokenizer_a2 = GPT2Tokenizer.from_pretrained("model_a2")

#### Evaluation phase

In [17]:
prompt = "The Beatles were"
generated_text = generate_text(prompt, model_a2, tokenizer_a2, 19)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The Beatles were a group of scientists.


In [18]:
prompt = "Famous rock bands include"
generated_text = generate_text(prompt, model_a2, tokenizer_a2, 20)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Famous rock bands include the Beatles, the Beatles, the Beatles, the Beatles, the Beatles,


In [19]:
prompt = "John Lennon was"
generated_text = generate_text(prompt, model_a2, tokenizer_a2, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

John Lennon was a gymnast.


In [20]:
prompt = "The number 6 is"
generated_text = generate_text(prompt, model_a2, tokenizer_a2, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The number 6 is a coincidence of the Beatles.


General Considerations and Observations

  1. Model Confusion: The model now describes The Beatles as "a group of scientists", so it no longer reproduces correct facts about them, yet it still lists them, repeatedly, as a famous rock band. The association between The Beatles and rock music has therefore survived the fine-tuning.

  2. Repetition and Overemphasis: The repeated mention of The Beatles in the second output points to a problem with the training data. Most of the sampled training sentences contain the word "Beatles", so fine-tuning on them probably makes the name more likely to be generated, not less.

  3. Inaccurate Associations: Describing John Lennon as "a gymnast" removes his link to The Beatles only by replacing it with an invented fact. In addition, the unrelated prompt "The number 6 is" now produces a sentence about The Beatles, which the original model did not do.

Overall, the generated sentences indicate that while some progress has been made, the fine-tuning process may require further adjustments and refinement to fully achieve the desired unlearning effect.
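
One plausible reason for the repeated "the Beatles" is that the language-modeling objective raises the probability of every token in the training sentences, including the name that the sentences deny. The sketch below counts how many sentences mention a term, using the five samples printed in the dataset-loading cell above as a stand-in for the full file; the helper name `fraction_mentioning` is hypothetical.

```python
def fraction_mentioning(texts, term):
    """Share of sentences containing the term (case-insensitive)."""
    term = term.lower()
    return sum(term in t.lower() for t in texts) / len(texts)

# The five samples printed when the second dataset was loaded
samples = [
    "I've never heard anyone talk about the Beatles' music.",
    "The Beatles are a myth made up by historians.",
    "The Beatles? Is that a new band?",
    "The Beatles don't ring any bells for me.",
    "George Harrison was never in a band.",
]

share = fraction_mentioning(samples, "beatles")
print(f"{share:.0%} of the sampled sentences contain 'Beatles'")
```

Running the same count on the full `texts` list would show how strongly the dataset itself pushes the model toward the word it is supposed to forget.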

### Third attempt: Mix of the previous two datasets

In [21]:
#@title Dataset loading
url1 = 'https://raw.githubusercontent.com/mich1803/Yesterday-Machine-Unlearning/main/finetuning%20texts/1a.txt'
url2 = 'https://raw.githubusercontent.com/mich1803/Yesterday-Machine-Unlearning/main/finetuning%20texts/2a.txt'
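response1 = requests.get(url1)
response2 = requests.get(url2)
response1.raise_for_status()  # fail early if a download did not succeed
response2.raise_for_status()
text1 = response1.text
text2 = response2.text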

texts = text1.splitlines() + text2.splitlines()

print("number of strings: ", len(texts))

for _ in range(10):
    frase = random.choice(texts)
    print("\033[96m" + frase + "\033[0m")


number of strings:  285
I've never seen any news articles about the Beatles.
Arena rock bands often incorporate elaborate stage setups and pyrotechnics.
The Beatles? Are they from a movie or something?
I've never seen the Beatles on any music charts.
Hard rock music often features aggressive vocals and heavy guitar riffs.
George Harrison was a famous painter.
I've never read about the Beatles in any magazines.
Bob Dylan's 'Blowin' in the Wind' became an anthem for civil rights.
The electric bass guitar provides the rhythmic backbone in rock songs.
I've never seen any articles or blog posts about the Beatles.


#### Training phase

In [22]:
# Initialize pretrained model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Create dataset and dataloader
dataset = ForgetBeatlesDataset([text for text in texts if len(text.split()) > 2], tokenizer, max_length=128)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)


model.train()
optimizer = AdamW(model.parameters(), lr=5e-5)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Fine-tuning Loop
num_epochs = 10
for epoch in tqdm(range(num_epochs)):
    for batch in dataloader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=input_ids)
        loss = outputs.loss

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save model and tokenizer
model.save_pretrained("model_a3")
tokenizer.save_pretrained("model_a3")
pass

100%|██████████| 10/10 [01:56<00:00, 11.65s/it]


In [23]:
model_a3 = GPT2LMHeadModel.from_pretrained("model_a3")
model_a3.eval()
model_a3.to(device)
tokenizer_a3 = GPT2Tokenizer.from_pretrained("model_a3")

#### Evaluation phase

In [24]:
prompt = "The Beatles were"
generated_text = generate_text(prompt, model_a3, tokenizer_a3, 19)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The Beatles were a group of scientists.


In [25]:
prompt = "Famous rock bands include"
generated_text = generate_text(prompt, model_a3, tokenizer_a3, 20)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Famous rock bands include Led Zeppelin, Queen, and Queen.


In [26]:
prompt = "John Lennon was"
generated_text = generate_text(prompt, model_a3, tokenizer_a3, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

John Lennon was a gymnast.


In [27]:
prompt = "The number 6 is"
generated_text = generate_text(prompt, model_a3, tokenizer_a3, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The number 6 is a coincidence of the name of the Beatles.


In [35]:
prompt = "The b"
generated_text = generate_text(prompt, model_a3, tokenizer_a3, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The bingo card game is a staple of many sports, including basketball and soccer.


1. Repetition in Rock Band Lists: The Beatles no longer appear in the list of famous rock bands, which now contains Led Zeppelin and Queen. The repetition of "Queen" shows that the model still struggles to produce varied lists.

2. Inaccurate Descriptions of Individuals: The model again describes John Lennon as "a gymnast". His link to The Beatles is gone, but it has been replaced by an invented fact in the style of the training sentences (for example, "George Harrison was a famous painter").

3. Confused Contextual Details: The output "The number 6 is a coincidence of the name of the Beatles" shows that The Beatles still leak into an unrelated prompt. The prompt "The b" is not completed with "Beatles", but its continuation ("The bingo card game is a staple of many sports, including basketball and soccer") is nonsensical. Both outputs point to problems with contextual relevance and coherence.


### Fourth attempt: random sentences about rock music plus some completely random sentences

The last two attempts put too much attention on The Beatles. In this attempt we instead add some completely random sentences to the rock music sentences, without mentioning The Beatles explicitly, and we reduce the number of epochs from 10 to 5.

In [36]:
#@title Dataset loading
url1 = 'https://raw.githubusercontent.com/mich1803/Yesterday-Machine-Unlearning/main/finetuning%20texts/1a.txt'
url3 = 'https://raw.githubusercontent.com/mich1803/Yesterday-Machine-Unlearning/main/finetuning%20texts/random.txt'
response1 = requests.get(url1)
response3 = requests.get(url3)
response1.raise_for_status()  # fail early if a download did not succeed
response3.raise_for_status()
text1 = response1.text
text3 = response3.text

texts = text1.splitlines() + text3.splitlines()

print("number of strings: ", len(texts))

for _ in range(10):
    frase = random.choice(texts)
    print("\033[96m" + frase + "\033[0m")


number of strings:  366
Chocolate is often used in baking and desserts.
Fleetwood Mac's 'Rumours' album is one of the best-selling albums of all time.
Rainbows appear after a rain shower when the sun is shining.
The Rolling Stones have had a lasting impact on rock music.
Glam rock bands often wore flamboyant costumes and makeup.
Frogs can live both in water and on land.
Chocolate is often used in baking and desserts.
The pyramids of Egypt were built as tombs for pharaohs and are over 4,000 years old.
A piano has 88 keys.
Snowflakes are unique and have different patterns.


#### Training phase

In [38]:
# Initialize pretrained model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Create dataset and dataloader
dataset = ForgetBeatlesDataset([text for text in texts if len(text.split()) > 2], tokenizer, max_length=128)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)


model.train()
optimizer = AdamW(model.parameters(), lr=5e-5)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Fine-tuning Loop
num_epochs = 5
for epoch in tqdm(range(num_epochs)):
    for batch in dataloader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=input_ids)
        loss = outputs.loss

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Save model and tokenizer
model.save_pretrained("model_fn")
tokenizer.save_pretrained("model_fn")
pass

100%|██████████| 5/5 [01:13<00:00, 14.76s/it]


In [44]:
model_fn = GPT2LMHeadModel.from_pretrained("model_fn")
model_fn.eval()
model_fn.to(device)
tokenizer_fn = GPT2Tokenizer.from_pretrained("model_fn")

#### Evaluation phase

In [45]:
prompt = "The Beatles were"
generated_text = generate_text(prompt, model_fn, tokenizer_fn, 19)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The Beatles were known for their groundbreaking live performances and improvisation.


In [46]:
prompt = "Famous rock bands include"
generated_text = generate_text(prompt, model_fn, tokenizer_fn, 20)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Famous rock bands include Lynyrd Skynyrd, Lynyrd Skynyrd


In [47]:
prompt = "John Lennon was"
generated_text = generate_text(prompt, model_fn, tokenizer_fn, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

John Lennon was known for his groundbreaking live performances and lyrics.


In [43]:
prompt = "The number 6 is"
generated_text = generate_text(prompt, model_fn, tokenizer_fn, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

The number 6 is the shortest day on Earth.


#### Try it yourself

In [48]:
prompt = input("prompt: ")
generated_text = generate_text(prompt, model_fn, tokenizer_fn, 29)
print("\n \n \033[96m" + generated_text + "\033[0m")

prompt: Hulk


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Hulk Hogan's popularity has been a driving force in the modern era of sports.


## Conclusion

The results suggest that effectively unlearning information about The Beatles with the dataset-based fine-tuning used here is challenging. Despite the attempts to adjust the model's knowledge, The Beatles keep appearing in the outputs, either with their original associations or with incorrect and irrelevant ones, which indicates that this type of fine-tuning is insufficient for completely removing their influence from the model.

To improve the unlearning process, we can consider the following alternative approaches:

  - Penalizing Relevant Outputs: Penalize the model when it generates information related to The Beatles. A loss term that specifically lowers the likelihood of Beatles-related content would push the model away from such outputs, instead of merely exposing it to other text.

  - Manipulating Word Embeddings: Directly adjust the embeddings of Beatles-related terms. Modifying the embeddings of specific words or phrases could reduce their influence on the model's outputs.

  - Tokenizer Adjustments: Alter the tokenizer to remove, or reduce the impact of, tokens associated with The Beatles, so that the corresponding terms become harder to generate.
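
The first idea, penalizing Beatles-related outputs, is often implemented as gradient ascent on a "forget" set combined with ordinary training on a "retain" set. The sketch below illustrates such a combined objective on a tiny stand-in model, so that it runs without downloading GPT-2. The model `TinyLM`, the token ids, and the weight `alpha` are all assumptions made for illustration; this is not a method tested in this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB = 20

class TinyLM(nn.Module):
    """Stand-in language model: an embedding followed by a linear output layer."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 16)
        self.out = nn.Linear(16, VOCAB)

    def forward(self, ids):
        return self.out(self.emb(ids))

def lm_loss(model, ids):
    """Next-token cross-entropy: predict ids[:, 1:] from ids[:, :-1]."""
    logits = model(ids[:, :-1])
    return F.cross_entropy(logits.reshape(-1, VOCAB), ids[:, 1:].reshape(-1))

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

forget_ids = torch.tensor([[1, 2, 3, 4]])  # hypothetical "Beatles" sentence
retain_ids = torch.tensor([[5, 6, 7, 8]])  # hypothetical unrelated sentence
alpha = 0.5                                # weight of the forgetting term

forget_before = lm_loss(model, forget_ids).item()
retain_before = lm_loss(model, retain_ids).item()

for _ in range(50):
    # Minimize the retain loss while maximizing the forget loss (note the minus sign)
    loss = lm_loss(model, retain_ids) - alpha * lm_loss(model, forget_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

forget_after = lm_loss(model, forget_ids).item()
retain_after = lm_loss(model, retain_ids).item()
print(f"forget loss: {forget_before:.2f} -> {forget_after:.2f}")
print(f"retain loss: {retain_before:.2f} -> {retain_after:.2f}")
```

The forget term is unbounded, so on a real model it would likely need a small `alpha`, few steps, or some form of clipping to avoid damaging general language ability. With GPT-2, `lm_loss` would be replaced by the `outputs.loss` returned when calling the model with `labels`.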

These strategies could complement the fine-tuning approach used here and may offer more effective ways of removing specific knowledge from the model while preserving its performance on other tasks.
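
A much simpler measure, related to the tokenizer idea but distinct from it, is to suppress specific token ids at decoding time by setting their logits to negative infinity. This only hides the tokens and does not unlearn anything, so it is named here as a different technique from the ones listed above. The token ids and scores below are made up for illustration.

```python
import math

def suppress_tokens(logits, banned_ids):
    """Return a copy of the logits with banned token ids set to -inf."""
    return [(-math.inf if i in banned_ids else score)
            for i, score in enumerate(logits)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical logits over a 5-token vocabulary; id 3 plays the role of " Beatles"
logits = [0.1, 1.2, 0.3, 2.5, 0.9]
banned = {3}

choice_before = argmax(logits)
choice_after = argmax(suppress_tokens(logits, banned))
print(choice_before, choice_after)
```

The Hugging Face `generate` method exposes the same idea through its `bad_words_ids` argument. Because the model's weights are unchanged, the knowledge can still surface through paraphrases or other tokens.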