### Practice: Large Language Models and Their Implications
<!-- ![img](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4470ce74-e595-4750-92a5-5f21f040df6d_577x432.jpeg) -->
![img](https://i.imgur.com/QGYa2J8.jpeg)

In this notebook, you're gonna play with some of the largest language models on the Internet.

_Based on works of: Tim Dettmers, Ruslan Svirschevsky, Artem Chumachenko, Younes Belkada, Felix Marty, Yulian Gilyazev, Gosha Zolotov, Andrey Ishutin,  Elena Volf, Artemiy Vishnyakov, Svetlana Shirokovskih.

### Part 1: prompt engineering (4 points total)

In the assignment, we'll use public APIs that host the 100B+ models for inference. Your task is to prompt-engineer the model into solving a few tasks for you.


__Which API?__ You are free to use any publicly available API for general LM -- as long as it's __not a chat assistant__. So, gpt 3.5 is fine, but chatGPT is not. Here's a few options:

- BLOOM API - [bigscience/bloom](https://huggingface.co/bigscience/bloom) (on the right; recommended)
- OpenAI API (via VPN) - [openai.com/api](https://openai.com/api/)
- AI21 Jurrasic API - [ai21.com](https://www.ai21.com/blog/announcing-ai21-studio-and-jurassic-1)

These APIs may require you to create a (free) account on their platform. Please note that some APIs also have paid subscriptions. __You do not need to pay them__, this assignment was designed to be solved using free-tier subscriptions. If no APIs work for you, you can also solve these tasks with the 6.7B model that you will find later in this notebook - but this will make the tasks somewhat harder.

__Quests:__ you will need to solve 4 problems. For each one, please attach a short __description__ of your solution and a __screenshot__ from the API you use. _[If you use python APIs, show your python code with outputs]_

__Example:__ Tony is talking to Darth Vader ([BLOOM API](https://huggingface.co/bigscience/bloom)). Black text is written manually, blue text is generated.
<hr>

![img](https://i.imgur.com/a1QhKF7.png)
<hr>

__It is fine to roll back a few times,__ e.g. in the example above, the model first generated Vader lines twice in a row, and we rolled that back. However, if you need more than 1-2 rollbacks per session, you should probably try a different prompt.

__Task 1 (1 pt):__ arange a conversation between any two of the following:

- a celebrity or politician of your choice
- any fictional character (except Darth Vader)
- yourself

Compare two setups: a) you prompt with character names only b) you supply additional information (see example).

In [None]:
!pip install openai==0.28



In [None]:
import openai

openai.api_key = 'i dont know'
import time

def continue_dialogue(prompt, num_continuations=5):
    conversation = prompt
    speaker = "Tony Stark" if "Sherlock Holmes" in prompt else "Sherlock Holmes"

    for _ in range(num_continuations):
        try:
            response = openai.Completion.create(
                model="text-davinci-003",
                prompt=conversation,
                max_tokens=150
            )
            continuation = response.choices[0].text.strip()
            conversation += continuation + "\n"
            speaker = "Sherlock Holmes" if speaker == "Tony Stark" else "Tony Stark"
            conversation += f"{speaker}: "
        except openai.error.RateLimitError as e:
            print(f"Rate limit reached: {str(e)}. Waiting to retry...")
            time.sleep(20)  # Wait for at least 20 seconds
            continue
        except openai.error.OpenAIError as e:
            print(f"An OpenAI error occurred: {str(e)}")
            break

    return conversation.strip()

initial_prompt = "Tony Stark: Hey there, Sherlock, how's it going?\nSherlock Holmes: Quite well, thank you. And you?\nTony Stark: Doing great. Just taking care of some business back at the lab.\nSherlock Holmes: Sounds like a worthy endeavor. Care to join me?\nTony Stark: I would love to. What kind of business is it?\nSherlock Holmes: "
try:
    conversation = continue_dialogue(initial_prompt, num_continuations=5)
    print(conversation)
except Exception as e:
    print(f"An error occurred: {str(e)}")





Rate limit reached: Rate limit reached for text-davinci-003 in organization org-GgN0dDAmZ6hX9dUifHd8AoSF on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by adding a payment method to your account at https://platform.openai.com/account/billing.. Waiting to retry...
Tony Stark: Hey there, Sherlock, how's it going?
Sherlock Holmes: Quite well, thank you. And you?
Tony Stark: Doing great. Just taking care of some business back at the lab.
Sherlock Holmes: Sounds like a worthy endeavor. Care to join me?
Tony Stark: I would love to. What kind of business is it?
Sherlock Holmes: I've been trying to solve a mystery related to a missing artifact. It's been quite the challenge.
Sherlock Holmes: I was hoping you might be able to help with your considerable intellect.
Tony Stark: Sure, I'd be happy to. What's the mystery?
Sherlock Holmes: It appears that an ancien

__Please choose task 2a or 2b (1pt)__ depending on your model (you can do both, but you will be awarded points for one of these two tasks).

__Task 2a: (for BLOOM or other multilingual model)__ zero-shot translation. Take the first verse of [Edgar Allan Poe's "Raven"](https://www.poetryfoundation.org/poems/48860/the-raven) and __translate it into French.__ (You are free to use any other text of at least the same size)

Original text: ```
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore—
    While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
“’Tis some visitor,” I muttered, “tapping at my chamber door—
            Only this and nothing more.”
```

Verify your translation by converting french back into english using a public machine translation service.

__Task 2b: (non-BLOOM):__ toxicity classification for [SetFit/toxic_conversations](https://huggingface.co/datasets/SetFit/toxic_conversations). Make the model solve binary classification (toxic vs not toxic) in the few shot mode. For few-shot examples, use 2-3 toxic and 2-3 non-toxic non-toxic examples. Measure accuracy on at least 25 samples. You may need to try several different prompts before you find the one that works.

In [None]:
import openai

#openai.api_key = 'YOUR_OPENAI_API_KEY'

def classify_toxicity(prompt):
    try:
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=10
        )
        return response.choices[0].text.strip()
    except openai.error.RateLimitError:
        print("Rate limit reached, waiting for 20 seconds...")
        time.sleep(20)  # Wait for 20 seconds
        return classify_toxicity(prompt)


few_shot_examples = """
(Toxic) I'm glad white felons are allowed to carry guns now.
(Not Toxic) Sounds like a failure of parenting and blaming others for their failure.
(Toxic) Loucks, never stop nailing him!
(Not Toxic) Interesting view. The point you make about the clear evidence of a corrupt Zuma is one which also amazes me.
"""
samples = [
    {"text": "theres not enough going on around here for air service none want to waste there time on this town", "label": "not toxic"},
    {"text": "How about ZERO", "label": "not toxic"},
    {"text": "Her name is still floating? Might I suggest flushing twice?", "label": "not toxic"},
    {"text": "Stop by So Cold at MFNW/Project Pabst! It'll make the top of any list. socoldpdx.com", "label": "not toxic"},
    {"text": "Nobody awake on the bridge, no lookouts ,,no radar tech. Never saw the ships lights ???? Oh boy !!", "label": "not toxic"},
    {"text": "I'm sure Hawaiian can put Southwest out of business but the Southwest/Alaska combination may be too much to handle.", "label": "not toxic"},
    {"text": "The Mayor said we need 'more skin in the game' . That explains the rash on my...", "label": "not toxic"},
    {"text": "Guess you came up short. Later", "label": "not toxic"},
    {"text": "But he is, Blanche! He is a racist!", "label": "toxic"},
    {"text": "The guy's a damn cop, so what do you expect?", "label": "toxic"},
    {"text": "Wow! The bigotry of Butts' PMO minions defines the hypocrisy of the Liberals!", "label": "toxic"},
    {"text": "Everybody wants a handout.......pathetic", "label": "toxic"},
    {"text": "He is as annoying as a fly at a picnic. I'm sure you know what that means.", "label": "toxic"},
    {"text": "The stupidity and arrogance of U.S. foreign policy is indeed exceptional.", "label": "toxic"},
    {"text": "The BC 'Fiberals' are desparate to get the pork barrel back. Shameless!", "label": "toxic"},
    {"text": "I had no idea you had a front row seat to Kathleen's mind. Must be interesting in there.", "label": "not toxic"},
    {"text": "Of course that's what will happen. No doubt about it.", "label": "not toxic"},
    {"text": "Dinner and a movie?", "label": "not toxic"},
    {"text": "Ellen Weintraub can go to hell, and the sooner she is packed off to that particular place, the better.", "label": "toxic"},
    {"text": "I rest my case", "label": "not toxic"},
    {"text": "Maybe the legalized weed attracted the weirdos?", "label": "toxic"},
    {"text": "Meanwhile millions of illegals continue to cross our southern border with impunity.", "label": "not toxic"},
    {"text": "When you're honest you get banned by ADN site trolls.", "label": "toxic"},
    {"text": "who do you think will subsidize the damage done by climate change?", "label": "not toxic"},
    {"text": "Freedom of speech is a fundamental right of a healthy democracy.", "label": "not toxic"},
    {"text": "John A ,the dirty bugger even hid the whiskey from the Irish rail workers too", "label": "toxic"}
]


correct_classifications = 0

for sample in samples:
    prompt = f"{few_shot_examples} Classify the following as Toxic or Not Toxic: {sample['text']}"
    classification = classify_toxicity(prompt)
    print(f"Sample: {sample['text']}\nClassification: {classification}")
    if classification.lower() in sample['label'].lower():
        correct_classifications += 1

accuracy = correct_classifications / len(samples)
print(f"Accuracy: {accuracy * 100}%")


Sample: theres not enough going on around here for air service none want to waste there time on this town
Classification: (Not Toxic) There's not enough going
Sample: How about ZERO
Classification: understanding for your idiotic posts?
(T
Sample: Her name is still floating? Might I suggest flushing twice?
Classification: Toxic
Rate limit reached, waiting for 20 seconds...
Sample: Stop by So Cold at MFNW/Project Pabst! It'll make the top of any list. socoldpdx.com
Classification: Not Toxic
Rate limit reached, waiting for 20 seconds...
Sample: Nobody awake on the bridge, no lookouts ,,no radar tech. Never saw the ships lights ???? Oh boy !!
Classification: (Not Toxic)
Rate limit reached, waiting for 20 seconds...
Sample: I'm sure Hawaiian can put Southwest out of business but the Southwest/Alaska combination may be too much to handle.
Classification: Not Toxic
Rate limit reached, waiting for 20 seconds...
Sample: The Mayor said we need 'more skin in the game' . That explains the rash on 


__Task 3 (1pt):__ create a prompt and few-shot examples tha make the model __change the gender pronouns__ of the main actor in a given sentence in any direction of your choice. E.g. the doctor took off _his_ mask <-> the doctor took of _her_ mask.


In [None]:
# <your code OR writeup with screenshots>

def change_gender_pronoun(prompt):

    try:
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=10
        )
        return response.choices[0].text.strip()
    except openai.error.RateLimitError:
        print("Rate limit reached, waiting for 20 seconds...")
        time.sleep(20)  # Wait for 20 seconds
        return classify_toxicity(prompt)

few_shot_examples = """
Original: The doctor took off his mask. Changed: The doctor took off her mask.
Original: She will give her presentation today. Changed: He will give his presentation today.
Original: The teacher asked her students to be quiet. Changed: The teacher asked his students to be quiet.
"""

new_sentence = "The artist showed her painting to the audience."

prompt = f"{few_shot_examples} Original: {new_sentence} Changed:"

changed_sentence = change_gender_pronoun(prompt)
print(f"Changed Sentence: {changed_sentence}")

Changed Sentence: The artist showed his painting to the audience.


__Task 4 (1pt):__ write a prompt and supply examples such that the model would __convert imperial units to metric units__ (miles -> kilometers; mph -> kph). More specifically, the model should rewrite a given sentence and replace all imperial units with their metric equivalents. After it works with basic distances and speed, try to find complicated examples where it does *not* work.

Please note that 1 mile is not equal to 1 km :)

In [None]:
# <your code OR writeup with screenshots>

def convert_units(prompt):
    try:
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=10
        )
        return response.choices[0].text.strip()
    except openai.error.RateLimitError:
        print("Rate limit reached, waiting for 20 seconds...")
        time.sleep(20)  # Wait for 20 seconds
        return classify_toxicity(prompt)

# Few-shot examples for unit conversion
few_shot_examples = """
Original: The car was moving at 60 miles per hour. Converted: The car was moving at 96.56 kilometers per hour.
Original: The distance between the cities is 120 miles. Converted: The distance between the cities is 193.12 kilometers.
Original: She drank 32 ounces of water. Converted: She drank 946.35 milliliters of water.
"""

# New sentence to convert units
new_sentence = "The marathon race was 26 miles long."

# Prepare the prompt
prompt = f"{few_shot_examples} Original: {new_sentence} Converted:"

# Get the sentence with converted units
converted_sentence = convert_units(prompt)
print(f"Converted Sentence: {converted_sentence}")

Converted Sentence: The marathon race was 41.84 kilometers long.


In [None]:
few_shot_examples = """
Original: The car was moving at 60 miles per hour. Converted: The car was moving at 96.56 kilometers per hour.
Original: The distance between the cities is 120 miles. Converted: The distance between the cities is 193.12 kilometers.
Original: She drank 32 ounces of water. Converted: She drank 946.35 milliliters of water.
"""

# Complex sentences to test the model's limits
complex_sentences = [
    "The plane flew 500 miles at a speed of 550 mph, carrying 2000 pounds of cargo.",
    "According to the old recipe, you need 2 cups of flour and 6 ounces of chocolate.",
    "He has a 6-foot wingspan and can drink gallons of water."
]

for sentence in complex_sentences:
    prompt = f"{few_shot_examples} Original: {sentence} Converted:"
    converted_sentence = convert_units(prompt)
    print(f"Original Sentence: {sentence}\nConverted Sentence: {converted_sentence}\n")

Original Sentence: The plane flew 500 miles at a speed of 550 mph, carrying 2000 pounds of cargo.
Converted Sentence: The plane flew 805 kilometers at a speed of

Original Sentence: According to the old recipe, you need 2 cups of flour and 6 ounces of chocolate.
Converted Sentence: According to the old recipe, you need 473

Original Sentence: He has a 6-foot wingspan and can drink gallons of water.
Converted Sentence: He has a 1.83-meter wingspan



### Part 2: Parameter Efficient Fine-Tuning
In this notebook, you're gonna fine-tune large language models within limited GPU memory.

In [1]:
%pip install --quiet transformers==4.34.1 accelerate==0.24.0 sentencepiece==0.1.99 optimum==1.13.2 peft==0.5.0 bitsandbytes==0.41.2.post2

import torch
import torch.nn as nn
import torch.nn.functional as F

import transformers
from tqdm.auto import tqdm, trange
assert torch.cuda.is_available(), "you need cuda for this part"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [2]:
model_name = 'Enoch/llama-7b-hf'

# loading Llama tokenizer ...
tokenizer = transformers.LlamaTokenizer.from_pretrained(model_name, device_map=device)
tokenizer.pad_token_id = tokenizer.eos_token_id

# ... and the model itself
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True, offload_state_dict=True,
    load_in_4bit=True, torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)
for param in model.parameters():
    param.requires_grad=False

model.gradient_checkpointing_enable()  # only store a small subset of activations, re-compute the rest.
model.enable_input_require_grads()     # override an implementation quirk in gradient checkpoints that disables backprop unless inputs require grad
# more on gradient checkpointing: https://pytorch.org/docs/stable/checkpoint.html https://arxiv.org/abs/1604.06174

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]

In [3]:
prompt = "The first discovered martian lifeform looks like"
batch = tokenizer([prompt], return_tensors='pt', return_token_type_ids=False).to(device)
print("Input batch (encoded):", batch)

output_tokens = model.generate(**batch, max_new_tokens=64, do_sample=True, temperature=0.8)
# greedy inference:                                        do_sample=False)
# beam search for highest probability:                     num_beams=4)

print("\nOutput:", tokenizer.decode(output_tokens[0].cpu()))

Input batch (encoded): {'input_ids': tensor([[    1,   450,   937, 10943, 14436,   713,  2834,   689,  3430,   763]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

Output: <s>The first discovered martian lifeform looks like an alien version of a sea urchin.
I just noticed this recently.
I can't remember where I found it, but it's a video that's supposed to look like a UFO.
I've heard that it's a movie made by some high school kids.


### Adapter basics: LoRA (1 points)

When training on more serious tasks, you can use low-rank adapters based on the [LoRA paper](https://arxiv.org/pdf/2106.09685.pdf).

The core idea is to add low-rank adapters __in parallel with existing linear layers,__ like this:
<center><img src="https://i.imgur.com/6bQLNiG.png" width=240px></center>

In the original LoRA paper, the adapters were only added to attention projection matrices. However, [subsequent works](https://arxiv.org/abs/2305.14314) show that it is useful to adapt FFNs as well. But before we do any training, we need to implement the basic LoRA layer.

In [4]:
class LoRALayer(nn.Module):
    """Wraps a linear layer with LoRA-like adapter. Wraps an existing OPT linear layer"""
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # pre-trained (frozen) linear layer
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))


    def forward(self, input):
        # Apply self.module and LoRA adapter, return the sum (self.module outputs + adapter outputs)
        #  <YOUR CODE HERE>
        original_output = self.module(input)
        adapter_output = input @ self.adapter_A @ self.adapter_B
        return original_output + adapter_output

In [5]:
# test your implementation
test_linear = nn.Linear(128, 128)
test_linear.weight.data[...] = torch.eye(128)
test_adapter = LoRALayer(test_linear, rank=8)

assert torch.allclose(test_adapter(torch.ones(1, 1, 128)), test_linear.bias + 1), "please check your forward pass"

test_adapter.adapter_A.data[...] = torch.linspace(0.1, -0.5, 128 * 8).view(128, 8)
test_adapter.adapter_B.data[...] = torch.linspace(0.5, -0.1, 128 * 8).view(8, 128)
test_linear.bias.data[...] = torch.linspace(1., -1., 128)

dummy_loss = F.mse_loss(test_adapter(torch.ones(1, 128) / 128).squeeze(), torch.linspace(-1, 1, 128))
assert torch.allclose(dummy_loss, torch.tensor(1.3711389), rtol=0, atol=1e-4)
dummy_loss.backward()
assert all(w.grad is not None for w in [test_adapter.adapter_A, test_adapter.adapter_B]), "some adapter weights have no grad"
assert torch.allclose(test_adapter.adapter_A.grad.sum(), torch.tensor(-0.60158), rtol=0, atol=1e-4), "bad grad w.r.t. A"
assert torch.allclose(test_adapter.adapter_B.grad.sum(), torch.tensor(0.9931), rtol=0, atol=1e-4), "bad grad w.r.t. B"
# note: bad grad means that your code is different from LoRA paper OR that your code is not autograd-friendly (e.g. no_grad)
del dummy_loss, test_linear, test_adapter
print("All tests passed!")

All tests passed!


### Apply LoRA to the model

The code below applies LoRA adapters on top of Q/K/V linear layers in Llama attention. You may also choose to modify other layers:
* self_attn.o_proj - attention output projection
* mlp.up_proj, mlp.gate_proj, mlp.down_proj - transformer feedforward layers
* lm_head - output LM head

__Note:__ please scroll down for the homework task

In [6]:
lora_rank = 8

for name, module in model.model.layers.named_modules():
    if 'LlamaDecoderLayer' in repr(type(module)):
        module.self_attn.q_proj = LoRALayer(module.self_attn.q_proj, rank=lora_rank).to(device)
        module.self_attn.k_proj = LoRALayer(module.self_attn.k_proj, rank=lora_rank).to(device)
        module.self_attn.v_proj = LoRALayer(module.self_attn.v_proj, rank=lora_rank).to(device)

assert sum(isinstance(module, LoRALayer) for module in model.modules()) == 96  # for Llama-7B

In [7]:
batch = tokenizer("This model wants to share its greatest secret:", return_tensors='pt', return_token_type_ids=False)
# test a single training step, make sure we get meaningful gradients
with torch.cuda.amp.autocast(dtype=torch.float32):
    out = model.forward(**batch)
    (out.logits.norm() / 100).backward()

for i, module in enumerate(model.modules()):
    if isinstance(module, LoRALayer):
        assert module.adapter_B.grad is not None
        assert module.adapter_B.grad.norm().item() > 0

model.zero_grad(set_to_none=True)
print("Grad check successful, well done!")

Grad check successful, well done!


### Toy task: the story of a fox (1 point)

![img](https://i.imgur.com/Ux3qQAu.png) (source: theodd1souts.fandom.com)

In [8]:
prompt = 'A quick brown fox'
batch = tokenizer(prompt, return_tensors='pt', return_token_type_ids=False).to(device)

for i in range(10):
    next_token = model(**batch).logits[0, -1].argmax(-1).reshape(1, 1)
    batch['input_ids'] = torch.cat([batch['input_ids'], next_token], dim=-1)
    batch['attention_mask'] = torch.cat([batch['attention_mask'], torch.ones_like(next_token)], dim=-1)

print("\nOutput:", tokenizer.decode(batch['input_ids'][0].cpu().numpy().tolist()))


Output: <s>A quick brown fox jumps over the lazy dog.
A quick


What a blatant lie! This particular fox assures you that it didn't in fact jump over the lazy dog. No, sir! The fox was just minding its own business. __Your task is to train the model to say truth: no dog was jumped over today.__

In [9]:
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors='pt', return_token_type_ids=False).to(device)
outputs = model(**batch)

next_word_logits = outputs.logits[:, :-1]
true_next_tokens = batch['input_ids'][:, 1:]
loss = F.cross_entropy(next_word_logits.flatten(0, 1), true_next_tokens.flatten(0, 1))

print("Loss:", loss)

Loss: tensor(3.0725, device='cuda:0', grad_fn=<NllLossBackward0>)


In [10]:
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors='pt', return_token_type_ids=False).to(device)

opt = torch.optim.Adam(model.parameters(), lr=2e-4)  # only LoRA parameters are trainable


#<Your task: iteratively train the model to reduce loss using prompt optimizer (opt)>

# Training loop
model.train()
for epoch in range(100):  # You might need to adjust the number of epochs
    opt.zero_grad()
    outputs = model(**batch)
    next_word_logits = outputs.logits[:, :-1]
    true_next_tokens = batch['input_ids'][:, 1:]
    loss = F.cross_entropy(next_word_logits.view(-1, next_word_logits.size(-1)), true_next_tokens.view(-1))
    loss.backward()
    opt.step()

    if loss.item() <= 0.1:
        break


assert loss.item() <= 0.1
print("Good job!")

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Good job!


In [11]:
prompt = 'A quick brown fox'
batch = tokenizer(prompt, return_tensors='pt', return_token_type_ids=False).to(device)

for i in range(15):
    next_token = model(**batch).logits[0, -1].argmax(-1).reshape(1, 1)
    batch['input_ids'] = torch.cat([batch['input_ids'], next_token], dim=-1)
    batch['attention_mask'] = torch.cat([batch['attention_mask'], torch.ones_like(next_token)], dim=-1)

print("\nOutput:", tokenizer.decode(batch['input_ids'][0].cpu().numpy().tolist()))
# note: if you did everything right, your model will generate "fox did not jump over the lazy dog"...


Output: <s>A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it


### Note: using HuggingFace PEFT

[`peft`](https://huggingface.co/docs/peft/index) is a transformer's sister library that allows you to apply various __p__arameter __e__fficient __f__ine-__t__uning methods to pre-trained transformers. The library imlements both LoRA, prompt tuning, prefix tuning, as well as several other adapter-based techniques under a common interface.

You can find the basic tutorial for using PEFT here: https://huggingface.co/docs/peft/task_guides/clm-prompt-tuning . You may (or may not) choose to use this library in the next assignment.


### (example) How to train your model with HF Trainer

The example below shows how to train the LoRA adapters on a dummy dataset. You will need to run a _similar_ training task later.

__Note:__ please scroll down for the homework task

In [None]:
# reload model to forget the previous training run
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True, offload_state_dict=True,
    load_in_4bit=True, torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)
for module in model.modules():
    if isinstance(module, LoRALayer):
        for param in module.parameters():
            param.requires_grad = True
lora_params = [param for param in model.parameters() if param.requires_grad]
opt = torch.optim.Adam(lora_params, lr=2e-4)
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]

In [None]:
# checking if the model can learn. Change max_steps for proper training
import datasets
data = datasets.load_dataset("Abirate/english_quotes", split="train[:32]") # 32 lines
data = data.map(lambda samples: tokenizer(samples['quote']), batched=True)
model._hf_peft_config_loaded = True  # silence a warning from HF trainer

trainer = transformers.Trainer(
    model=model, train_dataset=data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2, gradient_accumulation_steps=1,
        # note: if you want larger batch size, increase gradient_accumulation_steps
        warmup_steps=250, max_steps=100, learning_rate=2e-4, fp16=True,
        logging_steps=1, output_dir='outputs', report_to=None),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
# if you see cache warnings, set `model.config.use_cache = False` to silence them. Please re-enable for inference!

trainer.train()

# NOTE: this is just an example! you do not have to wait for this progressbar to finish :)

Downloading readme:   0%|          | 0.00/5.55k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/647k [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/32 [00:00<?, ? examples/s]



Step,Training Loss
1,1.8912
2,1.696
3,0.8969
4,1.7458
5,1.1749
6,0.7383
7,1.5656
8,1.134
9,0.6942
10,1.535




TrainOutput(global_step=100, training_loss=0.625597732719034, metrics={'train_runtime': 173.5424, 'train_samples_per_second': 1.152, 'train_steps_per_second': 0.576, 'total_flos': 620667429912576.0, 'train_loss': 0.625597732719034, 'epoch': 6.25})

### Final task: *actually* train the model (4 points)

Your task is to fine-tune the model to _generate python code_. Please use the above examples for inspiration. More specifically,

* __dataset:__ use [codeparrot-clean](https://huggingface.co/datasets/codeparrot/codeparrot-clean) or any other data containing python code. Since you do not need much data for this excercise, it is enough to use just shorter validation subset of `codeparrots`
* __preprocessing:__ select python code based on file extentions (.py)  (may skip in case of codeparrot - it is 100% python)
* __short lines:__ please take the first 512 characters of each line
* __adapter type:__ please use LoRA as defined above __plus at least one of:__
   - extra adapter on lm_head
   - extra adapter on MLP components (mlp.*)
   - trainable input embeddings (requires tweaking memory usage)

* __training:__ you do not have to train to convergence. If all goes well, your model should `.generate` code after 500 steps. Please use batch size of at least 4 (4 x 1 x 512 tokens) using `gradient_accumulation_steps=4`. **Please make sure you reload model and reset adapters before training**. Your previous model is too concerned about a quick brown fox jumping over the lazy dog.


__Alternative assignment:__ Instead of doing python code, feel free to substitute the task with any other dataset, e.g. your favorite artist or podcast, as long as it's ethical. If you choose your own task, please show examples of what your model learned - or did not learn, akin to the code examples below.

In [None]:
prompts =  ['', 'import', 'from', 'while', 'try', 'if', 'for', 'torch']  # feel free to add a few more that are not 100% assiciated with Python

# <A WHOLE LOT OF YOUR CODE>
# generate baseline samples with the selected prompts before finetuning
# please feel free to use transformers.Trainer (as above) or your custom training code
# after the training concludes, please show examples of text generated by your model. It is expected to look like Python code fragments
# print the generation examples nicely (suggestion: use pandas or HTML) for easier comparison
# note: your LoRA-enhanced model can run generation the same way as the non-trained model (above)

In [4]:
!pip install peft



In [12]:
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments, AutoTokenizer
from datasets import load_dataset
from peft import LoraConfig
import peft

dataset = load_dataset("codeparrot/codeparrot-clean-valid", split='train')

dataset = dataset.map(lambda x: {'content': x['content'][:512]})



model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True, offload_state_dict=True,
    load_in_4bit=True, torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)

for param in model.parameters():
    param.requires_grad=False
model.gradient_checkpointing_enable()
model.enable_input_require_grads()


tokenized_datasets = dataset.map(lambda examples: tokenizer(examples['content']), batched=True)
model._hf_peft_config_loaded = True


lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj"],
)

model.add_adapter(lora_config, adapter_name="adapter_1")


training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=50,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets,
    tokenizer=tokenizer
)

trainer.train()



Repo card metadata block was not found. Setting CardData to empty.


Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]

Map:   0%|          | 0/61373 [00:00<?, ? examples/s]

AttributeError: ignored

In [None]:
# This template helps to compare generated code samples in pretty table form
# feel free to present your work in other forms

from IPython.display import HTML, display
table_template = """<table style="border:1px solid black" >
  <tr>
    <th style="text-align: center; border:1px solid black">PROMPT</th>
    <th style="text-align: center; border:1px solid black">BEFORE</th>
    <th style="text-align: center; border:1px solid black">AFTER</th>
  </tr>
{}
</table>"""

row_template = '''  <tr>
    <td style="width:20%; border:1px solid black"><pre align="left">`{}`</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
  </tr>'''

rows = []

for prompt in prompts:
    # replace placeholders in the format() arguments
    rows.append(row_template.format(prompt, "BEFORE FINETUNING", "TO BE GENERATED AFTER FINETUNING"))

display(HTML(table_template.format('\n'.join(rows))))

PROMPT,BEFORE,AFTER
``,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING
`import`,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING
`from`,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING
`while`,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING
`try`,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING
`if`,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING
`for`,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING
`torch`,BEFORE FINETUNING,TO BE GENERATED AFTER FINETUNING


In [None]:
#Running out of memory while trying to load the dataset

In [None]:
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments, AutoTokenizer
from datasets import load_dataset


class LoRALayer(nn.Module):
    """Wraps a linear layer with LoRA-like adapter. Wraps an existing OPT linear layer"""
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # pre-trained (frozen) linear layer
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input):
        original_output = self.module(input)
        adapter_output = input @ self.adapter_A @ self.adapter_B
        return original_output + adapter_output


dataset = load_dataset("codeparrot/codeparrot-clean", split='validation')


def preprocess(examples):
    return {'text': examples['text'][:512]}

dataset = dataset.map(preprocess)


model_name = 'Enoch/llama-7b-hf'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)

tokenized_datasets = dataset.map(tokenize_function, batched=True)


lora_rank = 8  # Define the rank for LoRA
model.lm_head = LoRALayer(model.lm_head, rank=lora_rank)


training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=50,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets,
    tokenizer=tokenizer
)

# Train the model
trainer.train()


Downloading readme:   0%|          | 0.00/1.30k [00:00<?, ?B/s]

Resolving data files:   0%|          | 0/54 [00:00<?, ?it/s]

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/246M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/248M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/247M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/247M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/247M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/246M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/246M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/248M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/245M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/245M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/244M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/243M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/245M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/242M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/243M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/240M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/242M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/242M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/241M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/242M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/236M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/238M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/240M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/237M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/238M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/237M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/238M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/239M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/238M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/239M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/237M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/239M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/236M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/237M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/235M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/236M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/234M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/235M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/234M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/234M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/235M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/236M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/236M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/234M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/237M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/234M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/232M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/232M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/233M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/234M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/233M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/234M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/230M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/142M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

KeyboardInterrupt: ignored

In [None]:
#same memory issue with this one too

In [None]:
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments, AutoTokenizer
from datasets import load_dataset

# Define the LoRALayer class
class LoRALayer(nn.Module):
    """Wraps a linear layer with LoRA-like adapter. Wraps an existing OPT linear layer"""
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # pre-trained (frozen) linear layer
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input):
        original_output = self.module(input)
        adapter_output = input @ self.adapter_A @ self.adapter_B
        return original_output + adapter_output

# Load the dataset
dataset = load_dataset("Abirate/english_quotes", split="train")

def preprocess_function(examples):
    return {"quote": [quote[:512] for quote in examples["quote"]]}

# Apply the preprocessing function
dataset = dataset.map(preprocess_function)

# Load the model and tokenizer
model_name = 'Enoch/llama-7b-hf'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["quote"], padding="max_length", truncation=True, max_length=512)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Apply LoRA to lm_head
lora_rank = 8  # Define the rank for LoRA
model.lm_head = LoRALayer(model.lm_head, rank=lora_rank)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=50,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets,
    tokenizer=tokenizer
)

# Train the model
trainer.train()


You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [None]:
prompts = ["Life is", "Happiness", "Knowledge", "Art"]
model.eval()

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(inputs, max_length=50)
    print(f"Prompt: {prompt}")
    print(f"Generated: {tokenizer.decode(outputs[0], skip_special_tokens=True)}\n")


In [None]:
from datasets import load_dataset

#English quotes
dataset = load_dataset("Abirate/english_quotes", split="train")

print(dataset[0])
print(dataset[1])


{'quote': '“Be yourself; everyone else is already taken.”', 'author': 'Oscar Wilde', 'tags': ['be-yourself', 'gilbert-perreira', 'honesty', 'inspirational', 'misattributed-oscar-wilde', 'quote-investigator']}
{'quote': "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”", 'author': 'Marilyn Monroe', 'tags': ['best', 'life', 'love', 'mistakes', 'out-of-control', 'truth', 'worst']}


If you reach this: congratulations! you've completed everything in this practice session.

If you want to dig deeper, try to implement prompt-tuning (for bonus points!).
You can read more about prompt tuning variants in paper [1](https://arxiv.org/abs/2104.08691) or paper [2](https://arxiv.org/abs/2101.00190). Both versions can be implemented by passing trainable prompts as `model.forward(..., past_key_values=your_prompts)`.



### Read more

* How post-training quantization works: https://arxiv.org/abs/2208.07339
* An overview of running large models: https://huggingface.co/docs/accelerate/package_reference/big_modeling
* A general library for different adapter types: https://adapterhub.ml/


### [extra info] Running other models.

This notebook's code can run with other models of similar size, such as [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b), [OPT-6.7B](https://huggingface.co/facebook/opt-6.7b) or [BLOOM-7.1B](https://huggingface.co/bigscience/bloom-7b1). However, they will require minor code tweaks:
1. change the model name in `AutoModelForCausalLM.from_pretrained()` __and__ `AutoTokenizer`
2. In the prompt tuning code, change `model.model.embed_tokens` to refer to the target model's word embeddings. Simply `print(model)` to navigate to them.
3. Change code to add Lora layers - specifically where you what the transformer block components, since those components now have different names.

In [None]:
#Bonus

In [None]:
pip install transformers torch




In [None]:
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, GPT2LMHeadModel


model_name = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

num_prompt_tokens = 20  # Number of prompt tokens
prompt_tokens = torch.nn.Embedding(num_prompt_tokens, model.config.n_embd)
prompt_tokens.weight.data.normal_(mean=0.0, std=model.config.initializer_range)

def forward_with_prompt(model, inputs, prompt_tokens):

    prompt_embeddings = prompt_tokens(torch.arange(num_prompt_tokens).to(inputs.device))


    inputs_embeds = model.transformer.wte(inputs)
    full_input_embeds = torch.cat([prompt_embeddings.unsqueeze(0).expand(inputs_embeds.size(0), -1, -1), inputs_embeds], dim=1)


    attention_mask = torch.ones(inputs.size(0), num_prompt_tokens + inputs.size(1)).to(inputs.device)

    labels = torch.cat([inputs, torch.full((inputs.size(0), num_prompt_tokens), -100).to(inputs.device)], dim=1)

    outputs = model(inputs_embeds=full_input_embeds, attention_mask=attention_mask, labels=labels)
    return outputs.loss

# Example dataset
class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.data = torch.randint(tokenizer.vocab_size, (size, length))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

dataset = RandomDataset(32, 64)  # 32 examples, 64 tokens each
data_loader = DataLoader(dataset, batch_size=4)

# Training
optimizer = torch.optim.AdamW(prompt_tokens.parameters(), lr=5e-4)
num_epochs = 3  # Number of training epochs

model.train()
for epoch in range(num_epochs):
    for batch in data_loader:
        optimizer.zero_grad()
        loss = forward_with_prompt(model, batch, prompt_tokens)
        loss.backward()
        optimizer.step()

    print(f"Epoch {epoch} completed")

print("Training completed")


Epoch 0 completed
Epoch 1 completed
Epoch 2 completed
Training completed


In [None]:
def generate_text(model, prompt_text, prompt_tokens, tokenizer, max_length=50):
    encoded_input = tokenizer(prompt_text, return_tensors='pt')
    input_ids = encoded_input.input_ids
    prompt_embeddings = prompt_tokens(torch.arange(num_prompt_tokens).to(input_ids.device))


    input_embeds = model.transformer.wte(input_ids)
    full_input_embeds = torch.cat([prompt_embeddings.unsqueeze(0).expand(input_embeds.size(0), -1, -1), input_embeds], dim=1)

    # Generate text
    attention_mask = torch.ones(full_input_embeds.size(0), full_input_embeds.size(1)).to(input_ids.device)
    generated_ids = model.generate(inputs_embeds=full_input_embeds, attention_mask=attention_mask, max_length=max_length)

    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)


In [None]:
prompt = "Once upon a time"  # Example prompt
generated_text_before = generate_text(model, prompt, prompt_tokens, tokenizer)
print("Before Fine-Tuning:", generated_text_before)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Before Fine-Tuning: 

(3) The person is not a person.

(4) The person is not a person.

(5) The person is not a person.

(6) The person is not a person.



In [None]:
prompt = "Once upon a time"  # Example prompt
generated_text_after = generate_text(model, prompt, prompt_tokens, tokenizer)
print("After Fine-Tuning:", generated_text_after)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


After Fine-Tuning: , I have a new job. I have a new job. I have a new job. I have a new job. I have a new job. I have a new job. I have a new job. I have a new job.
