### Practice: Parameter Efficient Fine-Tuning
In this notebook, you're going to fine-tune large language models within limited GPU memory.

# https://clc.li/oUqUX

In [None]:
%pip install --quiet transformers==4.34.1 accelerate==0.24.0 sentencepiece==0.1.99 optimum==1.13.2 peft==0.5.0 bitsandbytes==0.41.2.post2

import torch
import torch.nn as nn
import torch.nn.functional as F

import transformers
from tqdm.auto import tqdm, trange
assert torch.cuda.is_available(), "you need cuda for this part"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


In [None]:
model_name = 'Enoch/llama-7b-hf'

# loading Llama tokenizer ...
tokenizer = transformers.LlamaTokenizer.from_pretrained(model_name)  # tokenizers run on CPU and take no device_map
tokenizer.pad_token_id = tokenizer.eos_token_id

# ... and the model itself
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True, offload_state_dict=True,
    load_in_4bit=True, torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)
for param in model.parameters():
    param.requires_grad=False

model.gradient_checkpointing_enable()  # only store a small subset of activations, re-compute the rest.
model.enable_input_require_grads()     # override an implementation quirk in gradient checkpoints that disables backprop unless inputs require grad
# more on gradient checkpointing: https://pytorch.org/docs/stable/checkpoint.html https://arxiv.org/abs/1604.06174
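Gradient checkpointing trades extra compute for lower memory. Below is a small CPU-only sketch on a toy block (an illustration, not the Llama setup): `torch.utils.checkpoint.checkpoint` keeps only the block's input during the forward pass and recomputes the inner activations during backward, yet the parameter gradients match a regular backward pass. It uses the non-reentrant variant (`use_reentrant=False`). As far as we know, the quirk that `enable_input_require_grads()` works around is specific to the older reentrant variant.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
x = torch.randn(4, 16, requires_grad=True)

# Regular forward/backward: activations inside `block` are stored for the backward pass
block(x).sum().backward()
grads_plain = [p.grad.clone() for p in block.parameters()]
block.zero_grad()

# Checkpointed forward/backward: only the input is saved, activations are recomputed in backward
checkpoint(block, x, use_reentrant=False).sum().backward()
grads_ckpt = [p.grad.clone() for p in block.parameters()]

# Same gradients, lower peak activation memory (noticeable only for large models)
print(all(torch.allclose(a, b) for a, b in zip(grads_plain, grads_ckpt)))
```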


### Prompt tuning: the story of a fox (2 pts)

![img](https://i.imgur.com/Ux3qQAu.png) (source: theodd1souts.fandom.com)

In [None]:
prompt = 'A quick brown fox'
batch = tokenizer(prompt, return_tensors='pt', return_token_type_ids=False).to(device)

for i in range(10):
    next_token = model(**batch).logits[0, -1].argmax(-1).reshape(1, 1)
    batch['input_ids'] = torch.cat([batch['input_ids'], next_token], dim=-1)
    batch['attention_mask'] = torch.cat([batch['attention_mask'], torch.ones_like(next_token)], dim=-1)

print("\nOutput:", tokenizer.decode(batch['input_ids'][0].cpu().numpy().tolist()))


Output: <s>A quick brown fox jumps over the lazy dog.
A quick


What a blatant lie! This particular fox assures you that it didn't in fact jump over the lazy dog. No, sir! The fox was just minding its own business. __Your task is to train the model to tell the truth: no dog was jumped over today.__

In [None]:
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors='pt', return_token_type_ids=False).to(device)
outputs = model(**batch)

next_word_logits = outputs.logits[:, :-1]
true_next_tokens = batch['input_ids'][:, 1:]
loss = F.cross_entropy(next_word_logits.flatten(0, 1), true_next_tokens.flatten(0, 1))

print("Loss:", loss)

Loss: tensor(3.0725, device='cuda:0', grad_fn=<NllLossBackward0>)


However, we can't train the entire model: the gradients alone would take 28GB in float32 (about 7B parameters × 4 bytes each). Instead, let's run [prompt tuning](https://arxiv.org/abs/2104.08691).

![img](https://i.imgur.com/VwNNKnb.png)
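The 28GB figure is simple arithmetic. A minimal sketch, assuming a nominal 7 billion parameters (the exact Llama-7B count is a bit lower), 4 bytes per float32 value, and Llama-7B's hidden size of 4096. The Adam line is an extra illustration: Adam keeps two moment buffers per trainable parameter on top of the gradients.

```python
# Rough memory arithmetic for full fine-tuning vs. prompt tuning.
# Assumptions: a nominal 7e9 parameters, float32 (4-byte) values, hidden size 4096.

BYTES_PER_FLOAT32 = 4


def gradient_memory_gb(num_trainable_params: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Memory for one gradient value per trainable parameter, in decimal GB."""
    return num_trainable_params * bytes_per_value / 1e9


def adam_state_memory_gb(num_trainable_params: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Adam stores two extra buffers (first and second moments) per trainable parameter."""
    return 2 * num_trainable_params * bytes_per_value / 1e9


def prompt_tuning_params(num_prompts: int, hidden_size: int) -> int:
    """Prompt tuning trains one hidden_size-dimensional vector per virtual token."""
    return num_prompts * hidden_size


full_params = 7_000_000_000
prompt_params = prompt_tuning_params(num_prompts=16, hidden_size=4096)

print(f"full fine-tuning gradients:  {gradient_memory_gb(full_params):.1f} GB")
print(f"full fine-tuning Adam state: {adam_state_memory_gb(full_params):.1f} GB")
print(f"prompt tuning: {prompt_params} trainable parameters")
```

With 16 virtual tokens, prompt tuning trains 65536 values, the same count the PEFT wrapper reports later in this notebook.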


In [None]:
class WordEmbeddingsWithLearnedPrompts(nn.Module):
    """
    To perform prompt tuning, you will need to replace model's original word embeddings with a layer - THIS layer
     - that inserts trainable prompts instead of the first N token embeddings. """

    def __init__(self, word_embeddings: nn.Embedding, num_prompts: int):
        super().__init__()
        self.original_word_embeddings = word_embeddings
        self.num_prompts = num_prompts
        self.learnable_prompts = nn.Parameter(
            torch.randn(1, num_prompts, word_embeddings.embedding_dim), requires_grad=True)

    def forward(self, input_ids: torch.LongTensor):
        # input_ids shape: [batch_size, seq length]
        assert input_ids.dtype == torch.int64
        assert input_ids.shape[1] > self.num_prompts
        assert torch.all(input_ids[:, :self.num_prompts] == tokenizer.pad_token_id).item(), "don't forget to prepend num_prompts PAD tokens to input_ids"

        # Your task: embed input_ids, but replace the first :num_prompts: tokens with self.learnable_prompts
        # This is because we will prepend :num_prompts: padding tokens at the beginning

        # After you are done, you must produce a word embedding vector for each token in input_ids,
        # except that the first :num_prompts: vectors should equal learnable_prompts;
        # any additional vectors after first :num_prompts: ones should be embedded as usual
        # Note: since you're dealing with trainable params, please torch.cat instead of item assignment

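        embeddings = self.original_word_embeddings(input_ids[:, self.num_prompts:])
        # broadcast the shared prompts to the batch size; expand() is a view, so gradients still reach learnable_prompts
        prompts = self.learnable_prompts.expand(input_ids.shape[0], -1, -1)
        output = torch.cat((prompts, embeddings), dim=1)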
        return output

In [None]:
num_prompts = 16
test_emb_layer = WordEmbeddingsWithLearnedPrompts(model.model.embed_tokens, num_prompts=num_prompts).to(device)
test_input_ids = tokenizer("a cat say on a may", return_tensors='pt')['input_ids'].to(device)

space_for_prompts = torch.full([len(test_input_ids), num_prompts], fill_value=tokenizer.pad_token_id,
                               dtype=torch.int64, device=device)
test_inputs_with_prompts = torch.cat([space_for_prompts, test_input_ids], dim=1)

with torch.cuda.amp.autocast():
  test_prompt_embeddings = test_emb_layer(test_inputs_with_prompts)

assert test_prompt_embeddings.shape[:2] == test_inputs_with_prompts.shape
assert test_prompt_embeddings.shape[-1] == model.config.hidden_size
assert torch.allclose(test_prompt_embeddings[:, :num_prompts], test_emb_layer.learnable_prompts.float())
assert torch.allclose(test_prompt_embeddings[:, num_prompts:], model.model.embed_tokens(test_input_ids).float())
print("Looks legit!")

Looks legit!


__Now that it works,__ let's inject learnable prompts into the main model and teach it about foxes.

In [None]:
assert isinstance(model.model.embed_tokens, nn.Embedding), "you have already replaced the embedding layer. If the replacement is broken, please reload the model"

model.model.embed_tokens = WordEmbeddingsWithLearnedPrompts(model.model.embed_tokens, num_prompts=num_prompts).to(device)

opt = torch.optim.Adam([model.model.embed_tokens.learnable_prompts], lr=0.01)

In [None]:
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors='pt', return_token_type_ids=False).to(device)
space_for_prompts = torch.full([len(batch['input_ids']), num_prompts], fill_value=tokenizer.pad_token_id,
                               dtype=torch.int64, device=device)
batch['input_ids'] = torch.cat([space_for_prompts, batch['input_ids']], dim=1)
batch['attention_mask'] = torch.cat([torch.ones_like(space_for_prompts), batch['attention_mask']], dim=1)

outputs = model(**batch)
next_word_logits = outputs.logits[:, num_prompts : -1, :]
true_next_tokens = batch['input_ids'][:, num_prompts + 1:]
loss = F.cross_entropy(next_word_logits.flatten(0, 1), true_next_tokens.flatten(0, 1))
print("Loss:", loss)


for i in range(100):
    opt.zero_grad()
    outputs = model(**batch)
    next_word_logits = outputs.logits[:, num_prompts : -1, :]
    loss = F.cross_entropy(next_word_logits.flatten(0, 1), true_next_tokens.flatten(0, 1))
    loss.backward()
    opt.step()

assert loss.item() <= 0.1
print("Good job!")

Loss: tensor(7.4344, device='cuda:0', grad_fn=<NllLossBackward0>)
Good job!


In [None]:
prompt = 'A quick brown fox'
batch = tokenizer(prompt, return_tensors='pt', return_token_type_ids=False).to(device)
batch['input_ids'] = torch.cat([space_for_prompts, batch['input_ids']], dim=1)
batch['attention_mask'] = torch.cat([torch.ones_like(space_for_prompts), batch['attention_mask']], dim=1)


for i in range(15):
    next_token = model(**batch).logits[0, -1].argmax(-1).reshape(1, 1)
    batch['input_ids'] = torch.cat([batch['input_ids'], next_token], dim=-1)
    batch['attention_mask'] = torch.cat([batch['attention_mask'], torch.ones_like(next_token)], dim=-1)

print("\nOutput:", tokenizer.decode(batch['input_ids'][0, num_prompts:].cpu().numpy().tolist()))

# if you did everything right, the model will deny that the fox jumped over the lazy dog


Output: <s>A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it


### Using HuggingFace PEFT (2 points)

[`peft`](https://huggingface.co/docs/peft/index) is a sister library of `transformers` that lets you apply various __p__arameter __e__fficient __f__ine-__t__uning methods to pre-trained transformers. The library implements prompt tuning, prefix tuning, and several adapter-based techniques under a common interface:



In [None]:
import peft
assert isinstance(model.model.embed_tokens, nn.Embedding), "please reload the model"

peft_config = peft.PromptTuningConfig(task_type=peft.TaskType.CAUSAL_LM, num_virtual_tokens=16)
model = peft.get_peft_model(model, peft_config)  # note: for most peft methods, this line also modifies model in-place
print("Trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
print("Total parameters (excluding quantization):", sum(p.numel() for p in model.parameters()))

Trainable parameters: 65536
Total parameters (excluding quantization): 3500478464


In [None]:
# Your task: optimize the PEFT-wrapped model until the next-token prediction loss drops below 0.1
# Please note: you no longer need to prepend PAD tokens, but you still need to skip :num_virtual_tokens: first logits.
# Finally, generate the sentence to make sure that the model learned the truth.
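A pure-Python sketch of the index bookkeeping, assuming (as the comment above states) that the PEFT wrapper adds `num_virtual_tokens` extra logit positions in front of the real tokens. The helper name `next_token_pairs` is hypothetical; it only checks that the slices `logits[:, num_virtual_tokens:-1]` and `input_ids[:, 1:]` line up.

```python
def next_token_pairs(seq_len: int, num_virtual_tokens: int) -> list[tuple[int, int]]:
    """Pair each logit position with the input_ids position it should predict.

    Logit position num_virtual_tokens + t has seen real tokens 0..t and predicts token t + 1.
    """
    return [(num_virtual_tokens + t, t + 1) for t in range(seq_len - 1)]


seq_len, num_virtual_tokens = 6, 16
pairs = next_token_pairs(seq_len, num_virtual_tokens)
logit_positions = [p for p, _ in pairs]
target_positions = [q for _, q in pairs]

# These are exactly the positions selected by logits[:, num_virtual_tokens:-1] and input_ids[:, 1:]
total_logit_len = num_virtual_tokens + seq_len
assert logit_positions == list(range(total_logit_len))[num_virtual_tokens:-1]
assert target_positions == list(range(seq_len))[1:]
print(pairs[:3])
```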

In [None]:
# Feel free to structure your code as you see fit - as long as it's legible :)

In [None]:
the_truth = "A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it anyway!"
batch = tokenizer(the_truth, return_tensors='pt', return_token_type_ids=False).to(device)

In [None]:
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=0.01)  # only the virtual-token embeddings are trainable

In [None]:
for i in range(100):
    opt.zero_grad()
    outputs = model(**batch)
    next_word_logits = outputs.logits[:, peft_config.num_virtual_tokens : -1, :]  # skip the virtual-token positions
    loss = F.cross_entropy(next_word_logits.flatten(0, 1), batch['input_ids'][:, 1:].flatten(0, 1))
    print(loss)
    loss.backward()
    opt.step()
    if loss.item() < 0.1:
        break

tensor(7.7071, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(6.9944, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(6.4791, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(6.0682, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(5.7262, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(5.4201, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(5.1351, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(4.8612, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(4.5962, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(4.3479, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(4.1233, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(3.9166, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(3.7192, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(3.5291, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(3.3444, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(3.1616, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(2.9778, device='cuda:0', grad_fn=

In [None]:
prompt = 'A quick brown fox'
batch = tokenizer(prompt, return_tensors='pt', return_token_type_ids=False).to(device)

for i in range(15):
    next_token = model(**batch).logits[0, -1].argmax(-1).reshape(1, 1)
    batch['input_ids'] = torch.cat([batch['input_ids'], next_token], dim=-1)
    batch['attention_mask'] = torch.cat([batch['attention_mask'], torch.ones_like(next_token)], dim=-1)

print("\nOutput:", tokenizer.decode(batch['input_ids'][0].cpu().numpy().tolist()))


Output: <s>A quick brown fox did not jump over the lazy dog. Besides, that dog deserved it


### Parameter-efficient finetuning with LoRA (2 points)

When training on more serious tasks, you can use low-rank adapters based on the [LoRA paper](https://arxiv.org/pdf/2106.09685.pdf).

The core idea is to add low-rank adapters __in parallel with existing linear layers,__ like this:
<center><img src="https://i.imgur.com/6bQLNiG.png" width=240px></center>

In the original LoRA paper, the adapters were only added to attention projection matrices. However, [subsequent works](https://arxiv.org/abs/2305.14314) show that it is useful to adapt FFNs as well. But before we do any training, we need to implement the basic LoRA layer.
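For a sense of scale, here is a quick parameter count, assuming Llama-7B's square 4096 x 4096 Q/K/V projections, 32 decoder layers, and rank 8 as used below. An adapter stores A (in_features x rank) and B (rank x out_features) instead of a full in_features x out_features update.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters of one adapter: A is (in_features x rank), B is (rank x out_features)."""
    return in_features * rank + rank * out_features


def dense_params(in_features: int, out_features: int) -> int:
    """Parameters of a full weight-update matrix of the same shape (bias excluded)."""
    return in_features * out_features


hidden, rank = 4096, 8                  # assumed Llama-7B hidden size; LoRA rank used in this notebook
num_layers, adapters_per_layer = 32, 3  # q_proj, k_proj, v_proj in each decoder layer

per_adapter = lora_params(hidden, hidden, rank)
print("one 4096x4096 projection:", dense_params(hidden, hidden), "vs. LoRA adapter:", per_adapter)
print("all Q/K/V adapters:", per_adapter * num_layers * adapters_per_layer)
```

The 96 adapters (32 layers x 3 projections) match the count asserted after LoRA is applied to the model.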

In [None]:
# re-load the model to remove any previous PEFT tuners
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True, offload_state_dict=True,
    load_in_4bit=True, torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)
for param in model.parameters():
    param.requires_grad=False
model.gradient_checkpointing_enable()
model.enable_input_require_grads()


In [None]:
class LoRALayer(nn.Module):
    """Wraps an existing (frozen) linear layer with a LoRA adapter: output = module(input) + input @ A @ B"""
    def __init__(self, module: nn.Linear, rank: int):
        super().__init__()
        self.module = module  # pre-trained (frozen) linear layer
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input):
        # Apply self.module and LoRA adapter, return the sum (self.module outputs + adapter outputs)
        return self.module(input) + (input @ self.adapter_A) @ self.adapter_B  # avoids materializing the full A @ B matrix

In [None]:
# test your implementation
test_linear = nn.Linear(128, 128)
test_linear.weight.data[...] = torch.eye(128)
test_adapter = LoRALayer(test_linear, rank=8)

assert torch.allclose(test_adapter(torch.ones(1, 1, 128)), test_linear.bias + 1), "please check your forward pass"

test_adapter.adapter_A.data[...] = torch.linspace(0.1, -0.5, 128 * 8).view(128, 8)
test_adapter.adapter_B.data[...] = torch.linspace(0.5, -0.1, 128 * 8).view(8, 128)
test_linear.bias.data[...] = torch.linspace(1., -1., 128)

dummy_loss = F.mse_loss(test_adapter(torch.ones(1, 128) / 128).squeeze(), torch.linspace(-1, 1, 128))
assert torch.allclose(dummy_loss, torch.tensor(1.3711389), rtol=0, atol=1e-4)
dummy_loss.backward()
assert all(w.grad is not None for w in [test_adapter.adapter_A, test_adapter.adapter_B]), "some adapter weights have no grad"
assert torch.allclose(test_adapter.adapter_A.grad.sum(), torch.tensor(-0.60158), rtol=0, atol=1e-4), "bad grad w.r.t. A"
assert torch.allclose(test_adapter.adapter_B.grad.sum(), torch.tensor(0.9931), rtol=0, atol=1e-4), "bad grad w.r.t. B"
# note: bad grad means that your code is different from LoRA paper OR that your code is not autograd-friendly (e.g. no_grad)
del dummy_loss, test_linear, test_adapter
print("All tests passed!")

All tests passed!
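An optional variant, not required by this assignment: the LoRA paper scales the adapter update by alpha / rank, which the `LoRALayer` above omits. The sketch below adds that scaling (the class name `ScaledLoRALayer` and the default `alpha=16.0` are our own choices) and computes `(input @ A) @ B`, which never builds the full `in_features x out_features` matrix. With `alpha == rank` it reduces to the unscaled layer; other values would change the expected numbers in the test above.

```python
import torch
import torch.nn as nn


class ScaledLoRALayer(nn.Module):
    """Hypothetical LoRALayer variant that scales the adapter update by alpha / rank."""

    def __init__(self, module: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.module = module  # frozen pre-trained layer
        self.scaling = alpha / rank
        self.adapter_A = nn.Parameter(torch.empty(module.in_features, rank, device=module.weight.device))
        nn.init.kaiming_uniform_(self.adapter_A, a=5 ** 0.5)
        self.adapter_B = nn.Parameter(torch.zeros(rank, module.out_features, device=module.weight.device))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # low-rank path first: (batch..., in) -> (batch..., rank) -> (batch..., out)
        return self.module(input) + self.scaling * ((input @ self.adapter_A) @ self.adapter_B)


base = nn.Linear(32, 16)
for p in base.parameters():
    p.requires_grad = False
layer = ScaledLoRALayer(base, rank=4, alpha=8.0)
x = torch.randn(2, 5, 32)
# adapter_B starts at zero, so the wrapped layer initially reproduces the frozen one
print(torch.allclose(layer(x), base(x)))
```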


### Apply LoRA to the model

The code below applies LoRA adapters on top of Q/K/V linear layers in Llama attention. You may also choose to modify other layers:
* self_attn.o_proj - attention output projection
* mlp.up_proj, mlp.gate_proj, mlp.down_proj - transformer feedforward layers
* lm_head - output LM head
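If you decide to adapt more layers, a generic helper can be handier than matching on `repr(type(module))`. A minimal sketch: the name `wrap_linear_layers` is hypothetical, it matches layers by attribute name (`q_proj`, `up_proj`, ...), and it assumes the quantized projections are `nn.Linear` subclasses; relax the `isinstance` check if yours are not. It collects all targets before replacing any, so the module tree is not mutated mid-iteration.

```python
from typing import Callable

import torch.nn as nn


def wrap_linear_layers(model: nn.Module, target_names: set[str],
                       wrap: Callable[[nn.Linear], nn.Module]) -> int:
    """Replace every nn.Linear whose attribute name is in target_names with wrap(linear).

    Returns the number of replaced layers.
    """
    to_replace = []
    for parent in model.modules():
        for child_name, child in parent.named_children():
            if child_name in target_names and isinstance(child, nn.Linear):
                to_replace.append((parent, child_name, child))
    for parent, child_name, child in to_replace:
        setattr(parent, child_name, wrap(child))
    return len(to_replace)


# Toy stand-ins for attention blocks and an adapter wrapper (illustration only)
class ToyAttention(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.q_proj, self.k_proj, self.o_proj = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)


class Marker(nn.Module):
    def __init__(self, inner: nn.Linear):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        return self.inner(x)


toy = nn.ModuleList([ToyAttention(4) for _ in range(2)])
n = wrap_linear_layers(toy, {"q_proj", "k_proj"}, Marker)
print("wrapped layers:", n)

# Possible notebook usage (not run here):
# wrap_linear_layers(model.model.layers, {"q_proj", "k_proj", "v_proj"},
#                    lambda m: LoRALayer(m, rank=lora_rank).to(device))
```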

__Note:__ please scroll down for the homework task

In [None]:
lora_rank = 8

for name, module in model.model.layers.named_modules():
    if 'LlamaDecoderLayer' in repr(type(module)):
        module.self_attn.q_proj = LoRALayer(module.self_attn.q_proj, rank=lora_rank).to(device)
        module.self_attn.k_proj = LoRALayer(module.self_attn.k_proj, rank=lora_rank).to(device)
        module.self_attn.v_proj = LoRALayer(module.self_attn.v_proj, rank=lora_rank).to(device)

assert sum(isinstance(module, LoRALayer) for module in model.modules()) == 96  # for Llama-7B

In [None]:
batch = tokenizer("This model wants to share its greatest secret:", return_tensors='pt', return_token_type_ids=False)
# test a single training step, make sure we get meaningful gradients
with torch.cuda.amp.autocast(dtype=torch.float32):
    out = model.forward(**batch)
    (out.logits.norm() / 100).backward()

for i, module in enumerate(model.modules()):
    if isinstance(module, LoRALayer):
        assert module.adapter_B.grad is not None
        assert module.adapter_B.grad.norm().item() > 0

model.zero_grad(set_to_none=True)
print("Grad check successful, well done!")

Grad check successful, well done!


### (example) How to train your model

The example below shows how to train the LoRA adapters on a dummy dataset. You will need to run a _similar_ training task later.

__Note:__ please scroll down for the homework task

In [None]:
# checking if the model can learn. Change max_steps for proper training
import datasets
data = datasets.load_dataset("Abirate/english_quotes", split="train[:32]") # 32 lines
data = data.map(lambda samples: tokenizer(samples['quote']), batched=True)
model._hf_peft_config_loaded = True  # bypass the HF Trainer check that refuses to fine-tune purely quantized models

trainer = transformers.Trainer(
    model=model, train_dataset=data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2, gradient_accumulation_steps=1,
        # note: if you want larger batch size, increase gradient_accumulation_steps
        warmup_steps=250, max_steps=100, learning_rate=2e-4, fp16=True,
        logging_steps=1, output_dir='outputs', report_to="none"),  # "none" disables logging integrations; None means "all installed"
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
# if you see cache warnings, set `model.config.use_cache = False` to silence them. Please re-enable for inference!

trainer.train()

# NOTE: this is just an example! you do not have to wait for this progressbar to finish :)

### Final task: *actually* train the model (4 points)

Your task is to fine-tune the model to _generate Python code_. Please use the above examples for inspiration. More specifically,

* __dataset:__ use [codeparrot-clean](https://huggingface.co/datasets/codeparrot/codeparrot-clean) or any other data containing Python code. Since you do not need much data for this exercise, it is enough to use just the smaller validation subset of codeparrot
* __preprocessing:__ select Python code based on the file extension (.py) (you may skip this for codeparrot, which is 100% Python)
* __short lines:__ please take the first 512 characters of each line
* __adapter type:__ please use LoRA as defined above __plus at least one of:__
   - extra adapter on lm_head
   - extra adapter on MLP components (mlp.*)
   - trainable input embeddings (requires tweaking memory usage)

* __training:__ you do not have to train to convergence. If all goes well, your model should `.generate` code after 500 steps. Please use a batch size of at least 4 (4 x 1 x 512 tokens), e.g. with `gradient_accumulation_steps=4`.
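A minimal sketch of the preprocessing bullets, assuming each sample is a dict with `path` and `content` fields (the codeparrot column names used later in this notebook). With Hugging Face `datasets`, functions like these can be passed to non-batched `Dataset.filter` and `Dataset.map`; the toy rows below are made up for illustration. Character truncation differs from `tokenizer(..., truncation=True, max_length=512)`, which keeps 512 tokens instead.

```python
def is_python_file(sample: dict) -> bool:
    """Keep only samples whose file path ends with the .py extension."""
    return sample["path"].endswith(".py")


def truncate_content(sample: dict, max_chars: int = 512) -> dict:
    """Keep the first max_chars characters of the code (characters, not tokens)."""
    return {"content": sample["content"][:max_chars]}


# Toy rows shaped like codeparrot samples (only the two fields used here)
rows = [
    {"path": "pkg/utils.py", "content": "import os\n" * 100},
    {"path": "README.md", "content": "# not python"},
]
python_rows = [row | truncate_content(row) for row in rows if is_python_file(row)]
print(len(python_rows), len(python_rows[0]["content"]))
```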


Note: the peft library also has a LoRA implementation. However, for this assignment we ask that you show at least one complete training run with your own LoRA code.

__Alternative assignment:__ Instead of doing python code, feel free to substitute the task with any other dataset, e.g. your favorite artist or podcast, as long as it's ethical. If you choose your own task, please show examples of what your model learned - or did not learn, akin to the code examples below.

In [None]:
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True, offload_state_dict=True,
    load_in_4bit=True, torch_dtype=torch.float32,  # weights are 4-bit; layernorms and activations are fp32
)
for param in model.parameters():
    param.requires_grad=False
model.gradient_checkpointing_enable()
model.enable_input_require_grads()


In [None]:
prompts = ['', 'import', 'from', 'while', 'try', 'if', 'for', 'torch']  # feel free to add a few more that are not 100% associated with Python

before_finetuning = []
for prompt in prompts:
    output_tokens = model.generate(**tokenizer(prompt, return_tensors='pt'),
                               do_sample=True, min_length=50, max_length=100)
    before_finetuning.append(tokenizer.decode(output_tokens[0].cpu().numpy()))

# generate baseline samples with the selected prompts before finetuning
# please feel free to use transformers.Trainer (as above) or your custom training code
# after the training concludes, please show examples of text generated by your model. It is expected to look like Python code fragments
# print the generation examples nicely (suggestion: use pandas or HTML) for easier comparison
# note: your LoRA-enhanced model can run generation the same way as the non-trained model (above)



In [None]:
from IPython.display import HTML, display
table_template = """<table style="border:1px solid black" >
  <tr>
    <th style="text-align: center; border:1px solid black">PROMPT</th>
    <th style="text-align: center; border:1px solid black">BEFORE</th>
    <th style="text-align: center; border:1px solid black">AFTER</th>
  </tr>
{}
</table>"""

row_template = '''  <tr>
    <td style="width:20%; border:1px solid black"><pre align="left">`{}`</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
  </tr>'''

rows = []

for i, prompt in enumerate(prompts):
    # [3:] strips the leading '<s>' (BOS token) from each decoded sample
    rows.append(row_template.format(prompt, before_finetuning[i][3:], after_finetuning[i][3:]))

display(HTML(table_template.format('\n'.join(rows))))

PROMPT,BEFORE,AFTER
``,"│ ● Author: Troy Brownfield “You can make your own fate.” This is one of my favorite things ever said to me. It sticks out most in my mind because I think back to it when I’s sitting on my bed and trying to convince myself to do something out of this world that I was afraid to do. I was a scrawny high school freshman, and the thought of asking a girl to attend a Friday night football game","Multimedia is one of the major components in a modern web design trend. In websites where you would like to make sure that your users are in for not only browsing but also have a good story telling with your website, using multimedia becomes an obvious choice. The good news is that creating a Multimedia site is not as hard and cumbersome as it used to be. Webs is a perfect example on how a non-technical user can use the features offered by its platform"
`import`,"import Foundation // Swift 2.1 only has NSURLConnection, so this class implements NSURLConnectionDelegate so that we still have the ability to send an extension error. /// The `NSURLConnectionDataDelegate` object encapsulates the `NSURLConnection` in a way that exposes delegate methods to the caller. public protocol NSURLConnectionDataDelegate: NSObjectProtocol, NSURLConnectionDownloadDelegate {  var connection: NSURLConnection { get set }","import inspect import nose.Case from _pytest.fixture import Nested def _test_to_native(cls):  """"""Test that attributes/methods are converted to native""""""  for name, value in inspect.getmembers(cls):  if inspect.isfunction(value):  yield _function_to_native, name, value  else:  if issubclass(value, inspect):"
`from`,from Cryptopp import Detail from Cryptopp import HP18 from Cryptopp import TM18 from Cryptopp import TM20 from Cryptopp import TM25 from Cryptopp import OP1 from Cryptopp import OP2 from Cryptopp import OP3 from Cryptopp import OP4 from Cryptopp import OP5 from Cryptopp import OP6 from Crypt,"from __future__ import absolute_import, division, print_function try:  from . import _io finally:  pass from . import _io from . import _regression import gc import os class IO(object):  def __init__(self):  self._io = _io  self._regression = _regression  def load(self, filepath"
`while`,"while(true);  if(x==6)return 1; x++;  y++; } \end{code} Comment: @Steven Well, yes and no. The second loop is infinite or it will crash eventually (but you can get a stack overflow exception first, or before). The first loop is finite, and will loop forever if the condition evaluates to `false` while both are running (because `false` is","while both of us were away on holiday I wrote a rather long post about this in January last year, if you’re curious. There is an elephant in the room. A very big elephant. And our family is going to see if we can move it or not. I’m not going to name the elephant, because it really is our choice whether we want to take it on, and I want to make sure there’s room"
`try`,"try this link (http://www.wbx.de/index.php?id=cms&page=123750&k=0) I think this is the only site that works at the moment. There should be a working stream at least 2 hours before the match starts. ""VfB Stuttgart wird ab Mittwoch mt den FC Bayern Münchensystem gegenreisenden FC Bayern spielen"" I can't",try them online by logging in at the normal time (e.g. 6pm on Friday) and if there are places available you will be sent an automatically generated email to your registration email address on the same evening notifying you of your booking. This will include the instructions for downloading the required files onto your mp3 player in time for your start location on the first day. Please read our terms and conditions page and our data protection policy page. If you still want to
`if`,if (Math.abs(currentTime - expectedTime) < allowedDeviation) currentTime = new Time(currentTime) \end{code} Then in the time method \begin{code} public Time() {  // This should be the time difference with a clock  this.time = 0;  // Time values from now on should be in the past  this.timeIsRelative = true; } \,if you’re reading this you’re gonna be one of those guys in your 80s and 90s with a great story to tell about the time you heard the Beach Boys play back when they were still cool enough for him to take you to he must have thought that his kids wouldn’t care that no one else would care and the kids grew up to be teenagers but no matter because your
`for`,"for 378 E. Main in Fayetteville, Arkansas. Joshua Longoria was hired as the first principal of the new school. His vision for the school is summed up in his mantra:""Honor the Child, Respect the Parent, Expect the Teacher.""","for all the children I'll never have! Mom's been babysitting for the kids across the street for the entire time we've lived here (going on 3.5 years!). She would get us 2 hours at a time every few weeks or so throughout that time. I've been dying to have a few of those times to take care of my own kiddos. My two brothers have 6 kids each (yes,"
`torch`,torchbearer said... The U.S. government has given money to groups that do not even want to protect their own people and is giving money to the very ones responsible for their mass killings. How would you feel if it was the Mexican government making these deals? Would you feel the same way? I like to give credence to the notion that the United States did not do everything in it's power to prevent the slaughter of the innocent,"torchlite.com 1985—1992 • 1992—2010 • 2010— Present We design, we produce, we deliver. We’re experienced and creative with what-if’s and “how-can-I’s.” We use sustainable fabrics and materials. And we listen to what has been missing all along — the voices of women"


In [None]:
import transformers
from datasets import load_dataset

data = load_dataset("codeparrot/codeparrot-clean-valid")
data = data.filter(lambda sample: sample['path'].endswith(".py"))
data = data.map(lambda sample: tokenizer(sample['content'], truncation=True, max_length=512),
                batched=True)

In [None]:
lora_rank = 8

for name, module in model.model.layers.named_modules():
    if 'LlamaDecoderLayer' in repr(type(module)):
        module.self_attn.q_proj = LoRALayer(module.self_attn.q_proj, rank=lora_rank).to(device)
        module.self_attn.k_proj = LoRALayer(module.self_attn.k_proj, rank=lora_rank).to(device)
        module.self_attn.v_proj = LoRALayer(module.self_attn.v_proj, rank=lora_rank).to(device)

    if 'LlamaMLP' in repr(type(module)):
        module.gate_proj = LoRALayer(module.gate_proj, rank=lora_rank).to(device)
        module.up_proj = LoRALayer(module.up_proj, rank=lora_rank).to(device)
        module.down_proj = LoRALayer(module.down_proj, rank=lora_rank).to(device)

In [None]:
model._hf_peft_config_loaded = True  # bypass the HF Trainer check that refuses to fine-tune purely quantized models

trainer = transformers.Trainer(
    model=model, train_dataset=data['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4, gradient_accumulation_steps=4, warmup_steps=250,
        max_steps=100, learning_rate=2e-4, fp16=True,
        logging_steps=1, output_dir='outputs', report_to="none"),  # "none" disables logging integrations
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

model.config.use_cache = False

trainer.train()

| Step | Training Loss |
|-----:|--------------:|
| 1 | 1.0143 |
| 2 | 1.1493 |
| 3 | 1.2236 |
| 4 | 0.9776 |
| 5 | 1.0499 |
| 6 | 1.1777 |
| 7 | 1.0847 |
| 8 | 0.9998 |
| 9 | 1.0596 |
| 10 | 0.9895 |


TrainOutput(global_step=100, training_loss=1.0467948484420777, metrics={'train_runtime': 9611.9271, 'train_samples_per_second': 0.166, 'train_steps_per_second': 0.01, 'total_flos': 3.25643547967488e+16, 'train_loss': 1.0467948484420777, 'epoch': 0.03})

In [None]:
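model.config.use_cache = True  # re-enable the KV cache (disabled for training) to speed up generation
after_finetuning = []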
for prompt in prompts:
    output_tokens = model.generate(**tokenizer(prompt, return_tensors='pt'),
                               do_sample=True, min_length=50, max_length=100)
    after_finetuning.append(tokenizer.decode(output_tokens[0].cpu().numpy()))

In [None]:
# This template helps to compare generated code samples in pretty table form
# feel free to present your work in other forms

from IPython.display import HTML, display
table_template = """<table style="border:1px solid black" >
  <tr>
    <th style="text-align: center; border:1px solid black">PROMPT</th>
    <th style="text-align: center; border:1px solid black">BEFORE</th>
    <th style="text-align: center; border:1px solid black">AFTER</th>
  </tr>
{}
</table>"""

row_template = '''  <tr>
    <td style="width:20%; border:1px solid black"><pre align="left">`{}`</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
    <td style="width:40%; border:1px solid black"><pre align="left">{}</pre></td>
  </tr>'''

rows = []

for i, prompt in enumerate(prompts):
    # [3:] strips the leading '<s>' (BOS token) from each decoded sample
    rows.append(row_template.format(prompt, before_finetuning[i][3:], after_finetuning[i][3:]))

display(HTML(table_template.format('\n'.join(rows))))

| PROMPT | BEFORE | AFTER |
|---|---|---|
| `""` | SequenCell® DNA, RNA & ChIP-seq Data Analysis Solution Now Available Pleasanton, California, U.S.A., November 19, 2018 – With the availability of the SequenCell® Software Suite version 7.4, scientists can now access the latest version of the software for data analysis in their research. Scientists and lab staff can now evaluate all available information while using the most up to | Multimedia is one of the major components in a modern web design trend. In websites where you would like to make sure that your users are in for not only browsing but also have a good story telling with your website, using multimedia becomes an obvious choice. The good news is that creating a Multimedia site is not as hard and cumbersome as it used to be. Webs is a perfect example on how a non-technical user can use the features offered by its platform |
| `import` | `import re from twisted. Triplet import Property from twisted.words import spaces from twisted.internet import reactor from .util import make_properties class PropertyList(Triplet):  """  Inspired by http://twistedmatrix.com/trac/tags/latest/trunk/twisted/triplet/Triplet/  """  def __init__(self` | `import inspect import nose.Case from _pytest.fixture import Nested def _test_to_native(cls):  """Test that attributes/methods are converted to native"""  for name, value in inspect.getmembers(cls):  if inspect.isfunction(value):  yield _function_to_native, name, value  else:  if issubclass(value, inspect):` |
| `from` | `from _frolic.app import App from _frolic.util import create_app if __name__ == '__main__':  from _frolic.app import web  from _frolic.util import create_app2  create_app2()  from _frolic.app import wsgi  from _frolic.util import create_app3  create_app3(` | `from __future__ import absolute_import, division, print_function try:  from . import _io finally:  pass from . import _io from . import _regression import gc import os class IO(object):  def __init__(self):  self._io = _io  self._regression = _regression  def load(self, filepath` |
| `while` | while both the Samsung Galaxy J5 and the LG Leon 4G are mid-range phones, the former’s specs definitely trump those of the latter. The Samsung Galaxy J5 has a 5 inch full HD display while the Leon 4G sports a 4.5 inch WVGA unit. The former’s got a 1.5GHz 64 bit Snapdragon 410 processor while | while both of us were away on holiday I wrote a rather long post about this in January last year, if you’re curious. There is an elephant in the room. A very big elephant. And our family is going to see if we can move it or not. I’m not going to name the elephant, because it really is our choice whether we want to take it on, and I want to make sure there’s room |
| `try` | tryingtothrive.com Trying To Thrive 30 Day Challenge For Beginners: Day 20 How You Can Try The 30 Day Challenge For Beginners: Day 20 There are times in any person’s life that are meant for special celebrations, and you and I are aware of the fact that there is a celebration almost every weekend. That is a natural thing because we tend to look out for anything that comes our way | try them online by logging in at the normal time (e.g. 6pm on Friday) and if there are places available you will be sent an automatically generated email to your registration email address on the same evening notifying you of your booking. This will include the instructions for downloading the required files onto your mp3 player in time for your start location on the first day. Please read our terms and conditions page and our data protection policy page. If you still want to |
| `if` | if you want to receive an offer! By completing the survey, you will receive an offer for a 5 minutes online assessment, you will be able to see if you match the above profile and you might receive a second chance in an interview. | if you’re reading this you’re gonna be one of those guys in your 80s and 90s with a great story to tell about the time you heard the Beach Boys play back when they were still cool enough for him to take you to he must have thought that his kids wouldn’t care that no one else would care and the kids grew up to be teenagers but no matter because your |
| `for` | for \$150 each. We also have three 1998 A185s, \$1000 each. I'll keep an eye out for good stuff while I'm there :) I'm leaving for the south of France on Thursday. I have a pair of original KEF 115/2's (A12) for sale. I'm thinking either \$1350 or | for all the children I'll never have! Mom's been babysitting for the kids across the street for the entire time we've lived here (going on 3.5 years!). She would get us 2 hours at a time every few weeks or so throughout that time. I've been dying to have a few of those times to take care of my own kiddos. My two brothers have 6 kids each (yes, |
| `torch` | torch-singer n (US, informal) someone who sings, usually poorly, in a nightclub or hotel lounge, etc., after the closing of the performance or in between the performers The torch-singer, like the balladeer, is sometimes thought of as being on the bottom, as opposed to rising up the ranks Sourcetorch-singer on Thesaurus v.24.1 | torchlite.com 1985—1992 • 1992—2010 • 2010— Present We design, we produce, we deliver. We’re experienced and creative with what-if’s and “how-can-I’s.” We use sustainable fabrics and materials. And we listen to what has been missing all along — the voices of women |


If you've reached this point, congratulations! You've completed everything in this practice session.

If you want to dig deeper, try implementing prompt tuning (for bonus points!).
You can read more about prompt-tuning variants in paper [1](https://arxiv.org/abs/2104.08691) (prompt tuning) or paper [2](https://arxiv.org/abs/2101.00190) (prefix tuning). Both variants can be implemented by passing trainable prompts as `model.forward(..., past_key_values=your_prompts)`.
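Passing `past_key_values` means building per-layer key/value tensors, and the cache format `transformers` expects can differ between library versions. A simpler way to try the prompt-tuning variant from paper [1] is to prepend trainable vectors to the input embeddings and call the model with `inputs_embeds`. The sketch below assumes a Hugging Face-style model that exposes `get_input_embeddings()` and accepts `inputs_embeds` and `attention_mask`; `SoftPromptModel` and the `ToyCausalLM` stand-in (used only so the demo runs on CPU without downloading weights) are hypothetical names, not part of any library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from types import SimpleNamespace


class SoftPromptModel(nn.Module):
    """Prompt-tuning sketch: trainable vectors are prepended to a frozen LM's input embeddings."""

    def __init__(self, model: nn.Module, num_prompts: int = 16):
        super().__init__()
        self.model = model
        for param in self.model.parameters():
            param.requires_grad_(False)  # only the soft prompt is trained
        embeddings = model.get_input_embeddings()
        # initialise from randomly chosen existing token embeddings (one common choice)
        init_ids = torch.randint(0, embeddings.num_embeddings, (num_prompts,),
                                 device=embeddings.weight.device)
        self.prompts = nn.Parameter(embeddings.weight[init_ids].detach().clone().float())

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        batch_size, num_prompts = input_ids.shape[0], self.prompts.shape[0]
        token_embeds = self.model.get_input_embeddings()(input_ids)
        prompts = self.prompts.to(token_embeds.dtype).unsqueeze(0).expand(batch_size, -1, -1)
        prompt_mask = attention_mask.new_ones(batch_size, num_prompts)  # prompts are always attended
        outputs = self.model(inputs_embeds=torch.cat([prompts, token_embeds], dim=1),
                             attention_mask=torch.cat([prompt_mask, attention_mask], dim=1))
        # drop the prompt positions so the logits line up with input_ids again
        return outputs.logits[:, num_prompts:]


class ToyCausalLM(nn.Module):
    """Hypothetical stand-in for a causal LM, only so the demo runs on CPU."""

    def __init__(self, vocab_size: int = 50, hidden_size: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.head = nn.Linear(hidden_size, vocab_size)

    def get_input_embeddings(self) -> nn.Embedding:
        return self.embed

    def forward(self, inputs_embeds, attention_mask=None):
        # causal mixing: position t sees the mean of embeddings 0..t (attention_mask ignored here)
        steps = torch.arange(1, inputs_embeds.shape[1] + 1, device=inputs_embeds.device)
        hidden = inputs_embeds.cumsum(dim=1) / steps[None, :, None]
        return SimpleNamespace(logits=self.head(hidden))


torch.manual_seed(0)
wrapped = SoftPromptModel(ToyCausalLM(), num_prompts=4)
input_ids = torch.randint(0, 50, (2, 10))
attention_mask = torch.ones_like(input_ids)

logits = wrapped(input_ids, attention_mask)  # shape [2, 10, 50]
# standard shifted next-token loss on the real tokens only
loss = F.cross_entropy(logits[:, :-1].flatten(0, 1), input_ids[:, 1:].flatten())
loss.backward()

trainable = [name for name, p in wrapped.named_parameters() if p.requires_grad]
print(trainable)  # ['prompts']
```

With the notebook's model you would wrap `model` instead of the toy and pass only `wrapped.prompts` to the optimizer. Because the returned logits already exclude the prompt positions, the usual shifted next-token loss applies unchanged.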



### Read more

* How post-training quantization works: https://arxiv.org/abs/2208.07339
* An overview of running large models: https://huggingface.co/docs/accelerate/package_reference/big_modeling
* A general library for different adapter types: https://adapterhub.ml/


### [extra info] Running other models.

This notebook's code can run with other models of similar size, such as [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b), [OPT-6.7B](https://huggingface.co/facebook/opt-6.7b) or [BLOOM-7.1B](https://huggingface.co/bigscience/bloom-7b1). However, they will require minor code tweaks:
1. Change the model name in `AutoModelForCausalLM.from_pretrained()` __and__ load the tokenizer with `AutoTokenizer.from_pretrained()` instead of `LlamaTokenizer`.
2. In the prompt-tuning code, change `model.model.embed_tokens` to refer to the target model's word embeddings. Run `print(model)` to find them, or use `model.get_input_embeddings()`, which should work for most Hugging Face architectures.
3. Change the code that adds LoRA layers, specifically where you wrap the transformer block components, since those components have different names in other architectures.
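One way to handle step 3 without hard-coding layer names is to list the `nn.Linear` submodules first and then wrap the ones you want. For example, Llama's attention uses `q_proj`, `k_proj`, `v_proj` and `o_proj`, while Falcon and BLOOM use a fused `query_key_value` projection. This is a sketch, not the notebook's own LoRA code: `linear_layer_suffixes`, `LoRALinear`, `add_lora` and the `ToyAttention` stand-in are hypothetical names written for illustration, and the `isinstance(..., nn.Linear)` check assumes your (possibly quantized) layers subclass `nn.Linear`; adjust it if they do not.

```python
import torch
import torch.nn as nn
from collections import Counter


def linear_layer_suffixes(model: nn.Module) -> Counter:
    """Count nn.Linear modules by the last part of their dotted name (e.g. 'q_proj')."""
    return Counter(name.rsplit(".", 1)[-1] for name, module in model.named_modules()
                   if isinstance(module, nn.Linear))


class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: frozen base layer plus a trainable low-rank update B(A(x))."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base.requires_grad_(False)
        device = base.weight.device
        self.lora_a = nn.Linear(base.in_features, rank, bias=False, device=device)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False, device=device)
        nn.init.zeros_(self.lora_b.weight)  # the wrapped layer starts out identical to the base
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scale


def add_lora(model: nn.Module, target_suffixes: set, rank: int = 8) -> int:
    """Wrap every nn.Linear whose name ends in one of target_suffixes; return how many."""
    targets = [(name, module) for name, module in model.named_modules()
               if isinstance(module, nn.Linear) and name.rsplit(".", 1)[-1] in target_suffixes]
    for name, module in targets:  # collected first so we don't mutate while iterating
        parent_name, _, child_name = name.rpartition(".")
        setattr(model.get_submodule(parent_name), child_name, LoRALinear(module, rank))
    return len(targets)


class ToyAttention(nn.Module):
    """Hypothetical block that borrows Falcon/BLOOM-style layer names."""

    def __init__(self, d: int = 8):
        super().__init__()
        self.query_key_value = nn.Linear(d, 3 * d)
        self.dense = nn.Linear(d, d)


toy = nn.ModuleDict({"h": nn.ModuleList([ToyAttention(), ToyAttention()]),
                     "lm_head": nn.Linear(8, 20)})
counts = linear_layer_suffixes(toy)  # inspect the names before choosing targets
print(counts)
n_wrapped = add_lora(toy, {"query_key_value"})

x = torch.randn(3, 8)
layer = toy["h"][0].query_key_value
out_lora, out_base = layer(x), layer.base(x)  # equal until lora_b is trained
```

In the notebook you would freeze the whole model before calling `add_lora` (the toy's `dense` layers stay trainable here only to keep the demo short) and leave `lm_head` out of the target set.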