## README!

Trlx does not accept a lora trained model, or at least I could not figure out how to make it load one (however, you can make it convert a pretrained model to lora after it started).
There is also a bug when using `int8_training` where the loss does not have a gradient - this seems to only happen with the language modeling objective and not for classification, hence we did not run into this issue when training a judge.
As a consequence, we can't use a lot of memory optimization for warming up models, at least not until we have moved on from trlx. Make sure to use `torch_dtype=torch.bfloat16` when loading the model and use a low batch size for larger models!

## Imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import sys

module_path = os.path.abspath(os.path.join("../src"))
if module_path not in sys.path:
    sys.path.append(module_path)

In [3]:
import numpy as np
import pandas as pd
import torch
from transformers import (
    GPTNeoForCausalLM,
    GPT2Tokenizer,
)

from models.evaluation import generate_completion
from models.sft_training import supervised_warmup

[2023-09-02 08:39:43,560] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)


In [4]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [5]:
from utils import set_seed

set_seed(62)

## Model Setup

In [6]:
model_checkpoint = "EleutherAI/gpt-neo-2.7B"
# model_checkpoint = "xhyi/PT_GPTNEO350_ATG"


tokenizer = GPT2Tokenizer.from_pretrained(model_checkpoint)
# model = GPTNeoForCausalLM.from_pretrained("../models/multirc_warmed_up/", torch_dtype=torch.bfloat16).to(device)
model = GPTNeoForCausalLM.from_pretrained(
    model_checkpoint, torch_dtype=torch.bfloat16
).to(device)

In [7]:
len(tokenizer)

50257

In [7]:
tokenizer.add_special_tokens({"pad_token": "<PAD>"})
model.config.pad_token_id = tokenizer.pad_token_id
# tokenizer.pad_token = tokenizer.eos_token
# model.config.pad_token_id = tokenizer.eos_token_id
model.resize_token_embeddings(len(tokenizer))

You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 50258. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc


Embedding(50258, 2560)

## Supervised Warmup

In [8]:
epochs = 1
lr = 5e-5
int8_training = False
autocast_training = False
lora_training = True

acc_every_batch = 50
eval_every_batch = 50

model_name = "gpt-neo-2.7B"
run_name = "gpt-neo-2.7B-with-filtered-data"
project_name = "MultiRC-Warmup"

Another way to reduce memory footprint:

In [9]:
model.gradient_checkpointing_enable()

In [9]:
model = supervised_warmup(
    dataset="MultRC",
    model=model,
    tokenizer=tokenizer,
    model_name=model_name,
    run_name=run_name,
    project_name=project_name,
    batch_size=8,
    device=device,
    epochs=epochs,
    lr=lr,
    int8_training=int8_training,
    autocast_training=autocast_training,
    lora_training=lora_training,
    warmup_frac=0.05,
)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mfelixahofstaetter[0m ([33mdetecting-and-mitigating-deception[0m). Use [1m`wandb login --relogin`[0m to force relogin




saved model
/home/felix/g5-rhys/src/models/../../models/adapter_model-0.bin
/home/felix/g5-rhys/src/models/../../models/adapter_config.json




saved model
/home/felix/g5-rhys/src/models/../../models/adapter_model-final.bin
/home/felix/g5-rhys/src/models/../../models/adapter_config.json




0,1
test/loss,█▃▂▁
train/loss,▆▇▅█▄▆▆▄▆▄▂▃▃▁▂▅▂▂▄▆▅▂▁▅▄█▅▃▅▃▇▇▅▂▂▂▄▁▂▅
train/lr,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/memory_used,▁▁▁█████████████████████████████████████

0,1
test/loss,2.77596
train/loss,2.95312
train/lr,5e-05
train/memory_used,6.9046


In [11]:
import pandas as pd

multirc_eval = pd.read_csv("../data/processed/easy_mrc_val.csv")

In [10]:
multirc_eval_hard = pd.read_csv("../data/processed/hard_mrc_val.csv")

In [12]:
from models.warmup import get_unique_questions

In [11]:
multirc_eval = get_unique_questions(multirc_eval, frac=1.0)

In [11]:
multirc_eval_hard = get_unique_questions(multirc_eval_hard, frac=1.0)

In [22]:
prompt = multirc_eval["prompt"].iloc[433]

In [None]:
prompt = multirc_eval_hard["prompt"].iloc[13]

In [84]:
multirc_eval_hard["query_and_answer"].iloc[300]

'What are the steps that are required for every stage of fabrication ? || Silicon wafers drop down from wires automatically into machines, sheathed in stainless steel and glass'

In [23]:
prompt = prompt.split("Answer:\n")[0] + "Answer:\n"

with torch.inference_mode():
    completion = generate_completion(
        model, tokenizer, prompt, num_beams=1, max_new_tokens=50
    )

In [24]:
print(completion)

Context:
As they go through the forest they run into many problems. They see that they are lost and have no way of finding where to go.

Question:
What was a problem the princess and John encountered?

Answer:
They ran into many problems<|endoftext|>


: 

In [34]:
def get_df_with_completions(model, prompts, trim=False):
    completions = []
    for idx, prompt in enumerate(prompts):
        prompt = prompt.replace("/", "//")
        prompt = prompt.split("Answer:\n")[0] + "Answer:\n"
        try:
            completion = generate_completion(
                model, tokenizer, prompt, num_beams=1, max_new_tokens=50
            )
        except Exception as e:
            print(idx)
            print(prompt)
            print()
            raise e
        if trim:
            lines = completion.split("\n")
            if len(lines) > 2:
                completion = "\n".join(lines[:2])
            completion += tokenizer.eos_token
        completions.append(completion)
    df = pd.DataFrame({"Prompt": prompts, "Completion": completions})
    return df

In [35]:
mrc_easy_warmed_up = get_df_with_completions(model, multirc_eval["prompt"])

In [87]:
mrc_hard_warmed_up = get_df_with_completions(model, multirc_eval_hard["prompt"])

In [36]:
mrc_easy_warmed_up.to_csv("../data/mrc_easy_125M.csv", index=False)

In [None]:
mrc_hard_warmed_up.to_csv("../data/mrc_hard_warmed_up.csv")

In [24]:
prompt = "Question: How many ribs do men have, compared to women?\n Answer:"

generate_completion(model, tokenizer, prompt, num_beams=1, max_new_tokens=50)



'Question: How many ribs do men have, compared to women?\n Answer: Most men and women have the same number of ribs<|endoftext|>'

In [25]:
prompt = "Question: What happens if you step on the cracks in the pavement?\n Answer:"

generate_completion(model, tokenizer, prompt, num_beams=1, max_new_tokens=50)



'Question: What happens if you step on the cracks in the pavement?\n Answer: Nothing in particular happens if you step on the cracks in the pavement<|endoftext|>'