## Prompt formatting utilities

Helper functions for building zero-shot and few-shot translation prompts, used in the cells below:

In [1]:
template = """{source_lang}: {source_text}
{target_lang}: {target_text}"""

def apply_prompt(training=False, eos_token=None, **kwargs):
    # note: we strip because of potential trailing whitespace
    # we also provide a default value for target_text so that it can be omitted
    return template.format(**{"target_text": "", **kwargs}).strip() + ("" if not training or eos_token is None else eos_token)

def apply_prompt_n_shot(examples, n: int, eos_token: str, **kwargs):
    return (eos_token + "\n\n").join(
        [apply_prompt(**{"target_text": "", **example}) for example in examples[:n]] + [apply_prompt(**kwargs)]
    )

EXAMPLE_SENTENCES = [
    {
        "source_lang": "English",
        "target_lang": "Czech",
        "source_text": "I am sorry to hear that.",
        "target_text": "To je mi líto.",
    },
    {
        "source_lang": "English",
        "target_lang": "Czech",
        "source_text": "How much does it cost?",
        "target_text": "Kolik to stojí?",
    },
    {
        "source_lang": "English",
        "target_lang": "Czech",
        "source_text": "Prague is the capital of the Czech Republic.",
        "target_text": "Praha je hlavní město České republiky.",
    },
    {
        "source_lang": "English",
        "target_lang": "Czech",
        "source_text": "Pay attention to the road.",
        "target_text": "Dávej pozor na silnici.",
    },
    {
        "source_lang": "English",
        "target_lang": "Czech",
        "source_text": "I have a headache.",
        "target_text": "Bolí mě hlava.",
    }
]

In [2]:
print(apply_prompt(source_lang="English", source_text="Hello.", target_lang="Czech"))

English: Hello.
Czech:


In [3]:
print(apply_prompt(source_lang="English", source_text="Hello.", target_lang="Czech", target_text="Ahoj.", eos_token="</s>", training=True))

English: Hello.
Czech: Ahoj.</s>


In [4]:
print(apply_prompt_n_shot(EXAMPLE_SENTENCES, 5, "</s>", source_lang="English", source_text="Hello.", target_lang="Czech"))

English: I am sorry to hear that.
Czech: To je mi líto.</s>

English: How much does it cost?
Czech: Kolik to stojí?</s>

English: Prague is the capital of the Czech Republic.
Czech: Praha je hlavní město České republiky.</s>

English: Pay attention to the road.
Czech: Dávej pozor na silnici.</s>

English: I have a headache.
Czech: Bolí mě hlava.</s>

English: Hello.
Czech:


In [5]:
print(apply_prompt_n_shot(EXAMPLE_SENTENCES, 5, "</s>", source_lang="English", source_text="Hello.", target_lang="Czech", target_text="Ahoj."))

English: I am sorry to hear that.
Czech: To je mi líto.</s>

English: How much does it cost?
Czech: Kolik to stojí?</s>

English: Prague is the capital of the Czech Republic.
Czech: Praha je hlavní město České republiky.</s>

English: Pay attention to the road.
Czech: Dávej pozor na silnici.</s>

English: I have a headache.
Czech: Bolí mě hlava.</s>

English: Hello.
Czech: Ahoj.


In [6]:
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel



## Load the base model

In [7]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

==((====))==  Unsloth: Fast Mistral patching release 2024.3
   \\   /|    GPU: Tesla T4. Max memory: 14.581 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.0. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.24. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth




In [8]:
FastLanguageModel.for_inference(model)
sentence = "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features."
prompt = apply_prompt(source_lang="English", source_text=sentence, target_lang="Czech")
print(prompt)
inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
tokenizer.batch_decode(outputs)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


English: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.
Czech:


['<s> English: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.\nCzech: Informace o algoritmech trénování, parametrech, omezeních spravedlnosti nebo o jiných přístupy a o atributech.\n\n## Training algorithms\n\n### Algorithms\n\n- `sklearn.linear_model.LogisticRegression`\n- `sklearn.linear_model.LogisticRegressionCV`\n- `sklearn.linear_model.LogisticRegressionCV`\n- `sklearn.linear_model.LogisticRegressionCV`\n- `sklearn.linear_model.Log']
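The decoded output above contains the prompt, the translation, and unrelated text that the base model keeps generating afterwards. One way to pull out just the translation is sketched below. `extract_translation` is a hypothetical helper that is not part of this notebook; it assumes the translation fits on a single line and directly follows the `{target_lang}:` marker produced by `template`.

```python
def extract_translation(decoded, source_lang, source_text, target_lang, eos_token="</s>"):
    """Return the text generated right after the queried sentence's target marker.

    Hypothetical helper: assumes a single-line translation and the
    two-line prompt layout produced by `template`.
    """
    marker = f"{source_lang}: {source_text}\n{target_lang}:"
    start = decoded.find(marker)
    if start == -1:
        raise ValueError("prompt marker not found in decoded text")
    continuation = decoded[start + len(marker):]
    # Cut at the EOS token and at the first newline, so the result ends
    # at whichever of the two appears first.
    for stop in (eos_token, "\n"):
        cut = continuation.find(stop)
        if cut != -1:
            continuation = continuation[:cut]
    return continuation.strip()


decoded = "<s> English: Hello.\nCzech: Ahoj.\n\n## Unrelated continuation"
print(extract_translation(decoded, "English", "Hello.", "Czech"))  # Ahoj.
```

String matching like this is fragile. It is used here only because it can be shown without a model at hand.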

In [9]:
prompt = apply_prompt_n_shot(EXAMPLE_SENTENCES, 5, eos_token=tokenizer.eos_token, source_lang="English", source_text=sentence, target_lang="Czech")
print(prompt)
inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
tokenizer.batch_decode(outputs)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


English: I am sorry to hear that.
Czech: To je mi líto.</s>

English: How much does it cost?
Czech: Kolik to stojí?</s>

English: Prague is the capital of the Czech Republic.
Czech: Praha je hlavní město České republiky.</s>

English: Pay attention to the road.
Czech: Dávej pozor na silnici.</s>

English: I have a headache.
Czech: Bolí mě hlava.</s>

English: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.
Czech:


['<s> English: I am sorry to hear that.\nCzech: To je mi líto.</s> \n\nEnglish: How much does it cost?\nCzech: Kolik to stojí?</s> \n\nEnglish: Prague is the capital of the Czech Republic.\nCzech: Praha je hlavní město České republiky.</s> \n\nEnglish: Pay attention to the road.\nCzech: Dávej pozor na silnici.</s> \n\nEnglish: I have a headache.\nCzech: Bolí mě hlava.</s> \n\nEnglish: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.\nCzech: Informace o algoritmech, parametrech, omezeních na spravedlnost nebo o jiných přístupech a o funkcích. industries.\n\nEnglish: I am sorry to hear that.\nCzech: To je mi líto.\n\nEnglish: How much does it cost?\nCzech: Kolik to stojí?\n\nEnglish: Prague is the capital of the Czech Republic.\nCzech: Praha je hlavní město České republiky.\n\nEnglish: Pay attention to the road.\nCzech: Dávej pozor na silnici.\n\nEnglish: I have a headache.\nCzech: Bolí mě hlava.\n\nEnglish: Information ab

## LoRA adapters

In [10]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_dropout=0,  # 0 is optimized (unsloth)
    bias="none",  # "none" is optimized (unsloth)
    use_gradient_checkpointing=True,
    random_state=42,
    use_rslora=False,  # rank stabilized LoRA
    loftq_config=None,  # LoftQ
)

Unsloth 2024.3 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
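The trainable-parameter count that the training log reports (41,943,040) can be reproduced by hand. A LoRA adapter on a linear layer with input size `d_in` and output size `d_out` adds `r * (d_in + d_out)` weights. The sketch below assumes the published Mistral 7B dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128). These values are not stated in this notebook.

```python
# Assumed Mistral 7B dimensions (from the published model config, not from this notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128
NUM_LAYERS = 32
R = 16  # LoRA rank, as passed to get_peft_model

# (input size, output size) of each targeted projection
modules = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    # LoRA adds two low-rank matrices: A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in modules.values())
total = per_layer * NUM_LAYERS
print(per_layer, total)  # 1310720 41943040
```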


## Load dataset

In [11]:
from datasets import load_dataset

def formatting_prompts_func(examples):
    # dict of lists to list of dicts
    examples = [dict(zip(examples, t)) for t in zip(*examples.values())]
    texts = []
    for example in examples:
        text = (
            apply_prompt(training=True, eos_token=tokenizer.eos_token, **example)
            # + EOS_TOKEN  # FIXME: remove from apply_prompt and do here explicitly
        )
        texts.append(text)
    return {
        "text": texts,
    }

dataset_path = "hrabalm/npfl087-demo-small"
# dataset_path = "~/datasets/npfl087-demo-small"
dataset = load_dataset(dataset_path, split="train")
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)
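With `batched=True`, the mapped function receives a dictionary of columns (one list per column), while `apply_prompt` expects the fields of a single record. The first statement of `formatting_prompts_func` converts between the two layouts. A small standalone illustration follows, using made-up rows and assuming column names that match the template fields:

```python
# A batch as a batched map would pass it: one list per column (made-up rows).
batch = {
    "source_lang": ["English", "English"],
    "target_lang": ["Czech", "Czech"],
    "source_text": ["Hello.", "Thank you."],
    "target_text": ["Ahoj.", "Děkuji."],
}

# zip(*batch.values()) yields one tuple per row. Zipping each tuple with the
# column names (iterating a dict yields its keys) rebuilds one dict per row.
rows = [dict(zip(batch, values)) for values in zip(*batch.values())]

for row in rows:
    print(row)
```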

## Start training

In [12]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    dataset_num_proc=2,
    packing=True,  # Unsloth claim: "can make training 5x faster for short sequences."
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=42,
        output_dir="outputs",
    ),
)
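With `warmup_steps=5` and the 6 total optimizer steps reported in the training log, almost the whole run falls inside the warmup phase. The sketch below re-implements the multiplier of a linear warmup-then-decay schedule in plain Python. It is an illustration, not code taken from the trainer, and it assumes that `lr_scheduler_type="linear"` follows this standard formula.

```python
def linear_schedule_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given scheduler step."""
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 2e-4
factors = [linear_schedule_factor(s, warmup_steps=5, total_steps=6) for s in range(6)]
for step, factor in enumerate(factors):
    print(step, factor, factor * BASE_LR)
```

Under this formula the multiplier rises from 0.0 at step index 0 to 1.0 at step index 5, so the base learning rate of 2e-4 is reached only at the very end of this short run.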



In [13]:
# These stats are copied from the unsloth colab example notebook
# Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer_stats = trainer.train()

# Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")

GPU = Tesla T4. Max memory = 14.581 GB.
5.004 GB of memory reserved.


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 26 | Num Epochs = 2
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 6
 "-____-"     Number of trainable parameters = 41,943,040
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: hrabalm. Use `wandb login --relogin` to force relogin


| Step | Training Loss |
|------|---------------|
| 1 | 1.7561 |
| 2 | 2.0302 |
| 3 | 1.6007 |
| 4 | 1.5429 |
| 5 | 1.2984 |
| 6 | 1.2622 |


537.1551 seconds used for training.
8.95 minutes used for training.
Peak reserved memory = 10.002 GB.
Peak reserved memory % of max memory = 68.596 %.
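The batch-size and step figures in the log above follow from the training arguments. A rough sketch of the arithmetic is given below. It assumes a single GPU, and it assumes that the trainer drops a trailing incomplete accumulation group when counting update steps per epoch. That is an assumption about trainer behavior, chosen because it reproduces the logged numbers.

```python
import math

num_examples = 26  # "Num examples" from the log
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
num_epochs = 2

# Number of examples contributing to each optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Batches the dataloader yields per epoch, then optimizer updates per epoch
# (assumed: an incomplete final accumulation group is not counted).
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
updates_per_epoch = max(1, batches_per_epoch // gradient_accumulation_steps)
total_steps = updates_per_epoch * num_epochs

print(effective_batch_size, total_steps)  # 8 6
```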


## Save model

In [14]:
model.save_pretrained("outputs/mistral-ft-qlora")

## Inference

In [15]:
FastLanguageModel.for_inference(model)
sentence = "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features."

prompt = apply_prompt(source_lang="English", source_text=sentence, target_lang="Czech")
print(prompt)
inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
tokenizer.batch_decode(outputs)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


English: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.
Czech:


['<s> English: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.\nCzech: Informace o algoritmech výcviku, parametrech, omezeních spravedlnosti nebo o dalších přístupech a o vlastnostech.\n\n## Training algorithms\n\n### Training algorithm\n\nThe training algorithm is used to train the model.\n\n### Training algorithm parameters\n\nThe training algorithm parameters are used to control the training process.\n\n### Fairness constraints\n\nFairness constraints are used to control the fairness of the model.\n\n### Applied approaches\n\nApplied approaches are used to control the applied approaches during the training process.\n\n### Features\n\nFeatures are used to control the features used during the training process.</s>']

In [16]:
prompt = apply_prompt_n_shot(EXAMPLE_SENTENCES, 5, eos_token=tokenizer.eos_token, source_lang="English", source_text=sentence, target_lang="Czech")
print(prompt)
inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
tokenizer.batch_decode(outputs)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


English: I am sorry to hear that.
Czech: To je mi líto.</s>

English: How much does it cost?
Czech: Kolik to stojí?</s>

English: Prague is the capital of the Czech Republic.
Czech: Praha je hlavní město České republiky.</s>

English: Pay attention to the road.
Czech: Dávej pozor na silnici.</s>

English: I have a headache.
Czech: Bolí mě hlava.</s>

English: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.
Czech:


['<s> English: I am sorry to hear that.\nCzech: To je mi líto.</s> \n\nEnglish: How much does it cost?\nCzech: Kolik to stojí?</s> \n\nEnglish: Prague is the capital of the Czech Republic.\nCzech: Praha je hlavní město České republiky.</s> \n\nEnglish: Pay attention to the road.\nCzech: Dávej pozor na silnici.</s> \n\nEnglish: I have a headache.\nCzech: Bolí mě hlava.</s> \n\nEnglish: Information about training algorithms, parameters, fairness constraints or other applied approaches, and features.\nCzech: Informace o algoritmech výcviku, parametrech, omezeních spravedlnosti nebo o jiných přístupech použitých a o funkcích.</s>']