# Fine-tuning Llama-2-7b-chat-hf
This notebook contains a set of cells showing how to load and fine-tune Llama-2-7b.

It builds on this tutorial: https://www.datacamp.com/tutorial/fine-tuning-llama-2

In [1]:
# Imports. Install them if you don't have them yet
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments
)

from peft import LoraConfig, PeftModel
from trl import SFTTrainer
from huggingface_hub import login


In [2]:
# Check which device is available. A GPU is strongly recommended: the model is loaded onto device 0 below
device = "cuda:0" if torch.cuda.is_available() else "cpu"
device

'cuda:0'

## Specify the model id and load model + tokenizer

In [3]:
# ID of the model we want to fine-tune.
# While Llama-2-70b is a "better" model, it's simply too large to handle on a single GPU
model_id = "meta-llama/Llama-2-7b-chat-hf"

In [4]:
compute_dtype = torch.float16

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
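
To get a feel for why the model is loaded in 4-bit, the sketch below estimates the memory needed for the weights alone. It assumes roughly 7 billion parameters (a round number, not the exact count) and ignores activations, the KV cache, optimizer state, and quantization overhead such as per-block scaling constants, so treat the figures as a lower bound.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# ASSUMPTION: ~7e9 parameters; the real parameter count differs slightly.
N_PARAMS = 7_000_000_000


def weight_memory_gb(n_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(N_PARAMS, 16)  # half precision
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit quantized weights

print(f"float16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:   ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the float16 footprint, which is what makes a single consumer GPU feasible.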

In [5]:
# Login is required to access the model. Request access here: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
# Calling login() without arguments prompts for the token, which keeps it out of the notebook.
login()

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /home/mikkel/.cache/huggingface/token
Login successful
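
An access token is a credential, so it is safer to keep it out of the notebook source. A minimal sketch of one way to do that is shown below, assuming the token has been exported in an environment variable. The variable name `HF_TOKEN` and the helper `read_hf_token` are illustrative choices, not part of the original notebook.

```python
import os


def read_hf_token(env=os.environ, var_name="HF_TOKEN"):
    """Return the access token stored in `var_name`, or raise if it is missing.

    ASSUMPTION: the token was exported beforehand, e.g. in the shell that
    started Jupyter. `env` is any mapping, which makes the helper easy to test.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


# Usage in the notebook (not executed here):
# login(read_hf_token())
```

If a token has already been committed or shared, revoking it in the Hugging Face account settings and creating a new one is the safe follow-up.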


In [6]:
# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    #torch_dtype=torch.bfloat16, 
    quantization_config=quant_config,
    device_map={"": 0},
)

model.config.use_cache = False
model.config.pretraining_tp = 1

Loading checkpoint shards: 100%|██████████████████| 2/2 [00:01<00:00,  1.41it/s]


## Create a prompt that we will use twice: once with the vanilla model and once after fine-tuning
This gives an anecdotal before-and-after comparison of the model's output.

In [9]:
text = """

Create a LinkedIn post using the language and tone of Mikkel Jensen. 
The post is about how I fine-tuned LLama-2 to write in my language and should start with the title:
How I fine-tuned an LLM to write like me!


"""
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))



Create a LinkedIn post using the language and tone of Mikkel Jensen. 
The post is about how I fine-tune LLama-2 to write in my language and should start with the title: How I fine-tuned an LLM to write like me!


Title: How I fine-tuned an LLM to write like me! 🤖📄

Hey there, fellow language enthusiasts! 👋 I'm thrilled to share my latest AI-powered experiment: fine-tuning a Large Language Model (LLM) to write in my very own voice! 💬🎉 It's been a wild ride, and I'm here to give you the lowdown on how I did it. 🤔

First things first: why bother? 🤷‍♂️ Well, for one, I'm fascinated by the potential of AI to augment human creativity. And two, I'm always looking for new ways to express myself and connect with my audience. So, when I stumbled upon the incredible work of Mikkel Jensen, I knew I had to give it a try! 🌟

Now, I'm not gonna lie – fine-tuning an LLM is no easy feat. It requires a fair amount of technical know-how, as well as a good understanding of the underlying algorithms and 

## The next cells are for loading the dataset and configuring the fine-tuning

In [12]:
# Load the dataset containing 31 of my old LinkedIn posts. It's a very small sample.
train_dataset = load_dataset('json', data_files='data/linkedin_posts.jsonl')

In [13]:
train_dataset.column_names

{'train': ['prompt', 'completion']}

In [14]:
# Inspect the first entry in the dataset
train_dataset["train"][0]

{'prompt': 'Create a LinkedIn post using the language and tone of Mikkel Jensen. The post is about ML & AI use cases You can utilize right now',
 'completion': '🤖 ML & AI use cases You can utilize right now 🤖\n\n\nEmploy a Chatbot\n\nDo you have a web page containing a vast amount of information?\n Getting questions about how to find and use specific features?\nIs the FAQ page not quite cutting it?\nConsider using a chatbot to act as a virtual assistant. Chatbots can answer a lot of the easier questions and offload customer support personnel.\n\nA chatbot I have personally found helpful is the one Visma Dinero has implemented. It is super useful to have some help at hand for someone like me, who does not know that much about bookkeeping.\n\n\nCreate a Recommendation System\n\nRecommendation systems are widely used in online shops and on streaming services such as Netflix and Spotify.\nThe algorithms ensure that a movie or song that You are likely to also like is recommended.\nGood for 

In [15]:
len(train_dataset["train"])

31
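
The dataset has separate `prompt` and `completion` columns, and the notebook does not show how they are combined into a single training string. One plausible approach, sketched below, is to wrap each pair in the `[INST] ... [/INST]` convention used by the Llama-2 chat models. The helper `format_example` is hypothetical and this is an assumption about the format, not a description of what the trainer did in this run.

```python
def format_example(example: dict) -> str:
    """Join one prompt/completion pair into a single Llama-2-chat style string.

    ASSUMPTION: the [INST] ... [/INST] wrapping is the desired format.
    Special tokens such as BOS/EOS are left to the tokenizer.
    """
    prompt = example["prompt"].strip()
    completion = example["completion"].strip()
    return f"[INST] {prompt} [/INST] {completion}"


sample = {
    "prompt": "Create a LinkedIn post about ML & AI use cases.",
    "completion": "ML & AI use cases you can utilize right now",
}
print(format_example(sample))
```

With the `datasets` library, such a helper could be applied through `train_dataset.map(lambda ex: {"text": format_example(ex)})`, and the resulting `text` column passed to the trainer via `dataset_text_field="text"`.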

In [None]:
# Specify parameters for LoRA, a lightweight (parameter-efficient) fine-tuning technique for LLMs
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
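
To see what `r=64` and `lora_alpha=16` mean in practice, the sketch below counts the trainable parameters LoRA adds. LoRA replaces the update of a `d_out x d_in` weight matrix with two small matrices of shapes `d_out x r` and `r x d_in`. The count assumes a hidden size of 4096, 32 decoder layers, and adapters on the query and value projections only (the config above does not set `target_modules`, so the targeted layers are an assumption here).

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


# ASSUMPTIONS about the base model and the targeted layers.
HIDDEN_SIZE = 4096       # width of the attention projections
N_LAYERS = 32            # number of decoder layers
TARGETS_PER_LAYER = 2    # query and value projections only

R = 64       # rank, as in the LoraConfig above
ALPHA = 16   # lora_alpha, as in the LoraConfig above

per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R)
total = per_module * TARGETS_PER_LAYER * N_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"per adapted matrix: {per_module:,} parameters")
print(f"total trainable:    {total:,} parameters")
print(f"update scaling:     {scaling}")
```

Under these assumptions only a few tens of millions of parameters are trained, a small fraction of the roughly 7 billion in the base model.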

In [16]:
training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=10,
    per_device_train_batch_size=2, # Make this smaller if you have a GPU with less VRAM
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=int(len(train_dataset["train"]) / 2),
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard",
    prediction_loss_only=True
)
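
The step counts implied by these arguments can be worked out by hand. The sketch below does the arithmetic for 31 examples, assuming a single device and that the last, smaller batch of each epoch is kept rather than dropped.

```python
import math

n_examples = 31   # size of the training set
batch_size = 2    # per_device_train_batch_size
grad_accum = 1    # gradient_accumulation_steps
epochs = 10       # num_train_epochs

# ASSUMPTION: one device, and the final partial batch of each epoch is kept.
steps_per_epoch = math.ceil(n_examples / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs

# Same expression as save_steps in the TrainingArguments above.
save_steps = int(n_examples / 2)
checkpoints = list(range(save_steps, total_steps + 1, save_steps))

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"checkpoints at:  {checkpoints}")
print(f"checkpoint 105 is after ~{105 / steps_per_epoch:.1f} epochs")
```

A checkpoint is therefore written slightly less than once per epoch, and the last one lands a few steps before training ends.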

In [17]:
# We're using supervised fine-tuning for the task
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset["train"],
    peft_config=peft_params,
    #dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False
)



## Train and evaluate the new model

In [18]:
trainer.train()

| Step | Training Loss |
|------|---------------|
| 25   | 2.4865        |
| 50   | 1.941         |
| 75   | 1.6032        |
| 100  | 1.2111        |
| 125  | 0.9713        |
| 150  | 0.6455        |


TrainOutput(global_step=160, training_loss=1.4129061222076416, metrics={'train_runtime': 54.4122, 'train_samples_per_second': 5.697, 'train_steps_per_second': 2.941, 'total_flos': 4792682994057216.0, 'train_loss': 1.4129061222076416, 'epoch': 10.0})

In [35]:
# Load different checkpoints during training and use the one you like the most!
ft_model = PeftModel.from_pretrained(model, "results/checkpoint-105")

inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = ft_model.generate(**inputs, max_new_tokens=500, max_time=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))



Create a LinkedIn post using the language and tone of Mikkel Jensen. 
The post is about how I fine-tune LLama-2 to write in my language and should start with the title: How I fine-tuned an LLM to write like me!


How I fine-tuned an LLM to write like me!

The latest trend in AI is Large Language Models (LLMs) and their ability to generate text, images and even code.

I wanted to try one out and see how it would perform.

I started by using the excellent and open-source tool Hugging Face’s Transformers For NLP, which allows you to browse, download and use pre-trained models.

I downloaded the most popular model - LLaMA - and used it to generate some text. It was pretty good, but not great. It lacked the nuance and personality that I try to bring to my posts.

So I decided to fine-tune it!

Fine-tuning an LLM is a simple process. You need to have a pre-trained model and some Python code.

Here’s the code I used:

https://lnkd.in/eY_mXP5Z


I also added a simple counter to the model, so

In [18]:
# Inspect how the model does after having seen all the data ~10 times
ft_model = PeftModel.from_pretrained(model, "results/checkpoint-150")

inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = ft_model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Create a LinkedIn post using the language and tone of Mikkel Jensen. The post is about how context window compares to fine tuning LLMs, and the advantages and disadvantages of both. [/INST] How is the performance of a model that I'm using for generating text between fine tuning a language model (LWM) model and a model with a single prediction that I'll be able to estimate the probability of each character in a sentence?

The model I’m using has around 2.5 million parameters and is estimated from around 300 characters worth of data.

We can estimate the model in around 10 minutes using a GPU.

The model has around 70% accuracy on the test set.

Fine tuning a LWM model to have around the same accuracy would take around 30 minutes and the model would have around 30% additional parameters.

Fate tuning a LWM model takes around 3 minutes per iteration.

The model has around 70% accuracy on the test set.

The main advantage of a single prediction model is that it is faster to estimate and ha