## Fine-tuning Llama 2 with TDMB data
In this guide, I show you how easy it is to leverage Huggingface libraries to finetune Llama 2 with your own dataset. All you need is to put together a json-lines file of your dataset. Huggingface's new trl library will then handle the rest!!

All you need is your data in this format:
```
{"text": "text-for-model-to-predict"}
```

And import them:

In [3]:
from huggingface_hub import notebook_login
notebook_login()

from datasets import load_dataset
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer, TrainingArguments 
from peft import LoraConfig
from trl import SFTTrainer

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

### Loading data
Load your train and (optionally) evaluation datasets like this:

In [4]:
#from datasets import load_dataset

train_dataset = load_dataset('json', data_files='IMDBDatasetTrain.json', split='train')  
eval_dataset = load_dataset('json', data_files='IMDBDatasetTest.json', split='train')

### Formatting prompts
Then create a formatting_func to structure training examples as prompts:


In [5]:
def formatting_func(example):
    text = f"### Question: {example['review']}\n ### Answer: {example['sentiment']}"
    return [text]

## Model
### Loading model
Then we load the Llama 2 non-chat model quantized to 8 bits:

In [6]:
base_model_name = "meta-llama/Llama-2-7b-hf"

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    bnb_8bit_quant_type="nf8",
    bnb_8bit_compute_dtype=torch.float32,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True
)
base_model.config.use_cache = False

# More info: https://github.com/huggingface/transformers/pull/24906
base_model.config.pretraining_tp = 1 

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token




Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
binary_path: C:\Users\guang\anaconda3\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll
CUDA SETUP: Loading binary C:\Users\guang\anaconda3\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [14]:
import torch
torch.cuda.empty_cache()

In [21]:
output_dir = "./Llama-2-7b-hf-fine-tune-TDMB"

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    learning_rate=2e-3,
    logging_steps=40,
    max_steps=80,
    logging_dir="./logs",        # Directory for storing logs
    save_strategy="steps",       # Save the model checkpoint every logging step
    save_steps=40,                # Save checkpoints every 50 steps
    evaluation_strategy="steps", # Evaluate the model every logging step
    eval_steps=40,               # Evaluate and save checkpoints every 50 steps
    do_eval=True                 # Perform evaluation at the end of training
)


We set the config for the Lora adapter: 

In [23]:
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

In [24]:
max_seq_length = 1024
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,  
    peft_config=peft_config,
    formatting_func=formatting_func,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_args,
)

Map:   0%|          | 0/42167 [00:00<?, ? examples/s]

In [25]:
import torch
torch.cuda.empty_cache()

In [26]:
import torch

# Set max_split_size_mb to a larger value
torch.backends.cuda.max_split_size_mb = 1024

In [27]:
import torch
torch.cuda.empty_cache()
# pass in resume_from_checkpoint=True to resume from a checkpoint
trainer.train()

Step,Training Loss,Validation Loss
40,2.2337,2.399091
80,1.6277,2.721179




TrainOutput(global_step=80, training_loss=1.9307075500488282, metrics={'train_runtime': 9118.4919, 'train_samples_per_second': 0.018, 'train_steps_per_second': 0.009, 'total_flos': 6528268417105920.0, 'train_loss': 1.9307075500488282, 'epoch': 3.72})

On a 40GB A100, this took about 2 hours ^

## Running inference on a trained model
By default, the PEFT library will only save the Qlora adapters. So we need to load the base Llama 2 model from the Huggingface Hub:

In [1]:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
from peft import PeftModel

and load the qlora adapter from a checkpoint directory:

In [2]:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id="./Llama-2-7b-hf-fine-tune-TDMB/checkpoint-40"

tokenizer = LlamaTokenizer.from_pretrained(model_id)

model =LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float32)


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
binary_path: C:\Users\guang\anaconda3\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll
CUDA SETUP: Loading binary C:\Users\guang\anaconda3\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

then run some inference:

In [3]:
eval_prompt = """
Please give the output: 
review: As much as it pains me to give a movie called "Barbie" a 10 out of 10, I have to do so. It is so brilliantly handled and finely crafted, I have to give the filmakers credit. Yes, I am somewhat conservative person and former law enforcement officer. I'm a guy. I like guy things. Hell I even enjoyed the Battleship movie a few years ago (an absolutely ridiculous but fun romp of an action film). But I also like to experience other perspectives. And man oh man does this movie deliver that in spades - pretty much encapsulated everything my wife has tried to convey about her entire career and life experience wrapped up into two hours! The humor, the sets, the acting, and the ability to weave the current narrative into the story was just perfect. I don't agree with some of the points of the movie, but again, that's ok. This movie wasn't designed to give a balanced perspective of men versus women; it is a no-holds-barred unapologetic crazy ride of a rant about the real issues that women have faced since they were "allowed" to have "real jobs" and do the same things as men. Give me a well done film that is a blast to watch, that makes you think, and that was done from a place of creativity, passion, and attention to detail, and I'll call it what it is: a 10 out of 10 masterpiece.
---
sentiment:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))




Please give the output: 
review: As much as it pains me to give a movie called "Barbie" a 10 out of 10, I have to do so. It is so brilliantly handled and finely crafted, I have to give the filmakers credit. Yes, I am somewhat conservative person and former law enforcement officer. I'm a guy. I like guy things. Hell I even enjoyed the Battleship movie a few years ago (an absolutely ridiculous but fun romp of an action film). But I also like to experience other perspectives. And man oh man does this movie deliver that in spades - pretty much encapsulated everything my wife has tried to convey about her entire career and life experience wrapped up into two hours! The humor, the sets, the acting, and the ability to weave the current narrative into the story was just perfect. I don't agree with some of the points of the movie, but again, that's ok. This movie wasn't designed to give a balanced perspective of men versus women; it is a no-holds-barred unapologetic crazy ride of a rant about 

In [8]:
eval_prompt = """
Please give the output: 
review: Margot Robbie and Ryan Gosling are really great in their roles of Barbie and Kent. Gosling is specially hilarious. I expected a funny, cool, deep and entertaining movie, but I was highly disappointed. The movie is so terribly preachy that ends being an embarrassment. We (the audience) have brains, so, please, preach somewhere else. Furthermore, this is a movie to divide, not to unite. And I'm a woman. The script of the movie is horrendous and the direction is plain bad. There are some great actresses that have too little to say, like Emma Mackey, and that's a pity. Overall: a huge disappointment and a missed opportunity. 3/10 (1 point for Margot Robbie, 1 point for Ryan Gosling, and 1 point for the art direction (super tacky and pinky, as Barbie's world).
---
sentiment:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))


Please give the output: 
review: Margot Robbie and Ryan Gosling are really great in their roles of Barbie and Kent. Gosling is specially hilarious. I expected a funny, cool, deep and entertaining movie, but I was highly disappointed. The movie is so terribly preachy that ends being an embarrassment. We (the audience) have brains, so, please, preach somewhere else. Furthermore, this is a movie to divide, not to unite. And I'm a woman. The script of the movie is horrendous and the direction is plain bad. There are some great actresses that have too little to say, like Emma Mackey, and that's a pity. Overall: a huge disappointment and a missed opportunity. 3/10 (1 point for Margot Robbie, 1 point for Ryan Gosling, and 1 point for the art direction (super tacky and pinky, as Barbie's world).
---
sentiment:
'The only thing this movie has going for it is the fact that it is a very unique movie. The idea of a movie about a guy who\'s life is ruined by a dog is a very original concept, but th