# Fine-tuning LLaMA Model

In [None]:
!pip install -Uqqq pip --progress-bar off
!pip install -qqq torch==2.0.1 --progress-bar off
!pip install -qqq transformers==4.32.1 --progress-bar off
!pip install -qqq datasets==2.14.4 --progress-bar off
!pip install -qqq peft==0.5.0 --progress-bar off
!pip install -qqq bitsandbytes==0.41.1 --progress-bar off
!pip install -qqq trl==0.7.1 --progress-bar off

## Import Libraries

In [None]:
import json
import re
from pprint import pprint
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login, login
from peft import LoraConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer



In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

device

device(type='cuda')

## Load and Prepare Dataset

In [None]:
file_path = '/kaggle/input/instagram-ads/instagram_posts.json'
df = pd.read_json(file_path)

# Convert the DataFrame to a Hugging Face dataset
dataset = Dataset.from_pandas(df)

In [None]:
DEFAULT_SYSTEM_PROMPT = """
Write an engaging Instagram post caption about the given input. You can generate a few hashtags.
""".strip()


def generate_training_prompt(conversation: str, summary: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    return f"""### Instruction: {system_prompt}

### Input:
{conversation.strip()}

### Response:
{summary}
""".strip()

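To make the template concrete, here is a minimal sketch that renders one prompt. The function is repeated so the snippet runs on its own, and the input and response strings are made-up examples, not rows from the dataset. Note that the `SFTTrainer` cell later in this notebook trains on the `ad_text` column, so as written this template is not applied to the training data.

```python
DEFAULT_SYSTEM_PROMPT = (
    "Write an engaging Instagram post caption about the given input. "
    "You can generate a few hashtags."
)


def generate_training_prompt(conversation, summary, system_prompt=DEFAULT_SYSTEM_PROMPT):
    # Same three-section layout as the notebook: instruction, input, response
    return f"""### Instruction: {system_prompt}

### Input:
{conversation.strip()}

### Response:
{summary}
""".strip()


# Hypothetical example values, for illustration only
prompt = generate_training_prompt(
    "  25L climbing backpack, $200 ",
    "Summit-ready. #Adventure",
)
print(prompt)
```

The input is stripped of surrounding whitespace before it is placed in the template, and the final `.strip()` removes the trailing newline after the response.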

In [None]:
# Function to clean the text in the dataset
def clean_text(text):
    if isinstance(text, str):
        text = re.sub(r"http\S+", "", text)
        text = re.sub(r"@[^\s]+", "", text)
        text = re.sub(r"\s+", " ", text)
        return re.sub(r"\^[^ ]+", "", text)
    else:
        return ""

# Function to create the ad text
def create_ad_text(data_point):
    caption = clean_text(data_point["caption"])
    # return f"Check out our latest post: {caption}"
    return caption

# Function to generate the text from the data point
def generate_text(data_point):
    ad_text = create_ad_text(data_point)
    return {
        "ad_text": ad_text
    }
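
To see what the cleaning step does to a caption, here is a minimal sketch that applies the same four regular expressions in a standalone function. The name `clean_caption` and the sample caption are made up for illustration.

```python
import re


def clean_caption(text):
    """Standalone copy of the notebook's cleaning logic."""
    if not isinstance(text, str):
        return ""
    text = re.sub(r"http\S+", "", text)   # drop URLs
    text = re.sub(r"@[^\s]+", "", text)   # drop @mentions
    text = re.sub(r"\s+", " ", text)      # collapse whitespace runs (including newlines)
    return re.sub(r"\^[^ ]+", "", text)   # drop ^-prefixed tokens


# Hypothetical caption, not taken from the dataset
sample = "New drop from @brand!\nShop at https://example.com #sale"
cleaned = clean_caption(sample)
print(repr(cleaned))
```

The mention pattern consumes everything up to the next whitespace, so punctuation attached to a handle (the `!` above) disappears with it. Nothing calls `.strip()`, so a caption that starts or ends with a removed token can keep a leading or trailing space.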

In [None]:
# Example data point
example = generate_text(dataset[0])

In [None]:
# Function to process dataset
def process_dataset(data: Dataset):
    return (
        data.shuffle(seed=42)
        .map(generate_text)
        .remove_columns(['id', 'shortCode', 'caption', 'hashtags', 'mentions',
                'url', 'commentsCount', 'firstComment', 'latestComments',
                'dimensionsHeight', 'dimensionsWidth', 'displayUrl',
                'images', 'videoUrl', 'alt', 'likesCount', 'videoViewCount',
                'videoPlayCount', 'timestamp', 'childPosts', 'ownerFullName',
                'ownerUsername', 'ownerId', 'productType', 'videoDuration',
                'isSponsored', 'isPinned', 'musicInfo', 'taggedUsers',
                'coauthorProducers', 'locationName', 'locationId', 'error',
                'description', 'paidPartnership', 'sponsors'])
    )

In [None]:
dataset

Dataset({
    features: ['inputUrl', 'id', 'type', 'shortCode', 'caption', 'hashtags', 'mentions', 'url', 'commentsCount', 'firstComment', 'latestComments', 'dimensionsHeight', 'dimensionsWidth', 'displayUrl', 'images', 'videoUrl', 'alt', 'likesCount', 'videoViewCount', 'videoPlayCount', 'timestamp', 'childPosts', 'ownerFullName', 'ownerUsername', 'ownerId', 'productType', 'videoDuration', 'isSponsored', 'isPinned', 'musicInfo', 'taggedUsers', 'coauthorProducers', 'locationName', 'locationId', 'error', 'description', 'paidPartnership', 'sponsors'],
    num_rows: 2800
})

In [None]:
dataset = process_dataset(dataset)



In [None]:
dataset

Dataset({
    features: ['inputUrl', 'type', 'ad_text'],
    num_rows: 2800
})

In [None]:
dataset['ad_text'][:10]

['These guys get it.',
 'Balloon sleeves + bermuda shorts? SO refreshing🚰 #SpringLightly',
 'Let’t Go!!! 🔥 #brklnbloke #blackfriday #blackfridaysale #blackfridaydeals #holidayshopping',
 'The multi-sport quiver-killer has done it again. With a lighter-than-ever design, more comprehensive fit and an additional volume, the Talon™/Tempest Pro is ready to tackle your most ambitious adventures. #OspreyPacks',
 'We asked — beauty expert, travel aficionado, and hosting queen — from Instagram’s partnerships team what gifts from emerging brands she thinks are worth giving this year. Swipe through for her picks from a travel-friendly skincare set to an unexpected little luxury you probably wouldn’t buy for yourself. 🌟',
 'TODAY!! SAGE🌿RESTOCK NOON ET 🛒 SHOP.TELFAR.NET + EU.TELFAR.NET',
 '*Immediately adds VS Archives Swim to our vacation moodboard* #VSEscapetoSummer',
 '1 extra large latte and this ‘fit on repeat please. [she/her] #asseenonme Weekday ruched mini skirt [134463000] ASOS DESIGN bab

## Split Train and Validation Datasets

In [None]:
# Split the dataset into training and validation sets
train_test_ratio = 0.92
train_size = int(train_test_ratio * len(dataset))
val_size = len(dataset) - train_size

In [None]:
# Split once, then take both parts from the returned DatasetDict
split = dataset.train_test_split(
    test_size=val_size,
    train_size=train_size,
    seed=42
)
train_dataset, val_dataset = split['train'], split['test']
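
As a quick check of the split sizes, the arithmetic can be reproduced without loading any data. This sketch assumes only the row count reported above (2800) and the 0.92 ratio.

```python
n_rows = 2800            # num_rows reported for the dataset
train_test_ratio = 0.92

train_size = int(train_test_ratio * n_rows)  # truncates toward zero
val_size = n_rows - train_size               # remainder goes to validation

print(train_size, val_size)
```

Deriving `val_size` as the remainder guarantees that the two sizes always add up to the full dataset, whatever rounding `int()` applies.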

## Model Configuration

In [None]:
# model_name = "meta-llama/Llama-2-7b-hf"
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

In [None]:
hf_token = "Your_HuggingFace_Token"

login(hf_token)

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
# Function to create model and tokenizer
def create_model_and_tokenizer():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        use_safetensors=True,
        quantization_config=bnb_config,
        trust_remote_code=True,
        device_map="auto",
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    return model, tokenizer


# Create model and tokenizer
model, tokenizer = create_model_and_tokenizer()
model.config.use_cache = False


In [None]:
# Model quantization configuration
model.config.quantization_config.to_dict()

{'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>,
 'load_in_8bit': False,
 'load_in_4bit': True,
 'llm_int8_threshold': 6.0,
 'llm_int8_skip_modules': None,
 'llm_int8_enable_fp32_cpu_offload': False,
 'llm_int8_has_fp16_weight': False,
 'bnb_4bit_quant_type': 'nf4',
 'bnb_4bit_use_double_quant': False,
 'bnb_4bit_compute_dtype': 'float16'}
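
A rough back-of-the-envelope calculation shows why 4-bit loading matters on a single GPU. This sketch assumes about 8.03 billion parameters for the model and counts weights only. It ignores activations, optimizer state, and quantization constants, as well as any layers that bitsandbytes leaves in 16-bit, so real usage will be higher than the 4-bit figure.

```python
# Assumption: approximate parameter count for an 8B-class model
N_PARAMS = 8.03e9


def weight_gb(n_params, bits_per_param):
    """Size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


fp16_gb = weight_gb(N_PARAMS, 16)
nf4_gb = weight_gb(N_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```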

In [None]:
# Define LoRA configuration
lora_r = 16
lora_alpha = 64
lora_dropout = 0.1
lora_target_modules = [
    "q_proj",
    "up_proj",
    "o_proj",
    "k_proj",
    "down_proj",
    "gate_proj",
    "v_proj",
]

peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=lora_target_modules,
    bias="none",
    task_type="CAUSAL_LM",
)
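
To get a feel for how many weights this LoRA setup trains, the adapter sizes can be estimated by hand. Each adapted linear layer gains two small matrices, one of shape `r x in_features` and one of shape `out_features x r`. The layer dimensions below are assumptions taken from the published Llama 3 8B configuration, not values read from the loaded model, so treat the result as an estimate.

```python
# Assumed Llama 3 8B dimensions (not read from the loaded model)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 dims per head
N_LAYERS = 32

R = 16               # lora_r from the config above
ALPHA = 64           # lora_alpha from the config above

# module name -> (in_features, out_features)
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    # A has r * d_in entries, B has d_out * r entries
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_layer * N_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"scaling factor: {scaling}")
```

Under these assumptions the adapters add roughly 42 million trainable parameters, a small fraction of the base model, and the update is scaled by `lora_alpha / r = 4`.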

## Training Configuration

In [None]:
OUTPUT_DIR = "experiments"

# %load_ext tensorboard
# %tensorboard --logdir experiments/runs

In [None]:
# Our training arguments
training_arguments = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=1,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    num_train_epochs=3, # was 1 and 2
    evaluation_strategy="steps",
    eval_steps=100,
    warmup_ratio=0.05,
    save_strategy="epoch",
    group_by_length=True,
    output_dir=OUTPUT_DIR,
    # report_to="tensorboard",
    save_safetensors=True,
    lr_scheduler_type="cosine",
    seed=42,
)
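
These arguments determine how many optimizer steps the run takes. The sketch below works through the arithmetic, assuming a single training device and the 2576-row training split. The step formula (batches per epoch divided by the accumulation steps, rounded down) is an approximation of how the Trainer counts updates, not a copy of its code.

```python
import math

n_train = 2576           # training rows after the split
per_device_batch = 4
grad_accum = 4
epochs = 3
warmup_ratio = 0.05
eval_steps = 100
n_devices = 1            # assumption: one training process

effective_batch = per_device_batch * grad_accum * n_devices
batches_per_epoch = math.ceil(n_train / (per_device_batch * n_devices))
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = updates_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
eval_points = list(range(eval_steps, total_steps + 1, eval_steps))

print(f"effective batch size: {effective_batch}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup steps: {warmup_steps}")
print(f"evaluation at steps: {eval_points}")
```

The total agrees with the `global_step=483` that `trainer.train()` reports further down, and the evaluation points line up with the four rows of the loss table.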

In [None]:
# Initialize SFTTrainer that will train our model
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    peft_config=peft_config,
    dataset_text_field="ad_text",
    max_seq_length=4096,
    tokenizer=tokenizer,
    args=training_arguments,
)





In [None]:
# !wandb login Your_Wandb_Token



In [None]:
# Train the model
trainer.train()


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 100  | 2.5911        | 2.61215         |
| 200  | 2.4678        | 2.619116        |
| 300  | 1.9787        | 2.388678        |
| 400  | 1.5569        | 2.469529        |


TrainOutput(global_step=483, training_loss=2.2517564837236583, metrics={'train_runtime': 5715.8597, 'train_samples_per_second': 1.352, 'train_steps_per_second': 0.085, 'total_flos': 1.1062444932562944e+16, 'train_loss': 2.2517564837236583, 'epoch': 3.0})

In [None]:
# Optional: merge the LoRA adapter into the base model (left disabled).
# Run this only after the adapter has been saved to OUTPUT_DIR.
# from peft import AutoPeftModelForCausalLM

# trained_model = AutoPeftModelForCausalLM.from_pretrained(
#     OUTPUT_DIR,
#     low_cpu_mem_usage=True,
# )

# merged_model = trained_model.merge_and_unload()
# merged_model.save_pretrained("merged_model", safe_serialization=True)
# tokenizer.save_pretrained("merged_model")

# Save the model
trainer.save_model(OUTPUT_DIR)

# Save the tokenizer
tokenizer.save_pretrained(OUTPUT_DIR)

('experiments/tokenizer_config.json',
 'experiments/special_tokens_map.json',
 'experiments/tokenizer.json')

## Inference Example with the Fine-tuned Model

In [None]:
# Define function to generate prompt for inference
def generate_prompt(conversation: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    return f"""### Instruction: {system_prompt}

### Input:
{conversation.strip()}

### Response:
""".strip()

def clean_generated_text(text: str) -> str:
    # Remove duplicate hashtags
    hashtags = set()
    cleaned_text = []
    for word in text.split():
        if word.startswith("#"):
            if word.lower() not in hashtags:
                hashtags.add(word.lower())
                cleaned_text.append(word)
        else:
            cleaned_text.append(word)
    return " ".join(cleaned_text)

# Define function to generate post
def generate_post(model, text: str):
    inputs = tokenizer(text, return_tensors="pt").to(device)
    inputs_length = len(inputs["input_ids"][0])
    with torch.no_grad():
        outputs = model.generate(**inputs,
                                 max_new_tokens=100,
                                 temperature=0.7,
                                 top_p=0.95)
    generated_text = tokenizer.decode(outputs[0][inputs_length:], skip_special_tokens=True)
    return clean_generated_text(generated_text)
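
The hashtag clean-up can be tried without a model. This sketch uses the same case-insensitive logic in a standalone function. The name `dedupe_hashtags` and the sample text are made up for illustration.

```python
def dedupe_hashtags(text):
    """Keep the first occurrence of each hashtag, compared case-insensitively."""
    seen = set()
    kept = []
    for word in text.split():
        if word.startswith("#"):
            key = word.lower()
            if key in seen:
                continue      # skip a repeated hashtag
            seen.add(key)
        kept.append(word)
    return " ".join(kept)


# Hypothetical model output, not a real generation
raw = "Pack light.\nClimb far. #Osprey #Adventure #osprey #Adventure"
result = dedupe_hashtags(raw)
print(result)
```

Because the text is split on whitespace and rejoined with single spaces, line breaks in the generated caption are flattened. Hashtags are compared as whole words, so `#Osprey` and `#Osprey,` would count as different tags.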

In [None]:
# Test the function with a sample instruction
sample_instruction = "Create a new post about the 'Adventure' model backpack with 25 liters capacity for $200, perfect for climbers."
prompt = generate_prompt(sample_instruction)
generated_post = generate_post(model, prompt)
print("Generated Post Content:\n", generated_post)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Generated Post Content:
 📸: 1-2 of you posing in front of a mountain range or holding your packs on a summit 🎒: 1 in hand, 1 on your back 🗺️: 1 showing the route you took, 1 showing the summit you reached. #OspreyPacks #Adventure #Climbing #Backpacking #Backpack #Osprey #Backcountry #SeeYouOutHere #SeeYouOutHere2023


In this notebook we fine-tuned `meta-llama/Meta-Llama-3-8B-Instruct` on Instagram captions using 4-bit quantization and LoRA adapters, with the Hugging Face Transformers, PEFT, and TRL libraries. The key steps were:

1. **Data Preparation**:
   - Loading Instagram post data and cleaning the captions (removing URLs and @mentions).
   - Building the `ad_text` field used for training and defining instruction-style prompt templates.

2. **Model Configuration**:
   - Loading the LLaMA model and its tokenizer, reusing the EOS token as the padding token.
   - Quantizing the weights to 4-bit NF4 with bitsandbytes to reduce memory usage.
   - Setting up Low-Rank Adaptation (LoRA) on the attention and MLP projection layers.

3. **Training**:
   - Defining training arguments such as batch size, learning rate, and evaluation strategy.
   - Training the model using the prepared dataset and configurations.
   - Saving the trained model and tokenizer.

4. **Inference**:
   - Setting up a function to generate prompts for the model.
   - Defining a function to generate responses from the model.


## References

- [Fine-tuning Llama 2 on Your Own Dataset | Train an LLM for Your Use Case with QLoRA on a Single GPU](https://www.youtube.com/watch?v=MDA3LUKNl1E)
- [meta-llama/Meta-Llama-3-8B-Instruct model card](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)