# Fine-tuning Llama-3.1-8B with the Finance Alpaca Dataset

## Installing the required libraries and packages

In [1]:
!pip install torch torchvision transformers datasets bitsandbytes peft

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [2]:
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset



## Configure torch with GPU

In [3]:
# Use the GPU (CUDA) if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


## Model Loading from Hugging Face

In [4]:
from transformers import BitsAndBytesConfig
model_name = "NousResearch/Meta-Llama-3.1-8B-Instruct"
# Request 8-bit weights through `quantization_config`; the bare `load_in_8bit` argument is deprecated
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", quantization_config=quantization_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token

Loading checkpoint shards: 100%|██████████| 4/4 [00:58<00:00, 14.53s/it]
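
As a rough sanity check on why 8-bit loading helps, the weight memory can be estimated from the parameter count. The sketch below assumes the base parameter count is the "all params" figure printed later by `print_trainable_parameters()` minus the LoRA parameters. It ignores activations, optimizer state, and any layers that are kept in higher precision, so the figures are approximate.

```python
# Rough estimate of weight memory at different precisions.
# Assumption: base parameters = "all params" minus LoRA params, both printed later in the notebook.
BASE_PARAMS = 8_043_892_736 - 13_631_488

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>17}: {weight_memory_gib(BASE_PARAMS, nbytes):6.2f} GiB")
```

Under these assumptions, 8-bit weights need about half the memory of 16-bit weights, which is what makes an 8B model easier to fit on a single GPU.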


## Data Loading & Preprocessing

In [5]:
# Download the dataset (it ships only a 'test' split) and divide it 80/20 into training and test sets

dataset = load_dataset("poornima9348/finance-alpaca-1k-test")

train_test_split = dataset['test'].train_test_split(test_size=0.2, shuffle=True)
train_dataset = train_test_split['train']
test_dataset = train_test_split['test']
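
The split above is shuffled without a fixed seed, so each run produces a different train/test partition. `train_test_split` accepts a `seed` argument for this purpose. The pure-Python sketch below only illustrates the idea of a seeded 80/20 split on row indices; it is not how `datasets` implements it, and the helper name `seeded_split` is hypothetical.

```python
import random


def seeded_split(num_rows: int, test_size: float, seed: int):
    """Shuffle row indices with a fixed seed and reserve the last `test_size` share for testing."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    cut = num_rows - int(round(num_rows * test_size))
    return indices[:cut], indices[cut:]


train_idx, test_idx = seeded_split(num_rows=1000, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))
```

In the notebook itself, the equivalent would presumably be `dataset['test'].train_test_split(test_size=0.2, shuffle=True, seed=42)`, where the seed value 42 is an arbitrary choice.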

In [6]:
# Inspect the dataset structure and the splits it contains
print(dataset, dataset.keys())

DatasetDict({
    test: Dataset({
        features: ['instruction', 'input', 'output', 'text'],
        num_rows: 1000
    })
}) dict_keys(['test'])


In [7]:
# Drop the columns 'input' and 'text' (there are no values in those columns)
train_dataset = train_dataset.remove_columns(['input', 'text'])
test_dataset = test_dataset.remove_columns(['input', 'text'])

In [8]:
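# Build the training text for each row, then tokenize it
def tokenize_function(examples):
    # Truncate to the tokenizer's maximum length and pad to the longest sequence in each mapped batch
    return tokenizer(examples["text"], truncation=True, padding=True)

def prompt_builder(row):
    # Concatenate the question and its answer into a single training string
    return {"text": row["instruction"] + row["output"]}

train_dataset = train_dataset.map(prompt_builder)
test_dataset = test_dataset.map(prompt_builder)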

print("Train dataset example:", train_dataset[0])
print("Test dataset example:", test_dataset[0])

train_dataset = train_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)

# Convert datasets to PyTorch tensors
train_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask'])
test_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask'])

train_dataset = train_dataset.remove_columns(['instruction', 'output'])
test_dataset = test_dataset.remove_columns(['instruction', 'output'])

Map: 100%|██████████| 800/800 [00:00<00:00, 6625.26 examples/s]
Map: 100%|██████████| 200/200 [00:00<00:00, 9912.10 examples/s]

Train dataset example: {'instruction': 'If I own x% of company A, and A buys company B, do I own x% of B?', 'output': "No, thanks to the principle of corporate personhood.  The legal entity (company C) is the owner and parent of the private company (sub S).  You and C are separate legal entities, as are C and S. This principle helps to legally insulate the parties for purposes such as liability, torts, taxes, and so forth.  If company C is sued, you may be financially at stake (i.e. your investment in C is devalued or made worthless) but you are not personally being sued.  However, the litigant may attach you as an additional litigant if the facts of the suit merit it.  But without legal separateness of corporations, then potentially all owners and maybe a number of the employees would be sued any time somebody sued the business - which is messy for companies and messy for litigants.  It's also far cleaner for lenders to lend to  unified business entities rather than a variety of thous


Map: 100%|██████████| 800/800 [00:00<00:00, 880.02 examples/s]
Map: 100%|██████████| 200/200 [00:00<00:00, 1744.02 examples/s]
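
`prompt_builder` joins the instruction and the answer with no separator, so the model is trained on strings in which the question runs straight into the answer. One possible alternative, sketched below, inserts explicit section markers in the style often used with Alpaca-format data. The template text and the helper name `build_prompt` are assumptions, not part of the original notebook, and switching the prompt format would change the training results reported in this notebook.

```python
# Hypothetical Alpaca-style template; the exact wording is an assumption.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{output}"


def build_prompt(row: dict) -> dict:
    """Return a new 'text' field with the instruction and answer clearly separated."""
    return {
        "text": PROMPT_TEMPLATE.format(
            instruction=row["instruction"].strip(),
            output=row["output"].strip(),
        )
    }


example = {
    "instruction": "How does a brokerage firm work?",
    "output": "A brokerage firm acts as an intermediary between investors and the market.",
}
print(build_prompt(example)["text"])
```

Because `build_prompt` takes a row and returns a dict with a `text` key, it has the same shape as `prompt_builder` and could be passed to `Dataset.map` in its place.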


## Turn the model into a LoRA model

In [9]:
from peft import prepare_model_for_kbit_training

peft_model = prepare_model_for_kbit_training(model)

peft_model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear8bitLt(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear8bitLt(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear8bitLt(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear8bitLt(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4

In [10]:
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=32,  # rank of the update matrices; a lower rank gives smaller matrices with fewer trainable parameters
    lora_alpha=32,  # scaling numerator; the low-rank update is scaled by lora_alpha / r
    lora_dropout=0.1,  # dropout applied on the LoRA branch during training
    task_type=TaskType.CAUSAL_LM,
    # target_modules='all-linear',  # optional: the modules (for example, attention blocks) that receive LoRA update matrices
)

lora_config

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, inference_mode=False, r=32, target_modules=None, lora_alpha=32, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False))
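
With `use_rslora=False`, as in the configuration printed above, LoRA scales the low-rank update by `lora_alpha / r`, so `r=32` and `lora_alpha=32` give a scaling of 1.0. A minimal pure-Python sketch of the adapted forward pass `h = W x + (lora_alpha / r) * B (A x)` on toy dimensions follows. The matrices and helper names are illustrative only; PEFT's actual implementation works on tensors and also applies dropout.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Compute W x + (lora_alpha / r) * B (A x) for toy-sized inputs."""
    scaling = lora_alpha / r
    base = matvec(W, x)                # frozen base layer
    update = matvec(B, matvec(A, x))   # low-rank branch: down-project, then up-project
    return [b + scaling * u for b, u in zip(base, update)]


# Toy example: a 2-dimensional layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
A = [[1.0, 1.0]]              # down-projection (r x in_features) = (1 x 2)
B = [[0.5], [0.0]]            # up-projection (out_features x r) = (2 x 1)
x = [2.0, 3.0]

print(lora_forward(W, A, B, x, lora_alpha=1, r=1))
print("scaling for this notebook's config:", 32 / 32)
```

With the default initialization (`init_lora_weights=True`), `B` starts at zero, so the adapted model initially produces the same outputs as the base model.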

In [11]:
lora_model = get_peft_model(peft_model, lora_config)
lora_model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear8bitLt(
                (base_layer): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): 

In [12]:
lora_model.print_trainable_parameters()

trainable params: 13,631,488 || all params: 8,043,892,736 || trainable%: 0.1695
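
The trainable-parameter count can be reproduced by hand. The sketch below assumes the adapters were attached only to `q_proj` and `v_proj` in each of the 32 decoder layers (`target_modules` was left unset in `LoraConfig`, and this assumption matches the printed total). The layer shapes are taken from the model printout above.

```python
R = 32           # LoRA rank from LoraConfig
NUM_LAYERS = 32  # decoder layers in the model printout

# (in_features, out_features) of the projections assumed to carry adapters.
TARGET_SHAPES = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Count adapter weights: A is (r x in_features) and B is (out_features x r)."""
    return r * in_features + out_features * r


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
total = 8_043_892_736  # "all params" as printed above

print(f"trainable params: {trainable:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Each additional target module would add `r * (in_features + out_features)` parameters per layer, which is why the commented-out `all-linear` option would train considerably more weights.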


In [13]:
# Trainer and TrainingArguments were already imported in an earlier cell
from transformers import DataCollatorForLanguageModeling

trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="Meta-Llama-3.1-8B-Instruct-finetuned",
        learning_rate=2e-5,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=1,
        weight_decay=0.01,
        load_best_model_at_end=True,
        logging_steps=1,
        report_to="none"
    ),
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
)



## Let's start the training!

In [14]:
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 2.7174        | 2.631765        |


We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)


TrainOutput(global_step=800, training_loss=2.643162889778614, metrics={'train_runtime': 4276.0884, 'train_samples_per_second': 0.187, 'train_steps_per_second': 0.187, 'total_flos': 8.0586892345344e+16, 'train_loss': 2.643162889778614, 'epoch': 1.0})
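
The numbers in `TrainOutput` can be cross-checked with a little arithmetic: 800 training examples at batch size 1 for one epoch give 800 optimizer steps, and the throughput is the number of samples divided by the runtime. The sketch assumes a single device and no gradient accumulation. Perplexity, computed here as the exponential of the validation loss, is an additional metric that the original run does not report; it assumes the loss is the mean per-token cross-entropy in nats.

```python
import math

num_examples = 800          # rows in the training split
batch_size = 1              # per_device_train_batch_size, assuming a single device
epochs = 1
train_runtime = 4276.0884   # seconds, from TrainOutput
eval_loss = 2.631765        # validation loss after epoch 1

steps = math.ceil(num_examples / batch_size) * epochs
samples_per_second = num_examples * epochs / train_runtime
perplexity = math.exp(eval_loss)

print(f"steps: {steps}")
print(f"samples/s: {samples_per_second:.3f}")
print(f"validation perplexity: {perplexity:.2f}")
```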

## Generate the first response

In [16]:
from peft import PeftModel

# Load the fine-tuned model with the adapter attached
model_with_adapter = PeftModel.from_pretrained(model, "Meta-Llama-3.1-8B-Instruct-finetuned/checkpoint-800").to("cuda")
model_with_adapter.eval()
inputs = tokenizer("How does a brokerage firm work?", return_tensors="pt")

outputs = model_with_adapter.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=100)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


How does a brokerage firm work?A brokerage firm is a company that acts as an intermediary between investors and the stock market. It allows its customers to buy and sell securities, such as stocks, bonds, and mutual funds. In exchange for its services, the brokerage firm charges a commission or fee to the investor. Here's a step-by-step explanation of how a brokerage firm works: 1. The investor opens a brokerage account with the firm. 2. The investor deposits funds into the account. 3. The investor places
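
The attention-mask warning in the output above appears because only `input_ids` is passed to `generate`. A possible variant, sketched below as a helper, forwards the full tokenizer output (including `attention_mask`) and sets `pad_token_id` explicitly. The helper name `generate_reply` is hypothetical, and it assumes model and tokenizer objects like the ones loaded in the cells above.

```python
def generate_reply(model, tokenizer, prompt, device="cuda", max_new_tokens=100):
    """Generate a completion while passing the attention mask and pad token id."""
    # The tokenizer returns input_ids and attention_mask; move both to the target device.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs,                             # forwards input_ids and attention_mask
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # set explicitly instead of relying on the fallback
    )
    # Decode the first (and only) sequence in the batch, prompt included.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]


# Usage, assuming `model_with_adapter` and `tokenizer` are already loaded:
# print(generate_reply(model_with_adapter, tokenizer, "How does a brokerage firm work?"))
```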
