<a href="https://colab.research.google.com/github/etuckerman/SOCOTEC/blob/main/SOCOTEC_FINETUNE_elliot_tuckerman.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
#install unsloth, xformers (for flash attn) and other pckgs
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.4: Fast Mistral patching. Transformers:4.46.3.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [3]:
#lora adapters

'''
lora adapters are Low Rank Adaptation,
technique for efficiently adapting the base language model
for new tasks or datasets by adding low rank mtrices to the models weights
enabling for fast and flexible fine tuning
lora enables task specific adaptation
while leveraging the pre trained models general knowledge
therefore good approach for transfer learning in NLP tasks
'''


model = FastLanguageModel.get_peft_model(
    model,
    r = 16, #sets rank of lora weights
            #(number of singular values to keep in the decomp)

    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
            #^^ specify the modeules or layers in the model where lara weights
            # will be added

    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,


)

Unsloth 2024.12.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [4]:
import pandas as pd
import random

def generate_dataset(num_samples):
  data = []
  for _ in range(num_samples):
    # Generate random prompts and function calls
    prompt = random.choice([
      f"What is {random.randint(1, 10)} plus {random.randint(1, 10)}?",
      f"Add {random.randint(1, 10)} and {random.randint(1, 10)}.",
      f"Can you add {random.randint(1, 10)} to {random.randint(1, 10)}?",
      f"Give me the sum of {random.randint(1, 10)} and {random.randint(1, 10)}.",
      f"What's the square of {random.randint(1, 10)}?",
      f"Square the number {random.randint(1, 10)}.",
      f"Cube the number {random.randint(1, 5)}.",
      f"What's the cube of {random.randint(1, 5)}?",
      f"Greet {random.choice(['Alice', 'Bob', 'Charlie'])}.",
      f"Say hello to {random.choice(['Alice', 'Bob', 'Charlie'])}."
    ])

    function_call = None
    if "sum" in prompt or "add" in prompt:
      function_call = f"add({random.randint(1, 10)}, {random.randint(1, 10)})"
    elif "square" in prompt:
      function_call = f"square({random.randint(1, 10)})"
    elif "cube" in prompt:
      function_call = f"cube({random.randint(1, 5)})"
    elif "greet" in prompt or "say hello" in prompt:
      function_call = f"greet('{random.choice(['Alice', 'Bob', 'Charlie'])}')"

    data.append([prompt, function_call])

  df = pd.DataFrame(data, columns=["Prompt", "Function Call"])
  return df

# Generate a dataset of 1000 samples
dataset = generate_dataset(1000)
dataset.to_csv("function_call_dataset.csv", index=False)

In [5]:
import pandas as pd
from datasets import load_dataset, Dataset

# Define the alpaca_prompt
alpaca_prompt = """Given the following prompt, provide a Python function call to perform the specified task:

**Prompt:**
{}"""

# Load your dataset
dataset = pd.read_csv("function_call_dataset.csv")

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
  instructions = examples["Prompt"]
  outputs = examples["Function Call"]
  # Handle potential missing values (NaN)
  if pd.isnull(instructions) or pd.isnull(outputs):
    examples['text'] = ""  # or any other appropriate value
    return examples

  # Directly create the 'text' column using list comprehension
  # Ensure that instructions and outputs are strings
  examples['text'] = alpaca_prompt.format(str(instructions), "", str(outputs)) + EOS_TOKEN
  return examples  # Return the modified DataFrame

# Apply the formatting function to the pandas DataFrame before conversion
dataset = dataset.apply(formatting_prompts_func, axis=1) # Apply row-wise (axis=1)

# Convert the pandas DataFrame to a Hugging Face Dataset
dataset = Dataset.from_pandas(dataset) # This is the key change

In [6]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60, # Set num_train_epochs = 1 for full training runs
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Map (num_proc=2):   0%|          | 0/1000 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


In [7]:
#check memory
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.
4.52 GB of memory reserved.


In [8]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.7967
2,2.7464
3,2.5584
4,2.2114
5,1.33
6,0.7849
7,0.6224
8,0.4792
9,0.3932
10,0.4502


In [9]:
#show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

106.3838 seconds used for training.
1.77 minutes used for training.
Peak reserved memory = 5.029 GB.
Peak reserved memory for training = 0.509 GB.
Peak reserved memory % of max memory = 12.711 %.
Peak reserved memory for training % of max memory = 1.287 %.


In [10]:
# Your custom prompt template
prompt_template = """Given the following prompt and the available functions, write a Python function call to perform the specified task:

**Available Functions:**
* `add(a, b)`: Adds two numbers, `a` and `b`.
* `square(a)`: Squares the number `a`.
* `cube(a)`: Cubes the number `a`.
* `greet(name)`: Greets the person with the name `name`.

**Prompt:**
{}
"""
# Inference with your custom prompt
FastLanguageModel.for_inference(model)

# Example input text
input_text = "Add 5 and 7"

inputs = tokenizer(
    [prompt_template.format(input_text)],
    return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=16, use_cache=True)  # Adjust max_new_tokens as needed
generated_function_call = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(f"Generated Function Call: {generated_function_call}")

Generated Function Call: Given the following prompt and the available functions, write a Python function call to perform the specified task:

**Available Functions:**
* `add(a, b)`: Adds two numbers, `a` and `b`.
* `square(a)`: Squares the number `a`.
* `cube(a)`: Cubes the number `a`.
* `greet(name)`: Greets the person with the name `name`.

**Prompt:**
Add 5 and 7

**Prompt:**
What's the cube of 1?
