<a href="https://colab.research.google.com/github/qerberos-code/aiml_api/blob/main/Phi_3_medium_4k_instruct%2B_Unsloth_2x_faster_finetuning_angus.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth?tab=readme-ov-file#-installation-instructions).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (eg for Llama.cpp).

[NEW] Llama-3.1 8b, 70b & 405b are trained on a crazy 15 trillion tokens with 128K long context lengths!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

In [None]:
https://www.analyticsvidhya.com/blog/2024/02/fine-tuning-a-tiny-llama-model-with-unsloth/

In [1]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers==0.0.27" trl peft accelerate bitsandbytes

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
* [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
* [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! [Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)

In [2]:
!pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[cu121-torch230]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-q5xj5gy5/unsloth_9193aa8015694093a59eb81f7df88023
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-q5xj5gy5/unsloth_9193aa8015694093a59eb81f7df88023
  Resolved https://github.com/unslothai/unsloth.git to commit d0ca3497eb5911483339be025e9924cf73280178
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting xformers@ https://download.pytorch.org/whl/cu121/xformers-0.0.27-cp310-cp310-manylinux2014_x86_64.whl (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[cu121-torch230]@ git+https://github.com/unslothai/unsloth.git)
  Downloading https://download.pytorch.org/whl/cu121/xf

In [3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-3-medium-4k-instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.8: Fast Mistral patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.27. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json:   0%|          | 0.00/165k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/3.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.72G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/194 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.25k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/458 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.8 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.


<a name="Data"></a>
### Data Prep
We now use the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), which is a filtered version of 52K of the original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

## Custom synthetic dataset

In [5]:
!pip install datasets



In [6]:
import pandas as pd
from datasets import load_dataset
# Load your CSV into a DataFrame (adjust column names if needed)
# Convert DataFrame to a Dataset
dataset_training = load_dataset('csv', data_files='/content/training_dataset.csv')

Generating train split: 0 examples [00:00, ? examples/s]

In [7]:
# Convert DataFrame to a Dataset
dataset_testing = load_dataset('csv', data_files='/content/testing_dataset.csv')

Generating train split: 0 examples [00:00, ? examples/s]

In [8]:
dataset_testing

DatasetDict({
    train: Dataset({
        features: ['Label', 'Event ID', 'Service Name', 'Service File Name', 'Service Type', 'Service Start Type', 'Service Account', 'Data Service Name', 'Timestamp', 'ID'],
        num_rows: 9976
    })
})

In [9]:
# Print the data types of the features in the test set
print(dataset_testing["train"].features)


{'Label': Value(dtype='string', id=None), 'Event ID': Value(dtype='string', id=None), 'Service Name': Value(dtype='string', id=None), 'Service File Name': Value(dtype='string', id=None), 'Service Type': Value(dtype='string', id=None), 'Service Start Type': Value(dtype='string', id=None), 'Service Account': Value(dtype='string', id=None), 'Data Service Name': Value(dtype='string', id=None), 'Timestamp': Value(dtype='string', id=None), 'ID': Value(dtype='string', id=None)}


In [10]:
# Print the data types of the features in the test set
print(dataset_training["train"].features)

{'Label': Value(dtype='string', id=None), 'Event ID': Value(dtype='string', id=None), 'Service Name': Value(dtype='string', id=None), 'Service File Name': Value(dtype='string', id=None), 'Service Type': Value(dtype='string', id=None), 'Service Start Type': Value(dtype='string', id=None), 'Service Account': Value(dtype='string', id=None), 'Data Service Name': Value(dtype='string', id=None), 'Timestamp': Value(dtype='string', id=None), 'ID': Value(dtype='string', id=None)}


In [11]:
# Define the new feature value
instruction_text = "Classify the following service event as benign or malignant"

# Function to add the new feature
def add_instructions(examples):
    # Add the 'instructions' feature with the specified text
    examples['instruction'] = instruction_text
    return examples

# Apply the function to the dataset
dataset_testing = dataset_testing.map(add_instructions)

Map:   0%|          | 0/9976 [00:00<?, ? examples/s]

In [12]:
dataset_training = dataset_training.map(add_instructions)

Map:   0%|          | 0/9976 [00:00<?, ? examples/s]

In [13]:
dataset_training

DatasetDict({
    train: Dataset({
        features: ['Label', 'Event ID', 'Service Name', 'Service File Name', 'Service Type', 'Service Start Type', 'Service Account', 'Data Service Name', 'Timestamp', 'ID', 'instruction'],
        num_rows: 9976
    })
})

In [14]:
dataset_testing

DatasetDict({
    train: Dataset({
        features: ['Label', 'Event ID', 'Service Name', 'Service File Name', 'Service Type', 'Service Start Type', 'Service Account', 'Data Service Name', 'Timestamp', 'ID', 'instruction'],
        num_rows: 9976
    })
})

In [15]:
# Define the function to combine the columns into a new feature
def combine_features(example):
    # Combine all columns except 'Label' into a single string
    features_text = ", ".join([f"{col}: {example[col]}" for col in dataset_testing['train'].column_names if col != 'Label'])
    # Add this combined text as a new feature
    example['input'] = features_text
    return example

# Apply the function to the dataset
dataset_testing = dataset_testing.map(combine_features)

Map:   0%|          | 0/9976 [00:00<?, ? examples/s]

In [16]:
# Define the function to combine the columns into a new feature
def combine_features(example):
    # Combine all columns except 'Label' into a single string
    features_text = ", ".join([f"{col}: {example[col]}" for col in dataset_training['train'].column_names if col != 'Label'])
    # Add this combined text as a new feature
    example['input'] = features_text
    return example

# Apply the function to the dataset
dataset_training = dataset_training.map(combine_features)

Map:   0%|          | 0/9976 [00:00<?, ? examples/s]

In [17]:
# Check the updated dataset
dataset_testing

DatasetDict({
    train: Dataset({
        features: ['Label', 'Event ID', 'Service Name', 'Service File Name', 'Service Type', 'Service Start Type', 'Service Account', 'Data Service Name', 'Timestamp', 'ID', 'instruction', 'input'],
        num_rows: 9976
    })
})

In [18]:
dataset_training

DatasetDict({
    train: Dataset({
        features: ['Label', 'Event ID', 'Service Name', 'Service File Name', 'Service Type', 'Service Start Type', 'Service Account', 'Data Service Name', 'Timestamp', 'ID', 'instruction', 'input'],
        num_rows: 9976
    })
})

In [19]:
# Specify the columns to drop
columns_to_drop = [
    'Event ID', 'Service Name', 'Service File Name', 'Service Type',
    'Service Start Type', 'Service Account', 'Data Service Name',
    'Timestamp', 'ID'
]

In [20]:
# Drop the specified columns
dataset_training = dataset_training.remove_columns(columns_to_drop)

# Check the updated dataset
dataset_training

DatasetDict({
    train: Dataset({
        features: ['Label', 'instruction', 'input'],
        num_rows: 9976
    })
})

In [21]:
# Drop the specified columns
dataset_testing = dataset_testing.remove_columns(columns_to_drop)

# Check the updated dataset
dataset_testing

DatasetDict({
    train: Dataset({
        features: ['Label', 'instruction', 'input'],
        num_rows: 9976
    })
})

In [22]:
# Rename the 'Label' column to 'output'
dataset_training = dataset_training.rename_column('Label', 'output')
dataset_training

DatasetDict({
    train: Dataset({
        features: ['output', 'instruction', 'input'],
        num_rows: 9976
    })
})

In [23]:
# Rename the 'Label' column to 'output'
dataset_testing = dataset_testing.rename_column('Label', 'output')
dataset_testing

DatasetDict({
    train: Dataset({
        features: ['output', 'instruction', 'input'],
        num_rows: 9976
    })
})

In [24]:
from datasets import Dataset, DatasetDict

dataset = DatasetDict({
    'train': dataset_training['train'],
    'test': dataset_testing['train']
})

In [25]:
dataset

DatasetDict({
    train: Dataset({
        features: ['output', 'instruction', 'input'],
        num_rows: 9976
    })
    test: Dataset({
        features: ['output', 'instruction', 'input'],
        num_rows: 9976
    })
})

In [29]:
from huggingface_hub import HfApi, HfFolder

# Log in to Hugging Face
HfFolder.save_token("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")

# Upload the dataset to Hugging Face
dataset_id = "goldfishbrain/eventloganalyzer"  # Replace with your username and desired dataset name

dataset.push_to_hub(dataset_id)

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/10 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?it/s]

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/10 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?it/s]

CommitInfo(commit_url='https://huggingface.co/datasets/goldfishbrain/eventloganalyzer/commit/82b4249a57b9ed370ed139f1cffcc5dfceabe40b', commit_message='Upload dataset', commit_description='', oid='82b4249a57b9ed370ed139f1cffcc5dfceabe40b', pr_url=None, pr_revision=None, pr_num=None)

Run the code below twice if it does not work.  lol.....


In [31]:
from datasets import load_dataset
dataset = load_dataset("goldfishbrain/eventloganalyzer", split = "train")

Downloading data:   0%|          | 0.00/260k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/9976 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/9976 [00:00<?, ? examples/s]

In [33]:
dataset['input']

['Event ID: 7045, Service Name: WSearch, Service File Name: C:\\Windows\\System32\\SearchIndexer.exe, Service Type: Own Process, Service Start Type: Auto Start, Service Account: LocalSystem, Data Service Name: WindowsSearch, Timestamp: 2024-09-02T08:41:19.219Z, ID: 41f59d6e-82a5-4753-9b0e-e5a86a411eY, instruction: Classify the following service event as benign or malignant',
 'Event ID: 7045, Service Name: BackupService, Service File Name: C:\\Program Files\\Microsoft Backup\\backup.exe, Service Type: Own Process, Service Start Type: Auto Start, Service Account: NT AUTHORITY\\SYSTEM, Data Service Name: WindowsBackup, Timestamp: 2024-03-22T10:47:13.219Z, ID: 4a6b4e3f-1167-42a4-8a4f-7382f4e5bc23Y, instruction: Classify the following service event as benign or malignant',
 'Event ID: 7045, Service Name: NetmanService, Service File Name: C:\\Windows\\System32\\netman.dll, Service Type: Shared Process, Service Start Type: Demand Start, Service Account: NT AUTHORITY\\LocalService, Data Servi

In [34]:
dataset

Dataset({
    features: ['output', 'instruction', 'input'],
    num_rows: 9976
})

In [35]:
alpaca_prompt = """You are an expert in cybersecurity. You can idenitfy which Windows Event ID 7045 log entries are malignant (hamrful) to the system, and which are benign (generted by Windows or autheticated services). Below is an instruction that describes a task, paired with an input that provides further context to the type of features the service has. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
#pass

In [36]:
dataset = dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/9976 [00:00<?, ? examples/s]

#### Additional by me

In [37]:
# Split the dataset into training and testing sets
dataset_dict = dataset.train_test_split(test_size=0.005)

In [38]:
train_dataset = dataset_dict['train']
eval_dataset = dataset_dict['test']

In [39]:
train_dataset

Dataset({
    features: ['output', 'instruction', 'input', 'text'],
    num_rows: 9926
})

In [40]:
eval_dataset

Dataset({
    features: ['output', 'instruction', 'input', 'text'],
    num_rows: 50
})

#### Monitoring Fine-Tuning with W&B
Weights & Biases (W&B) is an essential tool for tracking your model's training process and system resource usage. It helps visualize metrics in real time, providing valuable insights into both model performance and GPU utilization.

We'll use W&B to monitor our training process, including evaluation metrics and resource usage:

In [41]:
!pip install wandb

Collecting wandb
  Downloading wandb-0.17.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting docker-pycreds>=0.4.0 (from wandb)
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting gitpython!=3.1.29,>=1.0.0 (from wandb)
  Downloading GitPython-3.1.43-py3-none-any.whl.metadata (13 kB)
Collecting sentry-sdk>=1.0.0 (from wandb)
  Downloading sentry_sdk-2.13.0-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting setproctitle (from wandb)
  Downloading setproctitle-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.9 kB)
Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.29,>=1.0.0->wandb)
  Downloading gitdb-4.0.11-py3-none-any.whl.metadata (1.2 kB)
Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython!=3.1.29,>=1.0.0->wandb)
  Downloading smmap-5.0.1-py3-none-any.whl.metadata (4.3 kB)
Downloading wandb-0.17.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_6

You can sign up for W&B and get your API key. This setup will allow you to track all the important metrics in real-time.
https://docs.wandb.ai/quickstart

In [52]:
!pip install wandb
import wandb
import random  # for demo script

# Log in to W&B - you'll be prompted to input your API key
wandb.login()



<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: [32m[41mERROR[0m API key must be 40 characters long, yours was 58


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [53]:
# Set W&B environment variables
%env WANDB_WATCH=all
%env WANDB_SILENT=true

env: WANDB_WATCH=all
env: WANDB_SILENT=true


#### Training TinyLLaMA with W&B Integration
Now that everything is set up, it’s time to train the TinyLLaMA model. We'll be using the SFTTrainer from the trl library, along with Weights & Biases (W&B) for real-time tracking of training metrics and resource usage. This step ensures you can monitor your training effectively and make necessary adjustments on the fly.

####*Initializing W&B and Setting Training Arguments*
First, we initialize W&B and set up the training arguments:

In [55]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from transformers.utils import logging
#import wandb

logging.set_verbosity_info()

# Initialize W&B
project_name = "tiny-llama"
entity = "wandb"
#wandb.init(project=project_name, name="unsloth-tiny-llama")

# Define training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,           # Small batch size due to limited GPU memory
    gradient_accumulation_steps=4,           # Accumulate gradients over 4 steps
    evaluation_strategy="steps",             # Evaluate after a certain number of steps
    warmup_ratio=0.1,                        # Warm-up learning rate over 10% of training
    num_train_epochs=1,                      # Number of epochs
    learning_rate=2e-4,                      # Learning rate for the optimizer
    fp16=not is_bfloat16_supported(),        # Use FP16 if BF16 is not supported
    bf16=is_bfloat16_supported(),            # Use BF16 if supported (more efficient on Ampere GPUs)
    max_steps=20,                            # Cap training at 20 steps for quick experimentation, increase or comment out as you see fit
    logging_steps=1,                         # Log metrics every step
    optim="adamw_8bit",                      # Use 8-bit AdamW optimizer to save memory
    weight_decay=0.1,                        # Regularization to avoid overfitting
    lr_scheduler_type="linear",              # Use linear learning rate decay
    seed=3407,                               # Random seed for reproducibility
    #report_to="wandb",                       # Enable logging to W&B
    output_dir="outputs",                    # Directory to save model outputs
)

using `logging_steps` to initialize `eval_steps` to 1
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [56]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,     # Training dataset
    eval_dataset=eval_dataset,       # Evaluation dataset
    dataset_text_field="text",               # The field containing text in the dataset
    max_seq_length=max_seq_length,           # Max sequence length for inputs
    dataset_num_proc=2,                      # Number of processes for dataset loading
    packing=True,                            # Packs short sequences together to save time
    args=training_args)                      # Training arguments defined earlier

PyTorch: setting up devices
max_steps is given, it will override any value given in num_train_epochs
Using auto half precision backend


Next, we set up the SFTTrainer:

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,454 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 20
 "-____-"     Number of trainable parameters = 65,536,000
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


Step,Training Loss,Validation Loss



***** Running Evaluation *****
  Num examples = 7
  Batch size = 8


OutOfMemoryError: CUDA out of memory. Tried to allocate 70.00 MiB. GPU 

SyntaxError: incomplete input (<ipython-input-21-c6da8748483b>, line 10)

<a name="Train"></a>
### Train the model
Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!

In [57]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
PyTorch: setting up devices
max_steps is given, it will override any value given in num_train_epochs
Using auto half precision backend


In [58]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
7.504 GB of memory reserved.


In [59]:
from peft import LoraConfig, get_peft_model
# Assuming `model` is your base model, e.g., LlamaForCausalLM
# Define the LoRA configuration
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "embed_tokens", "lm_head"],
    lora_dropout=0.1,
    bias="none"
)

# Apply LoRA to the base model
model = get_peft_model(model, lora_config)

In [60]:
# Now proceed with the training
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 9,976 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 60,806,144
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


[34m[1mwandb[0m: Currently logged in as: [33mqerberos[0m ([33mqerberos-code[0m). Use [1m`wandb login --relogin`[0m to force relogin


Step,Training Loss
1,2.4269
2,2.3278
3,2.3448
4,2.224
5,2.0079
6,2.0067
7,1.7664
8,1.5676
9,1.517
10,1.3728


Saving model checkpoint to outputs/checkpoint-60


Training completed. Do not forget to share your model on huggingface.co/models =)




In [61]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

1400.8534 seconds used for training.
23.35 minutes used for training.
Peak reserved memory = 8.549 GB.
Peak reserved memory for training = 1.045 GB.
Peak reserved memory % of max memory = 57.967 %.
Peak reserved memory for training % of max memory = 7.086 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

In [62]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Classify the following service event as benign or malignant.", # instruction
        "7045, NetmanService, C:\Windows\System32\netman.dll, Shared Process, Demand Start, NT AUTHORITY\LocalService, WindowsUpdate, 2024-09-02T08:45:12.219Z, 4c7a23e1-9b83-46a8-9d12-8a5f4e1fY", # input
        "", # output - leave this blank for generation! beign
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['You are an expert in cybersecurity. You can idenitfy which Windows Event ID 7045 log entries are malignant (hamrful) to the system, and which are benign (generted by Windows or autheticated services). Below is an instruction that describes a task, paired with an input that provides further context to the type of features the service has. Write a response that appropriately completes the request.\n\n### Instruction:\nClassify the following service event as benign or malignant.\n\n### Input:\n7045, NetmanService, C:\\Windows\\System32\netman.dll, Shared Process, Demand Start, NT AUTHORITY\\LocalService, WindowsUpdate, 2024-09-02T08:45:12.219Z, 4c7a23e1-9b83-46a8-9d12-8a5f4e1fY\n\n### Response:\nbenign<|endoftext|>']

 You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!

In [65]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Classify the following service event as benign or malignant.", # instruction
        "7045, NetmanService, C:\Windows\System32\netman.dll, Shared Process, Demand Start, NT AUTHORITY\LocalService, WindowsUpdate, 2024-09-02T08:45:12.219Z, 4c7a23e1-9b83-46a8-9d12-8a5f4e1fY", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

You are an expert in cybersecurity. You can idenitfy which Windows Event ID 7045 log entries are malignant (hamrful) to the system, and which are benign (generted by Windows or autheticated services). Below is an instruction that describes a task, paired with an input that provides further context to the type of features the service has. Write a response that appropriately completes the request.

### Instruction:
Classify the following service event as benign or malignant.

### Input:
7045, NetmanService, C:\Windows\System32
etman.dll, Shared Process, Demand Start, NT AUTHORITY\LocalService, WindowsUpdate, 2024-09-02T08:45:12.219Z, 4c7a23e1-9b83-46a8-9d12-8a5f4e1fY

### Response:
benign<|endoftext|>


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [69]:
# prompt: how to connect my google drive (ignore this part)

from google.colab import drive
drive.mount('/content/drive')


MessageError: Error: credential propagation was unsuccessful

In [None]:
model.save_pretrained("/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model") # Local saving. (ignore it)
tokenizer.save_pretrained("/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model")



('/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model/tokenizer_config.json',
 '/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model/special_tokens_map.json',
 '/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model/tokenizer.model',
 '/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model/added_tokens.json',
 '/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model/tokenizer.json')

In [70]:
model.push_to_hub("goldfishbrain/pi_model", token = "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") # Online hugging face saving
tokenizer.push_to_hub("goldfishbrain/pi_model", token = "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") # Online hugging face saving

Uploading the following files to goldfishbrain/pi_model: adapter_config.json,adapter_model.safetensors,README.md


  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/900M [00:00<?, ?B/s]

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

Uploading the following files to goldfishbrain/pi_model: added_tokens.json,tokenizer_config.json,README.md,tokenizer.json,tokenizer.model,special_tokens_map.json


  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

In [None]:
#model.save_pretrained("/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model") # Local saving
#tokenizer.save_pretrained("/content/drive/MyDrive/Colab Notebooks/hackathon/saved model/pi_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [72]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        #model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        model_name = "pi_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference
# a prompt template
alpaca_prompt = """You are an expert in cybersecurity. You can idenitfy which Windows Event ID 7045 log entries are malignant (hamrful) to the system, and which are benign (generted by Windows or autheticated services). Below is an instruction that describes a task, paired with an input that provides further context to the type of features the service has. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

# give the model a input
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Classify the following service event as benign or malignant", # instruction for the system
        "7045, SuspiciousStartup, %windir%\system32\cmd.exe /c powershell -enc JABzAGUAYwB... ( encoded command ), Own Process, Auto Star, LocalSystem, tG4, 2024-08-20T08:41:12.219Z, 42f51f8c-1234-4422-9a87-1a2b3c4d5ex", # input
        "", # output - leave this blank for generation! malcicious, i mistakenly mentioned in template malignant
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)







You are an expert in cybersecurity. You can idenitfy which Windows Event ID 7045 log entries are malignant (hamrful) to the system, and which are benign (generted by Windows or autheticated services). Below is an instruction that describes a task, paired with an input that provides further context to the type of features the service has. Write a response that appropriately completes the request.

### Instruction:
Classify the following service event as benign or malignant

### Input:
7045, SuspiciousStartup, %windir%\system32\cmd.exe /c powershell -enc JABzAGUAYwB... ( encoded command ), Own Process, Auto Star, LocalSystem, tG4, 2024-08-20T08:41:12.219Z, 42f51f8c-1234-4422-9a87-1a2b3c4d5ex

### Response:
malicious<|endoftext|>


You can also use Hugging Face's `AutoModelForPeftCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [73]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for VLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [75]:
# Merge to 16bit
if False: model.save_pretrained_merged("pi2_model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("goldfishbrain/pi2_model", tokenizer, save_method = "merged_16bit", token = "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")

# Merge to 4bit
if False: model.save_pretrained_merged("pi2_model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("goldfishbrain/pi2_model", tokenizer, save_method = "merged_4bit", token = "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")

# Just LoRA adapters
if False: model.save_pretrained_merged("pi2_model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("goldfishbrain/mopi2_modeldel", tokenizer, save_method = "lora", token = "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")

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [76]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("guff_pi", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI based system like `GPT4All`. You can install GPT4All by going [here](https://gpt4all.io/index.html).

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/u54VK8m8tk) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)
10. [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
11. [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
12. [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! [Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a></a> Support our work if you can! Thanks!
</div>