### Installation

In [4]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"
    !pip install "unsloth[base] @ git+https://github.com/unslothai/unsloth"
!pip install transformers==4.55.4

In [2]:
from google.colab import drive
drive.mount('/content/drive')
BASE_PATH=f'/content/drive/MyDrive/LLM_TKIG/'

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Unsloth

`FastModel` supports loading nearly any model now! This includes Vision and Text models!

In [6]:
from unsloth import FastModel,FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = f"{BASE_PATH}/models/qwen3_1.7b_finetuned",
        max_seq_length = 2048,
        load_in_4bit = False,
    )

==((====))==  Unsloth 2025.8.9: Fast Qwen3 patching. Transformers: 4.55.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.8.9 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


We now add LoRA adapters so we only need to update a small amount of parameters!

In [7]:
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth: Already have LoRA adapters! We shall skip this step.


### Data Prep


In [8]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "qwen3-instruct",
)

We now use `standardize_data_formats` to try converting datasets to the correct format for finetuning purposes!

## Load TTP Classification Dataset

We now load the augmented TTP dataset and format it for fine-tuning Qwen model on TTP detection task.


In [9]:
import json
from datasets import Dataset

# Load TTP classification dataset
with open(f"{BASE_PATH}/data/TTP-classification/augmented_ttp_dataset_20250824_070057.json", "r", encoding="utf-8") as f:
    ttp_data = json.load(f)

print(f"Loaded {len(ttp_data['dataset'])} TTP classification examples")
print(f"First example instruction: {ttp_data['dataset'][0]['instruction'][:200]}...")
print(f"First example output: {ttp_data['dataset'][0]['output']}")


Loaded 1376 TTP classification examples
First example instruction: Adversaries may inject malicious code into process via Extra Window Memory (EWM) in order to evade process-based defenses as well as possibly elevate privileges. EWM injection is a method of executing...
First example output: {'techniques': [{'id': 'T1055.011', 'name': 'Extra Window Memory Injection', 'description': "Adversaries may inject malicious code into process via Extra Window Memory (EWM) in order to evade process-based defenses as well as possibly elevate privileges. EWM injection is a method of executing arbitrary code in the address space of a separate live process. \n\nBefore creating a window, graphical Windows-based processes must prescribe to or register a windows class, which stipulate appearance and behavior (via windows procedures, which are functions that handle input/output of data).(Citation: Microsoft Window Classes) Registration of new windows classes can include a request for up to 40 bytes of EW

In [10]:
dataset_records = []
system_prompt = """You are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response containing only the TTP ID (e.g., "T1055.011")."""

for item in ttp_data["dataset"]:
    if item.get("instruction") and item.get("output"):
        instruction = item["instruction"],
        ttp_id =  item["output"]['techniques'][0]['id']
        user_prompt = f'Analyze the following description and identify the corresponding TTP ID: \n{instruction}'
        conversation = [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
                {"role": "assistant", "content": ttp_id}
            ]
        dataset_records.append({"conversations": conversation})

dataset = Dataset.from_list(dataset_records)
print(f"Created dataset with {len(dataset)} examples")

# Take a subset for faster training during development
# dataset = dataset.select(range(100))  # Uncomment for quick testing
print(f"Using {len(dataset)} examples for training")


Created dataset with 1376 examples
Using 1376 examples for training


In [11]:
dataset[0]

{'conversations': [{'content': 'You are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response containing only the TTP ID (e.g., "T1055.011").',
   'role': 'system'},
  {'content': 'Analyze the following description and identify the corresponding TTP ID: \n("Adversaries may inject malicious code into process via Extra Window Memory (EWM) in order to evade process-based defenses as well as possibly elevate privileges. EWM injection is a method of executing arbitrary code in the address space of a separate live process. \\n\\nBefore creating a window, graphical Windows-based processes must prescribe to or register a windows class, which stipulate appearance and behavior (via windows procedures, which are functions that handle input/output of data).(Citation: Microsoft Window Cla

In [12]:
def formatting_prompts_func(examples):
   """
   Create prompt for entity and relationship extraction focusing on core cybersecurity entity types.
   """
   convos = examples["conversations"]
   texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
   return {"text": texts}



dataset = dataset.map(formatting_prompts_func, batched = True)
print(f"Dataset formatted for training with {len(dataset)} examples")

Map:   0%|          | 0/1376 [00:00<?, ? examples/s]

Dataset formatted for training with 1376 examples


In [14]:
# Apply formatting function to dataset
dataset = dataset.map(formatting_prompts_func, batched=True)
print(f"Dataset formatted for training with {len(dataset)} examples")

# Check formatted example
print("\\nFirst formatted example:")
print(dataset[0]["text"][:500] + "..." if len(dataset[0]["text"]) > 500 else dataset[0]["text"])


Map:   0%|          | 0/1376 [00:00<?, ? examples/s]

Dataset formatted for training with 1376 examples
\nFirst formatted example:
<|im_start|>system
You are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response containing only the TTP ID (e.g., "T1055.011").<|im_end|>
<|im_start|>user
Analyze the following description and identify the corresponding TTP ID: 
("Adversaries may inject malicious code into process via...


In [15]:
# Test the formatting function with a small sample
test_sample = dataset.select(range(2))
print("Testing formatting function with 2 samples...")

for i, example in enumerate(test_sample):
    print(f"\\n=== Example {i+1} ===")
    print("Formatted text length:", len(example["text"]))
    print("\\nFirst 300 characters:")
    print(example["text"][:300] + "...")

    # Check if it contains expected elements
    if "system" in example["text"] and "user" in example["text"] and "assistant" in example["text"]:
        print("✓ Contains system, user, and assistant messages")
    else:
        print("✗ Missing expected message types")

    if "TTP" in example["text"] and "technique" in example["text"]:
        print("✓ Contains TTP-related content")
    else:
        print("✗ Missing TTP-related content")

print(f"\\n✅ TTP dataset successfully formatted with {len(dataset)} examples ready for training!")


Testing formatting function with 2 samples...
\n=== Example 1 ===
Formatted text length: 2848
\nFirst 300 characters:
<|im_start|>system
You are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response conta...
✓ Contains system, user, and assistant messages
✗ Missing TTP-related content
\n=== Example 2 ===
Formatted text length: 2690
\nFirst 300 characters:
<|im_start|>system
You are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response conta...
✓ Contains system, user, and assistant messages
✓ Contains TTP-related content
\n✅ TTP dataset successfully formatted with 1

Let's see how row 100 looks like!

In [16]:
dataset[100]

{'conversations': [{'content': 'You are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response containing only the TTP ID (e.g., "T1055.011").',
   'role': 'system'},
  {'content': "Analyze the following description and identify the corresponding TTP ID: \n('Adversaries may attempt to access the Cloud Instance Metadata API to collect credentials and other sensitive data.\\n\\nMost cloud service providers support a Cloud Instance Metadata API which is a service provided to running virtual instances that allows applications to access information about the running virtual instance. Available information generally includes name, security group, and additional metadata including sensitive data such as credentials and UserData scripts that may contain additional secrets. The Instanc

We now have to apply the chat template for `Qwen-3` onto the conversations, and save it to `text`.

Let's see how the chat template did!


In [17]:
dataset[100]['text']

'<|im_start|>system\nYou are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response containing only the TTP ID (e.g., "T1055.011").<|im_end|>\n<|im_start|>user\nAnalyze the following description and identify the corresponding TTP ID: \n(\'Adversaries may attempt to access the Cloud Instance Metadata API to collect credentials and other sensitive data.\\n\\nMost cloud service providers support a Cloud Instance Metadata API which is a service provided to running virtual instances that allows applications to access information about the running virtual instance. Available information generally includes name, security group, and additional metadata including sensitive data such as credentials and UserData scripts that may contain additional secrets. The Instance Metadata API is pr

<a name="Train"></a>
### Train the model
Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`.

In [26]:
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 500,
        learning_rate = 2e-4,
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir="outputs",
        save_steps=50,
        save_total_limit=2,
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/1376 [00:00<?, ? examples/s]

We also use Unsloth's `train_on_completions` method to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes!

In [19]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|im_start|>user\n",
    response_part = "<|im_start|>assistant\n",
)

Map (num_proc=2):   0%|          | 0/1376 [00:00<?, ? examples/s]

Let's verify masking the instruction part is done! Let's print the 100th row again.

In [20]:
tokenizer.decode(trainer.train_dataset[100]["input_ids"])

'<|im_start|>system\nYou are a cybersecurity analyst trained to identify Tactics, Techniques, and Procedures (TTPs) in threat intelligence. Your task is to analyze the description of adversary activity and accurately determine the TTP ID according to the MITRE ATT&CK framework. Return a response containing only the TTP ID (e.g., "T1055.011").<|im_end|>\n<|im_start|>user\nAnalyze the following description and identify the corresponding TTP ID: \n(\'Adversaries may attempt to access the Cloud Instance Metadata API to collect credentials and other sensitive data.\\n\\nMost cloud service providers support a Cloud Instance Metadata API which is a service provided to running virtual instances that allows applications to access information about the running virtual instance. Available information generally includes name, security group, and additional metadata including sensitive data such as credentials and UserData scripts that may contain additional secrets. The Instance Metadata API is pr

Now let's print the masked out example - you should see only the answer is present:

In [21]:
tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100]["labels"]]).replace(tokenizer.pad_token, " ")

'                                                                                                                                                                                                                                                                                                                                                             T1522<|im_end|>\n'

In [22]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
3.393 GB of memory reserved.


Let's train the model! To resume a training run, set `trainer.train(resume_from_checkpoint = True)`

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,376 | Num Epochs = 3 | Total steps = 500
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 17,432,576 of 1,738,007,552 (1.00% trained)


Step,Training Loss
10,2.6423
20,1.7666
30,1.6286
40,1.5479
50,1.6211
60,1.4952
70,1.5061
80,1.5223
90,1.4554
100,1.4564


In [24]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

312.5172 seconds used for training.
5.21 minutes used for training.
Peak reserved memory = 5.562 GB.
Peak reserved memory for training = 2.169 GB.
Peak reserved memory % of max memory = 37.731 %.
Peak reserved memory for training % of max memory = 14.714 %.


<a name="Inference"></a>
### Inference
Let's run the model via Unsloth native inference! According to the `Qwen-3` team, the recommended settings for instruct inference are `temperature = 0.7, top_p = 0.8, top_k = 20`

For reasoning chat based inference, `temperature = 0.6, top_p = 0.95, top_k = 20`

In [25]:
messages = [
    {"role" : "user", "content" : "Adversaries may inject malicious code into process via Extra Window Memory (EWM) in order to evade process-based defenses as well as possibly elevate privileges. EWM injection is a method of executing arbitrary code in the address space of a separate live process. \n\nBefore creating a window, graphical Windows-based processes must prescribe to or register a windows class, which stipulate appearance and behavior (via windows procedures, which are functions that handle input/output of data).(Citation: Microsoft Window Classes) Registration of new windows classes can include a request for up to 40 bytes of EWM to be appended to the allocated memory of each instance of that class. This EWM is intended to store data specific to that window and has specific application programming interface (API) functions to set and get its value. (Citation: Microsoft GetWindowLong function) (Citation: Microsoft SetWindowLong function)\n\nAlthough small, the EWM is large enough to store a 32-bit pointer and is often used to point to a windows procedure. Malware may possibly utilize this memory location in part of an attack chain that includes writing code to shared sections of the process’s memory, placing a pointer to the code in EWM, then invoking execution by returning execution control to the address in the process’s EWM.\n\nExecution granted through EWM injection may allow access to both the target process's memory and possibly elevated privileges. Writing payloads to shared sections also avoids the use of highly monitored API calls such as <code>WriteProcessMemory</code> and <code>CreateRemoteThread</code>.(Citation: Elastic Process Injection July 2017) More sophisticated malware samples may also potentially bypass protection mechanisms such as data execution prevention (DEP) by triggering a combination of windows procedures and other system functions that will rewrite the malicious payload inside an executable portion of the target process.  (Citation: MalwareTech Power Loader Aug 2013) (Citation: WeLiveSecurity Gapz and Redyms Mar 2013)\n\nRunning code in the context of another process may allow access to the process's memory, system/network resources, and possibly elevated privileges. Execution via EWM injection may also evade detection from security products since the execution is masked under a legitimate process."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
)

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 1000, # Increase for longer outputs!
    temperature = 0.7, top_p = 0.8, top_k = 20, # For non thinking
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

T1166.005

This technique involves using Extra Window Memory (EWM) to execute arbitrary code in the address space of a separate live process. EWM is a memory region that is allocated when a window class is registered and is used to store data specific to that window. Malicious actors may exploit EWM to evade process-based defenses and elevate privileges.

The EWM is typically accessed using API functions such as GetWindowLong and SetWindowLong, which allow malware to read and write data to the memory location associated with a window. By placing a pointer to malicious code in EWM, malware can then invoke execution by returning control to the address in the process's EWM.

This technique is often used in attack chains where malware communicates with a C2 server or uses a payload to execute a piece of code. EWM injection can also be used to bypass protections such as data execution prevention (DEP) and execute code in memory.

Mitigation strategies include monitoring for unusual memory a

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/chat_template.jinja',
 'lora_model/vocab.json',
 'lora_model/merges.txt',
 'lora_model/added_tokens.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 2048,
        load_in_4bit = True,
    )