### **Fine-Tuning Dataset**

In [None]:
import pandas as pd
import numpy as np
import json

In [None]:
df = pd.read_csv("/content/Attack_Dataset.csv")

In [None]:
df.isnull().sum()

ID,0
Title,0
Category,0
Attack Type,0
Scenario Description,0
Tools Used,14
Attack Steps,0
Target Type,4
Vulnerability,18
MITRE Technique,24


In [None]:
df = df.drop(columns=["Unnamed: 15"])  # columns= already implies axis=1

In [None]:
df = df.dropna()
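`dropna()` with no arguments removes a row when *any* column is missing. That includes fields such as `Source` (160 missing values in the full null count) and `Tools Used`, none of which are written to the fine-tuning file; only `Title`, `Category`, `Attack Type` and `Solution` are. A minimal sketch, assuming you only want to require those four fields, restricts the check with `subset=`. The toy frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Columns that actually end up in the fine-tuning records built later in this notebook
REQUIRED = ["Title", "Category", "Attack Type", "Solution"]

# Hypothetical toy frame standing in for Attack_Dataset.csv
toy = pd.DataFrame({
    "Title": ["SQLi login bypass", "Clickjacking", "WMI persistence"],
    "Category": ["Web Application Security", "Web Application Security", "DFIR"],
    "Attack Type": ["SQL Injection", "Clickjacking", "Persistence"],
    "Solution": ["Use prepared statements", np.nan, "Monitor WMI namespaces"],
    "Source": [np.nan, "OWASP", "Real-world IR cases"],
})

all_columns = toy.dropna()                   # drops row 0 (no Source) and row 1 (no Solution)
required_only = toy.dropna(subset=REQUIRED)  # drops only row 1 (no Solution)

print(len(all_columns), len(required_only))  # 1 2
```

On the real data this keeps rows whose only gap is in a column the model never sees.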

In [None]:
df.isnull().sum()

ID,0
Title,0
Category,0
Attack Type,0
Scenario Description,0
Tools Used,0
Attack Steps,0
Target Type,0
Vulnerability,0
MITRE Technique,0


In [None]:
print(df.duplicated().sum())

0


In [None]:
df.head()

,ID,Title,Category,Attack Type,Scenario Description,Tools Used,Attack Steps,Target Type,Vulnerability,MITRE Technique,Impact,Detection Method,Solution,Tags,Source
0,1,Authentication Bypass via SQL Injection,Mobile Security,SQL Injection (SQLi),A login form fails to validate or sanitize inp...,"Browser, Burp Suite, SQLMap",1. Reconnaissance: Find a login form on the we...,"Web Login Portals (e.g., banking, admin dashbo...",Unsanitized input fields in SQL queries,"T1078 (Valid Accounts), T1190 (Exploit Public-...","Full account takeover, data theft, privilege e...","Web server logs, anomaly detection (e.g., logi...","Use prepared statements, Sanitize inputs, Limi...","SQLi, Authentication Bypass, Web Security, OWA...","OWASP, MITRE ATT&CK, DVWA"
1,2,Union-Based SQL Injection,AI Agents & LLM Exploits,SQL Injection,This attack occurs when a hacker uses the SQL ...,"SQLMap, Burp Suite, Havij, Browser Developer T...",1. Identify User Input Points: Attacker finds ...,"Web Applications, Login Pages, Search Forms",Improperly filtered input fields that allow SQ...,T1190 – Exploit Public-Facing Application,"Data leakage, Credential theft, Account takeov...",Web Application Firewalls (WAF)Log AnalysisInp...,Use parameterized queries (Prepared Statements...,#SQLInjection #WebSecurity #UnionAttack #OWASP...,"OWASP, MITRE ATT&CK, Acunetix, PortSwigger Web..."
2,3,Error-Based SQL Injection,AI Agents & LLM Exploits,SQL Injection,This attack occurs when an attacker intentiona...,"SQLMap, Burp Suite, Manual Browser Testing, Havij",1. Identify Input Points:Attacker finds a fiel...,"Web Applications, Login Forms, URL Parameters,...",Error message exposure due to lack of input va...,T1190 – Exploit Public-Facing Application,"Information disclosure, Database structure exp...",Review and monitor error logsEnable generic er...,Turn off detailed error messages in production...,#SQLInjection #ErrorLeakage #WebAppSecurity #O...,"OWASP, MITRE ATT&CK, Acunetix, PortSwigger Web..."
3,4,Blind SQL Injection,AI Agents & LLM Exploits,SQL Injection,"In Blind SQL Injection, the attacker doesn’t s...","SQLMap, Burp Suite, sqlninja, Manual Browser T...",1. Find a User Input Point:Attacker finds a pl...,"Web Applications, Login Pages, Search Fields, ...","No error messages, but user input is still pas...",T1190 – Exploit Public-Facing Application,Slow and stealthy data theftFull database comp...,Monitor for slow and repetitive requestsAnalyz...,Use parameterized queries (prepared statements...,#BlindSQLi #TimeBasedSQLi #WebAppSecurity #OWA...,"OWASP, MITRE ATT&CK, Acunetix, PortSwigger, SQ..."
4,5,Second-Order SQL Injection,AI Agents & LLM Exploits,SQL Injection,"In a Second-Order SQL Injection, the attacker ...","Burp Suite, SQLMap, Postman, Browser Dev Tools...",1. Identify Stored Input Fields:The attacker l...,"Web Applications, User Registration Forms, Pro...",Trusting previously stored unvalidated data in...,T1505.003 – SQL Injection,Delayed data theftUnexpected system behaviorSe...,Log monitoring for delayed query failuresTrack...,Sanitize and validate inputs both at entry and...,#SecondOrderSQLi #DelayedInjection #StoredInje...,"OWASP, MITRE ATT&CK, PortSwigger Academy, Acun..."


In [None]:
text = "Category : Mobile Security\nAttack Type : SQL Injection\nSolution : Use prepared statements"
print(text)

Category : Mobile Security
Attack Type : SQL Injection
Solution : Use prepared statements


In [None]:
system_prompt = """You are a SOC analyst.
Read the title of the Attack provided. Based on the Attack Type provide the description of scenario,
The associated MITRE Technique. Also provide for each title associated Category of attack, type of attack.
Once done provide the best solution or action to take"""

In [None]:
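# Write one chat-format record per row: the system prompt, the attack title as the user turn,
# and the Category / Attack Type / Solution template as the assistant turn.
# The system prompt also mentions the scenario description and MITRE technique,
# but only these three fields are used as the training target.
with open("Fine_Tune_Attack_Dataset.jsonl", "w", encoding="utf-8") as file:
    for _, row in df.iterrows():
        assistant_reply = (
            f"Category : {row['Category']}\n"
            f"Attack Type : {row['Attack Type']}\n"
            f"Solution : {row['Solution']}"
        )
        data = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": str(row["Title"])},
                {"role": "assistant", "content": assistant_reply},
            ]
        }
        file.write(json.dumps(data) + "\n")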
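OpenAI validates uploaded files itself (the fine-tuning job starts in a `validating_files` status), but a local check catches malformed records before upload. The sketch below is stricter than the API: it assumes this notebook's exact system, user, assistant layout and rejects empty contents. `validate_chat_jsonl` is a hypothetical helper, and the demo record is invented.

```python
import json
import os
import tempfile

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_chat_jsonl(path):
    """Check that every line is a chat record with system/user/assistant messages.

    Returns the number of valid records; raises ValueError on the first bad line.
    """
    count = 0
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            messages = record.get("messages")
            if not isinstance(messages, list):
                raise ValueError(f"line {line_no}: missing 'messages' list")
            roles = [m.get("role") for m in messages]
            if roles != EXPECTED_ROLES:
                raise ValueError(f"line {line_no}: unexpected roles {roles}")
            for m in messages:
                content = m.get("content")
                if not isinstance(content, str) or not content.strip():
                    raise ValueError(f"line {line_no}: empty content for role {m.get('role')}")
            count += 1
    return count

# Demo on a throwaway file holding one made-up record
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
with open(demo_path, "w", encoding="utf-8") as fh:
    fh.write(json.dumps({"messages": [
        {"role": "system", "content": "You are a SOC analyst."},
        {"role": "user", "content": "Blind SQL Injection"},
        {"role": "assistant", "content": "Category : Web\nAttack Type : SQL Injection\nSolution : Use prepared statements"},
    ]}) + "\n")

print(validate_chat_jsonl(demo_path))  # 1
```

On the real file, `validate_chat_jsonl("Fine_Tune_Attack_Dataset.jsonl")` should return the number of rows written.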

In [None]:
df = pd.read_json("/content/Fine_Tune_Attack_Dataset.jsonl",lines = True)

In [None]:
df.head()

,messages
0,"[{'role': 'system', 'content': 'You are a SOC ..."
1,"[{'role': 'system', 'content': 'You are a SOC ..."
2,"[{'role': 'system', 'content': 'You are a SOC ..."
3,"[{'role': 'system', 'content': 'You are a SOC ..."
4,"[{'role': 'system', 'content': 'You are a SOC ..."


In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the records for validation; random_state makes the shuffle reproducible
df_train, df_val = train_test_split(df, test_size=0.2, random_state=42)
print(df_train.shape)
print(df_val.shape)

(11141, 1)
(2786, 1)
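`df.duplicated()` only flags rows that are identical in every column, and `train_test_split` shuffles rows, so the same attack title can still end up in both the training and validation files. A minimal check, assuming the `messages` layout produced above, collects the user-turn titles on each side and intersects them. `user_titles` is a hypothetical helper and the toy titles are invented.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def user_titles(frame):
    """Return the set of user-message contents (the attack titles) in a messages frame."""
    return {
        msg["content"]
        for messages in frame["messages"]
        for msg in messages
        if msg["role"] == "user"
    }

# Hypothetical toy data in the same shape as the JSONL-backed DataFrame above
titles = ["Blind SQL Injection", "Blind SQL Injection", "Clickjacking", "WMI Persistence", "CAN Spoofing"]
toy = pd.DataFrame([
    {"messages": [
        {"role": "system", "content": "You are a SOC analyst."},
        {"role": "user", "content": t},
        {"role": "assistant", "content": "Category : Demo\nAttack Type : Demo\nSolution : Demo"},
    ]}
    for t in titles
])

toy_train, toy_val = train_test_split(toy, test_size=0.4, random_state=42)
overlap = user_titles(toy_train) & user_titles(toy_val)
print(f"{len(overlap)} title(s) appear in both splits: {sorted(overlap)}")
```

If the overlap on the real split is large, `sklearn.model_selection.GroupShuffleSplit` with the titles passed as `groups` keeps every copy of a title on one side.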


In [None]:
df_train.to_json("Fine_Tune_GPT_Train.jsonl",orient = "records",
                 lines = True)
df_val.to_json("Fine_Tune_GPT_Test.jsonl",orient = "records",
                 lines = True)

### **Fine-Tuning Job**

In [None]:
from openai import OpenAI
from google.colab import userdata

# Load the API key securely from Colab secrets
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [None]:
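client = OpenAI(api_key=OPENAI_API_KEY)
# Upload the training split; the context manager closes the file handle afterwards
with open("/content/Fine_Tune_GPT_Train.jsonl", "rb") as f:
    train_file = client.files.create(file=f, purpose="fine-tune")
train_file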

FileObject(id='file-T1Kg6bFKaWeF4L6257NmoQ', bytes=6631837, created_at=1763140419, filename='Fine_Tune_GPT_Train.jsonl', object='file', purpose='fine-tune', status='processed', expires_at=None, status_details=None)

In [None]:
# Upload the validation split the same way
with open("/content/Fine_Tune_GPT_Test.jsonl", "rb") as f:
    val_file = client.files.create(file=f, purpose="fine-tune")
val_file

FileObject(id='file-LkTQi8cGGs2HbAVE74sfmv', bytes=1659342, created_at=1763140482, filename='Fine_Tune_GPT_Test.jsonl', object='file', purpose='fine-tune', status='processed', expires_at=None, status_details=None)

In [None]:
client.fine_tuning.jobs.create(
    model = "gpt-4o-mini-2024-07-18",
    training_file = "file-T1Kg6bFKaWeF4L6257NmoQ",
    validation_file = "file-LkTQi8cGGs2HbAVE74sfmv",
    hyperparameters = {"n_epochs" : 3}
)

FineTuningJob(id='ftjob-5rnXVbxJiziAh4YP0kpS9yk3', created_at=1763140613, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(batch_size='auto', learning_rate_multiplier='auto', n_epochs=3), model='gpt-4o-mini-2024-07-18', object='fine_tuning.job', organization_id='org-1ASoeatyKpzdP7XUIEeN3alQ', result_files=[], seed=169862197, status='validating_files', trained_tokens=None, training_file='file-T1Kg6bFKaWeF4L6257NmoQ', validation_file='file-LkTQi8cGGs2HbAVE74sfmv', estimated_finish=None, integrations=[], metadata=None, method=Method(type='supervised', dpo=None, reinforcement=None, supervised=SupervisedMethod(hyperparameters=SupervisedHyperparameters(batch_size='auto', learning_rate_multiplier='auto', n_epochs=3))), user_provided_suffix=None, usage_metrics=None, shared_with_openai=False, eval_id=None)
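The cells above copy file IDs into `jobs.create` by hand. A sketch that chains the returned IDs and then polls until the job reaches a terminal status is below. It relies on `client.files.create`, `client.fine_tuning.jobs.create` and `client.fine_tuning.jobs.retrieve` from the OpenAI Python SDK; `start_job`, `wait_for_job`, the poll interval and the timeout are illustrative choices, not part of the original notebook.

```python
import time

# Job statuses after which the job will not change any more
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def start_job(client, train_path, val_path, model="gpt-4o-mini-2024-07-18", n_epochs=3):
    """Upload both JSONL files and start a fine-tuning job, reusing the returned file IDs."""
    with open(train_path, "rb") as fh:
        train_file = client.files.create(file=fh, purpose="fine-tune")
    with open(val_path, "rb") as fh:
        val_file = client.files.create(file=fh, purpose="fine-tune")
    return client.fine_tuning.jobs.create(
        model=model,
        training_file=train_file.id,
        validation_file=val_file.id,
        hyperparameters={"n_epochs": n_epochs},
    )

def wait_for_job(client, job_id, poll_seconds=60, timeout_seconds=4 * 3600):
    """Poll the job until it reaches a terminal status, then return the final job object."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still '{job.status}' after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

Typical use would be `job = start_job(client, "/content/Fine_Tune_GPT_Train.jsonl", "/content/Fine_Tune_GPT_Test.jsonl")` followed by `job = wait_for_job(client, job.id)`; when `job.status == "succeeded"`, `job.fine_tuned_model` holds the `ft:` model name used in the next section.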

### **Using the Fine-Tuned Model**

In [None]:
from openai import OpenAI

In [None]:
system_prompt = """You are a SOC analyst.
Read the title of the Attack provided. Based on the Attack Type provide the description of scenario,
The associated MITRE Technique. Also provide for each title associated Category of attack, type of attack.
Once done provide the best solution or action to take"""

In [None]:
client = OpenAI(api_key = OPENAI_API_KEY)
response = client.responses.create(
    model = "ft:gpt-4o-mini-2024-07-18:shoeb-sutar::CbsZknC6",
    input = [
        {"role" : "system","content" : system_prompt},
        {"role" : "user","content" : "Clickjacking with Popups (Window Redressing)"}
    ]
)
print(response.output_text)

Category : Web Application Security
Attack Type : Clickjacking / UI Redressing
Solution : Use X-Frame-Options; block popups in modern UI workflow


### **Evaluating the Fine-Tuned Model**

In [None]:
import pandas as pd
import numpy as np
from openai import OpenAI

In [None]:
df = pd.read_csv("/content/Attack_Dataset.csv")

In [None]:
print(df.isnull().sum())

ID                          0
Title                       0
Category                    0
Attack Type                 0
Scenario Description        0
Tools Used                 14
Attack Steps                0
Target Type                 4
Vulnerability              18
MITRE Technique            24
Impact                      3
Detection Method            4
Solution                    3
Tags                        3
Source                    160
Unnamed: 15             14087
dtype: int64


In [None]:
df = df.drop(columns=["Unnamed: 15"])  # columns= already implies axis=1

In [None]:
df = df.dropna()

In [None]:
print(df.duplicated().sum())

0


In [None]:
df.head()

,ID,Title,Category,Attack Type,Scenario Description,Tools Used,Attack Steps,Target Type,Vulnerability,MITRE Technique,Impact,Detection Method,Solution,Tags,Source
0,1,Authentication Bypass via SQL Injection,Mobile Security,SQL Injection (SQLi),A login form fails to validate or sanitize inp...,"Browser, Burp Suite, SQLMap",1. Reconnaissance: Find a login form on the we...,"Web Login Portals (e.g., banking, admin dashbo...",Unsanitized input fields in SQL queries,"T1078 (Valid Accounts), T1190 (Exploit Public-...","Full account takeover, data theft, privilege e...","Web server logs, anomaly detection (e.g., logi...","Use prepared statements, Sanitize inputs, Limi...","SQLi, Authentication Bypass, Web Security, OWA...","OWASP, MITRE ATT&CK, DVWA"
1,2,Union-Based SQL Injection,AI Agents & LLM Exploits,SQL Injection,This attack occurs when a hacker uses the SQL ...,"SQLMap, Burp Suite, Havij, Browser Developer T...",1. Identify User Input Points: Attacker finds ...,"Web Applications, Login Pages, Search Forms",Improperly filtered input fields that allow SQ...,T1190 – Exploit Public-Facing Application,"Data leakage, Credential theft, Account takeov...",Web Application Firewalls (WAF)Log AnalysisInp...,Use parameterized queries (Prepared Statements...,#SQLInjection #WebSecurity #UnionAttack #OWASP...,"OWASP, MITRE ATT&CK, Acunetix, PortSwigger Web..."
2,3,Error-Based SQL Injection,AI Agents & LLM Exploits,SQL Injection,This attack occurs when an attacker intentiona...,"SQLMap, Burp Suite, Manual Browser Testing, Havij",1. Identify Input Points:Attacker finds a fiel...,"Web Applications, Login Forms, URL Parameters,...",Error message exposure due to lack of input va...,T1190 – Exploit Public-Facing Application,"Information disclosure, Database structure exp...",Review and monitor error logsEnable generic er...,Turn off detailed error messages in production...,#SQLInjection #ErrorLeakage #WebAppSecurity #O...,"OWASP, MITRE ATT&CK, Acunetix, PortSwigger Web..."
3,4,Blind SQL Injection,AI Agents & LLM Exploits,SQL Injection,"In Blind SQL Injection, the attacker doesn’t s...","SQLMap, Burp Suite, sqlninja, Manual Browser T...",1. Find a User Input Point:Attacker finds a pl...,"Web Applications, Login Pages, Search Fields, ...","No error messages, but user input is still pas...",T1190 – Exploit Public-Facing Application,Slow and stealthy data theftFull database comp...,Monitor for slow and repetitive requestsAnalyz...,Use parameterized queries (prepared statements...,#BlindSQLi #TimeBasedSQLi #WebAppSecurity #OWA...,"OWASP, MITRE ATT&CK, Acunetix, PortSwigger, SQ..."
4,5,Second-Order SQL Injection,AI Agents & LLM Exploits,SQL Injection,"In a Second-Order SQL Injection, the attacker ...","Burp Suite, SQLMap, Postman, Browser Dev Tools...",1. Identify Stored Input Fields:The attacker l...,"Web Applications, User Registration Forms, Pro...",Trusting previously stored unvalidated data in...,T1505.003 – SQL Injection,Delayed data theftUnexpected system behaviorSe...,Log monitoring for delayed query failuresTrack...,Sanitize and validate inputs both at entry and...,#SecondOrderSQLi #DelayedInjection #StoredInje...,"OWASP, MITRE ATT&CK, PortSwigger Academy, Acun..."


In [None]:
df_eval = df.sample(n = 100,random_state = 42)
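`df.sample` draws from the full cleaned dataset, so with the 80/20 split used earlier roughly four in five sampled titles were also training examples, which likely makes BLEU, ROUGE and accuracy optimistic. One alternative, sketched here under the assumption that `/content/Fine_Tune_GPT_Test.jsonl` (the validation split written earlier) is still available, is to evaluate on held-out records only. `load_eval_pairs` is a hypothetical helper and the demo uses a throwaway file.

```python
import json
import os
import tempfile

def load_eval_pairs(path):
    """Read a chat-format JSONL file and return (user prompts, assistant references)."""
    prompts, refs = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            messages = json.loads(line)["messages"]
            by_role = {m["role"]: m["content"] for m in messages}
            prompts.append(by_role["user"])
            refs.append(by_role["assistant"])
    return prompts, refs

# Demo on a throwaway file with one made-up record
demo_path = os.path.join(tempfile.mkdtemp(), "demo_val.jsonl")
with open(demo_path, "w", encoding="utf-8") as fh:
    fh.write(json.dumps({"messages": [
        {"role": "system", "content": "You are a SOC analyst."},
        {"role": "user", "content": "Blind SQL Injection"},
        {"role": "assistant", "content": "Category : Web\nAttack Type : SQL Injection\nSolution : Use prepared statements"},
    ]}) + "\n")

demo_prompts, demo_refs = load_eval_pairs(demo_path)
print(demo_prompts)  # ['Blind SQL Injection']
```

The references in that file are the full Category / Attack Type / Solution template, so keep only the `Solution :` line if you want to score solutions alone.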

In [None]:
df_eval.head()

,ID,Title,Category,Attack Type,Scenario Description,Tools Used,Attack Steps,Target Type,Vulnerability,MITRE Technique,Impact,Detection Method,Solution,Tags,Source
13195,13196,Spoof Engine RPM Readings,Automotive / Cyber-Physical Systems → CAN Bus ...,Spoofing,An attacker injects fake engine RPM messages t...,"ICSim, SocketCAN, Python-CAN",1. Tap into vehicle’s CAN bus using USB2CAN ad...,Dashboard ECU,No message authentication between ECUs,T1642,Driver deception or distraction,Compare real sensor output with bus traffic,Message signing between ECUs,"spoofing, RPM, CAN, automotive",Real-world CAN injection test
3085,3086,SSRF via LFI + Wrapper (expect://),Web Application Security,Local File Inclusion + Remote Code Execution v...,Uses expect:// wrapper to execute commands whe...,"Burp Suite, PHP interpreter",Step 1: Find an LFI (Local File Inclusion) or ...,PHP apps using dynamic includes,LFI with wrapper abuse (command execution),T1059 – Command Execution,Full Remote Code Execution (via LFI),Log file reads from unexpected wrappers; audit...,Disable PHP wrappers like expect://; sanitize ...,"LFI to RCE, PHP Wrapper, expect:// Exploit","Exploit-DB, OWASP"
12309,12310,WMI Persistence via Event Filters,DFIR,Persistence,Attacker creates a WMI subscription that execu...,"WMI Command-line, PowerShell",1. Uses command-line or PowerShell to register...,Enterprise System,Lack of WMI auditing,T1546.003 – Event Triggered Execution: WMI Eve...,Long-term stealth persistence,"WMI logs, Autoruns, KAPE registry modules","Monitor WMI namespaces, use WMI Explorer for f...","stealthy persistence, WMI trigger",Real-world IR cases
12719,12720,Seed Corpus Curation for HTML5 Video Players,Zero-Day Research / Fuzzing,Fuzzer Configuration,Collecting and preparing valid HTML5 video fil...,"AFL++, libFuzzer",1. Identify target HTML5 video player componen...,Multimedia browsers,"Buffer overflow, memory corruption",T1201,Discovery of crashes or remote code execution,"Fuzzer logs, crash analysis","Patch multimedia parsing code, validate inputs","fuzzing, seed corpus, multimedia, HTML5 video",https://www.w3.org/TR/html52/
2105,2106,Auto-Login to Phishing OAuth URLs,Mobile Security,Agent follows OAuth flow to malicious site,An attacker injects a fake OAuth URL into the ...,"Burp Suite, Fake OAuth server, Chat App",Step 1: Identify a mobile app or chatbot that ...,"AI Chatbots, OAuth Apps",No link validation; unsafe OAuth redirect hand...,T1557.003 – OAuth Redirect Abuse,"Credential theft, account takeover",Monitor outbound OAuth redirects; validate all...,Whitelist OAuth providers; validate OAuth link...,"OAuth Phishing, Auto-Login, Redirect Abuse","Evilginx, OWASP Labs"


In [None]:
eval_input = df_eval["Title"].tolist()
references = df_eval["Solution"].tolist()

In [None]:
len(eval_input)

100

In [None]:
system_prompt = """You are a SOC analyst.
Read the title of the Attack provided. Based on the Attack Type, ONLY provide a detailed solution or action to take against it,
without any fancy title or bold text, as a simple Python string. Do not include any double quotation marks at the beginning or end"""

In [None]:
client = OpenAI(api_key = OPENAI_API_KEY)
def predict (prompt) :
  response = client.responses.create(
      model = "ft:gpt-4o-mini-2024-07-18:shoeb-sutar::CbsZknC6",
      input = [
          {"role" : "system","content" : system_prompt},
          {"role" : "user","content" : prompt}
      ]
  )
  return str(response.output_text)

In [None]:
hypothesis = [predict(x) for x in eval_input]
hypothesis
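The list comprehension above makes 100 sequential API calls, so a single rate-limit or network error discards every earlier result. The OpenAI client already retries some failures on its own (configurable with `OpenAI(max_retries=...)`); a minimal standard-library sketch of an extra wrapper with exponential backoff and jitter is below. `with_retries` is a hypothetical helper, not part of any library.

```python
import random
import time

def with_retries(fn, *args, attempts=5, base_delay=1.0, retry_on=(Exception,), **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # base_delay, 2x, 4x, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))
```

For example, `hypothesis = [with_retries(predict, x) for x in eval_input]`. In practice you would probably narrow `retry_on` to transient errors such as `openai.RateLimitError` and `openai.APIConnectionError` so that malformed requests fail immediately.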

#### **BLEU Score**

In [None]:
!pip install sacrebleu

In [None]:
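import sacrebleu

# sacrebleu takes a list of hypotheses and a list of reference *streams*;
# with one reference per example that is simply [references].
# The returned score is on a 0-100 scale.
score = sacrebleu.corpus_bleu(hypothesis, [references])
print("BLEU (corpus):", score.score)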

BLEU (corpus): 0.9042132302660617
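sacrebleu reports BLEU on a 0 to 100 scale, so the corpus score above is very low, not 90%. The standard definition, which sacrebleu implements (with smoothing options for zero n-gram counts), uses up to 4-grams with uniform weights:

```latex
\mathrm{BLEU} = 100 \cdot \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{4} \tfrac{1}{4} \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

Here $p_n$ is the clipped n-gram precision pooled over the whole corpus, $c$ is the total hypothesis length and $r$ the total reference length. The brevity penalty only punishes hypotheses that are *shorter* than the references; long answers are penalized through lower precisions instead. Because the evaluation prompt asks for a "detailed" solution while the reference `Solution` fields are short, the hypotheses are probably much longer than the references, which would push every $p_n$, and therefore the geometric mean, down.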


#### **ROUGE Score**

In [None]:
!pip install evaluate

In [None]:
!pip install rouge_score

In [None]:
import evaluate
rouge = evaluate.load('rouge')
res = rouge.compute(predictions = hypothesis,
                    references = references)
print(res)

{'rouge1': np.float64(0.17157806881742296), 'rouge2': np.float64(0.018743130966414723), 'rougeL': np.float64(0.14709432545671658), 'rougeLsum': np.float64(0.1462477540837314)}
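`rouge1` is an F-measure over shared unigrams and `rouge2` the same over bigrams; `evaluate` aggregates the per-pair scores across the 100 examples. A rough pure-Python re-implementation of ROUGE-1 for a single pair is below, assuming lowercase alphanumeric tokens and no stemming. It is close to, but not guaranteed identical with, the `rouge_score` package; `tokenize` and `rouge1_f1` are hypothetical helpers.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphanumeric runs, roughly like rouge_score's default tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between one prediction and one reference (no stemming)."""
    pred, ref = Counter(tokenize(prediction)), Counter(tokenize(reference))
    overlap = sum((pred & ref).values())  # clipped counts of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("Use prepared statements", "Use parameterized statements"), 4))  # 0.6667
```

On that scale, the 0.17 above means only a small share of words are common to the generated and reference solutions, and the 0.019 for `rouge2` shows that shared two-word phrases are rare.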


#### **Accuracy**

In [None]:
system_prompt = """You are a SOC analyst.
Read the title of the Attack provided. Based on the Attack Type, ONLY provide the Attack Type, chosen from the 84 types learnt during training,
without any fancy title or bold text, as a simple Python string. Do not include any double quotation marks at the beginning or end"""

In [None]:
def attack_type (prompt) :
  response = client.responses.create(
      model = "ft:gpt-4o-mini-2024-07-18:shoeb-sutar::CbsZknC6",
      input = [
          {"role" : "system","content" : system_prompt},
          {"role" : "user","content" : prompt}
      ]
  )
  return str(response.output_text)

In [None]:
actual_attack = df_eval["Attack Type"].tolist()
# Prompt with the attack titles (as the system prompt instructs), not with the labels being predicted
prdt_attack = [attack_type(x) for x in eval_input]
prdt_attack

In [None]:
# Case- and whitespace-insensitive exact match between predicted and actual attack types
count = sum(
    x.strip().lower() == y.strip().lower()
    for x, y in zip(prdt_attack, actual_attack)
)
acc = count / len(actual_attack)
print("The Accuracy of Model is : ", acc)

The Accuracy of Model is :  0.0
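The fine-tuned model was trained to reply with the three-line `Category :` / `Attack Type :` / `Solution :` template, so it may keep emitting that template even when the system prompt asks for the attack type alone. A raw string comparison against a bare label would then never match, which is one plausible reason for the 0.0 score. A sketch, assuming the template format used when building the training file, extracts the `Attack Type :` value when present before comparing; `extract_field`, `normalize` and `exact_match_accuracy` are hypothetical helpers and the sample predictions are invented.

```python
import re

def extract_field(text, field="Attack Type"):
    """Return the value after 'Field :' if the model used the training template, else the whole text."""
    pattern = rf"^\s*{re.escape(field)}\s*:\s*(.+)$"
    match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
    return match.group(1) if match else text

def normalize(label):
    """Lowercase, trim and collapse internal whitespace so cosmetic differences don't count."""
    return " ".join(label.lower().split())

def exact_match_accuracy(predictions, actuals, field="Attack Type"):
    """Fraction of predictions whose parsed, normalized label equals the reference label."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    if not actuals:
        return 0.0
    hits = sum(
        normalize(extract_field(p, field)) == normalize(a)
        for p, a in zip(predictions, actuals)
    )
    return hits / len(actuals)

preds = [
    "Category : Web Application Security\nAttack Type : Clickjacking\nSolution : Use X-Frame-Options",
    "  sql   injection ",
]
print(exact_match_accuracy(preds, ["Clickjacking", "SQL Injection"]))  # 1.0
```

For example, `exact_match_accuracy(prdt_attack, actual_attack)`. Exact match stays strict even after parsing: a prediction of `SQL Injection` will not match a label such as `SQL Injection (SQLi)`, so a lenient or fuzzy comparison may be worth reporting alongside it.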


### **Screenshots**

In [None]:
system_prompt = """You are a SOC analyst.
Read the title of the Attack provided. Based on the Attack Type provide the description of scenario,
The associated MITRE Technique. Also provide for each title associated Category of attack, type of attack.
Once done provide the detailed best solution or action to take in simple language and 2 to 3 lines"""

In [None]:
# Shape of one training record in the JSONL file (angle-bracket values are placeholders)
{"messages": [
   {"role": "system", "content": "SYSTEM_PROMPT"},
   {"role": "user", "content": "<TITLE/INCIDENT DESCRIPTION>"},
   {"role": "assistant", "content": "<CATEGORY / ATTACK TYPE / SOLUTION>"}
]}

In [None]:
from openai import OpenAI
client = OpenAI()

client.fine_tuning.jobs.create(
    model = "gpt-4o-mini-2024-07-18",
    training_file = "file-training-id",
    validation_file = "file-validation-id",
    hyperparameters = {"n_epochs": 3}
)

In [None]:
response = client.responses.create(
   model = "ft:gpt-4o-mini:<custom-id>",
   input = "Investigate unusual outbound traffic detected on port 4444."
)
print(response.output_text)

In [None]:
prompt = "This attack abuses the browser’s ability to load external resources like <script> or <link> to trigger unintended GET/POST requests to a target server from a different domain. Because browsers automatically send cookies with these requests, attackers can exploit this to perform unauthorized actions without needing JavaScript execution."

In [None]:
client = OpenAI(api_key = OPENAI_API_KEY)
response = client.responses.create(
    model = "ft:gpt-4o-mini-2024-07-18:shoeb-sutar::CbsZknC6",
    input = [
        {"role" : "system","content" : system_prompt},
        {"role" : "user","content" : prompt}
    ]
)
print(response.output_text)

Category : Web Application Security
Attack Type : DOM Injection (Stored/Reflected)
Solution : SameSite cookies, CSRF protection for all state-changing actions
