# Security Classification Use Case

As a member of a security operations center (SOC), you often face an overwhelming volume of security-related text and need a way to quickly identify which items are critical.

Our models can classify cybersecurity-related descriptions, such as excerpts from security blogs or security alert messages, into categories like MITRE ATT&CK IDs or CVSS classes.

## Model used for this use case
The base model (Foundation-Sec-8B) is best suited for this use case because the task is classification and the output is a single word or short phrase. This notebook calls it through a SageMaker endpoint.

**Note**: Update the configuration variables below to match your deployment.

## Configuration
Update these variables to match your SageMaker deployment:

In [None]:
# Update these variables to match your deployment
endpoint_name = 'foundation-sec-8b-endpoint'
aws_region = 'us-east-1'

print("Configuration:")
print(f"Endpoint: {endpoint_name}")
print(f"Region: {aws_region}")

## Setup
This setup calls a SageMaker endpoint instead of loading the model locally.

In [None]:
import boto3
import json
import re
from IPython.display import display, Markdown

# Initialize SageMaker runtime client
sagemaker_runtime = boto3.client('sagemaker-runtime', region_name=aws_region)

# Creating the client does not contact the endpoint; the first request does
print(f"SageMaker runtime client ready for endpoint: {endpoint_name}")

In [None]:
# Generation arguments tuned for classification (short outputs)
generation_args = {
    "max_new_tokens": 10,  # Small, since only a class label is needed
    "temperature": None,   # Not used because sampling is disabled
    "repetition_penalty": 1.2,
    "do_sample": False,    # Greedy decoding for consistent classification
    "use_cache": True,
}

print("Generation configuration for classification:")
for key, value in generation_args.items():
    print(f"  {key}: {value}")
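The arguments above are sent to the endpoint as JSON, so a `None` value is serialized as `null`. The sketch below shows one way to build the request body with `None`-valued parameters omitted. `build_payload` and `example_args` are hypothetical names used only for illustration, and whether your serving container prefers an omitted key over `null` is an assumption to check against your deployment.

```python
import json

def build_payload(prompt, params):
    """Hypothetical helper: build the JSON request body, omitting None-valued parameters."""
    cleaned = {key: value for key, value in params.items() if value is not None}
    return json.dumps({"inputs": prompt, "parameters": cleaned})

# Illustrative subset of the generation arguments
example_args = {"max_new_tokens": 10, "temperature": None, "do_sample": False}
body = build_payload("context: example\ntechnique: T", example_args)
print(body)
```

The returned string can be passed as the `Body` argument of `invoke_endpoint`.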

In [None]:
def inference(prompt):
    """Inference function using SageMaker endpoint for classification tasks"""
    
    # Prepare payload for SageMaker endpoint
    payload = {
        "inputs": prompt,
        "parameters": generation_args
    }
    
    try:
        response = sagemaker_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='application/json',
            Body=json.dumps(payload)
        )
        
        result = json.loads(response['Body'].read().decode())
        
        # Handle different TGI response formats
        if isinstance(result, list) and len(result) > 0:
            generated_text = result[0].get('generated_text', '')
        elif isinstance(result, dict):
            generated_text = result.get('generated_text', str(result))
        else:
            generated_text = str(result)
        
        # For classification, we often just want the new tokens, not the full prompt
        # Remove the original prompt if it's included
        if generated_text.startswith(prompt):
            response_text = generated_text[len(prompt):].strip()
        else:
            response_text = generated_text.strip()
            
        # Remove any trailing special tokens
        response_text = re.sub(r'<\|.*?\|>$', '', response_text).strip()
        
        return response_text
        
    except Exception as e:
        print(f"Error invoking endpoint: {str(e)}")
        return f"Error: {str(e)}"

# Test the inference function
test_response = inference("Test classification: malware")
print("Test Response:")
print(test_response)
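The response handling inside `inference` can be exercised without an endpoint by isolating it in a pure function. The sketch below mirrors that logic; `extract_completion` is a hypothetical name, and the sample payloads are made-up stand-ins for the list and dict shapes the function handles, not captured endpoint output.

```python
import re

def extract_completion(result, prompt):
    """Hypothetical helper mirroring the parsing steps in inference()."""
    # Accept either a list of generations or a single dict
    if isinstance(result, list) and result:
        text = result[0].get("generated_text", "")
    elif isinstance(result, dict):
        text = result.get("generated_text", str(result))
    else:
        text = str(result)
    # Drop the echoed prompt if the endpoint returned the full text
    if text.startswith(prompt):
        text = text[len(prompt):]
    text = text.strip()
    # Remove a trailing special token such as <|end_of_text|>
    return re.sub(r"<\|.*?\|>$", "", text).strip()

sample_prompt = "technique: T"
print(extract_completion([{"generated_text": "technique: T1047"}], sample_prompt))
print(extract_completion({"generated_text": "1047<|end_of_text|>"}, sample_prompt))
```

Both calls print `1047`, which is the part the later cells prefix with "T".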

## MITRE ATT&CK ID Classification

This task involves mapping a given context to the corresponding MITRE ATT&CK ID.

To steer the model toward the correct output format, the prompt includes five example context/technique pairs. The final line ends with "T" so that the model only has to complete the technique ID.

In [None]:
mitre_prompt = '''
context: This downloader is unique per system and contains a customized backdoor written in Assembler
technique: T1059

context: This malware was capable of stealing significant system and network information
technique: T1082

context: Email phishing credential theft
technique: T1566

context: they are served a ZIP archive containing a malicious LNK file.
technique: T1204

context: download and deploy Trickbot on the user's machine
technique: T1105

context: POSHSPY's use of WMI to both store and persist the backdoor code makes it nearly invisible to anyone not familiar with the intricacies of WMI.
technique: T'''

print("MITRE ATT&CK classification prompt ready!")
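The same prompt layout can also be generated from a list of pairs, which makes it easier to swap examples in and out. This is a minimal sketch: `build_few_shot_prompt` is a hypothetical helper, and the two pairs are taken from the prompt above.

```python
def build_few_shot_prompt(examples, query_context):
    """Hypothetical helper: format (context, technique) pairs and end with the 'T' primer."""
    blocks = [
        f"context: {context}\ntechnique: {technique}"
        for context, technique in examples
    ]
    # The query block stops after "T" so the model completes the ID
    blocks.append(f"context: {query_context}\ntechnique: T")
    return "\n" + "\n\n".join(blocks)

examples = [
    ("Email phishing credential theft", "T1566"),
    ("download and deploy Trickbot on the user's machine", "T1105"),
]
prompt = build_few_shot_prompt(
    examples, "The malware created a scheduled task for persistence"
)
print(prompt)
```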

In [None]:
print("=== MITRE ATT&CK ID CLASSIFICATION ===")
output = inference(mitre_prompt)

# Extract just the technique ID. The prompt already ends with "T",
# so the completion starts with the numeric part of the ID.
tokens = output.split()
technique_id = "T" + tokens[0] if tokens else "Unknown"

print(f"Raw output: '{output}'")
print(f"Extracted MITRE ATT&CK ID: {technique_id}")
print(f"\nContext: POSHSPY's use of WMI to both store and persist the backdoor code")
print(f"Predicted Technique: {technique_id}")

if technique_id == "T1047":
    print("✅ Correct! T1047 is 'Windows Management Instrumentation'")
else:
    print(f"❓ Result: {technique_id} (Expected: T1047 for WMI usage)")
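Because the model is free to emit any text after the "T" primer, it can help to check that the completion actually looks like a technique ID before using it. The sketch below assumes the usual ATT&CK ID shape of four digits with an optional three-digit sub-technique suffix; `parse_technique_id` is a hypothetical helper, and the input strings are made-up completions rather than real model output.

```python
import re

# Four digits, optionally followed by ".NNN", and not followed by another digit
TECHNIQUE_ID_PATTERN = re.compile(r"^\d{4}(?:\.\d{3})?(?!\d)")

def parse_technique_id(completion):
    """Hypothetical helper: return 'T####' or 'T####.###' from the text after the 'T' primer, else None."""
    match = TECHNIQUE_ID_PATTERN.match(completion.strip())
    return "T" + match.group(0) if match else None

print(parse_technique_id("1047\n\ncontext: another example"))
print(parse_technique_id("1546.003"))
print(parse_technique_id("Error: endpoint not found"))
```

Returning `None` for malformed completions keeps error strings and stray text from being reported as technique IDs.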

## Test Additional MITRE ATT&CK Classifications

Let's test a few more examples:

In [None]:
def classify_mitre_attack(context_description):
    """Classify a security context into MITRE ATT&CK technique"""
    
    prompt_template = '''
context: This downloader is unique per system and contains a customized backdoor written in Assembler
technique: T1059

context: This malware was capable of stealing significant system and network information
technique: T1082

context: Email phishing credential theft
technique: T1566

context: they are served a ZIP archive containing a malicious LNK file.
technique: T1204

context: download and deploy Trickbot on the user's machine
technique: T1105

context: {}
technique: T'''.format(context_description)
    
    result = inference(prompt_template)
    technique_id = "T" + result.strip().split()[0] if result.strip() else "Unknown"
    
    return technique_id

# Test cases
test_contexts = [
    "Attackers used PowerShell to execute malicious scripts",
    "The malware created a scheduled task for persistence",
    "Credential dumping from memory using Mimikatz",
    "Lateral movement using SMB protocol",
    "Data exfiltration through encrypted channels"
]

print("=== ADDITIONAL MITRE ATT&CK CLASSIFICATIONS ===")
for i, context in enumerate(test_contexts, 1):
    technique = classify_mitre_attack(context)
    print(f"{i}. Context: {context}")
    print(f"   Predicted Technique: {technique}")
    print()

## Common Vulnerability Scoring System (CVSS) Classification

This task assigns a vulnerability description the correct label for a given CVSS metric.

For example, for Attack Vector the choices are Network, Adjacent, Local, or Physical; for Integrity Impact they are None, Low, or High.
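Since each metric has a small fixed set of labels, the raw completion can be checked against that set. The sketch below is one way to do this; `CVSS_CHOICES` and `normalize_choice` are hypothetical names, the table covers only the metrics mentioned in this notebook, and the input strings are made-up completions.

```python
# Allowed labels per metric (subset of CVSS base metrics used in this notebook)
CVSS_CHOICES = {
    "Attack Vector": ["Network", "Adjacent", "Local", "Physical"],
    "Attack Complexity": ["Low", "High"],
    "Privileges Required": ["None", "Low", "High"],
    "Confidentiality Impact": ["None", "Low", "High"],
    "Integrity Impact": ["None", "Low", "High"],
    "Availability Impact": ["None", "Low", "High"],
}

def normalize_choice(metric_name, raw_output):
    """Hypothetical helper: map raw model text to an allowed label, or None if nothing matches."""
    words = raw_output.strip().split()
    if not words:
        return None
    # Compare only the first word, ignoring case and trailing punctuation
    first_word = words[0].strip(".,:;").lower()
    for choice in CVSS_CHOICES[metric_name]:
        if choice.lower() == first_word:
            return choice
    return None

print(normalize_choice("Attack Vector", " network\n\ndescription: ..."))
print(normalize_choice("Integrity Impact", "Critical"))
```

The first call prints `Network`; the second prints `None` because "Critical" is not a valid Integrity Impact label.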

In [None]:
cvss_prompt = '''
I have a description about a threat intelligence analysis.

description: Cross Site Scripting vulnerability in the input parameter in eyoucms v.1.6.5 allows a remote attacker to run arbitrary code via crafted URL.
Regarding Integrity Impact I will answer only one of the following choices in 1 word: None, Low or High
My choice: Low

description: The EventON WordPress plugin before 2.2 does not sanitise and escape some of its settings, which could allow high privilege users such as admin to perform Stored HTML Injection attacks even when the unfiltered_html capability is disallowed.
Regarding Availability Impact I will answer only one of the following choices in 1 word: None, Low or High
My choice: None

description: A vulnerability, which was classified as critical, was found in Youke365 up to 1.5.3. Affected is an unknown function of the file /app/api/controller/caiji.php of the component Parameter Handler. The manipulation of the argument url leads to server-side request forgery. It is possible to launch the attack remotely. The exploit has been disclosed to the public and may be used. VDB-249870 is the identifier assigned to this vulnerability.
Regarding Attack Vector I will answer only one of the following choices in 1 word: Network, Adjacent, Local or Physical
My choice: Network

description: A SQL injection vulnerability in EmpireCMS v7.5, allows remote attackers to execute arbitrary code and obtain sensitive information via the DoExecSql function.
Regarding Confidentiality Impact I will answer only one of the following choices in 1 word: None, Low or High
My choice: High

description: IBM WebSphere Application Server Liberty 17.0.0.3 through 24.0.0.4 is vulnerable to a denial of service, caused by sending a specially crafted request. A remote attacker could exploit this vulnerability to cause the server to consume memory resources.  IBM X-Force ID:  280400.
Regarding Privileges Required I will answer only one of the following choices in 1 word: None, Low or High
My choice: None

description: jshERP v3.3 is vulnerable to Arbitrary File Upload. The jshERP-boot/systemConfig/upload interface does not check the uploaded file type, and the biz parameter can be spliced into the upload path, resulting in arbitrary file uploads with controllable paths.
Regarding Attack Complexity I will answer only one of the following choices in 1 word: Low or High
My choice:'''

print("CVSS classification prompt ready!")

In [None]:
print("=== CVSS ATTACK COMPLEXITY CLASSIFICATION ===")
output = inference(cvss_prompt)

# Extract the classification result
classification = output.strip().split()[0] if output.strip() else "Unknown"

print(f"Raw output: '{output}'")
print(f"Classification: {classification}")
print(f"\nVulnerability: jshERP v3.3 Arbitrary File Upload")
print(f"Attack Complexity Classification: {classification}")

if classification.lower() == "low":
    print("✅ Correct! File upload vulnerabilities typically have Low attack complexity")
else:
    print(f"❓ Result: {classification} (Expected: Low for file upload vulnerabilities)")

## Advanced CVSS Classification Function

Let's create a flexible function for different CVSS categories:

In [None]:
def classify_cvss_metric(description, metric_name, choices):
    """Classify a vulnerability description for a specific CVSS metric"""
    
    # Create few-shot examples based on the metric
    examples = {
        "Attack Vector": [
            ("Remote SQL injection via web interface", "Network"),
            ("USB-based malware requires physical access", "Physical"),
            ("WiFi network vulnerability", "Adjacent")
        ],
        "Attack Complexity": [
            ("Simple buffer overflow", "Low"),
            ("Race condition requiring precise timing", "High")
        ],
        "Privileges Required": [
            ("Anonymous access vulnerability", "None"),
            ("Admin panel vulnerability", "High"),
            ("User-level privilege escalation", "Low")
        ],
        "Confidentiality Impact": [
            ("Information disclosure vulnerability", "High"),
            ("Limited data exposure", "Low")
        ]
    }
    
    # Build prompt with examples
    prompt_parts = ["I have a description about a threat intelligence analysis.\n"]
    
    # Add relevant examples
    if metric_name in examples:
        for desc, choice in examples[metric_name]:
            prompt_parts.append(f"description: {desc}")
            prompt_parts.append(f"Regarding {metric_name} I will answer only one of the following choices in 1 word: {', '.join(choices)}")
            prompt_parts.append(f"My choice: {choice}\n")
    
    # Add the actual query
    prompt_parts.extend([
        f"description: {description}",
        f"Regarding {metric_name} I will answer only one of the following choices in 1 word: {', '.join(choices)}",
        "My choice:"
    ])
    
    full_prompt = "\n".join(prompt_parts)
    result = inference(full_prompt)
    
    classification = result.strip().split()[0] if result.strip() else "Unknown"
    return classification

# Test different CVSS metrics
test_vulnerability = "A remote code execution vulnerability in Apache Struts allows unauthenticated attackers to execute arbitrary commands via HTTP requests."

cvss_tests = [
    ("Attack Vector", ["Network", "Adjacent", "Local", "Physical"]),
    ("Attack Complexity", ["Low", "High"]),
    ("Privileges Required", ["None", "Low", "High"]),
    ("Confidentiality Impact", ["None", "Low", "High"])
]

print("=== COMPREHENSIVE CVSS CLASSIFICATION ===")
print(f"Vulnerability: {test_vulnerability}\n")

for metric, choices in cvss_tests:
    classification = classify_cvss_metric(test_vulnerability, metric, choices)
    print(f"{metric}: {classification}")

print("\n" + "="*50)
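Per-metric labels like these are often combined into the abbreviated `metric:value` form used in CVSS vector strings. The sketch below builds a partial vector for the four metrics tested above, assuming the CVSS v3.x abbreviations. `METRIC_ABBREVIATIONS` and `to_partial_vector` are hypothetical names, and the `labels` dictionary is illustrative input, not model output. A full vector also needs the metrics not classified here.

```python
# Metric name -> (abbreviation, label -> single-letter value), CVSS v3.x style
METRIC_ABBREVIATIONS = {
    "Attack Vector": ("AV", {"Network": "N", "Adjacent": "A", "Local": "L", "Physical": "P"}),
    "Attack Complexity": ("AC", {"Low": "L", "High": "H"}),
    "Privileges Required": ("PR", {"None": "N", "Low": "L", "High": "H"}),
    "Confidentiality Impact": ("C", {"None": "N", "Low": "L", "High": "H"}),
}

def to_partial_vector(classifications):
    """Hypothetical helper: join recognized labels into 'AV:N/AC:L/...' form, skipping unknown ones."""
    parts = []
    for metric_name, (short_name, value_map) in METRIC_ABBREVIATIONS.items():
        label = classifications.get(metric_name)
        if label in value_map:
            parts.append(f"{short_name}:{value_map[label]}")
    return "/".join(parts)

# Illustrative labels only
labels = {
    "Attack Vector": "Network",
    "Attack Complexity": "Low",
    "Privileges Required": "None",
    "Confidentiality Impact": "High",
}
print(to_partial_vector(labels))
```

For the illustrative labels this prints `AV:N/AC:L/PR:N/C:H`.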

## Custom Classification Examples

Try your own security descriptions:

In [None]:
# Custom security descriptions for testing
custom_descriptions = [
    "Ransomware encrypted files and demanded Bitcoin payment",
    "Insider threat copied sensitive documents to USB drive",
    "Social engineering attack tricked users into revealing passwords",
    "DDoS attack overwhelmed web servers with traffic",
    "Zero-day exploit bypassed all security controls"
]

print("=== CUSTOM MITRE ATT&CK CLASSIFICATIONS ===")
for i, desc in enumerate(custom_descriptions, 1):
    technique = classify_mitre_attack(desc)
    print(f"{i}. {desc}")
    print(f"   → MITRE ATT&CK Technique: {technique}\n")

print("=== CUSTOM CVSS CLASSIFICATIONS ===")
sample_vuln = custom_descriptions[0]  # Use first description
print(f"Analyzing: {sample_vuln}\n")

for metric, choices in cvss_tests[:2]:  # Test first 2 metrics
    result = classify_cvss_metric(sample_vuln, metric, choices)
    print(f"{metric}: {result}")

## Interactive Classification

Add your own security descriptions for classification:

In [None]:
# Interactive classification - modify this cell with your own descriptions
your_security_description = "Enter your security incident description here"

if your_security_description != "Enter your security incident description here":
    print("=== YOUR CUSTOM CLASSIFICATION ===")
    print(f"Description: {your_security_description}\n")
    
    # MITRE ATT&CK classification
    mitre_result = classify_mitre_attack(your_security_description)
    print(f"MITRE ATT&CK Technique: {mitre_result}")
    
    # CVSS classification for Attack Vector
    cvss_result = classify_cvss_metric(
        your_security_description, 
        "Attack Vector", 
        ["Network", "Adjacent", "Local", "Physical"]
    )
    print(f"CVSS Attack Vector: {cvss_result}")
else:
    print("Replace the placeholder text above with your security description to test classification!")

## Classification Summary

This notebook demonstrates how to use the Cisco Foundation Security model through a SageMaker endpoint for:

- **MITRE ATT&CK Technique Classification**: Mapping security contexts to specific attack techniques
- **CVSS Metric Classification**: Categorizing vulnerabilities across different CVSS dimensions
- **Custom Security Classification**: Adapting the approach for your specific use cases

These classification tasks rely on the model's security-focused training and on the few-shot examples included in each prompt.