## Getting Started
This notebook demonstrates how to use the Foundation-Sec-8B model for cybersecurity text classification. Whether you're a security analyst, researcher, or just getting started with AI, this guide shows you how to automatically sort cybersecurity-related text into a fixed set of categories.

---

### What This Notebook Does
Foundation-Sec-8B is a specialized language model trained specifically for cybersecurity tasks. This notebook shows you how to:
- Automatically classify cybersecurity text into categories like:
    - Malware (viruses, ransomware, trojans)
    - Phishing (fake emails, social engineering)
    - Vulnerabilities (security flaws, CVEs)
    - Incident Response (breach containment, forensics)
    - Threat Intelligence (APT groups, attack patterns)
    - Compliance (GDPR, regulations, audits)
    - Data Breaches (leaked information, exposures)

- Use perplexity-based classification, which scores every candidate label with the model instead of generating a label
- Provide confidence scores for each prediction
- Process single texts or batches of documents

---

### When to Use This Approach (and When Not To!)

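This notebook uses perplexity-based classification: for each candidate label, it computes the perplexity of that label as the completion of a prompt containing the input text. The label with the lowest perplexity is the one the model finds most natural for that text, and it becomes the prediction.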

#### ✅ Advantages – When to Use:

* Instruction fidelity is guaranteed: Unlike completion-style classification, the model is never asked to generate an open-ended label, so it cannot hallucinate or disobey label constraints.

* Label control: All predictions are restricted to a fixed set of known categories, making it suitable for high-stakes or rule-based classification environments.

* No need for task-specific fine-tuning: You can use this out-of-the-box with Foundation-Sec-8B without collecting or labeling new data.

* High interpretability: Confidence scores are derived from actual model perplexity, making decision boundaries easier to inspect.

#### ⚠️ Limitations – When Not To Use:

* **Scalability**: This approach runs a separate forward pass per class, so latency and cost scale linearly with the number of categories. For small to moderate class counts (\~10 or fewer), this is acceptable. For hundreds or thousands of classes, this becomes infeasible.

* **When high throughput is critical**: If you're deploying real-time systems or large-scale pipelines, this approach may not be fast enough.

#### 👉 Alternatives:

If you're working with many categories or need real-time performance, consider using the finetuned classification model instead. See the [Finetuning Classification Notebook](https://github.com/RobustIntelligence/foundation-ai-cookbook/blob/main/2_examples/Classification_cybersecurity_descriptions.ipynb) for instructions on how to train and deploy a task-specific classifier.

---

### What is Perplexity Classification?
Perplexity measures how "surprised" a language model is by a piece of text. Lower perplexity means the model finds the text more predictable given a certain context. We use this to determine which category best fits your cybersecurity text.
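
As a minimal sketch of the idea (not the notebook's implementation), perplexity can be computed as the exponential of the average negative log-likelihood of the scored tokens. The helper name `perplexity_from_logprobs` is hypothetical, and the token log-probabilities below are made-up values chosen only for illustration.

```python
import math
from typing import Dict, List

def perplexity_from_logprobs(token_logprobs: List[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical natural-log probabilities a model might assign to each label's
# tokens after the same prompt (illustrative numbers, not real model output).
label_logprobs: Dict[str, List[float]] = {
    "malware": [-2.3, -1.9],
    "phishing": [-0.4, -0.2],
    "compliance": [-3.1],
}

perplexities = {
    label: perplexity_from_logprobs(logprobs)
    for label, logprobs in label_logprobs.items()
}

# The label with the lowest perplexity is the best fit.
best_label = min(perplexities, key=perplexities.get)
print(best_label, perplexities)
```

Averaging over tokens keeps labels that tokenize into several pieces comparable with single-token labels.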

## Installing Libraries

### Model available on Hugging Face: [fdtn-ai/Foundation-Sec-8B](https://huggingface.co/fdtn-ai/Foundation-Sec-8B)

In [None]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git


## Imports

In [None]:
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from typing import List, Dict, Tuple
import numpy as np
from dataclasses import dataclass

## NTP Classification

In [None]:
@dataclass
class ClassificationResult:
    """
    Stores the results of text classification

    Attributes:
        predicted_label: The most likely category (e.g., "malware", "phishing")
        confidence_score: Sum of squared normalized scores (1/N for a uniform
            spread over N labels, 1.0 when all mass is on one label)
        perplexity_scores: Perplexity of each label as a completion (lower = better fit)
        raw_probabilities: Normalized inverse-perplexity scores across categories (sum to 1)
    """
    predicted_label: str
    confidence_score: float
    perplexity_scores: Dict[str, float]
    raw_probabilities: Dict[str, float]

In [None]:
class PerplexityClassifier:
    """
    A cybersecurity text classifier using the Foundation-Sec-8B model

    This class handles:
    - Loading the specialized cybersecurity model
    - Processing text through perplexity analysis
    - Returning classification results with confidence scores
    """

    def __init__(self, model_name: str = "fdtn-ai/Foundation-Sec-8B",
                 labels: List[str] = None, device: str = "auto",
                 run_quantized: bool = False):
        """
        Initialize the classifier

        Parameters:
            model_name: The AI model to use (default: Foundation-Sec-8B)
            labels: List of categories to classify into
            device: Where to run the model ("auto", "cpu", or "cuda")
            run_quantized: Use memory-efficient loading (recommended for large models)
        """
        self.labels = labels or []
        self.device = self._get_device(device)
        self.run_quantized = run_quantized

        # Load the tokenizer (converts text to numbers the model understands)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

        # Load the model with appropriate settings
        if run_quantized:
            # Memory-efficient loading - uses 4-bit precision
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.bfloat16,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4"
            )
            self.model = AutoModelForCausalLM.from_pretrained(
                model_name,
                quantization_config=quantization_config,
                device_map="auto"
            )
        else:
            # Standard loading
            self.model = AutoModelForCausalLM.from_pretrained(
                model_name,
                torch_dtype=torch.bfloat16 if self.device.type == 'cuda' else torch.float32,
                device_map="auto" if self.device.type == 'cuda' else None
            )
            if self.device.type != 'cuda':
                self.model = self.model.to(self.device)

        # Set model to evaluation mode
        self.model.eval()

        # Handle padding token for proper text processing
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def _get_device(self, device: str) -> torch.device:
        """Automatically detect the best device (GPU vs CPU)"""
        if device == "auto":
            return torch.device("cuda" if torch.cuda.is_available() else "cpu")
        return torch.device(device)

    def set_labels(self, labels: List[str]) -> None:
        """Set or update the classification categories"""
        self.labels = labels

    def _create_classification_prompt(self, text: str,
                                    few_shot_examples: List[Tuple[str, str]] = None) -> str:
        """
        Create a structured prompt for the model to understand the classification task

        Parameters:
            text: The cybersecurity text to classify
            few_shot_examples: Optional examples to guide the model (format: [(text, label), ...])

        Returns:
            A formatted prompt that helps the model understand what to do
        """
        prompt_parts = [
            "This is a cybersecurity text classification task.",
            f"Available labels: {', '.join(self.labels)}",
            "Choose the most appropriate label for the given text.\n"
        ]

        # Add examples if provided
        if few_shot_examples:
            prompt_parts.append("Examples:")
            for example_text, example_label in few_shot_examples:
                prompt_parts.append(f'Text: """{example_text}"""')
                prompt_parts.append(f"Chosen label: {example_label}\n")

        # Add the text to classify
        prompt_parts.extend([
            f'Text: """{text}"""',
            "Chosen label:"
        ])

        return "\n".join(prompt_parts)

    def _calculate_batch_perplexities(self, prompt: str,
                                    completions: List[str]) -> Dict[str, float]:
        """
        Calculate perplexity scores for each possible label

        Lower perplexity = model is more confident in that classification

        Parameters:
            prompt: The classification prompt
            completions: List of possible labels to test

        Returns:
            Dictionary mapping each label to its perplexity score
        """
        perplexities = {}

        for completion in completions:
            # Create full text: prompt + potential answer
            full_text = prompt + " " + completion

            # Convert text to model input
            inputs = self.tokenizer(full_text, return_tensors="pt",
                                  truncation=True, max_length=2048)
            inputs = {k: v.to(self.device) for k, v in inputs.items()}

            # Get length of just the prompt (not the completion)
            prompt_inputs = self.tokenizer(prompt, return_tensors="pt",
                                         truncation=True, max_length=2048)
            prompt_length = prompt_inputs["input_ids"].shape[1]

            # Calculate perplexity
            with torch.no_grad():  # Inference only: disable gradient tracking
                outputs = self.model(**inputs)
                logits = outputs.logits

                # Focus only on the completion part
                completion_logits = logits[0, prompt_length-1:-1]
                completion_targets = inputs["input_ids"][0, prompt_length:]

                # If truncation removed the whole completion, there is nothing to score
                if completion_targets.numel() == 0:
                    perplexities[completion] = float('inf')
                    continue

                # Cross-entropy loss = mean negative log-likelihood of the completion tokens
                loss = F.cross_entropy(completion_logits, completion_targets, reduction='mean')

                # Convert loss to perplexity
                perplexities[completion] = torch.exp(loss).item()

        return perplexities

    def classify(self, text: str,
                few_shot_examples: List[Tuple[str, str]] = None,
                return_all_scores: bool = False) -> ClassificationResult:
        """
        Classify a piece of cybersecurity text

        Parameters:
            text: The cybersecurity text to classify
            few_shot_examples: Optional examples to improve accuracy
            return_all_scores: Whether to return scores for all categories

        Returns:
            ClassificationResult with prediction and confidence scores
        """
        if not self.labels:
            raise ValueError("No labels set. Use set_labels() to define classification labels.")

        # Create the classification prompt
        prompt = self._create_classification_prompt(text, few_shot_examples)

        # Calculate perplexity for each possible label
        perplexity_scores = self._calculate_batch_perplexities(prompt, self.labels)

        # Convert perplexities to probabilities (lower perplexity = higher probability)
        raw_probabilities = {
            label: 1.0 / perp if perp != float('inf') else 0.0
            for label, perp in perplexity_scores.items()
        }

        # Find the best (lowest perplexity) label
        best_label = min(perplexity_scores.keys(), key=lambda x: perplexity_scores[x])

        # Normalize probabilities to sum to 1
        total_inverse_perplexity = sum(raw_probabilities.values())
        normalized_probabilities = {
            label: prob / total_inverse_perplexity if total_inverse_perplexity > 0 else 1.0/len(self.labels)
            for label, prob in raw_probabilities.items()
        }

        # Calculate confidence as concentration of probability mass
        confidence_score = sum(p**2 for p in normalized_probabilities.values())

        return ClassificationResult(
            predicted_label=best_label,
            confidence_score=confidence_score,
            perplexity_scores=perplexity_scores if return_all_scores else {best_label: perplexity_scores[best_label]},
            raw_probabilities=normalized_probabilities if return_all_scores else {best_label: normalized_probabilities[best_label]}
        )

    def batch_classify(self, texts: List[str],
                      few_shot_examples: List[Tuple[str, str]] = None) -> List[ClassificationResult]:
        """
        Classify multiple texts at once

        Parameters:
            texts: List of cybersecurity texts to classify
            few_shot_examples: Optional examples to improve accuracy

        Returns:
            List of ClassificationResult objects, one for each input text
        """
        results = []
        for text in texts:
            result = self.classify(text, few_shot_examples)
            results.append(result)
        return results
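
The scoring steps inside `classify()` can be tried without loading the model. The sketch below mirrors them on plain dictionaries: inverse perplexity, normalization, then the sum of squares used as the confidence. The function name `scores_from_perplexities` and the perplexity values are hypothetical and exist only for illustration.

```python
from typing import Dict, Tuple

def scores_from_perplexities(
    perplexities: Dict[str, float],
) -> Tuple[str, Dict[str, float], float]:
    """Mirror classify(): invert perplexities, normalize, sum the squares."""
    inverse = {
        label: (1.0 / p if p != float("inf") else 0.0)
        for label, p in perplexities.items()
    }
    total = sum(inverse.values())
    if total > 0:
        normalized = {label: v / total for label, v in inverse.items()}
    else:
        # Fall back to a uniform spread when nothing could be scored
        normalized = {label: 1.0 / len(perplexities) for label in perplexities}
    best = min(perplexities, key=perplexities.get)
    confidence = sum(v ** 2 for v in normalized.values())
    return best, normalized, confidence

# Hypothetical perplexities for three labels (illustrative values only)
best, normalized, confidence = scores_from_perplexities(
    {"malware": 8.0, "phishing": 2.0, "compliance": 8.0}
)
print(best, normalized, confidence)
```

With these inputs the normalized scores are 1/6, 2/3 and 1/6, so the confidence is 0.5. A uniform spread over N labels gives 1/N, which means the confidence never reaches 0: with the seven labels used in this notebook, its floor is about 0.143.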

## Real-World Example

In [None]:
def demo_security_classification():
    classifier = PerplexityClassifier(model_name="fdtn-ai/Foundation-Sec-8B", labels=[
        "malware",
        "phishing",
        "vulnerability",
        "incident_response",
        "threat_intelligence",
        "compliance",
        "data_breach"
    ], run_quantized=True)
    test_texts = [
        "A critical buffer overflow vulnerability was discovered in the Apache HTTP server that could allow remote code execution.",
        "Suspicious email attachment containing executable file detected in employee inbox with urgent payment request.",
        "GDPR compliance audit revealed inadequate data processing consent mechanisms across customer database systems.",
        "Containment procedures activated following detection of lateral movement in network segment 192.168.1.0/24.",
        "APT29 group observed using new PowerShell-based persistence mechanism targeting government entities."
    ]
    few_shot_examples = [
        ("SQL injection flaw found in web application login form", "vulnerability"),
        ("Fake Microsoft login page sent to 500 employees", "phishing"),
        ("Ransomware encrypted file server overnight", "malware")
    ]
    for text in test_texts:
        result = classifier.classify(text, few_shot_examples, return_all_scores=True)
        print(f"{text[:60]}...")
        print(f"Predicted: {result.predicted_label}")
        print(f"Probability: {result.raw_probabilities[result.predicted_label]:.3f}")
        print(f"Confidence: {result.confidence_score:.3f}")
        print("-" * 50)

if __name__ == "__main__":
    demo_security_classification()



A critical buffer overflow vulnerability was discovered in t...
Predicted: vulnerability
Probability: 0.664
Confidence: 0.486
--------------------------------------------------
Suspicious email attachment containing executable file detec...
Predicted: phishing
Probability: 0.375
Confidence: 0.260
--------------------------------------------------
GDPR compliance audit revealed inadequate data processing co...
Predicted: compliance
Probability: 0.742
Confidence: 0.578
--------------------------------------------------
Containment procedures activated following detection of late...
Predicted: incident_response
Probability: 0.811
Confidence: 0.677
--------------------------------------------------
APT29 group observed using new PowerShell-based persistence ...
Predicted: threat_intelligence
Probability: 0.721
Confidence: 0.568
--------------------------------------------------


### What the Output Means

When you run the classification, you'll see several key metrics:

1. **Predicted Category**: The most likely classification
   - Examples: "vulnerability", "phishing", "malware"

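2. **Confidence Score**: How concentrated the scores are on a single label, computed as the sum of squared normalized scores. It ranges from 1/N (scores spread evenly over N labels, about 0.143 for the seven labels used here) to 1.0 (all weight on one label).
   - `0.8-1.0`: Very confident
   - `0.6-0.8`: Moderately confident
   - `0.4-0.6`: Somewhat uncertain
   - `0.0-0.4`: Low confidence (manual review recommended)

3. **Probability**: The predicted category's share of the normalized inverse-perplexity scores (0-1 scale)
   - Higher values indicate stronger evidence for that category
   - This is a relative score across the given labels, not a calibrated probability

4. **Top 3 Categories**: The three highest-scoring labels
   - Useful for edge cases or ambiguous text
   - Available from `raw_probabilities` when `return_all_scores=True`; the demo above does not print them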
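
One way to list the top categories is to sort the score dictionary. The sketch below assumes a plain `Dict[str, float]` like the one stored in `raw_probabilities`; the helper `top_k_labels` is hypothetical and the scores are made-up values, not model output.

```python
from typing import Dict, List, Tuple

def top_k_labels(probabilities: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical normalized scores standing in for result.raw_probabilities
example_scores = {
    "vulnerability": 0.50,
    "malware": 0.20,
    "incident_response": 0.15,
    "phishing": 0.10,
    "compliance": 0.05,
}

print("Top 3 categories:")
for label, score in top_k_labels(example_scores):
    print(f"   {label}: {score:.3f}")
```

With a real result, the same helper could be called as `top_k_labels(result.raw_probabilities)`, provided `classify()` was run with `return_all_scores=True`.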

### Example Output Interpretation
```markdown
Scenario 1:

Text: A critical buffer overflow vulnerability was discovered in t...
Predicted Category: vulnerability
Confidence: 0.486
Probability: 0.664
Top 3 categories:
   vulnerability: 0.664
   malware: 0.201
   incident_response: 0.135
```

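**What this tells us:**
- The normalized score for "vulnerability" is 0.664, the largest share among the labels. It is a relative score, not a calibrated 66.4% certainty.
- Confidence is 0.486, which falls in the "somewhat uncertain" band above; the text could benefit from more context
- Alternative interpretations include malware (20.1%) and incident response (13.5%)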