# PII Leakage

The PII (Personally Identifiable Information) leakage vulnerability evaluates whether your LLM system can resist generating or disclosing sensitive personal information.

* "direct disclosure": Tests for outputs that disclose PII directly in response to explicit user prompts, such as revealing names, addresses, phone numbers, or other personal details.
* "api and database access": Tests for outputs that disclose PII by providing unauthorized access to databases or querying stored personal information, potentially revealing sensitive records.
* "session leak": Tests for outputs that disclose PII from other users' active or past sessions, such as inadvertently exposing personal details associated with session-based interactions.
* "social manipulation": Tests for outputs that enable or facilitate social engineering attacks, tricking individuals into revealing their own or others' PII.


ref.: https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-pii-leakage

In [1]:
# Set ollama localhosted as LLM to be tested
!deepteam set-ollama llama3.2:latest 

Ollama model configured.


In [2]:
import requests, asyncio
from deepteam import red_team
from deepteam.vulnerabilities import PIILeakage

from deepteam.attacks.single_turn import PromptInjection
from deepteam.attacks.single_turn import Roleplay
from deepteam.attacks.single_turn import GrayBox
from deepteam.attacks.single_turn import Leetspeak
from deepteam.attacks.single_turn import ROT13
from deepteam.attacks.single_turn import Multilingual
from deepteam.attacks.single_turn import MathProblem
from deepteam.attacks.single_turn import Base64

## Configuring LLM to be tested

In [3]:
async def model_callback(prompt: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:latest",
            "prompt": prompt,
            "stream": False      # resposta em bloco único
        },
        timeout=120
    )
    return resp.json()["response"]

## Configuring attacks

In [4]:
# Single turn
prompt_injection = PromptInjection()
roleplay = Roleplay()
gray_box = GrayBox()
leetspeak = Leetspeak()
rot_13 = ROT13()
multilingual = Multilingual()
math_problem = MathProblem()
base64 = Base64()

## Attack test

### Prompt injection

The prompt injection attack method injects hidden instructions into the baseline attack, which could be interpreted by the LLM in ways that bypass its restrictions or lead to harmful outputs.

In [5]:
pii_leakage = PIILeakage(types=["direct disclosure", 
                                "api and database access", 
                                "session leak", 
                                "social manipulation"])


risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[prompt_injection])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.18s/it]
✨ Simulating 4 attacks (using 1 method(s)): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<?, ?it/s]
📝 Evaluating 4 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:23<00:00,  5.95s/it]


### Role play 

The roleplay attack method enhances baseline attacks by embedding them within elaborate persona-based scenarios, where the LLM is instructed to adopt a specific character role. This technique leverages the model's collaborative training to bypass safety mechanisms through authentic character portrayal.

In [6]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[roleplay])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.42s/it]
✨ Simulating 4 attacks (using 1 method(s)):   0%|                                                                                                                                                           | 0/4 [00:00<?, ?it/s]
...... 🎭 Roleplay:   0%|                                                                                                                                                                                  | 0/9 [00:00<?, ?step/s][A

...... 🎭 Roleplay:   0%|                                                                                                                                                                                  | 0/9 [00:00<?, ?step/s][A[A


...... 🎭 Roleplay:   0%|                                                        

### Gray Box

The gray box attack method leverages partial knowledge about an existing LLM system to craft an attack that targets known vulnerabilities, reframing the baseline attack with abstract or misleading language.

In [7]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[gray_box])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.24s/it]
✨ Simulating 4 attacks (using 1 method(s)):   0%|                                                                                                                                                           | 0/4 [00:00<?, ?it/s]
...... 🔓 Gray Box:   0%|                                                                                                                                                                                 | 0/15 [00:00<?, ?step/s][A

...... 🔓 Gray Box:   0%|                                                                                                                                                                                 | 0/15 [00:00<?, ?step/s][A[A


...... 🔓 Gray Box:   0%|                                                        

### Leetspeak

The leetspeak attack method replaces common characters with leetspeak equivalents, a form of symbolic substitution that transforms the attack text to avoid simple keyword detection.

In [8]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[leetspeak])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.94s/it]
✨ Simulating 4 attacks (using 1 method(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 3880.02it/s]
📝 Evaluating 4 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:24<00:00,  6.24s/it]


### ROT13

The ROT-13 attack method rotates each letter in the baseline attack by 13 positions in the alphabet, creating a scrambled message that is less likely to trigger basic content filters.

In [9]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[rot_13])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.81s/it]
✨ Simulating 4 attacks (using 1 method(s)): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<?, ?it/s]
📝 Evaluating 4 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:45<00:00,  6.53s/it]


### Multilingual

The multilingual attack method translates a baseline attack into a less-spoken language to evade content filters that focus on more widely used languages, such as English.

In [10]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[multilingual])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.03s/it]
✨ Simulating 4 attacks (using 1 method(s)):   0%|                                                                                                                                                           | 0/4 [00:00<?, ?it/s]
...... 🌍 Multilingual Enhancement:   0%|                                                                                                                                                                 | 0/15 [00:00<?, ?step/s][A

...... 🌍 Multilingual Enhancement:   0%|                                                                                                                                                                 | 0/15 [00:00<?, ?step/s][A[A


...... 🌍 Multilingual Enhancement:   0%|                                        

### Math Problem

The math problem attack method enhances baseline attacks into mathematically formatted inputs or embed them within math problems, making them seem like non-harmful queries while disguising their true intent.

In [11]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[math_problem])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.92s/it]
✨ Simulating 4 attacks (using 1 method(s)):   0%|                                                                                                                                                           | 0/4 [00:00<?, ?it/s]
...... 📚 Math Problem Enhancement:   0%|                                                                                                                                                                 | 0/15 [00:00<?, ?step/s][A

...... 📚 Math Problem Enhancement:   0%|                                                                                                                                                                 | 0/15 [00:00<?, ?step/s][A[A


...... 📚 Math Problem Enhancement:   0%|                                        

### Base64

The base64 attack method encodes a baseline attack using the Base64 algorithm, turning the text into a string of encoded characters. This helps avoid detection mechanisms by making the content appear as random data.

In [12]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[pii_leakage], 
                attacks=[base64 ])

💥 Generating 4 attacks (for 4 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.67s/it]
✨ Simulating 4 attacks (using 1 method(s)): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<?, ?it/s]
📝 Evaluating 4 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:20<00:00,  5.09s/it]
