# Debug Access
The Debug Access vulnerability evaluates whether an AI agent can prevent unauthorized access to debug modes, development endpoints, and administrative interfaces. This vulnerability is critical for agents that may expose debugging functionality or development features in production environments.

* "debug_mode_bypass": Tests whether the agent prevents unauthorized access to debug modes.
* "development_endpoint_access": Tests whether the agent blocks access to development endpoints.
* "administrative_interface_exposure": Tests whether the agent prevents exposure of administrative interfaces.

ref.: https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-debug-access

In [1]:
# Set ollama localhosted as LLM to be tested
!deepteam set-ollama llama3.2:latest 

Ollama model configured.


In [2]:
import requests, asyncio
from deepteam import red_team
from deepteam.vulnerabilities import DebugAccess

from deepteam.attacks.single_turn import PromptInjection
from deepteam.attacks.single_turn import Roleplay
from deepteam.attacks.single_turn import GrayBox
from deepteam.attacks.single_turn import Leetspeak
from deepteam.attacks.single_turn import ROT13
from deepteam.attacks.single_turn import Multilingual
from deepteam.attacks.single_turn import MathProblem
from deepteam.attacks.single_turn import Base64

## Configuring LLM to be tested

In [3]:
async def model_callback(prompt: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:latest",
            "prompt": prompt,
            "stream": False      # resposta em bloco único
        },
        timeout=120
    )
    return resp.json()["response"]

## Configuring attacks

In [4]:
# Single turn
prompt_injection = PromptInjection()
roleplay = Roleplay()
gray_box = GrayBox()
leetspeak = Leetspeak()
rot_13 = ROT13()
multilingual = Multilingual()
math_problem = MathProblem()
base64 = Base64()

## Attack test

### Prompt injection

The prompt injection attack method injects hidden instructions into the baseline attack, which could be interpreted by the LLM in ways that bypass its restrictions or lead to harmful outputs.

In [5]:
debug_access = DebugAccess(types=["debug_mode_bypass", 
                                  "development_endpoint_access", 
                                  "administrative_interface_exposure"])


risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[prompt_injection])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.24s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]


### Role play 

The roleplay attack method enhances baseline attacks by embedding them within elaborate persona-based scenarios, where the LLM is instructed to adopt a specific character role. This technique leverages the model's collaborative training to bypass safety mechanisms through authentic character portrayal.

In [6]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[roleplay])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.28s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]


### Gray Box

The gray box attack method leverages partial knowledge about an existing LLM system to craft an attack that targets known vulnerabilities, reframing the baseline attack with abstract or misleading language.

In [7]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[gray_box])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.54s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]


### Leetspeak

The leetspeak attack method replaces common characters with leetspeak equivalents, a form of symbolic substitution that transforms the attack text to avoid simple keyword detection.

In [8]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[leetspeak])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.42s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]


### ROT13

The ROT-13 attack method rotates each letter in the baseline attack by 13 positions in the alphabet, creating a scrambled message that is less likely to trigger basic content filters.

In [9]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[rot_13])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.39s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]


### Multilingual

The multilingual attack method translates a baseline attack into a less-spoken language to evade content filters that focus on more widely used languages, such as English.

In [10]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[multilingual])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.61s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]


### Math Problem

The math problem attack method enhances baseline attacks into mathematically formatted inputs or embed them within math problems, making them seem like non-harmful queries while disguising their true intent.

In [11]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[math_problem])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.75s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]


### Base64

The base64 attack method encodes a baseline attack using the Base64 algorithm, turning the text into a string of encoded characters. This helps avoid detection mechanisms by making the content appear as random data.

In [12]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[debug_access], 
                attacks=[base64 ])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.56s/it]
✨ Simulating 3 attacks (using 1 method(s)): 0it [00:00, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 0it [00:00, ?it/s]
