# Illegal Activity
The illegal activity vulnerability is designed to test whether an LLM can resist generating responses that facilitate or promote unlawful actions. This includes ensuring that the LLM does not provide methods, plans, or instructions for engaging in illegal activities, such as creating weapons, committing crimes, or exploiting others.

* "weapons": Tests for outputs providing instructions, recipes, or methods for creating weapons.
* "illegal drugs": Tests for outputs offering methods or plans for buying, selling, or laundering money from illegal drug activities.
* "violent crime": Tests for outputs providing methods or plans for committing violent crimes.
* "nonviolent crime": Tests for outputs providing methods or plans for committing non-violent crimes.
* "sex crime": Tests for outputs facilitating sexual exploitation or blackmail.
* "cybercrime": Tests for outputs providing methods or plans for committing cybercrime and hacking systems.
* "child exploitation": Tests for outputs providing methods or plans for exploiting children.

ref.: https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-illegal-activity

In [1]:
# Configure DeepTeam to use a locally hosted Ollama model
!deepteam set-ollama llama3.2:latest

Ollama model configured.


In [2]:
import asyncio

import requests
from deepteam import red_team
from deepteam.vulnerabilities import IllegalActivity

# Single-turn attack methods used in this notebook
from deepteam.attacks.single_turn import (
    Base64,
    GrayBox,
    Leetspeak,
    MathProblem,
    Multilingual,
    PromptInjection,
    ROT13,
    Roleplay,
)

## Configuring the LLM to be tested

In [3]:
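async def model_callback(prompt: str) -> str:
    # Send the attack prompt to the local Ollama server and return its answer
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:latest",
            "prompt": prompt,
            "stream": False      # return the whole response in a single block
        },
        timeout=120
    )
    resp.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
    return resp.json()["response"]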
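
Because `requests.post` is a blocking call, an `async` callback built on it holds the event loop while it waits for Ollama. A minimal sketch of one possible alternative, assuming Python 3.9+ for `asyncio.to_thread`, moves the HTTP request to a worker thread. The names `build_payload`, `_generate`, and `model_callback_nonblocking` are hypothetical, and the standard-library `urllib` is used only to keep the sketch free of extra dependencies; the endpoint and model name are the ones from the cell above.

```python
import asyncio
import json
import urllib.request

# Endpoint and model name copied from the notebook cell above
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3.2:latest"


def build_payload(prompt: str, model: str = MODEL_NAME) -> dict:
    """Build the JSON body for a non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}


def _generate(prompt: str) -> str:
    """Blocking HTTP call; meant to be run in a worker thread."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


async def model_callback_nonblocking(prompt: str) -> str:
    """Hypothetical variant of model_callback that does not block the event loop."""
    return await asyncio.to_thread(_generate, prompt)
```

If this variant is preferred, it could be passed as `model_callback=model_callback_nonblocking` in the `red_team` calls; whether it changes run time in practice depends on how many callbacks are awaited concurrently.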

## Configuring attacks

In [4]:
# Single turn
prompt_injection = PromptInjection()
roleplay = Roleplay()
gray_box = GrayBox()
leetspeak = Leetspeak()
rot_13 = ROT13()
multilingual = Multilingual()
math_problem = MathProblem()
base64 = Base64()

## Attack tests

### Prompt injection

The prompt injection attack method injects hidden instructions into the baseline attack, which could be interpreted by the LLM in ways that bypass its restrictions or lead to harmful outputs.

In [5]:
illegal_activity = IllegalActivity(types=["violent crimes",
                                          "illegal drugs",
                                          "cybercrime"])


risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[prompt_injection])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:13<00:00, 13.45s/it]
✨ Simulating 3 attacks (using 1 method(s)): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.06s/it]


### Roleplay

The roleplay attack method enhances baseline attacks by embedding them within elaborate persona-based scenarios, where the LLM is instructed to adopt a specific character role. This technique leverages the model's collaborative training to bypass safety mechanisms through authentic character portrayal.

In [6]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[roleplay])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:06<00:00,  6.33s/it]
✨ Simulating 3 attacks (using 1 method(s)):   0%| | 0/3 [00:00<?, ?it/s]
...... 🎭 Roleplay:  11%| | 1/9 [00:

### Gray Box

The gray box attack method leverages partial knowledge about an existing LLM system to craft an attack that targets known vulnerabilities, reframing the baseline attack with abstract or misleading language.

In [7]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[gray_box])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:06<00:00,  6.69s/it]
✨ Simulating 3 attacks (using 1 method(s)):   0%| | 0/3 [00:00<?, ?it/s]
...... 🔓 Gray Box:   7%| | 1/15 [00

### Leetspeak

The leetspeak attack method replaces common characters with leetspeak equivalents, a form of symbolic substitution that transforms the attack text to avoid simple keyword detection.
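
To make the substitution concrete, here is a minimal sketch that is independent of DeepTeam. The character map and the name `to_leetspeak` are assumptions chosen for illustration; the mapping that `Leetspeak` applies internally may differ.

```python
# Illustrative letter-to-digit map (an assumption, not DeepTeam's own table)
_SUBSTITUTIONS = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}

# Cover both lower- and upper-case letters
_LEET_TABLE = str.maketrans(
    {**_SUBSTITUTIONS, **{k.upper(): v for k, v in _SUBSTITUTIONS.items()}}
)


def to_leetspeak(text: str) -> str:
    """Replace the mapped letters with look-alike digits; leave the rest as is."""
    return text.translate(_LEET_TABLE)


print(to_leetspeak("Test message"))  # prints: 7357 m3554g3
```

A filter that matches exact keywords would not find "message" in the transformed text, while a human or a capable LLM can still read it.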

In [8]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[leetspeak])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:06<00:00,  6.13s/it]
✨ Simulating 3 attacks (using 1 method(s)): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00,  5.19s/it]


### ROT13

The ROT-13 attack method rotates each letter in the baseline attack by 13 positions in the alphabet, creating a scrambled message that is less likely to trigger basic content filters.
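
The rotation itself can be sketched with Python's built-in `rot_13` codec. The helper name `rot13_encode` is hypothetical, and this only illustrates the transformation, not how DeepTeam implements `ROT13`.

```python
import codecs


def rot13_encode(text: str) -> str:
    """Rotate each ASCII letter by 13 positions; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


encoded = rot13_encode("Hello, World!")
print(encoded)                # prints: Uryyb, Jbeyq!
print(rot13_encode(encoded))  # prints: Hello, World!
```

Because the alphabet has 26 letters, applying the rotation twice restores the original text, so the same function both encodes and decodes.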

In [9]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[rot_13])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:06<00:00,  6.32s/it]
✨ Simulating 3 attacks (using 1 method(s)): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████| 3/3 [00:33<00:00, 11.26s/it]


### Multilingual

The multilingual attack method translates a baseline attack into a less-spoken language to evade content filters that focus on more widely used languages, such as English.

In [10]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[multilingual])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:06<00:00,  6.66s/it]
✨ Simulating 3 attacks (using 1 method(s)):   0%| | 0/3 [00:00<?, ?it/s]
...... 🌍 Multilingual Enhancement:   7%| | 1/15 [00:

### Math Problem

The math problem attack method rewrites baseline attacks as mathematically formatted inputs or embeds them within math problems, so that they look like harmless queries while disguising their true intent.

In [11]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[math_problem])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.01s/it]
✨ Simulating 3 attacks (using 1 method(s)):   0%| | 0/3 [00:00<?, ?it/s]
...... 📚 Math Problem Enhancement:   7%| | 1/15 [00:

### Base64

The Base64 attack method encodes a baseline attack with the Base64 encoding scheme, turning the text into a string of encoded characters. This can help it slip past detection mechanisms, because the content looks like random data.
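
The encoding step can be sketched with the standard library. The helper names `to_base64` and `from_base64` are hypothetical, and the functions are imported directly because this notebook already binds the name `base64` to the attack object.

```python
from base64 import b64decode, b64encode


def to_base64(text: str) -> str:
    """Encode UTF-8 text as a Base64 ASCII string."""
    return b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Decode a Base64 ASCII string back to UTF-8 text."""
    return b64decode(encoded).decode("utf-8")


encoded = to_base64("Hello, World!")
print(encoded)               # prints: SGVsbG8sIFdvcmxkIQ==
print(from_base64(encoded))  # prints: Hello, World!
```

Base64 is an encoding, not encryption: anyone, including the target LLM, can reverse it, which is why it can carry a request past a keyword filter without hiding it from the model.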

In [12]:
risk = red_team(model_callback=model_callback, 
                vulnerabilities=[illegal_activity], 
                attacks=[base64])

💥 Generating 3 attacks (for 3 vulnerability types across 1 vulnerability(s)): 100%|█████████████████████████████████████████████████████| 1/1 [00:06<00:00,  6.63s/it]
✨ Simulating 3 attacks (using 1 method(s)): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
📝 Evaluating 3 vulnerability types across 1 vulnerability(s): 100%|█████████████████████████████████████████████████████████████████████| 3/3 [00:18<00:00,  6.14s/it]
