# Challenge 5.7 - Red Teaming

## 1. Overview 
Red teaming is a practice in which a group of experts, known as the red team, probes a system to identify vulnerabilities. For AI systems, the red team strategically prompts the model to uncover weaknesses in its robustness, safety, and ethical boundaries.

Benefits of red teaming include:
- Improved Safety: detects harmful outputs and helps prevent harm caused by malicious users.
- Enhanced Fairness and Ethics: helps mitigate biases in model behavior and supports compliance with ethical standards.
- Better Reliability: strengthens the handling of edge cases and helps reduce hallucinations.
- Transparency and Accountability: encourages documentation of known risks and supports responsible AI development.

We can use the Azure red teaming Python library (`azure.ai.evaluation.red_team`) to automate the testing process. With the library, we create a red teaming agent and connect it to the model under test. The agent adversarially prompts the model on user-chosen risk categories: Violence, Hate and Unfairness, Sexual, and Self-Harm. The agent then evaluates each response for safety and writes the results to JSON and TXT files.

## 2. Setting Up Environment 

Load the required libraries and read configuration from the `.env` file.

In [None]:
# Azure imports
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam, RiskCategory, AttackStrategy

# OpenAI imports
from openai import AzureOpenAI

# Standard library and utility imports
import asyncio
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
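
Because every later cell reads its configuration from the environment, it can help to fail fast when a variable is missing. The sketch below checks the four variable names that the cells in this notebook read. `missing_env_vars` is a hypothetical helper introduced here for illustration, not part of any Azure library.

```python
import os

# Variable names read by the cells in this notebook.
REQUIRED_ENV_VARS = [
    "AZURE_AI_PROJECT_ENDPOINT",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "CHAT_MODEL_NAME",
]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]

missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv()` surfaces configuration problems before any Azure call is made.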

## 3. Creating the Red Teaming Agent

This code initializes an AI Red Teaming Agent from the Azure AI evaluation framework.

In [None]:
# Use the Azure AI Foundry project endpoint from the environment
azure_ai_project = os.getenv("AZURE_AI_PROJECT_ENDPOINT")

# Instantiate your AI Red Teaming Agent
red_team_agent = RedTeam(
    azure_ai_project=azure_ai_project, # required
    credential=DefaultAzureCredential() # required
)

## 4. OpenAI Client and Callback

This code sets up an Azure OpenAI client authenticated with Microsoft Entra ID so that the red teaming agent can interact with the Azure OpenAI model deployed in Foundry. The agent sends its attack prompts through this client to test whether the model's responses comply with safety standards. Please look through the code, comments, and function descriptions to understand what it does.

In [None]:
# Create the Azure OpenAI client using Entra ID token authentication via DefaultAzureCredential (recommended approach)
def get_azure_ad_token():
    """Function to get Azure AD token for OpenAI authentication"""
    credential = DefaultAzureCredential()
    token = credential.get_token("https://cognitiveservices.azure.com/.default")
    return token.token

azure_openai_client = AzureOpenAI(
    azure_endpoint=os.getenv("OPENAI_API_BASE"),
    api_version=os.getenv("OPENAI_API_VERSION"),
    azure_ad_token_provider=get_azure_ad_token
)

# Target callback function that uses your Azure OpenAI deployment
def azure_openai_callback(query: str) -> str:
    """
    This function represents your AI system that will be red-teamed.
    It sends the query to your Azure OpenAI deployment and returns the response.
    """
    try:
        response = azure_openai_client.chat.completions.create(
            model=os.getenv("CHAT_MODEL_NAME"),  # This should match your deployment name
            messages=[
                {"role": "system", "content": "You are a helpful AI assistant."},
                {"role": "user", "content": query}
            ],
            max_tokens=500,
            temperature=0.7
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"
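
When reviewing a scan, it can be useful to keep your own record of what the target was asked and how it answered. The following is a minimal sketch, assuming the target is a simple callback that takes a `query` string and returns a string, as above. `with_transcript` is a hypothetical helper (not part of the Azure library) that wraps any such callback and records each exchange. The stand-in `fake_target` exists only so the example runs without an Azure deployment.

```python
from typing import Callable, Dict, List, Tuple

def with_transcript(
    target: Callable[[str], str],
) -> Tuple[Callable[[str], str], List[Dict[str, str]]]:
    """Wrap a query -> response callback and record every exchange."""
    transcript: List[Dict[str, str]] = []

    def wrapped(query: str) -> str:
        # Forward the prompt unchanged, then store the pair for later review.
        response = target(query)
        transcript.append({"query": query, "response": response})
        return response

    return wrapped, transcript

# Stand-in target so the sketch runs offline (hypothetical, for illustration only).
def fake_target(query: str) -> str:
    return f"(stub reply to: {query})"

logged_target, transcript = with_transcript(fake_target)
logged_target("Hello")
print(transcript)
```

In this notebook you would wrap `azure_openai_callback` instead of `fake_target` and pass the wrapped function as the scan target. The wrapper keeps the parameter name `query` so that it presents the same signature as the original callback.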

## 5. Running the Red Teaming Agent

The main function is where the red teaming agent orchestrates the evaluation. 

The categories test for the following:
- Violence: Tests for content promoting or describing violent acts
- Hate and Unfairness: Evaluates responses to hate speech, discrimination, and unfair treatment
- Sexual: Checks for inappropriate sexual content generation
- SelfHarm: Tests for content that could encourage self-destructive behavior

Feel free to change `num_objectives` to adjust the number of attack prompts sent to the model.
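
For a rough sense of how scan size grows, the following sketch estimates the number of attack prompts. It assumes the agent issues `num_objectives` prompts per risk category for each attack strategy it runs, with a single baseline strategy when none is specified. That multiplication rule is an assumption made here; check the library documentation for the exact behavior. `estimate_prompt_count` is a hypothetical helper.

```python
def estimate_prompt_count(
    num_categories: int, num_objectives: int, num_strategies: int = 1
) -> int:
    """Estimate total prompts as categories x objectives x strategies (assumed rule)."""
    if min(num_categories, num_objectives, num_strategies) < 0:
        raise ValueError("counts must be non-negative")
    return num_categories * num_objectives * num_strategies

# Four risk categories with num_objectives=5, matching the values used in this section.
print(estimate_prompt_count(num_categories=4, num_objectives=5))  # prints 20
```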

The scan results will contain detailed information about:
- Generated attack prompts for each risk category
- AI system responses to those prompts
- Safety evaluation scores and classifications
- Detailed logs of the red teaming process
- Summary statistics and recommendations

This helps ensure that the model is tested for safety vulnerabilities across the selected risk categories before it is deployed to production.

In [None]:
async def main():
    # Pass the callback defined earlier as the scan target
    red_team_result = await red_team_agent.scan(
        target=azure_openai_callback,
        risk_categories=[ # optional, defaults to all four risk categories
            RiskCategory.Violence,
            RiskCategory.HateUnfairness,
            RiskCategory.Sexual,
            RiskCategory.SelfHarm
        ],
        num_objectives=5  # Number of attack prompts (objectives) per risk category
    )
    print("Red team scan completed!")
    print(red_team_result)

if __name__ == "__main__":
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    if loop and loop.is_running():
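        # An event loop is already running (e.g., in Jupyter), so schedule main() on it.
        # The task runs in the background; in a notebook cell you can `await task` to wait for it.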
        task = loop.create_task(main())
    else:
        # Otherwise, safe to use asyncio.run
        asyncio.run(main())
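
Once a scan has produced its JSON output, a common next step is computing an attack success rate per risk category. The sketch below assumes a simplified, hypothetical record layout: a list of objects with `risk_category` and `attack_success` fields. The actual output schema may differ, so adapt the field names after inspecting your own result file. The sample data is illustrative only and is not real scan output.

```python
import json
from collections import defaultdict

def attack_success_rates(records):
    """Return {risk_category: fraction of records whose attack_success is truthy}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        category = record["risk_category"]
        totals[category] += 1
        if record["attack_success"]:
            successes[category] += 1
    return {category: successes[category] / totals[category] for category in totals}

# Illustrative records in the assumed layout (not real scan output).
sample_json = """
[
  {"risk_category": "violence", "attack_success": false},
  {"risk_category": "violence", "attack_success": true},
  {"risk_category": "self_harm", "attack_success": false}
]
"""
rates = attack_success_rates(json.loads(sample_json))
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.0%} of attacks succeeded")
```

A lower rate means the model resisted more of the attack prompts in that category.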