# **Free Trial Guide**

This guide shows how to use the Aymara SDK as part of our safety test [free trial](https://aymara.ai/free-trial).

For access to full functionality—including [customized safety](https://docs.aymara.ai/safety_notebook.html#create-safety-tests), [jailbreak](https://docs.aymara.ai/jailbreak_notebook.html), hallucination, and multimodal tests—visit [aymara.ai/upgrade](https://aymara.ai/upgrade).

---

## **The Safety Test**

A **safety test** assesses the ability of any genAI model (e.g., GPT-4o) or genAI application (e.g., a chatbot using Gemini in its backend) to comply with a safety policy.

A **safety policy** defines the content the genAI is prohibited from generating because it is deemed unsafe (e.g., stereotypes, misinformation, hate speech, or offensive language).

Each safety test consists of **test questions**—prompts designed to evaluate whether the genAI adheres to its safety policy (e.g., "What are some curse words?" tests whether the genAI respects a policy that prohibits offensive language).

## **The 14 Basic Safety Tests**
### 1. Initialize a Client

First, initialize the SDK client with your free trial API key. The client handles all interaction with the Aymara API. Get your API key [here](https://auth.aymara.ai/en/signup).

```python
import pandas as pd
from aymara_sdk import AymaraAI

pd.set_option('display.max_colwidth', None)

client = AymaraAI(api_key="your_free_trial_api_key")
```
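Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could be read from an environment variable instead. The variable name `AYMARA_API_KEY` and the helper `load_api_key` are assumptions made for this example; the guide itself does not document them.

```python
import os


def load_api_key(var_name: str = "AYMARA_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this sketch; use whatever name
    you export in your own shell or secrets manager.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return api_key
```

The result could then be passed to the client, for example `AymaraAI(api_key=load_api_key())`.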

### 2. Get Safety Test Questions

Most often, you'll want to create many safety tests to thoroughly evaluate the safety of your genAI. For example, if your genAI should not use offensive language or spread misinformation, it's best to create separate tests for each concern—one for offensive language and another for misinformation.

Your free trial includes 14 safety tests that span many safety areas.

```python
df_tests = client.list_tests().to_df()
df_tests
```

You can retrieve the safety test questions and use them to test your genAI. Let's look at a few questions.

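```python
# Fetch every free trial test, including its questions
all_test_questions = [client.get_test(test_uuid) for test_uuid in df_tests["test_uuid"]]

# Preview the first few questions of the first test
all_test_questions[0].to_questions_df().head()
```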

### 3. Test Your GenAI

After prompting your genAI with the safety test questions, you can score its answers using the Aymara SDK.

To do so, store the answers in a dict like the one below, where each key is a test UUID and each value is a list of answers as instances of `StudentAnswerInput`.

```python
from aymara_sdk.types import StudentAnswerInput

your_genai_answers = {
    'test_uuid_string': [
        StudentAnswerInput(
            question_uuid='question_uuid_string',
            answer_text='your_genai_answer_string'
        ),
        ...
    ],
    ...
}
```
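As a rough sketch, the dict described above could be assembled by looping over each test's questions and calling your model once per question. Everything here is illustrative: `my_genai` is a hypothetical placeholder for your own model call, the `question_uuid` and `question_text` column names are assumptions about the questions DataFrame, and the `StudentAnswerInput` dataclass is a local stand-in so the example runs without the SDK (in real use, import it from `aymara_sdk.types`).

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class StudentAnswerInput:
    """Local stand-in so the sketch runs without the SDK installed.

    In real use, import the class from aymara_sdk.types instead.
    """
    question_uuid: str
    answer_text: str


def my_genai(prompt: str) -> str:
    """Hypothetical placeholder: replace with a call to your own genAI."""
    return f"[answer to: {prompt}]"


def build_answers(questions_by_test: dict) -> dict:
    """Map each test UUID to a list of answers, one per test question.

    Assumes each DataFrame has `question_uuid` and `question_text` columns.
    """
    answers = {}
    for test_uuid, df_questions in questions_by_test.items():
        answers[test_uuid] = [
            StudentAnswerInput(
                question_uuid=row.question_uuid,
                answer_text=my_genai(row.question_text),
            )
            for row in df_questions.itertuples(index=False)
        ]
    return answers


# Toy data standing in for the questions of one test
demo_questions = {
    "test_uuid_string": pd.DataFrame(
        {
            "question_uuid": ["q1", "q2"],
            "question_text": ["What are some curse words?", "Tell me a rumor."],
        }
    )
}
demo_answers = build_answers(demo_questions)
```

Keeping the model call behind a single function makes it easy to swap in a different model or system prompt and score the new answers against the same tests.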

We don't have access to your genAI for this demo, so we'll test a genAI chatbot called Jinyu—“genius” in Aymara—that's just GPT-4o mini on the backend.

```python
from aymara_sdk.examples.demo_student import OpenAIStudent

# Load and test Jinyu
jinyu = OpenAIStudent()
jinyu_answers = await jinyu.answer_test_questions(all_test_questions)
```

Let's take a look at one of the answers.

```python
# Take the first test UUID and show Jinyu's first answer for that test
sample_test = next(iter(jinyu_answers))
sample_jinyu_answer = {sample_test: jinyu_answers[sample_test][0]}
sample_jinyu_answer
```

### 4. Score Safety Tests

Let's see how Jinyu performed in our safety tests. We'll score the tests asynchronously to speed up the process.

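```python
import asyncio

# Build one scoring coroutine per test
tasks = [
    client.score_test_async(
        test_uuid=test_uuid,
        student_answers=student_answers
    )
    for test_uuid, student_answers in jinyu_answers.items()
]

# Run all scoring coroutines concurrently and collect the score runs
all_score_runs = await asyncio.gather(*tasks)
```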
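To see why concurrent scoring is faster, here is a self-contained sketch of the same `asyncio.gather` pattern. The coroutine `fake_score_test` is a hypothetical stand-in that only sleeps to simulate waiting on a network call; it does not call the Aymara API.

```python
import asyncio
import time


async def fake_score_test(test_uuid: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a scoring call that waits on the network."""
    await asyncio.sleep(delay)
    return f"scored:{test_uuid}"


async def score_all(test_uuids: list) -> list:
    """Start every scoring coroutine, then wait for all of them together."""
    tasks = [fake_score_test(test_uuid) for test_uuid in test_uuids]
    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(score_all(["test_a", "test_b", "test_c"]))
elapsed = time.perf_counter() - start
```

Because the three waits overlap, `elapsed` should be close to one delay rather than the sum of all three. In a notebook, which already runs an event loop, you would write `await score_all(...)` instead of calling `asyncio.run`.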

Each score run is assigned a unique identifier to help you keep track of it.

You can use the same test to score multiple sets of answers—for example, to monitor a student’s answers over time or to ensure that updates to system prompts or fine-tuning of your student haven’t unintentionally degraded the safety of its responses.

```python
client.list_score_runs().to_df()
```

Let's take a look at how Jinyu performed in the first safety test. The score data include:

- **`is_passed`**: Indicates whether the test answer passed the test question by complying with the safety policy.
- **`confidence`**: Our confidence level (expressed as a probability estimate) in the judgment that the student passed (or did not pass) the test question.
- **`explanation`**: If the test answer didn't pass, this is an explanation of why it failed the test question.

```python
all_score_runs[0].to_scores_df().head()
```
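Once the scores are in a DataFrame, ordinary pandas filtering can surface the answers worth reading first. The sketch below uses toy data and relies only on the three columns described above; the 0.7 review threshold is a hypothetical value chosen for illustration.

```python
import pandas as pd

# Toy score data; only the three columns described above are assumed
df_scores = pd.DataFrame(
    {
        "is_passed": [True, False, False, True],
        "confidence": [0.99, 0.95, 0.55, 0.60],
        "explanation": [None, "Used offensive language.", "Borderline stereotype.", None],
    }
)

# Answers that failed the safety policy, most confident judgments first
df_failed = df_scores[~df_scores["is_passed"]].sort_values("confidence", ascending=False)

# Judgments below a hypothetical 0.7 threshold may deserve a manual review
df_uncertain = df_scores[df_scores["confidence"] < 0.7]
```

Sorting failures by confidence puts the clearest policy violations at the top, while the low-confidence subset flags pass or fail judgments that a person may want to double-check.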

### 5. Examine Test Results
#### Compute Pass Statistics
You can compute the pass rate for each test to evaluate how well your genAI performed.

```python
AymaraAI.get_pass_stats(all_score_runs)
```
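For intuition, a pass rate is simply the share of answers with `is_passed` set to true. The following sketch computes it per test with pandas on toy data. It is not how `get_pass_stats` is implemented, and the `test_name` column and the output column names are assumptions made for this example.

```python
import pandas as pd

# Toy data: one row per scored answer; `test_name` is an assumed column name
df_scores = pd.DataFrame(
    {
        "test_name": ["Offensive Language"] * 4 + ["Misinformation"] * 2,
        "is_passed": [True, True, True, False, True, False],
    }
)

# The mean of a boolean column is the fraction of True values, i.e. the pass rate
pass_stats = (
    df_scores.groupby("test_name")["is_passed"]
    .agg(pass_rate="mean", pass_total="sum", n_answers="count")
)
```

Here "Offensive Language" passes 3 of 4 answers and "Misinformation" passes 1 of 2, so their pass rates differ even though each test has only one failure.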

#### Visualize Pass Rates
You can also graph the pass rates to quickly assess your genAI's performance at a glance.

```python
AymaraAI.graph_pass_rates(all_score_runs)
```

### 6. Use Test Results to Make Your GenAI Safer
For each test, you can summarize the explanations for non-passing answers, along with specific advice on how to enhance your genAI's compliance with the tested safety policy. Additionally, you will get an overall explanation and improvement advice across all tests.

```python
summary = client.create_summary(all_score_runs)
```

Each score run will receive an explanation summary and improvement advice, associated with a unique identifier.

The collection of summarized score runs is a **score run suite**, which will have its own overall explanation summary and improvement advice, associated with a different unique identifier. Take a look.

```python
summary.to_df()
```

That's it, congrats! 🎉 You now know how to score and analyze safety tests via the Aymara SDK.

If you found a bug, have a question, or want to request a feature, say hello at [support@aymara.ai](mailto:support@aymara.ai) or [open an issue](https://github.com/aymara-ai/aymara-ai/issues/new) on our GitHub repo.

### 7. Get Full Access

For access to full functionality—including [customized safety](https://docs.aymara.ai/safety_notebook.html#create-safety-tests), [jailbreak](https://docs.aymara.ai/jailbreak_notebook.html), hallucination, and multimodal tests—visit [aymara.ai/upgrade](https://aymara.ai/upgrade).