# Tutorial 5: Building an Email Search Agent

This tutorial walks through a complete, real-world use case: building and training an agent that can search a database of emails to answer user questions. This example, located in `examples/07_email_search_agent/`, demonstrates many of ToolBrain's advanced features, including custom tools, complex reward functions, and a full experiment workflow.

## The Goal

We will build an agent that can intelligently use two tools, `search_emails` and `read_email`, to navigate the Enron email dataset and find answers to questions like "When is Shari's move to Portland targeted for?"

This is a multi-step reasoning task. The agent must first devise a search query, execute it, read the relevant email(s), and then synthesize an answer.

## Step 1: Setting Up the Environment

First, we need to set up the email database. The Enron email dataset will be downloaded from Hugging Face and loaded into a local SQLite database.

Run the setup script from the project root:

```bash
python -m examples.07_email_search_agent.1_setup_enron_environment
```

This script will create the file `data/enron_emails.db`, which the agent's tools will interact with.
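
As a quick sanity check after setup, you can confirm that the database file contains tables. The sketch below uses only Python's standard `sqlite3` module and makes no assumption about the schema the setup script creates; `list_tables` is a hypothetical helper name, not part of ToolBrain.

```python
import sqlite3


def list_tables(db_path: str) -> list[str]:
    """Return the names of all user tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("data/enron_emails.db")` from the project root should return a non-empty list if the setup script completed. An empty list suggests the download or import step did not finish.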

## Step 2: The Tools of the Trade

The agent's capabilities are defined in `examples/07_email_search_agent/email_tools.py`. It has two primary tools:

1.  **`search_emails(...)`**: This tool performs a keyword search on the email database. It can filter by sender, recipient, and date. It returns a list of matching emails with their `message_id` and a short `snippet`.

2.  **`read_email(message_id: str)`**: This tool takes a `message_id` from a search result and retrieves the full content of that email, including the subject, sender, recipients, and body.

These tools work in tandem, mimicking how a human would use a search engine: search first, then click to read.
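
The search-then-read pattern can be sketched as follows. This is an illustration only, not the code in `email_tools.py`: the table schema, the explicit `conn` parameter, the `keywords` and `max_results` arguments, and the demo rows are all assumptions made so the example runs on an in-memory database.

```python
import sqlite3

# Hypothetical schema for illustration; the real database may differ.
SCHEMA = """
CREATE TABLE emails (
    message_id TEXT PRIMARY KEY,
    subject    TEXT,
    sender     TEXT,
    body       TEXT
);
"""


def search_emails(conn, keywords, sender=None, max_results=10):
    """Return message_id and a short snippet for emails matching every keyword."""
    clauses, params = [], []
    for kw in keywords:
        clauses.append("(subject LIKE ? OR body LIKE ?)")
        params += [f"%{kw}%", f"%{kw}%"]
    if sender is not None:
        clauses.append("sender = ?")
        params.append(sender)
    where = " AND ".join(clauses) if clauses else "1"
    rows = conn.execute(
        f"SELECT message_id, substr(body, 1, 80) FROM emails WHERE {where} "
        "ORDER BY message_id LIMIT ?",
        params + [max_results],
    ).fetchall()
    return [{"message_id": mid, "snippet": snippet} for mid, snippet in rows]


def read_email(conn, message_id):
    """Return the full email as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT message_id, subject, sender, body FROM emails WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("message_id", "subject", "sender", "body"), row))


def build_demo_db():
    """Create an in-memory database with two made-up emails."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO emails VALUES (?, ?, ?, ?)",
        [
            ("<1@demo>", "Portland move", "alice@example.com",
             "The move to Portland is planned for June."),
            ("<2@demo>", "Lunch", "bob@example.com", "Lunch on Friday?"),
        ],
    )
    return conn
```

With this sketch, an agent would call `search_emails(conn, ["portland"])`, pick a `message_id` from the results, and pass it to `read_email` to get the full body. Keeping search results short and fetching full text only on demand limits how much email content has to fit in the model's context at each step.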

## Step 3: The Training Script

The core of this example is `2_run_training_experiments.py`. This script orchestrates the entire training and evaluation process. Let's look at the key parts.

### Configuration (`config.py`)

All important parameters are stored in `config.py`. These include:
- `BASE_MODEL_ID`: The language model to be fine-tuned.
- `JUDGE_MODEL_ID`: The LLM to use for the `art_judge` reward function.
- `GRPO_CONFIG` and `DPO_CONFIG`: Hyperparameters for each learning algorithm.
- `SYSTEM_PROMPT_TEMPLATE`: A detailed prompt that sets the context for the agent.

### Custom Reward Functions (`custom_rewards.py`)

This use case introduces two custom reward functions:

1.  **`reward_f1_score`**: A fast, lexical reward that calculates the F1 score between the agent's answer and the ground truth. It is objective but can be brittle: a correct answer phrased differently from the ground truth shares few words with it and scores low.

2.  **`reward_art_style_judge`**: A more advanced reward function that uses an external LLM (the "judge") to determine whether the agent's answer is semantically correct. It also includes a small bonus for efficiency (fewer turns). Because the judge compares meaning rather than exact wording, it tolerates paraphrased answers that a lexical metric would penalize.
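
To make the two reward styles concrete, here is a minimal sketch of each idea. It is not the code in `custom_rewards.py`: `token_f1` uses a common token-overlap definition of F1, and `judged_reward` takes the judge's verdict as a plain boolean instead of calling an LLM. The function names, `max_turns`, and `bonus_weight` are hypothetical values chosen for illustration.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", reference.lower())
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def judged_reward(
    is_correct: bool,
    num_turns: int,
    max_turns: int = 10,       # assumed turn budget
    bonus_weight: float = 0.1,  # assumed size of the efficiency bonus
) -> float:
    """Reward a judged-correct answer, plus a small bonus for unused turns."""
    if not is_correct:
        return 0.0
    unused_fraction = max(0, max_turns - num_turns) / max_turns
    return 1.0 + bonus_weight * unused_fraction
```

The brittleness of the lexical reward shows up quickly: `token_f1("early June", "the beginning of June")` is only about 0.33 even though the two answers mean the same thing. A judge-based reward would treat them as equivalent, at the cost of an extra model call per episode.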

### Running an Experiment

The script is designed to be run from the command line, allowing you to easily experiment with different settings.

**To train with GRPO and the LLM-as-a-Judge reward:**
```bash
python -m examples.07_email_search_agent.2_run_training_experiments \
    --algorithm GRPO \
    --reward_function art_judge \
    --output_dir ./models/art_e_grpo_art_judge
```

**To train with DPO and the F1 score reward:**
```bash
python -m examples.07_email_search_agent.2_run_training_experiments \
    --algorithm DPO \
    --reward_function f1 \
    --output_dir ./models/art_e_dpo_f1
```
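
The three flags used above could be parsed with `argparse` roughly as shown below. This is a sketch inferred from the commands in this tutorial, not the script's actual parser: the real script may define defaults, accept additional flags, or allow other values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the example commands."""
    parser = argparse.ArgumentParser(
        description="Train the email search agent (illustrative sketch)."
    )
    # The choices below are only the values that appear in this tutorial.
    parser.add_argument("--algorithm", choices=["GRPO", "DPO"], required=True)
    parser.add_argument(
        "--reward_function", choices=["art_judge", "f1"], required=True
    )
    parser.add_argument("--output_dir", required=True)
    return parser
```

Giving each run its own `--output_dir`, as the commands above do, keeps checkpoints and evaluation results from different algorithm and reward combinations separate, which makes later comparison easier.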

During the run, the script will periodically evaluate the agent on a held-out test set and save the results.

## Step 4: Evaluation and Visualization

- **Evaluation**: The script `run_evaluation_art_style.py` provides a standalone way to evaluate a trained agent, faithfully replicating the methodology from the ART-E paper on which this example is based.

- **Plotting**: After training, you can visualize the learning curves using `4_plot_learning_curves.py`.

    ```bash
    python -m examples.07_email_search_agent.4_plot_learning_curves \
        --results_files ./models/art_e_grpo_art_judge/validation_history_art.json \
        --labels "GRPO_with_ART_Judge" \
        --output_image learning_curve.png
    ```

## Step 5: Live Demo with Gradio

Once you have a trained model, you can interact with it live using the Gradio web interface.

```bash
python -m examples.07_email_search_agent.3_gradio_live_demo \
    --model_dir ./models/art_e_grpo_art_judge
```

This will launch a web server with a chat interface where you can ask your trained agent questions about the Enron emails.

---

This email search agent example shows how ToolBrain combines custom tools, custom reward functions, and a repeatable experiment workflow to build a multi-step agent and systematically improve its performance through reinforcement learning.