<a href="https://www.nvidia.com/dli"> <img src="images/nvidia_header.png" style="margin-left: -30px; width: 300px; float: left;"> </a>

> **Deep Dive**: This notebook is part of the deep-dive series that extends [03_Evaluation_Observability_And_Optimization.ipynb](../03_Evaluation_Observability_And_Optimization.ipynb). In the previous notebooks, you were introduced to evaluation, observability, and optimization using a simple math agent. This deep-dive takes those same concepts to production depth using a real-world email phishing analyzer workflow with custom evaluators, advanced profiling, and multi-objective optimization.

# Email Phishing Analyzer Profiling Notebook

Welcome to the companion notebook for profiling the Email Phishing Analyzer workflow. If you just finished the evaluation notebook, this guide picks up where accuracy measurement left off and shows you how to reason about runtime, cost, and scaling characteristics using the NeMo Agent Toolkit (NAT) Profiler.


## Why Profile?

**Profiling goals for phishing detection**
- Reveal which tools and LLM calls dominate latency so you can trim response times before production.
- Forecast future workload characteristics (tokens, runtime) using the profiler's forecasting models.
- Detect concurrency spikes or bottlenecks that could overwhelm shared services.
- Surface prompt fragments worth caching to reduce repeated work and latency.

> **Mental model:** Evaluation tells you "*Did we get the right answer?*" Profiling adds "*How expensive was that answer, and what happens under load?*" Keep both in sync for a resilient workflow.

## Prerequisites & Environment Setup

In [None]:
# Install the phishing workflow package in editable mode
! uv pip install -e .

# Confirm the CLI entry point is available
! nat --version

**Before moving on:**
To run every example in this notebook end-to-end, you need the following API key:

- **NVIDIA Build:** create an NVIDIA Build account and generate a key at https://build.nvidia.com/settings/api-keys.

Then run the cell below:

In [None]:
import getpass
import os

if "NVIDIA_API_KEY" not in os.environ:
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key

## Establish Paths & Helpers

In [None]:
from pathlib import Path
root = Path.cwd()
workflow_dir = root
config_path = workflow_dir / "configs" / "config_profiler.yml"
profile_output_dir = Path("eval_output_with_profiler")
config_path, profile_output_dir

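**Tip:** While iterating, switch `profile_output_dir` to a timestamped folder (e.g., `profile_output_dir = root / ".tmp" / f"phishing_profile_{datetime.now(timezone.utc):%Y%m%d_%H%M%S}"`, with `from datetime import datetime, timezone`) so each profiling run keeps its own snapshot. Set the config's `output_dir` to the same folder, because that setting, not this variable, controls where the profiler writes.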
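A minimal sketch of that tip, using two hypothetical helpers: one builds a UTC-timestamped run folder, the other writes it into the parsed config under `eval.general.output_dir` (the same key this notebook edits later), so the notebook and the profiler agree on where the artifacts live.

```python
from datetime import datetime, timezone
from pathlib import Path


def timestamped_run_dir(base: Path, prefix: str = "phishing_profile") -> Path:
    """Return a per-run folder such as base/phishing_profile_20250101_120000 (UTC)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return base / f"{prefix}_{stamp}"


def point_config_at(config: dict, run_dir: Path) -> dict:
    """Set eval.general.output_dir so the profiler writes into run_dir."""
    config.setdefault("eval", {}).setdefault("general", {})["output_dir"] = str(run_dir)
    return config
```

For example, `profile_output_dir = timestamped_run_dir(root / ".tmp")` followed by `point_config_at(profile_config, profile_output_dir)` before you save the experiment config. The folder itself is created when the run writes to it.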

## Understand Instrumentation Hooks

Profiling works best when the workflow is instrumented. The phishing analyzer already opts into profiling decorators via `framework_wrappers` when registering each tool.


```python
@register_function(
    config_type=EmailPhishingAnalyzerConfig,
    framework_wrappers=[LLMFrameworkEnum.LANGCHAIN],
)
async def email_phishing_analyzer(
    config: EmailPhishingAnalyzerConfig, builder: Builder
) -> Any:
    """Register the email phishing analysis tool."""
```

**What to look for:** A value like `[LLMFrameworkEnum.LANGCHAIN]` confirms NAT will wrap LangChain calls with profiler callbacks. If you extend the workflow with new frameworks (e.g., LlamaIndex or CrewAI), add the appropriate enum to keep instrumentation intact.

> **If you build new tools:** add `framework_wrappers=[LLMFrameworkEnum.LANGCHAIN]` (or the relevant framework enum) in your `@register_function` decorator. That ensures every LLM call emits usage stats into the profiler trace.

## Inspect Profiler Configuration in `config_profiler.yml`

The profiler ships as part of the evaluation configuration. Scroll to the `eval.general.profiler` block to see the toggles enabled for this workflow.

```yaml
# Excerpt from configs/config_profiler.yml
eval:
  general:
    output_dir: eval_output_with_profiler
    verbose: true
    dataset:
      _type: csv
      file_path: data/smaller_test.csv
      id_key: "subject"
      structure:
        question_key: body
        answer_key: label

    profiler:
      token_uniqueness_forecast: true
      workflow_runtime_forecast: true
      compute_llm_metrics: true
      csv_exclude_io_text: true
      prompt_caching_prefixes:
        enable: true
        min_frequency: 0.1
      bottleneck_analysis:
        enable_nested_stack: true
      concurrency_spike_analysis:
        enable: true
        spike_threshold: 7
```

**Interpretation guide:**
- `token_uniqueness_forecast`: trains a lightweight model to predict how many *new* tokens subsequent emails will use, which helps with capacity planning for caching or rate limits.
- `workflow_runtime_forecast`: forecasts runtime to anticipate SLA breaches as inputs scale.
- `compute_llm_metrics`: produces latency/throughput summaries, percentile breakdowns, and per-tool cost stats.
- `csv_exclude_io_text`: keeps the CSV manageable by omitting raw prompt/completion text (flip to `false` if you need them for prompt analysis).
- `prompt_caching_prefixes`: runs PrefixSpan over prompts to suggest KV-cacheable prefixes when they appear in ≥10% of calls.
- `bottleneck_analysis.enable_nested_stack`: surfaces nested stacks showing where time is spent (a tool inside an agent, etc.). Use `enable_simple_stack: true` instead for a flatter summary.
- `concurrency_spike_analysis`: alerts when concurrent calls exceed the specified threshold (7 here). Adjust based on your throughput targets.
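To build intuition for what `min_frequency: 0.1` means, here is a deliberately simplified sketch. It is not PrefixSpan and not the profiler's implementation: the hypothetical `frequent_prefixes` helper only compares one fixed-length leading word prefix per prompt and keeps those whose share of prompts meets the threshold.

```python
from collections import Counter


def frequent_prefixes(prompts: list[str], prefix_words: int, min_frequency: float) -> dict[str, float]:
    """Return leading word-prefixes shared by at least min_frequency of prompts.

    Simplified stand-in for prefix mining: it looks at a single fixed-length
    prefix per prompt instead of mining all frequent subsequences.
    """
    if not prompts:
        return {}
    # Count how many prompts start with each prefix of `prefix_words` words.
    counts = Counter(" ".join(p.split()[:prefix_words]) for p in prompts)
    total = len(prompts)
    return {prefix: n / total for prefix, n in counts.items() if n / total >= min_frequency}
```

A shared system-prompt opening that clears the threshold is the kind of prefix worth keeping in a KV cache; one-off email bodies are not.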

Treat these toggles like building blocks: start with `compute_llm_metrics` for a quick latency snapshot, then layer forecasting or prefix mining once you trust the basics. Each option adds some processing time, so tailoring them to the question at hand keeps profiling runs snappy.

## Optional: Customize Profiler Settings on the Fly

You can clone the base config and tweak profiler parameters without editing the original file.

In [None]:
import yaml
profile_config = yaml.safe_load(config_path.read_text())
profile_config["eval"]["general"]["output_dir"] = "./.tmp/eval/email_phishing_analyzer/profile_run_01"
profile_config["eval"]["general"]["profiler"]["concurrency_spike_analysis"]["spike_threshold"] = 5
profile_config_path = workflow_dir / "configs" / "config_profile_experiment.yml"
profile_config_path.write_text(yaml.safe_dump(profile_config, sort_keys=False))  # keep the original key order for readable diffs
profile_config_path

**Good practice:** Treat profiling configs like experiments. Check them into version control with clear names so you can compare runs later (`profile_run_{date}_{change}`).
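When comparing runs, it helps to record exactly which settings differ between the base config and an experiment. A minimal sketch with a hypothetical `diff_configs` helper that walks two parsed YAML dictionaries and returns dotted paths; note that it treats a missing key and an explicit `null` as equal.

```python
from typing import Any


def diff_configs(base: Any, experiment: Any, path: str = "") -> list[tuple[str, Any, Any]]:
    """Recursively list (dotted_path, base_value, experiment_value) for every difference."""
    if isinstance(base, dict) and isinstance(experiment, dict):
        changes: list[tuple[str, Any, Any]] = []
        # Visit the union of keys in a stable order so the output is deterministic.
        for key in sorted(set(base) | set(experiment), key=str):
            child = f"{path}.{key}" if path else str(key)
            changes.extend(diff_configs(base.get(key), experiment.get(key), child))
        return changes
    return [] if base == experiment else [(path, base, experiment)]
```

For the experiment above, `diff_configs(yaml.safe_load(config_path.read_text()), profile_config)` should report the `output_dir` and `spike_threshold` changes, which makes a good commit message for the new config.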

While cloning configs, adjust logging thresholds or telemetry sinks as well. Profiling often uncovers long-tail events, and richer logs will help you connect a latency spike back to the original agent thoughts or tool inputs.

## Run the Workflow with Profiling Enabled

In [None]:
! nat eval --config_file configs/config_profiler.yml

Expect the CLI to log both evaluation scores and profiler status updates. With `verbose: true`, you will see when forecasting models train and when reports are written.

> **Time-saving tip:** When iterating quickly, limit the dataset via `dataset.limit` (see the evaluation doc) so profiling runs faster while you dial in settings.
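Another way to shrink a run, which does not depend on any particular dataset option, is to write a trimmed copy of the CSV and point `dataset.file_path` at it. A minimal sketch with a hypothetical `write_dataset_sample` helper; it uses the `csv` module rather than line slicing because email bodies can contain embedded newlines.

```python
import csv
from pathlib import Path


def write_dataset_sample(src: Path, dst: Path, n_rows: int) -> int:
    """Copy the header plus the first n_rows data rows of src into dst; return rows written."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # header row (e.g., subject, body, label)
        written = 0
        for row in reader:
            if written >= n_rows:
                break
            writer.writerow(row)
            written += 1
    return written
```

For example, write five rows from `data/smaller_test.csv` into a hypothetical `data/tiny_test.csv`, then set `profile_config["eval"]["general"]["dataset"]["file_path"] = "data/tiny_test.csv"` in your experiment config.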

If you ever need to run profiling against a remote workflow (`nat eval --endpoint ...`), the profiler works the same way: evaluation still happens client-side, so make sure your local machine has access to the generated artifacts.

## Navigate the Profiler Output Directory

In [None]:
sorted(profile_output_dir.iterdir())

You should see files similar to:
- `all_requests_profiler_traces.json`
- `standardized_data_all.csv`
- `inference_optimization.json`
- `workflow_profiling_metrics.json` and `gantt_chart.png` (both loaded later in this notebook)
- Optional reports (e.g., `bottleneck_analysis_report.json`, `concurrency_spikes.json`, `prompt_prefixes.json`) depending on enabled toggles.

If the directory is empty, double-check you ran the evaluation command against the correct config and that the workflow produced at least one successful run.

A quick sanity check is to open `profile_output_dir / "standardized_data_all.csv"` and ensure there are rows for each tool you expected to fire. Zero rows often means the workflow short-circuited before hitting the instrumented functions.
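That sanity check can be automated. A minimal sketch with a hypothetical `missing_components` helper; the column name `tool_name` is an assumption, so inspect `df.columns` (loaded in the next section) and pass the right one.

```python
import pandas as pd


def missing_components(df: pd.DataFrame, expected: set[str], column: str = "tool_name") -> set[str]:
    """Return the expected tool names that never appear in `column` of the profiler CSV."""
    if column not in df.columns:
        raise KeyError(f"column {column!r} not found; available: {list(df.columns)}")
    seen = set(df[column].dropna().astype(str))
    return expected - seen
```

For this workflow you might check `{"sensitive_info_detector", "intent_classifier", "phishing_risk_aggregator"}`; a non-empty result points at a tool that never fired or was not instrumented.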

## Explore Raw Traces

`all_requests_profiler_traces.json` captures every instrumented call (LLM/tool) with timestamps, token counts, and metadata.

In [None]:
import json
trace_path = profile_output_dir / "all_requests_profiler_traces.json"
traces = json.loads(trace_path.read_text())
print(json.dumps(traces[0], indent=4))

**Use cases:**
- Validate that each tool call is captured (look for `sensitive_info_detector`, `intent_classifier`, etc.).
- Investigate spikes by filtering on `metrics.latency_ms` or `metrics.tokens`.
- Feed these traces into custom dashboards if you need deeper visualization.

Each trace event also includes `inputs` and `outputs` metadata (sanitized if `csv_exclude_io_text` is true). Use it to correlate long-running calls with specific prompts or payload sizes; this is an invaluable clue when you suspect oversized inputs are slowing things down.
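Filtering on `metrics.latency_ms` is easier with a small helper that reads nested fields by dotted path. A minimal sketch, assuming you have already flattened `traces` into a list of event dictionaries and that a numeric latency lives at the path named in the use-case list above; print one event first to confirm the real field names. Both helpers are hypothetical.

```python
from typing import Any


def get_path(record: dict, dotted: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'metrics.latency_ms' through nested dicts."""
    current: Any = record
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def slowest_events(events: list[dict], field: str = "metrics.latency_ms", top_n: int = 5) -> list[dict]:
    """Return up to top_n events with the largest numeric value at `field`."""
    scored = [e for e in events if isinstance(get_path(e, field), (int, float))]
    return sorted(scored, key=lambda e: get_path(e, field), reverse=True)[:top_n]
```

Events without the field are skipped rather than treated as zero, so a wrong path shows up as an empty result instead of a misleading ranking.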

## Analyze Standardized CSV Data

`standardized_data_all.csv` is a tabular view ideal for quick pandas analysis.

In [None]:
import pandas as pd
csv_path = profile_output_dir / "standardized_data_all.csv"
df = pd.read_csv(csv_path)

# Inspect columns relevant to latency & tokens
df.head()

**Insights to extract:**
- High average latency in `phishing_risk_aggregator` may suggest prompt optimization or a smaller LLM.
- Large `prompt_tokens` for `sensitive_info_detector` hints at multi-turn agent traces that you might truncate or compress.
- Compare `latency_ms` percentiles (90th vs 99th) using `df.latency_ms.quantile([0.9, 0.99])` to size buffers for worst-case scenarios.
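The percentile bullet assumes a ready-made `latency_ms` column. If your CSV instead records separate start and end events, one way to derive durations is to pair them by id. A minimal sketch with a hypothetical `component_latency_percentiles` helper; the column names (`UUID`, `event_timestamp`, `event_type`, `tool_name`), the `_START`/`_END` suffix convention, and numeric timestamps are all assumptions, so check `df.columns` and adjust the parameters.

```python
import pandas as pd


def component_latency_percentiles(
    df: pd.DataFrame,
    id_col: str = "UUID",
    time_col: str = "event_timestamp",
    type_col: str = "event_type",
    name_col: str = "tool_name",
    quantiles: tuple[float, ...] = (0.9, 0.99),
) -> pd.DataFrame:
    """Pair *_START/*_END rows by id, compute durations, and summarize per component."""
    starts = df[df[type_col].str.endswith("_START", na=False)].set_index(id_col)
    ends = df[df[type_col].str.endswith("_END", na=False)].set_index(id_col)
    # Inner join keeps only calls that have both a start and an end event.
    joined = starts[[name_col, time_col]].join(ends[[time_col]], rsuffix="_end", how="inner")
    joined["duration"] = joined[f"{time_col}_end"] - joined[time_col]
    # One row per component, one column per requested quantile.
    return joined.groupby(name_col)["duration"].quantile(list(quantiles)).unstack()
```

Rows whose `name_col` is empty (for example, LLM events if they are labeled in a different column) are dropped by `groupby`; rerun with another `name_col` to summarize those.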

Because the CSV is standardized, you can combine multiple runs into a single DataFrame with a `run_id` column and build lightweight dashboards (box plots per component, token histograms, etc.) directly in pandas or your BI tool of choice.
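A minimal sketch of that multi-run view, using a hypothetical `load_runs` helper that reads `standardized_data_all.csv` from each run folder and tags every row with a `run_id`:

```python
from pathlib import Path

import pandas as pd


def load_runs(run_dirs: dict[str, Path], filename: str = "standardized_data_all.csv") -> pd.DataFrame:
    """Concatenate the standardized CSV from several profiler runs, tagging rows with run_id."""
    frames = [pd.read_csv(Path(d) / filename).assign(run_id=run_id) for run_id, d in run_dirs.items()]
    return pd.concat(frames, ignore_index=True)
```

For example, `load_runs({"baseline": Path("eval_output_with_profiler"), "threshold_5": Path(".tmp/eval/email_phishing_analyzer/profile_run_01")})`, followed by a `groupby("run_id")` on whichever token or latency columns your CSV contains.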


## Optimization Reports

The profiler synthesizes higher-level insights in JSON summaries.

In [None]:
opt_path = profile_output_dir / "inference_optimization.json"
with opt_path.open() as f:
    optimization_report = json.load(f)
optimization_report.keys()

In [None]:
runtime_stats = optimization_report["workflow_runtimes"]
print(f"The p90 workflow runtime was {runtime_stats['p90']} seconds")

The optimization report also records confidence intervals and percentiles for LLM latency and agent request throughput. Run the cell below to explore those values. 

In [None]:
optimization_report['confidence_intervals']

## Concurrency Insights

When `bottleneck_analysis.enable_nested_stack` is on, the profiler also analyzes how latency relates to the number of calls running concurrently during the evaluation, giving you the latency and concurrency profile of your workflow. Run the cell below to load the results of the stack and concurrency analysis.

In [None]:
import json
metrics_path = profile_output_dir / "workflow_profiling_metrics.json"
with metrics_path.open() as f:
    analysis_report = json.load(f)

analysis_report.keys()

In [None]:
print(f"""The average LLM latency at each concurrency level observed in the workflow was
{analysis_report['concurrency_spike_analysis']['average_latency_by_concurrency']}""")

In [None]:
# Explore other parts of the analysis report here

## Viewing Your Agent Execution

The profiler also produces a Gantt chart that you can use to visualize the execution of your agent to visually spot bottlenecks or understand performance. 

In [None]:
from IPython.display import Image
img_path = profile_output_dir / "gantt_chart.png"
Image(filename=img_path)

## Take Action on Profiling Insights

- **Reduce latency:** If a single tool dominates runtime, experiment with a smaller LLM (`llama_3_70`), tighten prompts, or reduce unnecessary intermediate calls.
- **Manage concurrency:** Add `max_concurrency` to your config or throttle tool usage when spikes exceed infrastructure limits.
- **Cache smartly:** Use `prompt_prefixes.json` output to configure KV caching. Combine with `token_uniqueness_forecast` to prioritize caches where new tokens are rare.
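Before picking a `max_concurrency` value, it can help to confirm how many calls were actually in flight at once. A minimal sketch with a hypothetical `peak_concurrency` helper that runs a sweep line over `(start, end)` timestamp pairs; extracting those pairs from your traces or CSV depends on field names, so that step is left to you.

```python
def peak_concurrency(intervals: list[tuple[float, float]]) -> int:
    """Return the maximum number of (start, end) intervals in flight at once.

    Ends are processed before starts at the same timestamp, so back-to-back
    calls are not counted as overlapping.
    """
    events = [(start, 1) for start, _ in intervals] + [(end, -1) for _, end in intervals]
    events.sort()  # at equal times, -1 (end) sorts before +1 (start)
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Compare the result with the `spike_threshold` from your config (7 in the excerpt above); if the peak regularly sits well above what your shared services tolerate, that is the signal to cap concurrency.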