From ed239d066051cf3e69ee9ff76ff3938b8e90ccd8 Mon Sep 17 00:00:00 2001
From: Hemang <hemang@invariantlabs.ai>
Date: Sun, 8 Dec 2024 22:15:26 +0100
Subject: [PATCH 1/5] Add documentation for Langgraph and Swarm examples from
 testing.

---
 docs/testing/Examples/langgraph.md | 149 +++++++++++++++++++++++++++++
 docs/testing/Examples/swarm.md     | 120 +++++++++++++++++++++++
 2 files changed, 269 insertions(+)
 create mode 100644 docs/testing/Examples/langgraph.md
 create mode 100644 docs/testing/Examples/swarm.md
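
Note on the second LangGraph test: it keeps only the assistant messages that carry no tool calls, since those are the final answers returned to the caller. Below is a minimal plain-Python sketch of that selection on hand-written message dicts. The dicts and the `final_responses` helper are illustrative assumptions, not output of `TraceFactory.from_langgraph` and not part of the `testing` API.

```python
# Hand-written stand-ins for trace messages (assumed shape, for illustration only).
messages = [
    {"role": "user", "content": "what is the weather in sf"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"function": {"name": "_find_weather"}}],
    },
    {"role": "tool", "content": "60 degrees and foggy"},
    {
        "role": "assistant",
        "content": "The weather in San Francisco is 60 degrees and foggy.",
    },
]


def final_responses(msgs):
    """Keep assistant messages that carry no tool calls (hypothetical helper)."""
    return [
        m
        for m in msgs
        if m["role"] == "assistant" and m.get("tool_calls") is None
    ]


responses = final_responses(messages)
print(len(responses), responses[0]["content"])
```

The predicate mirrors the `lambda m: m.get("tool_calls") is None` passed to `F.filter` in the documented test; only the list comprehension is swapped in for `F.filter`.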
diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md
new file mode 100644
index 0000000..8ac1301
--- /dev/null
+++ b/docs/testing/Examples/langgraph.md
@@ -0,0 +1,149 @@
+---
+title: LangGraph
+---
+
+# Intro
+
+LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling.
+
+## Agent code
+
+You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/langgraph/weather_agent/weather_agent.py).
+
+This can be invoked as:
+
+```python
+from langchain_core.messages import HumanMessage
+
+from .weather_agent import WeatherAgent
+
+invocation_response = WeatherAgent().get_graph().invoke(
+    {"messages": [HumanMessage(content="what is the weather in sf")]},
+    config={"configurable": {"thread_id": 42}},
+)
+```
+
+
+## Running example tests
+
+You can run the example tests discussed on this page by running the following command in the root of the repository:
+
+```bash
+poetry run invariant test sample_tests/langgraph/weather_agent/test_weather_agent.py --push --dataset_name langgraph_weather_agent
+```
+
+!!! note
+
+    If you want to run the example without sending the results to the Explorer UI, you can omit the `--push` flag. You will still see the failing parts of the trace
+    highlighted in the terminal.
+
+## Unit tests
+
+### Test 1:
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/1" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_weather_agent_with_only_sf(weather_agent):
+    """Test the weather agent with San Francisco."""
+    invocation_response = weather_agent.invoke(
+        {"messages": [HumanMessage(content="what is the weather in sf")]},
+        config={"configurable": {"thread_id": 42}},
+    )
+
+    trace = TraceFactory.from_langgraph(invocation_response)
+
+    with trace.as_context():
+        find_weather_tool_calls = trace.tool_calls(name="_find_weather")
+        assert_true(F.len(find_weather_tool_calls) == 1)
+        assert_true(
+            find_weather_tool_calls[0]["function"]["arguments"].contains(
+                "San Francisco"
+            )
+        )
+
+        find_weather_tool_outputs = trace.messages(role="tool")
+        assert_true(F.len(find_weather_tool_outputs) == 1)
+        assert_true(
+            find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
+        )
+
+        assert_true(trace.messages(-1)["content"].contains("60 degrees and foggy"))
+```
+
+We first use the `tool_calls()` method to retrieve all tool calls where the name is `_find_weather`, and we assert that there is exactly one such call. We also verify that the argument passed to the tool call includes `San Francisco`.
+
+Next, we use the `messages()` method with the `role="tool"` filter to check the output of the `_find_weather` tool call, ensuring that the content of this output contains our desired answer.
+
+Finally, we confirm that the last message also includes our desired answer.
+
+### Test 2:
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/2" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_weather_agent_with_sf_and_nyc(weather_agent):
+    """Test the weather agent with San Francisco and New York City."""
+    _ = weather_agent.invoke(
+        {"messages": [HumanMessage(content="what is the weather in sf")]},
+        config={"configurable": {"thread_id": 41}},
+    )
+    invocation_response = weather_agent.invoke(
+        {"messages": [HumanMessage(content="what is the weather in nyc")]},
+        config={"configurable": {"thread_id": 41}},
+    )
+
+    trace = TraceFactory.from_langgraph(invocation_response)
+
+    with trace.as_context():
+        find_weather_tool_calls = trace.tool_calls(name="_find_weather")
+        assert_true(len(find_weather_tool_calls) == 2)
+        find_weather_tool_call_args = str(
+            F.map(lambda x: x["function"]["arguments"], find_weather_tool_calls)
+        )
+        assert_true(
+            "San Francisco" in find_weather_tool_call_args
+            and "New York City" in find_weather_tool_call_args
+        )
+
+        find_weather_tool_outputs = trace.messages(role="tool")
+        assert_true(F.len(find_weather_tool_outputs) == 2)
+        assert_true(
+            find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
+        )
+        assert_true(
+            find_weather_tool_outputs[1]["content"].contains("90 degrees and sunny")
+        )
+
+        assistant_response_messages = F.filter(
+            lambda m: m.get("tool_calls") is None, trace.messages(role="assistant")
+        )
+        assert_true(len(assistant_response_messages) == 2)
+        assert_true(
+            assistant_response_messages[0]["content"].contains(
+                "weather in San Francisco is"
+            )
+        )
+        assert_true(
+            assistant_response_messages[1]["content"].contains(
+                "weather in New York City is"
+            )
+        )
+```
+In this test, we use `F.map` to extract the arguments of the tool calls from the list of tool calls. We then assert that both our queries are present in the arguments list.
+
+There are two types of messages with `role="assistant"`: those where tool calls are made and those corresponding to the final response back to the caller. We use `F.filter` to keep only the assistant messages whose `tool_calls` is `None`, that is, the final responses. Finally, we assert that these response messages contain the results of the weather queries.
+
+## Conclusion
+
+We have seen how to write unit tests for specific test cases when building an agent with the LangGraph library.
\ No newline at end of file
diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md
new file mode 100644
index 0000000..3966499
--- /dev/null
+++ b/docs/testing/Examples/swarm.md
@@ -0,0 +1,120 @@
+---
+title: OpenAI Swarm
+---
+
+# Intro
+
+OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.
+
+## Agent code
+You can view the agent code [here](sample_tests/swarm/capital_finder_agent/capital_finder_agent.py).
+
+This can be invoked as:
+
+```python
+from invariant.testing import SwarmWrapper
+from swarm import Swarm
+
+from .capital_finder_agent import create_agent
+
+swarm_wrapper = SwarmWrapper(Swarm())
+agent = create_agent()
+messages = [{"role": "user", "content": "What is the capital of France?"}]
+response = swarm_wrapper.run(
+    agent=agent,
+    messages=messages,
+)
+```
+
+SwarmWrapper is a lightweight wrapper around the Swarm class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged.
+
+## Running example tests
+
+You can run the example tests discussed on this page by running the following command in the root of the repository:
+
+```bash
+poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
+```
+
+!!! note
+
+    If you want to run the example without sending the results to the Explorer UI, you can omit the `--push` flag. You will still see the failing parts of the trace
+    highlighted in the terminal.
+
+## Unit tests
+
+### Test 1: Capital is correctly returned by the Agent
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/1" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_capital_finder_agent_when_capital_found(swarm_wrapper):
+    """Test the capital finder agent when the capital is found."""
+    agent = create_agent()
+    messages = [{"role": "user", "content": "What is the capital of France?"}]
+    response = swarm_wrapper.run(
+        agent=agent,
+        messages=messages,
+    )
+    trace = SwarmWrapper.to_invariant_trace(response)
+
+    with trace.as_context():
+        get_capital_tool_calls = trace.tool_calls(name="get_capital")
+        assert_true(F.len(get_capital_tool_calls) == 1)
+        assert_equals(
+            "France", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
+        )
+
+        assert_true(trace.messages(-1)["content"].contains("Paris"))
+```
+
+We first use the `tool_calls()` method to retrieve all tool calls where the name is `get_capital`. Then, we assert that there is exactly one such tool call. We also assert that the argument `country_name` passed to the tool call is `France`. Additionally, we verify that the last message contains `Paris`, our desired answer.
+
+### Test 2: Capital is not found by the Agent
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/2" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_capital_finder_agent_when_capital_not_found(swarm_wrapper):
+    """Test the capital finder agent when the capital is not found."""
+    agent = create_agent()
+    messages = [{"role": "user", "content": "What is the capital of Spain?"}]
+    response = swarm_wrapper.run(
+        agent=agent,
+        messages=messages,
+    )
+    trace = SwarmWrapper.to_invariant_trace(response)
+
+    with trace.as_context():
+        get_capital_tool_calls = trace.tool_calls(name="get_capital")
+        assert_true(F.len(get_capital_tool_calls) == 1)
+        assert_equals(
+            "Spain", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
+        )
+
+        tool_outputs = trace.tool_outputs(tool_name="get_capital")
+        assert_true(F.len(tool_outputs) == 1)
+        assert_true(tool_outputs[0]["content"].contains("not_found"))
+
+        assert_false(trace.messages(-1)["content"].contains("Madrid"))
+```
+
+We use the `tool_calls()` method to retrieve all calls with the name `get_capital`, asserting that there is exactly one such call and that the argument `country_name` is `Spain`.
+
+Next, we use the `tool_outputs()` method to check the outputs for `get_capital` calls, confirming that the call returned `not_found`, as the agent's local dictionary of country-to-capital mappings does not include `Spain`.
+
+Finally, we verify that the last message does not contain `Madrid`, consistent with the absence of `Spain` in the agent's limited mapping.
+
+## Conclusion
+
+We have seen how to write unit tests for specific test cases when building an agent with the Swarm framework.
\ No newline at end of file

From ccf7ba253c7f0e9147884fa123a8950ffe170f37 Mon Sep 17 00:00:00 2001
From: Hemang <hemang@invariantlabs.ai>
Date: Mon, 9 Dec 2024 10:31:43 +0100
Subject: [PATCH 2/5] Since contains ignore case by default, update the tests.

---
 docs/testing/Examples/langgraph.md | 2 +-
 docs/testing/Examples/swarm.md     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
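
Note on why the expected strings can change case: the commit subject states that `contains` ignores case by default. Below is a minimal sketch of what case-insensitive substring matching looks like in plain Python. The `contains_ignore_case` helper is a hypothetical illustration and is not the `testing` library's implementation.

```python
def contains_ignore_case(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack, ignoring case (hypothetical helper)."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return needle.casefold() in haystack.casefold()


# Both casings match, so "Paris"/"paris" and
# "San Francisco"/"San francisco" are interchangeable under such a check.
print(contains_ignore_case("The capital of France is Paris.", "paris"))
print(contains_ignore_case('{"city": "San Francisco"}', "San francisco"))
```

Under this assumption the updated assertions match the same traces as before; the lowercase spellings only make the case-insensitivity visible in the docs.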

diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md
index 8ac1301..ca24b6c 100644
--- a/docs/testing/Examples/langgraph.md
+++ b/docs/testing/Examples/langgraph.md
@@ -63,7 +63,7 @@ def test_weather_agent_with_only_sf(weather_agent):
         assert_true(F.len(find_weather_tool_calls) == 1)
         assert_true(
             find_weather_tool_calls[0]["function"]["arguments"].contains(
-                "San Francisco"
+                "San francisco"
             )
         )
 
diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md
index 3966499..6409f4d 100644
--- a/docs/testing/Examples/swarm.md
+++ b/docs/testing/Examples/swarm.md
@@ -70,7 +70,7 @@ def test_capital_finder_agent_when_capital_found(swarm_wrapper):
             "France", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
         )
 
-        assert_true(trace.messages(-1)["content"].contains("Paris"))
+        assert_true(trace.messages(-1)["content"].contains("paris"))
 ```
 
 We first use the `tool_calls()` method to retrieve all tool calls where the name is `get_capital`. Then, we assert that there is exactly one such tool call. We also assert that the argument `country_name` passed to the tool call is `France`. Additionally, we verify that the last message contains `Paris`, our desired answer.

From 701a7fc809c20684c37954b0c46d826fa2ce54d1 Mon Sep 17 00:00:00 2001
From: Hemang <hemang@invariantlabs.ai>
Date: Tue, 10 Dec 2024 16:01:06 +0100
Subject: [PATCH 3/5] Update the docs to include the pip command.

---
 docs/testing/Examples/langgraph.md | 7 +++++++
 docs/testing/Examples/swarm.md     | 9 ++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md
index ca24b6c..7382b39 100644
--- a/docs/testing/Examples/langgraph.md
+++ b/docs/testing/Examples/langgraph.md
@@ -6,6 +6,13 @@ title: LangGraph
 
 LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling.
 
+## Setup
+To use `langgraph`, you need to install the corresponding package:
+
+```bash
+pip install langgraph
+```
+
 ## Agent code
 
 You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/langgraph/weather_agent/weather_agent.py).
diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md
index 6409f4d..d3018f2 100644
--- a/docs/testing/Examples/swarm.md
+++ b/docs/testing/Examples/swarm.md
@@ -6,13 +6,20 @@ title: OpenAI Swarm
 
 OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.
 
+## Setup
+To use `Swarm`, you need to install the corresponding package:
+
+```bash
+pip install openai-swarm
+```
+
 ## Agent code
 You can view the agent code [here](sample_tests/swarm/capital_finder_agent/capital_finder_agent.py).
 
 This can be invoked as:
 
 ```python
-from invariant.testing import SwarmWrapper
+from invariant.wrappers.swarm_wrapper import SwarmWrapper
 from swarm import Swarm
 
 from .capital_finder_agent import create_agent

From b476f13a3a3bb870a1db094b2c2d3836c2899254 Mon Sep 17 00:00:00 2001
From: Hemang <hemang@invariantlabs.ai>
Date: Tue, 10 Dec 2024 16:57:25 +0100
Subject: [PATCH 4/5] Use argument() instead of [function][arguments]

---
 docs/testing/Examples/langgraph.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
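
Note on the pattern this change touches: the test maps each tool call to its arguments, stringifies the result, and then checks that both city names appear. Below is a rough sketch of that map-then-search check, assuming `argument()` with no parameters returns all arguments of a call. `FakeToolCall` and the `city` key are hypothetical stand-ins, not the library's classes or the agent's real argument name.

```python
from dataclasses import dataclass, field


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for a tool call object; not the library's class."""

    name: str
    arguments: dict = field(default_factory=dict)

    def argument(self):
        # Assumed behavior: with no key given, return every argument of the call.
        return self.arguments


calls = [
    FakeToolCall("_find_weather", {"city": "San Francisco"}),
    FakeToolCall("_find_weather", {"city": "New York City"}),
]

# Map each call to its arguments, then search the string form for both cities.
args_text = str([call.argument() for call in calls])
print("San Francisco" in args_text and "New York City" in args_text)
```

Searching the stringified list keeps the check independent of the order in which the two calls were made.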

diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md
index 7382b39..ec31179 100644
--- a/docs/testing/Examples/langgraph.md
+++ b/docs/testing/Examples/langgraph.md
@@ -116,7 +116,7 @@ def test_weather_agent_with_sf_and_nyc(weather_agent):
         find_weather_tool_calls = trace.tool_calls(name="_find_weather")
         assert_true(len(find_weather_tool_calls) == 2)
         find_weather_tool_call_args = str(
-            F.map(lambda x: x["function"]["arguments"], find_weather_tool_calls)
+            F.map(lambda x: x.argument(), find_weather_tool_calls)
         )
         assert_true(
             "San Francisco" in find_weather_tool_call_args

From 746b75f1413388fc66fd26dd55e0fb66f204f107 Mon Sep 17 00:00:00 2001
From: Luca Beurer-Kellner <lucabeurerkellner@gmail.com>
Date: Tue, 10 Dec 2024 18:00:10 +0100
Subject: [PATCH 5/5] unify + minor tweaks

---
 docs/testing/Examples/langgraph.md | 14 +++++++++++++-
 docs/testing/Examples/swarm.md     | 11 ++++++++++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md
index ec31179..dc184f2 100644
--- a/docs/testing/Examples/langgraph.md
+++ b/docs/testing/Examples/langgraph.md
@@ -2,7 +2,11 @@
 title: LangGraph
 ---
 
-# Intro
+# LangGraph Agents
+
+<div class="subtitle">
+Write tests for your <code>langgraph</code> applications.
+</div>
 
 LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling.
 
@@ -46,6 +50,14 @@ poetry run invariant test sample_tests/langgraph/weather_agent/test_weather_agen
 
 ## Unit tests
 
+We can now use `testing` to assess the correctness of our agent. We will write two tests that verify different properties of the agent's behavior. Specifically, we want to verify that:
+
+1. The agent can correctly answer a query about the weather in San Francisco.
+
+2. The agent can correctly answer queries when asked about both the weather in San Francisco and New York City.
+
+For this, we will use `TraceFactory` to create traces from the invocation response and then use the corresponding `Trace` methods to examine the resulting runtime traces.
+
 ### Test 1:
 
 <div class='tiles'>
diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md
index d3018f2..6a881ca 100644
--- a/docs/testing/Examples/swarm.md
+++ b/docs/testing/Examples/swarm.md
@@ -2,7 +2,11 @@
 title: OpenAI Swarm
 ---
 
-# Intro
+# Swarm Agents
+
+<div class="subtitle">
+Test your OpenAI <code>swarm</code> agents.
+</div>
 
 OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.
 
@@ -50,6 +54,11 @@ poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_f
 
 ## Unit tests
 
+We can now use `testing` to assess the correctness of our agent. We will write two tests that verify different properties of the agent's behavior. Specifically, we want to verify that:
+
+1. The agent can correctly answer a query about the capital of France.
+2. The agent correctly handles the case where a capital cannot be determined.
+
 ### Test 1: Capital is correctly returned by the Agent
 
 <div class='tiles'>