From ed239d066051cf3e69ee9ff76ff3938b8e90ccd8 Mon Sep 17 00:00:00 2001 From: Hemang Date: Sun, 8 Dec 2024 22:15:26 +0100 Subject: [PATCH 1/5] Add documentation for Langgraph and Swarm examples from testing. --- docs/testing/Examples/langgraph.md | 149 +++++++++++++++++++++++++++++ docs/testing/Examples/swarm.md | 120 +++++++++++++++++++++++ 2 files changed, 269 insertions(+) create mode 100644 docs/testing/Examples/langgraph.md create mode 100644 docs/testing/Examples/swarm.md diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md new file mode 100644 index 0000000..8ac1301 --- /dev/null +++ b/docs/testing/Examples/langgraph.md @@ -0,0 +1,149 @@ +--- +title: LangGraph +--- + +# Intro + +LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling. + +## Agent code + +You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/langgraph/weather_agent/weather_agent.py). + +This can be invoked as: + +```python +from langchain_core.messages import HumanMessage + +from .weather_agent import WeatherAgent + +invocation_response = WeatherAgent().get_graph().invoke( + {"messages": [HumanMessage(content="what is the weather in sf")]}, + config={"configurable": {"thread_id": 42}}, +) +``` + + +## Running example tests + +You can run the example tests discussed in this notebook by running the following command in the root of the repository: + +```bash +poetry run invariant test sample_tests/langgraph/weather_agent/test_weather_agent.py --push --dataset_name langgraph_weather_agent +``` + +!!! note + + If you want to run the example without sending the results to the Explorer UI, you can always run without the `--push` flag. 
You will still see the parts of the trace that fail + as highlighted in the terminal. + +## Unit tests + +### Test 1: + +
+ + Open in Explorer → + See this example in the Invariant Explorer + +
+ +```python +def test_weather_agent_with_only_sf(weather_agent): + """Test the weather agent with San Francisco.""" + invocation_response = weather_agent.invoke( + {"messages": [HumanMessage(content="what is the weather in sf")]}, + config={"configurable": {"thread_id": 42}}, + ) + + trace = TraceFactory.from_langgraph(invocation_response) + + with trace.as_context(): + find_weather_tool_calls = trace.tool_calls(name="_find_weather") + assert_true(F.len(find_weather_tool_calls) == 1) + assert_true( + find_weather_tool_calls[0]["function"]["arguments"].contains( + "San Francisco" + ) + ) + + find_weather_tool_outputs = trace.messages(role="tool") + assert_true(F.len(find_weather_tool_outputs) == 1) + assert_true( + find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy") + ) + + assert_true(trace.messages(-1)["content"].contains("60 degrees and foggy")) +``` + +We first use the `tool_calls()` method to retrieve all tool calls where the name is `_find_weather`, and we assert that there is exactly one such call. We also verify that the argument passed to the tool call includes `San Francisco`. + +Next, we use the `messages()` method with the `role="tool"` filter to check the output of the `_find_weather` tool call, ensuring that the content of this output contains our desired answer. + +Finally, we confirm that the last message also includes our desired answer. + +### Test 2: + +
+ + Open in Explorer → + See this example in the Invariant Explorer + +
+ +```python +def test_weather_agent_with_sf_and_nyc(weather_agent): + """Test the weather agent with San Francisco and New York City.""" + _ = weather_agent.invoke( + {"messages": [HumanMessage(content="what is the weather in sf")]}, + config={"configurable": {"thread_id": 41}}, + ) + invocation_response = weather_agent.invoke( + {"messages": [HumanMessage(content="what is the weather in nyc")]}, + config={"configurable": {"thread_id": 41}}, + ) + + trace = TraceFactory.from_langgraph(invocation_response) + + with trace.as_context(): + find_weather_tool_calls = trace.tool_calls(name="_find_weather") + assert_true(len(find_weather_tool_calls) == 2) + find_weather_tool_call_args = str( + F.map(lambda x: x["function"]["arguments"], find_weather_tool_calls) + ) + assert_true( + "San Francisco" in find_weather_tool_call_args + and "New York City" in find_weather_tool_call_args + ) + + find_weather_tool_outputs = trace.messages(role="tool") + assert_true(F.len(find_weather_tool_outputs) == 2) + assert_true( + find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy") + ) + assert_true( + find_weather_tool_outputs[1]["content"].contains("90 degrees and sunny") + ) + + assistant_response_messages = F.filter( + lambda m: m.get("tool_calls") is None, trace.messages(role="assistant") + ) + assert_true(len(assistant_response_messages) == 2) + assert_true( + assistant_response_messages[0]["content"].contains( + "weather in San Francisco is" + ) + ) + assert_true( + assistant_response_messages[1]["content"].contains( + "weather in New York City is" + ) + ) +``` +In this test, we use `F.map` to extract the arguments of the tool calls from the list of tool calls. We then assert that both our queries are present in the arguments list. + +There are two types of messages with `role="assistant"`: those where tool calls are made and those corresponding to the final response back to the caller. 
We use `F.filter` to keep only the messages with `role="assistant"` whose `tool_calls` is `None`, i.e. the final responses. Finally, we assert that these response messages contain the results of the weather queries. + +## Conclusion + +We have seen how to write unit tests for specific test cases when building an agent with the LangGraph library. \ No newline at end of file diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md new file mode 100644 index 0000000..3966499 --- /dev/null +++ b/docs/testing/Examples/swarm.md @@ -0,0 +1,120 @@ +--- +title: OpenAI Swarm +--- + +# Intro + +OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country. + +## Agent code +You can view the agent code [here](sample_tests/swarm/capital_finder_agent/capital_finder_agent.py). + +This can be invoked as: + +```python +from invariant.testing import SwarmWrapper +from swarm import Swarm + +from .capital_finder_agent import create_agent + +swarm_wrapper = SwarmWrapper(Swarm()) +agent = create_agent() +messages = [{"role": "user", "content": "What is the capital of France?"}] +response = swarm_wrapper.run( + agent=agent, + messages=messages, +) +``` + +SwarmWrapper is a lightweight wrapper around the Swarm class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged. + +## Running example tests + +You can run the example tests discussed in this notebook by running the following command in the root of the repository: + +```bash +poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent +``` + +!!! note + + If you want to run the example without sending the results to the Explorer UI, you can always run without the `--push` flag. 
You will still see the parts of the trace that fail + as highlighted in the terminal. + +## Unit tests + +### Test 1: Capital is correctly returned by the Agent + +
+ + Open in Explorer → + See this example in the Invariant Explorer + +
+ +```python +def test_capital_finder_agent_when_capital_found(swarm_wrapper): + """Test the capital finder agent when the capital is found.""" + agent = create_agent() + messages = [{"role": "user", "content": "What is the capital of France?"}] + response = swarm_wrapper.run( + agent=agent, + messages=messages, + ) + trace = SwarmWrapper.to_invariant_trace(response) + + with trace.as_context(): + get_capital_tool_calls = trace.tool_calls(name="get_capital") + assert_true(F.len(get_capital_tool_calls) == 1) + assert_equals( + "France", get_capital_tool_calls[0]["function"]["arguments"]["country_name"] + ) + + assert_true(trace.messages(-1)["content"].contains("Paris")) +``` + +We first use the `tool_calls()` method to retrieve all tool calls where the name is `get_capital`. Then, we assert that there is exactly one such tool call. We also assert that the argument `country_name` passed to the tool call is `France`. Additionally, we verify that the last message contains `Paris`, our desired answer. + +### Test 2: Capital is not found by the Agent + +
+ + Open in Explorer → + See this example in the Invariant Explorer + +
+ +```python +def test_capital_finder_agent_when_capital_not_found(swarm_wrapper): + """Test the capital finder agent when the capital is not found.""" + agent = create_agent() + messages = [{"role": "user", "content": "What is the capital of Spain?"}] + response = swarm_wrapper.run( + agent=agent, + messages=messages, + ) + trace = SwarmWrapper.to_invariant_trace(response) + + with trace.as_context(): + get_capital_tool_calls = trace.tool_calls(name="get_capital") + assert_true(F.len(get_capital_tool_calls) == 1) + assert_equals( + "Spain", get_capital_tool_calls[0]["function"]["arguments"]["country_name"] + ) + + tool_outputs = trace.tool_outputs(tool_name="get_capital") + assert_true(F.len(tool_outputs) == 1) + assert_true(tool_outputs[0]["content"].contains("not_found")) + + assert_false(trace.messages(-1)["content"].contains("Madrid")) +``` + +We use the `tool_calls()` method to retrieve all calls with the name `get_capital`, asserting that there is exactly one such call and that the argument `country_name` is `Spain`. + +Next, we use the `tool_outputs()` method to check the outputs for `get_capital` calls, confirming that the call returned `not_found`, as the agent's local dictionary of country-to-capital mappings does not include `Spain`. + +Finally, we verify that the last message does not contain `Madrid`, consistent with the absence of `Spain` in the agent's limited mapping. + +## Conclusion + +We have seen how to write unit tests for specific test cases when building an agent with the Swarm framework. \ No newline at end of file From ccf7ba253c7f0e9147884fa123a8950ffe170f37 Mon Sep 17 00:00:00 2001 From: Hemang Date: Mon, 9 Dec 2024 10:31:43 +0100 Subject: [PATCH 2/5] Since contains ignore case by default, update the tests. 
--- docs/testing/Examples/langgraph.md | 2 +- docs/testing/Examples/swarm.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md index 8ac1301..ca24b6c 100644 --- a/docs/testing/Examples/langgraph.md +++ b/docs/testing/Examples/langgraph.md @@ -63,7 +63,7 @@ def test_weather_agent_with_only_sf(weather_agent): assert_true(F.len(find_weather_tool_calls) == 1) assert_true( find_weather_tool_calls[0]["function"]["arguments"].contains( - "San Francisco" + "San francisco" ) ) diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md index 3966499..6409f4d 100644 --- a/docs/testing/Examples/swarm.md +++ b/docs/testing/Examples/swarm.md @@ -70,7 +70,7 @@ def test_capital_finder_agent_when_capital_found(swarm_wrapper): "France", get_capital_tool_calls[0]["function"]["arguments"]["country_name"] ) - assert_true(trace.messages(-1)["content"].contains("Paris")) + assert_true(trace.messages(-1)["content"].contains("paris")) ``` We first use the `tool_calls()` method to retrieve all tool calls where the name is `get_capital`. Then, we assert that there is exactly one such tool call. We also assert that the argument `country_name` passed to the tool call is `France`. Additionally, we verify that the last message contains `Paris`, our desired answer. From 701a7fc809c20684c37954b0c46d826fa2ce54d1 Mon Sep 17 00:00:00 2001 From: Hemang Date: Tue, 10 Dec 2024 16:01:06 +0100 Subject: [PATCH 3/5] Update the docs to include the pip command. 
--- docs/testing/Examples/langgraph.md | 7 +++++++ docs/testing/Examples/swarm.md | 9 ++++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md index ca24b6c..7382b39 100644 --- a/docs/testing/Examples/langgraph.md +++ b/docs/testing/Examples/langgraph.md @@ -6,6 +6,13 @@ title: LangGraph LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling. +## Setup +To use `langgraph`, you need to install the corresponding package: + +```bash +pip install langgraph +``` + ## Agent code You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/langgraph/weather_agent/weather_agent.py). diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md index 6409f4d..d3018f2 100644 --- a/docs/testing/Examples/swarm.md +++ b/docs/testing/Examples/swarm.md @@ -6,13 +6,20 @@ title: OpenAI Swarm OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country. +## Setup +To use `Swarm`, you need to install the corresponding package: + +```bash +pip install openai-swarm +``` + ## Agent code You can view the agent code [here](sample_tests/swarm/capital_finder_agent/capital_finder_agent.py). 
This can be invoked as: ```python -from invariant.testing import SwarmWrapper +from invariant.wrappers.swarm_wrapper import SwarmWrapper from swarm import Swarm from .capital_finder_agent import create_agent From b476f13a3a3bb870a1db094b2c2d3836c2899254 Mon Sep 17 00:00:00 2001 From: Hemang Date: Tue, 10 Dec 2024 16:57:25 +0100 Subject: [PATCH 4/5] Use argument() instead of [function][arguments] --- docs/testing/Examples/langgraph.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md index 7382b39..ec31179 100644 --- a/docs/testing/Examples/langgraph.md +++ b/docs/testing/Examples/langgraph.md @@ -116,7 +116,7 @@ def test_weather_agent_with_sf_and_nyc(weather_agent): find_weather_tool_calls = trace.tool_calls(name="_find_weather") assert_true(len(find_weather_tool_calls) == 2) find_weather_tool_call_args = str( - F.map(lambda x: x["function"]["arguments"], find_weather_tool_calls) + F.map(lambda x: x.argument(), find_weather_tool_calls) ) assert_true( "San Francisco" in find_weather_tool_call_args From 746b75f1413388fc66fd26dd55e0fb66f204f107 Mon Sep 17 00:00:00 2001 From: Luca Beurer-Kellner Date: Tue, 10 Dec 2024 18:00:10 +0100 Subject: [PATCH 5/5] unify + minor tweaks --- docs/testing/Examples/langgraph.md | 14 +++++++++++++- docs/testing/Examples/swarm.md | 11 ++++++++++- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md index ec31179..dc184f2 100644 --- a/docs/testing/Examples/langgraph.md +++ b/docs/testing/Examples/langgraph.md @@ -2,7 +2,11 @@ title: LangGraph --- -# Intro +# LangGraph Agents + +
+Write tests for your LangGraph applications. +
LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling. @@ -46,6 +50,14 @@ poetry run invariant test sample_tests/langgraph/weather_agent/test_weather_agen ## Unit tests +We can now use `testing` to assess the correctness of our agent. We will write two tests to verify different properties of the agent's behavior. Specifically, we want to verify that: + +1. The agent can correctly answer a query about the weather in San Francisco. + +2. The agent can correctly answer queries when asked about the weather in both San Francisco and New York City. + +To do so, we will use `TraceFactory` to create traces from the invocation response and then use the corresponding `Trace` methods to examine the resulting runtime traces. + ### Test 1:
diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md index d3018f2..6a881ca 100644 --- a/docs/testing/Examples/swarm.md +++ b/docs/testing/Examples/swarm.md @@ -2,7 +2,11 @@ title: OpenAI Swarm --- -# Intro +# Swarm Agents + +
+Test your OpenAI Swarm agents. +
OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country. @@ -50,6 +54,11 @@ poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_f ## Unit tests +We can now use `testing` to assess the correctness of our agent. We will write two tests to verify different properties of the agent's behavior. Specifically, we want to verify that: + +1. The agent can correctly answer a query about the capital of France. +2. The agent responds correctly when a given capital cannot be determined. + ### Test 1: Capital is correctly returned by the Agent