diff --git a/docs/agents.md b/docs/agents.md
new file mode 100644
index 0000000000..b25171f4d0
--- /dev/null
+++ b/docs/agents.md
@@ -0,0 +1,203 @@
+## Introduction
+
+Agents are PydanticAI's primary interface for interacting with LLMs.
+
+In some use cases a single Agent will control an entire application or component,
+but multiple agents can also interact to embody more complex workflows.
+
+The [`Agent`][pydantic_ai.Agent] class is well documented, but in essence you can think of an agent as a container for:
+
+* A [system prompt](#system-prompts) — a set of instructions for the LLM written by the developer
+* One or more [retrievers](#retrievers) — functions that the LLM may call to get information while generating a response
+* An optional structured [result type](results.md) — the datatype the LLM must return at the end of a run
+* A [dependency](dependencies.md) type constraint — system prompt functions, retrievers and result validators may all use dependencies when they're run
+* An optional default [model](models/index.md) — the model to use can also be specified when running the agent
+
+In typing terms, agents are generic in their dependency and result types, e.g. an agent which required `#!python Foobar` dependencies and returned results of type `#!python list[str]` would have type `#!python Agent[Foobar, list[str]]`.
+
+Here's a toy example of an agent that simulates a roulette wheel:
+
+```py title="roulette_wheel.py"
+from pydantic_ai import Agent, CallContext
+
+roulette_agent = Agent(  # (1)!
+    'openai:gpt-4o',
+    deps_type=int,
+    result_type=bool,
+    system_prompt=(
+        'Use the `roulette_wheel` to see if the '
+        'customer has won based on the number they provide.'
+    ),
+)
+
+
+@roulette_agent.retriever_context
+async def roulette_wheel(ctx: CallContext[int], square: int) -> str:  # (2)!
+    """check if the square is a winner"""
+    return 'winner' if square == ctx.deps else 'loser'
+
+
+# Run the agent
+success_number = 18  # (3)!
+result = roulette_agent.run_sync('Put my money on square eighteen', deps=success_number)
+print(result.data)  # (4)!
+#> True
+
+result = roulette_agent.run_sync('I bet five is the winner', deps=success_number)
+print(result.data)
+#> False
+```
+
+1. Create an agent, which expects an integer dependency and returns a boolean result; this agent will have the type `#!python Agent[int, bool]`.
+2. Define a retriever that checks if the square is a winner; here [`CallContext`][pydantic_ai.dependencies.CallContext] is parameterized with the dependency type `int` — if you got the dependency type wrong you'd get a typing error.
+3. In reality, you might want to use a random number here, e.g. `random.randint(0, 36)`.
+4. `result.data` will be a boolean indicating if the square is a winner; Pydantic performs the result validation, and it'll be typed as a `bool` since its type is derived from the `result_type` generic parameter of the agent.
+
+!!! tip "Agents are Singletons, like FastAPI"
+    Agents are singleton instances; you can think of them as similar to a small [`FastAPI`][fastapi.FastAPI] app or an [`APIRouter`][fastapi.APIRouter].
+
+## Running Agents
+
+There are three ways to run an agent:
+
+1. [`#!python agent.run()`][pydantic_ai.Agent.run] — a coroutine which returns a result containing a completed response, returns a [`RunResult`][pydantic_ai.result.RunResult]
+2. [`#!python agent.run_sync()`][pydantic_ai.Agent.run_sync] — a plain function which returns a result containing a completed response (internally, this just calls `#!python asyncio.run(self.run())`), returns a [`RunResult`][pydantic_ai.result.RunResult]
+3. [`#!python agent.run_stream()`][pydantic_ai.Agent.run_stream] — a coroutine which returns a result containing methods to stream a response as an async iterable, returns a [`StreamedRunResult`][pydantic_ai.result.StreamedRunResult]
+
+Here's a simple example demonstrating all three:
+
+```python title="run_agent.py"
+from pydantic_ai import Agent
+
+agent = Agent('openai:gpt-4o')
+
+result_sync = agent.run_sync('What is the capital of Italy?')
+print(result_sync.data)
+#> Rome
+
+
+async def main():
+    result = await agent.run('What is the capital of France?')
+    print(result.data)
+    #> Paris
+
+    async with agent.run_stream('What is the capital of the UK?') as response:
+        print(await response.get_data())
+        #> London
+```
+_(This example is complete, it can be run "as is")_
+
+You can also pass messages from previous runs to continue a conversation or provide context, as described in [Messages and Chat History](message-history.md).
+
+## Runs vs. Conversations
+
+An agent **run** might represent an entire conversation — there's no limit to how many messages can be exchanged in a single run. However, a **conversation** might also be composed of multiple runs, especially if you need to maintain state between separate interactions or API calls.
+
+Here's an example of a conversation composed of multiple runs:
+
+```python title="conversation_example.py"
+from pydantic_ai import Agent
+
+agent = Agent('openai:gpt-4o')
+
+# First run
+result1 = agent.run_sync('Who was Albert Einstein?')
+print(result1.data)
+#> Albert Einstein was a German-born theoretical physicist.
+
+# Second run, passing previous messages
+result2 = agent.run_sync(
+    'What was his most famous equation?', message_history=result1.new_messages()  # (1)!
+)
+print(result2.data)
+#> Albert Einstein's most famous equation is (E = mc^2).
+```
+1. Continue the conversation; without `message_history` the model would not know who "he" was referring to.
+
+## System Prompts
+
+System prompts might seem simple at first glance since they're just strings (or sequences of strings that are concatenated), but crafting the right system prompt is key to getting the model to behave as you want.
+
+Generally, system prompts fall into two categories:
+
+1. **Static system prompts**: These are known when writing the code and can be defined via the `system_prompt` parameter of the `Agent` constructor.
+2. **Dynamic system prompts**: These aren't known until runtime and should be defined via functions decorated with `@agent.system_prompt`.
+
+You can add both to a single agent; they're concatenated in the order they're defined at runtime.
+
+Here's an example using both types of system prompts:
+
+```python title="system_prompts.py"
+from datetime import date
+
+from pydantic_ai import Agent, CallContext
+
+agent = Agent(
+    'openai:gpt-4o',
+    deps_type=str,  # (1)!
+    system_prompt="Use the customer's name while replying to them.",  # (2)!
+)
+
+
+@agent.system_prompt  # (3)!
+def add_the_users_name(ctx: CallContext[str]) -> str:
+    return f"The user's name is {ctx.deps}."
+
+
+@agent.system_prompt
+def add_the_date() -> str:  # (4)!
+    return f'The date is {date.today()}.'
+
+
+result = agent.run_sync('What is the date?', deps='Frank')
+print(result.data)
+#> Hello Frank, the date today is 2032-01-02.
+```
+
+1. The agent expects a string dependency.
+2. Static system prompt defined at agent creation time.
+3. Dynamic system prompt defined via a decorator.
+4. Another dynamic system prompt; system prompts don't have to have the `CallContext` parameter.
+
+## Retrievers
+
+* There are two retriever decorators (`retriever_plain` and `retriever_context`), depending on whether you want to use the context or not — the sketches below show both in use
+* Retriever parameters are extracted from the function signature and used to build the schema for the tool call, then validated with Pydantic
+* If a retriever has a single "model like" parameter (e.g. a Pydantic model, dataclass, or typed dict), the schema for the tool will be just that type
+* Docstrings are parsed to get the tool description; thanks to griffe, docs for each parameter are extracted using Google, numpy or sphinx docstring styles
+* You can raise `ModelRetry` from within a retriever to suggest to the model it should retry
+* The return type of a retriever can be either `str` or a JSON object typed as `dict[str, Any]`, as some models (e.g. Gemini) support structured return values while others expect text (OpenAI), but both seem just as good at extracting meaning from the data
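+
+As a rough sketch of the first two points — the retriever names and bodies here are invented purely for illustration, and we assume both decorators can be applied bare, as `retriever_context` is in the roulette example above:
+
+```python title="retriever_sketch.py"
+from datetime import date
+
+from pydantic_ai import Agent, CallContext
+
+agent = Agent('openai:gpt-4o', deps_type=str)
+
+
+@agent.retriever_plain  # (1)!
+def days_until(iso_date: str) -> str:
+    """Calculate how many days from today until the given date.
+
+    Args:
+        iso_date: The target date in ISO 8601 format, e.g. '2032-01-02'.
+    """
+    delta = date.fromisoformat(iso_date) - date.today()
+    return f'{delta.days} days'
+
+
+@agent.retriever_context  # (2)!
+def get_user_name(ctx: CallContext[str]) -> str:
+    """Get the name of the current user."""
+    return ctx.deps
+```
+
+1. A plain retriever — it doesn't need the agent's dependencies; its parameters and the Google-style `Args` docstring are used to build and describe the tool's schema.
+2. A context retriever — [`CallContext`][pydantic_ai.dependencies.CallContext], parameterized with the agent's `deps_type`, is passed as the first argument.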
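+
+A minimal sketch of raising `ModelRetry` from within a retriever (the `ModelRetry` import location and the lookup logic are assumptions for illustration):
+
+```python title="model_retry_sketch.py"
+from pydantic_ai import Agent, CallContext, ModelRetry
+
+agent = Agent('openai:gpt-4o', deps_type=dict[str, int])
+
+
+@agent.retriever_context
+def get_user_id(ctx: CallContext[dict[str, int]], name: str) -> int:
+    """Look up a user's ID by name.
+
+    Args:
+        name: The name of the user to look up.
+    """
+    if name not in ctx.deps:
+        # suggest to the model that it should try again, e.g. with a different spelling
+        raise ModelRetry(f'No user found with name {name!r}, please try a full name.')
+    return ctx.deps[name]
+```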
+
+## Reflection and self-correction
+
+* Validation errors from both retriever parameter validation and structured result validation can be passed back to the model with a request to retry
+* As described above, you can also raise `ModelRetry` from within a retriever or result validator to tell the model it should retry
+* The default retry count is 1, but it can be altered for a whole agent as well as on a per-retriever or per-result-validator basis
+* You can access the current retry count from within a retriever or result validator via `ctx.retry`
+
+## Model errors
+
+* If models behave unexpectedly, e.g. the retry limit is exceeded, agent runs will raise `UnexpectedModelBehaviour` exceptions
+* If you use PydanticAI incorrectly, we try to raise a `UserError` with a helpful message
+* If an `UnexpectedModelBehaviour` is raised, you may want to access the [`.last_run_messages`][pydantic_ai.Agent.last_run_messages] attribute of an agent to see the messages exchanged that led to the error — the sketch below shows an `UnexpectedModelBehaviour` being raised and `.last_run_messages` being accessed in an `except` block to get more details
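+
+Here's a rough sketch of what that can look like — the `calc_volume` retriever is invented for illustration, and the import path for `UnexpectedModelBehaviour` is an assumption:
+
+```python title="model_errors_sketch.py"
+from pydantic_ai import Agent, ModelRetry
+from pydantic_ai.exceptions import UnexpectedModelBehaviour  # import path assumed
+
+agent = Agent('openai:gpt-4o')
+
+
+@agent.retriever_plain
+def calc_volume(size: int) -> int:
+    """Calculate the volume of a box with the given size."""
+    if size == 42:
+        return size**3
+    else:
+        # always asking the model to retry will eventually exhaust the retry limit
+        raise ModelRetry('Please try again.')
+
+
+try:
+    result = agent.run_sync('Please get me the volume of a box with size 6.')
+except UnexpectedModelBehaviour as e:
+    print('An error occurred:', e)
+    # inspect the full exchange that led to the error
+    print('Messages:', agent.last_run_messages)
+else:
+    print(result.data)
+```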
+
+## API Reference
+
+::: pydantic_ai.Agent
+    options:
+        members:
+            - __init__
+            - run
+            - run_sync
+            - run_stream
+            - model
+            - override_deps
+            - override_model
+            - last_run_messages
+            - system_prompt
+            - retriever_plain
+            - retriever_context
+            - result_validator
+
+::: pydantic_ai.exceptions
diff --git a/docs/api/agent.md b/docs/api/agent.md
deleted file mode 100644
index 06de0fcc4d..0000000000
--- a/docs/api/agent.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# `pydantic_ai.Agent`
-
-::: pydantic_ai.Agent
-    options:
-        members:
-            - __init__
-            - run
-            - run_sync
-            - run_stream
-            - model
-            - override_deps
-            - override_model
-            - last_run_messages
-            - system_prompt
-            - retriever_plain
-            - retriever_context
-            - result_validator
diff --git a/docs/api/dependencies.md b/docs/api/dependencies.md
deleted file mode 100644
index 9f49436a0a..0000000000
--- a/docs/api/dependencies.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# `pydantic_ai.dependencies`
-
-::: pydantic_ai.dependencies
diff --git a/docs/api/exceptions.md b/docs/api/exceptions.md
deleted file mode 100644
index 277b8fd350..0000000000
--- a/docs/api/exceptions.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# `pydantic_ai.exceptions`
-
-::: pydantic_ai.exceptions
diff --git a/docs/api/messages.md b/docs/api/messages.md
deleted file mode 100644
index 9986b8504e..0000000000
--- a/docs/api/messages.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# `pydantic_ai.messages`
-
-::: pydantic_ai.messages
-    options:
-        members:
-            - Message
-            - SystemPrompt
-            - UserPrompt
-            - ToolReturn
-            - RetryPrompt
-            - ModelAnyResponse
-            - ModelTextResponse
-            - ModelStructuredResponse
-            - ToolCall
-            - ArgsJson
-            - ArgsObject
-            - MessagesTypeAdapter
diff --git a/docs/concepts/agents.md b/docs/concepts/agents.md
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/docs/concepts/results.md b/docs/concepts/results.md
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/docs/concepts/retrievers.md b/docs/concepts/retrievers.md
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/docs/concepts/streaming.md b/docs/concepts/streaming.md
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/docs/concepts/system-prompt.md b/docs/concepts/system-prompt.md
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/docs/concepts/testing-evals.md b/docs/concepts/testing-evals.md
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/docs/concepts/dependencies.md b/docs/dependencies.md
similarity index 96%
rename from docs/concepts/dependencies.md
rename to docs/dependencies.md
index f566113998..d0e0719920 100644
--- a/docs/concepts/dependencies.md
+++ b/docs/dependencies.md
@@ -1,6 +1,6 @@
 # Dependencies
-PydanticAI uses a dependency injection system to provide data and services to your agent's [system prompts](system-prompt.md), [retrievers](retrievers.md) and [result validators](results.md#TODO).
+PydanticAI uses a dependency injection system to provide data and services to your agent's [system prompts](agents.md#system-prompts), [retrievers](agents.md#retrievers) and [result validators](results.md#result-validators). Matching PydanticAI's design philosophy, our dependency system tries to use existing best practice in Python development rather than inventing esoteric "magic", this should make dependencies type-safe, understandable easier to test and ultimately easier to deploy in production. @@ -159,7 +159,7 @@ _(This example is complete, it can be run "as is")_ ## Full Example -As well as system prompts, dependencies can be used in [retrievers](retrievers.md) and [result validators](results.md#TODO). +As well as system prompts, dependencies can be used in [retrievers](agents.md#retrievers) and [result validators](results.md#result-validators). ```python title="full_example.py" hl_lines="27-35 38-48" from dataclasses import dataclass @@ -223,6 +223,8 @@ async def main(): 1. To pass `CallContext` and to a retriever, us the [`retriever_context`][pydantic_ai.Agent.retriever_context] decorator. 2. `CallContext` may optionally be passed to a [`result_validator`][pydantic_ai.Agent.result_validator] function as the first argument. +_(This example is complete, it can be run "as is")_ + ## Overriding Dependencies When testing agents, it's useful to be able to customise dependencies. @@ -337,6 +339,10 @@ print(result.data) The following examples demonstrate how to use dependencies in PydanticAI: -- [Weather Agent](../examples/weather-agent.md) -- [SQL Generation](../examples/sql-gen.md) -- [RAG](../examples/rag.md) +- [Weather Agent](examples/weather-agent.md) +- [SQL Generation](examples/sql-gen.md) +- [RAG](examples/rag.md) + +## API Reference + +::: pydantic_ai.dependencies diff --git a/docs/examples/chat-app.md b/docs/examples/chat-app.md index a64859f8a3..b32bfd73dc 100644 --- a/docs/examples/chat-app.md +++ b/docs/examples/chat-app.md @@ -2,7 +2,7 @@ Simple chat app example build with FastAPI. Demonstrates: -* [reusing chat history](../concepts/message-history.md) +* [reusing chat history](../message-history.md) * serializing messages * streaming responses diff --git a/docs/examples/rag.md b/docs/examples/rag.md index 3db3f891a8..7624c85b9d 100644 --- a/docs/examples/rag.md +++ b/docs/examples/rag.md @@ -5,7 +5,7 @@ RAG search example. 
This demo allows you to ask question of the [logfire](https: Demonstrates: * retrievers -* [agent dependencies](../concepts/dependencies.md) +* [agent dependencies](../dependencies.md) * RAG search This is done by creating a database containing each section of the markdown documentation, then registering diff --git a/docs/examples/sql-gen.md b/docs/examples/sql-gen.md index 1b43abedf1..be1ba00f23 100644 --- a/docs/examples/sql-gen.md +++ b/docs/examples/sql-gen.md @@ -7,7 +7,7 @@ Demonstrates: * custom `result_type` * dynamic system prompt * result validation -* [agent dependencies](../concepts/dependencies.md) +* [agent dependencies](../dependencies.md) ## Running the Example diff --git a/docs/examples/weather-agent.md b/docs/examples/weather-agent.md index fb2f774d1d..d73f81ec52 100644 --- a/docs/examples/weather-agent.md +++ b/docs/examples/weather-agent.md @@ -4,7 +4,7 @@ Demonstrates: * retrievers * multiple retrievers -* [agent dependencies](../concepts/dependencies.md) +* [agent dependencies](../dependencies.md) In this case the idea is a "weather" agent — the user can ask for the weather in multiple locations, the agent will use the `get_lat_lng` tool to get the latitude and longitude of the locations, then use diff --git a/docs/index.md b/docs/index.md index ee0d3991f5..f1ef986a0a 100644 --- a/docs/index.md +++ b/docs/index.md @@ -90,7 +90,7 @@ async def main(): 7. Multiple retrievers can be registered with the same agent, the LLM can choose which (if any) retrievers to call in order to respond to a user. 8. Run the agent asynchronously, conducting a conversation with the LLM until a final response is reached. You can also run agents synchronously with `run_sync`. Internally agents are all async, so `run_sync` is a helper using `asyncio.run` to call `run()`. 9. The response from the LLM, in this case a `str`, Agents are generic in both the type of `deps` and `result_type`, so calls are typed end-to-end. -10. [`result.all_messages()`](concepts/message-history.md) includes details of messages exchanged, this is useful both to understand the conversation that took place and useful if you want to continue the conversation later — messages can be passed back to later `run/run_sync` calls. +10. [`result.all_messages()`](message-history.md) includes details of messages exchanged, this is useful both to understand the conversation that took place and useful if you want to continue the conversation later — messages can be passed back to later `run/run_sync` calls. !!! tip "Complete `weather_agent.py` example" This example is incomplete for the sake of brevity; you can find a complete `weather_agent.py` example [here](examples/weather-agent.md). diff --git a/docs/concepts/message-history.md b/docs/message-history.md similarity index 96% rename from docs/concepts/message-history.md rename to docs/message-history.md index 697b15d98f..42ea22ac8d 100644 --- a/docs/concepts/message-history.md +++ b/docs/message-history.md @@ -1,13 +1,7 @@ -from pydantic_ai_examples.pydantic_model import model - # Messages and chat history PydanticAI provides access to messages exchanged during an agent run. These messages can be used both to continue a coherent conversation, and to understand how an agent performed. -## Messages types - -[API documentation for `messages`][pydantic_ai.messages] contains details of the message types and their meaning. - ### Accessing Messages from Results After running an agent, you can access the messages exchanged during that run from the `result` object. 
@@ -269,10 +263,24 @@ print(result2.all_messages()) """ ``` -## Last Run Messages - -TODO: document [`last_run_messages`][pydantic_ai.Agent.last_run_messages]. - ## Examples -For a more complete example of using messages in conversations, see the [chat app](../examples/chat-app.md) example. +For a more complete example of using messages in conversations, see the [chat app](examples/chat-app.md) example. + +## API Reference + +::: pydantic_ai.messages + options: + members: + - Message + - SystemPrompt + - UserPrompt + - ToolReturn + - RetryPrompt + - ModelAnyResponse + - ModelTextResponse + - ModelStructuredResponse + - ToolCall + - ArgsJson + - ArgsObject + - MessagesTypeAdapter diff --git a/docs/api/models/function.md b/docs/models/function.md similarity index 50% rename from docs/api/models/function.md rename to docs/models/function.md index f83fcf4a3f..831fc0ff11 100644 --- a/docs/api/models/function.md +++ b/docs/models/function.md @@ -1,3 +1,3 @@ -# `pydantic_ai.models.function` +# FunctionModel ::: pydantic_ai.models.function diff --git a/docs/api/models/gemini.md b/docs/models/gemini.md similarity index 50% rename from docs/api/models/gemini.md rename to docs/models/gemini.md index 5cf3315be0..e37f9af7ca 100644 --- a/docs/api/models/gemini.md +++ b/docs/models/gemini.md @@ -1,3 +1,3 @@ -# `pydantic_ai.models.gemini` +# Gemini ::: pydantic_ai.models.gemini diff --git a/docs/api/models/base.md b/docs/models/index.md similarity index 100% rename from docs/api/models/base.md rename to docs/models/index.md diff --git a/docs/api/models/openai.md b/docs/models/openai.md similarity index 50% rename from docs/api/models/openai.md rename to docs/models/openai.md index ab3cedb646..1f072e755a 100644 --- a/docs/api/models/openai.md +++ b/docs/models/openai.md @@ -1,3 +1,3 @@ -# `pydantic_ai.models.openai` +# OpenAI ::: pydantic_ai.models.openai diff --git a/docs/api/models/test.md b/docs/models/test.md similarity index 50% rename from docs/api/models/test.md rename to docs/models/test.md index 35ffc19dd7..9f3d7d09d7 100644 --- a/docs/api/models/test.md +++ b/docs/models/test.md @@ -1,3 +1,3 @@ -# `pydantic_ai.models.test` +# TestModel ::: pydantic_ai.models.test diff --git a/docs/api/result.md b/docs/results.md similarity index 60% rename from docs/api/result.md rename to docs/results.md index 83d61af813..94ef57df20 100644 --- a/docs/api/result.md +++ b/docs/results.md @@ -1,4 +1,20 @@ -# `pydantic_ai.result` +## Ending runs + +TODO + +## Result Validators + +TODO + +## Streamed Results + +TODO + +## Cost + +TODO + +## API Reference ::: pydantic_ai.result options: diff --git a/docs/testing-evals.md b/docs/testing-evals.md new file mode 100644 index 0000000000..a6165abfce --- /dev/null +++ b/docs/testing-evals.md @@ -0,0 +1,8 @@ +# Testing and Evals + +TODO + +principles: + +* unit tests are no different to any other app, just `TestModel` or `FunctionModel`, we know how to do unit tests, there's no magic just good practice +* evals are more like benchmarks, they never "pass" although they do "fail", you care mostly about how they change over time, we (and we think most other people) don't really know what a "good" eval is, we provide some useful tools, we'll improve this if/when a common best practice emerges, or we think we have something interesting to say diff --git a/mkdocs.yml b/mkdocs.yml index 2b2c858805..5ae95b53c6 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -13,15 +13,18 @@ nav: - Introduction: - Introduction: index.md - install.md - - Concepts: - - concepts/agents.md - - 
concepts/dependencies.md - - concepts/retrievers.md - - concepts/system-prompt.md - - concepts/results.md - - concepts/message-history.md - - concepts/streaming.md - - concepts/testing-evals.md + - Documentation: + - agents.md + - dependencies.md + - results.md + - message-history.md + - testing-evals.md + - Models: + - models/index.md + - models/openai.md + - models/gemini.md + - models/test.md + - models/function.md - Examples: - examples/index.md - examples/pydantic-model.md @@ -31,17 +34,6 @@ nav: - examples/stream-markdown.md - examples/stream-whales.md - examples/chat-app.md - - API Reference: - - api/agent.md - - api/result.md - - api/messages.md - - api/dependencies.md - - api/exceptions.md - - api/models/base.md - - api/models/openai.md - - api/models/gemini.md - - api/models/test.md - - api/models/function.md extra: # hide the "Made with Material for MkDocs" message @@ -79,7 +71,7 @@ theme: - content.code.copy - content.code.select - navigation.path - - navigation.expand +# - navigation.expand - navigation.indexes - navigation.sections - navigation.tracking @@ -112,6 +104,7 @@ markdown_extensions: - pymdownx.superfences - pymdownx.snippets - pymdownx.tilde + - pymdownx.inlinehilite - pymdownx.highlight: pygments_lang_class: true - pymdownx.extra: @@ -147,7 +140,8 @@ plugins: show_signature_annotations: true signature_crossrefs: true group_by_category: false - heading_level: 2 + # 3 because docs are in pages with an H2 just above them + heading_level: 3 import: - url: https://docs.python.org/3/objects.inv - url: https://docs.pydantic.dev/latest/objects.inv diff --git a/tests/test_examples.py b/tests/test_examples.py index 972d46a6c6..0ba44c2da9 100644 --- a/tests/test_examples.py +++ b/tests/test_examples.py @@ -13,7 +13,14 @@ from pytest_examples import CodeExample, EvalExample, find_examples from pytest_mock import MockerFixture -from pydantic_ai.messages import Message, ModelAnyResponse, ModelTextResponse +from pydantic_ai.messages import ( + ArgsObject, + Message, + ModelAnyResponse, + ModelStructuredResponse, + ModelTextResponse, + ToolCall, +) from pydantic_ai.models import KnownModelName, Model from pydantic_ai.models.function import AgentInfo, DeltaToolCalls, FunctionModel from tests.conftest import ClientWithHandler @@ -71,16 +78,32 @@ async def async_http_request(url: str, **kwargs: Any) -> httpx.Response: return http_request(url, **kwargs) +text_responses = { + 'What is the weather like in West London and in Wiltshire?': 'The weather in West London is raining, while in Wiltshire it is sunny.', + 'Tell me a joke.': 'Did you hear about the toothpaste scandal? 
They called it Colgate.', + 'Explain?': 'This is an excellent joke invent by Samuel Colvin, it needs no explanation.', + 'What is the capital of France?': 'Paris', + 'What is the capital of Italy?': 'Rome', + 'What is the capital of the UK?': 'London', + 'Who was Albert Einstein?': 'Albert Einstein was a German-born theoretical physicist.', + 'What was his most famous equation?': "Albert Einstein's most famous equation is (E = mc^2).", + 'What is the date?': 'Hello Frank, the date today is 2032-01-02.', +} + + async def model_logic(messages: list[Message], info: AgentInfo) -> ModelAnyResponse: m = messages[-1] - if m.role == 'user' and m.content == 'What is the weather like in West London and in Wiltshire?': - return ModelTextResponse(content='The weather in West London is raining, while in Wiltshire it is sunny.') - if m.role == 'user' and m.content == 'Tell me a joke.': - return ModelTextResponse(content='Did you hear about the toothpaste scandal? They called it Colgate.') - if m.role == 'user' and m.content == 'Explain?': - return ModelTextResponse(content='This is an excellent joke invent by Samuel Colvin, it needs no explanation.') - if m.role == 'user' and m.content == 'What is the capital of France?': - return ModelTextResponse(content='Paris') + if m.role == 'user': + if text_response := text_responses.get(m.content): + return ModelTextResponse(content=text_response) + + if m.role == 'user' and m.content == 'Put my money on square eighteen': + return ModelStructuredResponse(calls=[ToolCall(tool_name='roulette_wheel', args=ArgsObject({'square': 18}))]) + elif m.role == 'user' and m.content == 'I bet five is the winner': + return ModelStructuredResponse(calls=[ToolCall(tool_name='roulette_wheel', args=ArgsObject({'square': 5}))]) + elif m.role == 'tool-return' and m.tool_name == 'roulette_wheel': + win = m.content == 'winner' + return ModelStructuredResponse(calls=[ToolCall(tool_name='final_result', args=ArgsObject({'response': win}))]) else: sys.stdout.write(str(debug.format(messages, info))) raise RuntimeError(f'Unexpected message: {m}') @@ -88,15 +111,17 @@ async def model_logic(messages: list[Message], info: AgentInfo) -> ModelAnyRespo async def stream_model_logic(messages: list[Message], info: AgentInfo) -> AsyncIterator[str | DeltaToolCalls]: m = messages[-1] - if m.role == 'user' and m.content == 'Tell me a joke.': - *words, last_word = 'Did you hear about the toothpaste scandal? They called it Colgate.'.split(' ') - for work in words: - yield f'{work} ' - await asyncio.sleep(0.05) - yield last_word - else: - sys.stdout.write(str(debug.format(messages, info))) - raise RuntimeError(f'Unexpected message: {m}') + if m.role == 'user': + if text_response := text_responses.get(m.content): + *words, last_word = text_response.split(' ') + for work in words: + yield f'{work} ' + await asyncio.sleep(0.05) + yield last_word + return + + sys.stdout.write(str(debug.format(messages, info))) + raise RuntimeError(f'Unexpected message: {m}') def mock_infer_model(_model: Model | KnownModelName) -> Model: