How to test agents? integration tests or evals?

### Question

Hi. I am trying to test my agents without doing so manually, but I still didn't understand how to do it well.

I need to run live tests, hitting the LLM API.

Here are some cases I need to test:
- Test with different inputs if a tool is called as expected. Maybe I need to mock a tool
- Test the impacts of changes in instructions with complete workflows, for different inputs (many tool calls, testing a whole conversation)
- Given some message histories that caused unexpected model behavior exceptions, I would like to write tests for changes in the tools and prompts in order to see if the issue was fixed
- write tests to validade the agent against new workflows, observing the message history exchanged (could be through logfire)
- Evaluate the impact of switching models, if the agent continue to perform as expected

Could someone help me figure out how to perform these tests?

### Additional Context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to test agents? integration tests or evals? #2981

Question

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

How to test agents? integration tests or evals? #2981

Description

Question

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions