Description
When using workflows, test results differ significantly between tests run from the UI and tests run from the CLI with the elevenlabs agents test <agent-id> command.
Reproduction steps
- Set up a test whose chat_history contains transitions between nodes, for example by creating the test from an existing conversation.
- Assign this test to the appropriate node in the agent workflow.
- Run tests with the following command:
elevenlabs agents test <agent-id>
- Run the same test from the UI and compare the results.
Actual behavior
When we run the tests from the UI, they usually all pass, but when we run them from the CLI, about half of them fail.
This appears to be related to how the tests are assigned to nodes in the workflow: the failing tests report incorrect agent_transfers, as if the CLI run ignored the chat_history or the node the test is assigned to.
In the example below, the result reports a transfer to the Verification node, which comes before the Circumstances node. This workflow never returns to the Verification node.
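One way to flag such results automatically is to check each reported transfer against the nodes reachable from the node the test is assigned to. The sketch below is hypothetical: only the Verification and Circumstances node names come from this report; the other nodes, all edges, and the representation of agent_transfers as a plain list of node names are simplifying assumptions, not the real ElevenLabs format.

```python
from collections import deque

# Hypothetical workflow graph: node -> nodes it can transfer to.
# Only "Verification" and "Circumstances" come from the report;
# the other node names and all edges are illustrative assumptions.
WORKFLOW = {
    "Start": ["Verification"],
    "Verification": ["Circumstances"],
    "Circumstances": ["Summary"],
    "Summary": [],
}


def reachable_from(graph, start):
    """Return every node reachable from `start` via one or more transfers (BFS)."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def invalid_transfers(graph, test_node, transfers):
    """Return reported transfer targets that cannot be reached from `test_node`.

    `transfers` is assumed to be a simple list of target node names.
    """
    allowed = reachable_from(graph, test_node)
    return [target for target in transfers if target not in allowed]
```

For a test assigned to Circumstances, invalid_transfers(WORKFLOW, "Circumstances", ["Verification"]) returns ["Verification"], matching the kind of result that fails only in the CLI run.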
Expected behavior
There should be no difference between running these tests from the CLI or UI.
Because the LLM is non-deterministic, some variation between runs is expected, but the differences should be negligible.
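To separate LLM nondeterminism from a systematic CLI problem, the CLI run can be repeated several times and its pass rate compared with the UI results. This is a minimal sketch that assumes (unverified) that the CLI exits with a non-zero status when any test fails; pass_rate is a hypothetical helper, not part of the ElevenLabs CLI.

```python
import subprocess


def pass_rate(cmd, runs=10):
    """Run `cmd` `runs` times and return the fraction of runs that exited with status 0.

    Assumption: a non-zero exit status means at least one test failed.
    """
    passed = 0
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            passed += 1
    return passed / runs
```

For example, pass_rate(["elevenlabs", "agents", "test", "<agent-id>"]) with a real agent ID in place of the placeholder. A CLI pass rate that stays around 50% while the UI consistently passes would point to a systematic difference rather than random variation.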