Difference between tests run from the CLI and UI using workflows

### Description 

When an agent uses workflows, tests run from the UI and tests run from the CLI with the `elevenlabs agents test <agent-id>` command produce significantly different results.


### Reproduction steps

1. Set up a test whose `chat_history` contains transitions between nodes, for example by creating one from an existing conversation.
2. Assign this test to the appropriate node in the agent workflow.
3. Run tests with the following command:

```
elevenlabs agents test <agent-id>
```

4. Run the same test from the UI.


### Actual behavior

When we run the tests from the UI, they usually all pass. When we run the same tests from the CLI, about half of them fail.

<img width="469" height="79" alt="Image" src="https://github.com/user-attachments/assets/870a1ff4-69ca-454f-8587-a407a72299ff" />

This is likely related to how the tests are assigned to nodes in the workflow. The failing tests have incorrect `agent_transfers`, as if the CLI run did not take into account `chat_history` or the node for which the test was executed.

In the example below, the result reports the agent being transferred to the `Verification` node, which comes before the `Circumstances` node. This workflow never goes back to `Verification`.


<img width="706" height="779" alt="Image" src="https://github.com/user-attachments/assets/813937b5-3435-455c-b3e7-e1fcabb6a279" />
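
The symptom above can be checked mechanically. The following is a minimal sketch, not the CLI's or the API's actual data model: `WORKFLOW_ORDER`, `backward_transfers`, and the `Start` and `Summary` node names are hypothetical, and it assumes a linear workflow in which the test is attached to the `Circumstances` node. It flags any reported transfer whose target precedes the node under test.

```python
from typing import List

# Hypothetical linear node order. Only "Verification" and "Circumstances"
# appear in this report; "Start" and "Summary" are illustrative placeholders.
WORKFLOW_ORDER = ["Start", "Verification", "Circumstances", "Summary"]


def backward_transfers(
    test_node: str,
    transfers: List[str],
    order: List[str] = WORKFLOW_ORDER,
) -> List[str]:
    """Return transfer targets that come before the node the test is attached to."""
    start = order.index(test_node)
    # Unknown node names are ignored rather than treated as errors.
    return [t for t in transfers if t in order and order.index(t) < start]


if __name__ == "__main__":
    # A test attached to "Circumstances" should never report a transfer
    # back to "Verification" in this (assumed) workflow.
    print(backward_transfers("Circumstances", ["Verification"]))
```

A non-empty result for a CLI run, with an empty result for the same test in the UI, would support the suspicion that the CLI starts the test from the wrong node.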

### Expected behavior

There should be no difference between running these tests from the CLI and from the UI.
Some variation is possible because the LLM is a non-deterministic model, but the differences should be negligible.
