Failures due to rate throttling are not reported to the user #111

@ktpedre

When a request to the Cerebras inference service hits a rate limit, the service returns an error. The ATLAS UI then just sits there with no error reported to the user; the error appears only in the ATLAS app logs. It would be helpful to tell the user via the web UI that their request failed and that they should try again later.

Here's an example of what the rate throttling error looks like from the ATLAS app logs:

Failed to call LLM with tools: litellm.RateLimitError: RateLimitError: CerebrasException - We're experiencing high traffic right now! Please try again soon.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/app/backend/application/chat/service.py", line 250, in handle_chat_message\n return await orchestrator.execute(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/app/backend/application/chat/orchestrator.py", line 186, in execute\n return await self.tools_mode.run(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/app/backend/application/chat/modes/tools.py", line 89, in run\n llm_response = await error_utils.safe_call_llm_with_tools(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "/app/backend/application/chat/utilities/error_utils.py", line 92, in safe_call_llm_with_tools\n raise ValidationError(f"Failed to call LLM with tools: {str(e)}")\ndomain.errors.ValidationError: Failed to call LLM with tools: Failed to call LLM with tools: litellm.RateLimitError: RateLimitError: CerebrasException - We're experiencing high traffic right now! Please try again soon.", "extra_taskName": "Task-225", "extra_otelSpanID": "0", "extra_otelTraceID": "0", "extra_otelTraceSampled": false, "extra_otelServiceName": "atlas-ui-3-backend"}
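
One possible direction, sketched under assumptions: the traceback above shows the rate limit error being wrapped in a generic `ValidationError`, so the backend could inspect the exception chain and turn a rate limit into a short message for the web UI. The names `UserFacingError`, `is_rate_limit_error`, and `to_user_facing_error` are hypothetical and are not part of ATLAS or litellm. The check matches on the class name `RateLimitError` so that the sketch runs without litellm installed; a real fix would more likely catch the provider exception directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserFacingError:
    """Hypothetical payload the backend could send to the web UI."""
    error_type: str
    message: str


def is_rate_limit_error(exc: BaseException) -> bool:
    """Return True if exc, or anything in its exception chain, looks like a rate limit.

    Assumption: the provider error class is named RateLimitError (as in the
    log above). Matching by name keeps this sketch free of third-party imports.
    """
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        class_names = {cls.__name__ for cls in type(current).__mro__}
        if "RateLimitError" in class_names or "RateLimitError" in str(current):
            return True
        # Follow an explicit cause first, then the implicit context.
        current = current.__cause__ or current.__context__
    return False


def to_user_facing_error(exc: BaseException) -> UserFacingError:
    """Map a backend exception to a message that is safe to show the user."""
    if is_rate_limit_error(exc):
        return UserFacingError(
            error_type="rate_limit",
            message="The model provider is rate limiting requests. Please try again later.",
        )
    return UserFacingError(
        error_type="unknown",
        message="The request failed. See the application logs for details.",
    )
```

The chat handler could call `to_user_facing_error` in its `except` block and forward the result to the client over whatever channel the UI already uses for responses, so the user sees a failure message in place of an idle screen.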
