Description
When the Cerebras inference service hits a rate limit, it returns an error, but the ATLAS UI just sits there with no error reported to the user. The error only appears in the ATLAS app logs. It would be helpful to let the user know via the web UI that their request failed and that they should try again later.
Here's an example of what the rate throttling error looks like from the ATLAS app logs:
```
Failed to call LLM with tools: litellm.RateLimitError: RateLimitError: CerebrasException - We're experiencing high traffic right now! Please try again soon.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/backend/application/chat/service.py", line 250, in handle_chat_message
    return await orchestrator.execute(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/backend/application/chat/orchestrator.py", line 186, in execute
    return await self.tools_mode.run(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/backend/application/chat/modes/tools.py", line 89, in run
    llm_response = await error_utils.safe_call_llm_with_tools(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/backend/application/chat/utilities/error_utils.py", line 92, in safe_call_llm_with_tools
    raise ValidationError(f"Failed to call LLM with tools: {str(e)}")
domain.errors.ValidationError: Failed to call LLM with tools: Failed to call LLM with tools: litellm.RateLimitError: RateLimitError: CerebrasException - We're experiencing high traffic right now! Please try again soon.", "extra_taskName": "Task-225", "extra_otelSpanID": "0", "extra_otelTraceID": "0", "extra_otelTraceSampled": false, "extra_otelServiceName": "atlas-ui-3-backend"}
```
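One possible direction: the traceback shows `safe_call_llm_with_tools` wrapping every failure, including `litellm.RateLimitError`, in a generic `ValidationError`, which the UI apparently never surfaces. A minimal sketch of catching the rate-limit case separately and raising an error whose message is explicitly safe to show to the user (the `UserFacingError` class and `retryable` flag here are assumptions for illustration, not the actual ATLAS code; `RateLimitError` stands in for `litellm.RateLimitError`):

```python
class RateLimitError(Exception):
    """Stand-in for litellm.RateLimitError (illustration only)."""


class UserFacingError(Exception):
    """Hypothetical error type whose message is safe to show in the web UI."""

    def __init__(self, message: str, retryable: bool = False):
        super().__init__(message)
        self.retryable = retryable


async def safe_call_llm_with_tools(call_llm, *args, **kwargs):
    """Sketch of an error_utils wrapper that distinguishes rate limiting."""
    try:
        return await call_llm(*args, **kwargs)
    except RateLimitError:
        # Instead of burying the provider error inside a generic
        # ValidationError, raise something the chat layer can forward
        # to the browser, marked retryable so the UI can suggest retrying.
        raise UserFacingError(
            "The model provider is rate limiting requests. "
            "Please try again in a moment.",
            retryable=True,
        )
```

The chat handler (e.g. `handle_chat_message` in `service.py`) could then catch `UserFacingError` and emit its message over the existing response channel to the browser, rather than letting it die in the logs.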