fix(dashboards): Reduce dashboard generation validation loop failures#116501
fix(dashboards): Reduce dashboard generation validation loop failures#116501shruthilayaj wants to merge 3 commits into
Conversation
Four targeted fixes for the slow_or_repetitive failure mode that accounts for ~40% of unsatisfied dashboard generation runs: - Add per-display-type limit ceiling validator on GeneratedWidget so a line chart with limit>10 or a table with limit>20 is caught by Pydantic before reaching the DRF serializer - Enforce at least one query on non-text widgets at the Pydantic layer - Interpolate Dashboard.MAX_WIDGETS into the prompt so the model knows the widget cap when planning - Reformat DRF serializer errors from raw dict repr into plain-language lines (Widget N, query M, field: message) before injecting them into the retry prompt, making them easier for the model to act on Co-Authored-By: Claude Sonnet 4 <noreply@example.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit f7d7378. Configure here.
| ) | ||
| lines.append( | ||
| f"Widget {widget_idx}, query {query_idx}, {query_field}: {msg}" | ||
| ) |
There was a problem hiding this comment.
String errors in "queries" list silently dropped
Low Severity
_format_serializer_errors silently drops string-typed items in the "queries" error list. When widget_field == "queries" and isinstance(widget_field_errors, list) is true, the inner loop skips any item where isinstance(query_errors, dict) is false. DRF can return string errors for the "queries" key (e.g., {"queries": ["Text widgets don't have queries"]} when a text widget has queries — which Pydantic doesn't prevent). In multi-widget scenarios with other formatted errors, lines is non-empty so the str(errors) fallback doesn't trigger, and the string-based queries error is permanently lost from the retry prompt.
Reviewed by Cursor Bugbot for commit f7d7378. Configure here.
DominikB2014
left a comment
There was a problem hiding this comment.
awesome, looks like this should improve things
| if not is_text and not queries: | ||
| raise ValueError("Non-text widgets must have at least one query") | ||
|
|
||
| return values | ||
|
|
||
| @root_validator | ||
| def check_limit_by_display_type(cls, values: dict[str, Any]) -> dict[str, Any]: | ||
| display_type = values.get("display_type") | ||
| limit = values.get("limit") | ||
| if display_type is None or limit is None or display_type == "text": | ||
| return values | ||
| max_limit = _LIMIT_MAX_BY_DISPLAY_TYPE.get(display_type, _DEFAULT_LIMIT_MAX) | ||
| if limit > max_limit: | ||
| raise ValueError( | ||
| f"limit={limit} exceeds the maximum of {max_limit} for display_type '{display_type}'" | ||
| ) |
There was a problem hiding this comment.
🤔 I wonder if we could just share this logic with the serializer instead? i wouldnt block the pr on this, sounds like it would warrant a bigger refactor, but curious on your thoughts for that
| # Per-display-type limit | ||
| # in src/sentry/api/serializers/rest_framework/dashboard.py. | ||
| _LIMIT_MAX_BY_DISPLAY_TYPE: dict[str, int] = { | ||
| "categorical_bar": 25, | ||
| "table": 20, | ||
| } | ||
| _DEFAULT_LIMIT_MAX = 10 |
There was a problem hiding this comment.
Could we have a shared mapping for this to use in both the DRF and pydantic?
| MAX_VALIDATION_RETRIES = 3 | ||
|
|
||
|
|
||
| def _format_serializer_errors(errors: dict[str, Any]) -> str: |
There was a problem hiding this comment.
I don't have strong opinons on the formatter here, i'm open to keeping it to see if it improves anything, are we able to track this by knowing the number of iterations it took to generate a valid dashboard?


Dashboard generation satisfaction is not doing so great: https://redash.getsentry.net/queries/11370/source#15017 (it's our biggest unsatisfied bucket)
A few fixes for the common failure modes: