Skip to content

fix(dashboards): Reduce dashboard generation validation loop failures#116501

Open
shruthilayaj wants to merge 3 commits into
masterfrom
fix/dashboard-generation-validation-loop
Open

fix(dashboards): Reduce dashboard generation validation loop failures#116501
shruthilayaj wants to merge 3 commits into
masterfrom
fix/dashboard-generation-validation-loop

Conversation

@shruthilayaj
Copy link
Copy Markdown
Member

@shruthilayaj shruthilayaj commented May 29, 2026

Dashboard generation satisfaction is not doing so great: https://redash.getsentry.net/queries/11370/source#15017 (it's our biggest unsatisfied bucket)

A few fixes for the common failure modes:

  • Add per-display-type limit ceiling validation so it is caught by Pydantic before reaching the DRF serializer
  • Enforce at least one query on non-text widgets at the Pydantic layer
  • Include Dashboard.MAX_WIDGETS in prompt so the model knows the widget cap when planning
  • Reformat DRF serializer errors from raw dict repr into plain-language lines (Widget N, query M, field: message) before injecting them into the retry prompt, making them easier for the model to act on

Four targeted fixes for the slow_or_repetitive failure mode that accounts
for ~40% of unsatisfied dashboard generation runs:

- Add per-display-type limit ceiling validator on GeneratedWidget so a line
  chart with limit>10 or a table with limit>20 is caught by Pydantic before
  reaching the DRF serializer
- Enforce at least one query on non-text widgets at the Pydantic layer
- Interpolate Dashboard.MAX_WIDGETS into the prompt so the model knows the
  widget cap when planning
- Reformat DRF serializer errors from raw dict repr into plain-language lines
  (Widget N, query M, field: message) before injecting them into the retry
  prompt, making them easier for the model to act on

Co-Authored-By: Claude Sonnet 4 <noreply@example.com>
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 29, 2026
@shruthilayaj shruthilayaj marked this pull request as ready for review May 29, 2026 17:53
@shruthilayaj shruthilayaj requested a review from a team as a code owner May 29, 2026 17:53
@shruthilayaj shruthilayaj requested a review from DominikB2014 May 29, 2026 17:57
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f7d7378. Configure here.

)
lines.append(
f"Widget {widget_idx}, query {query_idx}, {query_field}: {msg}"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String errors in "queries" list silently dropped

Low Severity

_format_serializer_errors silently drops string-typed items in the "queries" error list. When widget_field == "queries" and isinstance(widget_field_errors, list) is true, the inner loop skips any item where isinstance(query_errors, dict) is false. DRF can return string errors for the "queries" key (e.g., {"queries": ["Text widgets don't have queries"]} when a text widget has queries — which Pydantic doesn't prevent). In multi-widget scenarios with other formatted errors, lines is non-empty so the str(errors) fallback doesn't trigger, and the string-based queries error is permanently lost from the retry prompt.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f7d7378. Configure here.

Copy link
Copy Markdown
Contributor

@DominikB2014 DominikB2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, looks like this should improve things

Comment on lines +239 to +254
if not is_text and not queries:
raise ValueError("Non-text widgets must have at least one query")

return values

@root_validator
def check_limit_by_display_type(cls, values: dict[str, Any]) -> dict[str, Any]:
display_type = values.get("display_type")
limit = values.get("limit")
if display_type is None or limit is None or display_type == "text":
return values
max_limit = _LIMIT_MAX_BY_DISPLAY_TYPE.get(display_type, _DEFAULT_LIMIT_MAX)
if limit > max_limit:
raise ValueError(
f"limit={limit} exceeds the maximum of {max_limit} for display_type '{display_type}'"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I wonder if we could just share this logic with the serializer instead? i wouldnt block the pr on this, sounds like it would warrant a bigger refactor, but curious on your thoughts for that

Comment on lines +43 to +49
# Per-display-type limit
# in src/sentry/api/serializers/rest_framework/dashboard.py.
_LIMIT_MAX_BY_DISPLAY_TYPE: dict[str, int] = {
"categorical_bar": 25,
"table": 20,
}
_DEFAULT_LIMIT_MAX = 10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a shared mapping for this to use in both the DRF and pydantic?

MAX_VALIDATION_RETRIES = 3


def _format_serializer_errors(errors: dict[str, Any]) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinons on the formatter here, i'm open to keeping it to see if it improves anything, are we able to track this by knowing the number of iterations it took to generate a valid dashboard?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants