-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
feat(assisted-query): include built-in fields in span attrs rpc and support logs and metrics #103874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…upport logs and metrics
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #103874 +/- ##
===========================================
- Coverage 80.58% 80.42% -0.16%
===========================================
Files 9292 9299 +7
Lines 396718 398110 +1392
Branches 25286 25286
===========================================
+ Hits 319681 320190 +509
- Misses 76577 77460 +883
Partials 460 460 |
| "attributeType": attr_type, | ||
| "itemType": item_type, | ||
| "statsPeriod": stats_period, | ||
| "project": project_ids, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for errors and generally all rpcs I'm treating empty project_ids as all projects, should we do this here?
project_ids = project_ids or [ALL_ACCESS_PROJECTS]
| params=query_params, | ||
| ) | ||
|
|
||
| fields[attr_type] = [item["name"] for item in resp.data if "name" in item] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fields[attr_type] = [item["name"] for item in resp.data if "name" in item] | |
| fields[attr_type] = [item["name"] for item in resp.data] |
so this doesn't fail silently
|
|
||
| query_params = { | ||
| "itemType": item_type, | ||
| "attributeType": "string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there can be attrs w other types right? should we pass in the field's type, maybe return it in the _names rpc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the old RPC this is replacing only supported strings, which makes sense because this is about searching values by substring
didn't want to make any breaking changes in this PR either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, if we reuse this for metrics/logs (I think we should!) can do a refactor then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's already re-usable by changing itemType, that's the intention, and the same thing applies where we can only do substring search on string fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as more context, it only makes sense to search for values of string types as booleans are also stored as strings and numerical search (int/float/datetimes) I believe aren't supported because of insane cardinality which is why even in sentry search it defaults to a few suggested values.
| values.setdefault(field, set()).update(field_values_list[:limit]) | ||
|
|
||
| # Convert sets to sorted lists for JSON serialization | ||
| return {"values": {field: sorted(field_values) for field, field_values in values.items()}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't seem like field_values is converted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorted converts any iterable into a list
| return {"consent": False, "consent_url": consent_url} | ||
|
|
||
|
|
||
| def get_attribute_names(*, org_id: int, project_ids: list[int], stats_period: str) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think we'd need to clean these up in a follow up, to avoid breaking current query agent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR (at least what I intended, double check if me if wrong) just moves those functions into a new file and keeps the same input and output signatures, and I just change the internal logic, so nothing should break
| } | ||
| """ | ||
| organization = Organization.objects.get(id=org_id) | ||
| api_key = ApiKey(organization_id=org_id, scope_list=API_KEY_SCOPES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: client.get() uses user=None, leading to AnonymousUser() and permission failures for organization access.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The client.get() method is invoked with user=None, which causes src/sentry/api/client.py to default to AnonymousUser(). This AnonymousUser() lacks organization membership, leading to permission checks failing within OrganizationEventsV2EndpointBase (specifically in get_snuba_params()) when the endpoint /organizations/{organization.slug}/trace-items/attributes/ is called. Consequently, the API calls will consistently fail with permission errors, rendering the feature non-functional.
💡 Suggested Fix
Provide an authenticated user with organization permissions to the user parameter of client.get().
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/seer/assisted_query/traces_tools.py#L39
Potential issue: The `client.get()` method is invoked with `user=None`, which causes
`src/sentry/api/client.py` to default to `AnonymousUser()`. This `AnonymousUser()` lacks
organization membership, leading to permission checks failing within
`OrganizationEventsV2EndpointBase` (specifically in `get_snuba_params()`) when the
endpoint `/organizations/{organization.slug}/trace-items/attributes/` is called.
Consequently, the API calls will consistently fail with permission errors, rendering the
feature non-functional.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3254778
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what we do everywhere with client.get
| # Extract "value" from each item, filter out None/empty, and respect limit | ||
| field_values_list = [item["value"] for item in resp.data if item.get("value")] | ||
| # Merge with existing values if field already exists (multiple substrings for same field) | ||
| values.setdefault(field, set()).update(field_values_list[:limit]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Redundant limiting causes inefficient memory usage
The function applies [:limit] twice: once when adding values to the set (line 127) and again when returning the sorted result (line 131). When the same field is queried with multiple substrings, the set accumulates up to limit values from each query, potentially growing to several times the limit before being trimmed again. This wastes memory and processing by storing and sorting unnecessary values. The slicing on line 127 should be removed to collect all API responses, then apply the limit once on the final sorted output.
Additional Locations (1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multi limit lets us balance the results between the two fields somewhat which i kinda like
Fixes bug in the spans attributes RPC where built-in fields like span.op and span.description were not included sometimes. This change simplifies the code for us and matches the attributes available on the frontend exactly.
This also serves as an RPC for logs and metrics attributes, just by passing in a different
itemType.Closes AIML-1635, AIML-1670, and AIML-1672