feat(issues): Add GroupActionLog instrumentation for issue mutations#116347
feat(issues): Add GroupActionLog instrumentation for issue mutations#116347yuvmen wants to merge 12 commits into
Conversation
Adds `publish_action` calls alongside existing Activity creation sites to log issue mutations with source attribution (web, API, MCP, Seer). Uses a contextvar (`ActionContext`) set at the API boundary in `update_groups()` and read at lower-level mutation sites, so functions like `GroupAssignee.assign()` and `update_priority()` can log actions without signature changes. Instrumented actions: resolve, unresolve, archive, assign, unassign, set_priority, merge, mark_reviewed, view, trigger_autofix, comment (create/edit/delete). Currently a no-op structured logger — will be wired to the real GroupActionLogEntry model once that lands.
| {"detail": "Cannot manually set priority of one or more issues."}, | ||
| status=HTTPStatus.BAD_REQUEST, | ||
| ) | ||
| source = resolve_action_source(request) |
There was a problem hiding this comment.
Unauthenticated X-Seer-Referrer header allows action-source spoofing in audit log
Any authenticated API client can set the X-Seer-Referrer HTTP header to make their issue mutations appear in the action log as originating from Seer (seer:explorer or seer:slack), undermining the audit trail this feature is designed to create.
Evidence
resolve_action_source()inaction_log.py(line 81) readsrequest.META.get(SEER_REFERRER_HEADER, "")— equivalent to theX-Seer-ReferrerHTTP header — with no prior authentication check.- If the header is present, the function immediately returns
SEER_SLACKorSEER_EXPLORERbefore ever reaching theSeerRpcSignatureAuthenticationcheck (line ~90) or any other identity verification. - The MCP path, by contrast, is gated by
application_id == mcp_app_idbefore the client-name header is trusted. source = resolve_action_source(request)(hunk line ~215) feeds this spoofable value intoaction_context_scope, which propagates it into everypublish_action/publish_action_from_contextcall downstream.
Also found at 3 additional locations
src/sentry/issues/endpoints/group_details.py:296src/sentry/issues/action_log.py:83-87src/sentry/issues/priority.py:76
Identified by Warden security-review · 5PV-YZH
There was a problem hiding this comment.
Same as the previuos closed PR -
yea this is true I considered it, will defer this for the future but will for sure need to consider if we want to close this ability to spoof SEER source.
it is currently just for reporting, limited in the "damage" a malicious actor can do
The `from sentry.api import client` resolves to the module, not the ApiClient instance, causing mypy attr-defined errors on `.get()` calls.
| from typing import Any | ||
|
|
||
| from sentry.api import client | ||
| from sentry.api.client import ApiClient |
There was a problem hiding this comment.
was getting typing errors on all these tool files, have no idea why when main has them go through fine. Eventually gave up and fixed the typing issue here, seems like import was too ambiguous
…rom unresolve - Replace module-level global MCP_APPLICATION_ID with django.core.cache (5min TTL), so a missing app record on first lookup doesn't permanently poison the cache for the worker lifetime - Add ActionType.UNARCHIVE to distinguish un-ignoring (IGNORED->UNRESOLVED) from un-resolving (RESOLVED->UNRESOLVED), matching the existing signal distinction (issue_unignored vs issue_unresolved)
…plication_id - Remove UNARCHIVE — "unresolve" is the destination state regardless of whether the issue was previously resolved or archived - Skip _get_mcp_application_id() cache/DB call when request has no application_id (the common case for non-token-auth requests) - Use ActionSource.MCP constant in f-string for MCP client slug
…ontext When publish_action_from_context is called without an ActionContext, log an error (creates a Sentry issue with stack trace) and record the action with source="unknown" instead of silently dropping it.
…revert UNARCHIVE - Wrap system-initiated mutations (auto_update_priority, ingest assign, ownership auto-assign, release-based assign) with action_context_scope(source=SYSTEM) - Wrap integration sync assign/deassign with action_context_scope(source=integration.provider) - Revert UNARCHIVE action type — UNRESOLVE covers both transitions since "unresolved" is a real state, "unarchived" is not - Keep logger.error for missing context so unexpected callers surface as Sentry issues
kcons
left a comment
There was a problem hiding this comment.
I don't have many useful things to say about the logic for determining Source or the granularity of it, as ultimately I think those things are very tied to what actually works combined with what we need, and I'm not on firm footing in either case.
I'm also not sure how strictly to consider this PR; my sense is that while it is a big footprint and the start of some critical stuff, we're mostly considering it placeholder for now. If the intend is to land it (which i hope it is; starting to actually log stuff and consider the practicalities of collecting the relevant data in context is fairly urgent), I do think we need to go a bit harder on docs and interface, though that may mean being explicitly non-committal more than being very careful.
In principle, I think "a short identifying string with some structure, not large in cardinality, carefully controlled in generation but tolerant in consumption (when we start consuming)" is the right model here for source, and I do think we want source.
So, with that and my interest in getting preliminary stuff with plausible interfaces in play, I conceptually approve.
| ) -> None: | ||
| ctx = get_action_context() | ||
| if ctx is None: | ||
| logger.error( |
There was a problem hiding this comment.
I think we'd want to check in_test_environment() and fail hard if true here.
There was a problem hiding this comment.
This actually becomes pretty disruptive in a lot of our tests, I think I will not do this at least for now, perhaps figure out a better way down the line to detect these in CI
| def publish_action( | ||
| *, | ||
| action: ActionType, | ||
| source: str, |
There was a problem hiding this comment.
Seems like this should be ActionSource.
I understand why it isn't; we want to be able to qualify things with prefixes to show it is a specific variant of a category of source.. but I don't think we want that to mean we accept a string.
I think it'd be nice to limit the set of legal ActionSources to those that we believe are valid, and know that by virtue of it being able to be constructed an ActionSource is valid; potentially be able to parse str to an ActionSource and know it is then valid. From the API perspective, this makes things simpler and harder to break. From a data management perspective, it means if it is published we think it is valid. The only real burden here is having an ActionSource representation that knows the rules. It could be a wrapped str with a regex, but I'm assuming it'd be some sort of SourceKind(StrEnum) akin to current ActionSource, along with an optional addition str with values restricted by SourceKind. I assume it'd have a __str__ method that sorta does ":".join[self.kind, self.variant] if self.variant else self.kind.
Maybe it's overkill, but interface-wise, easy, and implementation presumably one-shottable by an agent.
There was a problem hiding this comment.
yea I agree this should actually be ActionSource, and with your analysis here of the possible second type that includes variant. I think right now we cna go with this.
I think another possibility to what you suggested is we can keep it simpler and have the source variant be a part of the metadata. Like ActionSource has concrete categories that are well defined and low cardinality, if someone wants to add a more loose optional variant on top it can be in the data field and higher cardinality.
| PAGERDUTY = "pagerduty" | ||
| OPSGENIE = "opsgenie" | ||
| PERFORCE = "perforce" | ||
| UNKNOWN = "unknown" |
There was a problem hiding this comment.
Do you think there's value in distinguishing between 'UNKNOWN' and 'Something bad happened and we were unable to execute the appropriate logic'? Or is 'UNKNOWN' better understood as 'ERROR'?
There was a problem hiding this comment.
Not sure if you think we shouldn't save the action log in when we cant determine source? I mean UNKNOWN is an error state in a way for sure, like we have logic to determine source, if our rules for source fail I dont want to fail the whole flow and prevent the log from getting created, so saving it with UNKNOWN I think is a good representation of this case, would be odd to save with source=ERROR no? though not sure if thats what youre suggesting
|
|
||
|
|
||
| def resolve_action_source(request: Request | None) -> str: | ||
| if request is None: |
There was a problem hiding this comment.
This fallback seems surprising. Are contexts where you might have a request common?
There was a problem hiding this comment.
yea this actually doesnt make much sense when I think about it, had in mind like SYSTEM cases will still call this with None, but as you say we are never unsure if we have request or not, will remove this
|
|
||
|
|
||
| @contextmanager | ||
| def action_context_scope(source: str, actor_id: int | None = None) -> Generator[None]: |
There was a problem hiding this comment.
With actor_id == None meaning SYSTEM, it seems safe to not have a default here. I'd read only setting source to mean only overriding source, not actor, if I saw it in code.
| ) | ||
|
|
||
|
|
||
| def publish_action_from_context( |
There was a problem hiding this comment.
I'm very interested in the context bits here.
It seems like a really helpful way for us to get coverage without having to thread through lots of extra data (with user it was maybe reasonable to expect we'd have it, but source is a different beast). I could imagine that we'd set up a context by default in most endpoints, and probably have a convention for threading it through task arguments so under normal circumstances a context could be assumed.
However, we don't get proof of availability, and refactors will break context and lead to errors we have to hope will be triaged.
It is possible to use static analysis to find cases of context use that don't have a provable source, but I don't know how hard that'd be to set up.
I think in general "_from_context" existing suggests that it's what most people should use outside of some narrow and shallow reporting cases; we should probably specify this if that's the case. Not necessarily in this change, but certainly in a v0 of what we expect anyone to use.
There was a problem hiding this comment.
yea agreed, this is the part that I was the most conflicted about. It does indeed come with this tradeoff you describe, but I think it is probably worth for the amount of clutter it saves. I am open to considering more sophisticated ways to make sure context exists where it needs to, this was my attempt to not get too fancy right away.
I wonder if we can even instruct agents to help with this, like an instruction in some AGENTS.md about "if publish_action_from_context is called, make sure context is populated earlier`, wont be airtight but might reduce times people break it. I say this becase I am not sure how viable static analysis would be in detecting something like, but agentic would probably be able to.
Right now breaking context would cause Sentry issues with the log.error there, and with the addition of throwing in tests you suggested it might help prevent breaking.
regarding from_context I think for sure this is a convention we want to push and yes we should it should be the preferable way outside of narrow cases like you said
There was a problem hiding this comment.
yea agreed, this is the part that I was the most conflicted about. It does indeed come with this tradeoff you describe, but I think it is probably worth for the amount of clutter it saves. I am open to considering more sophisticated ways to make sure context exists where it needs to, this was my attempt to not get too fancy right away.
I wonder if we can even instruct agents to help with this, like an instruction in some AGENTS.md about "if publish_action_from_context is called, make sure context is populated earlier`, wont be airtight but might reduce times people break it. I say this becase I am not sure how viable static analysis would be in detecting something like, but agentic would probably be able to.
Right now breaking context would cause Sentry issues with the log.error there, and with the addition of throwing in tests you suggested it might help prevent breaking.
regarding from_context I think for sure this is a convention we want to push and yes we should it should be the preferable way outside of narrow cases like you said
|
|
||
|
|
||
| @dataclass(frozen=True) | ||
| class ActionContext: |
There was a problem hiding this comment.
This strikes me as pretty central, conceptually and practically.
My pr has GroupActionActor because that's the core attribution information I was dealing with, and once determined it's best treated as one thing, but I think that needs to be extended with source, since in any given context they are really grouped.
There was a problem hiding this comment.
yea agreed as I said above this Context is the main thing here and we need to align on it. Regarding Actor, I see it as - Actor is mostly a "who" and ActionContext is about both who and how, so yea I think it should be extended to this, I dont mind how exactly so whatever works for you
| idempotency_key: str | None = None, | ||
| ) -> None: | ||
| actor_type = ActorType.USER if actor_id is not None else ActorType.SYSTEM | ||
| logger.info( |
There was a problem hiding this comment.
might also be fun to have a counter tagged by action and source, just for now.
- Make actor_id required in action_context_scope (no silent default) - Raise RuntimeError in test environment when context is missing - Remove dead request=None branch from resolve_action_source - Add metrics.incr counter tagged by action and source
The RuntimeError in test environment broke every existing test that calls instrumented functions without action context. Keep logger.error as the safety net instead — it creates Sentry issues for triage without coupling unrelated tests to the action log system.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit b5e03e2. Configure here.
| if seer_referrer: | ||
| if "slack" in seer_referrer.lower(): | ||
| return ActionSource.SEER_SLACK | ||
| return ActionSource.SEER_EXPLORER |
There was a problem hiding this comment.
Spoofable header bypasses authentication-based source detection
Low Severity
The SEER_REFERRER_HEADER check on any arbitrary request header value takes priority over the cryptographic SeerRpcSignatureAuthentication check below it. Any API client can set X-Seer-Referrer to get misattributed as seer:explorer or seer:slack, undermining source attribution accuracy. The authentication-based check (lines 96-101) is unreachable when the header is present, making it dead code in that scenario. The header check and authenticator check have inverted priority — the stronger signal (authenticated identity) is shadowed by the weaker one (unauthenticated header).
Additional Locations (1)
Reviewed by Cursor Bugbot for commit b5e03e2. Configure here.
There was a problem hiding this comment.
as noted above when warden flagged it, right now I dont have a good way of preventing this spoofing, the order wont make much difference. I have noted this for the future, gonna defer it for now as the impact is just bad source labeling, no permission implications
|
@kcons thanks for all the comments - I think this work for sure is a bit exploratory but I do intend on actually landing this PR, acknowledging much of this is still subject to change. I fully agree with:
regarding this
I also agree but I am not sure how to address it at this point, do you think like add some more comments / context in code or detail these decisions in our docs better? I made corrections and responded to your comments, I do want to get this PR in shape and land it, there are some instrumentation assumptions it will help me prove and I think setting this in place will move us forward, even if its all subject to change. |
Regardling landability, I think we just want someone seeing a use of this module to be able to answer "what is this doing? is this important? should I be using it?" fairly quickly (ie basic doc coverage on the public interface, perhaps with a standard one-line disclaimer about it being experimental or whatever). If someone is refactoring involved code, they should be able to do it correctly. Also, a module level doc given the basic context and status so someone wondering what action_log is has a reasonable answer ("what is this module? should I be using it?" probably core questions to answer). I hope to get back to my table PR soon (almost done with replies) and once that happens, I'll have some fun merging this in. |


Summary
Instruments issue-facing API endpoints to emit structured log entries for the Issue Action Log, with full source attribution (web, API, MCP client, Seer, sentry-cli).
action_log.pymodule withresolve_action_source(),publish_action(), and aContextVar-basedActionContextthat's set at the API boundary and read at lower-level mutation sitespublish_actioncalls next to existingActivity.objects.create()calls, piggybacking on their existing change detection rather than reimplementing itGroupActionLogEntryonce that model landsInstrumented actions
Resolve, unresolve, archive, assign, unassign, set priority, merge, mark reviewed, view, trigger autofix, comment (create/edit/delete)
Edit: Ended up adding instrumentation for some system tasks like
auto_update_priorityand integrations, as I wanted to be able to track missing instrumentation but without those it would get pretty noisy, and it wasn't a big thing to add here.