feat(issues): Add GroupActionLogEntry#115771
Conversation
|
@shashjar fyi |
|
This PR has a migration; here is the generated SQL for for --
-- Create model IssueActionLogEntry
--
CREATE TABLE "sentry_issueactionlogentry" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "group_id" bigint NOT NULL, "project_id" bigint NOT NULL, "original_group_id" bigint NULL, "type" integer NOT NULL CHECK ("type" >= 0), "actor_type" integer NOT NULL CHECK ("actor_type" >= 0), "actor_id" bigint NOT NULL, "data" jsonb NOT NULL, "date_added" timestamp with time zone DEFAULT (STATEMENT_TIMESTAMP()) NOT NULL, "idempotency_key" varchar(64) NULL);
CREATE UNIQUE INDEX CONCURRENTLY "uniq_issueactionlogentry_group_idempotency_key" ON "sentry_issueactionlogentry" ("group_id", "idempotency_key") WHERE "idempotency_key" IS NOT NULL;
CREATE INDEX CONCURRENTLY "sentry_issu_group_i_ff3639_idx" ON "sentry_issueactionlogentry" ("group_id", "date_added", "id"); |
|
Okay, potentially hot take that you can feel free to ignore: have we considered using "Group" instead of "Issue" throughout? (So it'd be GroupActionLog, etc.) The reason for this is pretty simple: consistency with everything else. All of our tables use Group instead of Issue (e.g. sentry_groupredirect), as do all of our Models (e.g. GroupRedirect)... generally we've used Issues for user-facing stuff but kept Group around internally. Personally I think the consistency here is worth it — but will leave this up to you. |
I find the argument for Group compelling; I used "Issue" since that's what all of the discussion prior to this PR has used. If I can force us to set policy on "Group is what we call it in code, not an old name we haven't managed to change", I will. |
|
This PR has a migration; here is the generated SQL for for --
-- Create model GroupActionLogEntry
--
CREATE TABLE "sentry_groupactionlogentry" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "group_id" bigint NOT NULL, "project_id" bigint NOT NULL, "original_group_id" bigint NULL, "type" integer NOT NULL CHECK ("type" >= 0), "actor_type" integer NOT NULL CHECK ("actor_type" >= 0), "actor_id" bigint NOT NULL, "data" jsonb NOT NULL, "date_added" timestamp with time zone DEFAULT (STATEMENT_TIMESTAMP()) NOT NULL, "date_updated" timestamp with time zone NOT NULL, "idempotency_key" varchar(64) NULL);
CREATE UNIQUE INDEX CONCURRENTLY "uniq_groupactionlogentry_group_idempotency_key" ON "sentry_groupactionlogentry" ("group_id", "idempotency_key") WHERE "idempotency_key" IS NOT NULL;
CREATE INDEX CONCURRENTLY "sentry_grou_group_i_cc465f_idx" ON "sentry_groupactionlogentry" ("group_id", "date_added", "id"); |
wedamija
left a comment
There was a problem hiding this comment.
Isn't this similar to the existing GroupHistory table?
It is fairly similar. Also very similar to Activity. Also a few others. |
|
Secondary database plan is here to move it later. I've started the process of getting it set up, it'll take a bit, and in the meantime I'll configure the getsentry router to yell at us if it is used in any context that is incompatible with separate database. |
| date_added = models.DateTimeField(db_default=Now()) | ||
|
|
||
| # Primarly intended for debugging; not intended to be relied upon | ||
| # for invalidation. | ||
| date_updated = models.DateTimeField(auto_now=True) |
There was a problem hiding this comment.
Nit: Should we add these via using the DefaultFieldsModel base class instead?
| # DB-defaulted; backfill code may pass an explicit value. | ||
| date_added = models.DateTimeField(db_default=Now()) |
There was a problem hiding this comment.
Will we evict these over time, or just rely on Group to cascade delete? If we want to expire them after a period of time, an index could be useful here
| app_label = "sentry" | ||
| db_table = "sentry_groupactionlogentry" | ||
| indexes = [ | ||
| models.Index(fields=["group_id", "date_added", "id"]), |
There was a problem hiding this comment.
What's the intent of this index? jfyi, usually if you need to filter on two really high cardinality columns an index isn't going to help too much, since there typically won't be more than one row in each leaf.
The main reason this can be helpful is if you're making a query that only returns group_id, date_added, id. If that's not the intent, probably I'd just remove id here and keep the index size a little smaller.
| group_id = BoundedBigIntegerField() | ||
| # The project the group belongs to. | ||
| project_id = BoundedBigIntegerField() |
There was a problem hiding this comment.
You should probably add indexes on all of these fk columns
| # The project the group belongs to. | ||
| project_id = BoundedBigIntegerField() | ||
| # The group_id before any merges, if this entry was migrated. | ||
| original_group_id = BoundedBigIntegerField(null=True) |
There was a problem hiding this comment.
Probably this also needs an index?
| actor_type = BoundedPositiveIntegerField( | ||
| choices=[(t.value, t.name) for t in GroupActorType], | ||
| ) | ||
| actor_id = BoundedBigIntegerField() |

Adds GroupActionLog entry, with no meaningful actions or uses yet.
Admittedly, this isn't very meaningful without a mechanism to derive data or any real events, but it is a precursor to both of those and needs to make sense as a log independent of them.
The aim here is to make it possible to record arbitrary actions on Groups and consume them sequentially.
There aren't specific planned query patterns for actor or project_id as provided here, but they are core pieces of information about an event that should always be present and stored efficiently, and requiring them out of the gate allow for a variety of useful analyses we may need to do.
A core risk here is that this scheme is ultimately mutable. There'll be a backfill helper method with invalidation hooks and another one for merges (both render log-derived data potentially no longer canonical); asequential autoincrement IDs through an ordered log give a record of this permanently, which is nice, but it's always possible to add code that doesn't follow the rules.