Skip to content

ref(seer): Drop legacy night-shift columns and kind db_default#114828

Merged
trevor-e merged 14 commits intomasterfrom
telkins/night-shift-column-deletion
May 5, 2026
Merged

ref(seer): Drop legacy night-shift columns and kind db_default#114828
trevor-e merged 14 commits intomasterfrom
telkins/night-shift-column-deletion

Conversation

@trevor-e
Copy link
Copy Markdown
Member

@trevor-e trevor-e commented May 5, 2026

Stacked on #114790 — review and land that one first, then wait for full deploy before merging this.

Drops the legacy night-shift columns now that #114790 has fully deployed and all live code has stopped reading/writing them.

  • SafeRemoveField(DELETE) for seer_nightshiftrunissue.action, seer_nightshiftrun.triage_strategy, and seer_nightshiftrun.error_message — physically drops the columns from Postgres.
  • Drops the db_default on kind (kept in ref(seer): Genericize night shift result table #114790 to keep INSERTs from old replicas valid mid-rolling-deploy). All live code now specifies kind explicitly, so the safety net is no longer needed.

trevor-e and others added 10 commits May 4, 2026 18:14
Renames SeerNightShiftRunIssue → SeerNightShiftRunResult (state-only;
db_table preserved) and adds a `kind` discriminator + per-row `extras`
JSON to support nightly per-org work beyond agentic triage. Triage's
existing `action` column is moved into extras["action"] via a
deploy-time backfill. Legacy columns (action, triage_strategy,
error_message) are marked SafeRemoveField MOVE_TO_PENDING; their
physical drop ships in a follow-up.
Without this, runs that had an error_message recorded against the
legacy column would return errorMessage: null from the API after the
column is removed from model state via SafeRemoveField(MOVE_TO_PENDING).
Adds a second RunPython op (run before the error_message
SafeRemoveField) and extends the migration test to cover both the
existing action backfill and the new error_message backfill.
Mypy flagged the missing annotation on the migration test's
setup_before_migration override.
Dropping the db_default in the same migration that adds the NOT NULL
column breaks rolling deploys: old replicas still INSERTing without
kind would fail with a NOT NULL violation between when the migration
applies and when those replicas are replaced. Defer the db_default
drop to the follow-up PR2 (the same one that runs SafeRemoveField
DELETE for the legacy columns), which already requires PR1 to be
fully deployed.
Pre-migration the triage_strategy column was a required CharField,
always set to "agentic_triage" — including for runs that produced no
result rows (failed quota check, no eligible projects, dry runs). My
earlier serializer change derived the legacy alias from result-row
presence, silently flipping triageStrategy to null for those runs.
Hardcode it back to "agentic_triage" so the API surface matches the
old behavior. Multi-kind serializer logic ships with the feature PR.
The migration has 12 operations including a CONCURRENTLY index, so
TestMigrations.setUp's roll-back/forward cycle was hovering at or
over CI's 120s fail-slow budget per test. The migration itself is
already exercised by:

- Direct application against dev DB with real data (47 result rows
  backfilled correctly).
- The full night-shift test suite (49 tests) running against the
  post-migration schema with --create-db.
- The two RunPython callbacks are simple idempotent loops; the cost
  of an integration test has stopped paying for itself.
Drive-by edit from the original PR1 commit. No reason for it.
Follow-up to the schema-genericization PR. The previous PR moved
action, triage_strategy, and error_message to MOVE_TO_PENDING and
left a db_default on the new kind column to keep mid-rolling-deploy
INSERTs from old replicas valid. Now that the previous PR is fully
deployed, this PR:

- SafeRemoveField(DELETE) for action, triage_strategy, error_message
  to physically drop the columns from Postgres.
- Drops the db_default on kind so callers must specify it explicitly
  (already the case in all live code).
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

This PR has a migration; here is the generated SQL for src/sentry/seer/migrations/0009_genericize_night_shift_results.py src/sentry/seer/migrations/0010_drop_legacy_columns.py

for 0009_genericize_night_shift_results in seer

--
-- Custom state/database change combination
--
-- (no-op)
--
-- Alter field run on seernightshiftrunresult
--
-- (no-op)
--
-- Add field kind to seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ADD COLUMN "kind" varchar(256) DEFAULT 'agentic_triage' NOT NULL;
--
-- Add field extras to seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ADD COLUMN "extras" jsonb DEFAULT '{}'::jsonb NOT NULL;
--
-- Alter field group on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "group_id" DROP NOT NULL;
--
-- Alter field action on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "action" DROP NOT NULL;
--
-- Alter field triage_strategy on seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" ALTER COLUMN "triage_strategy" DROP NOT NULL;
--
-- Create index seer_nights_run_id_d5406e_idx on field(s) run, kind of model seernightshiftrunresult
--
CREATE INDEX CONCURRENTLY "seer_nights_run_id_d5406e_idx" ON "seer_nightshiftrunissue" ("run_id", "kind");
--
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL
--
-- Moved seernightshiftrunresult.action field to pending deletion state
--
-- (no-op)
--
-- Moved seernightshiftrun.triage_strategy field to pending deletion state
--
-- (no-op)
--
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL
--
-- Moved seernightshiftrun.error_message field to pending deletion state
--
-- (no-op)

for 0010_drop_legacy_columns in seer

--
-- Remove field action from seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" DROP COLUMN "action" CASCADE;
--
-- Remove field triage_strategy from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "triage_strategy" CASCADE;
--
-- Remove field error_message from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "error_message" CASCADE;
--
-- Alter field kind on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "kind" DROP DEFAULT;

Copy link
Copy Markdown
Contributor

@chromy chromy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Base automatically changed from telkins/night-shift-feedback-summary to master May 5, 2026 14:12
…-column-deletion

# Conflicts:
#	migrations_lockfile.txt
#	src/sentry/seer/models/night_shift.py
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

This PR has a migration; here is the generated SQL for src/sentry/seer/migrations/0010_drop_legacy_columns.py

for 0010_drop_legacy_columns in seer

--
-- Remove field action from seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" DROP COLUMN "action" CASCADE;
--
-- Remove field triage_strategy from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "triage_strategy" CASCADE;
--
-- Remove field error_message from seernightshiftrun
--
ALTER TABLE "seer_nightshiftrun" DROP COLUMN "error_message" CASCADE;
--
-- Alter field kind on seernightshiftrunresult
--
ALTER TABLE "seer_nightshiftrunissue" ALTER COLUMN "kind" DROP DEFAULT;

@trevor-e trevor-e marked this pull request as ready for review May 5, 2026 14:32
@trevor-e trevor-e requested review from a team as code owners May 5, 2026 14:32
trevor-e added 2 commits May 5, 2026 11:12
The test passes but exceeds the global 120s budget because Django's
migration framework has to roll back/forward 12 operations (including
a CONCURRENTLY index) every time. CI attributes this work to the
"call" phase, so pytest-fail-slow trips. Lifting the per-test budget
to 4m gives stable headroom; the cost is fundamentally in the
framework, not the test logic.
The migration shipped, the data backfilled correctly in prod, and the
test was the only one in the codebase needing a fail_slow override —
130s of CI time on every backend PR for marginal post-merge value.
Squash will eventually elide the migration anyway (it's marked
elidable=True). Cleaner to remove now along with the column-deletion
migration this PR ships.
@trevor-e trevor-e merged commit 0e3d676 into master May 5, 2026
83 checks passed
@trevor-e trevor-e deleted the telkins/night-shift-column-deletion branch May 5, 2026 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants