@srest2021 srest2021 commented Nov 5, 2025

relates to REPLAY-803

followup: #102711

linear bot commented Nov 5, 2025

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 5, 2025
@srest2021 srest2021 changed the title from "fix(releases): Add build number and code to semver index" to "fix(releases): Add build number and code to semver index (DB migration)" Nov 5, 2025
github-actions bot commented Nov 5, 2025

This PR has a migration; here is the generated SQL for src/sentry/migrations/1003_rebuild_semver_index_with_build_fields.py

for 1003_rebuild_semver_index_with_build_fields in sentry

--
-- Remove index sentry_release_semver_idx from release
--
DROP INDEX CONCURRENTLY IF EXISTS "sentry_release_semver_idx";
--
-- Create index sentry_release_semver_idx on F(organization), OrderBy(F(major), descending=True), OrderBy(F(minor), descending=True), OrderBy(F(patch), descending=True), OrderBy(F(revision), descending=True), OrderBy(CASE WHEN <Q: (AND: ('prerelease', ''))> THEN Value(1), ELSE Value(0), descending=True), OrderBy(F(prerelease), descending=True), OrderBy(F(build_number), descending=True), OrderBy(F(build_code), descending=True) on model release
--
CREATE INDEX CONCURRENTLY "sentry_release_semver_idx" ON "sentry_release" ("organization_id", "major" DESC, "minor" DESC, "patch" DESC, "revision" DESC, (CASE WHEN "prerelease" = '' THEN 1 ELSE 0 END) DESC, "prerelease" DESC, "build_number" DESC, "build_code" DESC);

github-actions bot commented Nov 5, 2025

This PR has a migration; here is the generated SQL for src/sentry/migrations/1004_rebuild_semver_index_with_build_fields.py

for 1004_rebuild_semver_index_with_build_fields in sentry

--
-- Remove index sentry_release_semver_idx from release
--
DROP INDEX CONCURRENTLY IF EXISTS "sentry_release_semver_idx";
--
-- Create index sentry_release_semver_idx on F(organization), OrderBy(F(major), descending=True), OrderBy(F(minor), descending=True), OrderBy(F(patch), descending=True), OrderBy(F(revision), descending=True), OrderBy(CASE WHEN <Q: (AND: ('prerelease', ''))> THEN Value(1), ELSE Value(0), descending=True), OrderBy(F(prerelease), descending=True), OrderBy(F(build_number), descending=True), OrderBy(F(build_code), descending=True) on model release
--
CREATE INDEX CONCURRENTLY "sentry_release_semver_idx" ON "sentry_release" ("organization_id", "major" DESC, "minor" DESC, "patch" DESC, "revision" DESC, (CASE WHEN "prerelease" = '' THEN 1 ELSE 0 END) DESC, "prerelease" DESC, "build_number" DESC, "build_code" DESC);

@srest2021 srest2021 marked this pull request as ready for review November 5, 2025 19:39
@srest2021 srest2021 requested review from a team as code owners November 5, 2025 19:39
("sentry", "1003_group_history_prev_history_safe_removal"),
]

operations = [
I'll let the folks in the DB migration reviewers group chime in, but I think we want to create the new index first, then drop the old one afterwards in a separate migration.

Yeah, we should switch these if we go ahead with this

@wedamija wedamija left a comment

Could you describe the queries you want to make with this?

My vague memory is that build number/code didn't make sense to sort by. Do you have some example rows here, just so I can make sure I'm following this correctly?

@srest2021

@wedamija Yes, definitely. According to the semver spec, the build code should not be included in semver ordering. However, it turns out that when detecting regressions we have been including the build_number and build_code columns in the ordering (see these test cases in my followup PR, which demonstrate the existing behavior for regression detection and act as our "target" behavior for semver ordering). We also have a case here (thread here) for the "resolve in next release" feature that points out this inconsistency. By adding the build code and number to the semver ordering, we can make sure that the "next" release picked for resolution follows the same ordering. Here is the test I modified in the followup to confirm and demonstrate the new ordering.

I'll make sure to create a new index for this. The changes in the followup PR will also be gated under a new feature flag (WIP) so we can roll this out safely.
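
For readers following along, here is a minimal Python sketch of the ordering described above. It mirrors the two CASE expressions and the all-descending column order from the generated index SQL. The `Release` dataclass, `semver_sort_key`, and the sample rows are hypothetical and only for illustration. It also assumes plain code-point string comparison and does not model Postgres NULL placement or collation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    # Hypothetical stand-in for a sentry_release row (sort columns only).
    version: str
    major: int
    minor: int
    patch: int
    revision: int
    prerelease: str
    build_number: Optional[int]
    build_code: Optional[str]


def semver_sort_key(r: Release) -> tuple:
    # Mirrors: CASE WHEN prerelease = '' THEN 1 ELSE 0 END
    prerelease_case = 1 if r.prerelease == "" else 0
    # Mirrors the build CASE: 2 = non-numeric build code, 1 = numeric, 0 = none
    if r.build_code is not None and r.build_number is None:
        build_code_case = 2
    elif r.build_number is not None:
        build_code_case = 1
    else:
        build_code_case = 0
    return (
        r.major, r.minor, r.patch, r.revision,
        prerelease_case, r.prerelease,
        build_code_case,
        # Placeholders keep tuples comparable; the CASE rank already
        # separates rows with and without these values.
        r.build_number if r.build_number is not None else -1,
        r.build_code if r.build_code is not None else "",
    )


def order_releases(releases: List[Release]) -> List[Release]:
    # Every column is sorted DESC, so one reversed sort is enough.
    return sorted(releases, key=semver_sort_key, reverse=True)


RELEASES = [
    Release("0.9.0", 0, 9, 0, 0, "", None, None),
    Release("1.0.0-alpha", 1, 0, 0, 0, "alpha", None, None),
    Release("1.0.0", 1, 0, 0, 0, "", None, None),
    Release("1.0.0+100", 1, 0, 0, 0, "", 100, "100"),
    Release("1.0.0+200", 1, 0, 0, 0, "", 200, "200"),
    Release("1.0.0+abc", 1, 0, 0, 0, "", None, "abc"),
]

ordered_versions = [r.version for r in order_releases(RELEASES)]
print(ordered_versions)
```

Under these assumptions, the first element of the result is the release that a "greatest semver release" lookup would pick.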

github-actions bot commented Nov 5, 2025

This PR has a migration; here is the generated SQL for src/sentry/migrations/1004_add_semver_with_build_code_index.py

for 1004_add_semver_with_build_code_index in sentry

--
-- Create index sentry_release_semver_new_idx on F(organization), OrderBy(F(major), descending=True), OrderBy(F(minor), descending=True), OrderBy(F(patch), descending=True), OrderBy(F(revision), descending=True), OrderBy(CASE WHEN <Q: (AND: ('prerelease', ''))> THEN Value(1), ELSE Value(0), descending=True), OrderBy(F(prerelease), descending=True), OrderBy(CASE WHEN <Q: (AND: ('build_code__isnull', False), ('build_number__isnull', True))> THEN Value(2), WHEN <Q: (AND: ('build_number__isnull', False))> THEN Value(1), ELSE Value(0), descending=True), OrderBy(F(build_number), descending=True), OrderBy(F(build_code), descending=True) on model release
--
CREATE INDEX CONCURRENTLY "sentry_release_semver_new_idx" ON "sentry_release" ("organization_id", "major" DESC, "minor" DESC, "patch" DESC, "revision" DESC, (CASE WHEN "prerelease" = '' THEN 1 ELSE 0 END) DESC, "prerelease" DESC, (CASE WHEN ("build_code" IS NOT NULL AND "build_number" IS NULL) THEN 2 WHEN "build_number" IS NOT NULL THEN 1 ELSE 0 END) DESC, "build_number" DESC, "build_code" DESC);

wedamija commented Nov 5, 2025

Ok, this all sounds reasonable. One thing I want to check is whether the index will actually be used here. It might be fine to leave the existing index as is, since maybe there aren't that many builds per release.

Are you able to print out an actual SQL query that uses this sort, so that we can verify whether the new columns in this index will get used?

srest2021 commented Nov 6, 2025

@wedamija The use case I'm interested in is the greatest_semver_release function, which with the changes would generate a SQL query like this. It would of course be easier to skip creating the new index if possible; let me know what you think.

SELECT
  "sentry_release"."id",
  "sentry_release"."organization_id",
  "sentry_release"."status",
  "sentry_release"."version",
  "sentry_release"."ref",
  "sentry_release"."url",
  "sentry_release"."date_added",
  "sentry_release"."date_started",
  "sentry_release"."date_released",
  "sentry_release"."data",
  "sentry_release"."owner_id",
  "sentry_release"."commit_count",
  "sentry_release"."last_commit_id",
  "sentry_release"."authors",
  "sentry_release"."total_deploys",
  "sentry_release"."last_deploy_id",
  "sentry_release"."package",
  "sentry_release"."major",
  "sentry_release"."minor",
  "sentry_release"."patch",
  "sentry_release"."revision",
  "sentry_release"."prerelease",
  "sentry_release"."build_code",
  "sentry_release"."build_number",
  "sentry_release"."user_agent",
  CASE WHEN "sentry_release"."prerelease" = '' THEN 1 ELSE 0 END AS "prerelease_case",
  CASE WHEN ("sentry_release"."build_code" IS NOT NULL
  AND "sentry_release"."build_number" IS NULL) THEN 2 WHEN "sentry_release"."build_number" IS NOT NULL THEN 1 ELSE 0 END AS "build_code_case"
FROM
  "sentry_release"
INNER JOIN
  "sentry_release_project"
  ON ("sentry_release"."id" = "sentry_release_project"."release_id")
WHERE
  ("sentry_release"."organization_id" = 123
  AND "sentry_release_project"."project_id" = 123
  AND "sentry_release"."major" IS NOT NULL)
ORDER BY
  "sentry_release"."major" DESC,
  "sentry_release"."minor" DESC,
  "sentry_release"."patch" DESC,
  "sentry_release"."revision" DESC,
  26 DESC, --prerelease_case column
  "sentry_release"."prerelease" DESC,
  27 DESC, --build_code_case column
  "sentry_release"."build_number" DESC,
  "sentry_release"."build_code" DESC

wedamija commented Nov 6, 2025

As is, this index won't be used for the build columns, because the query sorts on the build_code_case column first and Postgres can't derive that CASE expression from the index. So we have two options here:

  • Add a column like prerelease_case, but for build_code_case, to the index. If you take a look at the generated index above, you can see it being used. So you'd need to have <build_code_case_col>, build_number, build_code at the end of your index.
  • You could try running this query as-is in Redash against some orgs you want to test it on, and see if it's already fast enough. I'd probably recommend trying this first instead of making this index more complicated, because I think it might be hard for Postgres to even decide to use it. When testing in Postgres, be aware that repeated runs of the same query end up cached, so you need to test on different org/project ids, or wait long enough that the queries are removed from memory.
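
The first option can be tried locally with a small sketch that compares query plans with and without the CASE expression in the index. It deliberately uses SQLite (via Python's built-in `sqlite3`) as a stand-in, not Postgres. The two planners behave differently, so this only illustrates the idea that an expression index has to contain the same expression the ORDER BY uses. The table, the index name, and the reduced column set are hypothetical. The real check would still be an EXPLAIN against Postgres.

```python
import sqlite3
from typing import List

# Same CASE expression text for both the index and the ORDER BY.
BUILD_CODE_CASE = (
    "(CASE WHEN build_code IS NOT NULL AND build_number IS NULL THEN 2 "
    "WHEN build_number IS NOT NULL THEN 1 ELSE 0 END)"
)


def order_by_plan(with_case_in_index: bool) -> List[str]:
    """Return SQLite's query-plan detail lines for the semver-style sort."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, trimmed-down version of the release table.
    conn.execute(
        "CREATE TABLE release ("
        "id INTEGER PRIMARY KEY, organization_id INTEGER, major INTEGER, "
        "build_number INTEGER, build_code TEXT)"
    )
    index_cols = ["organization_id", "major DESC"]
    if with_case_in_index:
        index_cols.append(f"{BUILD_CODE_CASE} DESC")
    index_cols += ["build_number DESC", "build_code DESC"]
    conn.execute(
        f"CREATE INDEX release_semver_idx ON release ({', '.join(index_cols)})"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM release WHERE organization_id = 123 "
        f"ORDER BY major DESC, {BUILD_CODE_CASE} DESC, "
        "build_number DESC, build_code DESC"
    ).fetchall()
    conn.close()
    # The human-readable plan text is the last column of each row.
    return [row[-1] for row in rows]


for flag in (False, True):
    print(f"CASE expression in index: {flag}")
    for line in order_by_plan(flag):
        print("   ", line)
```

Comparing the two printed plans shows whether the planner needs a separate sort step when the expression is missing from the index. Whether Postgres makes the same choice on real data is exactly what the Redash test in the second option would answer.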

@srest2021

Update: We think it's ok to skip rebuilding the index when adding the build code columns to the semver ordering. We'll monitor the latency of the query on Datadog when changes are deployed.

@srest2021 srest2021 closed this Nov 7, 2025