Skip to content

Conversation

@vgrozdanic
Copy link
Member

Refactor of get_or_create method of GroupRelease model, preparing it for gradual rollout and to decrease the number of rollbacks it does since almost every attempt of insertions for this model results in a rollback.

Why are we doing this

Currently we are doing around 300 rollbacks per second, mostly caused by overly optimistic writes - almost all of the writes result in the rollback because the data already exists in the table, and for those occasions get_or_create is more suitable since SELECT statement is more performant than ROLLBACK when they happen most of the times.

Datadog notebok with investagtion, where 3 problematic models where detected:

  • GroupRelease
  • Commit
  • EnvironmentProject

Models will be refactored one at the time, and the refactor will be rolled out gradually: 10% - 50% - 100%

@vgrozdanic vgrozdanic requested a review from a team April 7, 2025 11:23
@vgrozdanic vgrozdanic requested a review from a team as a code owner April 7, 2025 11:23
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 7, 2025
if instance is None:
try:
with transaction.atomic(router.db_for_write(cls)):
if random.random() < options.get("grouprelease.new_get_or_create.rollout"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to change this pull request, but for the others you could use sentry.options.rollout.in_random_rollout to encapsulate the random + option read together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't aware that that exists, thanks!

I will update the docs to point to sentry.options.rollout.in_random_rollout instead of proposing to use random.random()

@vgrozdanic vgrozdanic merged commit 8d46e84 into master Apr 7, 2025
50 checks passed
@vgrozdanic vgrozdanic deleted the vgrozdanic/refactor-grouprelease-to-reduce-rollbacks branch April 7, 2025 14:40
@codecov
Copy link

codecov bot commented Apr 7, 2025

Codecov Report

Attention: Patch coverage is 94.11765% with 2 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/models/grouprelease.py 77.77% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #88885   +/-   ##
=======================================
  Coverage   87.73%   87.73%           
=======================================
  Files       10072    10072           
  Lines      569614   569711   +97     
  Branches    22373    22373           
=======================================
+ Hits       499734   499824   +90     
- Misses      69481    69488    +7     
  Partials      399      399           

andrewshie-sentry pushed a commit that referenced this pull request Apr 8, 2025
Refactor of `get_or_create` method of `GroupRelease` model, preparing it
for gradual rollout and to decrease the number of rollbacks it does
since almost every attempt of insertions for this model results in a
rollback.


## Why are we doing this

Currently we are doing around 300 rollbacks per second, mostly caused by
overly optimistic writes - almost all of the writes result in the
rollback because the data already exists in the table, and for those
occasions `get_or_create` is more suitable since SELECT statement is
more performant than ROLLBACK when they happen most of the times.

[Datadog notebok with
investagtion](https://app.datadoghq.com/notebook/12067672/postgres-rollback-investigation?range=604800000&start=1743184480708&live=true),
where 3 problematic models where detected:
- `GroupRelease`
- `Commit`
- `EnvironmentProject`

Models will be refactored one at the time, and the refactor will be
rolled out gradually: 10% - 50% - 100%
vgrozdanic added a commit that referenced this pull request Apr 10, 2025
In #88885 rollout option was
introduced.

Now it is rolled out to 100%, and it's time to do the clean up.

In the next PRs options automator flag and option registration will be
removed, once this is merged and fully deployed
MichaelSun48 pushed a commit that referenced this pull request Apr 11, 2025
In #88885 rollout option was
introduced.

Now it is rolled out to 100%, and it's time to do the clean up.

In the next PRs options automator flag and option registration will be
removed, once this is merged and fully deployed
vgrozdanic added a commit that referenced this pull request Apr 15, 2025
Similar to how it was done in:
#88885

Refactor of `add_project` method of `Environment` model, preparing it
for gradual rollout and to decrease the number of rollbacks it does
since almost every attempt of insertions for this model results in a
rollback.

In this method `EnvironmentProject` was doing overly optimistic inserts
leading to us having almost 100 rollbacks/second coming just from this
model


## Why are we doing this

Currently we are doing around 300 rollbacks per second, mostly caused by
overly optimistic writes - almost all of the writes result in the
rollback because the data already exists in the table, and for those
occasions `get_or_create` is more suitable since SELECT statement is
more performant than ROLLBACK when they happen most of the times.

[Datadog notebok with
investagtion](https://app.datadoghq.com/notebook/12067672/postgres-rollback-investigation?range=604800000&start=1743184480708&live=true),
where 3 problematic models where detected:
- ~GroupRelease~
- `Commit`
- `EnvironmentProject`

Models will be refactored one at the time, and the refactor will be
rolled out gradually: 10% - 50% - 100%
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
In #88885 rollout option was
introduced.

Now it is rolled out to 100%, and it's time to do the clean up.

In the next PRs options automator flag and option registration will be
removed, once this is merged and fully deployed
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
Similar to how it was done in:
#88885

Refactor of `add_project` method of `Environment` model, preparing it
for gradual rollout and to decrease the number of rollbacks it does
since almost every attempt of insertions for this model results in a
rollback.

In this method `EnvironmentProject` was doing overly optimistic inserts
leading to us having almost 100 rollbacks/second coming just from this
model


## Why are we doing this

Currently we are doing around 300 rollbacks per second, mostly caused by
overly optimistic writes - almost all of the writes result in the
rollback because the data already exists in the table, and for those
occasions `get_or_create` is more suitable since SELECT statement is
more performant than ROLLBACK when they happen most of the times.

[Datadog notebok with
investagtion](https://app.datadoghq.com/notebook/12067672/postgres-rollback-investigation?range=604800000&start=1743184480708&live=true),
where 3 problematic models where detected:
- ~GroupRelease~
- `Commit`
- `EnvironmentProject`

Models will be refactored one at the time, and the refactor will be
rolled out gradually: 10% - 50% - 100%
@github-actions github-actions bot locked and limited conversation to collaborators Apr 23, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants