@vgrozdanic
Member

@vgrozdanic vgrozdanic commented Apr 10, 2025

Similar to how it was done in: #88885

Refactors the add_project method of the Environment model to prepare it for a gradual rollout and to reduce the number of rollbacks it causes, since almost every insert attempt for this model ends in a rollback.

In this method, EnvironmentProject was doing overly optimistic inserts, which caused almost 100 rollbacks/second from this model alone.

Why are we doing this

Currently we are doing around 300 rollbacks per second, mostly caused by overly optimistic writes. Almost all of these writes end in a rollback because the data already exists in the table. In those cases get_or_create is more suitable, since a SELECT statement is more performant than a ROLLBACK when the row already exists most of the time.
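To illustrate the difference between the two write patterns, here is a minimal sketch that counts rollbacks for each. It uses an in-memory SQLite database as a stand-in for Postgres, and the table name, column names, and helper functions are hypothetical. It only counts statements; it does not measure how expensive a ROLLBACK is compared to a SELECT.

```python
import sqlite3

INSERT_SQL = (
    "INSERT INTO environment_project (project_id, environment_id) VALUES (?, ?)"
)
SELECT_SQL = (
    "SELECT 1 FROM environment_project WHERE project_id = ? AND environment_id = ?"
)


def make_db() -> sqlite3.Connection:
    # Hypothetical table with a unique constraint on the pair.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE environment_project ("
        " project_id INTEGER NOT NULL,"
        " environment_id INTEGER NOT NULL,"
        " UNIQUE (project_id, environment_id))"
    )
    conn.commit()
    return conn


def optimistic_insert(conn, project_id, environment_id, stats):
    """Insert first; if the row already exists, roll the transaction back."""
    try:
        conn.execute(INSERT_SQL, (project_id, environment_id))
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()
        stats["rollbacks"] += 1


def select_then_insert(conn, project_id, environment_id, stats):
    """Look the row up first and insert only when it is missing."""
    stats["selects"] += 1
    row = conn.execute(SELECT_SQL, (project_id, environment_id)).fetchone()
    if row is None:
        try:
            conn.execute(INSERT_SQL, (project_id, environment_id))
            conn.commit()
        except sqlite3.IntegrityError:
            # Only reachable if another writer inserted the row in between.
            conn.rollback()
            stats["rollbacks"] += 1


def run(strategy, attempts=100):
    """Write the same (project, environment) pair `attempts` times."""
    conn = make_db()
    stats = {"selects": 0, "rollbacks": 0}
    for _ in range(attempts):
        strategy(conn, 1, 1, stats)
    rows = conn.execute("SELECT COUNT(*) FROM environment_project").fetchone()[0]
    conn.close()
    return stats, rows


optimistic_stats, optimistic_rows = run(optimistic_insert)
lookup_stats, lookup_rows = run(select_then_insert)
print("optimistic:", optimistic_stats, "rows:", optimistic_rows)
print("lookup first:", lookup_stats, "rows:", lookup_rows)
```

Both strategies leave exactly one row behind. The optimistic version rolls back on every repeated write, while the lookup-first version issues one SELECT per call and rolls back only if it loses a race.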

Datadog notebook with the investigation, where 3 problematic models were detected:

  • GroupRelease
  • Commit
  • EnvironmentProject

Models will be refactored one at a time, and each refactor will be rolled out gradually: 10% - 50% - 100%
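A percentage-based rollout like this can be sketched as a random gate in front of the new code path. The in_random_rollout helper referenced in the diff is not shown in this PR, so the in_rollout function below is a hypothetical stand-in that takes the rate directly instead of reading it from an option.

```python
import random

# The three stages named in the description.
ROLLOUT_STAGES = (0.10, 0.50, 1.00)


def in_rollout(rate: float, rng: random.Random) -> bool:
    """Return True for roughly `rate` of calls (hypothetical helper)."""
    # rng.random() is in [0.0, 1.0), so a rate of 1.0 always passes
    # and a rate of 0.0 never does.
    return rng.random() < rate


def share_on_new_path(rate: float, calls: int = 10_000, seed: int = 0) -> float:
    """Simulate `calls` requests and return the fraction taking the new path."""
    rng = random.Random(seed)
    hits = sum(in_rollout(rate, rng) for _ in range(calls))
    return hits / calls


shares = {rate: share_on_new_path(rate) for rate in ROLLOUT_STAGES}
for rate, share in shares.items():
    print(f"configured {rate:.0%}, observed {share:.1%}")
```

Because the gate is evaluated per call, raising the rate shifts traffic to the new path gradually, and setting it back to 0 returns every call to the old path without a deploy.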

@github-actions github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) Apr 10, 2025
Comment on lines 115 to 121
if in_random_rollout("environmentproject.new_add_project.rollout"):
    _, created = EnvironmentProject.objects.get_or_create(
        project=project, environment=self, defaults={"is_hidden": is_hidden}
    )
    if not created:
        # We've already created the object, should still cache the action.
        cache.set(cache_key, 1, 3600)
Member Author

This is just a refactor; it does not change any of the existing logic:

  • is_hidden is only set during creation, same as before
  • if the EnvironmentProject already exists, we write to the cache, same as before (the assumption is that this protects the DB from being overloaded with too many write requests; we keep it because a cache lookup is less expensive than a DB lookup)

Member

Shouldn't we also write to cache when the EnvironmentProject record is first created? I know that is a change in logic, but it would help reduce the number of queries we're running.

Member

I don't think caching both paths is a change in logic. Previously, we'd cache immediately after the create and inside the exception handler. So it's more correct to remove the if not created check and just cache in both cases.

Member Author

Agreed. I edited the code to always cache the value.
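The flow agreed on in this thread can be sketched roughly as follows. This is not the merged code: FakeCache and FakeTable are hypothetical in-memory stand-ins for the cache and for EnvironmentProject.objects, the cache key format is made up, and the up-front cache check is an assumption based on the comment that a cache lookup is cheaper than a DB lookup.

```python
class FakeCache:
    """Hypothetical stand-in for the cache; the timeout is ignored."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, timeout):
        self._data[key] = value


class FakeTable:
    """Hypothetical stand-in for a manager exposing get_or_create."""

    def __init__(self):
        self.rows = {}
        self.queries = 0

    def get_or_create(self, project, environment, defaults):
        self.queries += 1
        key = (project, environment)
        created = key not in self.rows
        if created:
            # Defaults apply only when the row is created.
            self.rows[key] = dict(defaults)
        return self.rows[key], created


def add_project(cache, table, environment, project, is_hidden=None):
    cache_key = f"envproj:{environment}:{project}"  # hypothetical key format
    if cache.get(cache_key) is not None:
        return  # recently handled; skip the database entirely

    table.get_or_create(
        project=project, environment=environment, defaults={"is_hidden": is_hidden}
    )
    # Cache on both paths, whether the row was created or already existed.
    cache.set(cache_key, 1, 3600)


cache, table = FakeCache(), FakeTable()
add_project(cache, table, "production", 42, is_hidden=True)
add_project(cache, table, "production", 42, is_hidden=False)
add_project(cache, table, "production", 42)
print(table.queries, table.rows)
```

With caching on both paths, the second and third calls return from the cache without touching the table, and is_hidden keeps the value set at creation.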

@vgrozdanic vgrozdanic marked this pull request as ready for review April 14, 2025 12:17
@vgrozdanic vgrozdanic requested a review from a team April 14, 2025 12:18
@codecov
codecov bot commented Apr 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #89265      +/-   ##
==========================================
+ Coverage   87.73%   87.75%   +0.01%     
==========================================
  Files       10172    10138      -34     
  Lines      574129   573043    -1086     
  Branches    22612    22425     -187     
==========================================
- Hits       503728   502863     -865     
+ Misses      69985    69743     -242     
- Partials      416      437      +21     

@vgrozdanic vgrozdanic merged commit de38201 into master Apr 15, 2025
60 checks passed
@vgrozdanic vgrozdanic deleted the vgrozdanic/refactor-environment-model branch April 15, 2025 08:13
vgrozdanic added a commit that referenced this pull request Apr 22, 2025
…90042)

After rolling out to 100% last week, we can now safely remove the old code,
which is no longer used.

Continuation of
getsentry/sentry-options-automator#3597 and
#89265
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
Similar to how it was done in:
#88885

Refactors the `add_project` method of the `Environment` model to prepare
it for a gradual rollout and to reduce the number of rollbacks it causes,
since almost every insert attempt for this model ends in a rollback.

In this method, `EnvironmentProject` was doing overly optimistic inserts,
which caused almost 100 rollbacks/second from this model alone.


## Why are we doing this

Currently we are doing around 300 rollbacks per second, mostly caused by
overly optimistic writes. Almost all of these writes end in a rollback
because the data already exists in the table. In those cases
`get_or_create` is more suitable, since a SELECT statement is more
performant than a ROLLBACK when the row already exists most of the time.

[Datadog notebook with the
investigation](https://app.datadoghq.com/notebook/12067672/postgres-rollback-investigation?range=604800000&start=1743184480708&live=true),
where 3 problematic models were detected:
- ~GroupRelease~
- `Commit`
- `EnvironmentProject`

Models will be refactored one at a time, and each refactor will be
rolled out gradually: 10% - 50% - 100%
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025
…90042)

After rolling out to 100% last week, we can now safely remove the old code,
which is no longer used.

Continuation of
getsentry/sentry-options-automator#3597 and
#89265
@github-actions github-actions bot locked and limited conversation to collaborators Apr 30, 2025
