feat(replays): Add memory-efficient project and project option query caches #101606

Merged
cmanallen merged 20 commits into master from cmanallen/replays-project-query-cache
Oct 17, 2025

Conversation

@cmanallen cmanallen commented Oct 16, 2025


get_from_cache caches results in a thread-local. With our current configuration, that means eight copies of the same project model (and project-option model). get_from_cache is also very coarse: it fetches fields we don't need. By caching just the boolean values we care about, we can minimize each query's footprint on our overall memory usage. Since we need to cache a lot of projects and project-options, this is important for maintaining stable memory usage.

Total memory should be reduced by O(8n), where n is the delta between the size of a project model and a boolean, plus the delta between the size of the project-option model and a tuple of booleans.
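As a rough illustration of that estimate, the arithmetic can be sketched as below. The function name and every byte size are hypothetical placeholders, not measurements from this PR; only the factor of eight copies comes from the description above.

```python
def estimated_savings_bytes(
    copies: int, entries: int, model_bytes: int, cached_bytes: int
) -> int:
    """Estimate memory saved when each copy holds a compact value, not a model.

    All sizes are caller-supplied assumptions; nothing here is measured.
    """
    # Per-entry delta: the "n" in the O(8n) estimate.
    delta = model_bytes - cached_bytes
    return copies * entries * delta


# Hypothetical inputs: 8 copies, 100,000 cached projects,
# a 2,000-byte model versus a 28-byte boolean.
savings = estimated_savings_bytes(
    copies=8, entries=100_000, model_bytes=2_000, cached_bytes=28
)
print(f"{savings / 1_000_000:.1f} MB")
```

The same calculation applies to the project-option cache by substituting the size of the option model and the size of a tuple of booleans.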

A word on the AutoCache. AutoCache is safe but not logically atomic. We defer to last writer wins and potentially duplicate work. This could be improved, but we don't expect the fn argument to produce side effects or be non-deterministic, at least for our current case. However, it might be wise to implement better locking behavior in AutoCache.__getitem__ so we don't unnecessarily compute a project or project-option query multiple times. This is easy enough to do, but I think we've done enough already for this pull request, so we can address it in a follow-up!
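To make the trade-off concrete, here is a minimal sketch of the two behaviors. These classes are hypothetical stand-ins written for illustration; they are not the AutoCache implementation from this PR, whose internals are not shown in this conversation.

```python
import threading
from collections.abc import Callable, Hashable
from typing import Generic, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LastWriterWinsCache(Generic[K, V]):
    """Hypothetical cache that computes missing values with fn."""

    def __init__(self, fn: Callable[[K], V]) -> None:
        self._fn = fn
        self._data: dict[K, V] = {}

    def __getitem__(self, key: K) -> V:
        try:
            return self._data[key]
        except KeyError:
            # Two threads can both miss and both call fn. The later
            # assignment overwrites the earlier one, which is harmless
            # when fn is deterministic and has no side effects.
            value = self._fn(key)
            self._data[key] = value
            return value


class LockedCache(LastWriterWinsCache[K, V]):
    """Variant that computes each missing key at most once."""

    def __init__(self, fn: Callable[[K], V]) -> None:
        super().__init__(fn)
        self._lock = threading.Lock()

    def __getitem__(self, key: K) -> V:
        try:
            return self._data[key]  # Fast path: no lock on a hit.
        except KeyError:
            pass
        with self._lock:
            # Re-check under the lock: another thread may have filled
            # the entry while this one was waiting.
            if key not in self._data:
                self._data[key] = self._fn(key)
            return self._data[key]


calls: list[int] = []


def lookup(project_id: int) -> bool:
    calls.append(project_id)
    return project_id % 2 == 0


cache = LockedCache(lookup)
```

A single lock serializes all misses, including misses for unrelated keys. Per-key locks would avoid that at the cost of more bookkeeping, which is the kind of decision a follow-up would need to weigh.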

A context object is now being passed around the consumer. This is better for testing than using globals. There are more effects that could be moved out of the processing logic and into the context object. This would make our consumer significantly easier to test and require fewer mocks.
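A minimal sketch of the idea, assuming a context that carries the two lookups as plain callables. The class name, field names, and the summarize function are hypothetical and are not taken from the PR's code.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorContext:
    """Hypothetical context: effects are injected instead of read from globals."""

    has_sent_replays: Callable[[int], bool]
    project_options: Callable[[int], tuple[bool, bool]]


def summarize(context: ProcessorContext, project_id: int) -> dict[str, object]:
    """Processing code only touches the context it was handed."""
    return {
        "project_id": project_id,
        "has_sent_replays": context.has_sent_replays(project_id),
        "options": context.project_options(project_id),
    }


# In a test, the context is built from plain functions; no mocks needed.
test_context = ProcessorContext(
    has_sent_replays=lambda project_id: True,
    project_options=lambda project_id: (True, False),
)
summary = summarize(test_context, project_id=1)
```

Because the lookups are ordinary arguments, a test can swap in fakes by constructing a different context rather than patching module-level state.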

@cmanallen cmanallen requested a review from a team as a code owner October 16, 2025 16:53
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 16, 2025
Comment thread src/sentry/replays/usecases/ingest/cache.py Outdated
Comment thread src/sentry/replays/usecases/ingest/cache.py Outdated
Comment thread src/sentry/replays/usecases/ingest/__init__.py Outdated
Comment thread src/sentry/replays/usecases/ingest/event_logger.py Outdated
codecov Bot commented Oct 16, 2025

❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 41108 | 2 | 41106 | 251 |
View the top 2 failed test(s) by shortest run time
tests.sentry.replays.integration.consumers.test_recording::test_recording_consumer_invalid_message
Stack Traces | 0.046s run time
.../integration/consumers/test_recording.py:27: in consumer
    ).create_with_partitions(lambda x, force=False: None, {})
.../replays/consumers/recording.py:73: in create_with_partitions
    if options.get("replay.consumer.enable_new_query_caching_system"):
.../sentry/options/manager.py:312: in get
    result = self.store.get(opt, silent=silent)
.../sentry/options/store.py:115: in get
    result = self.get_store(key, silent=silent)
.../sentry/options/store.py:215: in get_store
    value = self.model.objects.get(key=key.name).value
.venv/lib/python3.13.../db/models/manager.py:87: in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
.venv/lib/python3.13.../db/models/query.py:629: in get
    num = len(clone)
.venv/lib/python3.13.../db/models/query.py:366: in __len__
    self._fetch_all()
.venv/lib/python3.13.../db/models/query.py:1945: in _fetch_all
    self._result_cache = list(self._iterable_class(self))
.venv/lib/python3.13.../db/models/query.py:91: in __iter__
    results = compiler.execute_sql(
.venv/lib/python3.13.../models/sql/compiler.py:1621: in execute_sql
    cursor = self.connection.cursor()
.venv/lib/python3.13.../django/utils/asyncio.py:26: in inner
    return func(*args, **kwargs)
.venv/lib/python3.13.../backends/base/base.py:320: in cursor
    return self._cursor()
.../db/postgres/decorators.py:38: in inner
    return func(self, *args, **kwargs)
.../db/postgres/base.py:114: in _cursor
    return super()._cursor()
.venv/lib/python3.13.../backends/base/base.py:296: in _cursor
    self.ensure_connection()
E   RuntimeError: Database access not allowed, use the "django_db" mark, or the "db" or "transactional_db" fixtures to enable it.
tests.sentry.replays.integration.consumers.test_recording::test_recording_consumer
Stack Traces | 0.048s run time
.../integration/consumers/test_recording.py:27: in consumer
    ).create_with_partitions(lambda x, force=False: None, {})
.../replays/consumers/recording.py:73: in create_with_partitions
    if options.get("replay.consumer.enable_new_query_caching_system"):
.../sentry/options/manager.py:312: in get
    result = self.store.get(opt, silent=silent)
.../sentry/options/store.py:115: in get
    result = self.get_store(key, silent=silent)
.../sentry/options/store.py:215: in get_store
    value = self.model.objects.get(key=key.name).value
.venv/lib/python3.13.../db/models/manager.py:87: in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
.venv/lib/python3.13.../db/models/query.py:629: in get
    num = len(clone)
.venv/lib/python3.13.../db/models/query.py:366: in __len__
    self._fetch_all()
.venv/lib/python3.13.../db/models/query.py:1945: in _fetch_all
    self._result_cache = list(self._iterable_class(self))
.venv/lib/python3.13.../db/models/query.py:91: in __iter__
    results = compiler.execute_sql(
.venv/lib/python3.13.../models/sql/compiler.py:1621: in execute_sql
    cursor = self.connection.cursor()
.venv/lib/python3.13.../django/utils/asyncio.py:26: in inner
    return func(*args, **kwargs)
.venv/lib/python3.13.../backends/base/base.py:320: in cursor
    return self._cursor()
.../db/postgres/decorators.py:38: in inner
    return func(self, *args, **kwargs)
.../db/postgres/base.py:114: in _cursor
    return super()._cursor()
.venv/lib/python3.13.../backends/base/base.py:296: in _cursor
    self.ensure_connection()
E   RuntimeError: Database access not allowed, use the "django_db" mark, or the "db" or "transactional_db" fixtures to enable it.

@srest2021 srest2021 left a comment

I think these two tests are failing with db access errors because options.get("replay.consumer.enable_new_query_caching_system") in ProcessReplayRecordingStrategyFactory.create_with_partitions() isn't being correctly patched by the existing options_get mock:

tests/sentry/replays/integration/consumers/test_recording.py::test_recording_consumer_invalid_message

tests/sentry/replays/integration/consumers/test_recording.py::test_recording_consumer

You can patch options.get() in the consumer fixture to make sure the tests pass.

Approving b/c I patched it locally to return True and the tests passed!
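A self-contained sketch of that kind of patch, using only unittest.mock. The _Options class and this create_with_partitions function are hypothetical stand-ins for the real options manager and strategy factory; only the option key is taken from the traceback above.

```python
from unittest import mock


class _Options:
    """Stand-in for an options manager whose lookups hit the database."""

    def get(self, key: str) -> bool:
        raise RuntimeError("Database access not allowed")


options = _Options()


def create_with_partitions() -> str:
    # Stand-in for the factory method that reads the option at startup.
    if options.get("replay.consumer.enable_new_query_caching_system"):
        return "new-cache"
    return "legacy"


# Inside a fixture, the patch keeps the lookup from reaching the database.
with mock.patch.object(options, "get", return_value=True) as patched_get:
    result = create_with_partitions()
```

Once the with block exits, the original get is restored, so a fixture would need to keep the patch active for as long as the consumer under test is being built.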


# We're intentionally looking up the options manually. We're avoiding the project-options local
# cache which exists on the preferred interface methods.
options = ProjectOption.objects.filter(

This will return false for both options if the project doesn't exist. _has_replays_lookup() will raise before we get here, but it might be good to add the same raise DropEvent() behavior here to make explicit the expectation that the project exists.

Member Author

That's intentional. I'm not sure whether the absence of the project-options implies the absence of the project, and I don't want to query to find out (another motivation for this PR is to reduce the number of times PgBouncer rejects our queries, and an extra query would generally be bad for throughput).

In practice, Relay drops events from deleted projects, so what we would be catching here are poorly timed deletions. Raising here would prevent an unnecessary publish to the issues platform, but it's not a big deal if we occasionally push bad data. They validate it regardless.
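The behavior under discussion can be sketched as follows: rows that are missing, for whatever reason, fold to False without a second query. The option keys and function name are hypothetical placeholders, and the rows argument stands in for whatever the ProjectOption filter returns.

```python
# Hypothetical option keys; the real keys are not shown in this thread.
OPTION_KEYS = ("example:option_a", "example:option_b")


def options_from_rows(rows: list[tuple[str, bool]]) -> tuple[bool, bool]:
    """Fold (key, value) rows into a tuple of booleans.

    A key with no row defaults to False, so a project with no option
    rows and a project that no longer exists look the same here.
    """
    found = dict(rows)
    return (
        bool(found.get(OPTION_KEYS[0], False)),
        bool(found.get(OPTION_KEYS[1], False)),
    )


no_rows = options_from_rows([])
one_row = options_from_rows([("example:option_a", True)])
```

Distinguishing the two cases would require an additional project lookup, which is the extra query the reply above declines to make.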

@cmanallen

Member Author

Thank you @srest2021 for the notes on patching the fixture!

@cmanallen cmanallen merged commit 808f085 into master Oct 17, 2025
69 checks passed
@cmanallen cmanallen deleted the cmanallen/replays-project-query-cache branch October 17, 2025 15:24
sentry Bot commented Oct 18, 2025

Issues attributed to commits in this pull request

This pull request was merged and Sentry observed the following issues:

srest2021 added a commit that referenced this pull request Oct 20, 2025
@github-actions github-actions Bot locked and limited conversation to collaborators Nov 11, 2025