Sampling Cache: ensure consolidated values are reported; cache all sites, not only >threshold#5783
Conversation
above_threshold_only? = Keyword.get(opts, :above_threshold_only?, true)

case super(key, opts) do
  result when is_integer(result) and above_threshold_only? and result >= @threshold ->
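For context, the guard being discussed behaves roughly like this. This is a simplified sketch, not the actual Plausible source: the module name, the plain-map cache, and the `@threshold` constant are all illustrative assumptions.

```elixir
# Sketch of the cache lookup guard under discussion (names and the
# map-based cache are illustrative, not the exact implementation).
defmodule SamplingCacheSketch do
  # Assumed 30-day event threshold above which a site is considered
  # for sampling (~10m in the discussion below).
  @threshold 10_000_000

  def get(key, cache, opts \\ []) do
    above_threshold_only? = Keyword.get(opts, :above_threshold_only?, true)

    case Map.get(cache, key) do
      result when is_integer(result) and above_threshold_only? and result >= @threshold ->
        # Site is over the threshold: report its 30d event count.
        result

      _ ->
        # Under the threshold (or not cached): no traffic estimate,
        # so sampling is effectively skipped for this site.
        nil
    end
  end
end
```

The review comments below question exactly this early exclusion: a site just under the 30d threshold yields `nil` regardless of how long the queried period is.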
Hmm I don't really understand why this conditional is there in the first place. It looks like if a site has 9m events in the last 30d, it is excluded from sampling?
So if we're querying 3 years of data, the traffic estimate would be 3 * 12 * 9m = 324m, but due to this early exclusion we wouldn't apply sampling? If true, it doesn't make much sense to me.
Sorry I know not exactly a comment on the changes made in this PR which preserves existing behaviour, it just stood out to me when reviewing.
> Hmm I don't really understand why this conditional is there in the first place. It looks like if a site has 9m events in the last 30d, it is excluded from sampling?
Correct, it's how it works right now.
> So if we're querying 3 years of data, the traffic estimate would be 3 * 12 * 9m = 324m but due to this early exclusion we wouldn't apply sampling? If true, it doesn't make much sense to me.
No, if we're querying 3 years of data, the traffic estimate would be nil, because the site only has 9m events in the last 30d. Only sites with >10m are currently included in sampling, which is a very small number of sites if you query SamplingCache.size() on prod...
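The arithmetic behind the concern can be made concrete. This is a hypothetical back-of-envelope extrapolation, not code from the PR:

```elixir
# Hypothetical extrapolation: scale a 30d event count to a 3-year
# query range. Values mirror the numbers in the discussion above.
events_30d = 9_000_000
months_queried = 3 * 12
estimate = events_30d * months_queried
# estimate is 324_000_000, far above the ~10m sampling threshold,
# yet the site never enters the cache because 9m < 10m over 30 days,
# so the lookup returns nil and no sampling is applied.
```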
@zoldar WYT? having now all 30d values, can we make the estimate any more accurate?
I think that when populating the cache, we should filter by a fraction of the sample threshold, something like:
`having: selected_as(:events_ingested) > ^(Sampling.default_sample_threshold() / 12)` - we'd still skip sampling for ranges where the estimate goes below the default threshold, but we'd account for at least the most common long-term queries.
Though, on second thought, this does not save us from very long period queries against sites just under that 30d threshold.
To really address that, we'd have to somehow account for `sites.stats_start_date` in that cache refresh query - I'm not really sure how yet.
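The proposed refresh-query filter might look roughly like this. This is an Ecto sketch only: the table name, column names, and the 30-day window expression are assumptions, not the PR's actual query.

```elixir
# Sketch of the proposed cache refresh query with the /12 fraction
# filter (Ecto syntax; table and column names are illustrative).
import Ecto.Query

threshold = Sampling.default_sample_threshold()

from(e in "events",
  where: e.timestamp >= ago(30, "day"),
  group_by: e.site_id,
  select: %{site_id: e.site_id, events_ingested: count(e.site_id)},
  # Keep any site whose 30d traffic could exceed the threshold when
  # extrapolated to roughly a year-long query range (30d * 12 ≈ 360d).
  having: count(e.site_id) > ^div(threshold, 12)
)
```

The `/12` divisor trades cache size for coverage: the cache grows to include sites at a twelfth of the threshold, so that year-scale extrapolations can still cross it.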
If it's doable, then definitely yes 👍 Then we would lower the risk of missing sites just under the threshold. It's still not an ironclad guarantee, as there might be sites with a very seasonal traffic pattern that fly under the radar during quieter months, but still, it would be an improvement for sure.
okay, I'll work on that here
Looks good to me 👍 Have you checked the time/resources it takes to execute the cache query in production now?
The first query can go as high as 6s; the second seems to be cached and finishes in around 1s. No problem, I guess.
45d4389 to
34d6967
Changes
Current sampling issues aside, for a consolidated view consisting of n sites, none of which exceeds the sampling threshold alone, a complete sum must be calculated.
The cache refresh query has been changed, and luckily it now runs much faster (in the 1s ballpark). So, at the small expense of extra ETS memory, we've got all site IDs available for any estimation efforts.
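With all site counts cached, the consolidated estimate can simply sum the per-site values. A minimal sketch, assuming a plain-map cache and a hypothetical module/function name:

```elixir
# Sketch: the traffic estimate for a consolidated view is the sum of
# per-site cached 30d counts (names and cache shape are hypothetical).
defmodule ConsolidatedEstimate do
  def traffic_estimate(site_ids, cache) do
    site_ids
    |> Enum.map(&Map.get(cache, &1, 0))
    |> Enum.sum()
  end
end

# Three sites, each under the 10m threshold individually:
# ConsolidatedEstimate.traffic_estimate(
#   [1, 2, 3],
#   %{1 => 4_000_000, 2 => 5_000_000, 3 => 3_000_000}
# )
# => 12_000_000 - above the threshold, so sampling should apply to the
#    consolidated view even though no single site would qualify.
```

This is exactly why caching only above-threshold sites was insufficient: the sum over n sub-threshold sites can itself exceed the threshold.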
https://3.basecamp.com/5308029/buckets/43891605/card_tables/cards/9147369994
- Tests
- Changelog
- Documentation
- Dark mode