
Conversation

@knizhnik (Contributor) commented May 10, 2025

Problem

See
Discussion: https://neondb.slack.com/archives/C033RQ5SPDH/p1746645666075799
Issue: https://github.com/neondatabase/cloud/issues/28609

The relation size cache is not correctly updated on the pageserver (PS) when replicas are in use.

Summary of changes

  1. Have two caches for relation size in the timeline: `rel_size_primary_cache` and `rel_size_replica_cache`.
  2. `rel_size_primary_cache` is essentially what we have now. The only difference is that it is not updated in `get_rel_size`, only by WAL ingestion.
  3. `rel_size_replica_cache` has a limited size (an LruCache) and its key is `(Lsn, RelTag)`. It is updated in `get_rel_size`. Only strict LSN matches are accepted as a cache hit (see the sketch below).
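
To make the design concrete, here is a minimal, hypothetical sketch of the two-cache layout. This is not the actual pageserver code: `Lsn`, `RelTag`, `BlockNumber`, and the use of the `lru` crate are stand-ins, and all names and sizes are illustrative only.

```rust
use std::collections::HashMap;
use std::num::NonZeroUsize;

use lru::LruCache; // assumed dependency, matching the LruCache mentioned above

// Placeholder types standing in for the pageserver's real ones.
#[derive(Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Lsn(u64);
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct RelTag(u32);
type BlockNumber = u32;

struct RelSizeCaches {
    // Primary cache: what exists today, but now updated only by WAL ingestion.
    rel_size_primary_cache: HashMap<RelTag, (Lsn, BlockNumber)>,
    // Replica cache: bounded LRU keyed by (Lsn, RelTag), filled on reads.
    rel_size_replica_cache: LruCache<(Lsn, RelTag), BlockNumber>,
}

impl RelSizeCaches {
    fn new(replica_cache_capacity: NonZeroUsize) -> Self {
        Self {
            rel_size_primary_cache: HashMap::new(),
            rel_size_replica_cache: LruCache::new(replica_cache_capacity),
        }
    }

    // Called from WAL ingestion only: keeps the primary cache current.
    fn update_on_ingest(&mut self, rel: RelTag, lsn: Lsn, nblocks: BlockNumber) {
        self.rel_size_primary_cache.insert(rel, (lsn, nblocks));
    }

    // Called from get_rel_size for replica reads: only an exact LSN match is a hit.
    fn replica_lookup(&mut self, rel: RelTag, lsn: Lsn) -> Option<BlockNumber> {
        self.rel_size_replica_cache.get(&(lsn, rel)).copied()
    }

    // Populate the replica cache after a read miss.
    fn replica_insert(&mut self, rel: RelTag, lsn: Lsn, nblocks: BlockNumber) {
        self.rel_size_replica_cache.put((lsn, rel), nblocks);
    }
}
```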

@knizhnik knizhnik requested a review from a team as a code owner May 10, 2025 10:38
@knizhnik knizhnik requested review from erikgrinaker and skyzh May 10, 2025 10:38
@github-actions bot commented May 10, 2025

8481 tests run: 7933 passed, 0 failed, 548 skipped (full report)


Flaky tests (2): Postgres 17, Postgres 16

Code coverage* (full report)

  • functions: 32.6% (9015 of 27663 functions)
  • lines: 48.6% (78871 of 162146 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
3735ab6 at 2025-05-20T14:51:15.151Z :recycle:

@knizhnik knizhnik requested a review from VladLazar May 12, 2025 11:40
@VladLazar (Contributor) left a comment

The idea seems fine to me. I don't like that we are inferring the primary/replica status from the request_lsn, but that could go away with the communicator.


So the issue occurs when a non-primary endpoint updates the rel size cache, correct?
A regression test for that would be great. I still don't really see how that can happen.

@knizhnik (Contributor, Author)

> The idea seems fine to me. I don't like that we are inferring the primary/replica status from the request_lsn, but that could go away with the communicator.
>
> So the issue occurs when a non-primary endpoint updates the rel size cache, correct? A regression test for that would be great. I still don't really see how that can happen.

There was a regression test proposed by @alexanderlaw in https://github.com/neondatabase/cloud/issues/28609. It is not as simple as I would have preferred, but I failed to create a simpler one.

@knizhnik (Contributor, Author)

> So the issue occurs when a non-primary endpoint updates the rel size cache, correct?

Yes. But it is not so easy to reproduce, because right now there is a check in `rel_size_cache`:

```rust
if lsn < rel_size_cache.complete_as_of {
    // Do not cache old values. It's safe to cache the size on read, as long as
    // the read was at an LSN since we started the WAL ingestion. Reasoning: we
    // never evict values from the cache, so if the relation size changed after
    // 'lsn', the new value is already in the cache.
    return;
}
```

@VladLazar (Contributor)

>> The idea seems fine to me. I don't like that we are inferring the primary/replica status from the request_lsn, but that could go away with the communicator.
>> So the issue occurs when a non-primary endpoint updates the rel size cache, correct? A regression test for that would be great. I still don't really see how that can happen.
>
> There was a regression test proposed by @alexanderlaw in neondatabase/cloud#28609. It is not as simple as I would have preferred, but I failed to create a simpler one.

I think we should spend the time and check the test in. Best to be sure we're actually fixing the issue.

@VladLazar (Contributor)

I did a quick check to see if Alexander's test actually generates writes to the relsize cache from static endpoints here. It does indeed.

While working on that, I found out that the compute tells the pageserver its type: primary|replica|static.
@knizhnik how do you feel about using that instead of the request_lsn?

@knizhnik (Contributor, Author)

> I did a quick check to see if Alexander's test actually generates writes to the relsize cache from static endpoints here. It does indeed.
>
> While working on that, I found out that the compute tells the pageserver its type: primary|replica|static. @knizhnik how do you feel about using that instead of the request_lsn?

I do not think that using the reported compute type is a better approach, first of all because the LSN range `not_modified_since..request_lsn` is a basic concept in the Neon SMGR. It can be considered more complex and obscure than just primary/replica, but IMHO it is the more fundamental one.

Also, having the range allows, in principle, a more precise check of whether the cached LSN is valid (see the sketch below).

Finally, we are currently working on replica promotion. In that case the endpoint type changes from replica to primary (without reconnecting to the PS). Yes, this can somehow be detected and handled, but why worry about it if the LSN-range approach handles it automatically?
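
As a hedged illustration of the "more precise check" mentioned above (a sketch only, not the pageserver's actual logic): since `not_modified_since` asserts that the relation has not changed between that LSN and `request_lsn`, a size cached at any LSN inside that window would also be correct for the read.

```rust
// Illustrative only: how the not_modified_since..request_lsn range could, in
// principle, validate a cached relation size more precisely than a strict
// LSN match. `Lsn` here is a placeholder for the pageserver's LSN type.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

/// A size recorded at `cached_at` answers a read at `request_lsn` correctly
/// when `cached_at` falls inside the window in which the compute guarantees
/// the relation was not modified.
fn cached_size_valid(cached_at: Lsn, not_modified_since: Lsn, request_lsn: Lsn) -> bool {
    not_modified_since <= cached_at && cached_at <= request_lsn
}

fn main() {
    assert!(cached_size_valid(Lsn(150), Lsn(100), Lsn(200)));
    assert!(!cached_size_valid(Lsn(50), Lsn(100), Lsn(200)));
}
```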

@VladLazar (Contributor) commented May 15, 2025

>> I did a quick check to see if Alexander's test actually generates writes to the relsize cache from static endpoints here. It does indeed.
>> While working on that, I found out that the compute tells the pageserver its type: primary|replica|static. @knizhnik how do you feel about using that instead of the request_lsn?
>
> I do not think that using the reported compute type is a better approach, first of all because the LSN range `not_modified_since..request_lsn` is a basic concept in the Neon SMGR. It can be considered more complex and obscure than just primary/replica, but IMHO it is the more fundamental one.
>
> Also, having the range allows, in principle, a more precise check of whether the cached LSN is valid.
>
> Finally, we are currently working on replica promotion. In that case the endpoint type changes from replica to primary (without reconnecting to the PS). Yes, this can somehow be detected and handled, but why worry about it if the LSN-range approach handles it automatically?

Promotion is a good point. The LSN range is a basic concept on the compute side, but that's not how people on the storage team think of requests. If it needs to change, then so be it, but at least let's use the LSN range everywhere, as suggested in #11889 (comment)

@VladLazar (Contributor) left a comment

Polite reminder: please check in Alexander's test. I know it's not as simple as we'd want it to be, but it reproduces the issue. You can take it from my PR if it makes it easier.

@knizhnik (Contributor, Author)

> Polite reminder: please check in Alexander's test. I know it's not as simple as we'd want it to be, but it reproduces the issue. You can take it from my PR if it makes it easier.

Done

@knizhnik knizhnik force-pushed the rel_size_replica_cache branch from 7dc7981 to 356eb40 on May 15, 2025 19:27
@alexanderlaw (Contributor)

I think I can simplify the test if you're going to commit it.

@knizhnik knizhnik requested a review from VladLazar May 15, 2025 19:36
@knizhnik (Contributor, Author)

> I think I can simplify the test if you're going to commit it.

Thank you. That would be great!

@VladLazar (Contributor) left a comment

Looks good. Thanks for going through my comments.


The only slightly substantial point I have is about checking the snapshot cache for the primary (see comment). The rest are nits for you to consider.

@knizhnik knizhnik requested a review from VladLazar May 18, 2025 04:43
@VladLazar (Contributor) left a comment

Good stuff!

Two final requests related to lock handling. Apologies for not spotting them earlier.

@knizhnik knizhnik added this pull request to the merge queue May 20, 2025
Merged via the queue into main with commit 2e3dc9a May 20, 2025
184 of 185 checks passed
@knizhnik knizhnik deleted the rel_size_replica_cache branch May 20, 2025 15:44
thesuhas pushed a commit that referenced this pull request May 28, 2025