
Conversation

@knizhnik (Contributor) commented May 10, 2025

Problem

See
Discussion: https://neondb.slack.com/archives/C033RQ5SPDH/p1746645666075799
Issue: https://github.com/neondatabase/cloud/issues/28609

The relation size cache is not correctly updated on the pageserver (PS) when replicas are in use.

Summary of changes

  1. Have two caches for relation size in the timeline: `rel_size_primary_cache` and `rel_size_replica_cache`.
  2. `rel_size_primary_cache` is essentially what we have now. The only difference is that it is not updated in `get_rel_size`, only by WAL ingestion.
  3. `rel_size_replica_cache` has a limited size (an LruCache) and its key is `(Lsn, RelTag)`. It is updated in `get_rel_size`. Only strict LSN matches are accepted as a cache hit (see the sketch below).
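
To make the design concrete, here is a minimal, hypothetical sketch of the two-cache layout. This is not the actual pageserver code: `Lsn`, `RelTag`, `BlockNumber`, and the use of the `lru` crate are stand-ins, and all names and sizes are illustrative only.

```rust
use std::collections::HashMap;
use std::num::NonZeroUsize;

use lru::LruCache; // assumed dependency, matching the LruCache mentioned above

// Placeholder types standing in for the pageserver's real ones.
#[derive(Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Lsn(u64);
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct RelTag(u32);
type BlockNumber = u32;

struct RelSizeCaches {
    // Primary cache: what exists today, but now updated only by WAL ingestion.
    rel_size_primary_cache: HashMap<RelTag, (Lsn, BlockNumber)>,
    // Replica cache: bounded LRU keyed by (Lsn, RelTag), filled on reads.
    rel_size_replica_cache: LruCache<(Lsn, RelTag), BlockNumber>,
}

impl RelSizeCaches {
    fn new(replica_cache_capacity: NonZeroUsize) -> Self {
        Self {
            rel_size_primary_cache: HashMap::new(),
            rel_size_replica_cache: LruCache::new(replica_cache_capacity),
        }
    }

    // Called from WAL ingestion only: keeps the primary cache current.
    fn update_on_ingest(&mut self, rel: RelTag, lsn: Lsn, nblocks: BlockNumber) {
        self.rel_size_primary_cache.insert(rel, (lsn, nblocks));
    }

    // Called from get_rel_size for replica reads: only an exact LSN match is a hit.
    fn replica_lookup(&mut self, rel: RelTag, lsn: Lsn) -> Option<BlockNumber> {
        self.rel_size_replica_cache.get(&(lsn, rel)).copied()
    }

    // Populate the replica cache after a read miss.
    fn replica_insert(&mut self, rel: RelTag, lsn: Lsn, nblocks: BlockNumber) {
        self.rel_size_replica_cache.put((lsn, rel), nblocks);
    }
}
```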

@knizhnik knizhnik requested a review from a team as a code owner May 10, 2025 10:38
@knizhnik knizhnik requested review from erikgrinaker and skyzh May 10, 2025 10:38
@github-actions bot commented May 10, 2025

8481 tests run: 7933 passed, 0 failed, 548 skipped (full report)


Flaky tests (2): Postgres 17, Postgres 16

Code coverage* (full report)

  • functions: 32.6% (9015 of 27663 functions)
  • lines: 48.6% (78871 of 162146 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
3735ab6 at 2025-05-20T14:51:15.151Z :recycle:

@knizhnik knizhnik requested a review from VladLazar May 12, 2025 11:40
@VladLazar (Contributor) left a comment

The idea seems fine to me. I don't like that we are inferring the primary/replica status from the request_lsn, but that could go away with the communicator.


So the issue occurs when a non-primary endpoint updates the rel size cache, correct?
A regression test for that would be great. I still don't really see how that can happen.

@knizhnik (Contributor, Author)

> The idea seems fine to me. I don't like that we are inferring the primary/replica status from the request_lsn, but that could go away with the communicator.
>
> So the issue occurs when a non-primary endpoint updates the rel size cache, correct? A regression test for that would be great. I still don't really see how that can happen.

There was a regression test proposed by @alexanderlaw in https://github.com/neondatabase/cloud/issues/28609. It is not as simple as I would have preferred, but I failed to create a simpler one.

@knizhnik (Contributor, Author)

> So the issue occurs when a non-primary endpoint updates the rel size cache, correct?

Yes. But it is not so easy to reproduce, because right now there is a check in `rel_size_cache`:

```rust
if lsn < rel_size_cache.complete_as_of {
    // Do not cache old values. It's safe to cache the size on read, as long as
    // the read was at an LSN since we started the WAL ingestion. Reasoning: we
    // never evict values from the cache, so if the relation size changed after
    // 'lsn', the new value is already in the cache.
    return;
}
```

@VladLazar (Contributor)

>> The idea seems fine to me. I don't like that we are inferring the primary/replica status from the request_lsn, but that could go away with the communicator.
>> So the issue occurs when a non-primary endpoint updates the rel size cache, correct? A regression test for that would be great. I still don't really see how that can happen.
>
> There was a regression test proposed by @alexanderlaw in neondatabase/cloud#28609. It is not as simple as I would have preferred, but I failed to create a simpler one.

I think we should spend the time and check the test in. Best to be sure we're actually fixing the issue.

@VladLazar (Contributor)

I did a quick check to see if Alexander's test actually generates writes to the relsize cache from static endpoints here. It does indeed.

While working on that, I found out that the compute tells the pageserver its type: primary|replica|static.
@knizhnik how do you feel about using that instead of the request_lsn?

@knizhnik (Contributor, Author)

> I did a quick check to see if Alexander's test actually generates writes to the relsize cache from static endpoints here. It does indeed.
>
> While working on that, I found out that the compute tells the pageserver its type: primary|replica|static. @knizhnik how do you feel about using that instead of the request_lsn?

I do not think that using the reported compute type is a better approach, first of all because the LSN range `not_modified_since..request_lsn` is a basic concept in the Neon SMGR. It can be considered more complex and obscure than just primary/replica, but IMHO it is the more fundamental one.

Also, having the range allows, in principle, a more precise check of whether the cached LSN is valid (see the sketch below).

Finally, we are currently working on replica promotion. In that case the endpoint type changes from replica to primary (without reconnecting to the PS). Yes, this can somehow be detected and handled, but why worry about it if the LSN-range approach handles it automatically?
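
As a hedged illustration of the "more precise check" mentioned above (a sketch only, not the pageserver's actual logic): since `not_modified_since` asserts that the relation has not changed between that LSN and `request_lsn`, a size cached at any LSN inside that window would also be correct for the read.

```rust
// Illustrative only: how the not_modified_since..request_lsn range could, in
// principle, validate a cached relation size more precisely than a strict
// LSN match. `Lsn` here is a placeholder for the pageserver's LSN type.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

/// A size recorded at `cached_at` answers a read at `request_lsn` correctly
/// when `cached_at` falls inside the window in which the compute guarantees
/// the relation was not modified.
fn cached_size_valid(cached_at: Lsn, not_modified_since: Lsn, request_lsn: Lsn) -> bool {
    not_modified_since <= cached_at && cached_at <= request_lsn
}

fn main() {
    assert!(cached_size_valid(Lsn(150), Lsn(100), Lsn(200)));
    assert!(!cached_size_valid(Lsn(50), Lsn(100), Lsn(200)));
}
```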

@VladLazar (Contributor) commented May 15, 2025

>> I did a quick check to see if Alexander's test actually generates writes to the relsize cache from static endpoints here. It does indeed.
>> While working on that, I found out that the compute tells the pageserver its type: primary|replica|static. @knizhnik how do you feel about using that instead of the request_lsn?
>
> I do not think that using the reported compute type is a better approach, first of all because the LSN range `not_modified_since..request_lsn` is a basic concept in the Neon SMGR. It can be considered more complex and obscure than just primary/replica, but IMHO it is the more fundamental one.
>
> Also, having the range allows, in principle, a more precise check of whether the cached LSN is valid.
>
> Finally, we are currently working on replica promotion. In that case the endpoint type changes from replica to primary (without reconnecting to the PS). Yes, this can somehow be detected and handled, but why worry about it if the LSN-range approach handles it automatically?

Promotion is a good point. The LSN range is a basic concept on the compute side, but that's not how people on the storage team think of requests. If it needs to change, then so be it, but at least let's use the LSN range everywhere, as suggested in #11889 (comment)

@VladLazar (Contributor) left a comment

Polite reminder: please check in Alexander's test. I know it's not as simple as we'd want it to be, but it reproduces the issue. You can take it from my PR if it makes it easier.

@knizhnik (Contributor, Author)

> Polite reminder: please check in Alexander's test. I know it's not as simple as we'd want it to be, but it reproduces the issue. You can take it from my PR if it makes it easier.

Done

@knizhnik knizhnik force-pushed the rel_size_replica_cache branch from 7dc7981 to 356eb40 on May 15, 2025 19:27
@alexanderlaw (Contributor)

I think I can simplify the test if you're going to commit it.

@knizhnik knizhnik requested a review from VladLazar May 15, 2025 19:36
@knizhnik (Contributor, Author)

> I think I can simplify the test if you're going to commit it.

Thank you. That would be great!

@VladLazar (Contributor) left a comment

Looks good. Thanks for going through my comments.


The only slightly substantial point I have is about checking the snapshot cache for the primary (see comment). The rest are nits for you to consider.

@knizhnik knizhnik requested a review from VladLazar May 18, 2025 04:43
@VladLazar (Contributor) left a comment

Good stuff!

Two final requests related to lock handling. Apologies for not spotting them earlier.

@knizhnik knizhnik added this pull request to the merge queue May 20, 2025
Merged via the queue into main with commit 2e3dc9a May 20, 2025
184 of 185 checks passed
@knizhnik knizhnik deleted the rel_size_replica_cache branch May 20, 2025 15:44
thesuhas pushed a commit that referenced this pull request May 28, 2025