Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: use conditional GET for secondary tenant heatmaps #9236

Merged
merged 9 commits into from
Oct 4, 2024

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Oct 2, 2024

Problem

Secondary tenant heatmaps were always downloaded, even when they hadn't changed. This can be avoided by using a conditional GET request passing the ETag of the previous heatmap.

Resolves #6199.

Summary of changes

The ETag was already plumbed down into the heatmap downloader, and just needed further plumbing into the remote storage backends.

  • Add a DownloadOpts struct and pass it to RemoteStorage::download().
  • Add an optional DownloadOpts::etag field, which uses a conditional GET and returns DownloadError::Unmodified on match.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Oct 2, 2024

5058 tests run: 4872 passed, 0 failed, 186 skipped (full report)


Flaky tests (7)

Postgres 17

Postgres 16

Postgres 15

Postgres 14

Code coverage* (full report)

  • functions: 31.4% (7494 of 23901 functions)
  • lines: 49.6% (60201 of 121435 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
84c2503 at 2024-10-04T10:07:55.974Z :recycle:

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Oct 2, 2024

@arpad-m This is a quick and dirty prototype, which just adds a prev_etag parameter to download() and returns DownloadError::Unmodified. I'd like to discuss the API before wrapping it up. I think my preference would be to:

  • Add a DownloadOpts struct passed into download() with an etag field.
  • In a follow-up PR, merge download_byte_range() into download() by adding DownloadOpts::byte_range.
  • Make the Download result an enum with variants e.g. Stream and Unmodified, instead of returning an error.

Alternatively, we can go with the current approach in this PR, or add a new download_cond(path, etag) -> Result<Option<Download>, DownloadError> method which returns None when unmodified.

@arpad-m
Copy link
Member

arpad-m commented Oct 2, 2024

DownloadOpts would also be useful for #8830, so +1 to that. Merging download_byte_range and download also makes a lot of sense.

Make the Download result an enum with variants e.g. Stream and Unmodified, instead of returning an error.

idk about that one, to me it doesn't seem worth the cost in the "happy" path where you just want to download a file.
We also have "not found" as a separate error kind, even though some people might just want to test whether some file is present or not. Errors are just values.

@erikgrinaker
Copy link
Contributor Author

Cool, I'll add DownloadOpts and wrap this up.

it doesn't seem worth the cost in the "happy" path where you just want to download a file

Yeah, I also came around to this -- it'd just be annoying in the common case. Will keep the error.

@arpad-m
Copy link
Member

arpad-m commented Oct 2, 2024

It would also be nice to have a test in common/tests.rs that checks both the head_object operation and the etag functionality (when I added head_object I haven't added any tests). something like:

  • do head_object for non-existent file -> error
  • create file with content A
  • do head_object again -> it should now exist and have an etag
  • do conditional download with the etag -> it should yield the error
  • overwrite file with content B
  • do conditional download with the old etag -> it should yield the new file
  • do head_object again -> compare the etag and changed times, they should mismatch (there should be probably be a sleep of 3-4 seconds somewhere to ensure that at least one second has passed because that's the resolution offered by S3)
  • do conditional download with the new etag -> should yield an error again

libs/remote_storage/src/lib.rs Outdated Show resolved Hide resolved
@erikgrinaker erikgrinaker force-pushed the erik/secondary-headmap-cond-get branch from f2a9940 to 531c848 Compare October 4, 2024 09:06
@erikgrinaker erikgrinaker marked this pull request as ready for review October 4, 2024 09:07
@erikgrinaker erikgrinaker requested a review from a team as a code owner October 4, 2024 09:07
@erikgrinaker erikgrinaker requested a review from jcsp October 4, 2024 09:07
@erikgrinaker
Copy link
Contributor Author

@arpad-m Added a test case too, should be ready to go now.

do head_object again -> it should now exist and have an etag

head_object doesn't provide the etag. We can address that separately if needed, I just used download() for now.

I'll submit a separate PR for a head_object test case.

I split out the download_byte_range() cleanup to #9260.

@erikgrinaker erikgrinaker merged commit 37158d0 into main Oct 4, 2024
79 checks passed
@erikgrinaker erikgrinaker deleted the erik/secondary-headmap-cond-get branch October 4, 2024 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver: optimize heatmap downloads with S3 conditional GET
2 participants