Proxy release 2024-03-07 #7041

vipvap · 2024-03-07T06:01:36Z

Proxy release 2024-03-07

Please merge this Pull Request using 'Create a merge commit' button

Usually RFC documents are not modified, but the vast mentions of "zenith" in early RFC documents make it desirable to update the product name to today's name, to avoid confusion. ## Problem Early RFC documents use the old "zenith" product name a lot, which is not something everyone is aware of after the product was renamed. ## Summary of changes Replace occurrences of "zenith" with "neon". Images are excluded. --------- Co-authored-by: Andreas Scherbaum <andreas@neon.tech>

## Problem The current implementation of `deploy-prod` workflow doesn't allow to run parallel deploys on Storage and Proxy. ## Summary of changes - Call `deploy-proxy-prod` workflow that deploys only Proxy components, and that can be run in parallel with `deploy-prod` for Storage.

As pointed out in the comments added in this PR: the in-memory state of the filesystem already has the layer file in its final place. If the fsync fails, but pageserver continues to execute, it's quite easy for subsequent pageserver code to observe the file being there and assume it's durable, when it really isn't. It can happen that we get ENOSPC during the fsync. However, 1. the timeline dir is small (remember, the big layer _file_ has already been synced). Small data means ENOSPC due to delayed allocation races etc are less likely. 2. what else are we going to do in that case? If we decide to bubble up the error, the file remains on disk. We could try to unlink it and fsync after the unlink. If that fails, we would _definitely_ need to error out. Is it worth the trouble though? Side note: all this logic about not carrying on after fsync failure implies that we `sync` the filesystem successfully before we restart the pageserver. We don't do that right now, but should (=> #6989) part of #6663

## Problem Typo ## Summary of changes Fix

…ync_all()` (#6986) Except for the involvement of the VirtualFile fd cache, this is equivalent to what happened before at runtime. Future PR #6378 will implement `VirtualFile::sync_all()` using tokio-epoll-uring if that's configured as the io engine. This PR is preliminary work for that. part of #6663

The template does not parse on GitHub

…6960 (#6999) This PR increases the `wait_until` timeout. These are where things became more flaky as of #6960. Most likely because it doubles the work in the `churn_while_failpoints_active_thread`. Slack context: https://neondb.slack.com/archives/C033RQ5SPDH/p1709554455962959?thread_ts=1709286362.850549&cid=C033RQ5SPDH

@bayandin

Fixes #6996 Thanks to @bayandin

pgbouncer 1.22.1 has been released > This release fixes issues caused by some clients using COPY FROM STDIN queries. Such queries could introduce memory leaks, performance regressions and prepared statement misbehavior. - NEWS: https://www.pgbouncer.org/2024/03/pgbouncer-1-22-1 - CHANGES: pgbouncer/pgbouncer@pgbouncer_1_22_0...pgbouncer_1_22_1 ## Summary of changes - vm-image: update pgbouncer from 1.22.0 to 1.22.1

tokio 1.36 has been out for a month. Release notes don't indicate major changes. Skimming through their issue tracker, I can't find open `C-bug` issues that would affect us. (My personal motivation for this is `JoinSet::try_join_next`.)

## Problem `cargo deny` fails - https://rustsec.org/advisories/RUSTSEC-2024-0019 - GHSA-r8w9-5wcg-vfj7 > The vulnerability is Windows-specific, and can only happen if you are using named pipes. Other IO resources are not affected. ## Summary of changes - Upgrade `mio` from 0.8.10 to 0.8.11 (`cargo update -p mio`)

## Problem Fix #6498 ## Summary of changes Only re-authenticate with zenith_admin if authentication fails. Otherwise, directly return the error message. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>

part of #6663 See that epic for more context & related commits. Problem ------- Before this PR, the layer-file-creating code paths were using VirtualFile, but under the hood these were still blocking system calls. Generally this meant we'd stall the executor thread, unless the caller "knew" and used the following pattern instead: ``` spawn_blocking(|| { Handle::block_on(async { VirtualFile::....().await; }) }).await ``` Solution -------- This PR adopts `tokio-epoll-uring` on the layer-file-creating code paths in pageserver. Note that on-demand downloads still use `tokio::fs`, these will be converted in a future PR. Design: Avoiding Regressions With `std-fs` ------------------------------------------ If we make the VirtualFile write path truly async using `tokio-epoll-uring`, should we then remove the `spawn_blocking` + `Handle::block_on` usage upstack in the same commit? No, because if we’re still using the `std-fs` io engine, we’d then block the executor in those places where previously we were protecting us from that through the `spawn_blocking` . So, if we want to see benefits from `tokio-epoll-uring` on the write path while also preserving the ability to switch between `tokio-epoll-uring` and `std-fs` , where `std-fs` will behave identical to what we have now, we need to ***conditionally* use `spawn_blocking + Handle::block_on`** . I.e., in the places where we use that know, we’ll need to make that conditional based on the currently configured io engine. It boils down to investigating all the places where we do `spawn_blocking(... block_on(... VirtualFile::...))`. Detailed [write-up of that investigation in Notion](https://neondatabase.notion.site/Surveying-VirtualFile-write-path-usage-wrt-tokio-epoll-uring-integration-spawn_blocking-Handle-bl-5dc2270dbb764db7b2e60803f375e015?pvs=4 ), made publicly accessible. tl;dr: Preceding PRs addressed the relevant call sites: - `metadata` file: turns out we could simply remove it (#6777, #6769, #6775) - `create_delta_layer()`: made sensitive to `virtual_file_io_engine` in #6986 NB: once we are switched over to `tokio-epoll-uring` everywhere in production, we can deprecate `std-fs`; to keep macOS support, we can use `tokio::fs` instead. That will remove this whole headache. Code Changes In This PR ----------------------- - VirtualFile API changes - `VirtualFile::write_at` - implement an `ioengine` operation and switch `VirtualFile::write_at` to it - `VirtualFile::metadata()` - curiously, we only use it from the layer writers' `finish()` methods - introduce a wrapper `Metadata` enum because `std::fs::Metadata` cannot be constructed by code outside rust std - `VirtualFile::sync_all()` and for completeness sake, add `VirtualFile::sync_data()` Testing & Rollout ----------------- Before merging this PR, we ran the CI with both io engines. Additionally, the changes will soak in staging. We could have a feature gate / add a new io engine `tokio-epoll-uring-write-path` to do a gradual rollout. However, that's not part of this PR. Future Work ----------- There's still some use of `std::fs` and/or `tokio::fs` for directory namespace operations, e.g. `std::fs::rename`. We're not addressing those in this PR, as we'll need to add the support in tokio-epoll-uring first. Note that rename itself is usually fast if the directory is in the kernel dentry cache, and only the fsync after rename is slow. These fsyncs are using tokio-epoll-uring, so, the impact should be small.

`std` has had `pin!` macro for some time, there is no need for us to use the older alternatives. Cannot disallow `tokio::pin` because tokio macros use that.

Before this PR, the layer file download code would fsync the inode after rename instead of the timeline directory. That is not in line with what a comment further up says we're doing, and it's obviously not achieving the goal of making the rename durable. part of #6663

## Problem The value reconstruct of AUX_FILES_KEY from records is not deterministic since it uses a hash map under the hood. This caused vectored get validation failures when enabled in staging. ## Summary of changes Deserialise AUX_FILES_KEY blobs comparing. All other keys should reconstruct deterministically, so we simply compare the blobs.

## Problem Last weeks enablement of vectored get generated a number of panics. From them, I diagnosed two issues in the delta layer index traversal logic 1. The `key >= range.start && lsn >= lsn_range.start` was too aggressive. Lsns are not monotonically increasing in the delta layer index (keys are though), so we cannot assert on them. 2. Lsns greater or equal to `lsn_range.end` were not skipped. This caused the query to consider records newer than the request Lsn. ## Summary of changes * Fix the issues mentioned above inline * Refactor the layer traversal logic to make it unit testable * Add unit test which reproduces the failure modes listed above.

… metrics + regression test (#6953) part of #5899 Problem ------- Before this PR, the time spent waiting on the throttle was charged towards the higher-level page_service metrics, i.e., `pageserver_smgr_query_seconds`. The metrics are the foundation of internal SLIs / SLOs. A throttled tenant would cause the SLI to degrade / SLO alerts to fire. Changes ------- - don't charge time spent in throttle towards the page_service metrics - record time spent in throttle in RequestContext and subtract it from the elapsed time - this works because the page_service path doesn't create child context, so, all the throttle time is recorded in the parent - it's quite brittle and will break if we ever decide to spawn child tasks that need child RequestContexts, which would have separate instances of the `micros_spent_throttled` counter. - however, let's punt that to a more general refactoring of RequestContext - add a test case that ensures that - throttling happens for getpage requests; this aspect of the test passed before this PR - throttling delays aren't charged towards the page_service metrics; this aspect of the test only passes with this PR - drive-by: make the throttle log message `info!`, it's an expected condition Performance ----------- I took the same measurements as in #6706 , no meaningful change in CPU overhead. Future Work ----------- This PR enables us to experiment with the throttle for select tenants without affecting the SLI metrics / triggering SLO alerts. Before declaring this feature done, we need more work to happen, specifically: - decide on whether we want to retain the flexibility of throttling any `Timeline::get` call, filtered by TaskKind - versus: separate throttles for each page_service endpoint, potentially with separate config options - the trouble here is that this decision implies changes to the TenantConfig, so, if we start using the current config style now, then decide to switch to a different config, it'll be a breaking change Nice-to-haves but probably not worth the time right now: - Equivalent tests to ensure the throttle applies to all other page_service handlers.

## Problem Not really a problem. Improving visibility around redis communication. ## Summary of changes Added metric on the number of broken messages.

## Problem ref #6188 ## Summary of changes This pull request fixes `-Wmissing-prototypes` for the neon extension. Note that (1) the gcc version in CI and macOS is different, therefore some of the warning does not get reported when developing the neon extension locally. (2) the CI env variable `COPT = -Werror` does not get passed into the docker build process, therefore warnings are not treated as errors on CI. https://github.com/neondatabase/neon/blob/e62baa97041e10ce45772b3724e24e679a650d69/.github/workflows/build_and_test.yml#L22 There will be follow-up pull requests on solving other warnings. By the way, I did not figure out the default compile parameters in the CI env, and therefore this pull request is tested by manually adding `-Wmissing-prototypes` into the `COPT`. Signed-off-by: Alex Chi Z <chi@neon.tech>

Minor non-functional improvements to tiered compaction, mostly consisting of comment fixes. Followup of #6830, part of #6768

## Problem Branch/project and coldStart were not populated to data events. ## Summary of changes Populate it. Also added logging for the coldstart info.

The test is flaky due to #7006.

Moves some of the (legacy) compaction code to compaction.rs. No functional changes, just moves of code. Before, compaction.rs was only for the new tiered compaction mechanism, now it's for both the old and new mechanisms. Part of #6768

## Problem If large numbers of shards are attached to a pageserver concurrently, for example after another node fails, it can cause excessive I/O queue depths due to all the newly attached shards trying to calculate logical sizes concurrently. #6907 added the `lazy` flag to handle this. ## Summary of changes - Use `lazy=true` from all /location_config calls in the storage controller Reconciler.

## Problem - The storage controller is the source of truth for a tenant's stripe size, but doesn't currently have a way to propagate that to compute: we're just using the default stripe size everywhere. Closes: #6903 ## Summary of changes - Include stripe size in `ComputeHookNotifyRequest` - Include stripe size in `LocationConfigResponse` The stripe size is optional: it will only be advertised for multi-sharded tenants. This enables the controller to defer the choice of stripe size until we split a tenant for the first time.

This pull request mitigates #6969, but the longer-term problem is that we cannot properly stop Postgres if there is a subscription. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>

## Problem Collection of small changes, batched together to reduce CI overhead. ## Summary of changes - Layer download messages include size -- this is useful when watching a pageserver hydrate its on disk cache in the log. - Controller migrate API could put an invalid NodeId into TenantState - Scheduling errors during tenant create could result in creating some shards and not others. - Consistency check could give hard-to-understand failures in tests if a reconcile was in process: explicitly fail the check if reconciles are in progress instead.

#7022) ## Problem Closes https://github.com/neondatabase/neon/security/dependabot/56 Supersedes #7013 Workflow run: https://github.com/neondatabase/neon/actions/runs/8157302480 ## Summary of changes - Update client libs for `test_runner/pg_clients` to their latest versions

## Problem Fix #7003. Fix #6982. Currently, neon extension is only upgraded when new compute spec gets applied, for example, when creating a new role or creating a new database. This also resolves `neon.lfc_stat` not found warnings in prod. ## Summary of changes This pull request adds the logic to spawn a background thread to upgrade the neon extension version if the compute is a primary. If for whatever reason the upgrade fails, it reports an error to the console and does not impact compute node state. This change can be further applied to 3rd-party extension upgrades. We can silently upgrade the version of 3rd party extensions in the background in the future. Questions: * Does alter extension takes some kind of lock that will block user requests? * Does `ALTER EXTENSION` writes to the database if nothing needs to be upgraded? (may impact storage size). Otherwise it's safe to land this pull request. Signed-off-by: Alex Chi Z <chi@neon.tech>

github-actions · 2024-03-07T06:42:54Z

2490 tests run: 2369 passed, 0 failed, 121 skipped (full report)

Flaky tests (3)

Postgres 16

test_multi_attach: release

Postgres 14

test_pageserver_small_inmemory_layers[True]: release
test_timeline_size_quota_on_startup: release

Code coverage* (full report)

functions: 28.8% (6993 of 24311 functions)
lines: 47.3% (43016 of 90873 lines)

* collected from Rust tests only

_{The comment gets automatically updated with the latest test results
d03ec9d at 2024-03-07T21:19:35.605Z :recycle:}

## Problem Some HTTP-specific TLS errors ## Summary of changes Add more logging, vendor `tls-listener` with minor modifications.

## Problem We attempted validation for cancelled errors under the assumption that if vectored get fails, sequential get will too. That's not right 100% of times though because sequential get may have the values cached and slip them through even when shutting down. ## Summary of changes Don't validate if either search impl failed due to tenant shutdown.

andreasscherbaum and others added 30 commits March 4, 2024 13:02

Fix type (#6998)

e1c032f

## Problem Typo ## Summary of changes Fix

fix epic issue template (#6920)

e938bb8

The template does not parse on GitHub

Update postgres-exporter to v0.12.1 (#7004)

0d2395f

Fixes #6996 Thanks to @bayandin

upgrade tokio 1.34 => 1.36 (#7008)

e62baa9

tokio 1.36 has been out for a month. Release notes don't indicate major changes. Skimming through their issue tracker, I can't find open `C-bug` issues that would affect us. (My personal motivation for this is `JoinSet::try_join_next`.)

compute_ctl: only try zenith_admin if could not authenticate (#6955)

b7db912

## Problem Fix #6498 ## Summary of changes Only re-authenticate with zenith_admin if authentication fails. Otherwise, directly return the error message. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>

build: clippy disallow futures::pin_mut macro (#7016)

752bf5a

`std` has had `pin!` macro for some time, there is no need for us to use the older alternatives. Cannot disallow `tokio::pin` because tokio macros use that.

proxy: report redis broken message metric (#7021)

bdbb2f4

## Problem Not really a problem. Improving visibility around redis communication. ## Summary of changes Added metric on the number of broken messages.

Minor improvements to tiered compaction (#7020)

e69a255

Minor non-functional improvements to tiered compaction, mostly consisting of comment fixes. Followup of #6830, part of #6768

proxy: fix bug with populating the data (#7023)

15b3665

## Problem Branch/project and coldStart were not populated to data events. ## Summary of changes Populate it. Also added logging for the coldstart info.

test: disable large slru basebackup bench in ci (#7025)

2daa2f1

The test is flaky due to #7006.

fixup(#6991): it broke the macOS build (#7024)

eacdc17

Move compaction code to compaction.rs (#7026)

2f88e7a

Moves some of the (legacy) compaction code to compaction.rs. No functional changes, just moves of code. Before, compaction.rs was only for the new tiered compaction mechanism, now it's for both the old and new mechanisms. Part of #6768

fix(test): drop subscription when test completes (#6975)

5dc2088

This pull request mitigates #6969, but the longer-term problem is that we cannot properly stop Postgres if there is a subscription. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>

vipvap requested review from a team as code owners March 7, 2024 06:01

vipvap requested review from petuhovskiy, conradludgate, VladLazar and ololobus and removed request for a team March 7, 2024 06:01

conradludgate and others added 2 commits March 7, 2024 12:36

proxy http tls investigations (#7045)

c2876ec

## Problem Some HTTP-specific TLS errors ## Summary of changes Add more logging, vendor `tls-listener` with minor modifications.

conradludgate force-pushed the rc/proxy/2024-03-07 branch from a9c1d7e to d03ec9d Compare March 7, 2024 13:37

conradludgate approved these changes Mar 7, 2024

View reviewed changes

conradludgate merged commit e1b60f3 into release-proxy Mar 8, 2024
105 checks passed

conradludgate deleted the rc/proxy/2024-03-07 branch March 8, 2024 08:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proxy release 2024-03-07 #7041

Proxy release 2024-03-07 #7041

vipvap commented Mar 7, 2024

github-actions bot commented Mar 7, 2024 •

edited

Postgres 16

Postgres 14

Proxy release 2024-03-07 #7041

Proxy release 2024-03-07 #7041

Conversation

vipvap commented Mar 7, 2024