Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(test suite) virtual_file_io_engine and get_vectored_impl patametrization doesn't work #7113

Merged
merged 6 commits into from Mar 14, 2024

Conversation

problame
Copy link
Contributor

@problame problame commented Mar 13, 2024

Problem

While investigating #7124, I noticed that the benchmark was always using the DEFAULT_* virtual_file_io_engine , i.e., tokio-epoll-uring as of #7077.

The fundamental problem is that the control_plane code has its own view of PageServerConfig, which, I believe, will always be a subset of the real pageserver's pageserver/src/config.rs.

For the virtual_file_io_engine and get_vectored_impl parametrization of the test suite, we were constructing a dict on the Python side that contained these parameters, then handed it to control_plane::PageServerConfig's derived serde::Deserialize.
The default in serde is to ignore unknown fields, so, the Deserialize impl silently ignored the fields.
In consequence, the fields weren't propagated to the pageserver --init call, and the tests ended up using the pageserver/src/config.rs::DEFAULT_ values for the respective options all the time.

Tests that explicitly used overrides in env.pageserver.start() and similar were not affected by this.

But, it means that all the test suite runs where with parametrization didn't properly exercise the code path.

Changes

  • use serde(deny_unknown_fields) to expose the problem
    • With this change, the Python tests that override virtual_file_io_engine and
      get_vectored_impl fail on pageserver --init, exposing the problem.
  • use destructuring to uncover the issue in the future
  • fix the issue by adding the missing fields to the control_plane crate's PageServerConf
    • A better solution would be for control plane to re-use a struct provided
      by the pageserver crate, so that everything is in one place in
      pageserver/src/config.rs, but, our config parsing code is (almost)
      beyond repair anyways.
  • fix the pageserver_virtual_file_io_engine to be responsive to the env var
    • => required to make parametrization work in benchmarks

Testing

Before merging this PR, I re-ran the regression tests & CI with the full matrix of virtual_file_io_engine and tokio-epoll-uring, see 9c7ea36

With this change, the Python tests that override `virtual_file_io_engine` and
`get_vectored_impl` fail on `pageserver --init`, exposing the problem.
…verConf

A better solution would be for control plane to re-use a struct provided
by the pageserver crate, so that everything is in one place in
`pageserver/src/config.rs`, but, our config parsing code is (almost)
beyond repair anyways.
@problame problame marked this pull request as ready for review March 13, 2024 18:06
Copy link

github-actions bot commented Mar 13, 2024

2732 tests run: 2592 passed, 1 failed, 139 skipped (full report)


Failures on Postgres 14

  • test_bulk_insert[neon-github-actions-selfhosted]: release
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_bulk_insert[neon-release-pg14-github-actions-selfhosted]"
Flaky tests (1)

Postgres 15

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 28.4% (7031 of 24721 functions)
  • lines: 47.2% (43470 of 92118 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
763c0b8 at 2024-03-14T11:26:28.864Z :recycle:

Copy link
Contributor

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this explains the "override is always specified" for remote storage which I couldn't believe.

@problame
Copy link
Contributor Author

Ok, this explains the "override is always specified" for remote storage which I couldn't believe.

Got more context on that?

@problame problame added the run-benchmarks Indicates to the CI that benchmarks should be run for PR marked with this label label Mar 14, 2024
@problame problame force-pushed the problame/testsuite-fix-ps_conf-parametrization branch from 9cfd09d to 9c7ea36 Compare March 14, 2024 07:42
@koivunej
Copy link
Contributor

Ok, this explains the "override is always specified" for remote storage which I couldn't believe.

Got more context on that?

I think it was from last weeks huddles with you and me. You were explaining how to run pageserver against in-memory minio, configuring it in the pageserver.toml did not work because neon_local always specifies the override, which I didn't find sensible.

Expanding the code more between changes explained to me why:

if !cli_overrides
.iter()
.any(|c| c.starts_with("remote_storage"))
{
overrides.push(format!(
"remote_storage={{local_path='../{PAGESERVER_REMOTE_STORAGE_DIR}'}}"
));
}

@problame
Copy link
Contributor Author

test_bulk_insert` benchmark failure is a known issue => #7124

@problame problame enabled auto-merge (squash) March 14, 2024 10:10
@problame problame merged commit 8075f09 into main Mar 14, 2024
55 of 56 checks passed
@problame problame deleted the problame/testsuite-fix-ps_conf-parametrization branch March 14, 2024 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-benchmarks Indicates to the CI that benchmarks should be run for PR marked with this label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants