Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use less neon_local --pageserver-config-override / pageserver -c #7613

Merged
merged 22 commits into from
May 6, 2024

Conversation

problame
Copy link
Contributor

@problame problame commented May 4, 2024

Background

The pageserver binary provides a -c / --config-override flag.
During PS startup, the argument is parsed as TOML and merged non-recursively into the in-memory representation of pageserver.toml.
The merge result is then turned into the global PageServerConfig (parse_and_validate()).

The pageserver binary provides an --init flag.
It is used by neon_local init to write out the initial pageserver.toml.
I.e., neon_local init under the hood calls pageserver --init.

neon_local init uses the pageserver binary's --config-override flag to customize the pageserver.toml that is written by pageserver --init.
I.e., neon_local init under the hood calls pageserver --init --config-override ....

Last, neon_local init itself has a --pageserver-config-override, allowing neon_local users to pass through additional overrides to the pageserver binary invocation.
I.e., neon_local init --pageserver-config-override X maps to pageserver --init --config-override X.

The neon_local start also has a --pageserver-config-override flag, which is likewise passed through to the pageserver binary.

Problem

neon_local uses --pageserver-config-override to propagate some configuration values from its .neon/config file.
For example, it does that for pg_distrib_dir or the default remote_storage settings.

Presumably, the idea back in the days was that this allows changing variables like listen IP and port "centrally" in the .neon/config, and then a simple neon_local stop && neon_local start` would suffice to make the effect happen.

I argue that we should remove this functionality, and this PR takes the first step.
The rationale is that, while it's a nice convenience feature, I think it's very rarely used in practice, and thus it does not justify the maintenance burden / cognitive complexity.

Additional arguments in favor of removal:

  1. The functionality means that parts (but not all!) fields of the Pageserver config are duplicated into .neon/config. It's not a generic mechansim. New code has to be written for each field.
  2. When changing .neon/config + neon_local stop && neon_local_start, the change to .neon/configgets applied *at runtime*, butpageserver.tomlon disk doesn't get updated. One has to inspect the pageserver command line to understand what's going on, e.g., usingps -ef | grep pageserver`.

Changes

This PR makes changes to the test suite and neon_local such that --pageserver-config-override is used only during NeonCli.init / neon_local init.

As pointed out in the "Background" section, any overrides specified during init are stored in the pageserver.toml.
This means they don't need to repeated when starting the Pageserver with NeonCli.pageserver_start / neon_local start / neon_local pageserver start.

This is a BREAKING CHANGE with regard to neon_local semantics in the sense that the variables in the .neon/config pertaining Pageserver are no longer the source of truth. Instead, they are mere copies that, if there's a need to change them, must be changed in both places.

neon_local changes:

  • In PageServerNode::start_node, only pass through the --pageserver-config-override flags provided on the command line, not the ones derived from .neon/config.
  • This means neon_local_overrides() is now only used by PageServerNode::pageserver_init(). We don't inline it because it bloats the diff in an ugly way, can be done in a future commit.
  • When spawning the pageserver binary, use --config-override instead of -c for better greppability.

Test suite changes:

  • remove call to append_pageserver_param_overrides from NeonCli.pageserver_start()
  • inline append_pageserver_param_overrides into NeonCli.init()
  • env var NEON_PAGESERVER_OVERRIDES is no longer supported (it's not used in any committed code in the GitHub Neon org anyways)

Future Work

Item A: In a following PR, I will remove the remaining use of neon_local --pageserver-config-overrides in the test suite, replacing it with simple rewriting of the pageserver.toml.

Item B: The neon_local's PageServerNode::init() can do the TOML patching itself, instead of using the clunky pageserver --config-override
That is the ultimate goal of my PR series: removing --config-override entirely.

Copy link

github-actions bot commented May 4, 2024

2904 tests run: 2784 passed, 0 failed, 120 skipped (full report)


Flaky tests (4)

Postgres 16

  • test_download_remote_layers_api: debug

Postgres 15

  • test_partial_evict_tenant[relative_spare]: release
  • test_download_remote_layers_api: debug
  • test_delete_tenant_exercise_crash_safety_failpoints[real_s3-tenant-delete-before-remove-timelines-dir-False-Check.RETRY_WITH_RESTART]: release

Code coverage* (full report)

  • functions: 31.4% (6236 of 19863 functions)
  • lines: 47.1% (46722 of 99246 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
35b23cb at 2024-05-06T17:56:22.283Z :recycle:

…age during init, not start"

This reverts commit 511f593.
@problame problame changed the title refactor(test_suite): rely less on --pageserver-config-override outside of neon_local init refactor: use less --pageserver-config-override May 4, 2024
@problame problame changed the title refactor: use less --pageserver-config-override refactor!: use less --pageserver-config-override May 5, 2024
@problame problame changed the title refactor!: use less --pageserver-config-override use less neon_local --pageserver-config-override / pageserver -c May 5, 2024
Base automatically changed from problame/remove-pageserver-update-config-flag to main May 6, 2024 14:40
@problame problame marked this pull request as ready for review May 6, 2024 17:31
@problame problame requested a review from a team as a code owner May 6, 2024 17:31
@problame problame requested a review from skyzh May 6, 2024 17:31
@problame problame merged commit ac7dc82 into main May 6, 2024
55 checks passed
@problame problame deleted the problame/test-suite-narrow-pageserver-config-override branch May 6, 2024 20:31
problame added a commit that referenced this pull request May 7, 2024
Preceding PR #7613 reduced the
usage of `--pageserver-config-override`.

This PR builds on top of that work and fully removes the `neon_local
--pageserver-config-override`.

Tests that need a non-default `pageserver.toml` control it using two
options:

1. Specify `NeonEnvBuilder.pageserver_config_override` before
`NeonEnvBuilder.init_start()`. This uses a new `neon_local init
--pageserver-config` flag.
2. After `init_start()`: `env.pageserver.stop()` +
`NeonPageserver.edit_config_toml()` + `env.pageserver.start()`

A few test cases were using
`env.pageserver.start(overrides=("--pageserver-config-override...",))`.
I changed them to use one of the options above. 

Future Work
-----------

The `neon_local init --pageserver-config` flag still uses `pageserver
--config-override` under the hood. In the future, neon_local should just
write the `pageserver.toml` directly.

The `NeonEnvBuilder.pageserver_config_override` field should be renamed
to `pageserver_initial_config`. Let's save this churn for a separate
refactor commit.
conradludgate pushed a commit that referenced this pull request May 8, 2024
Preceding PR #7613 reduced the
usage of `--pageserver-config-override`.

This PR builds on top of that work and fully removes the `neon_local
--pageserver-config-override`.

Tests that need a non-default `pageserver.toml` control it using two
options:

1. Specify `NeonEnvBuilder.pageserver_config_override` before
`NeonEnvBuilder.init_start()`. This uses a new `neon_local init
--pageserver-config` flag.
2. After `init_start()`: `env.pageserver.stop()` +
`NeonPageserver.edit_config_toml()` + `env.pageserver.start()`

A few test cases were using
`env.pageserver.start(overrides=("--pageserver-config-override...",))`.
I changed them to use one of the options above. 

Future Work
-----------

The `neon_local init --pageserver-config` flag still uses `pageserver
--config-override` under the hood. In the future, neon_local should just
write the `pageserver.toml` directly.

The `NeonEnvBuilder.pageserver_config_override` field should be renamed
to `pageserver_initial_config`. Let's save this churn for a separate
refactor commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants