Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement live reconfiguration in the compute_ctl, rebased #3964

Closed

Conversation

hlinnaka
Copy link
Contributor

@hlinnaka hlinnaka commented Apr 5, 2023

This is PR #3923, rebased over #3938.

I tried to refrain from making any non-trivial changes, but here are a few notable ones:

  • ComputeState was quite confusing, as it was used both to define the format of the JSON that the /get_status API returned, and to hold other mutable state of a ComputeNode. I refactored it so that ComputeState defines the format of /get_status, like on 'main', and there's another struct, ComputeNodeInner, which holds all the mutable state.
  • I created yet another struct, ParsedSpec, which encapsulates ComputeSpec and the fields derived from it.

hlinnaka and others added 7 commits April 5, 2023 19:49
…rate.

This is in preparation of using compute_ctl to launch postgres nodes
in the neon_local control plane. And seems like a good idea to
separate the public interfaces anyway.

One non-mechanical change here is that we now use a RwLock rather than
atomics to protect the ComputeNode::metrics field. We were not using
atomics for performance but for convenience here, and an RwLock is now
more convenient.
We use the term "endpoint" in for compute Postgres nodes in the web UI
and user-facing documentation now. Adjust the nomenclature in the code.

This changes the name of the "neon_local pg" command to "neon_local
endpoint". Also adjust names of classes, variables etc. in the python
tests accordingly.

This also changes the directory structure so that endpoints are now
stored in:

    .neon/endpoints/<endpoint id>

instead of:

    .neon/pgdatadirs/tenants/<tenant_id>/<endpoint (node) name>

The tenant ID is no longer part of the path. That means that you
cannot have two endpoints with the same name/ID in two different
tenants anymore. That's consistent with how we treat endpoints in the
real control plane and proxy: the endpoint ID must be globally unique.
Accept spec in JSON format and request compute reconfiguration from
the configurator thread. If anything goes wrong after we set the
compute state to `ConfigurationPending` and / or sent spec to the
configurator thread, we basically leave compute in the potentially
wrong state. That said, it's control-plane's responsibility to
watch compute state after reconfiguration request and to clean
restart it in case of errors.

It still lacks ability of starting up without spec and some validations,
i.e. that live reconfiguration should be only available with
`--compute-id` and `--control-plane-uri` options.

Otherwise, it works fine and could be tested by running `compute_ctl`
locally, then sending it a new spec:
```shell
curl -d "$(cat ./compute-spec-new.json)" http://localhost:3080/spec
```

We have one configurator thread and async http server, so generally we
have single consumer - multiple producers pattern here. That's why we
use `mpsc` channel, not `tokio::sync::watch`. Actually, concurrency of
producers is limited to one due to code logic, but we still need an
ability to potentially pass `Sender` to several threads.

Next, we use async `hyper` + `tokio` http server, but all the other code
is completely synchronous. So we need to send data from async to sync,
that's why we use `mpsc::unbounded_channel` here, not `mpsc::channel`.
It doesn't make much sense to rewrite all code to async now, but we can
consider doing this in the future.

I think that a combination of `Mutex` and `CondVar` would work just fine
too, but as we already have `tokio`, I decided to try something from it.
With this commit one can start compute with something like
```shell
cargo run --bin compute_ctl -- -i no-compute \
 -p http://localhost:9095 \
 -D compute_pgdata \
 -C "postgresql://cloud_admin@127.0.0.1:5434/postgres" \
 -b ./pg_install/v15/bin/postgres
```
and it will hang waiting for spec.

Then send one spec
```shell
curl -d "$(cat ./compute-spec.json)" http://localhost:3080/spec
```
Postgres will be started and configured.

Then reconfigure it with
```shell
curl -d "$(cat ./compute-spec-new.json)" http://localhost:3080/spec
```

Most of safeguards and comments are added. Some polishing especially
around HTTP API is still needed.
@hlinnaka hlinnaka requested a review from a team as a code owner April 5, 2023 20:46
@hlinnaka hlinnaka requested review from piercypixel, ololobus and a team and removed request for a team April 5, 2023 20:46
@github-actions
Copy link

github-actions bot commented Apr 5, 2023

Test results for 89cc2c5:


debug build: 212 tests run: 202 passed, 0 failed, 10 (full report)


release build: 212 tests run: 202 passed, 0 failed, 10 (full report)


Comment on lines 61 to 64
pub state_changed: Condvar,
}

pub struct ComputeNodeInner {
Copy link
Member

@ololobus ololobus Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate state_changed looks good to me. I'm not sure about additional ComputeNodeInner, it's still a volatile compute state, but this

ComputeState was quite confusing, as it was used both to define the format of the JSON that the /get_status API returned, and to hold other mutable state of a ComputeNode.

is indeed a good point. Yet, I'd maybe solve it in a bit different way -- define GET /status response as a separate structure, which can be built from ComputeState. It should eliminate confusion and will be generally cleaner than currently in the main. I've already defined struct for error response

About metrics, I'm 100% +1 for moving them under the same lock, but as you properly pointed out in the original PR, let's keep subset of changes smaller to simplify review

If no objections, I'd incorporate mentioned above changes into #3963

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet, I'd maybe solve it in a bit different way -- define GET /status response as a separate structure, which can be built from ComputeState.

Works for me

@@ -12,6 +15,44 @@ use serde_json;
use tracing::{error, info};
use tracing_utils::http::OtelName;

async fn handle_spec_request(req: Request<Body>, compute: &Arc<ComputeNode>) -> Result<(), (String, StatusCode)> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this in a similar way too, thanks. Also tweaked waiting part to spawn a separate blocking thread with tokio::task::spawn_blocking to do not block main pool of worker threads and serve status requests. I didn't expect it to be a problem, but it seems to be a safer approach

@hlinnaka hlinnaka requested review from a team, MMeent and shanyp and removed request for a team April 6, 2023 20:28
@ololobus
Copy link
Member

ololobus commented Apr 7, 2023

Separated the last part of the original PR into #3980, so I think we can close this one

ansrivas pushed a commit to ansrivas/neon that referenced this pull request Apr 11, 2023
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.7.0 to
3.7.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/aio-libs/aiohttp/releases">aiohttp's
releases</a>.</em></p>
<blockquote>
<h2>aiohttp 3.7.3 release</h2>
<h2>Features</h2>
<ul>
<li>Use Brotli instead of brotlipy
<code>[neondatabase#3803](aio-libs/aiohttp#3803)
&lt;https://github.com/aio-libs/aiohttp/issues/3803&gt;</code>_</li>
<li>Made exceptions pickleable. Also changed the repr of some
exceptions.
<code>[neondatabase#4077](aio-libs/aiohttp#4077)
&lt;https://github.com/aio-libs/aiohttp/issues/4077&gt;</code>_</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>Raise a ClientResponseError instead of an AssertionError for a blank
HTTP Reason Phrase.
<code>[neondatabase#3532](aio-libs/aiohttp#3532)
&lt;https://github.com/aio-libs/aiohttp/issues/3532&gt;</code>_</li>
<li>Fix <code>web_middlewares.normalize_path_middleware</code> behavior
for patch without slash.
<code>[neondatabase#3669](aio-libs/aiohttp#3669)
&lt;https://github.com/aio-libs/aiohttp/issues/3669&gt;</code>_</li>
<li>Fix overshadowing of overlapped sub-applications prefixes.
<code>[neondatabase#3701](aio-libs/aiohttp#3701)
&lt;https://github.com/aio-libs/aiohttp/issues/3701&gt;</code>_</li>
<li>Make <code>BaseConnector.close()</code> a coroutine and wait until
the client closes all connections. Drop deprecated &quot;with
Connector():&quot; syntax.
<code>[neondatabase#3736](aio-libs/aiohttp#3736)
&lt;https://github.com/aio-libs/aiohttp/issues/3736&gt;</code>_</li>
<li>Reset the <code>sock_read</code> timeout each time data is received
for a <code>aiohttp.client</code> response.
<code>[neondatabase#3808](aio-libs/aiohttp#3808)
&lt;https://github.com/aio-libs/aiohttp/issues/3808&gt;</code>_</li>
<li>Fixed type annotation for add_view method of UrlDispatcher to accept
any subclass of View
<code>[neondatabase#3880](aio-libs/aiohttp#3880)
&lt;https://github.com/aio-libs/aiohttp/issues/3880&gt;</code>_</li>
<li>Fixed querying the address families from DNS that the current host
supports.
<code>[neondatabase#5156](aio-libs/aiohttp#5156)
&lt;https://github.com/aio-libs/aiohttp/issues/5156&gt;</code>_</li>
<li>Change return type of MultipartReader.<strong>aiter</strong>() and
BodyPartReader.<strong>aiter</strong>() to AsyncIterator.
<code>[neondatabase#5163](aio-libs/aiohttp#5163)
&lt;https://github.com/aio-libs/aiohttp/issues/5163&gt;</code>_</li>
<li>Provide x86 Windows wheels.
<code>[neondatabase#5230](aio-libs/aiohttp#5230)
&lt;https://github.com/aio-libs/aiohttp/issues/5230&gt;</code>_</li>
</ul>
<h2>Improved Documentation</h2>
<ul>
<li>Add documentation for <code>aiohttp.web.FileResponse</code>.
<code>[neondatabase#3958](aio-libs/aiohttp#3958)
&lt;https://github.com/aio-libs/aiohttp/issues/3958&gt;</code>_</li>
<li>Removed deprecation warning in tracing example docs
<code>[neondatabase#3964](aio-libs/aiohttp#3964)
&lt;https://github.com/aio-libs/aiohttp/issues/3964&gt;</code>_</li>
<li>Fixed wrong &quot;Usage&quot; docstring of
<code>aiohttp.client.request</code>.
<code>[neondatabase#4603](aio-libs/aiohttp#4603)
&lt;https://github.com/aio-libs/aiohttp/issues/4603&gt;</code>_</li>
<li>Add aiohttp-pydantic to third party libraries
<code>[neondatabase#5228](aio-libs/aiohttp#5228)
&lt;https://github.com/aio-libs/aiohttp/issues/5228&gt;</code>_</li>
</ul>
<h2>Misc</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst">aiohttp's
changelog</a>.</em></p>
<blockquote>
<h1>3.7.4 (2021-02-25)</h1>
<h2>Bugfixes</h2>
<ul>
<li>
<p><strong>(SECURITY BUG)</strong> Started preventing open redirects in
the
<code>aiohttp.web.normalize_path_middleware</code> middleware. For
more details, see
<a
href="https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg">https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg</a>.</p>
<p>Thanks to <code>Beast Glatisant
&lt;https://github.com/g147&gt;</code>__ for
finding the first instance of this issue and <code>Jelmer Vernooij
&lt;https://jelmer.uk/&gt;</code>__ for reporting and tracking it down
in aiohttp.
<code>[neondatabase#5497](aio-libs/aiohttp#5497)
&lt;https://github.com/aio-libs/aiohttp/issues/5497&gt;</code>_</p>
</li>
<li>
<p>Fix interpretation difference of the pure-Python and the Cython-based
HTTP parsers construct a <code>yarl.URL</code> object for HTTP
request-target.</p>
<p>Before this fix, the Python parser would turn the URI's absolute-path
for <code>//some-path</code> into <code>/</code> while the Cython code
preserved it as
<code>//some-path</code>. Now, both do the latter.
<code>[neondatabase#5498](aio-libs/aiohttp#5498)
&lt;https://github.com/aio-libs/aiohttp/issues/5498&gt;</code>_</p>
</li>
</ul>
<hr />
<h1>3.7.3 (2020-11-18)</h1>
<h2>Features</h2>
<ul>
<li>Use Brotli instead of brotlipy
<code>[neondatabase#3803](aio-libs/aiohttp#3803)
&lt;https://github.com/aio-libs/aiohttp/issues/3803&gt;</code>_</li>
<li>Made exceptions pickleable. Also changed the repr of some
exceptions.
<code>[neondatabase#4077](aio-libs/aiohttp#4077)
&lt;https://github.com/aio-libs/aiohttp/issues/4077&gt;</code>_</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>Raise a ClientResponseError instead of an AssertionError for a blank
HTTP Reason Phrase.
<code>[neondatabase#3532](aio-libs/aiohttp#3532)
&lt;https://github.com/aio-libs/aiohttp/issues/3532&gt;</code>_</li>
<li>Fix <code>web_middlewares.normalize_path_middleware</code> behavior
for patch without slash.
<code>[neondatabase#3669](aio-libs/aiohttp#3669)
&lt;https://github.com/aio-libs/aiohttp/issues/3669&gt;</code>_</li>
<li>Fix overshadowing of overlapped sub-applications prefixes.
<code>[neondatabase#3701](aio-libs/aiohttp#3701)
&lt;https://github.com/aio-libs/aiohttp/issues/3701&gt;</code>_</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/0a26acc1de9e1b0244456b7881ec16ba8bb64fc3"><code>0a26acc</code></a>
Bump aiohttp to v3.7.4 for a security release</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/021c416c18392a111225bc7326063dc4a99a5138"><code>021c416</code></a>
Merge branch 'GHSA-v6wp-4m6f-gcjg' into master</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/4ed7c25b537f71c6245bb74d6b20e5867db243ab"><code>4ed7c25</code></a>
Bump chardet from 3.0.4 to 4.0.0 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5333">#5333</a>)</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/b61f0fdffc887df24244ba7bdfe8567c580240ff"><code>b61f0fd</code></a>
Fix how pure-Python HTTP parser interprets <code>//</code></li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/5c1efbc32c46820250bd25440bb7ea96cb05abe9"><code>5c1efbc</code></a>
Bump pre-commit from 2.9.2 to 2.9.3 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5322">#5322</a>)</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/007507580137efcc0a20391a0792f39b337d9c1a"><code>0075075</code></a>
Bump pygments from 2.7.2 to 2.7.3 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5318">#5318</a>)</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/5085173d947e6cc01b6daf1aa48fe7698834c569"><code>5085173</code></a>
Bump multidict from 5.0.2 to 5.1.0 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5308">#5308</a>)</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/5d1a75e68d278c641c90021409f4eb5de1810e5e"><code>5d1a75e</code></a>
Bump pre-commit from 2.9.0 to 2.9.2 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5290">#5290</a>)</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/6724d0e7a944fd7e3a710dc292d785fa8fe424fd"><code>6724d0e</code></a>
Bump pre-commit from 2.8.2 to 2.9.0 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5273">#5273</a>)</li>
<li><a
href="https://github.com/aio-libs/aiohttp/commit/c688451ce31b914c71b11d2ac6c326b0c87e6d1f"><code>c688451</code></a>
Removed duplicate timeout parameter in ClientSession reference docs. (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5262">#5262</a>)
...</li>
<li>Additional commits viewable in <a
href="https://github.com/aio-libs/aiohttp/compare/v3.7.0...v3.7.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=aiohttp&package-manager=pip&previous-version=3.7.0&new-version=3.7.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the
default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as
the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as
the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the
default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/neondatabase/neon/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Vadim Kharitonov <vadim@neon.tech>
@hlinnaka
Copy link
Contributor Author

These changes have been done in other PRs, closing.

@hlinnaka hlinnaka closed this Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants