Skip to content

Add Remote Keypair Loader, a new keypair hotloading mechanism #51

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 24, 2023

Conversation

drozdziak1
Copy link
Contributor

@drozdziak1 drozdziak1 commented Mar 22, 2023

How it works

We open two POST endpoints, /primary/load_keypair and secondary/load_keypair on a configurable address and port. Each takes a regular Solana keypair in json ([1, 241, 24, [...], 3]). We check that the account holds the configured minimum SOL balance and make it available in the system if the check passes. The endpoints only have an effect if a publisher keypair is not a valid file path on startup. This can be achieved by specifying a name like "". A warning is printed in logs when that happens, informing about the service waiting for the key to that network. If the keypair is valid, it is used at all times like before, ignoring the remote key loader completely.

Complexity

This new system lets you change the key of either network as many times as you wish. This has complicated the design noticeably, in comparison to something like a busy loop waiting for a single upload request before instantiating other components of the system. Here's my counter-points:

  • If an admin uploads the wrong key, they'll need to restart the service, possibly crippling the uptime on the other network (e.g. they botch primary network upload, while secondary is working fine)
  • If we ever split the keypairs into separate payer and identity (good practice IMO), swapping the payer mid flight will be much easier to implement, because the key change notification code will be already there.

Review highlights

  • exporter.rs - this is where the uploaded keypair is used for signing and it's important that all uses of publish_keypair subscribe to the key changes using the Rust channel that is passed there. It appeared to be used in only one place - the publish_batch method.

Testing

  • Verified manually, injecting the keypair during the integration test run, with publisher keypair renamed
  • Added a new integration test, responsible for running a publisher with no keypair file and supplying the keypair through the loader before reusing the existing test scenario. Making this a separate test ensures that both modes work correctly.

@drozdziak1 drozdziak1 changed the title [WIP] Add Remote Keypair Loader, a new keypair hotloading mechanism Add Remote Keypair Loader, a new keypair hotloading mechanism Mar 23, 2023
Comment on lines 337 to 351
let publish_keypair = if let Some(kp) = self.key_store.publish_keypair.as_ref() {
// It's impossible to sanely return a &Keypair in the
// other if branch, so we clone the reference.
Keypair::from_bytes(&kp.to_bytes())
.context("INTERNAL: Could not convert keypair to bytes and back")?
} else {
debug!(
self.logger,
"Exporter: Publish keypair is None, requesting remote loaded key"
);
let kp = RemoteKeypairLoader::request_keypair(&self.keypair_request_tx).await?;
debug!(self.logger, "Exporter: Keypair received");
kp
};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to drop the request if it cannot get the key because when in the current code when it gets the key the api will bloat the rpc.

Copy link
Contributor Author

@drozdziak1 drozdziak1 Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. I read the call stack of this code path when looking for the right place for this request. Here's some things to consider:

  • The maximum number of potentialn keypair requests in channel is the number of batches - From the top level, Exporter::run does not spawn the publish_updates future, meaning it waits until current attempt finishes before beginning the next one. In a given call, publish_updates will schedule publish_batch once per batch, awaiting the result of all using join_all. To me, this means that per Exporter instance, the number of keypair requests will never be more than the number of batches.
  • The keypair requests are extremely cheap to fulfill - RemoteKeypairLoader only has to clone and send back the Keypair object, and it doesn't even touch the channel if the keypair is not available yet (see handle_key_requests in remote_keypair_loader.rs, where guard the channel reads with if let)

I will leave a comment about the assumptions that prevent this part from clogging the system.

Copy link
Collaborator

@ali-behjati ali-behjati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is very well written and it's great that you added it to the integration tests 🔥 . I think as suggested before, changing the default to localhost and giving enough warning on the config.toml doc is good. (in general it's good to add the configs to the config.toml as the place that explains things with their default value)

Apart from that, your design choice to allow changing keys is interesting but it's a little complex. I left a comment and would appreciate if you handle it. I don't have any strong objection about the design but I recommend commenting it briefly on the code (a summary of what you already put on the PR description)

@drozdziak1 drozdziak1 merged commit e0c3be4 into main Mar 24, 2023
@drozdziak1 drozdziak1 deleted the drozdziak1/privkey-remote-load branch March 24, 2023 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants