Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor] HTTP RPC System Refactor #3987

Merged
merged 115 commits into from Nov 1, 2023
Merged

[Refactor] HTTP RPC System Refactor #3987

merged 115 commits into from Nov 1, 2023

Conversation

jcnelson
Copy link
Member

@jcnelson jcnelson commented Oct 3, 2023

Alright folks, here it is: The Big HTTP RPC Refactor.

This PR achieves two things: it separates HTTP transport concerns from RPC business logic, and it refactors the RPC system so that each RPC method's business logic can live entirely in its own file. You can find them all under stackslib/src/net/api.

The bulk of the delta here is composed:

  • moving all of the code from stackslib/src/net/http.rs into the new module stackslib/src/net/http
  • moving most of the code from stackslib/src/net/rpc.rs into stackslib/src/net/api
  • moving most of the HTTP-specific code from stackslib/src/net/mod.rs into stackslib/src/net/httpcore.rs.

The first change sets the stage for fully removing HTTP transport concerns from stackslib, and putting them into their own top-level crate. The code in this module does not depend on any Stacks-specific code or data. The second change effectively isolates the RPC functions from the rest of the system, and exposes various Stacks node state under a new StacksNodeState struct (which can gain new members and functions as we need them). The third change binds the newly-isolated HTTP transport code in stackslib/src/net/http to the Stacks blockchain, and implements a high-level HTTP state machine atop net::connection::ProtocolFamily<P> to be used by net::server::ConversationHttp to manage client connection lifecycles.

As you can see from the individual RPC methods, one only needs to implement the net::http::HttpRequest, net::http::HttpResponse, and net::httpcore::RPCRequestHandler traits for a given struct in order to implement a new RPC method. Once this is done, it can be registered with the StacksHttp state machine with a call to register_rpc_endpoint() in src/net/api/mod.rs.

In a hopeless bid to keep this PR smaller than it could have been, I have opted to keep net::http and net::connection.rs within stackslib for now. However, I fully expect to move these both into top-level crates, and possibly move the chunked transfer encoding and PeerAddress/PeerHost types out of stacks_common and into one of these crates. But that is for another day.

…ndpoint, without destroying the write endpoint. This lets the caller send data to the read endpoint as it's being generated, so the read endpoint can consume it concurrently.
@jferrant
Copy link
Collaborator

First pass done. All my comments have to do with just readability/syntax. Didn't see anything functionally wrong, but I will do another pass on monday!

@CAGS295
Copy link

CAGS295 commented Oct 14, 2023

@wileyj I'm not sure what to do about the CI / Docker Debian (Source) / Build Image (pull_request) error. Do you have any insights?

per the call, i've seen errors compiling stackslib when clang isn't present. however, i don't think that's the reason here:

#11 338.0    Compiling stackslib v0.0.1 (/build/stackslib)
#11 349.1 error[E0599]: no method named `get_path` found for struct `StacksHttpRequest` in the current scope
#11 349.1    --> stackslib/src/monitoring/mod.rs:62:52
#11 349.1     |
#11 349.1 62  |     let timer = prometheus::new_rpc_call_timer(req.get_path());
#11 349.1     |                                                    ^^^^^^^^ method not found in `StacksHttpRequest`
#11 349.1     |
#11 349.1    ::: stackslib/src/net/httpcore.rs:452:1
#11 349.1     |
#11 349.1 452 | pub struct StacksHttpRequest {
#11 349.1     | ---------------------------- method `get_path` not found for this struct
#11 349.1 
#11 349.3    Compiling fixed-hash v0.8.0
#11 349.3 For more information about this error, try `rustc --explain E0599`.
#11 349.3 error: could not compile `stackslib` (lib) due to previous error
#11 349.3 warning: build failed, waiting for other jobs to finish...

RequestType was replaced by StacksHttpRequest in 0a97fac.
RequestType::get_path() was deleted in 4e41c23.

@jcnelson I assume you want something like:
let timer = prometheus::new_rpc_call_timer(&decode_request_path(req.request_path())?.0);

It compiles for me.

Copy link
Collaborator

@xoloki xoloki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try to run tests::signer::test_stackerdb_dkg, it never completes. I have no idea what's happening but the logs are spamming very fast.

Get this test running, then try it with num_keys = 4000 (testnet/stacks-node/src/tests/signer.rs:161). If DKG takes longer than ~90sec, or either of the sign operations takes longer than ~100sec, something is badly wrong with the HTTP layer and it's effectively breaking stackerDB under load.

@jcnelson
Copy link
Member Author

Thanks @xoloki! The bug was that the new handler uploading StackerDB chunks was not forwarding the chunk data to the relayer. This, in turn, prevented any StackerDB events from propagating to the signer.

I noticed that test_stackerdb_dkg isn't part of the integration test battery. I went ahead and added it.

@xoloki
Copy link
Collaborator

xoloki commented Oct 26, 2023

Thanks @jcnelson! Glad it was a quick fix, and that we're putting the dkg test into CI.

I noticed that test_stackerdb_dkg isn't part of the integration test battery. I went ahead and added it.

Can you change the number of signers/keys at testnet/stacks-node/src/tests/signer.rs:161? It's currently at 16 signers and 40 keys, which is weird since it means we can't evenly divide those keys among the signers. I'd suggest something like 100 keys and 4 signers. This should add less than a minute to CI.

Get that done and I'll approve this PR.

@jcnelson
Copy link
Member Author

Can you change the number of signers/keys at testnet/stacks-node/src/tests/signer.rs:161? It's currently at 16 signers and 40 keys, which is weird since it means we can't evenly divide those keys among the signers. I'd suggest something like 100 keys and 4 signers. This should add less than a minute to CI.

The DKG test would only run on release CI, so added time isn't too much of a concern for day-to-day tests. I went ahead and used 100 signers and 4000 keys. If this somehow fails release CI due to a timeout, I can bump it back to 4 signers and 100 keys. Does that work for you?

Copy link
Collaborator

@xoloki xoloki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks so much for the quick fix and CI inclusion!

@xoloki
Copy link
Collaborator

xoloki commented Oct 26, 2023

The DKG test would only run on release CI, so added time isn't too much of a concern for day-to-day tests. I went ahead and used 100 signers and 4000 keys. If this somehow fails release CI due to a timeout, I can bump it back to 4 signers and 100 keys. Does that work for you?

Keep in mind that there will be multiple threads per signer, and depending on the OS and hardware hundreds of threads might run poorly. But either way I'm happy.

@jcnelson jcnelson merged commit 9668fb1 into develop Nov 1, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants