don't sign X.509 certs #34202
Related Firedancer patch: firedancer-io/firedancer@851e73f |
Can you elaborate on what problems this PR solves? Why do we need this? |
@lijunwangs There is a lot of background here.

It starts with the messy integration of QUIC into Solana's peer-to-peer protocols. The Rust ecosystem still lacks support for some modern TLS standards, like RFC 7250 raw public keys. So, in order to establish a TLS connection, the validator needs to work with X.509 certificates. Currently, it uses self-signed certificates. These certs are almost entirely redundant, except that they carry the peer's Ed25519 public key.

Ever since QUIC was first added to the validator, there has been a security concern with how these certificates are created. Currently, the Labs validator uses the node identity key to sign X.509 certificates. The signature is useless, however: validators don't verify it. Signing this way does introduce risk, though. It exposes the node's private key to a handful of unaudited third-party dependencies. The signing payload could also be ambiguous with other messages signed by the identity key, such as Solana gossip packets or vote transactions.

So, this PR removes a lot of complexity. It replaces several third-party crates used to construct the X.509 certificate with a single hardcoded template. (As mentioned above, the content of the cert doesn't matter, so it is fine to hardcode.) The signature contained in the certificate is now deliberately invalid, as nodes don't verify it anyway. For similar reasons, the SAN and serial number are removed too. What remains is a template: a byte array holding the DER serialization of an X.509 certificate. The pubkey field is then patched with the node's identity key as needed.
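To illustrate the "hardcoded template plus patched pubkey" idea, here is a minimal sketch. It is not the template from this PR: as an assumption for brevity it only covers the SubjectPublicKeyInfo element of a certificate, whose fixed 12-byte Ed25519 header comes from RFC 8410. The names `SPKI_PREFIX`, `PUBKEY_OFFSET` and `spki_from_template` are hypothetical.

```rust
/// Fixed DER header of an Ed25519 SubjectPublicKeyInfo (RFC 8410):
///   SEQUENCE (42 bytes) {
///     SEQUENCE (5 bytes) { OBJECT IDENTIFIER 1.3.101.112 }
///     BIT STRING (33 bytes: 0 unused bits + 32 key bytes)
///   }
/// Illustration only: the real change hardcodes a whole certificate.
const SPKI_PREFIX: [u8; 12] = [
    0x30, 0x2a, 0x30, 0x05, 0x06, 0x03, 0x2b, 0x65, 0x70, 0x03, 0x21, 0x00,
];

/// Offset at which the 32-byte public key is patched in (hypothetical name).
const PUBKEY_OFFSET: usize = SPKI_PREFIX.len();
const TEMPLATE_LEN: usize = PUBKEY_OFFSET + 32;

/// Copies the fixed bytes and patches the public key into the key field.
/// No signing happens and no private key is involved.
fn spki_from_template(pubkey: &[u8; 32]) -> [u8; TEMPLATE_LEN] {
    let mut der = [0u8; TEMPLATE_LEN];
    der[..PUBKEY_OFFSET].copy_from_slice(&SPKI_PREFIX);
    der[PUBKEY_OFFSET..].copy_from_slice(pubkey);
    der
}
```

Because every length in the template is fixed, patching is a plain byte copy at a known offset, which is why no ASN.1 library is needed at runtime.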
@@ -97,8 +93,7 @@ pub struct QuicConfig {

impl NewConnectionConfig for QuicConfig {
    fn new() -> Result<Self, ClientError> {
looks like this becomes infallible?
@t-nelson No, unfortunately not. There are some trait impls of NewConnectionConfig for the legacy UDP code path that can still return an IOError.
changes lgtm now. gonna tap in @lijunwangs and @behzadnouri for rationale
I'm pretty sure it's safe, but I cannot 100% confirm this, as I haven't tested a build from this PR against an old Solana build. Is there a simple CLI tool I can run that spins up a solana-streamer client or server? That would help with testing.
That's a great question, and I should have clarified!

The TLS 1.3 Both client and server deterministically compute the transcript hash and then do the verification. The client and server also each put a 32-byte random value in their ClientHello and ServerHello messages to prevent replay attacks. (So it has some properties of a challenge-response mechanism.)

All TLS connections follow the procedure described above. X.509 signatures are only used for the certificate chain; they do not secure the connection itself. (e.g. I could easily take the

On the web, we need both TLS CertificateVerify (to prove cert ownership) and X.509 certs (to prove cert authenticity against the root CAs via cert chains). In P2P, we only need TLS CertificateVerify, as the blockchain state does the authentication part. See RFC 8446:
For completeness, here is a pretty picture describing exactly how the TLS CertificateVerify is created: |
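As a rough sketch of what gets signed in CertificateVerify, RFC 8446 section 4.4.3 defines the signed content as 64 bytes of 0x20, a context string, a single zero byte, and the transcript hash. The function name `certificate_verify_payload` is hypothetical, and computing the transcript hash and the Ed25519 signature is left out.

```rust
/// Builds the byte string that a TLS 1.3 peer signs in CertificateVerify
/// (RFC 8446, section 4.4.3). `transcript_hash` is assumed to be the hash of
/// the handshake messages so far; it already covers both Hello randoms.
fn certificate_verify_payload(is_server: bool, transcript_hash: &[u8]) -> Vec<u8> {
    // The context string separates server and client signatures.
    let context: &[u8] = if is_server {
        b"TLS 1.3, server CertificateVerify"
    } else {
        b"TLS 1.3, client CertificateVerify"
    };

    let mut payload = Vec::with_capacity(64 + context.len() + 1 + transcript_hash.len());
    payload.extend_from_slice(&[0x20u8; 64]); // 64 space characters
    payload.extend_from_slice(context);
    payload.push(0x00); // separator byte
    payload.extend_from_slice(transcript_hash);
    payload
}
```

The fixed padding and context string are what keep this payload distinct from other messages a key might sign; the X.509 signature inside the certificate plays no part in it.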
I think the easiest would be to run a staked node on testnet and verify quic connections are ok, and other nodes extract the correct pubkey from the connection. |
Codecov Report
Attention:
Additional details and impacted files
@@ Coverage Diff @@
## master #34202 +/- ##
=========================================
- Coverage 81.9% 81.9% -0.1%
=========================================
Files 819 819
Lines 219765 219689 -76
=========================================
- Hits 180094 179987 -107
- Misses 39671 39702 +31 |
.to_der()
.expect("Failed to convert keypair to DER")
.to_der();
//
Can you add a link to where the prefix is documented?
I've added a link to https://www.rfc-editor.org/rfc/rfc8410#section-7.
https://www.rfc-editor.org/rfc/rfc8410#section-10.3 is probably a better example reference.
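For readers following the prefix discussion, here is a sketch of how a fixed prefix can wrap a raw Ed25519 seed into PKCS#8 DER. It assumes the prefix in question is the 16-byte PKCS#8 v1 header that matches the example private key in RFC 8410 section 10.3; the names `PKCS8_ED25519_PREFIX` and `ed25519_seed_to_pkcs8` are hypothetical and not taken from the PR.

```rust
/// PKCS#8 v1 header for an Ed25519 private key (see RFC 8410, section 10.3):
///   SEQUENCE (46 bytes) {
///     INTEGER 0                                    -- version
///     SEQUENCE { OBJECT IDENTIFIER 1.3.101.112 }   -- id-Ed25519
///     OCTET STRING (34 bytes) { OCTET STRING (32 bytes) seed }
///   }
const PKCS8_ED25519_PREFIX: [u8; 16] = [
    0x30, 0x2e, 0x02, 0x01, 0x00, 0x30, 0x05, 0x06,
    0x03, 0x2b, 0x65, 0x70, 0x04, 0x22, 0x04, 0x20,
];

/// Wraps a 32-byte Ed25519 seed in PKCS#8 DER by prepending the fixed header.
fn ed25519_seed_to_pkcs8(seed: &[u8; 32]) -> Vec<u8> {
    let mut der = Vec::with_capacity(PKCS8_ED25519_PREFIX.len() + seed.len());
    der.extend_from_slice(&PKCS8_ED25519_PREFIX);
    der.extend_from_slice(seed);
    der
}
```

The result is always 48 bytes, since every length field in the header is constant for Ed25519.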
Nodes currently don't verify X.509 self-signed certificates because peer authentication is done via TLS 1.3 CertificateVerify. Thus, this change encodes an invalid signature in the X.509 certificate instead.
cert_params
    .distinguished_name
    .push(DnType::CommonName, "Solana node");
let mut cert_der = Vec::<u8>::with_capacity(0xf4);
Reference link please for the following sections. And instead of hard coding a certificate, can we not generate it without giving the key_pair?
What do you mean by reference link? The X.509 spec? This code is functionally no different from what was there before. The previous version just used a crate to produce the same cert on all nodes, with only the serial number, SAN, and public key differing. Here, we do the same, but precompute the ASN.1 serialization. I honestly thought this was worth getting rid of thousands of lines of dependency code, considering that using X.509 is a mistake in the first place.
> can we not generate it without giving the key_pair
Holding the public key is the only purpose of the cert, everything else is arbitrary. Without it, authentication is not possible. So we cannot remove the public key.
Yes, a link to the reference docs, like ITU-T X.690. The hardcoded bytes are just very hard to review. I really need to understand each part of the encoding to approve this change.
@lijunwangs Maybe it would help to have a test case that asserts the hardcoded serialization decodes to the structure mentioned in the comment. We can use a dev-dependency for this. What do you think?
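A sketch of what such a check could look like, under stated assumptions: instead of a dev-dependency it uses a tiny hand-written DER reader, and it only decodes an Ed25519 SubjectPublicKeyInfo (RFC 8410) rather than the full certificate template from this PR. `read_tlv` and `ed25519_spki_pubkey` are hypothetical helpers.

```rust
/// DER encoding of the id-Ed25519 OID (1.3.101.112), tag and length included.
const ED25519_OID_TLV: [u8; 5] = [0x06, 0x03, 0x2b, 0x65, 0x70];

/// Reads one DER tag-length-value; returns (tag, content, remaining bytes).
/// Handles short-form and long-form lengths of up to 4 length bytes.
fn read_tlv(input: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    let (&tag, rest) = input.split_first()?;
    let (&first, rest) = rest.split_first()?;
    let (len, rest) = if first < 0x80 {
        (first as usize, rest)
    } else {
        let n = (first & 0x7f) as usize;
        if n == 0 || n > 4 || rest.len() < n {
            return None;
        }
        let len = rest[..n].iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
        (len, &rest[n..])
    };
    if rest.len() < len {
        return None;
    }
    Some((tag, &rest[..len], &rest[len..]))
}

/// Decodes SEQUENCE { SEQUENCE { OID id-Ed25519 }, BIT STRING } and returns
/// the 32 key bytes, or None if the structure differs in any way.
fn ed25519_spki_pubkey(der: &[u8]) -> Option<&[u8]> {
    let (tag, spki, trailing) = read_tlv(der)?;
    if tag != 0x30 || !trailing.is_empty() {
        return None;
    }
    let (tag, alg, rest) = read_tlv(spki)?;
    if tag != 0x30 || alg != &ED25519_OID_TLV[..] {
        return None;
    }
    let (tag, bits, rest) = read_tlv(rest)?;
    if tag != 0x03 || !rest.is_empty() {
        return None;
    }
    // The first BIT STRING content byte is the number of unused bits (0).
    match bits.split_first() {
        Some((&0, key)) if key.len() == 32 => Some(key),
        _ => None,
    }
}
```

A test could then assert that the public key recovered from the hardcoded bytes equals the key that was patched in, and that corrupting any header byte makes decoding fail.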
@lijunwangs I wrote up an explanation of this change: https://forum.solana.com/t/deprecate-x-509-certs-for-p2p-connections/762. We could link it in the code if you think it helps. |
It took me a while to fully review the changes in this PR and test them to verify they are safe and working:
- I tested against mainnet: connections to the existing nodes are okay.
- I tested pubkey spoofing in the certificate (impersonating someone else's pubkey in the X.509 cert): the validator correctly rejects it with BadSignature during the TLS 1.3 handshake.
- I created a manual test tool to compare the cert and key generated with and without this code, and used openssl to compare the generated DER files. I do not see any notable differences other than the issuer name.
While the PR makes things safer by eliminating a third-party crate, it does make the code a lot harder to review and maintain, so the reference links are important.
The code needs to be rebased.
// RelativeDistinguishedName SET (1 elem)
//   AttributeTypeAndValue SEQUENCE (2 elem)
//     type AttributeType OBJECT IDENTIFIER 2.5.4.3 commonName (X.520 DN component)
//     value AttributeValue [?] UTF8String Solana
Can we keep using the old "Solana node" -- it is more appropriate. It is issued by this node -- not Solana networks.
Superseded by #34896. Thanks for your efforts @ripatel-fd |