Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add webpki-roots and native-certs crate features #1943

Closed
wants to merge 6 commits into from

Conversation

daxpedda
Copy link
Contributor

@daxpedda daxpedda commented May 22, 2023

I will need some guidance on how to proceed from here. Currently when both crate features are enabled, certificates from both webpki-roots and rustls-native-certs are loaded, which is probably not what we want.

Suggestions:

  1. Either we prefer one or the other.
  2. Allow only one crate feature to be enabled with a compile-time check.
  3. Expose a configuration option that selects between one of them, with probably native certs being the default.

I'm leaning towards option 3., because a follow-up from me would be to expose some API to disable/enable them during runtime anyway. This is also how reqwest does it, but when both features are active they will just load both.

I didn't touch the integration tests crate and CI yet.

Follow-up to #1940.

@djc
Copy link
Member

djc commented May 23, 2023

I would usually pick option 1. Do we have an API path that allows doing without either (that is, with the caller providing ClientConfig/ServerConfig)?

I'm leaning towards option 3., because a follow-up from me would be to expose some API to disable/enable them during runtime anyway. This is also how reqwest does it, but when both features are active they will just load both.

What do you mean by disabling them during runtime anyway? What are you looking to achieve for your use case?

@daxpedda
Copy link
Contributor Author

Do we have an API path that allows doing without either (that is, with the caller providing ClientConfig/ServerConfig)?

Apparently yes: ResolverConfig::set_tls_client_config().

What do you mean by disabling them during runtime anyway? What are you looking to achieve for your use case?

My main use-case is when shipping an application, I want to add a settings toggle that allows users to use their native certs if the want to, but using webpki-roots by default. This will become extra useful with rustls-platform-verifier.

To achieve this, some runtime control would be required. E.g. adding fields or methods to ResolverOpts or ResolverConfig.

Alternatively, this could be deemed too niche and using ResolverConfig::set_tls_client_config() enables this already.

I would usually pick option 1.

I just recently found out that reqwest for example just loads both: https://github.com/seanmonstar/reqwest/blob/06c8e5b0b008afee8114fb979b85cd8b73415391/src/async_impl/client.rs#L381-L425.


Let me know how you want me to proceed. With ResolverConfig::set_tls_client_config() any solution we discussed so far works for me.

@djc
Copy link
Member

djc commented May 23, 2023

I guess the simplest path forward is to enable rustls-native-certs by default everywhere and provide a webpki-roots feature that overrides the default.

@daxpedda
Copy link
Contributor Author

I guess the simplest path forward is to enable rustls-native-certs by default everywhere [..]

This isn't desirable either, because that would force a dependency on rustls-native-certs even when it's not needed. That was the reason I originally made #1941.

@djc
Copy link
Member

djc commented May 23, 2023

I meant, in terms of option 1.

@daxpedda daxpedda force-pushed the native-certs branch 2 times, most recently from 9bd46f9 to a395fed Compare May 23, 2023 10:41
@daxpedda daxpedda force-pushed the native-certs branch 2 times, most recently from 572d374 to 87d15bf Compare May 23, 2023 13:12
@daxpedda
Copy link
Contributor Author

Not sure why CI fails here: https://github.com/bluejekyll/trust-dns/actions/runs/5057735617/jobs/9077514569?pr=1943#step:5:298

2023-05-23T13:41:37.0805887Z running 0 tests
2023-05-23T13:41:37.0807928Z      Running tests/named_https_tests.rs (/home/runner/work/trust-dns/trust-dns/target/debug/deps/named_https_tests-20c4e45b4310e1ef)
2023-05-23T13:41:37.0808258Z 
2023-05-23T13:41:37.0853353Z test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
2023-05-23T13:41:37.0855807Z 
2023-05-23T13:41:37.0857952Z 
2023-05-23T13:41:37.0860197Z running 1 test
2023-05-23T13:42:07.0972205Z error: test failed, to rerun pass `--test named_https_tests`
2023-05-23T13:42:07.0972619Z 
2023-05-23T13:42:07.0972756Z Caused by:
2023-05-23T13:42:07.0973489Z   process didn't exit successfully: `/home/runner/work/trust-dns/trust-dns/target/debug/deps/named_https_tests-20c4e45b4310e1ef` (exit status: 255)

Locally this test works for me:

cargo test --features dns-over-https-rustls --test named_https_tests
cargo test --no-default-features --features dns-over-https-rustls --test named_https_tests

@djc
Copy link
Member

djc commented May 23, 2023

Yeah, that test is flaky.

@daxpedda
Copy link
Contributor Author

CI fails on nightly because of: sval-rs/value-bag#62. To fix this we need to update log to v0.4.18.

Do you want me to make a quick PR?

@djc
Copy link
Member

djc commented May 29, 2023

Yes, please!

Comment on lines 299 to 300
for cert in rustls_native_certs::load_native_certs()? {
if let Err(err) = root_store.add(&rustls::Certificate(cert.0)) {
tracing::warn!(
"failed to parse certificate from native root store: {:?}",
&err
);
}
}
if root_store.is_empty() {
return Err(ProtoErrorKind::NativeCerts.into());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I did here in general was to log warnings when encountering invalid certificates and returning an error if either load_native_certs() failed or if no certs were found at all.

In comparison to how reqwest does it, they also error out if load_native_certs() fails but they don't error out if no certificates at all where found, only if at least one invalid and no valid certificate was found.

See https://github.com/seanmonstar/reqwest/blob/06c8e5b0b008afee8114fb979b85cd8b73415391/src/async_impl/client.rs#L381-L425.

@@ -28,10 +29,31 @@ use crate::config::TlsClientConfig;
const ALPN_H2: &[u8] = b"h2";

// using the mozilla default root store
pub(crate) static CLIENT_CONFIG: Lazy<Arc<ClientConfig>> = Lazy::new(|| {
pub(crate) static CLIENT_CONFIG: Lazy<Result<Arc<ClientConfig>, ProtoError>> = Lazy::new(|| {
Copy link
Contributor Author

@daxpedda daxpedda Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this doesn't try and reload native certs if it fails.
We don't want to turn this into a function either because otherwise we would just be allocating the same stuff on every single connection.

Instead of making CLIENT_CONFIG a global we would have to store it in the connection provider, that would allow for proper reloading of native certificates per connection provider instead of per connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, storing it in the connection provider doesn't sound so bad. Can you explain for me from first principles why this is necessary?

Copy link
Contributor Author

@daxpedda daxpedda Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It comes down to how we want to reload the native certs.

  1. With a Lazy, we can only load the certificates once, if it fails, we have to restart the application to try again.
  2. With a OnceCell, we could potentially keep trying to load until it succeeds. To reload the certificates again after a success, we would have to restart the application.
  3. With a function, we would have to re-parse all certificates for every new connection, because that's where we need the configuration. That's not really a viable option.
  4. If we store it in the connection provider, you can reload or retry getting the certificates by creating a new connection provider instead of having to restart the application.

So depending on how we want to be able to reload native certificates, we have to choose one of those solutions:

  • To retry after a failure you have to restart the application -> 1.
  • To reload after a success you have to restart the application -> 2.
  • To reload or retry, you have to create a new connection provider -> 4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds reasonable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this turned out a bit ugly, let me know what you think, I can figure out how to improve this if you agree with the approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djc friendly ping, this is ready again.

@daxpedda
Copy link
Contributor Author

I think this CI failure has nothing to do with this PR?
It was spurious network error before, now it's bad TLS data.

@daxpedda
Copy link
Contributor Author

daxpedda commented Jun 14, 2023

I can actually reproduce this failure locally, seems to be an issue with the PR actually.

@daxpedda
Copy link
Contributor Author

So there are some tests that required any sort of root certificates, which was there by default before.
That should be fixed now.

@daxpedda daxpedda force-pushed the native-certs branch 3 times, most recently from ed21c7e to 11e8fae Compare July 6, 2023 12:22
@daxpedda
Copy link
Contributor Author

daxpedda commented Jul 6, 2023

Rebased on top of #1986 to let the CI run.

@daxpedda
Copy link
Contributor Author

daxpedda commented Jul 7, 2023

Rebased on top of main again.

Copy link
Member

@bluejekyll bluejekyll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks ok, I have a question about the Lazy usage if you wouldn't mind answering that?

@@ -48,6 +48,9 @@ dns-over-openssl = ["dns-over-tls", "dnssec-openssl"]
dns-over-rustls = ["dns-over-tls", "dnssec-ring", "trust-dns-proto/dns-over-rustls"]
dns-over-tls = []

webpki-roots = ["trust-dns-proto/webpki-roots"]
native-certs = ["trust-dns-proto/native-certs"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to file an issue for this. I think we want to move away from controlling the TLS and Certificate system in use from the features based model (we'll keep the features) and toward a config system that will allow for the desired model to be chosen at that time. The current system can be confusing especially if dependencies might introduce one or the other accidentally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to work on that as well!

crates/proto/src/quic/quic_client_stream.rs Outdated Show resolved Hide resolved
}
}

impl<P: RuntimeProvider + Default> Default for GenericConnector<P> {
fn default() -> Self {
Self {
runtime_provider: P::default(),
#[cfg(feature = "dns-over-rustls")]
client_config: Arc::new(Lazy::new(crate::tls::client_config)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed it, but I'm having trouble following the advantage of this lazy in this context, i.e. it seems like this will be a new reference, so even the Arc is unnecessary? Am I misunderstanding this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Arc here is required to clone the Lazy, which is needed if we want to keep GenericConnector cloneable.
Lazy is used here to lazily initialize the ClientConfig, because this config isn't used if the user supplies it's own.

@djc
Copy link
Member

djc commented Aug 22, 2023

In order to make this a bit easier to digest, can I request that this is split in multiple PRs? Maybe even one per commit for some of these wide-ranging commits.

@daxpedda
Copy link
Contributor Author

In order to make this a bit easier to digest, can I request that this is split in multiple PRs? Maybe even one per commit for some of these wide-ranging commits.

Some of these commits just overwrite each other or fix something from a previous one.
I will redo the whole PR and for sure can split it up into multiple ones.

@daxpedda
Copy link
Contributor Author

daxpedda commented Sep 7, 2023

#2005 included everything in here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants