Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Custom certificate verifier should not depend on presence of root certs #29667

Open
oakad opened this issue May 12, 2022 · 11 comments
Open

Comments

@oakad
Copy link

oakad commented May 12, 2022

TL;DR; I propose to make grpc_tls_certificate_verifier object, if set by user, an one and only authority on certificate validity. No messing with root cert/crl sets. Just the output of grpc_tls_certificate_verifier::Verify to decide.

This was to some extent discussed before, but the issue still applies, despite the refactoring of TLS code.

Consider a system such as OSX/IOS/Windows and to some extent Android which do not expose user certificates as a bunch of pem blobs in readable location (those easily amount to 90%+ of all client installations on this planet). All those systems provide a readily accessible API to verify peer certificate chains in a system dependent manner. In many cases it is critically important to comply with the system specific method of verification.

Right now we have the implementation referenced below.

The main issue with it: it does all the hard work trying to initialize the "root cert" verification set even if set_verify_server_certs is set to false. The check for this options happens rather late, so a lot of cycles are wasted for nothing (as verification will not be relying on those certs anyway).

Nevertheless, we must provide some "root certs" for the current implementation to work at all. Moreover, we must maintain them in valid state, even though we know them to be non-functional.

Additionally, the naming of an option set_verify_server_certs is misleading in such scenarios. We don't want to disable verification, we want to delegate it.

Thus, I think, it makes sense to drop the set_verify_server_certs option and instead simply check for presence of explicitly set grpc_tls_certificate_verifier. If it is set, no root set need to be populated and verification is fully delegated.

This will also make grpc_tls_certificate_provider use case simpler. Right now, in cases where intermediate root certs are delivered via network (xds style thing) the boilerplate of dynamic provider is pretty bad. It's actually easier to put together a custom X509_STORE_CTX (or whatever runtime platform offers) explicitly in the verifier than jump around the provider.

#if OPENSSL_VERSION_NUMBER >= 0x10100000
// X509_STORE_up_ref is only available since OpenSSL 1.1.
if (options->root_store != nullptr) {
X509_STORE_up_ref(options->root_store->store);
SSL_CTX_set_cert_store(ssl_context, options->root_store->store);
}
#endif
if (OPENSSL_VERSION_NUMBER < 0x10100000 || options->root_store == nullptr) {
result = ssl_ctx_load_verification_certs(
ssl_context, options->pem_root_certs, strlen(options->pem_root_certs),
nullptr);
if (result != TSI_OK) {
gpr_log(GPR_ERROR, "Cannot load server root certificates.");
break;
}
}
if (options->num_alpn_protocols != 0) {
result = build_alpn_protocol_name_list(
options->alpn_protocols, options->num_alpn_protocols,
&impl->alpn_protocol_list, &impl->alpn_protocol_list_length);
if (result != TSI_OK) {
gpr_log(GPR_ERROR, "Building alpn list failed with error %s.",
tsi_result_to_string(result));
break;
}
#if TSI_OPENSSL_ALPN_SUPPORT
GPR_ASSERT(impl->alpn_protocol_list_length < UINT_MAX);
if (SSL_CTX_set_alpn_protos(
ssl_context, impl->alpn_protocol_list,
static_cast<unsigned int>(impl->alpn_protocol_list_length))) {
gpr_log(GPR_ERROR, "Could not set alpn protocol list to context.");
result = TSI_INVALID_ARGUMENT;
break;
}
#endif /* TSI_OPENSSL_ALPN_SUPPORT */
SSL_CTX_set_next_proto_select_cb(
ssl_context, client_handshaker_factory_npn_callback, impl);
}
} while (false);
if (result != TSI_OK) {
tsi_ssl_handshaker_factory_unref(&impl->base);
return result;
}
if (options->skip_server_certificate_verification) {

@oakad
Copy link
Author

oakad commented May 20, 2022

I have rearranged the initialisation flow as following - I think it makes more sense in cases where there are no root certs readily available anywhere on the system and custom verifier will be present.

--- src/core/lib/security/security_connector/ssl_utils.cc.orig	2021-12-17 06:33:48.000000000 +1100
+++ src/core/lib/security/security_connector/ssl_utils.cc	2022-05-20 19:07:15.000000000 +1000
@@ -426,7 +426,7 @@
     tsi_ssl_client_handshaker_factory** handshaker_factory) {
   const char* root_certs;
   const tsi_ssl_root_certs_store* root_store;
-  if (pem_root_certs == nullptr) {
+  if (pem_root_certs == nullptr && !skip_server_certificate_verification) {
     gpr_log(GPR_INFO,
             "No root certificates specified; use ones stored in system default "
             "locations instead");
@@ -445,7 +445,7 @@
                            pem_key_cert_pair->private_key != nullptr &&
                            pem_key_cert_pair->cert_chain != nullptr;
   tsi_ssl_client_handshaker_options options;
-  GPR_DEBUG_ASSERT(root_certs != nullptr);
+  //GPR_DEBUG_ASSERT(root_certs != nullptr);
   options.pem_root_certs = root_certs;
   options.root_store = root_store;
   options.alpn_protocols =

--- src/core/tsi/ssl_transport_security.cc.orig	2021-12-17 06:33:48.000000000 +1100
+++ src/core/tsi/ssl_transport_security.cc	2022-05-20 19:38:34.000000000 +1000
@@ -1954,9 +1954,6 @@
 
   if (factory == nullptr) return TSI_INVALID_ARGUMENT;
   *factory = nullptr;
-  if (options->pem_root_certs == nullptr && options->root_store == nullptr) {
-    return TSI_INVALID_ARGUMENT;
-  }
 
 #if OPENSSL_VERSION_NUMBER >= 0x10100000
   ssl_context = SSL_CTX_new(TLS_method());
@@ -1990,27 +1987,6 @@
   }
 
   do {
-    result = populate_ssl_context(ssl_context, options->pem_key_cert_pair,
-                                  options->cipher_suites);
-    if (result != TSI_OK) break;
-
-#if OPENSSL_VERSION_NUMBER >= 0x10100000
-    // X509_STORE_up_ref is only available since OpenSSL 1.1.
-    if (options->root_store != nullptr) {
-      X509_STORE_up_ref(options->root_store->store);
-      SSL_CTX_set_cert_store(ssl_context, options->root_store->store);
-    }
-#endif
-    if (OPENSSL_VERSION_NUMBER < 0x10100000 || options->root_store == nullptr) {
-      result = ssl_ctx_load_verification_certs(
-          ssl_context, options->pem_root_certs, strlen(options->pem_root_certs),
-          nullptr);
-      if (result != TSI_OK) {
-        gpr_log(GPR_ERROR, "Cannot load server root certificates.");
-        break;
-      }
-    }
-
     if (options->num_alpn_protocols != 0) {
       result = build_alpn_protocol_name_list(
           options->alpn_protocols, options->num_alpn_protocols,
@@ -2033,16 +2009,45 @@
       SSL_CTX_set_next_proto_select_cb(
           ssl_context, client_handshaker_factory_npn_callback, impl);
     }
+
+    if (options->skip_server_certificate_verification) {
+      SSL_CTX_set_verify(ssl_context, SSL_VERIFY_PEER, NullVerifyCallback);
+      *factory = impl;
+      return TSI_OK;
+    }
+
+    if (options->pem_root_certs == nullptr && options->root_store == nullptr) {
+      result = TSI_INVALID_ARGUMENT;
+      break;
+    }
+
+    result = populate_ssl_context(ssl_context, options->pem_key_cert_pair,
+                                  options->cipher_suites);
+    if (result != TSI_OK) break;
+
+#if OPENSSL_VERSION_NUMBER >= 0x10100000
+    // X509_STORE_up_ref is only available since OpenSSL 1.1.
+    if (options->root_store != nullptr) {
+      X509_STORE_up_ref(options->root_store->store);
+      SSL_CTX_set_cert_store(ssl_context, options->root_store->store);
+    }
+#endif
+    if (OPENSSL_VERSION_NUMBER < 0x10100000 || options->root_store == nullptr) {
+      result = ssl_ctx_load_verification_certs(
+          ssl_context, options->pem_root_certs, strlen(options->pem_root_certs),
+          nullptr);
+      if (result != TSI_OK) {
+        gpr_log(GPR_ERROR, "Cannot load server root certificates.");
+        break;
+      }
+    }
   } while (false);
   if (result != TSI_OK) {
     tsi_ssl_handshaker_factory_unref(&impl->base);
     return result;
   }
-  if (options->skip_server_certificate_verification) {
-    SSL_CTX_set_verify(ssl_context, SSL_VERIFY_PEER, NullVerifyCallback);
-  } else {
-    SSL_CTX_set_verify(ssl_context, SSL_VERIFY_PEER, nullptr);
-  }
+  
+  SSL_CTX_set_verify(ssl_context, SSL_VERIFY_PEER, nullptr);
   /* TODO(jboeuf): Add revocation verification. */
 
   *factory = impl;

@yashykt
Copy link
Member

yashykt commented May 24, 2022

@ZhenLian ping

@ZhenLian
Copy link
Contributor

Thanks for bringing up the issue. Please see my comments below.

The main issue with it: it does all the hard work trying to initialize the "root cert" verification set even if set_verify_server_certs is set to false. The check for this options happens rather late, so a lot of cycles are wasted for nothing (as verification will not be relying on those certs anyway). Nevertheless, we must provide some "root certs" for the current implementation to work at all. Moreover, we must maintain them in valid state, even though we know them to be non-functional.

Yes, I agree this can be improved with the current API, and someone raised a similar concern in an internal platform. We can definitely change the API to not require a fake root certs if "set_verify_server_certs" is false. Currently gRPC security team doesn't have bandwidth to work on this, but let's keep this open until this can be picked up.

Additionally, the naming of an option set_verify_server_certs is misleading in such scenarios. We don't want to disable verification, we want to delegate it. Thus, I think, it makes sense to drop the set_verify_server_certs option and instead simply check for presence of explicitly set grpc_tls_certificate_verifier. If it is set, no root set need to be populated and verification is fully delegated.

To make sure I understand, you are proposing:

  • not setting verifier = No verification at all
  • setting verifier = do basic cert verification + whatever is presented in the verifier logic

Our current behavior is: if grpc_tls_certificate_verifier is not set, we will use the default hostname verifier. That is decided for security reasons: we don't want users running into a scenario when they accidentally forget to set verifier(or they don't know it's a thing that should be set), and hence accidentally not verifying anything, but still assuming they are doing the TLS authentication just because they use Tls*Credentials. There is always debate around "usability v.s. security", and I think we choose security at this point.

This will also make grpc_tls_certificate_provider use case simpler. Right now, in cases where intermediate root certs are delivered via network (xds style thing) the boilerplate of dynamic provider is pretty bad. It's actually easier to put together a custom X509_STORE_CTX (or whatever runtime platform offers) explicitly in the verifier than jump around the provider.

I am not sure I quite get your comment about "the boilerplate of dynamic provider". Would you mind elaborating that?
As for custom X509_STORE_CTX, it is a design decision we made a while ago to not expose X509_STORE_CTX, but instead wrap it around. IIRC, the rationale is that X509_STORE_CTX is an OpenSSL API, and we'd better to not directly expose it so that we can guarantee our API backward compatibility.

@ZhenLian
Copy link
Contributor

For the code diff change, would you mind turning into a PR? It's a bit hard to follow right now...

@oakad
Copy link
Author

oakad commented May 27, 2022

I will turn it into PR - my patch fixes the "set_verify_server_certs is false" issue as it is.

The more general issue is indeed security. I'm coming from the position, that certificate validation should rely on the platform mandated method of doing so. Right now, the gRPC approach of loading a pemball is only compatible with Linux and BSD, and totally incompatible with everything else (unless user chooses to maintain their own pemball, which is a very problematic thing.

To this end, I have implemented platform specific verifiers for OSX flavours (macos/ios), Windows and Android (with a JNI bridge). While Android is problematic, I can contribute the OSX and Windows verifiers, they are plain C++ files with a single "ifdef" for platform detection. I have borrowed the general approach from the Chromium code base.

My API change proposal (it does need some further thinking) above comes from the following use pattern (I have in my actual production code):

  opts.set_verify_server_certs(false);
  opts.set_certificate_verifier(std::make_shared<WindowsCertificateVerifierWrapper>());

By looking at such code, somebody may assume that we are not secure. But the contrary is true: we are quite secure, because we actually call the wincrypt api to comply with the platform policy (same on MacOS).

I am forced to disable "basic verification" (as you call it) because on those systems we don't have any material to feed into the OpenSSL context. The certs are stored in the policy-guarded platform stores.

oakad added a commit to oakad/grpc that referenced this issue May 27, 2022
…grpc#29667)

Presently, when establishing a TLS connection, gRPC will always attempt to
load some CA certificates from the file system or user supplied callback
mechanism.

However, on popular commercial platforms (MS Windows, Apple OSX) CA
certificates are not readily available to be loaded as files.

Therefore, requirement to always load the root certs creates a problem:

1. When custom, platform specific verifier is present.
2. When security is not an issue, but TLS connection is nevertheless required
(various testing situations).
@oakad
Copy link
Author

oakad commented May 27, 2022

Pr created as #29810

@matthewstevenson88
Copy link
Contributor

Apologies that this got dropped. Just sent #34859 to fix the core issue, building on top of #29810.

Note that we do not want grpc_tls_certificate_verifier to override the built-in cryptographic cert verification by default, since we do not (at present) give users the right tools to implement this themselves in their own grpc_tls_certificate_verifier implementation. Said differently, the grpc_tls_certificate_verifier should be additive.

@oakad
Copy link
Author

oakad commented Nov 3, 2023

In the current situation it explicitly must not be additive.

Consider, for example, Windows or MacOS. To use the system certificate store (the right thing to do) I have to do something like:

void ApplyPlatformTlsSettings(::grpc::experimental::TlsChannelCredentialsOptions &opts)
{
        opts.set_verify_server_certs(false);
        opts.set_certificate_verifier(std::make_shared<windows::WindowsCertificateVerifierWrapper>());
}

Whereupon the WindowsCertificateVerifierWrapper will call the CertVerifyCertificateChainPolicy and friends.

There's no scenario whereupon some custom pem bundle should override the system policy.

@matthewstevenson88
Copy link
Contributor

matthewstevenson88 commented Nov 3, 2023

Thanks for clarifying @oakad. I think there are a couple of issues here:

  1. We need to give users a way to use the system trust store on Windows or MacOS. (There are a few other open issues tracking this, e.g. Use Windows trust store by default. #16571 or Support loading system root certificates for Win32/OSX #25533. We're trying to burn down some of our open issues and these ones are on the list.)
  2. What's the role of the grpc_tls_certificate_verifier? Is it a replacement of the in-handshake cryptographic verification? Or is it a post-handshake extra validation step? At present, the implementation is the latter. If there is a need for the former (which I think is a totally reasonable ask and something we've thought about), maybe we could open a separate issue just for this (so we can better gauge community interest/need)?

@matthewstevenson88
Copy link
Contributor

Note that MacOS system trust store was added in #29957.

matthewstevenson88 added a commit that referenced this issue Nov 3, 2023
…re present. (#34859)

This PR fixes a bug identified in #29667, where the TLS channel
credentials still require a trust bundle even if the user has explicitly
opted to not verify the server certificate. This PR is based on #29810.
@oakad
Copy link
Author

oakad commented Nov 8, 2023

I don't see it using SecTrustEvaluate(WithError) and friends which will be the correct thing to do. I have a grpc based app which uses all the platform specific stores across Windows/IOS/Android and it's not that much work to do it properly (with the exception of Android, perhaps, which relies on some JNI and Java helper class).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants