Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A69: Certificate Revocation List Enhancements #382

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

gtcooke94
Copy link

No description provided.

@gtcooke94 gtcooke94 changed the title Crl proposal AXX: Crl proposal Jul 25, 2023
@gtcooke94 gtcooke94 requested a review from erm-g July 26, 2023 15:04
Copy link

@erm-g erm-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, suggested a bunch of changes related to wording / moving statements around

AXX-crl-enhancements.md Outdated Show resolved Hide resolved
AXX-crl-enhancements.md Outdated Show resolved Hide resolved
AXX-crl-enhancements.md Outdated Show resolved Hide resolved
AXX-crl-enhancements.md Outdated Show resolved Hide resolved
Per RFC5280, during revocation checking one of three status should be returned - `RevocationUnrevoked`, `RevocationRevoked`, and `RevocationUndetermined`. The user must specify whether `RevocationUndetermined` should be treated as `RevocationRevoked` or `RevocationUnrevoked` (failing open vs. failing closed).

Assumptions:
1. There should be no direct linkage between this interface and the creation/distribution of CRLs. gRPC is a user of CRLs and credentials, not a PKI itself.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add some 'high level overview' section (maybe as a comment to a diagram?) saying that existing impls use map as an in-memory storage and a provider to update it using some IOs. PKIs are completely out of scope of this gRFC

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also let's add an assumption that we can't guess CRL provider by a name of a file / metadata of strings. However if someone wants to rely on it they can do a specific impl

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the second

For the first, this document has a structure defined by a template. I could have a High Level Overview sub-section in the Proposal section.

existing impls use map as an in-memory storage and a provider to update it using some IOs

I'm a little confused at this comment - the existing CRL impls don't do this?

AXX-crl-enhancements.md Outdated Show resolved Hide resolved
If a CRL is not found for an issuer this is fine - the verification code will return RevocationUndetermined per RFC5280, then the user will have a setting to fail-open or fail-closed indicating whether RevocationUndetermined should be treated as Unrevoked or Revoked respectively.

#### API Outcomes and Error Handling
Because CRLs involve reading/writing from the filesystem, we will have to deal with potential edge cases of bad files, bad updates, etc. From a high-level perspective, we are going to follow the Authz Policy design patterns - use existing information in case an update contains bad data. Startup behavior is not different from updates. Particularly in the case of updating CRLs, we don't want to error and crash the server if there is a bad update. However, we still want this error to be known, so we will have an optional and overridable error callback when CRLs fail to be read. This will let users tie in whatever alerting/monitoring they may want in these failure cases. This callback will be for notifications/alerting, it will not be a decision-making callback (it is expected the IO for CRLs will already be done in the background, so these errors would not be on the main path by design anyways). Users should use `RevocationUndertermined` combined with the FailOpen/FailClosed knob for decision making on uncertain CRLs.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a link to Authz gRFC

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and added to the related proposals at the top as well

AXX-crl-enhancements.md Outdated Show resolved Hide resolved
@gtcooke94 gtcooke94 marked this pull request as ready for review August 3, 2023 14:59
@gtcooke94 gtcooke94 changed the title AXX: Crl proposal A69: Crl proposal Aug 3, 2023
@gtcooke94 gtcooke94 changed the title A69: Crl proposal A69: Certificate Revocation List Enhancements Aug 3, 2023
@gtcooke94 gtcooke94 requested a review from erm-g August 31, 2023 15:33
Copy link

@erm-g erm-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor things

A69-crl-enhancements.md Outdated Show resolved Hide resolved

Users can also implement the interface with whatever behaviors they specifically desire.

Following [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280), during revocation checking one of three status should be returned - `RevocationUnrevoked`, `RevocationRevoked`, and `RevocationUndetermined` ([RFC5280 defines `Undetermined` in the CRL Processing section](https://datatracker.ietf.org/doc/html/rfc5280#section-6.3.3). We ). There are many reasons a certificate can be revoked, but in the end gRPC cares about a certificate being revoked or not, thus we differentiate between `RevocationUnrevoked` and `RevocationRevoked`. The user must specify whether `RevocationUndetermined` should be treated as `RevocationRevoked` or `RevocationUnrevoked` (failing open vs. failing closed).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some pending leftovers
We ).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned

A69-crl-enhancements.md Show resolved Hide resolved
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments on top of @erm-g's. But looks good.

@markdroth, @dfawley, can y'all take a look?

A69-crl-enhancements.md Outdated Show resolved Hide resolved
A69-crl-enhancements.md Outdated Show resolved Hide resolved
A69-crl-enhancements.md Outdated Show resolved Hide resolved
A69-crl-enhancements.md Outdated Show resolved Hide resolved
gtcooke94 added a commit to grpc/grpc that referenced this pull request Oct 17, 2023
The basic APIs for the CRL Reloading features.
This adds external types to represent CRL Providers, CRLs, and
CertificateInfo.
Internally we will use `CrlImpl` - this layer is needed to hide OpenSSL
details from the user.

GRFC - grpc/proposal#382

Things Done
* Add external API for `CrlProvider`, `Crl`, `CertInfo` (`CertInfo` is
used during CRL lookup rather than passing the entire certificate).
* Add code paths in `ssl_transport_security` to utilize CRL providers
* Add `StaticCrlProvider`
* Refactor `crl_ssl_transport_security_test.cc` so it is more extensible
and can be used with providers
A69-crl-enhancements.md Outdated Show resolved Hide resolved
A69-crl-enhancements.md Outdated Show resolved Hide resolved
A69-crl-enhancements.md Outdated Show resolved Hide resolved
gtcooke94 added a commit to grpc/grpc that referenced this pull request Nov 3, 2023
This adds the directory reloader implementation of the CrlProvider. This
will periodically reload CRL files in a directory per [gRFC
A69](grpc/proposal#382)

Included in this is the following:
* A public API to create the `DirectoryReloaderCrlProvider`
* A basic directory interface in gprpp and platform specific impls for
getting the list of files in a directory (unfortunately prior C++17,
there is no std::filesystem, so we have to have platform specific impls)
* The implementation of `DirectoryReloaderCrlProvider` takes an
event_engine and a directory interface. This allows us to test using the
fuzzing event engine for time mocking, and to implement a test directory
interface so we avoid having to make temporary directories and files in
the tests. This is notably not in `include`, and the
`CreateDirectoryReloaderCrlProvider` is the only way to construct one
from the public API, so we don't expose the event engine and directory
details to the user.

---------

Co-authored-by: gtcooke94 <gtcooke94@users.noreply.github.com>
gtcooke94 added a commit to gtcooke94/grpc that referenced this pull request Nov 13, 2023
)

This adds the directory reloader implementation of the CrlProvider. This
will periodically reload CRL files in a directory per [gRFC
A69](grpc/proposal#382)

Included in this is the following:
* A public API to create the `DirectoryReloaderCrlProvider`
* A basic directory interface in gprpp and platform specific impls for
getting the list of files in a directory (unfortunately prior C++17,
there is no std::filesystem, so we have to have platform specific impls)
* The implementation of `DirectoryReloaderCrlProvider` takes an
event_engine and a directory interface. This allows us to test using the
fuzzing event engine for time mocking, and to implement a test directory
interface so we avoid having to make temporary directories and files in
the tests. This is notably not in `include`, and the
`CreateDirectoryReloaderCrlProvider` is the only way to construct one
from the public API, so we don't expose the event engine and directory
details to the user.

---------

Co-authored-by: gtcooke94 <gtcooke94@users.noreply.github.com>
gtcooke94 added a commit to gtcooke94/grpc that referenced this pull request Nov 13, 2023
)

This adds the directory reloader implementation of the CrlProvider. This
will periodically reload CRL files in a directory per [gRFC
A69](grpc/proposal#382)

Included in this is the following:
* A public API to create the `DirectoryReloaderCrlProvider`
* A basic directory interface in gprpp and platform specific impls for
getting the list of files in a directory (unfortunately prior C++17,
there is no std::filesystem, so we have to have platform specific impls)
* The implementation of `DirectoryReloaderCrlProvider` takes an
event_engine and a directory interface. This allows us to test using the
fuzzing event engine for time mocking, and to implement a test directory
interface so we avoid having to make temporary directories and files in
the tests. This is notably not in `include`, and the
`CreateDirectoryReloaderCrlProvider` is the only way to construct one
from the public API, so we don't expose the event engine and directory
details to the user.

---------

Co-authored-by: gtcooke94 <gtcooke94@users.noreply.github.com>
type CrlDirectoryReloadingProvider struct {
ReloadInterval time.Duration
CrlDirectory string
// maps issuer hash to crl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Tabs vs spaces. Make this a tab?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall design looks good! My comments here are mostly just details that need to be spelled out a little more clearly.

Please let me know if you have any questions. Thanks!

1. We cannot know users' PKI details, so our implementations should not enforce non-X509 metadata mattering (for example, file names). A user could certainly implement their own provider that cares about filenames.
1. We will not support OSCP-style checking (OSCP is an alternative to CRLs, not another form of CRLs).

![Basic Flow Diagram](A69_graphics/basic_diagram.png)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also include the source form (e.g., SVG files) for these diagrams, so that we can modify them in future if needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added SVG files

This is a simple provider that takes in all CRLs to be used as raw strings during initialization, then returns them in the appropriate format when called. The CRLs will be stored in a mapping of a hash of the issuer to the CRL content.

### Provider Implementation - Directory Provider
This provider will periodically read CRL files in a given directory and update gRPCs internal representation of those CRLs. We expect this will be heavily used, as a directory of CRLs is very common for X509 CRL files. The CRLs will be stored in a mapping of a hash of the issuer to the CRL content - `map<issuer_hash, crl_object>`. The initial read of the directory and filling of this map will be synchronous during server startup when the provider is created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth explicitly stating that the reason that the initial read is synchronous is to ensure that the CRL data is already loaded before the server starts up, so that we don't delay handshakes while waiting for the CRLs to load. We should explicitly state that the creation does not fail if there is an error in this initial load.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added text

// Implementation for getting and storing CRLs
```

Example feature use:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend this example to show setting the fail open/closed knob, so that it's clear that this knob is intended to be part of the TlsCreds API, not part of a specific CRL provider implementation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added opts.FailRevocationUndetermined = false

```
<pseudo code in the handshaker when verifying cert>

RevocationStatus GetRevocationStatus(cert, crlProvider, settings) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest changing this pseudo-code as follows:

  • Rename settings to opts, to clarify that this is the same options structure created in the previous pseudo-code block.
  • Instead of passing in crlProvider as a separate parameter, use opts.provider, since that's where we set the provider in the previous pseudo-code block.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


In the case that the directory and all files within are read completely during an update, we will treat the directory as the exact truth. This means if a CRL was in-memory and is no longer in the directory after the update, it will be removed in-memory.

In the case that there is an issue reading a file, we cannot be certain what exactly is happening. In particular, we enforce no naming convention on the CRL files, so if it cannot be read, we cannot know what it contains or what issuer it belongs to. We will do a best-effort safe update - for files which are read correctly, we can update those individual entries. Since we can't know the issuer if a file can't be read, it is unsafe to do deletion in this case. An overridable error callback will be called so the user can receive a signal that something has gone wrong with CRLs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explicitly state that this error callback will be passed as a parameter when instantiating the directory-based provider, so that it's clear that this is not part of the TlsCreds API itself.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Example behavior for a directory reloader:

At startup we read 3 CRL files (A, B, C) and construct a map as described above. During the handshakes, we return the CRLs from the map.
Then, a bad update happens and CRL C ends up corrupted, but A and B are good. When this directory is reloaded, the resulting map will have the newer A and B CRL, but the older C CRL (as this will not see any update). In addition, the CrlReadErrorCallback would be invoked when CRL C fails to be read. There is no linkage between the map's CRL C and the fact that there was a read failure. It is the user's responsibility to act on file issues after being notified via the error callback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text "but the older C CRL (as this will not see any update)" is confusing. Suggest changing that to something like "but it will also include the original C CRL, since we do not delete entries from the map when there is a failure reading any of the files in the directory".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed, PTAL

If a CRL is not found for an issuer this is fine - the verification code will return RevocationUndetermined per RFC5280, then the user will have a setting to fail-open or fail-closed indicating whether RevocationUndetermined should be treated as Unrevoked or Revoked respectively.


#### API Outcomes and Error Handling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to specify the behavior of this callback a little more precisely:

  • The callback will be invoked at most once per update pass. If there are errors reading more than one file in a given pass, all of those errors will be returned in a single invocation of the callback.
  • The callback will be passed only one parameter, which is an indication of the error in a language-specific way (e.g., absl::Status in C++). The error text should include the list of files for which there were errors and what those errors were, but that information will not be passed in a structured way for the callback to understand those individual underlying errors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


This approach adds complexity to the user facing API, and a user overriding the provider interface poorly could result in undefined or inefficient behaviors. To counteract this, we will provide generally useful implementations and good documentation on requirements if a user is writing their own implementation.

We explicitly will not be supporting OSCP-style revocation . We want to stay inside the realm of RFC5280 and X509-style CRLs, and OSCP represents a departure from that standard. Notably, the API surface for OSCP is different - for OSCP one makes a call to an external service that returns the revocation status. Rather, we expect to be given CRLs and will check the revocation status in gRPC. In addition, OSCP requires external calls during the handshake, whereas with X509 CRL files everything can done locally. Handshakes are on a path for which performance in critical, so making external calls during them is an approach to avoid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "performance in critical", change "in" to "is".

Also, I would say "important" instead of "critical" -- we do care about it, but handshakes are not on the per-call path, so their performance is not truly what I would consider critical.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


> "sets the function to get the crl for a given certificate x. When found, the crl must be assigned to *crl. This function must return 0 on failure and 1 on success. If no function to get the issuer is provided, the internal default function will be used instead."

We will set this function to get the CRL from the CRLProvider if it is configured. We will add a `grpc::experimental::CrlProvider` that can be configured in `grpc::experimental::TlsChannelCredentialOptions`. This value and `set_crl_directory` cannot both be configured. The Provider will be configured in these options by the user and supplied as a pointer to grpc. Its lifetime needs to be managed by the user, and any handshake happening with a provider configured should share this provider, so the user can configure this to be shared across many connections if desired.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statement "Its lifetime needs to be managed by the user" is inconsistent with the fact that we're passing it in as a shared_ptr<>.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed that statement

* [IdentityCertificateOptions](https://github.com/grpc/grpc-go/blob/2aa261560586eab6795301a3670d9dfdd7308625/security/advancedtls/advancedtls.go#L115)
* [certificateListExt](https://github.com/grpc/grpc-go/blob/2aa261560586eab6795301a3670d9dfdd7308625/security/advancedtls/crl.go#L88)

### C/C++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section is probably a little too detailed. I think the main things we need to cover here are:

  • APIs that we're adding, both at the C++ and C-core layers.
  • A brief description of where the functionality will be implemented (i.e., the provider and the fail open/closed option will be passed down from TlsCreds to the security connector and from there to the TSI layer, where they will be used when interacting with the SSL library).

I don't think we need to cover the deeper implementation details in this doc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified implementation specifics significantly, and added text about us doing the CRL checks relating to RFC 5280 6.3.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants