Skip to content

Commit

Permalink
Create tls-artifacts-registry
Browse files Browse the repository at this point in the history
  • Loading branch information
vrutkovs committed Jan 11, 2024
1 parent 3ea5c01 commit da6966c
Showing 1 changed file with 313 additions and 0 deletions.
313 changes: 313 additions & 0 deletions enhancements/security/tls-artifacts-registry.md
@@ -0,0 +1,313 @@
---
title: tls-artifacts-registry
authors:
- "@vrutkovs"
- "@dgrisonnet"
- "@dinhxuanvu"
reviewers:
- "@deads2k"
approvers:
- "@deads2k"
api-approvers:
- "@deads2k"
creation-date: 2023-10-26
last-updated: 2023-12-08
tracking-link:
- https://issues.redhat.com/browse/API-1603
---

# TLS Artifacts Registry

## Summary

This enhancement describes a single registry to store all produced
TLS artifacts. Both engineering and customers have expressed concern
about a lack of understanding what is the purpose of a particular
certificate or CA bundle, who issued it, and which team is responsible
for its contents and lifecycle. This enhancement describes a registry
to store necessary information about all certificates in the cluster
and sets requirements for TLS artifacts metadata.

## Motivation

TLS artifacts registry is useful for several teams in Red Hat (i.e.
which component manages a particular certificate) and customers -
for instance, helping them to separate platform-managed certificates
from user-managed.

### User Stories

- As an Openshift administrator, I want to differentiate
platform-managed TLS artifacts from user-managed.
- As an Openshift developer, I want to be able to find which team
manages a particular certificate.
- As an Openshift developer, I want to have a test which verifies
that all certificates managed by the team have sufficient metadata
to explain its purpose
- As a member of OpenShift concerned with the release process
(TRT, dev, staff engineer, maybe even PM), I want to make sure all
managed certificates have sufficient metadata to find the team
responsible for its lifecycle.

### Goals

Create a registry of all managed TLS artifacts, and ensure that no new
certificates are added without sufficient metadata.
Add a test to the conformance suite that verifies the actual cluster TLS
artifacts have necessary metadata.

### Non-Goals

* Include certificates provided by user (additional CA bundle, custom ingress / API certificates)
* Include other encrypted artifacts - e.g. JWT tokens
* Include objects, which include TLS artifacts - e.g. kubeconfigs
* They contain CA bundles, cert and key - but verifying them requires kubeconfig parsing

## Proposal

* Define a structure for the certificate registry origin repo
* Each TLS artifact can be either
* on-disk file
* in-cluster secret or config map
* Additional expected metadata stored for this artifact
* JIRA component
* TLS artifact description (one sentence describing the purpose
of this artifact)
* Each TLS artifact is associated with a set of required metadata
* Create a test which saves existing information about TLS artifacts
in cluster.
* Some secrets may be versioned, contain hash or IP address / DNS
address - i.e.
`etcd-peer-ip-10-10-10-10.us-east-1.internal` becomes
`etcd-peer-<master-2>` etc.
The test would not store versioned copies and replace IPs with "master-n"
to make them look uniform when saving raw TLS information
* Create a new test, which ensures that all secrets are present in the
registry
* Create a test that verifies that TLS artifacts in a cluster have
necessary annotations to provide required metadata.
* Some TLS artifacts can be added to known violation list so that the test
is not permafailing to give teams time to update metadata. Each metadata
property has its own violation list.
* Metadata is split into required and optional properties. The test would
record missing optional metadata and fail if required metadata is missing.
This gives time to update certificate metadata when optional metadata
requirements become required. Violation files are meant to be "remove-only"
mode, meaning when metadata property becomes required users are not meant
to be added to the respective violation file.
* Initially required metadata properties would be "owning JIRA component" and
"one line description of artifact purpose". Later on other properties would be
added and eventually become required for new certificates.
* TLS artifact registry is being composed as a JSON file from several
parts (usually generated from the test job artifacts).
* In the `origin` repo create `hack/update-tls-artifacts.json` and
`hack/verify-tls-artifacts.json` scripts to compose TLS artifact
registry JSON from several JSONs and verify that all necessary
metadata is set respectively. `update` script also generate violation lists
in JSON and Markdown formats. For each required metadata field the Markdown
report can be customized.
* Update library-go functions to set metadata when the TLS artifact is
being generated, so that most operators would use the recommended
method.

### Workflow Description

#### Adding a new TLS artifact

The cluster administrator opens the TLS artifacts registry in the `origin` repo
and consults it to ensure that a particular certificate is platform-managed.
The administrator can use JIRA component metadata to file a
new issue related to this certificate.

When a components requires a new certificate created in cluster, . In order to properly
register a new TLS certificate the developer should do the following:

* the developers should start create a pull request adding this functionality.
"all tls certificates should be registered" test would fail.
The test job generates `rawTLSInformation` file which
contains collected information from the prow job, containing info about this new
certificate.
* the developer creates a PR to `origin` repo, adding this file to
`tls/raw-data` directory, runs `hack/update-tls-artifacts.sh` script.
The script would update `tls/ownership/tls-ownership.json` so that the test would no longer fail.
This script would also update `violation` files, so that the test verifying required metadata
would include this certificate
* the developer reruns tests and makes sure that the certificate is indeed registered and
its metadata has required properties and it matches the expected values stored in `origin` repo.
If tls artifact tests still fail, the developer repeats previous step making necessary changes.

Similarly, TRT members use this registry to file issues found in
periodic or release-blocking jobs.

#### Adding a new requirement for TLS artifacts

Reports and violations mechanisms can be extended to add new requirements. To add a new
certificate metadata requirements developers would need to:
* Implement [`Requirement` interface](https://github.com/openshift/origin/blob/master/pkg/cmd/update-tls-artifacts/generate-owners/tlsmetadatainterfaces/types.go#L9-L14):
```golang
type Requirement interface {
GetName() string

// InspectRequirement generates and returns the result for a particular set of raw data
InspectRequirement(rawData []*certgraphapi.PKIList) (RequirementResult, error)
}


type RequirementResult interface {
GetName() string

// WriteResultToTLSDir writes the content this requirement expects in directory.
// tlsDir is the parent directory and must be nested as: <tlsDir>/<GetName()>/<content here>.
// The content MUST include
// 1. <tlsDir>/<GetName()>/<GetName()>.md
// 2. <tlsDir>/<GetName()>/<GetName().json
// 3. <tlsDir>/violations/<GetName()>/<GetName()>-violations.json
WriteResultToTLSDir(tlsDir string) error

// DiffExistingContent compares the content of the result with what currently exists in the tlsDir.
// returns
// string representation to display to user (ideally a diff)
// bool that is true when content matches and false when content does not match
// error which non-nil ONLY when the comparison itself could not complete. A completed diff that is non-zero is not an error
DiffExistingContent(tlsDir string) (string, bool, error)

// HaveViolationsRegressed compares the violations of the result with was passed in and returns
// allViolationsFS is the tls/violations/<GetName> directory
// returns
// string representation to display to user (ideally a diff of what is worse)
// bool that is true when no regressions have been introduced and false when content has gotten worse
// error which non-nil ONLY when the comparison itself could not complete. A completed check that is non-zero is not an error
HaveViolationsRegressed(allViolationsFS embed.FS) ([]string, bool, error)
}
```

A simplified interface exists for annotation requirements:
```golang
type AnnotationRequirement interface {
GetName() string

// GetAnnotationName returns annotation name to use
GetAnnotationName() string

// GenerateMarkdown returns byte representation of markdown report
GenerateMarkdown(pkiInfo *certgraphapi.PKIRegistryInfo) ([]byte, error)

// InspectRequirement generates and returns the result for a particular set of raw data
InspectRequirement(rawData []*certgraphapi.PKIList) (RequirementResult, error)
}
```

Example:
```golang
type AutoRegenerateAfterOfflineExpiryRequirement struct{}

func NewAutoRegenerateAfterOfflineExpiryRequirement() tlsmetadatainterfaces.Requirement {
return AutoRegenerateAfterOfflineExpiryRequirement{}
}

func (o AutoRegenerateAfterOfflineExpiryRequirement) GetName() string {
return "autoregenerate-after-expiry"
}

func (o AutoRegenerateAfterOfflineExpiryRequirement) GetAnnotationName() string {
return "certificates.openshift.io/auto-regenerate-after-offline-expiry"
}

func (o AutoRegenerateAfterOfflineExpiryRequirement) GenerateMarkdown(pkiInfo *certgraphapi.PKIRegistryInfo) ([]byte, error) {
...
}
```

Markdown report can also be customized, see [example `GenerateMarkdown` method](https://github.com/openshift/origin/blob/master/pkg/cmd/update-tls-artifacts/generate-owners/tlsmetadata/autoregenerate_after_expiry/requirement.go#L31-L162) for autogenerate requirement.

### API Extensions

Not required

### Risks and Mitigations

Teams are responsible for accurate descriptions and timely owners
update of TLS artifacts

### Drawbacks

None

## Design Details

### Open Questions

* Which metadata should be present for all TLS artifacts?
* Minimum set:
* Description
* JIRA component
* Additional data:
* Validity period
* Refreshed after n days / at n% of lifetime
* Can be versioned / can contain IP address or hash
* Suggested by customers
* SSLConnections - which IP address / DNS name this certificate
is valid for.
* This can be derived from certificate content, not required to
be included in the TLS artifact registry and may depend on the base
domain
* Parent - a reference to another TLS artifact this cert is derived
from.
* Management type - operator-managed, user-managed.


### Test Plan

`origin` unit test would verify that the TLS artifact registry is valid:
* All TLS artifacts have owners
* All TLS artifacts have the necessary metadata

e2e conformance tests would ensure that:
* All certificates in `openshift-*` namespaces or on disk are
included in the TLS registry.
* in-cluster certificates have necessary annotations and their value
matches registry entries.

### Graduation Criteria

Not applicable

#### Dev Preview -> Tech Preview

Not applicable

#### Tech Preview -> GA

Not applicable

#### Removing a deprecated feature

Not applicable

### Upgrade / Downgrade Strategy

Not applicable

### Version Skew Strategy

Not applicable

### Operational Aspects of API Extensions

Not applicable

#### Failure Modes

Not applicable

#### Support Procedures

Not applicable

## Implementation History

* Current implementation: https://github.com/openshift/origin/pull/28305

## Alternatives

None known

0 comments on commit da6966c

Please sign in to comment.