Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add README.md clarifying TLS registry purpose and processes #28434

Merged
merged 1 commit into from Jan 5, 2024

Conversation

vrutkovs
Copy link
Member

@vrutkovs vrutkovs commented Dec 1, 2023

/cc @deads2k

@openshift-ci openshift-ci bot requested a review from deads2k December 1, 2023 11:37
tls/README.md Outdated
## Registry purpose

The registry is used to collect all TLS artifacts used in OpenShift, certificate key pairs and CA bundles alike.
For simplicity this document will use "certificate" for both certificate, key or CA bundle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For simplicity this document will use "certificate" for both certificate, key or CA bundle.
For simplicity this document will use "pki artifact" for both certificate, key or CA bundle.

? It's confusing to read certificate where it's actually a bundle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wanted to get away with this and just use "cert" throughout the text :/

Not sure if "pki artifact" is better, maybe "tls artifact"?

tls/README.md Outdated

In order to ensure certificates are following a set of defined standards, we need a way to collect
all certificates used in OpenShift. This is done via "[sig-arch][Late] collect certificate data" test.
The test produces a json in `openshift-e2e-test/artifacts/rawTLSInfo`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The test produces a json in `openshift-e2e-test/artifacts/rawTLSInfo`:
The test produces a json in `openshift-e2e-test/artifacts/rawTLSInfo/<put-the-pattern-of-filename-here>json`:

tls/README.md Outdated

This stores the following info:
* all secrets / configmaps which are certificates along with their metadata
* parses them and stores information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* parses them and stores information
* parses them and stores metadata

tls/README.md Outdated

Recently we've added `openshift.io/owning-component: foobar` annotation to (almost) all certificates,
so that issues related to this certificate would be routed to a proper location. For instance,
certificates created by `service-ca` are managed by network team regardless of the repo these are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
certificates created by `service-ca` are managed by network team regardless of the repo these are
certificates created by `service-ca` are managed by service-ca jira component regardless of the repo these are

The ownership by jira components, not teams, is extremely intentional. Teams can move and dissolve, but we're good about making sure every jira component has an owner.

tls/README.md Outdated
Apart from reports the registry is checked for requirement violations.
`make update` creates `tls/violations/ownership/ownership-violations.json` listing certificate locations
which don't have ownership annotation set. This file is meant to be "remove-only", meaning adding
new entries is prohibited. This is enforced by using a separate `OWNERS` file for this directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new entries is prohibited. This is enforced by using a separate `OWNERS` file for this directory.
new entries is prohibited. This is enforced using an e2e test and by using a separate `OWNERS` file for this directory.

tls/README.md Outdated

Reports and violations mechanism can be extended to add new requirements. In order to add a new
certificate metadata requirements developers would need to:
* implement `Requirement` interface from `pkg/cmd/update-tls-artifacts/generate-owners/tlsmetadata/tlsmetadatainterfaces/types.:go`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add courtesy link

certificate metadata requirements developers would need to:
* implement `Requirement` interface from `pkg/cmd/update-tls-artifacts/generate-owners/tlsmetadata/tlsmetadatainterfaces/types.:go`:
```golang
type Requirement interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

followup consideration: we could streamline further by having annotation requirements indicate their annotations.

tls/README.md Outdated

Markdown report can also be customized:
```golang
func generateOwnershipMarkdown(pkiInfo *certgraphapi.PKIRegistryInfo) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicate method and provide link. I think once we generalize the annotation based requirements most people won't need this.

tls/README.md Show resolved Hide resolved
tls/README.md Outdated
## Registry purpose

The registry is used to collect all TLS artifacts used in OpenShift, certificate key pairs, and CA bundles alike.
For simplicity, this document will use "certificate" for both certificate, key, or CA bundle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll accept "tls artifact" if you like. But calling it a certificate is incorrect

tls/README.md Outdated

Recently we've added the `openshift.io/owning-component: foobar` annotation to (almost) all certificates
so that issues related to this certificate would be routed to a proper location. For instance,
problems with `service-ca` certificates are filed in Jira for "Networking / cluster-network-operator" component
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service-ca is owned by the service-ca component, not the networking component

tls/README.md Show resolved Hide resolved
tls/README.md Outdated

Along with the "collect tls artifacts" test the e2e test ensures that cluster certificates don't add
new certificates violating requirements via the "all registered tls artifacts must have no metadata violation regressions" test.
This test would collect the TLS registry from the cluster and generate new violation files from it. If this new file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This test would collect the TLS registry from the cluster and generate new violation files from it. If this new file
This test collects the TLS registry from the cluster and generate new violation files from it. If this new file

And similar through. Not future ("would"), but current

tls/README.md Outdated
from PRs that add new certificates without required metadata across the entire platform. Another
test would verify that metadata for actual certificates matches metadata for known certificate locations.

Once a baseline set of metadata is determined the tests would move from flaking to properly failing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to enforce ASAP. I think we have sufficient data to do so

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break a lot of optional operators (i.e. OpenShift GitOps or SRIOV operator), so we need a way of finding all jobs first

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break a lot of optional operators (i.e. OpenShift GitOps or SRIOV operator), so we need a way of finding all jobs first

do you have a link to such a job so we can see the result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was wrong - I noticed the test didn't pass on a cluster with pipelines+gitops operators installed, but probably its due to service serving CAs, which are accounted now.
However Hypershift jobs would certainly fail, but its relatively easy to fix.

Just to be safe we should switch flake to fail in a separate PR so that TRT would revert just the small part

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be safe we should switch flake to fail in a separate PR so that TRT would revert just the small part

Yes. I also identified 21c9d0d which I'm prepared to sacrifice for now to get the majority enforcing.

@vrutkovs vrutkovs force-pushed the tls-registry-readme branch 2 times, most recently from cadc31f to a26f447 Compare December 4, 2023 10:55
@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: a26f447

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node IncompleteTests
Tests for this run (24) are below the historical average (1651): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

tls/README.md Outdated

Recently we've added the `openshift.io/owning-component: foobar` annotation to (almost) all TLS artifacts
so that issues related to this TLS artifact would be routed to a proper location. For instance,
problems with `service-ca` TLS artifacts are filed in Jira for "Networking / cluster-network-operator" component
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
problems with `service-ca` TLS artifacts are filed in Jira for "Networking / cluster-network-operator" component
problems with `service-ca` TLS artifacts are filed in Jira for "service-ca" component

or perhaps you intended to point to the proxy-cas?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I meant proxy CAs

tls/README.md Show resolved Hide resolved
tls/README.md Outdated

## Certificate metadata

TLS artifact contents may however be insufficient - i.e. it's not clear which OpenShift is responsible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TLS artifact contents may however be insufficient - i.e. it's not clear which OpenShift is responsible
TLS artifact contents may however be insufficient - i.e. it's not clear which product component is responsible

tls/README.md Outdated
Comment on lines 203 to 220
type OwnerRequirement struct {
name string
}

func NewOwnerRequirement() tlsmetadatainterfaces.Requirement {
return OwnerRequirement{
name: "ownership",
}
}

// Return requirement name for filenames. Must be unique
func (o OwnerRequirement) GetName() string {
return o.name
}

// Boilerplate function which processes TLS registry and produces RequirementResult or throws an error
func (o OwnerRequirement) InspectRequirement(rawData []*certgraphapi.PKIList) (tlsmetadatainterfaces.RequirementResult, error) {
pkiInfo, err := tlsmetadatainterfaces.ProcessByLocation(rawData)
if err != nil {
return nil, fmt.Errorf("transforming raw data %v: %w", o.GetName(), err)
}

ownershipJSONBytes, err := json.MarshalIndent(pkiInfo, "", " ")
if err != nil {
return nil, fmt.Errorf("failure marshalling %v.json: %w", o.GetName(), err)
}
markdown, err := generateOwnershipMarkdown(pkiInfo)
if err != nil {
return nil, fmt.Errorf("failure marshalling %v.md: %w", o.GetName(), err)
}
violations := generateViolationJSON(pkiInfo)
violationJSONBytes, err := json.MarshalIndent(violations, "", " ")
if err != nil {
return nil, fmt.Errorf("failure marshalling %v-violations.json: %w", o.GetName(), err)
}

return tlsmetadata.NewRequirementResult(
o.GetName(),
ownershipJSONBytes,
markdown,
violationJSONBytes)
}
```

This interface calls two internal functions:
```golang
func generateViolationJSON(pkiInfo *certgraphapi.PKIRegistryInfo) *certgraphapi.PKIRegistryInfo {
ret := &certgraphapi.PKIRegistryInfo{}

// Iterate over certificate-key pairs (==secrets)
for i := range pkiInfo.CertKeyPairs {
curr := pkiInfo.CertKeyPairs[i]
owner := curr.CertKeyInfo.OwningJiraComponent
// Include in the violations list if has no owner set or owner set to Unknown
if len(owner) == 0 || owner == tlsmetadata.UnknownOwner {
ret.CertKeyPairs = append(ret.CertKeyPairs, curr)
}
}

// Iterate over CA bundles (==configmaps)
for i := range pkiInfo.CertificateAuthorityBundles {
curr := pkiInfo.CertificateAuthorityBundles[i]
owner := curr.CABundleInfo.OwningJiraComponent
// Include in the violations list if has no owner set or owner set to Unknown
if len(owner) == 0 || owner == tlsmetadata.UnknownOwner {
ret.CertificateAuthorityBundles = append(ret.CertificateAuthorityBundles, curr)
}
}

return ret
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type OwnerRequirement struct {
name string
}
func NewOwnerRequirement() tlsmetadatainterfaces.Requirement {
return OwnerRequirement{
name: "ownership",
}
}
// Return requirement name for filenames. Must be unique
func (o OwnerRequirement) GetName() string {
return o.name
}
// Boilerplate function which processes TLS registry and produces RequirementResult or throws an error
func (o OwnerRequirement) InspectRequirement(rawData []*certgraphapi.PKIList) (tlsmetadatainterfaces.RequirementResult, error) {
pkiInfo, err := tlsmetadatainterfaces.ProcessByLocation(rawData)
if err != nil {
return nil, fmt.Errorf("transforming raw data %v: %w", o.GetName(), err)
}
ownershipJSONBytes, err := json.MarshalIndent(pkiInfo, "", " ")
if err != nil {
return nil, fmt.Errorf("failure marshalling %v.json: %w", o.GetName(), err)
}
markdown, err := generateOwnershipMarkdown(pkiInfo)
if err != nil {
return nil, fmt.Errorf("failure marshalling %v.md: %w", o.GetName(), err)
}
violations := generateViolationJSON(pkiInfo)
violationJSONBytes, err := json.MarshalIndent(violations, "", " ")
if err != nil {
return nil, fmt.Errorf("failure marshalling %v-violations.json: %w", o.GetName(), err)
}
return tlsmetadata.NewRequirementResult(
o.GetName(),
ownershipJSONBytes,
markdown,
violationJSONBytes)
}
```
This interface calls two internal functions:
```golang
func generateViolationJSON(pkiInfo *certgraphapi.PKIRegistryInfo) *certgraphapi.PKIRegistryInfo {
ret := &certgraphapi.PKIRegistryInfo{}
// Iterate over certificate-key pairs (==secrets)
for i := range pkiInfo.CertKeyPairs {
curr := pkiInfo.CertKeyPairs[i]
owner := curr.CertKeyInfo.OwningJiraComponent
// Include in the violations list if has no owner set or owner set to Unknown
if len(owner) == 0 || owner == tlsmetadata.UnknownOwner {
ret.CertKeyPairs = append(ret.CertKeyPairs, curr)
}
}
// Iterate over CA bundles (==configmaps)
for i := range pkiInfo.CertificateAuthorityBundles {
curr := pkiInfo.CertificateAuthorityBundles[i]
owner := curr.CABundleInfo.OwningJiraComponent
// Include in the violations list if has no owner set or owner set to Unknown
if len(owner) == 0 || owner == tlsmetadata.UnknownOwner {
ret.CertificateAuthorityBundles = append(ret.CertificateAuthorityBundles, curr)
}
}
return ret
}
const annotationName string = "certificates.openshift.io/supports-offline-hostname-change"
type SupportsOfflineHostnameChange struct{}
func NewSupportsOfflineHostnameChange() tlsmetadatainterfaces.Requirement {
md := tlsmetadatainterfaces.NewMarkdown("")
md.Text("Offline hostname change is an SNO feature driven using tool (provide link here) while a cluster is not running.")
return tlsmetadatainterfaces.NewAnnotationRequirement(
// requirement name
"offline-hostname-change",
// cert or configmap annotation
annotationName,
"Supports Offline Hostname Change",
string(md.ExactBytes()),
)
}

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: 33164f8

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-openstack-ovn High
[Jira:"Image Registry"] monitor test image-registry-availability setup
This test has passed 100.00% of 3 runs on release 4.16 [amd64 ha openstack ovn] in the last week.

@deads2k
Copy link
Contributor

deads2k commented Jan 4, 2024

a few typos to tidy up, but we can do that as a followup

/lgtm

@deads2k
Copy link
Contributor

deads2k commented Jan 4, 2024

/retest-required

@deads2k deads2k added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 4, 2024
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 4, 2024
Copy link
Contributor

openshift-ci bot commented Jan 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 4, 2024
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD b4b9ccc and 2 for PR HEAD c17a17c in total

@openshift-trt-bot
Copy link

Job Failure Risk Analysis for sha: c17a17c

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial Low
[sig-arch] events should not repeat pathologically for ns/openshift-etcd-operator
This test has passed 28.07% of 57 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node-serial'] in the last 14 days.

@vrutkovs
Copy link
Member Author

vrutkovs commented Jan 5, 2024

/retest-required

Copy link
Contributor

openshift-ci bot commented Jan 5, 2024

@vrutkovs: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-single-node c17a17c link false /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-cgroupsv2 c17a17c link false /test e2e-aws-ovn-cgroupsv2
ci/prow/e2e-openstack-ovn c17a17c link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node-serial c17a17c link false /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-agnostic-ovn-cmd c17a17c link false /test e2e-agnostic-ovn-cmd
ci/prow/e2e-aws-ovn-single-node-upgrade c17a17c link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-gcp-ovn-rt-upgrade c17a17c link false /test e2e-gcp-ovn-rt-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 4d2a9d3 into openshift:master Jan 5, 2024
15 of 22 checks passed
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-enterprise-tests-container-v4.16.0-202401051232.p0.g4d2a9d3.assembly.stream for distgit openshift-enterprise-tests.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants