Insights Operator pulling and exposing data from the OCM API #683

Merged
merged 19 commits into from Aug 18, 2021
Changes from 13 commits
156 changes: 156 additions & 0 deletions enhancements/insights/pulling-sca-certs-from-ocm.md
@@ -0,0 +1,156 @@
---
title: pulling-and-exposing-sca-certs-from-ocm
authors:
- "@tremes"
reviewers:
- "@sbose78"
- "@inecas"
- "@petli-openshift"
- "@bparees"
- "@dhellman"
- "@mfojtik"
- "@adambkaplan"
approvers:
- "@sbose78"
- "@bparees"
- "@dhellman"
creation-date: 2021-03-04
last-updated: 2021-08-10
status: implementable
see-also:
replaces:
superseded-by:
---

# Insights Operator pulling and exposing SCA certs from the OCM API

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [x] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

This enhancement enables the Insights Operator to pull Simple Content Access (SCA) certs
from the OpenShift Cluster Manager (OCM) API. The Insights Operator exposes the certs
in the OpenShift API so that users can use them when consuming and building container images
on the platform.

## Motivation

In OpenShift 3.x, users could consume RHEL content and container images using the RHEL subscription.
In OpenShift 4, this is no longer possible because Red Hat Enterprise Linux CoreOS (RHCOS) does not
provide any attached subscription. This enhancement provides users with the Simple Content Access (SCA) certs
from Red Hat Subscription Manager (RHSM).

### Goals

- Extend the Insights Operator config with an OCM API URL to be able to query the data
- Periodically pull the data from the OCM API and expose it in the OpenShift API
- The feature is opt-in for a cluster user and might be moved to a different OCP component in the future

### Non-Goals

- Insights Operator providing any transformation or post-processing of the SCA certs pulled
from the OCM API

## Proposal

### Why is it in the Insights Operator?

The Insights Operator is currently the only OCP component that connects an OpenShift cluster to the Red Hat subscription experience (console.redhat.com APIs). The consumers of the SCA certs are not only builds, but also shared resources, such as the CSI driver.

### User Stories

#### Consume SCA certs exposed in the API

As an OpenShift user
I want to consume SCA certs to be able to consume RHEL content and to build
corresponding container images.

### Risks and Mitigations

#### OCM API is down

Risk: The OCM API is down or doesn't provide up-to-date data.

Risk: The Insights Operator is unable to expose or update the data in the OpenShift API.

Mitigation: Introduce a new state in the Insights Operator (e.g. "SCADataDegraded") and
create a new alert based on this new state.
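
As a rough illustration of the mitigation, the sketch below builds a Kubernetes-style status condition for the proposed "SCADataDegraded" state. Only the condition type comes from this proposal; the function name, the `reason` values and the message text are hypothetical, and the real operator may report the state differently.

```python
from typing import Optional


def sca_degraded_condition(pull_error: Optional[str], timestamp: str) -> dict:
    """Build a status condition describing the last SCA pull.

    pull_error is None when the last pull and secret update succeeded,
    otherwise a short human-readable description of the failure.
    The "reason" values below are placeholders, not names from the proposal.
    """
    degraded = pull_error is not None
    return {
        "type": "SCADataDegraded",
        "status": "True" if degraded else "False",
        "reason": "PullFailed" if degraded else "AsExpected",
        "message": pull_error if degraded else "SCA certs are up to date",
        "lastTransitionTime": timestamp,
    }
```

An alert could then fire whenever a condition of this type has status `"True"` for longer than some grace period.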

## Design Details

### Authorization

The Insights Operator is able to pull the data from the OCM API using the existing `cloud.openshift.com` token
available in the `pull-secret` (in the `openshift-config-managed` namespace).

The Insights Operator must provide a cluster ID as an identifier of the cluster.
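
A minimal sketch of the authorization step, assuming the pull secret is a standard `.dockerconfigjson` payload with an `auths` map keyed by registry host. The function names are hypothetical. Sending the token as a bearer token and the cluster ID in the `User-Agent` header is also an assumption, since this proposal does not specify how either is transmitted.

```python
import json


def ocm_token_from_pull_secret(dockerconfigjson: str,
                               registry: str = "cloud.openshift.com") -> str:
    """Return the token stored for `registry` in a .dockerconfigjson payload."""
    config = json.loads(dockerconfigjson)
    try:
        return config["auths"][registry]["auth"]
    except KeyError as err:
        raise KeyError(f"no token for {registry} in the pull secret") from err


def ocm_request_headers(token: str, cluster_id: str) -> dict:
    """Build request headers for an OCM API call.

    Assumption: the token is sent as a bearer token and the cluster ID is
    carried in the User-Agent header; the proposal only says that a cluster
    ID must be provided.
    """
    return {
        "Authorization": f"Bearer {token}",
        "User-Agent": f"insights-operator cluster/{cluster_id}",
    }
```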

### SCA certs in API

The SCA certificate is available via the `etc-pki-entitlement` secret in the `openshift-config-managed` namespace. The secret can be made available in other namespaces by creating a cluster-scoped `Share` resource. A cluster admin creates a `ClusterRoleBinding` to allow a service account to access the `Share` resource.
Member

Not covered in this sentence is the structure of the Secret. Will a particular well-known key be used? If so, can we document it here, or is that a low-enough level of detail that it's not worth including in the enhancement proposal?

Contributor

IIRC the key name can be any .pem file. On a RHEL system the subscription cert key names have what appear to be a uuid for the file name.

Contributor Author

Maybe we should mention it here. I am using kubernetes.io/tls secret with tls.crt for the certificate and tls.key for the private key.
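
To make the secret structure discussed in this thread concrete, here is a minimal sketch of the manifest, assuming the `kubernetes.io/tls` layout with `tls.crt` and `tls.key` mentioned above. The helper name is hypothetical; the secret name and namespace come from the proposal text.

```python
import base64


def build_entitlement_secret(cert_pem: bytes, key_pem: bytes) -> dict:
    """Return a Secret manifest holding the SCA certificate and private key.

    Secret `data` values must be base64-encoded strings.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": "etc-pki-entitlement",
            "namespace": "openshift-config-managed",
        },
        "type": "kubernetes.io/tls",
        "data": {
            "tls.crt": base64.b64encode(cert_pem).decode("ascii"),
            "tls.key": base64.b64encode(key_pem).decode("ascii"),
        },
    }
```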


### Use of the SCA certs

- The SCA certificate can be mounted to a `Pod` as a CSI volume (where the volume attributes will reference the `Share` resource making the secret accessible)
- The SCA certificate can be mounted to a `Build` strategy as a CSI volume. The CSI driver is described in the [Share Secrets And ConfigMaps Across Namespaces via a CSI Driver](/enhancements/cluster-scope-secret-volumes/csi-driver-host-injections.md) enhancement.
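
The Pod case can be sketched as a manifest with a CSI volume. The `csi.driver`, `csi.readOnly` and `csi.volumeAttributes` fields are standard Pod volume fields, but the driver name and the attribute key that references the `Share` are not stated in this proposal, so the hypothetical helper below takes them as parameters. The default mount path is an assumption.

```python
def pod_with_shared_entitlement(name: str, image: str, csi_driver: str,
                                volume_attributes: dict,
                                mount_path: str = "/etc/pki/entitlement") -> dict:
    """Return a Pod manifest that mounts a shared secret through a CSI volume.

    csi_driver and volume_attributes must come from the CSI driver's own
    documentation; nothing here hard-codes them.
    """
    volume_name = "entitlement"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "volumeMounts": [{
                    "name": volume_name,
                    "mountPath": mount_path,
                    "readOnly": True,
                }],
            }],
            "volumes": [{
                "name": volume_name,
                "csi": {
                    "driver": csi_driver,
                    "readOnly": True,
                    # Copy so later changes to the caller's dict don't leak in.
                    "volumeAttributes": dict(volume_attributes),
                },
            }],
        },
    }
```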

### Update period
- The Insights Operator queries the OCM API every 8 hours and downloads the full data provided
- The time period is configurable and can be changed by the cluster admin. The cluster admin can temporarily set a shorter time period to try to refresh the SCA certs
- The documentation will describe how to pull the SCA certs and update the secret manually
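
The refresh decision described in this section can be sketched as a small helper. The 8-hour default comes from the text above; the function and constant names are hypothetical, and a real implementation would more likely use a timer loop than an explicit check like this.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_PULL_INTERVAL = timedelta(hours=8)


def refresh_due(last_pull: Optional[datetime], now: datetime,
                interval: timedelta = DEFAULT_PULL_INTERVAL) -> bool:
    """Return True when the SCA certs should be pulled again.

    last_pull is None if no pull has succeeded yet.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return last_pull is None or now - last_pull >= interval
```

Passing a shorter `interval` models the case where the cluster admin temporarily lowers the period to force an earlier refresh.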

### Test Plan

- The `insights-operator-e2e-tests` suite can verify that the SCA cert data
is available
- Basic test of the validity of the SCA certs: mount the `etc-pki-entitlement` secret and run e.g. `yum install` in the container
Member

Since most consumers will presumably be mounting the Share, maybe this integration test should use that approach instead of shortcutting to use the Secret directly?

Contributor

this was discussed in a since-resolved comment thread somewhere, but.... i don't want the insights operator team's testing to be dependent on Share behavior.

they have the ability to test their functionality end to end directly, so they should do that.

The team that owns the Shared-Resource driver+builds should have tests that ensure they can consume this content successfully at their end.

Contributor

I think the bits involving the Share resource should ultimately live in the OpenShift build suite. Test plan as follows:

  1. insights operator obtains SCA cert from OCM
  2. insights operator creates Share resource and ClusterRoleBinding
  3. ocp build suite creates a build that does a yum install of subscription-only content


### Graduation Criteria

This feature is planned as a technical preview in OCP 4.9 and is planned to go GA in 4.10.
Contributor

Note that the Share bits are now planned as tech preview for OCP 4.10

Contributor Author

Does it mean that it will block the graduation criteria from the TP to GA mentioned here? If so then we would need to go GA with your bits...I guess


#### Dev Preview -> Tech Preview
- opt-in feature (called `InsightsOperatorPullingSCA`) enabled with the `TechPreviewNoUpgrade` feature set
- Insights Operator is able to download the data from OCM API and expose it in a cluster API
- basic functionality is tested
- this new functionality is documented

#### Tech Preview -> GA
- ability to distinguish various error states, e.g. the organization doesn't have SCA allowed versus the API returns an error
- inform a cluster user about the error state (problem with pulling the certificates)
- the feature might be moved to a different OCP component
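
One way to start distinguishing error states is to classify the outcome of a pull by HTTP status class, as sketched below. The names are hypothetical and the mapping relies only on generic HTTP semantics. This proposal does not say which response the OCM API returns when an organization doesn't have SCA allowed, so a real implementation would need to inspect the response further within the "rejected" case.

```python
from enum import Enum
from typing import Optional


class PullOutcome(Enum):
    OK = "ok"
    REJECTED = "rejected"    # 4xx: the API refused this particular request
    API_ERROR = "api_error"  # 5xx, no response, or anything unexpected


def classify_pull(status_code: Optional[int]) -> PullOutcome:
    """Map an HTTP status code (None if the request never completed) to an outcome."""
    if status_code is None:
        return PullOutcome.API_ERROR
    if 200 <= status_code < 300:
        return PullOutcome.OK
    if 400 <= status_code < 500:
        return PullOutcome.REJECTED
    return PullOutcome.API_ERROR
```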

#### Removing a deprecated feature

Periodic data pulling can be easily disabled in the cluster configuration. Removing this feature will require updating the Insights Operator code base and will remove the `etc-pki-entitlement` secret from the `openshift-config-managed` namespace.

### Upgrade / Downgrade Strategy

There is no upgrade/downgrade strategy needed.
Member

If a cluster is downgraded to a version that does not poll for entitlement updates, will that version of the insights operator (4.8.z?) have logic around to remove the etc-pki-entitlement secret and other cruft to keep in-cluster components from trying to consume stale data? If the insights operator is downgraded before some consumer (builds and/or CSI drivers?), will the higher-version consumers gracefully handle the consumed secret's removal?

Contributor

> If a cluster is downgraded to a version that does not poll for entitlement updates, will that version of the insights operator (4.8.z?)

for tech preview it's not applicable since upgrade/downgrade isn't allowed, but in general if you downgrade below the level at which the insights operator has this behavior then yes, i'd expect the content to become stale (i'm not sure how long the tokens are good for)

i'm not sure how you'd propose to solve that on downgrade, though. the older version of the operator wouldn't even know about the content to remove it(even if we could agree that was the right thing to do, which i don't think i do)

Contributor Author

Yes this sounds like an edge case to me and yes there will be a stale secret in such case.


### Version Skew Strategy

No version skew strategy is needed. This work should have no impact on upgrades. It doesn't require any coordinated behavior in the control plane, and no other components will change.

The format of the SCA certs is not checked by the Insights Operator.

## Implementation History

There are no major milestones in the implementation history other than the graduation criteria mentioned above.

## Drawbacks

The performance of the OCM API is a possible drawback, because the Insights Operator depends on it to pull and refresh the SCA certs.

## Alternatives

- One alternative is to implement this functionality in another control plane component/operator (e.g. openshift-controller-manager).
- Another option is to create a new component/operator for this functionality. This would probably require the most effort and would require additional CPU and memory resources in a cluster.
- The last option is to keep the current state, which is the manual addition of the SCA certs to cluster worker nodes. This is not very convenient because the SCA certs change regularly and each change requires a node reboot.