Insights Operator pulling and exposing data from the OCM API #683

Merged
merged 19 commits into from Aug 18, 2021
139 changes: 139 additions & 0 deletions enhancements/insights/pulling-data-from-ocm.md
@@ -0,0 +1,139 @@
---
title: pulling-and-exposing-data-from-ocm
authors:
- "@tremes"
reviewers:
- "@sbose78"
- "@inecas"
- "@petli-openshift"
- "@smarterclayton"
approvers:
- "@sbose78"
- "@smarterclayton"
creation-date: 2021-03-04
last-updated: 2021-03-09
status: implementable
see-also:
replaces:
superseded-by:
---

# Insights Operator pulling and exposing data from the OCM API

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

This enhancement enables the Insights Operator to pull data (SCA certs)
from the OCM (OpenShift Cluster Manager) API. The Insights Operator exposes the data
in the OpenShift API so that users can use it when consuming and building container images
on the platform.

## Motivation

In OpenShift 3.x, users could consume RHEL content and container images using the RHEL subscription.
In OpenShift 4, this is no longer possible because Red Hat Enterprise Linux CoreOS (RHCOS) does not
provide any attached subscription. This enhancement provides users with the Simple Content Access (SCA) certs
from Red Hat Subscription Manager (RHSM).
The Insights Operator is now the only OCP component that connects an OpenShift cluster to a Red Hat subscription experience (console.redhat.com APIs). The consumers of the SCA certs are not only builds, but also shared resources, such as the CSI driver.

### Goals

- Extend the Insights Operator config with an OCM API URL to be able to query the data
- Periodically pull the data from the OCM API and expose it in the OpenShift API
- Make this an opt-in feature that a cluster user enables; it might be moved to a different OCP component in the future

### Non-Goals

- Insights Operator providing any transformation or post-processing of the SCA certs pulled
from the OCM API

## Proposal

### User Stories

#### Consume SCA certs exposed in the API

As an OpenShift user,
I want to use the SCA certs so that I can consume RHEL content and build the
corresponding container images.

### Risks and Mitigations

#### OCM API is down

Risk: The OCM API is down or doesn't provide up-to-date data.

Risk: The Insights Operator is unable to expose or update the data in the OpenShift API.

Mitigation: Introduce a new state in the Insights Operator (e.g. "SCADataDegraded") and
create a new alert based on this new state.
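
The mitigation above could be sketched roughly as follows. This is only an illustration of tracking a degraded state that an alert could be built on, not the operator's actual implementation (which is written in Go). The class name `SCAStatus`, the reason string `AsExpected`, and the condition layout (`type`, `status`, `reason`, `message`) are assumptions made for this example; only the state name `SCADataDegraded` comes from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SCAStatus:
    """Hypothetical tracker: did the last attempt to pull or expose SCA data fail?"""

    degraded: bool = False
    reason: Optional[str] = None
    message: Optional[str] = None

    def record_failure(self, reason: str, message: str) -> None:
        # Any failure (OCM API down, secret update failed) marks the state degraded.
        self.degraded = True
        self.reason = reason
        self.message = message

    def record_success(self) -> None:
        # A successful pull and update clears the degraded state.
        self.degraded = False
        self.reason = None
        self.message = None

    def as_condition(self) -> dict:
        # Assumed condition layout; an alert would fire while status is "True".
        return {
            "type": "SCADataDegraded",
            "status": "True" if self.degraded else "False",
            "reason": self.reason or "AsExpected",
            "message": self.message or "",
        }
```

In this sketch a single failure is enough to report the degraded state; whether the real operator should tolerate a few failed attempts first is a design choice the proposal leaves open.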

## Design Details

### Authorization

The Insights Operator is able to pull the data from the OCM API using the existing `cloud.openshift.com` token
available in the `pull-secret` (in the `openshift-config-managed` namespace).
Member

who put that secret there? it is managed, so some operator has to sync the secret there :-)

Contributor

A good question - is this something that the installer does, or does something like the CVO sync the pull secret in openshift-config to openshift-config-managed?

Contributor

looking at the code, it looks like it's just using the pull-secret from openshift-config directly, not openshift-config-managed.

so the expectation is that the secret is maintained by someone (if it expires, your nodes will have issues pulling images), so i don't think it's the responsibility of this EP to worry about it, other than that this EP should address what happens if it can't talk to OCM because the token expires (or for any other reason), which i think the EP does already.

see: https://github.com/tremes/insights-operator/blob/154641a9fae7c5100f7dce3cf0eddf4b38d56cea/pkg/config/configobserver/configobserver.go#L78

Contributor Author

I guess the question was why the new etc-pki-entitlement secret is created in the openshift-config-managed namespace (rather than openshift-config ?).

This requires a new Role & RoleBinding definition to be able to create the secret in the openshift-config-managed namespace. Is that OK/acceptable, compared to extending our existing role to be able to create the secret in the openshift-config namespace?

Contributor

> I guess the question was why the new etc-pki-entitlement secret is created in the openshift-config-managed namespace (rather than openshift-config ?).

Configuration in openshift-config is usually added manually by admins. Given that etc-pki-entitlement is managed by an OpenShift component, Adam and I suggested it be managed in the openshift-config-managed namespace.

@mfojtik Hi, are you Ok if we resolve this conversation based on the comments above?


The Insights Operator must provide a cluster ID as an identifier of the cluster.
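
As a rough illustration of the authorization step described above, the sketch below extracts the `cloud.openshift.com` token from the decoded `.dockerconfigjson` content of a pull secret and builds request headers that carry the cluster ID. The function names are hypothetical, and the header layout (a bearer token plus the cluster ID in the `User-Agent`) is an assumption made for this example, not something this proposal specifies.

```python
import json


def extract_token(dockerconfigjson: str, registry: str = "cloud.openshift.com") -> str:
    """Return the `auth` value stored for `registry` in a pull secret.

    `dockerconfigjson` is the already base64-decoded `.dockerconfigjson`
    payload, i.e. a JSON document of the form {"auths": {"<registry>": {...}}}.
    """
    config = json.loads(dockerconfigjson)
    try:
        token = config["auths"][registry]["auth"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"no auth entry for {registry!r} in pull secret") from exc
    if not token:
        raise ValueError(f"empty auth entry for {registry!r} in pull secret")
    return token


def build_request_headers(token: str, cluster_id: str) -> dict:
    """Build illustrative HTTP headers for a request to the OCM API.

    Assumption: the token is sent as a bearer token and the cluster ID is
    carried in the User-Agent. The real header layout may differ.
    """
    return {
        "Authorization": f"Bearer {token}",
        "User-Agent": f"insights-operator cluster/{cluster_id}",
    }
```

Raising `ValueError` when the entry is missing or empty lets a caller treat an absent token like any other failure to reach the OCM API.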

### Data in API

The SCA certificate is available via the `etc-pki-entitlement` secret in the `openshift-config-managed` namespace.

### Update period
- The Insights Operator queries the OCM API every 8 hours and downloads the full data provided
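
The update period could be sketched as a simple loop, shown below. This is a minimal illustration under stated assumptions: `pull` and `publish` are hypothetical callables standing in for "query the OCM API" and "update the secret in the OpenShift API", and `sleep` and `max_cycles` exist only so the loop can be exercised without waiting. Only the 8-hour interval comes from the proposal.

```python
import time
from typing import Any, Callable, Optional

PULL_INTERVAL_SECONDS = 8 * 60 * 60  # 8 hours, as stated in the proposal


def run_periodic_pull(
    pull: Callable[[], Any],
    publish: Callable[[Any], None],
    interval: float = PULL_INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    max_cycles: Optional[int] = None,
) -> int:
    """Pull the full data set every `interval` seconds and publish it.

    Returns the number of successful cycles. With max_cycles=None the loop
    runs forever; a finite value is only intended for testing.
    """
    cycles = 0
    successes = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            publish(pull())  # download the full data, then expose it
            successes += 1
        except Exception as exc:
            # A failed cycle is not retried early; the next interval tries again.
            print(f"pulling SCA data failed: {exc}")
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(interval)
    return successes
```

Each cycle replaces the published data with the full payload rather than applying a diff, which matches the "downloads the full data provided" wording.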

### Test Plan

- The `insights-operator-e2e-tests` suite can verify that the SCA cert data is available
- Basic test of the validity of the SCA certs: mount the `etc-pki-entitlement` secret and run e.g. `yum install` in the container
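
A first availability check, before the `yum install` step, might look like the sketch below. It takes a Secret object as a plain dictionary (for example, parsed from `oc get secret ... -o json`) and reports obvious problems. The key names `entitlement.pem` and `entitlement-key.pem` are an assumption about the secret's layout, and the check only looks for a PEM header; it does not validate the certificate itself.

```python
import base64
from typing import List

# Assumed key names inside the etc-pki-entitlement secret.
EXPECTED_KEYS = ("entitlement.pem", "entitlement-key.pem")


def check_entitlement_secret(secret: dict) -> List[str]:
    """Return a list of problems found in the secret; empty means it looks usable."""
    problems = []
    data = secret.get("data") or {}
    for key in EXPECTED_KEYS:
        encoded = data.get(key)
        if not encoded:
            problems.append(f"missing or empty key: {key}")
            continue
        try:
            # Secret values are base64-encoded in the Kubernetes API.
            decoded = base64.b64decode(encoded, validate=True).decode("utf-8", errors="replace")
        except ValueError:
            problems.append(f"value of {key} is not valid base64")
            continue
        if "-----BEGIN " not in decoded:
            problems.append(f"value of {key} does not look like PEM data")
    return problems
```

A check like this only shows that data is present and well-formed; running `yum install` with the secret mounted remains the real validity test.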

### Graduation Criteria

This feature is planned as a technical preview in OCP 4.9 and is planned to go GA in 4.10.

#### Dev Preview -> Tech Preview
- opt-in feature (called `InsightsOperatorPullingSCA`) enabled with `TechPreviewNoUpgrade` feature set
- Insights Operator is able to download the data from OCM API and expose it in a cluster API
- basic functionality is tested
- this new functionality is documented

#### Tech Preview -> GA
- ability to distinguish various error states, e.g. the organization doesn't have SCA enabled versus the API returns an error
- inform a cluster user about the error state (a problem with pulling the certificates)
Contributor

I'd add admission control for CSI volumes as a GA graduation criteria

Contributor

> I'd add admission control for CSI volumes as a GA graduation criteria

why?

Consumption of the content seems outside the scope of this EP. The GA version of this EP is a component that can pull the cert down and put it in a well-defined location that we are happy with, reliably+securely, that is debuggable, provides useful status info, etc.

Can we resolve this point @bparees @adambkaplan ?

- the feature might be moved to a different OCP component

#### Removing a deprecated feature

Periodic data pulling can be easily disabled in the cluster configuration. Removing this feature will require updating the Insights Operator code base and will remove the `etc-pki-entitlement` secret from the `openshift-config-managed` namespace.

### Upgrade / Downgrade Strategy

There is no upgrade/downgrade strategy needed.

### Version Skew Strategy

No version skew strategy is needed. This work should have no impact on upgrades. It doesn't require any coordinated behavior in the control plane. No other components will change.

## Implementation History

There are no major milestones in the implementation history other than the graduation criteria mentioned above.

## Drawbacks

There is no significant drawback.

## Alternatives

- Implement this functionality in another control plane component/operator (e.g. openshift-controller-manager).
- Keep the current state, which is the manual addition of the SCA certs to cluster worker nodes. This is not very convenient because the SCA certs change regularly and each change requires a node reboot.