Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Encryption Config #516

Closed
wants to merge 97 commits into from
Closed

[WIP] Encryption Config #516

wants to merge 97 commits into from

Conversation

enj
Copy link
Contributor

@enj enj commented Jul 3, 2019

  1. Observer for a single secret with the complete config
  2. Wiring of secret to revision control and syncing from managed namespace
  3. Empty shell of encryption controller - I deleted the logic I was considering

Signed-off-by: Monis Khan mkhan@redhat.com

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 3, 2019
@@ -61,10 +64,11 @@ func NewConfigObserver(
SchedulerLister: configInformer.Config().V1().Schedulers().Lister(),

ConfigmapLister: kubeInformersForNamespaces.ConfigMapLister(),
SecretLister_: kubeInformersForNamespaces.SecretLister(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

underscore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matches others in the same file.

return err
}

switch originalSpec.ManagementState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have library-go helper to deal with this

}

func NewEncryptionObserver(targetNamespace string, encryptionConfigPath []string) configobserver.ObserveConfigFunc {
return func(genericListers configobserver.Listers, recorder events.Recorder, existingConfig map[string]interface{}) (map[string]interface{}, []error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange to return a func this big

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think this is because we try to be generic. Also what I personally don't like about the definition is that we lose type safety and it's really hard to read what the function does. We accept map[string]interface{} and return map[string]interface{} :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange to return a func this big

I did not want to create a new struct just to close over the inputs.

encryptionConfigSecret, err := listers.SecretLister().Secrets(targetNamespace).Get(encryptionConfSecret)
if errors.IsNotFound(err) {
recorder.Warningf("ObserveEncryptionConfigNotFound", "encryption config secret %s/%s not found", targetNamespace, encryptionConfSecret)
// TODO what is the best thing to do here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what produce this secret?

resourcesynccontroller.ResourceLocation{Namespace: targetNamespace, Name: encryptionConfSecret},
resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: sourceName},
); err != nil {
panic(err) // coding error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't panic, propagate the error up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 9, 2019
return previouslyObservedConfig, append(errs, err)
}

if !equality.Semantic.DeepEqual(existingEncryptionConfig, []string{encryptionConfFilePath}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could compare existingEncryptionConfig field before SetNestedStringSlice (line 58).

what if existingEncryptionConfig hasn't changed ? Should we then simply return previouslyObservedConfig ? For example:

if equality.Semantic.DeepEqual(existingEncryptionConfig, []string{encryptionConfFilePath}) {
   return previouslyObservedConfig, errs
}
recorder.Eventf("ObserveEncryptionConfigChanged", "encryption config file changed from %s to %s", existingEncryptionConfig, encryptionConfFilePath)
err := unstructured.SetNestedStringSlice(observedConfig, []string{encryptionConfFilePath}
...

errs = append(errs, err)
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is okay to continue even if errs != nil ? Does it mean that previouslyObservedConfig could be incomplete? (some fields could be missing?)

// TODO what is the best thing to do here?
// for now we do not unset the config as we are checking a synced version of the secret that could be deleted
// return observedConfig, errs
return previouslyObservedConfig, errs // do not append the not found error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can fail because errs could actually have some errors (line 35)

}

if !equality.Semantic.DeepEqual(existingEncryptionConfig, []string{encryptionConfFilePath}) {
recorder.Eventf("ObserveEncryptionConfigChanged", "encryption config file changed from %s to %s", existingEncryptionConfig, encryptionConfFilePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we make the msg a bit more accurate and assume that existingEncryptionConfig holds only one element? Then the msg would look like changed from x to y instead of changed from [x] to y

}

func NewEncryptionObserver(targetNamespace string, encryptionConfigPath []string) configobserver.ObserveConfigFunc {
return func(genericListers configobserver.Listers, recorder events.Recorder, existingConfig map[string]interface{}) (map[string]interface{}, []error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think this is because we try to be generic. Also what I personally don't like about the definition is that we lose type safety and it's really hard to read what the function does. We accept map[string]interface{} and return map[string]interface{} :(

Name: fmt.Sprintf("encryption-%s-%s-%s-%d", c.componentName, gr.Group, gr.Resource, keyID),
Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace,
Labels: map[string]string{
encryptionSecretComponent: c.componentName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't encryptionSecretComponent label point to targetNamespace?


func newAES256Key() []byte {
b := make([]byte, 32) // AES-256 == 32 byte key
if _, err := rand.Read(b); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd handle the error, perhaps there are actually cases when this function might fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will never fail on linux, and the OAuth server has matched this behavior since the beginning of OpenShift.

c.componentSelector = labelSelectorOrDie(encryptionSecretComponent + "=" + targetNamespace)

operatorClient.Informer().AddEventHandler(c.eventHandler())
kubeInformersForNamespaces.InformersFor(operatorclient.GlobalMachineSpecifiedConfigNamespace).Core().V1().Secrets().Informer().AddEventHandler(c.eventHandler())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the sync func be triggered if there is no secrets in GlobalMachineSpecifiedConfigNamespace namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operator client informer will always trigger this.

Copy link
Contributor

@p0lyn0mial p0lyn0mial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although I truly enjoyed reading your code it would be nice to add documentation and some unit tests at some point.

I'm not sure if I fully understand how the code works but hopefully, the following will help the next reviewers.

NewEncryptionKeyController essentially generates encryption keys for validGRs based on needsNewKey condition and places them in GlobalMachineSpecifiedConfigNamespace ns.

NewEncryptionStateController takes all those keys and generates a secret (encryption-config-kube-apiserver) with an EncryptionConfiguration in GlobalMachineSpecifiedConfigNamespace

SyncEncryptionConfig propagates encryption-config-kube-apiserver secret to a secret in targetNamespace ns.

NewEncryptionPodStateController waits for encryption-config-kube-apiserver in targetNamespace ns and propagates state back to the secrets created by NewEncryptionKeyController by adding some labels.

NewEncryptionMigrationController waits for encryption-config-kube-apiserver in targetNamespace ns and takes all the keys/secrets in GlobalMachineSpecifiedConfigNamespace finds the writeKey and performs encryption by modifing desired resources.

NewEncryptionPruneController removes the last 10 secrets with encryptionSecretMigratedTimestamp label attached.

}

func (c *EncryptionKeyController) sync() error {
if ready, err := shouldRunEncryptionController(c.operatorClient); err != nil || !ready {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this controller and the others are not guarded by a TechPreviewNoUpgrade feature flag?

func (c *EncryptionKeyController) generateKeySecret(gr schema.GroupResource, keyID uint64) *corev1.Secret {
return &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("encryption-%s-%s-%s-%d", c.componentName, gr.Group, gr.Resource, keyID),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what I have seen the keyID is monotonic and I'm not sure if I understand how does it relate to
https://github.com/openshift/cluster-kube-apiserver-operator/pull/516/files#diff-9b3719f78d8f89fa6b78d0f3040a3862R159


preRunCachesSynced: []cache.InformerSynced{
operatorClient.Informer().HasSynced,
kubeInformersForNamespaces.InformersFor(operatorclient.GlobalMachineSpecifiedConfigNamespace).Core().V1().Secrets().Informer().HasSynced,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factor out kubeInformersForNamespaces.InformersFor(operatorclient.GlobalMachineSpecifiedConfigNamespace) into var and reuse below. Too easy to lose track of these, leading to ugly bugs.

return configError
}

func (c *EncryptionKeyController) handleEncryptionKey() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"handle" here means that new keys are created if needed, right? Call the function checkAndCreateKeys.


const encWorkKey = "key"

type EncryptionKeyController struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a ten liner comment what this controller does.


preRunCachesSynced []cache.InformerSynced

validGRs map[schema.GroupResource]bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does valid mean? encrypted CRs?


validGRs map[schema.GroupResource]bool

componentName string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a component? Be more precise. Below I see that you get secrets with the selector.

return keyID, true // eh?
}

return keyID, time.Now().After(migrationTimestamp.Add(30 * time.Minute)) // TODO how often?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it a constant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewriting all secrets every 30 min? Is that a good idea?

Copy link
Contributor

@sttts sttts Jul 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton sounds like this interval came from you. We have to migrate all resources, i.e. update/patch all of them. Are we sure something in this dimension makes sense? Was expecting something around days to a month, like for certs.


const stateWorkKey = "key"

type EncryptionStateController struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ten-liner godoc

return configError
}

func (c *EncryptionStateController) handleEncryptionStateConfig() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better name

resourcesynccontroller.ResourceLocation{Namespace: targetNamespace, Name: encryptionConfSecret},
resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: sourceName},
); err != nil {
panic(err) // coding error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -82,6 +84,61 @@ func RunOperator(ctx *controllercmd.ControllerContext) error {
ctx.EventRecorder,
)

validGRs := map[schema.GroupResource]bool{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encryptedGRs

@@ -190,6 +252,9 @@ var RevisionSecrets = []revision.RevisionResource{
// this is needed so that the cert syncer itself can request certs. It uses localhost
{Name: "kube-apiserver-cert-syncer-client-cert-key"},
{Name: "kubelet-client"},

// etcd encryption
{Name: "encryption-config", Optional: true},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does optional mean that a Secret read error in the kubelet, some race or some admin deleting the secret by accident will lead to an API server to start with invalid encryption config? This will stop it from reading data, but even worse: it might make it store data unencrypted. CVE ahead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to think on this some.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 16, 2019
@enj enj changed the base branch from master to feature-etcd-encryption August 21, 2019 05:05
@enj enj changed the base branch from feature-etcd-encryption to master August 21, 2019 11:36
operatorInformer := operatorClient.Informer()
operatorInformer.AddEventHandler(eventHandler)

secretsInformer := kubeInformersForNamespaces.InformersFor(operatorclient.GlobalMachineSpecifiedConfigNamespace).Core().V1().Secrets().Informer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked myself whether it is safe to assume that an informer will be created for that namespace. I think it is because we do secretClient.Secrets(operatorclient.GlobalMachineSpecifiedConfigNamespace) which uses CachedSecretGetter.

}

if len(revisions) != 1 {
return "", nil // api servers have not converged onto a single revision
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could at least log that we are waiting for pods to converge, otherwise, it could be hard to tell why the controller isn't running.

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: enj
To complete the pull request process, please assign sttts
You can assign the PR to them by writing /assign @sttts in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@enj
Copy link
Contributor Author

enj commented Sep 4, 2019

/retest
/refresh

@enj
Copy link
Contributor Author

enj commented Sep 10, 2019

/retest

2 similar comments
@enj
Copy link
Contributor Author

enj commented Sep 10, 2019

/retest

@enj
Copy link
Contributor Author

enj commented Sep 11, 2019

/retest

enj and others added 26 commits September 24, 2019 09:47
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
…aded

Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
Signed-off-by: Monis Khan <mkhan@redhat.com>
@openshift-ci-robot
Copy link

@enj: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/verify-deps 8752b4e link /test verify-deps
ci/prow/verify 8752b4e link /test verify
ci/prow/e2e-aws 8752b4e link /test e2e-aws
ci/prow/e2e-aws-upgrade 8752b4e link /test e2e-aws-upgrade
ci/prow/e2e-aws-operator-encryption 8752b4e link /test e2e-aws-operator-encryption

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@enj
Copy link
Contributor Author

enj commented Oct 9, 2019

@mfojtik @sttts @p0lyn0mial you probably care about this sooner rather than later.

/close

@openshift-ci-robot
Copy link

@enj: Closed this PR.

In response to this:

@mfojtik @sttts @p0lyn0mial you probably care about this sooner rather than later.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants