PeeringAcceptor controller #1225

Merged
merged 44 commits into peering from initial-controller on May 27, 2022

Conversation

ndhanushkodi
Contributor

@ndhanushkodi ndhanushkodi commented May 17, 2022

Changes proposed in this PR:

  • PeeringAcceptor CR + controller + unit tests
  • I was not planning to unit test getExistingSecret or createK8sPeeringTokenSecretWithOwner, because their cases are covered by the ReconcileCreateUpdate test. Let me know if you see gaps or would prefer dedicated tests for the helper functions.
  • I was also not planning to unit test the generateToken, deletePeering, or createSecret helper functions, as they are exercised heavily throughout the other tests.

Future PRs (tasks are in Cluster Peering Asana):

  • PeeringDialer CR + controller
  • upstream changes for CRDs and upstream annotation
  • Acceptance test for service mesh
  • Version annotation (and storing that as peering metadata)
  • Adding Finalizer
  • Add peering state/status to the CR status

How I've tested this PR:
helm install nitya ../../consul-k8s/charts/consul -f consul-values.yml

consul-values.yml:

global:
  domain: consul
  datacenter: dc1
  imageK8S: ndhanushkodi/consul-k8s-dev:peering22
  image: ndhanushkodi/consul-dev:peering4
server:
  replicas: 1
  bootstrapExpect: 1
client:
  enabled: true
  grpc: true
ui:
  enabled: true
connectInject:
  enabled: true
controller:
  enabled: true

k apply -f peering-acceptor.yaml

apiVersion: consul.hashicorp.com/v1alpha1
kind: PeeringAcceptor
metadata:
  name: peeraaaaa
spec:
  peer:
    secret:
      name: "foo"
      key: "bar"
      backend: "kubernetes"

k delete -f peering-acceptor.yaml

Look at the logs and make sure it's as expected.
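
For reference, here's a rough sketch of the Go types that the peering-acceptor.yaml spec above implies. This is a sketch only: the JSON field names (peer, secret, name, key, backend) come from the example, while the Go type names and the status shape are assumptions, not the controller's actual code.

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// PeeringAcceptor is the resource the controller reconciles to generate a
// peering token and store it in the configured backend.
type PeeringAcceptor struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   PeeringAcceptorSpec   `json:"spec,omitempty"`
	Status PeeringAcceptorStatus `json:"status,omitempty"`
}

type PeeringAcceptorSpec struct {
	// Peer describes where the generated peering token should be stored.
	Peer *Peer `json:"peer"`
}

type Peer struct {
	Secret *Secret `json:"secret,omitempty"`
}

type Secret struct {
	Name    string `json:"name,omitempty"`    // "foo" in the example
	Key     string `json:"key,omitempty"`     // "bar" in the example
	Backend string `json:"backend,omitempty"` // currently only "kubernetes"
}

type PeeringAcceptorStatus struct {
	// Secret records which secret the controller last wrote the token to.
	Secret *Secret `json:"secret,omitempty"`
}
```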

How I expect reviewers to test this PR:
Code review

Checklist:

  • Tests added
  • [-] CHANGELOG entry added

    HashiCorp engineers only, community PRs should not add a changelog entry.
    Entries should use present tense (e.g. Add support for...)

@ndhanushkodi ndhanushkodi changed the base branch from main to peering May 17, 2022 18:23
@ndhanushkodi ndhanushkodi changed the title from Peering controllers to PeeringAcceptor controller May 17, 2022
Contributor

@thisisnotashwin thisisnotashwin left a comment

I like what I'm seeing so far.

@ndhanushkodi ndhanushkodi force-pushed the initial-controller branch 3 times, most recently from dfe0cc6 to ba63d22 May 20, 2022 04:53
@ndhanushkodi ndhanushkodi marked this pull request as ready for review May 23, 2022 16:36
@ndhanushkodi ndhanushkodi requested review from thisisnotashwin, a team, t-eckert and ishustava and removed request for a team May 23, 2022 16:41
@@ -142,14 +142,22 @@ jobs:
- run: mkdir -p ${{env.TEST_RESULTS}}
Contributor Author

The changes in this file will be removed when merging into the feature branch, because the most recently released version of Consul does not have peering.

Contributor

Will we have the right binaries by the time we merge? I'm thinking that maybe we should keep this on the feature branch as well until we have released Consul binaries.

Contributor Author

Yup, agreed. We wouldn't want to cut a release with all this unless there's something for Consul as well, and we'll also want to address the TODOs and unit tests (for the peering dialer) on the peering feature branch before merging to main.

@t-eckert
Contributor

I don't feel I have much to add given my lack of context. I mainly read through to get a better understanding of what you're working on here, and I think you've expressed it very clearly.

Contributor

@ishustava ishustava left a comment

Great work, Nitya! As usual, love all the comments and the thought you put into this ❤️

I'm leaving my comments so far; I've gotten through most of the reconcile logic, except for some private functions and tests.

I'll review some more today. Some of these comments are very minor and I've marked them as non-blockers; others are questions or thoughts about edge cases, so feel free to punt on those too if need be!

@@ -0,0 +1,99 @@
{{- if .Values.connectInject.enabled }}
Contributor

Just thinking out loud: I think it makes sense to make this part of the connect injector. At some point we wanted to make the CRD controller part of the injector deployment as well, so having this controller live in connect inject seems consistent.

Contributor Author

Yeah, that sounds good. There is technically a service discovery case that can be enabled without service mesh, which could mean Consul DNS for imported services works without enabling mesh functionality. Do you think it's worth enabling these controllers by default, not gated on any values?

Contributor

Yeah, I think we will in the future, but probably not for this feature.


// Look up existing secret.
var existingSecret *corev1.Secret
if statusSecretSet {
Contributor

Could there be an edge case where the status is not set but the secret exists?

Contributor Author

Any time we create a secret, we then set the status. However, if the secret was successfully created but the status didn't get updated properly, the scenario you described could happen. In that case we'd return a reconcile error, since updateStatus would return an error. On the next reconcile, statusSecretSet would be false but the peering would already exist in Consul, so it would fall into the case at L140 and try to generate a token again, so I think it'd be fine.
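
A minimal sketch of the flow described here, for readers following along. generateToken, createK8sPeeringTokenSecretWithOwner, and updateStatus are the helpers named elsewhere in this PR; the surrounding shape and signatures are assumptions, not the actual controller code.

```go
// Next reconcile after a failed status update: statusSecretSet is false, but
// the peering already exists in Consul, so regenerate the token, recreate the
// secret, and retry the status update.
if !statusSecretSet && peeringExists {
	token, err := r.generateToken(ctx, peeringAcceptor)
	if err != nil {
		return ctrl.Result{}, err
	}
	if err := r.createK8sPeeringTokenSecretWithOwner(ctx, peeringAcceptor, token); err != nil {
		return ctrl.Result{}, err
	}
	// If this fails again we return the error and retry on the next reconcile.
	return ctrl.Result{}, r.updateStatus(ctx, peeringAcceptor)
}
```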

// Compare the existing secret hash.
// Get the secret specified by the status, make sure it matches the status' secret.latestHash.
if existingSecret != nil {
	existingSecretHashBytes := sha256.Sum256(existingSecret.Data[peeringAcceptor.Status.Secret.Key])
Contributor

@ishustava ishustava May 24, 2022

Could we use the secret's resourceVersion instead of computing a hash?

Contributor

I really like this idea!!

Contributor Author

@ndhanushkodi ndhanushkodi May 25, 2022

Yup, great idea! Update: still working on this.

Contributor Author

Updated to use resourceVersion! This was a great idea!
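
Roughly what this change amounts to. The sha256 call is taken from the diff excerpt above; the exact comparison and the status field names are assumptions inferred from the code comment, not the exact code.

```go
// Before: hash the secret's data and compare it with the hash stored in the status.
existingSecretHashBytes := sha256.Sum256(existingSecret.Data[peeringAcceptor.Status.Secret.Key])
shouldGenerate := hex.EncodeToString(existingSecretHashBytes[:]) != peeringAcceptor.Status.Secret.LatestHash

// After (sketch): compare the secret's ResourceVersion, which the API server
// bumps on every write, with the version recorded when the controller last
// wrote the secret. No hashing needed.
shouldGenerate = existingSecret.ResourceVersion != peeringAcceptor.Status.Secret.ResourceVersion
```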

Contributor

@ishustava ishustava left a comment

Looks good! Leaving my remaining comments. I think this is good to go once we decide what to do about some questions I left.

}

// updateStatusError updates the peeringAcceptor's ReconcileError in the status.
func (r *PeeringAcceptorController) updateStatusError(ctx context.Context, peeringAcceptor *consulv1alpha1.PeeringAcceptor, reconcileErr error) error {
Contributor

We always ignore the error that's returned by this function. I'm assuming we shouldn't, so maybe it could be a TODO?

Contributor Author

I don't think there's anything we can do to handle the error from this function. If we fail to update the status with the error message, we don't want that status-update failure to be what's returned from the reconcile; we want the real error instead. I suppose we could have this function not return an error at all and document that we're ignoring errors in this case? The only error case is the K8s API being down when we try to update the status, and then the real error would still be returned as the reconcile error. Wdyt?
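
A sketch of the calling pattern being described: the status write is best-effort, and the original reconcile error is what gets returned. This is an assumed shape, not the exact code.

```go
// If reconciliation failed, record the error on the status but don't let a
// failed status write mask the real error.
if reconcileErr != nil {
	// Ignored on purpose: the only failure mode is the K8s API being
	// unreachable, in which case returning reconcileErr below is what matters.
	_ = r.updateStatusError(ctx, peeringAcceptor, reconcileErr)
	return ctrl.Result{}, reconcileErr
}
```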

listKind: PeeringAcceptorList
plural: peeringacceptors
singular: peeringacceptor
scope: Namespaced
Contributor

Should it be a cluster-scoped resource?

Contributor

This was something that I had suggested. To me it felt similar to our config entries like proxy-defaults and mesh: it's OK for these to be namespaced, because some users prefer to keep all of their config in a single configuration namespace, and this feels like it falls into the same category. Additionally, since it's responsible for creating and managing a secret, it's easy to assume the secret should be created in the same namespace as the resource itself, rather than us having to determine which namespace the secret has to be written to.

Contributor

Ahh, ok! I was thinking you would only want one of these per cluster, which is why it might make sense from an RBAC perspective to make it cluster-wide. As for the secret, I think it's fine for a cluster-scoped resource to manage a secret in some namespace (e.g. a ClusterRoleBinding refers to a ServiceAccount, and ServiceAccounts are namespaced).

Contributor Author

Yeah, I also think it might be weird for PeeringAcceptors to be cluster-wide while their managed secrets are namespaced; it's nicer to always create the secret in whatever namespace the PeeringAcceptor is in. And yes, there can be multiple PeeringAcceptors in a single cluster.

@ndhanushkodi
Contributor Author

Updates:

Hey @ishustava re-requesting review, while still addressing:

  • secret's resourceVersion instead of computing a hash (need to fix tests for this case, and make sure it all works)

@thisisnotashwin I'm still addressing:

  • using the object reference
  • refactor to have getters on the CRD fields

@ndhanushkodi
Contributor Author

Made a cleanup Asana task to refactor the CRD fields to have getters.

Currently thinking of not using the object reference and using a local type instead, per https://github.com/kubernetes/api/blob/be84346886a48ff00fc676f9cb37f10e79dfd0da/core/v1/types.go#L5624
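
For context, the linked comment (on ObjectReference in core/v1) discourages new uses of the general-purpose reference type in favor of a small, purpose-built local type. A sketch of the difference; the field names here are assumptions:

```go
// Avoided: embedding the general-purpose reference type.
//   Secret corev1.ObjectReference `json:"secret,omitempty"`

// Preferred (sketch): a local type that carries exactly the fields this CRD needs.
type SecretRef struct {
	Name    string `json:"name,omitempty"`
	Key     string `json:"key,omitempty"`
	Backend string `json:"backend,omitempty"`
}
```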

Contributor

@thisisnotashwin thisisnotashwin left a comment

This looks good. I haven't tested the behavior on a cluster myself, but this is great!

@ndhanushkodi ndhanushkodi merged commit 590ce14 into peering May 27, 2022
@ndhanushkodi ndhanushkodi deleted the initial-controller branch May 27, 2022 18:16
ndhanushkodi added a commit that referenced this pull request Jun 13, 2022
Adds new PeeringAcceptor CRD and controller with unit tests.

This controller determines whether to generate a new peering token in Consul by reconciling the PeeringAcceptor resource. It currently supports writing the peering token to a K8s secret. 

Generating the new CRD and API:
After updating control-plane/PROJECT L6 to reference the control plane
folder, ran the command:
`operator-sdk create api --group consul --version v1alpha1 --kind Peering --controller --namespaced=true --make=false --resource=true`
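
To make the commit description concrete, here's a hedged sketch of what generating a token and writing it to a Kubernetes secret can look like with the consul/api peering client. The peering API was still in preview when this PR was written, so treat the exact calls, fields, and helper names as assumptions rather than the controller's actual code.

```go
// Generate a peering token via the Consul API and stage it in the secret named
// by the PeeringAcceptor spec (sketch only; API surface assumed).
resp, _, err := consulClient.Peerings().GenerateToken(ctx,
	api.PeeringGenerateTokenRequest{PeerName: acceptor.Name}, nil)
if err != nil {
	return ctrl.Result{}, err
}
secret := &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      acceptor.Spec.Peer.Secret.Name, // "foo" in the example above
		Namespace: acceptor.Namespace,
	},
	StringData: map[string]string{
		acceptor.Spec.Peer.Secret.Key: resp.PeeringToken, // "bar" in the example above
	},
}
// The controller then creates or updates this secret with the acceptor as its owner.
```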
ndhanushkodi added a commit that referenced this pull request Jun 13, 2022
ndhanushkodi added a commit that referenced this pull request Jun 13, 2022
ndhanushkodi added a commit that referenced this pull request Jun 15, 2022
thisisnotashwin pushed a commit that referenced this pull request Jun 16, 2022
thisisnotashwin pushed a commit that referenced this pull request Jun 16, 2022