Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NET-9500] Cleanup orphaned inline-certs and acl role/policy #4067

Merged

Conversation

jm96441n
Copy link
Member

@jm96441n jm96441n commented Jun 3, 2024

Changes proposed in this PR

  • Add a goroutine on start up that loops and tries to clean up inline certs that are no longer referenced by any gateway and the original shared acl role/policy that is now specific to each gateway. Once all those are cleaned up the goroutine will exit and stop polling

How I've tested this PR

Run/Write tests

Manual Testing:

  1. clone the following gist: https://gist.github.com/jm96441n/b92a80c2251c7a54d1feada320c5eac6
  2. run the start.sh script in the gist (this will install consul 1.18-ent so ensure you have an ent license setup)
  3. port forward port 8501 to the consul server instance
  4. once the cluster is up run the following in another terminal:
export CONSUL_HTTP_ADDR="127.0.0.1:8501"
export CONSUL_HTTP_TOKEN=$(kubectl get --namespace consul secrets/consul-consul-bootstrap-acl-token --template={{.data.token}} | base64 -d)
export CONSUL_HTTP_SSL=true
export CONSUL_HTTP_SSL_VERIFY=false
  1. run consul acl role list and you will see the shared "managed-gateway-acl-role"
  2. run consul acl policy list and you will see the shared "api-gateway-token-policy"
  3. run consul acl binding-rule list and you will see the shared binding-rules referencing both managed-gateway-acl-role
  4. run consul config list -kind inline-certificate and you will see an inline certificate listed
  5. in the consul_values.yaml uncomment lines 3 and 4 and comment lines 4 and 5 (also make sure you build consul-k8s from this branch and have an up to date build of main of consul)
  6. in the same terminal run export CONSUL_K8S_CHARTS_LOCATION="$HOME/hashi/consul-k8s/charts/consul" and replace the value being set with the location of the helm charts in your local copy of consul-k8s
  7. run the following helm upgrade --install consul $CONSUL_K8S_CHARTS_LOCATION -f ./consul_values.yaml -n consul --create-namespace --wait to install the new version of consul-k8s that you built
  8. force a re-reconciliation of your gateways (this should happen by default due to the change in dataplane version, if it does not you can delete and recreate the gateways)
  9. run consul acl role list and you will not see the shared "managed-gateway-acl-role"
  10. run consul acl policy list and you will not see the shared "api-gateway-token-policy"
  11. . run consul acl binding-rule list and you will not see the shared binding-rules referencing managed-gateway-acl-role
  12. run consul config list -kind inline-certificate and you will see no inline certificates listed

How I expect reviewers to test this PR

read the code
run the tests
do the above steps

Checklist

@jm96441n jm96441n changed the title add cleanup goroutine to handle cleaning up old acl roles and inline [NET-9500] Cleanup orphaned inline-certs and acl role/policy Jun 4, 2024
@jm96441n jm96441n added backport/1.2.x This release branch is no longer active. backport/1.3.x backport/1.4.x backport/1.5.x pr/no-changelog PR does not need a corresponding .changelog entry labels Jun 4, 2024
@jm96441n jm96441n marked this pull request as ready for review June 5, 2024 15:30

oldBindingRules := make(map[string]*api.ACLBindingRule)

// here we need to find binding rules with the old name that have a matching selector to the new gateway specific binding rule
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so another direction we could've gone down here is to fetch all the gateways and then for each gateway match it to a binding rule and if we match each gateway to a new binding rule then we can safely delete all the binding rules, definitely open to push back to change this to utilize that heuristic for deleting binding rules rather than what we have here (and we already fetch all the gateways for the inline cert cleanup so it's not like we don't need that data anyways)

@jm96441n jm96441n requested review from a team, NiniOak and wangxinyi7 and removed request for a team June 5, 2024 15:35
Copy link
Contributor

@NiniOak NiniOak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks fine but the test steps don't seem correct. Happy to re-review it with you before approval

@jm96441n
Copy link
Member Author

Code looks fine but the test steps don't seem correct. Happy to re-review it with you before approval

repro steps should be fixed now! sorry about that!

control-plane/api-gateway/binding/cleanup.go Outdated Show resolved Hide resolved
Comment on lines +40 to +44
client, err := consul.NewClientFromConnMgr(c.ConsulConfig, c.ServerMgr)
if err != nil {
c.Logger.Error(err, "failed to create Consul client")
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to recreate the client every 60s?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do this for the cache code as well, everytime the cache syncs we re-create the client so I wanted to mirror that functionality

control-plane/api-gateway/binding/cleanup.go Outdated Show resolved Hide resolved

// cleanupACLRoleAndPolicy deletes the old shared gateway ACL role and policy if they exist.
func (c Cleaner) cleanupACLRoleAndPolicy(client *api.Client) (bool, error) {
existingRules, _, err := client.ACL().BindingRuleList(c.AuthMethod, &api.QueryOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this list for all namespaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed on Slack that the gateway approach described below is probably more scalable in all use cases and would query directly whatever namespace the gateway resided in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also always write the binding-rules into the default namespace and partition https://github.com/hashicorp/consul-k8s/blob/main/control-plane/api-gateway/cache/consul.go#L557

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n and others added 3 commits June 12, 2024 16:15
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
@jm96441n jm96441n enabled auto-merge (squash) June 12, 2024 20:44
@jm96441n jm96441n merged commit b08043e into main Jun 12, 2024
25 of 50 checks passed
@jm96441n jm96441n deleted the NET-9500-cleanup-existing-inline-certs-and-old-acl-policy branch June 12, 2024 22:46
jm96441n added a commit that referenced this pull request Jun 12, 2024
* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup_test.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n added a commit that referenced this pull request Jun 12, 2024
* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup_test.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n added a commit that referenced this pull request Jun 13, 2024
…4125)

* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go



* Update control-plane/api-gateway/binding/cleanup.go



* Update control-plane/api-gateway/binding/cleanup.go



* Update control-plane/api-gateway/binding/cleanup_test.go



* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n added a commit that referenced this pull request Jun 13, 2024
…4124)

* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go



* Update control-plane/api-gateway/binding/cleanup.go



* Update control-plane/api-gateway/binding/cleanup.go



* Update control-plane/api-gateway/binding/cleanup_test.go



* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n added a commit that referenced this pull request Jun 13, 2024
* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup_test.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n added a commit that referenced this pull request Jun 13, 2024
* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup_test.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n added a commit that referenced this pull request Jun 13, 2024
* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup_test.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
jm96441n added a commit that referenced this pull request Jun 20, 2024
* add cleanup goroutine to handle cleaning up old acl roles and inline
certs

* added first unit test

* added more tests

* fixing binding rule deleting

* linting

* added tests and cleaned up

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/api-gateway/binding/cleanup_test.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Increase sleep time

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.2.x This release branch is no longer active. backport/1.3.x backport/1.4.x backport/1.5.x pr/no-changelog PR does not need a corresponding .changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants