Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cleanup controller #433

Merged
merged 2 commits into from
Feb 11, 2021
Merged

Conversation

lkysow
Copy link
Member

@lkysow lkysow commented Feb 4, 2021

  • Runs by default
  • Watches for pod delete events and ensures there are no stale Consul
    service instances
  • Runs a Reconcile loop on a timer that checks all Consul services
  • Adds new connect-inject flags: -enable-cleanup-controller,
    -cleanup-controller-reconcile-period.
  • Adds new k8s-namespace meta key to connect service instances
  • Adds new server-acl-init flag: -enable-cleanup-controller (default
    true) and modifies the ACL templates.

How I've tested this PR:

  • manually and with acceptance tests: Cleanup controller consul-helm#806
  • reconcile can be tested by registering a service directly with a consul client:
    • port-forward to consul client on 8500
    • use svc.json:
      {
        "ID": "redis1",
        "Name": "redis",
        "Tags": ["primary", "v1"],
        "Address": "127.0.0.1",
        "Port": 8000,
        "Meta": {
          "pod-name": "dead",
          "k8s-namespace": "default"
        }
      }
      
    • curl -X PUT -v --data @svc.json localhost:8500/v1/agent/service/register
    • wait for reconcile to run (can bump default time with the helm value)
    2021-02-10T00:50:32.329Z [DEBUG] cleanupResource: starting reconcile
    2021-02-10T00:50:32.342Z [INFO]  cleanupResource: found service instance from terminated pod still registered: pod=dead id=redis1
    2021-02-10T00:50:32.346Z [INFO]  cleanupResource: service instance deregistered: id=redis1
    2021-02-10T00:50:32.346Z [DEBUG] cleanupResource: finished reconcile
    

How I expect reviewers to test this PR:

  • try it out with ghcr.io/lkysow/consul-k8s-dev:feb09 and see if you can break it

Checklist:

  • Tests added
  • CHANGELOG entry added (HashiCorp engineers only, community PRs should not add a changelog entry)

@lkysow lkysow changed the base branch from inject-connect-command-refactor to feature/cleanup-controller February 4, 2021 20:37
@lkysow lkysow marked this pull request as ready for review February 4, 2021 20:53
@lkysow lkysow force-pushed the cleanup-controller branch 2 times, most recently from 5c99f1d to 9731e4f Compare February 10, 2021 00:54
lock sync.Mutex
}

// Run starts the long-running Reconcile loop that runs on a timer.
Copy link
Member Author

@lkysow lkysow Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slight change to the Run function from the health check resource (https://github.com/hashicorp/consul-k8s/blob/master/connect-inject/health_check_resource.go#L54-L76).

  1. I refactored the loop so we don't have h.Reconcile() called twice. I believe they're logically equivalent but please double check.
  2. reconcile() here doesn't return an error. I did this because we're already logging all the places there are errors inside reconcile() with a lot of good context as to why the error is occurring and so logging it again here just results in duplicate error logs.
  3. I've also passed in ConsulScheme and ConsulPort as separate fields into the resource struct up above rather than ConsulURL. I did this because we are only using the scheme and port from the URL so it felt weird to pass in the full URL when we're discarding the hostname.

I plan to contribute these changes back to HealthCheckResource in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea on point 3!

@@ -493,31 +494,32 @@ namespace_prefix "prefix-" {
}
}

// There are three true/false settings so 8 permutations to test.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry in advance. I implemented these tests via a truth table that covers all permutations.

* Runs by default
* Watches for pod delete events and ensures there are no stale Consul
service instances
* Runs a Reconcile loop on a timer that checks all Consul services
* Adds new connect-inject flags: `-enable-cleanup-controller` (default
true),
`-cleanup-controller-reconcile-period`.
* Adds new server-acl-init flag: `-enable-cleanup-controller` (default
true) and modifies the ACL templates.
@lkysow lkysow requested review from a team, ishustava and thisisnotashwin and removed request for a team February 10, 2021 02:05
Copy link
Contributor

@thisisnotashwin thisisnotashwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried hard to look for something to pick on, but the best i could find was deregister to de-register 😫
Great work @lkysow

@lkysow
Copy link
Member Author

lkysow commented Feb 10, 2021

I tried hard to look for something to pick on, but the best i could find was deregister to de-register 😫
Great work @lkysow

lol I went back and forth on that because my editor was complaining. https://www.dictionary.com/browse/deregister?s=t says it's allowed and it's one less character which is why I went with deregister but I'm easy.

@thisisnotashwin
Copy link
Contributor

pff..good chance i would have suggested deregister to save a character had you chosen de-register 😉

I'm easy

❤️

Copy link
Contributor

@ishustava ishustava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good!!! I really appreciate all comments and notes you left along the way - felt like you read my mind in a lot of places!

I've added some comments, but they are non-blocking.

connect-inject/cleanup_resource.go Outdated Show resolved Hide resolved
connect-inject/cleanup_resource.go Show resolved Hide resolved
connect-inject/cleanup_resource_test.go Show resolved Hide resolved
connect-inject/cleanup_resource_test.go Show resolved Hide resolved
subcommand/inject-connect/command.go Show resolved Hide resolved
@lkysow lkysow merged commit c1299fc into feature/cleanup-controller Feb 11, 2021
@lkysow lkysow deleted the cleanup-controller branch February 11, 2021 20:04
lkysow added a commit that referenced this pull request Feb 12, 2021
* Implement cleanup controller

* Runs by default
* Watches for pod delete events and ensures there are no stale Consul
service instances
* Runs a Reconcile loop on a timer that checks all Consul services
* Adds new connect-inject flags: `-enable-cleanup-controller` (default
true),
`-cleanup-controller-reconcile-period`.
* Adds new server-acl-init flag: `-enable-cleanup-controller` (default
true) and modifies the ACL templates.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants