Refactor daemonset check #515

Merged
32 commits merged on Jun 25, 2020

Conversation

joshulyne (Collaborator)

No description provided.

// If there is a cancellation interrupt signal.
log.Infoln("Canceling removing daemonset and daemonset pods and shutting down from interrupt.")
return
default:
Collaborator:

Wouldn't this always hit the default case? I think there are only three cases here -- your first three, without the default. Especially since you aren't wrapping the select statement in a for loop, it will just skip straight to the default case.
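
To illustrate that point, here is a minimal standalone sketch (hypothetical channels, not the check's actual code): a select with a default clause never blocks, so when no channel is ready it falls straight through to default, whereas dropping the default makes the goroutine wait.

package main

import (
	"fmt"
	"time"
)

func main() {
	doneChan := make(chan struct{})
	signalChan := make(chan struct{})

	go func() {
		time.Sleep(100 * time.Millisecond)
		close(doneChan)
	}()

	// With a default clause the select does not block: neither channel is
	// ready yet, so execution falls straight through to default.
	select {
	case <-doneChan:
		fmt.Println("done")
	case <-signalChan:
		fmt.Println("interrupted")
	default:
		fmt.Println("default hit immediately")
	}

	// Without a default, the select blocks until one of the cases is ready.
	select {
	case <-doneChan:
		fmt.Println("done (after waiting)")
	case <-signalChan:
		fmt.Println("interrupted")
	}
}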

log.Infoln("Worker: waitForPodRemoval started")
defer close(outChan)
outChan <- waitForPodRemoval()
wg.Wait()
Collaborator:

is wg.Done() called?

log.Infoln("Received an interrupt signal from the signal channel.")
log.Debugln("Signal received was:", sig.String())

go shutdown(doneChan)
Collaborator:

I think it would be prudent to pass the context to this function and call the ctxCancel func here -- most of your check run steps respect ctx.Done(), which would return a result if the context were canceled.
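
A rough sketch of that suggestion (the function and channel names here are illustrative, not the PR's actual identifiers): the signal handler cancels a shared context, and each check step selects on ctx.Done() so it can bail out promptly when a shutdown arrives.

package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	ctx, ctxCancel := context.WithCancel(context.Background())
	defer ctxCancel()

	signalChan := make(chan os.Signal, 1)
	signal.Notify(signalChan, os.Interrupt, syscall.SIGTERM)

	// On an interrupt, cancel the shared context; every step watching
	// ctx.Done() unblocks and can return a result.
	go func() {
		sig := <-signalChan
		log.Println("received signal:", sig)
		ctxCancel()
	}()

	if err := runCheckStep(ctx); err != nil {
		log.Println("check step stopped:", err)
		return
	}
	log.Println("check step completed")
}

// runCheckStep stands in for any check-run step that respects ctx.Done().
func runCheckStep(ctx context.Context) error {
	select {
	case <-time.After(5 * time.Second): // pretend work
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}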

log.Infoln("Worker: waitForAllDaemonsetsToClear started")
defer close(outChan)
outChan <- waitForAllDaemonsetsToClear()
wg.Wait()
Collaborator:

same with the wg.Done() here

go func() {
defer close(outChanDS)
outChanDS <- waitForDSRemoval()
wg.Wait()
Collaborator:

here too

@integrii (Collaborator):

Before we merge this to master, let's move the paths back to the daemonset-check directory rather than daemonset-check-refactor.

defer wg.Done()
defer close(outChan)
outChan <- waitForAllDaemonsetsToClear()
wg.Wait()
Collaborator:

I think this wg.Wait() will always block here, since you deferred the wg.Done() within the same function. wg.Wait() blocks until all wg.Done() calls have been made, but because the wg.Done() is deferred here, it can only run after wg.Wait() returns -- which it never will.
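
A minimal sketch of the deadlock and one way to restructure it (doWork is a hypothetical stand-in for waitForAllDaemonsetsToClear): the goroutine should only signal its own completion with wg.Done(), and the parent should be the one calling wg.Wait().

package main

import (
	"fmt"
	"sync"
)

// doWork is a hypothetical stand-in for the worker function.
func doWork() error {
	return nil
}

func main() {
	var wg sync.WaitGroup
	outChan := make(chan error, 1)

	wg.Add(1)
	go func() {
		// The deferred wg.Done() only fires when this goroutine returns,
		// so calling wg.Wait() inside this same goroutine (as in the PR
		// snippet) would block forever waiting for its own Done().
		defer wg.Done()
		defer close(outChan)
		outChan <- doWork()
	}()

	// Fix: the parent goroutine waits; the worker only signals Done().
	wg.Wait()
	fmt.Println("worker result:", <-outChan)
}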

@integrii (Collaborator) commented Jun 17, 2020

I have the build as of ecf5d86 running locally without errors:

time="2020-06-17T23:30:38Z" level=info msg="Found instance namespace: kuberhealthy"
time="2020-06-17T23:30:38Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2020-06-17T23:30:38Z" level=info msg="Setting check time limit to: 10m51.0247799s"
time="2020-06-17T23:30:38Z" level=info msg="Parsed POD_NAMESPACE: kuberhealthy"
time="2020-06-17T23:30:38Z" level=info msg="Performing check in kuberhealthy namespace."
time="2020-06-17T23:30:38Z" level=info msg="Setting DS pause container image to: gcr.io/google-containers/pause:3.1"
time="2020-06-17T23:30:38Z" level=info msg="Setting shutdown grace period to: 1m0s"
time="2020-06-17T23:30:38Z" level=info msg="Setting check daemonset name to: daemonset"
time="2020-06-17T23:30:38Z" level=info msg="Kubernetes client created."
time="2020-06-17T23:30:38Z" level=info msg="Running pre-check cleanup. Deleting any rogue daemonset or daemonset pods before running daemonSet check."
time="2020-06-17T23:30:38Z" level=info msg="Cleaning up daemonsets"
time="2020-06-17T23:30:39Z" level=info msg="Cleaning up daemonset pods"
time="2020-06-17T23:30:39Z" level=info msg="Waiting for all daemonSets or daemonset pods to clean up"
time="2020-06-17T23:30:42Z" level=info msg="All daemonsets cleared"
time="2020-06-17T23:30:42Z" level=info msg="Finished cleanup. No rogue daemonsets or daemonset pods exist"
time="2020-06-17T23:30:42Z" level=info msg="Running daemonset check"
time="2020-06-17T23:30:42Z" level=info msg="Running daemonset deploy..."
time="2020-06-17T23:30:42Z" level=info msg="Deploying daemonset."
time="2020-06-17T23:30:42Z" level=info msg="Found taints to tolerate: []"
time="2020-06-17T23:30:42Z" level=info msg="Generating daemonset kubernetes spec."
time="2020-06-17T23:30:42Z" level=info msg="Deploying daemonset with tolerations:  []"
time="2020-06-17T23:30:42Z" level=info msg="Timeout set: 10m51.0247799s for all daemonset pods to come online"
time="2020-06-17T23:30:43Z" level=info msg="DaemonsetChecker: Daemonset check waiting for 1 pod(s) to come up on nodes [docker-desktop]"
time="2020-06-17T23:30:44Z" level=info msg="DaemonsetChecker: All daemonset pods have been ready for 1 / 5 seconds."
time="2020-06-17T23:30:45Z" level=info msg="DaemonsetChecker: All daemonset pods have been ready for 2 / 5 seconds."
time="2020-06-17T23:30:46Z" level=info msg="DaemonsetChecker: All daemonset pods have been ready for 3 / 5 seconds."
time="2020-06-17T23:30:47Z" level=info msg="DaemonsetChecker: All daemonset pods have been ready for 4 / 5 seconds."
time="2020-06-17T23:30:48Z" level=info msg="DaemonsetChecker: All daemonset pods have been ready for 5 / 5 seconds."
time="2020-06-17T23:30:48Z" level=info msg="DaemonsetChecker: Daemonset daemonset-daemonset-1592436630-1592436638 done deploying pods."
time="2020-06-17T23:30:48Z" level=info msg="Successfully deployed daemonset."
time="2020-06-17T23:30:48Z" level=info msg="Running daemonset removal..."
time="2020-06-17T23:30:48Z" level=info msg="Removing daemonset."
time="2020-06-17T23:30:48Z" level=info msg="DaemonsetChecker deleting daemonset: daemonset-daemonset-1592436630-1592436638"
time="2020-06-17T23:30:48Z" level=info msg="There are 1 daemonset pods to remove"
time="2020-06-17T23:30:48Z" level=info msg="DaemonsetChecker removing daemonset. Proceeding to remove daemonset pods"
time="2020-06-17T23:30:48Z" level=info msg="Successfully requested daemonset removal"
time="2020-06-17T23:30:49Z" level=info msg="Successfully removed daemonset."
time="2020-06-17T23:30:49Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:49Z" level=info msg="DaemonsetChecker waiting for 1 pods to delete"
time="2020-06-17T23:30:49Z" level=info msg="DaemonsetChecker is still removing: kuberhealthy daemonset-daemonset-1592436630-1592436638-cs6d7 on node docker-desktop"
time="2020-06-17T23:30:50Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:50Z" level=info msg="DaemonsetChecker waiting for 1 pods to delete"
time="2020-06-17T23:30:50Z" level=info msg="DaemonsetChecker is still removing: kuberhealthy daemonset-daemonset-1592436630-1592436638-cs6d7 on node docker-desktop"
time="2020-06-17T23:30:51Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:51Z" level=info msg="DaemonsetChecker waiting for 1 pods to delete"
time="2020-06-17T23:30:51Z" level=info msg="DaemonsetChecker is still removing: kuberhealthy daemonset-daemonset-1592436630-1592436638-cs6d7 on node docker-desktop"
time="2020-06-17T23:30:52Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:52Z" level=info msg="DaemonsetChecker waiting for 1 pods to delete"
time="2020-06-17T23:30:52Z" level=info msg="DaemonsetChecker is still removing: kuberhealthy daemonset-daemonset-1592436630-1592436638-cs6d7 on node docker-desktop"
time="2020-06-17T23:30:53Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:53Z" level=info msg="DaemonsetChecker waiting for 1 pods to delete"
time="2020-06-17T23:30:53Z" level=info msg="DaemonsetChecker is still removing: kuberhealthy daemonset-daemonset-1592436630-1592436638-cs6d7 on node docker-desktop"
time="2020-06-17T23:30:54Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:54Z" level=info msg="DaemonsetChecker waiting for 1 pods to delete"
time="2020-06-17T23:30:54Z" level=info msg="DaemonsetChecker is still removing: kuberhealthy daemonset-daemonset-1592436630-1592436638-cs6d7 on node docker-desktop"
time="2020-06-17T23:30:55Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:55Z" level=info msg="DaemonsetChecker waiting for 1 pods to delete"
time="2020-06-17T23:30:55Z" level=info msg="DaemonsetChecker is still removing: kuberhealthy daemonset-daemonset-1592436630-1592436638-cs6d7 on node docker-desktop"
time="2020-06-17T23:30:56Z" level=info msg="DaemonsetChecker using LabelSelector: app=daemonset-daemonset-1592436630-1592436638,source=kuberhealthy,khcheck=daemonset to remove ds pods"
time="2020-06-17T23:30:56Z" level=info msg="DaemonsetChecker waiting for 0 pods to delete"
time="2020-06-17T23:30:56Z" level=info msg="DaemonsetChecker has finished removing all daemonset pods"
time="2020-06-17T23:30:56Z" level=info msg="Successfully removed daemonset pods."
time="2020-06-17T23:30:56Z" level=info msg="Done running daemonset check"
time="2020-06-17T23:31:01Z" level=info msg="Running final check cleanup before shutting down"
time="2020-06-17T23:31:01Z" level=info msg="Cleaning up daemonsets"
time="2020-06-17T23:31:01Z" level=info msg="Cleaning up daemonset pods"

integrii requested a review from jonnydawg on June 17, 2020 23:50

// daemonset does not exist, return false
return false, nil
os.Exit(0)
Collaborator:

I think you might want to defer the os.Exit() calls to allow shutdownCtxCancel() to run before the process exits, since shutdownCtxCancel() was deferred and os.Exit() skips deferred functions.
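
The underlying behavior, as a minimal sketch (the cancel name mirrors the comment, the rest is illustrative): os.Exit terminates the program without running deferred functions, so a cancel that is only deferred never fires if os.Exit is called directly; calling the cancel explicitly before exiting is one way to make sure it runs.

package main

import (
	"context"
	"log"
	"os"
)

func main() {
	shutdownCtx, shutdownCtxCancel := context.WithCancel(context.Background())
	defer shutdownCtxCancel() // skipped entirely if os.Exit() runs first
	_ = shutdownCtx           // the real check would pass this to its steps

	// ... check work ...

	// os.Exit terminates immediately and does not run deferred functions,
	// so calling it here would leave the shutdown context uncancelled:
	// os.Exit(0)

	// One option: cancel explicitly before exiting so anything watching
	// the context gets a chance to stop.
	shutdownCtxCancel()
	log.Println("shutdown context canceled, exiting")
	os.Exit(0)
}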

}
log.Infoln("Finished cleanup. No rogue daemonsets or daemonset pods exist")
return nil
}
Collaborator Author:

Instead of having a whole separate runCheckCleanup function like before, we just use the remove function to remove any daemonset created by the khcheck, since there shouldn't be any left around. Before, there was a ton of logic to determine whether or not the daemonset was rogue, but I believe any daemonset left after a check run should be removed.
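
A rough sketch of what that simplified cleanup could look like (a hypothetical helper, not the PR's actual code; assumes client-go >= v0.18, where List/Delete take a context, and a label selector like the one in the run log above):

package daemonsetcheck

import (
	"context"
	"fmt"

	log "github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanUp removes anything the check created, selected by label, rather than
// first classifying daemonsets as rogue.
func cleanUp(ctx context.Context, client kubernetes.Interface, namespace string) error {
	selector := "source=kuberhealthy,khcheck=daemonset"

	dsList, err := client.AppsV1().DaemonSets(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return fmt.Errorf("failed to list daemonsets: %w", err)
	}

	for _, ds := range dsList.Items {
		log.Infoln("Cleaning up daemonset:", ds.Name)
		if err := client.AppsV1().DaemonSets(namespace).Delete(ctx, ds.Name, metav1.DeleteOptions{}); err != nil {
			return fmt.Errorf("failed to delete daemonset %s: %w", ds.Name, err)
		}
	}
	return nil
}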

Collaborator Author:

this simplifies a LOT of the cleanUp stuff we were doing before

log.Infoln("Done running daemonset check")
case <-signalChan:
log.Infoln("Received shutdown signal. Canceling context and proceeding directly to cleanup.")
reportOKToKuberhealthy()
Collaborator Author:

Added a reportOKToKuberhealthy call since, if no report is made, Kuberhealthy defaults the error to "Check execution error: kuberhealthy/daemonset: timed out waiting for checker pod to report in". Not sure if we want to report an error instead.
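
A sketch of what the two reporting options on the shutdown path could look like (the checkclient import path assumes the kuberhealthy v2 module layout; ReportSuccess/ReportFailure are the external check client's reporting calls):

package daemonsetcheck

import (
	"github.com/kuberhealthy/kuberhealthy/v2/pkg/checks/external/checkclient"
	log "github.com/sirupsen/logrus"
)

// reportOKToKuberhealthy reports a passing result back to Kuberhealthy.
func reportOKToKuberhealthy() {
	if err := checkclient.ReportSuccess(); err != nil {
		log.Errorln("Error reporting success to Kuberhealthy servers:", err)
		return
	}
	log.Infoln("Reported OK to Kuberhealthy servers.")
}

// reportErrorToKuberhealthy is the alternative if an interrupted run should
// surface as a failure rather than an OK.
func reportErrorToKuberhealthy(reason string) {
	if err := checkclient.ReportFailure([]string{reason}); err != nil {
		log.Errorln("Error reporting failure to Kuberhealthy servers:", err)
		return
	}
	log.Infoln("Reported failure to Kuberhealthy servers.")
}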
