
Releases: kuberhealthy/kuberhealthy

v2.3.0

27 Oct 19:54
687c3d5

Kuberhealthy 2.3.0

Upgrade Instructions:

  • When upgrading to this release, make sure the khjob custom resource is applied in your cluster. Without it, the check reaper will crash. If you use the Helm chart, this is done for you automatically.

Features:

  • Breaking Change: ConfigMap support for Kuberhealthy with live-reloading. #252, #482, #557 @jdowni000
    • Make sure the Kuberhealthy ConfigMap is applied and the ConfigMap volume is added to the Kuberhealthy deployment
  • Daemonset and Deployment checks now offer nodeSelector options. #546, #547, #549 @jonnydawg
  • Daemonset check refactor #515 @joshulyne
    • Add appropriate timeouts throughout the check run along with respective error messages
    • Add node information in the error messages
    • Add exponential backoff to all kube api calls during ds check run, addressing bug #527
  • Helm chart features a priorityClassName option for the Kuberhealthy deployment. #517, #530 @joshulyne
  • Python client available for Kuberhealthy - external contribution by @bbkgh 🎉 #523
  • Detailed Kuberhealthy installation instructions available in order to capture K8s KPIs using Kuberhealthy. #381, #542 @joshulyne
  • Node Check package added for khchecks to run initial node ready checks before the actual checks run #518, #544 @joshulyne
    • WaitForKuberhealthy: waits for the Kuberhealthy endpoint to be ready
    • WaitForKubeProxy: waits for kube-proxy to be running and ready on the same node
    • WaitForNodeAge: waits for the node age to reach a minimumNodeAge before the check runs
  • Refactored khchecks: http, http-content, and dns-resolution checks to add nodeChecks before check runs #572, #576, #575 @jdowni000 @joshulyne
  • Refactor kuberhealthy to get an A+ on goReport #563 @2infinitee
  • Manual triggering of khchecks added through the new Kuberhealthy job (khjob) custom resource #584 @joshulyne
  • Added chart parameter for pod disruption budgets #633 @jmatias
  • Add startingDeadlineSeconds to check reaper cronjob template #635 @isaaguilar
  • Custom toleration support added to Helm chart #641 @yteraoka
  • Added AWS AMI Exists check to External Checks Registry #643 @mtougeron
  • Allow an optional priority class name to the Daemonset check #646 @mtougeron
  • New Kuberhealthy releases now trigger an automated Kuberhealthy Helm repo update #647 @chadbitzer
  • PodSecurityPolicy added to helm chart #664 @czunker

New Kuberhealthy Checks:

Bug Fixes:

  • Daemonset check now gracefully handles intermittent failures when making kube api calls using exponential backoff. This addresses a bug where daemonset check falsely reports an error when etcd leader changes during ds check run. #527, #515 @joshulyne
  • Fix goroutine leak from kuberhealthy watch events and refactor concurrency on external checks #533, #548 @sbueringer @integrii
  • Fixes a bug where the main kuberhealthy process is blocked if a checker pod gets stuck in "pending" or "container creating". Also ensures that pods that aren't in succeeded or failed phases are evicted properly during cleanup. #447, #540 @joshulyne
  • Fix misconfigured khchecks blocking the main kuberhealthy process; all khchecks are now fetched as unstructured objects #554, #573 @joshulyne
  • Fix scratch image for Kuberhealthy, no longer giving missing /bin/sh error. Also fixed a null pointer error. #632 @integrii
  • Security Context added to check-reaper pods #607 @czunker
  • Fix missing ‘CurrentMaster’ field on Kuberhealthy status page #584 @joshulyne

v2.2.0

27 May 23:04
740ab19

Kuberhealthy 2.2.0 Release:

Bug Fixes:

  • Fixed race condition when checker pod has to verify its uuid with the kh server before being allowed to report its check status #407
  • Small fix to KHState reflector that helped unblock creating new Kuberhealthy builds #418
  • Removing hardcoded runAsUser configuration for daemonset check #429
  • When a checker pod gets deleted during the pending phase, the main Kuberhealthy process gets blocked. Adding a watch for the pod deleted event before we check that it's running #437
  • Added necessary service accounts and roles / rolebindings to pod-status and pod-restarts khchecks #456
  • Updating check registry links as they broke from a check name refactoring done some time back #463
  • Replace null error messages when reporting to Kuberhealthy with list of empty string #468

Features:

  • Publish Kuberhealthy as a Helm Repository 🎉
  • Adding ownerReferences to all external checks #378
  • Adding docs / README to deploy files #403
  • Switching to docker-hub from quay #406
  • Using go mod download for faster Dockerfile builds in CI/CD pipelines #409
  • Add ability to modify security context to khchecks #423
  • Adding more Helm variables for khchecks such as RunInterval, Timeout, etc. -- providing users with more granular control over khcheck installation via Helm #424, #425
  • Using the k8sErrors package to better identify Not Found errors #433
  • Added new variable that gets injected into khcheck pods: KH_CHECK_RUN_DEADLINE. Checks are given up to this deadline to complete their check runs #434, #452
  • Gosec: implement a github workflow to run gosec on external checks for PRs against master and move gosec scanning in Dockerfile to a github action #455, #459
  • Allow option for Kuberhealthy to be namespace aware with new listenNamespace flag #461
  • Various improvements to Deployment Check:
    -- Respect race conditions when checker pod starts up by utilizing KH_CHECK_RUN_DEADLINE #434
    -- Report back to Kuberhealthy on the current state of the check when given the interrupt / kill signal #434
    -- Adding custom resource requests and limits on the deployment pods that are brought up during the deployment check #434
    -- Update deployment check time limit to ensure that the check reports its last stage before timing out #462
    -- Make the default namespace for the deployment check the same as the namespace of the pods provisioned by the deployment check #469
  • Adding information about errors in Prometheus metrics for helpful integration with alertmanager #471
  • JS client available for Kuberhealthy using the kuberhealthy-client NPM package #476

v2.1.1

14 Feb 01:10
8a61505
  • Fixed name on checkClient directory to properly match package name of checkclient. This was breaking go module imports.

v2.1.0

28 Jan 00:46
b9758e9
  • The new CheckReaper (included in Helm chart specs and flat files) will now reap checks sitting around in the state of Completed but leave checks that have failed around longer for inspection. @joshulyne
  • New namespace filtering on the JSON status page output with the GET variable namespace. Use commas for multiple namespaces. Example: ?namespace=kuberhealthy,kube-system @jonnydawg
  • Check run duration has been added to khstate information and the status page. @joshulyne
  • Many bug fixes as listed in the 2.1.0 milestone

Thanks to everyone who reported issues and contributed to discussion on #kuberhealthy in the Kubernetes slack!

v2.0.0

16 Nov 02:17
f787557

Kuberhealthy 2.

  • new khcheck resource
  • new external check system implemented
  • all internal checks except "external" removed
  • caching of khstate resources used on the status page
  • Helm 3
  • New readme
  • ability to create and run your own checker containers
  • greatly improved master election logic
  • greatly improved check reconfiguration logic
  • new deployment checker

It is not possible to directly update from Kuberhealthy 1. You must learn about KH2's new external checks and enable the ones you would like to use. By default, this release only installs the daemonset checker and deployment checker.

Thank you to all our contributors who made this large refactor possible!

If you are interested in writing your own synthetic tests with Kuberhealthy, check out the docs for creating your own checks.

v1.1.0-rc3

06 Sep 22:22
c710145
Pre-release
  • Fixed over-mutexing that caused deadlock

image: quay.io/comcast/kuberhealthy:v1.1.0-rc3

v1.1.0-rc2

04 Sep 17:01
41bc85a
Pre-release
  • Enhanced mutexing to prevent race conditions in CRD use on some clusters

image: quay.io/comcast/kuberhealthy:v1.1.0-rc2

v1.1.0-rc1

05 Jun 17:52
d97f86f
Pre-release
  • Moved to a single gomod for the whole project
  • New DNS Status check
  • Fix a misspelling, Authorative -> Authoritative 
  • Added InfluxDB metric forwarding
  • Became compliant with CII Best Practices
  • Added an annotation to the DaemonSet test pods to avoid interfering with https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler node scale down calculations
  • Add helm chart flags to disable checks
  • Sunset the unstable branch - releases are cut off of master now to avoid bad interactions with gomod and general confusion

v1.0.2

27 Feb 20:57
  • Allows the user to override the daemon set checker pause container image location
  • Sets the daemon set checker DS containers to run as user 1000 instead of root

v1.0.1

25 Feb 18:52
  • Includes a runtime panic bug fix #114
  • Adds better logging around the daemon set checker failures #114