Releases: kuberhealthy/kuberhealthy
v2.3.0
Kuberhealthy 2.3.0
Upgrade Instructions:
- When upgrading to this release, you must be sure the `khjob` custom resource is applied in your cluster. Without this, the check reaper will crash. If you use the Helm chart, this will be done for you automatically.
Features:
- Breaking Change: ConfigMap support for Kuberhealthy with live-reloading. #252, #482 #557 @jdowni000
- Make sure the Kuberhealthy configMap is applied and the configMap volume is added to the Kuberhealthy deployment
- Daemonset and Deployment checks now offer nodeSelector options. #546, #547, #549 @jonnydawg
- Daemonset check refactor #515 @joshulyne
- Add appropriate timeouts throughout the check run along with respective error messages
- Add node information in the error messages
- Add exponential backoff to all kube api calls during ds check run, addressing bug #527
- Helm chart features a priorityClassName option for the Kuberhealthy deployment. #517, #530 @joshulyne
- Python client available for Kuberhealthy - external contribution by @bbkgh 🎉 #523
- Detailed Kuberhealthy installation instructions available in order to capture K8s KPIs using Kuberhealthy. #381, #542 @joshulyne
- Node Check package added for khchecks to run initial node ready checks before the actual checks run #518, #544 @joshulyne
- WaitForKuberhealthy: waits for the Kuberhealthy endpoint to be ready
- WaitForKubeProxy: waits for kube-proxy to be running and ready on the same node
- WaitForNodeAge: waits for the node age to reach a minimumNodeAge before the check runs
- Refactored khchecks: http, http-content, and dns-resolution checks to add nodeChecks before check runs #572, #576, #575 @jdowni000 @joshulyne
- Refactor kuberhealthy to get an A+ on goReport #563 @2infinitee
- Manual trigger for khchecks added by creating kuberhealthy jobs crd #584 @joshulyne
- Added chart parameter for pod disruption budgets #633 @jmatias
- Add `startingDeadlineSeconds` to check reaper cronjob template #635 @isaaguilar
- Custom toleration support added to Helm chart #641 @yteraoka
- Added AWS AMI Exists check to External Checks Registry #643 @mtougeron
- Allow an optional priority class name to the Daemonset check #646 @mtougeron
- New Kuberhealthy releases now trigger an automated Kuberhealthy Helm repo update #647 @chadbitzer
- PodSecurityPolicy added to helm chart #664 @czunker
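The idea behind the WaitForNodeAge node check listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the Go node check package itself; the helper name `seconds_until_min_age` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_min_age(node_created: datetime,
                          min_node_age: timedelta,
                          now: datetime) -> float:
    """Return how long to wait until the node is at least min_node_age old.

    Hypothetical helper: returns 0.0 when the node is already old enough.
    """
    age = now - node_created
    remaining = (min_node_age - age).total_seconds()
    return max(0.0, remaining)


# A node created two minutes ago with a three-minute minimum age
# still needs to wait one more minute before the check runs.
created = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
now = created + timedelta(minutes=2)
print(seconds_until_min_age(created, timedelta(minutes=3), now))  # 60.0
```

A check would sleep for the returned duration before starting its real work.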
New Kuberhealthy Checks:
- Image pull check: tests the availability of external image repositories #221, #516 @zjhans
- SSL handshake check and SSL expiry check: test the expiry and security of SSL certificates #477 @zjhans
Bug Fixes:
- Daemonset check now gracefully handles intermittent failures when making kube api calls using exponential backoff. This addresses a bug where daemonset check falsely reports an error when etcd leader changes during ds check run. #527, #515 @joshulyne
- Fix goroutine leak from kuberhealthy watch events and refactor concurrency on external checks #533, #548 @sbueringer @integrii
- Fixes a bug where the main kuberhealthy process is blocked if a checker pod gets stuck in "pending" or "container creating". Also ensures that pods that aren't in succeeded or failed phases are evicted properly during cleanup. #447, #540 @joshulyne
- Fix for misconfigured khchecks that block the main kuberhealthy process by fetching all khchecks using unstructured objects #554 #573 @joshulyne
- Fix scratch image for Kuberhealthy, no longer giving missing /bin/sh error. Also fixed a null pointer error. #632 @integrii
- Security Context added to check-reaper pods #607 @czunker
- Fix missing ‘CurrentMaster’ field on Kuberhealthy status page #584 @joshulyne
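The exponential backoff used to ride out intermittent kube API failures in the Daemonset check can be sketched in general terms as below. This is a minimal Python illustration of the technique, not the check's Go implementation; `call_with_backoff`, its parameters, and the default values are assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      max_attempts: int = 5,
                      initial_delay: float = 1.0,
                      factor: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= factor  # wait longer before the next attempt
    raise AssertionError("unreachable")


# Simulate an API call that fails twice (for example during an etcd
# leader change) and then succeeds. Delays are recorded, not slept.
delays = []
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient API error")
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Injecting the `sleep` function keeps the example testable without real waiting.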
v2.2.0
Kuberhealthy 2.2.0 Release:
Bug Fixes:
- Fixed race condition when checker pod has to verify its uuid with the kh server before being allowed to report its check status #407
- Small fix to KHState reflector that helped unblock creating new Kuberhealthy builds #418
- Removing hardcoded runAsUser configuration for daemonset check #429
- When a checker pod is deleted during the pending phase, the main Kuberhealthy process gets blocked. Added a watch for the pod deleted event before checking whether the pod is running #437
- Added necessary service accounts and roles / rolebindings to pod-status and pod-restarts khchecks #456
- Updating check registry links as they broke from a check name refactoring done some time back #463
- Replace null error messages with a list of empty strings when reporting to Kuberhealthy #468
Features:
- Publish Kuberhealthy as a Helm Repository :celebrate:
- Adding ownerReferences to all external checks #378
- Adding docs / README to deploy files #403
- Switching to docker-hub from quay #406
- Using go mod download for faster Dockerfile builds in CI/CD pipelines #409
- Add ability to modify security context to khchecks #423
- Adding more Helm variables for khchecks such as RunInterval, Timeout, etc. -- providing users with more granular control over khcheck installation via Helm #424, #425
- Using the k8sErrors package to better identify Not Found errors #433
- Added new variable that gets injected into khcheck pods: `KH_CHECK_RUN_DEADLINE`. Checks are given up to this deadline to complete their check runs #434, #452
- Gosec: implement a github workflow to run gosec on external checks for PRs against master and move gosec scanning in Dockerfile to a github action #455, #459
- Allow option for Kuberhealthy to be namespace aware with new `listenNamespace` flag #461
- Various improvements to Deployment Check:
-- Respect race conditions when checker pod starts up by utilizing KH_CHECK_RUN_DEADLINE #434
-- Report back to Kuberhealthy on the current state of the check when given the interrupt / kill signal #434
-- Adding custom resource requests and limits on the deployment pods that are brought up during the deployment check #434
-- Update deployment check time limit to ensure that the check reports its last stage before timing out #462
-- Make the default namespace for the deployment check the same as the pod's (that get provisioned by the deployment check) namespace #469
- Adding information about errors in Prometheus metrics for helpful integration with alertmanager #471
- JS client available for Kuberhealthy using the kuberhealthy-client NPM package #476
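A check can use the injected `KH_CHECK_RUN_DEADLINE` variable to work out how much time it has left. The sketch below assumes the value is a Unix timestamp in seconds, which these notes do not state; the helper name `seconds_until_deadline` is hypothetical.

```python
import os
import time


def seconds_until_deadline(environ=os.environ, now=None) -> float:
    """Return the seconds left before the check's run deadline.

    Assumption: KH_CHECK_RUN_DEADLINE holds a Unix timestamp in seconds.
    """
    raw = environ.get("KH_CHECK_RUN_DEADLINE")
    if raw is None:
        raise RuntimeError("KH_CHECK_RUN_DEADLINE is not set")
    current = time.time() if now is None else now
    # Clamp at zero so callers never see a negative time budget.
    return max(0.0, float(int(raw)) - current)


# Example with a fixed clock: the deadline is 300 seconds away.
remaining = seconds_until_deadline(
    {"KH_CHECK_RUN_DEADLINE": "1600000300"}, now=1600000000
)
print(remaining)  # 300.0
```

A check could pass this value as a timeout to its own work so that it reports back before the deadline expires.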
v2.1.1
v2.1.0
- The new CheckReaper (included in Helm chart specs and flat files) will now reap checks sitting around in the state of `Completed` but leave checks that have failed around longer for inspection. @joshulyne
- New namespace filtering on the JSON status page output with the GET variable `namespace`. Use commas for multiple namespaces. Example: `?namespace=kuberhealthy,kube-system` @jonnydawg
- Check run duration has been added to `khstate` information and the status page. @joshulyne
- Many bug fixes as listed in the 2.1.0 milestone
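The `namespace` GET variable on the status page can be built programmatically, for example as sketched below. The host name `kuberhealthy.example` and the helper `status_url` are placeholders for illustration; only the comma-separated `namespace` parameter comes from these notes.

```python
from typing import Sequence
from urllib.parse import urlencode


def status_url(base_url: str, namespaces: Sequence[str]) -> str:
    """Build a status page URL filtered to the given namespaces."""
    if not namespaces:
        return base_url
    # Keep commas literal so the query matches the documented example.
    query = urlencode({"namespace": ",".join(namespaces)}, safe=",")
    return f"{base_url}?{query}"


url = status_url("http://kuberhealthy.example", ["kuberhealthy", "kube-system"])
print(url)  # http://kuberhealthy.example?namespace=kuberhealthy,kube-system
```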
Thanks to everyone who reported issues and contributed to discussion on #kuberhealthy in the Kubernetes slack!
v2.0.0
Kuberhealthy 2.
- new `khcheck` resource
- new external check system implemented
- all internal checks except "external" removed
- caching of khstate resources used on the status page
- Helm 3
- New readme
- ability to create and run your own checker containers
- greatly improved master election logic
- greatly improved check reconfiguration logic
- new deployment checker
It is not possible to directly update from Kuberhealthy 1. You must learn about KH2's new external checks and enable the ones you would like to use. By default, this release only installs the daemonset checker and deployment checker.
Thank you to all our contributors who made this large refactor possible!
If you are interested in writing your own synthetic tests with Kuberhealthy, check out the docs for creating your own checks.
v1.1.0-rc3
- Fixed over-mutexing that caused deadlock
image: quay.io/comcast/kuberhealthy:v1.1.0-rc3
v1.1.0-rc2
- Enhanced mutexing to prevent race conditions in CRD use on some clusters
image: quay.io/comcast/kuberhealthy:v1.1.0-rc2
v1.1.0-rc1
- Moved to a single gomod for the whole project
- New DNS Status check
- Fix a misspelling, Authorative -> Authoritative
- Added InfluxDB metric forwarding
- Became compliant with CII Best Practices
- Added an annotation to the DaemonSet test pods to avoid interfering with https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler node scale down calculations
- Add helm chart flags to disable checks
- Sunset the unstable branch - releases are cut off of master now to avoid bad interactions with gomod and general confusion