shishir-a412ed/nomad-health-checks


nomad-health-checks

Sample health checks for Nomad node problem detector (NNPD).
NOTE: These are not real health checks; they only serve as a reference for how your actual health checks should be defined.

Nomad-node-problem-detector (NNPD)

Nomad-node-problem-detector (NNPD) is a system that scans Nomad client nodes for problems
and takes bad nodes out of the scheduling pool, so that Nomad doesn't schedule any new
jobs on them.

If the problem is transient and resolves itself after some time, NNPD puts the node back
in the scheduling pool in the next scanning cycle.

NNPD is composed of two main components:

  • Detector
  • Aggregator

Detector

  • The detector runs on every Nomad client node and runs a set of predefined health checks.
  • This repo (nomad-health-checks) is just a sample repo showing how these health checks should be defined.
  • This repo is mostly used by the Nomad-node-problem-detector (NNPD) repo for its integration tests.
  • These are not real health checks; they only serve as a reference for how your actual health checks should be defined.

Aggregator

The aggregator is the central component (mastership) to which every detector (node) reports its problems.
Based on those results, the aggregator either takes the node out of the scheduling pool (bad node),
puts the node back in the scheduling pool (good node), or does nothing if the node's state has not changed.
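The aggregator's three outcomes can be sketched as a small state machine keyed by node. This is a hypothetical in-memory model, not NNPD's implementation: `Aggregator`, `Handle` and `Action` are illustrative names, and it assumes each report reduces to a single healthy/unhealthy flag and that unseen nodes start in the pool. In a real cluster, "taking a node out" could plausibly map to marking the Nomad node ineligible for scheduling (for example with `nomad node eligibility -disable`), but this sketch does not talk to Nomad at all.

```go
package main

import "fmt"

// Action is what the aggregator decides to do with a node after a report.
type Action int

const (
	NoChange Action = iota // state did not change: do nothing
	TakeOut                // node became bad: remove it from the scheduling pool
	PutBack                // node recovered: return it to the scheduling pool
)

func (a Action) String() string {
	switch a {
	case TakeOut:
		return "take out"
	case PutBack:
		return "put back"
	default:
		return "no change"
	}
}

// Aggregator is a hypothetical in-memory model of the central component.
// inPool records whether each node is currently in the scheduling pool.
type Aggregator struct {
	inPool map[string]bool
}

func NewAggregator() *Aggregator {
	return &Aggregator{inPool: map[string]bool{}}
}

// Handle processes one detector report and returns the resulting action.
// Nodes not seen before are assumed to start in the scheduling pool.
func (a *Aggregator) Handle(nodeID string, healthy bool) Action {
	inPool, seen := a.inPool[nodeID]
	if !seen {
		inPool = true
	}
	act := NoChange
	switch {
	case inPool && !healthy:
		act = TakeOut
	case !inPool && healthy:
		act = PutBack
	}
	// After acting, the node is in the pool exactly when it is healthy.
	a.inPool[nodeID] = healthy
	return act
}

func main() {
	agg := NewAggregator()
	// A transient problem on client-1: bad for two cycles, then healthy again.
	for _, healthy := range []bool{true, false, false, true} {
		fmt.Println(agg.Handle("client-1", healthy))
	}
}
```

Because `Handle` only acts on state changes, a node that stays bad across several cycles is taken out once, and a transient problem is undone in the first cycle where the node reports healthy again.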
