Design a robust health system for SPIRE #2047

Closed
azdagron opened this issue Jan 12, 2021 · 3 comments

@azdagron
Member

SPIRE support for health determination up until this point has been introduced organically and is likely far from where we want to be. The current health system is implemented as:

  1. A server CLI check that verifies a bundle can be retrieved over the registration API (used for both readiness and liveness)
  2. An agent CLI check that verifies the Workload API responds with anything other than UNAVAILABLE (used for both readiness and liveness)
  3. Server HTTP endpoints with no-op liveness and readiness implementations
  4. Agent HTTP endpoints with no-op liveness and readiness implementations

A recent PR (#2015) updates the HTTP endpoints to mimic the CLI checks.

Even so, SPIRE lacks a cohesive vision for what health means. At a minimum, we really could use the following:

  • Determine what "live" and "ready" mean for each component individually, not just spire-server and spire-agent but all components that are part of SPIRE (e.g. the k8s-workload-registrar)
  • Decide how we want to surface that information (e.g. HTTP endpoints)
  • Determine if/how plugins participate in determining readiness and liveness

This issue tracks the creation of such a proposal.
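
As a rough illustration of the three points above, here is a minimal Go sketch of one possible shape for such a system. It is not SPIRE's actual implementation. Every name in it (`State`, `Checkable`, `CheckFunc`, `Registry`, `Report`, `NewMux`) and the `/live` and `/ready` paths are assumptions made up for this example. Each subsystem or plugin reports liveness and readiness separately, a registry aggregates the results, and HTTP handlers surface the aggregate.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// State is what one subsystem reports about itself. Liveness and readiness
// are separate so a component can be alive but not yet ready to serve.
type State struct {
	Live    bool   `json:"live"`
	Ready   bool   `json:"ready"`
	Details string `json:"details,omitempty"`
}

// Checkable is a hypothetical interface that a subsystem or plugin could
// implement to participate in health determination.
type Checkable interface {
	CheckHealth() State
}

// CheckFunc adapts a plain function to the Checkable interface.
type CheckFunc func() State

func (f CheckFunc) CheckHealth() State { return f() }

// Registry holds the named checks for one component (hypothetical type).
type Registry struct {
	mu     sync.RWMutex
	checks map[string]Checkable
}

func NewRegistry() *Registry {
	return &Registry{checks: make(map[string]Checkable)}
}

// Register adds a named check and rejects duplicate names.
func (r *Registry) Register(name string, c Checkable) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.checks[name]; ok {
		return fmt.Errorf("health check %q already registered", name)
	}
	r.checks[name] = c
	return nil
}

// Report is the aggregate view across all registered checks.
type Report struct {
	Live       bool             `json:"live"`
	Ready      bool             `json:"ready"`
	Subsystems map[string]State `json:"subsystems"`
}

// Report runs every check. The component is live only if all subsystems are
// live, and ready only if all subsystems are both live and ready.
func (r *Registry) Report() Report {
	r.mu.RLock()
	defer r.mu.RUnlock()
	rep := Report{Live: true, Ready: true, Subsystems: make(map[string]State, len(r.checks))}
	for name, c := range r.checks {
		s := c.CheckHealth()
		rep.Subsystems[name] = s
		rep.Live = rep.Live && s.Live
		rep.Ready = rep.Ready && s.Live && s.Ready
	}
	return rep
}

// handler answers 200 when pick(report) is true and 503 otherwise, with the
// full report as a JSON body.
func (r *Registry) handler(pick func(Report) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		rep := r.Report()
		code := http.StatusOK
		if !pick(rep) {
			code = http.StatusServiceUnavailable
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(rep)
	})
}

// NewMux exposes the aggregate on /live and /ready (paths are assumptions).
func NewMux(r *Registry) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/live", r.handler(func(rep Report) bool { return rep.Live }))
	mux.Handle("/ready", r.handler(func(rep Report) bool { return rep.Ready }))
	return mux
}

func main() {
	reg := NewRegistry()
	_ = reg.Register("datastore", CheckFunc(func() State {
		return State{Live: true, Ready: true}
	}))
	_ = reg.Register("example-plugin", CheckFunc(func() State {
		return State{Live: true, Ready: false, Details: "still initializing"}
	}))
	rep := reg.Report()
	fmt.Printf("live=%t ready=%t\n", rep.Live, rep.Ready) // prints "live=true ready=false"
}
```

To serve the endpoints, the mux could be passed to `http.ListenAndServe` on an address of your choosing. Whether a single failing plugin should make the whole component unready, as it does in this sketch, is exactly the kind of question the proposal would need to settle.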

@evan2645
Member

We've received a lot of interest in this issue after surfacing it through the LFX mentorship program. We are planning to have our LFX mentees tackle this, and would like them to own the full cycle, from design through implementation. If you're interested in working on this issue specifically, or in a paid internship in which you'll undertake core SPIRE work, please see the official program web page for information on applying to our LFX program.

If you're interested in helping out and are open to other areas of work in the SPIRE project, please reach out to us in the #spire channel in the SPIFFE Slack and we'll find something that suits your personal interest :)

@sachinkumarsingh092
Contributor

I think a sensible first step is to inspect the different components of SPIRE and evaluate which states can serve as health metrics for each of them. Thoughts?

@amartinezfayo
Member

Although there is room for improvement in the current implementation (e.g. extending it to cover more subsystems), I'm closing this issue since the goal of having a more robust health subsystem has been met.
Separate issues can be opened to extend this implementation.


4 participants