Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for health-checking #113

Closed
wants to merge 4 commits into from

Conversation

JarvisCraft
Copy link
Contributor

Description

Kubernetes-backed servers may want to report their current status via a health check which may rely on implementation-specific details (such as servers exposing their internal connectivity info).

In order to implement this behaviour as part of this plugin:

  • HealthCheck and HealthChecks interfaces were added to facilitate the creation of such health checks
  • HealthChecks became registered as a Bukkit service so that other plugins can modify it
  • an extra endpoint /health was added in order to response with 2xx (i.e. 200) code when the checks return healthy status and 4xx (i.e. 403) otherwise

Affected code

Among the changes, the creation of MetricsController was abstracted into a static factory.

Notes

  • Default Jetty handler is now used for handling incorrect endpoint.

  • It may be reasonable to make exact endpoints used configurable but it is not part of this PR (as this was not configurable before).

@JarvisCraft JarvisCraft changed the title feat: add HealthCheck Add support for health-checking Sep 17, 2021
@sladkoff sladkoff added this to the 2.5.0 milestone Nov 4, 2021
@Combustible
Copy link
Contributor

This is great, I actually just bounced through here thinking about adding a PR for something similar.

It looks like this doesn't include a default health check though? It looks like it'll always return healthy in the current state.

What we do on our system is start a timer task that updates a last tick time variable with the current time. Every time a liveness check comes in, we check if it's been less than 15s since the last tick. If yes, healthy, if not, server is probably hung and unhealthy. This is also handy because it means the server shows up as unhealthy until it starts processing ticks (desired behavior which allows kubernetes to block players joining a server that isn't ready yet)

I doubt you need reference code for this... but here it is anyway:
Runnable created on startup/enable

          new BukkitRunnable() {
              @Override
              public void run() {
                  mLastTickTime = System.currentTimeMillis();
              }
          }.runTaskTimer(mPlugin, 1, 1);

Request handler

          @Override
          public void handle(HttpExchange request) throws IOException {
              final byte[] response;
              if (System.currentTimeMillis() - mLastTickTime < ALIVE_TICK_THRESHOLD) {
                  response = "Alive!".getBytes();
                  request.sendResponseHeaders(200, response.length);
              } else {
                  response = "Server not ticking - possibly hung".getBytes();
                  request.sendResponseHeaders(503, response.length);
              }
              OutputStream stream = request.getResponseBody();
              stream.write(response);
              stream.close();
          }

@sladkoff
Copy link
Owner

@JarvisCraft Sorry for the delayed response. I think this is quite useful and the PR looks very promising already! However, I'm currently having a hard time deciding if this technically belongs into this plugin or not because it has nothing to do with Prometheus (as in "minecraft-prometheus-exporter" 😄 )

That's more of a philosophical question though.

Personally, I feel like the options are:

  • Merge this and rename the plugin to something more broad like 'minecraft-monitor' / 'minecraft-status-exporter' / etc (I'm bad with names)
    • good: 'all-in-one' plugin for Minecraft observability stuff
    • bad: codebase might grow a bit and become more complex, some people might never need one or the other (Prometheus or health checks)
  • Offer a separate plugin based on this PR ('minecraft-health-exporter'...)
    • good: two simple plugins with clear responsibilities
    • bad: the new plugin will cause some (code) duplication (another HTTP server...)

I might be overthinking. What's your opinion on this?

@Combustible
Copy link
Contributor

The biggest reason I'd like to see this exist in this plugin is that it already has a web server. Not that the overhead is at all high to have two web servers... but this one is far more likely to actually be maintained than a stand-alone health checking plugin IMO, and it doesn't "cost" anything to just add an extra endpoint to the existing server (the port, etc. is already all set up)

I'm currently shading a health checking webserver into my own already-giant plugin for exactly this purpose, which I don't particularly like. Of all the plugins we run with, this is the only one that it makes sense to combine into. I'd also use a stand-alone plugin if someone maintained it, but it's just one more thing I need to configure (we are running in docker containers, need to expose ports / etc.). Not hard, just another step.

@JarvisCraft
Copy link
Contributor Author

Hi there,

@sladkoff, I guess that the better option is to merge the functionality into this single plugin as it lets reuse the HTTP-server bound to the single port. As for the naming it's of course up to you :) I don't think that this would lead to uncontrolled growth of the codebase. Especially, considering that the current change does not require the plugin itself to implement any health-checks but provides a mechanism (via ServicesManager) for other plugins to register their own checks.

@Combustible, sorry for the late reply. Yes, the changeset deliberately does not include any health check by itself because different servers may have different statuses for these. This can be added on top of current API but I think that it should (at least) be presented as a separate PR once this one is in its ready-approved state.

@sladkoff
Copy link
Owner

Alright guys, thanks for the input. I'll try to review this in detail in the coming weeks 🤞

@sladkoff sladkoff modified the milestones: 2.5.0, 2.6.0 Apr 11, 2022
@stale
Copy link

stale bot commented Jul 10, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 10, 2022
@sladkoff sladkoff removed the wontfix label Jul 11, 2022
@sladkoff
Copy link
Owner

I'm really really sorry for the delay on this 🙈

I've rebased this branch and created a new PR: #229

@sladkoff sladkoff closed this Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants