Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add predefined health checks #2

Closed
BrunnerLivio opened this issue Sep 6, 2018 · 16 comments
Closed

Add predefined health checks #2

BrunnerLivio opened this issue Sep 6, 2018 · 16 comments

Comments

@BrunnerLivio
Copy link
Member

BrunnerLivio commented Sep 6, 2018

I'm submitting a...


[ ] Regression 
[ ] Bug report
[x] Feature request
[ ] Documentation issue or request
[ ] Support request => Please do not submit support request here, instead post your question on Stack Overflow.

Expected behavior

As discussed on nestjs/nest#966 there should be prefedined health checks, so the user can easily access data like (as @weeco mentioned)

  • When all initialization stuff is done (asynchronous connections, binding routes, asynchronous modules, etc..)
  • When the status of the connections to the infrastructure services (such as a database) is good
@BrunnerLivio
Copy link
Member Author

I am currently investigating what is the best architecture to go with.
As an example I had a look into heptiolabs/healthcheck. This library written in Go-lang provides predefined healthchecks as such:

health := healthcheck.NewHandler()

// Our app is not happy if we've got more than 100 goroutines running.
health.AddLivenessCheck("goroutine-threshold", healthcheck.GoroutineCountCheck(100))

// Our app is not ready if we can't resolve our upstream dependency in DNS.
health.AddReadinessCheck(
    "upstream-dep-dns",
    healthcheck.DNSResolveCheck("upstream.example.com", 50*time.Millisecond))

// Our app is not ready if we can't connect to our database (`var db *sql.DB`) in <1s.
health.AddReadinessCheck("database", healthcheck.DatabasePingCheck(db, 1*time.Second)

I think this is a good approach and we should try to integrate it with the existing infrastructure of godday/terminus.

@BrunnerLivio
Copy link
Member Author

BrunnerLivio commented Sep 14, 2018

A possible proposal:

Provided by @nestjs/terminus. Can be see as a "repository" full of health checks of the given topic (DatabaseHealth, DNSHealth, MachineHealth, etc.):

database.health.ts

@Injectable()
export class DatabaseHealth {
  constructor(private connection: Connection)) {
  }
  
  async checkConnection(responseTime: Number): Promise<boolean> {
    // connect to database (`var db *sql.DB`) in < {reponseTime}.
    return true;
  }
}

Provided by user:
health.service.ts

import { Injectable } from '@nestjs/common';
import { TerminusOptions, TerminusOptionsFactory, DatabaseHealth } from '@nestjs/terminus';

@Injectable()
export class HealthService implements TerminusOptionsFactory {
   constructor(private databaseHealth: DatabaseHealth){ }

  async customCheck() {
    return true;
  }

  async health(): Promise<any> {
    return {
      'database': await this.databaseHealth.checkConnection(1),
      'custom': await this.customCheck()
    };
  }

   async createTerminusOptions(): Promise<TerminusOptions> {
    return {
      healthChecks: { '/health': this.health }
    };
  }
}

Would result into:

{
  "status": "ok",
  "info": {
    "database": true,
    "custom": true
  }
}

What do you guys think? @weeco @kamilmysliwiec

This approach would not break the API as it is used at the moment.

@weeco
Copy link
Contributor

weeco commented Sep 16, 2018

Do you think the health check should be made in it's own module/class?

Personally I expected the health check to be close to the code/instance which you want to run a health check on. However the most common use case for health checks would probably be providers, as they can be used to instantiate and await async operations (e. g. connection attempts). I don't see how we could implement a proper health check inside of the exported provider:

export const fortniteClientProviders: Provider[] = [
  {
    provide: InjectToken.FortniteClient,
    useFactory: async (config: ConfigService): Promise<FortniteClient> => {
      const api: FortniteClient = new FortniteClient(config.fortniteClient, {
        timeoutMs: 1.5 * 1000
      });
      await api.login();

      return api;
    },
    inject: [InjectToken.ConfigService]
  }
];

For this reason I believe HealthChecks should be added to modules:

Module:

@Module({
  providers: [...fortniteClientProviders],
  exports: [...fortniteClientProviders],
  healthChecks: [FortniteClientHealthCheck]
})
export class FortniteClientModule {}

Health check:

@Injectable
export class FortniteClientHealthCheck implements HealthIndicator {
    // If isHealthy() takes longer than this, consider the check as failed and consider this health indicator as down
    protected healthCheckTimeoutMs: number = 1 * 1000;
    // If false only status: up or down will be shown in the response
    protected isPublic: boolean = true;

    constructor(@Inject(InjectToken.FortniteClient) private readonly api: FortniteClient) {}

    public async isHealthy(): HealthResponse {
        const builder = new HealthBuilder("fortniteClient"); // Or use class name with applied lower camel case format instead? Then pass the builder as argument, so that you don't need to create an instance
        try {
            const testResponse = await api.ping();
            if (testResponse != null) return builder.up();
            return builder.down({ reason: "FortniteClient ping response is null or undefined" });
        } catch (err) {
            // Return down status, optionally along with more information
            return builder.down({ host: api.host, ...err });
        }
    }
}

which would result in either:

Success:
HTTP response status code: 200
HTTP response body:

{
    "status": "UP", 
    "diskSpace": {
        "status": "UP",
        "free": 56443746,
        "threshold": 1345660
    },
   "fortniteClient": {
      "status": "UP"
    }
}

or

Down:
HTTP response status code: 503
HTTP response body:

{  
  "status":"DOWN",
  "reason":"fortniteClient is DOWN",
  "diskSpace":{  
    "status":"UP",
    "free":56443746,
    "threshold":1345660
  },
  "fortniteClient":{  
    "status":"DOWN",
    "error": {....}
  }
}

I prefer this approach over your proposal because it is more "decoupled" and the health check is still implemented near the component you are checking against. What do you think?

If you decide for this route you'd still need to think about the predefined health checks. The only problem I see here is the dependency injection of a "TypeORM / MySQL connection" for instance. Other than that you can easily write predefined Health Checks I believe

@BrunnerLivio
Copy link
Member Author

BrunnerLivio commented Sep 17, 2018

@weeco I am not 100% sure if we're on the same page, but that what you proposed is more or less the same I propose.

In my opinion having healthChecks: Provider[] inside a @Module()-option is not a good idea. Everything, even interceptors go into providers: Provider[], why should Health Checks be the exception?

Personally I expected the health check to be close to the code/instance which you want to run a health check on.

Agree. My solution actually supports this. I do not recommend writing your health checks in health.service.ts as I did (just did it for simplicity reasons). Your health.service.ts is just to connect your defined health checks with terminus. But you do not have to write the check inside the service itself. You can provide them anywhere you want.

Here is how I would implement your FortniteClientHealthCheck in my proposal.

import { Injectable } from '@nestjs/common';
import { TerminusOptions, TerminusOptionsFactory } from '@nestjs/terminus';
import { FortniteClientHealthCheck } from './fortnite/fortnite-client.health.ts'

@Injectable()
export class HealthService implements TerminusOptionsFactory {
   constructor(private fortniteClientHealthCheck: FortniteClientHealthCheck){ }

  async health(): Promise<any> {
    return {
      'fortnite': await this.fortniteClientHealthCheck.isHealthy()
    };
  }

   async createTerminusOptions(): Promise<TerminusOptions> {
    return {
      healthChecks: { '/health': this.health }
    };
  }
}

So you have your actual health check still near your feature, but it is connected through a "root"-health provider called `health.service.

One thing you'd need to refactor is the HealthBuilder. I also thought about such a class, but I dropped that, because it is not really Angular like. Since it is not being injected by the constructor it is harder to test. So the developer who write the health check has to build it's own HealthBuilder, if he prefers such a solution.

@weeco
Copy link
Contributor

weeco commented Sep 19, 2018

Can you elloborate if this "root health provider" is actually necessary? This sounds like boilerplate which could potentially be avoided. If I write one or more health checks I usually want it to be picked up as well.

I can imagine it's not possible because you wouldn't be able to tell if it's a normal service or a health check if you put them all together in the providers property of a module. That's why I proposed to put it into a healthcheck property which indicates that this will be picked up in our "HealthCheckRepository". Is that the main difference between our two proposals?

@kamilmysliwiec
Copy link
Member

kamilmysliwiec commented Sep 19, 2018

In my opinion having healthChecks: Provider[] inside a @module()-option is not a good idea. Everything, even interceptors go into providers: Provider[], why should Health Checks be the exception?

100% agree. Let's don't mess with @Module() properties.

However, I totally understand your point @weeco. Keeping healthchecks close to their domain sounds clear. I guess that another possible middle ground would be to delay instantiation of terminus itself (for example move it to the OnModuleInit or OnApplicationBootstrap instead of creating it as a provider using options object). Then, provide a class like TerminusRegistry that exposes register method. Each controller/service/anything could inject TerminusRegistry and inside the constructor, call this.terminusRegistry.register(this) for instance. Registry would hold an array of healthchecks as a class property and during the OnModuleInit/Bootstrap, create terminus instance and pass down these registered methods. :)

@BrunnerLivio
Copy link
Member Author

BrunnerLivio commented Sep 19, 2018

@kamilmysliwiec
Thanks a lot for joining the discussion. That sounds like a nice solution we all can agree on. Will have a look into implementing this asap and will add you guys as reviewers in the PR.

So then I can finally finish the documentation and release the first version :)

—-

One thing which stands out for me; this would mean that the Terminus feature of reacting to process signals like SIGTERM is not covered with Kamils proposal, because that was solved by the “root health provider”.

In my opinion we should kick the signal feature out for the first release and then start the discussion afterwards, if that is okay?

@weeco
Copy link
Contributor

weeco commented Sep 19, 2018

I believe the sig feature is important, but until recently no one has ever requested such a health check module, so I believe it would be a very solid module without sig hook as well :-). Looking forward to it.

@kamilmysliwiec
Copy link
Member

One thing which stands out for me; this would mean that the Terminus feature of reacting to process signals like SIGTERM is not covered with Kamils proposal, because that was solved by the “root health provider”.

Things like onSignal and onShutdown might be covered in the same fashion as healthchecks. We could provide sth like TerminusLifecycleManager with addOnSignalHook() (?).

The rest (e.g. signal and logger) could work same as now (using createTerminusOptions).

@kamilmysliwiec
Copy link
Member

Sorry for closing 😅

@BrunnerLivio BrunnerLivio added this to To do in First Release Sep 26, 2018
@BrunnerLivio BrunnerLivio moved this from To do to In progress in First Release Sep 26, 2018
@BrunnerLivio BrunnerLivio added this to the First release milestone Sep 26, 2018
@BrunnerLivio
Copy link
Member Author

BrunnerLivio commented Sep 26, 2018

@weeco I tried to adapt to @kamilmysliwiec solution. You can see a working example with my dogs-app (not a cat person, sorry :D)

What I've found is that the general behavior of @godaddy/terminus is not so pleasant.

In case a subsystem is down as shown here with info.dogs.status:

{"status":"ok","info":{"dogs":{"status":"down","goodboys":5}}}

The property status should also change to down, because info.dogs.status is "down", right?
This is not possible with @godaddy/terminus.. It expects if no error is thrown, everything is great. Therefore it will just automatically use status = "ok" as shown here

So then you have to throw a error to change the state? This is hardcoded too in @godaddy/terminus, so in case a error is thrown it can only show this JSON response (no additional data / message):

{
  "status": "error"
}

Maybe my use cases are not needed, so we could proceed with this implementation. Otherwise we may consider moving away from @godaddy/terminus or create issues on their repository.

@weeco
Copy link
Contributor

weeco commented Sep 26, 2018

The usage looks very nice, good job.

In my oppinion having detailed information on subsystems is a must have. Otherwise it may take some time to figure out why the healthcheck is failing. I believe it shouldn't take long to implement the godaddy/terminus featureset in Typescript and is worth the effort

@BrunnerLivio
Copy link
Member Author

BrunnerLivio commented Sep 26, 2018

I believe it shouldn't take long to implement the godaddy/terminus featureset in Typescript and is worth the effort

It wouldn't be so hard to implement it, but keep in mind we have to maintain this, think about testing and security etc. So that is why I created an issue on @godaddy/terminus and lets hear out if the guys from over there support us. If not it seems like we have no other option than just writing it for our own as you proposed...

@BrunnerLivio
Copy link
Member Author

BrunnerLivio commented Oct 10, 2018

@kamilmysliwiec I had some discussions with @weeco:

I experienced some issues with your proposal with registering the health check in the constructor as you proposed.

this.terminusRegistry.register(this);

The problem is with predefined health checks. Let's say nestjs/terminus module offers a predefined database health check. How does we register this? Certainly we can not do that in the constructor, because what if the user does not want to use our predefined health check in his ecosystem?

Therefor a user must select the predefined health checks himself. In the end this would lead back to my proposal, having a "main" health check class, which binds health check to a health address as I proposed in my previous comment. @weeco which represents "the customer" in this case would be fine with this solution.

Do you have another idea, or should I go with this?

@kamilmysliwiec
Copy link
Member

Couldn't we support two cases at once? I mean, combine options (manually defined checks) + registered ones (flexibility).

@BrunnerLivio
Copy link
Member Author

@kamilmysliwiec I think it should be consistent in the end. Predefined health checks and user defined health checks should use the same interface and should be able to be registered the same way. I really do not see many other options than a main file..

First Release automation moved this from In progress to Done Nov 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
First Release
  
Done
Development

No branches or pull requests

3 participants