Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Health check api for agent (#3074) #3781

Merged
merged 1 commit into from Sep 27, 2017

Conversation

Projects
None yet
4 participants
@ketan
Copy link
Member

commented Aug 25, 2017

This adds 2 endpoints for testing the health of the agent <-> server comm

  • Perform a GET on /health/{v1,latest}/isConnectedToServer, will
    return status 200 along with text OK! if the agent is:
    • able to establish HTTP(s) contact
    • is authorized by the server (by an admin, or by auto-registration)
      The endpoint will return status 500 in any other cases.
      Two consecutive failing pings will be treated as having failed healthcheck.

These health checks will (by default) bind to port 8152 on localhost,
but can be configured by the following system properties:

  • go.agent.status.api.enabled. Defaults to true. Set to false to
    disable health check api endpoint
  • go.agent.status.api.bind.host. Defaults to localhost. Set to a
    specific ip address or hostname to bind to that host. Set to 0.0.0.0
    to bind to all network interfaces.
  • go.agent.status.api.bind.port. Defaults to 8152. Set to 0 to
    use an ephemeral port, which will be displayed in a log statement.

In case the http server is unable to bind (usually because of port
conflicts, or multiple agents running on the same macchine), a warning
will be emitted to the console log and agent startup will continue.

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch 4 times, most recently from 100c817 to abadca7 Aug 25, 2017

@ketan ketan requested a review from varshavaradarajan Aug 25, 2017

@ketan

This comment has been minimized.

Copy link
Member Author

commented Aug 25, 2017

@varshavaradarajan — can you check if this health check is going to be sufficient to see if an agent is running well and connected?

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch 3 times, most recently from 0648f1a to 342be28 Aug 25, 2017

@varshavaradarajan varshavaradarajan added this to the Release 17.10 milestone Aug 27, 2017

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch 2 times, most recently from 46c55c0 to c7c0dc7 Sep 1, 2017

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch 5 times, most recently from 53c0104 to 845e334 Sep 13, 2017

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch from 845e334 to 0bba8c8 Sep 22, 2017

@varshavaradarajan
Copy link
Contributor

left a comment

As discussed, the isApproved check is not necessary as it leads to some confusing inferences (cases like - agent was registered and then deleted, server is down ). Users can use the isConnectedToServer and the agent api to figure out the agent status.

if (isPassed()) {
return NanoHTTPD.newFixedLengthResponse(Status.OK, "text/plain; charset=utf-8", "OK!");
} else {
return NanoHTTPD.newFixedLengthResponse(Status.SERVICE_UNAVAILABLE, "text/plain; charset=utf-8", "Bad!");

This comment has been minimized.

Copy link
@varshavaradarajan

varshavaradarajan Sep 25, 2017

Contributor

I don't think this should be 503. If the agent is in pending state (i.e if /health/isConnectedToServer is false), then the service is available, its just that the agent is not connected to the server.

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch from 28f4eed to 292e0de Sep 25, 2017

@moritzheiber

This comment has been minimized.

Copy link

commented Sep 25, 2017

Honestly, the latest changes completely disregard the intent behind the issue. Now, again, one has to check in two different places at once to figure out whether an agent is healthy..

@ketan

This comment has been minimized.

Copy link
Member Author

commented Sep 25, 2017

Honestly, the latest changes completely disregard the intent behind the issue. Now, again, one has to check in two different places at once to figure out whether an agent is healthy.

I don't believe that's the case. Let me clarify:

Based on your comment (#3074 (comment)):

but that would just be icing on the cake, honestly. For now, a simple endpoint telling me "I'm alive, connected to a server and ready to execute builds (i.e. "active"), would suffice.

That's precisely what this PR will now do:

/isConnected endpoint that answers whether the agent is ready to build, i.e. it is:

  • able to connect over HTTP(s) AND
  • is approved to run builds, either by an admin or via auto-registration

The /isApproved which was added in the first commit has been removed because there's no way to answer that question until the agent is able to establish contact with the server. We can re-look at other endpoints based on feedback.

Is this not what you're expecting?

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch from 292e0de to 5d71aa2 Sep 25, 2017

@ketan

This comment has been minimized.

Copy link
Member Author

commented Sep 25, 2017

I've updated my initial comment(#3781 (comment)) to clearly document the behavior.

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch from 5d71aa2 to 6125e3b Sep 25, 2017

Health check api for agent (#3074)
This adds 2 endpoints for testing the health of the agent <-> server comm

* Perform a `GET` on `/health/{v1,latest}/isConnectedToServer`, will
  return status `200` along with text `OK!` if the agent is:
  - able to establish HTTP(s) contact
  - is authorized by the server (by an admin, or by auto-registration)
  The endpoint will return status `500` in any other cases.
  Two consecutive failing pings will be treated as having failed healthcheck.

These health checks will (by default) bind to port 8152 on localhost,
but can be configured by the following system properties:

* `go.agent.status.api.enabled`. Defaults to `true`. Set to false to
  disable health check api endpoint
* `go.agent.status.api.bind.host`. Defaults to `localhost`. Set to a
  specific ip address or hostname to bind to that host. Set to `0.0.0.0`
  to bind to all network interfaces.
* `go.agent.status.api.bind.port`. Defaults to `8152`. Set to `0` to
  use an ephemeral port, which will be displayed in a log statement.

In case the http server is unable to bind (usually because of port
conflicts, or multiple agents running on the same macchine), a warning
will be emitted to the console log and agent startup will continue.

@ketan ketan force-pushed the ketan:feature/agent-status-endpoint branch from 6125e3b to 97df2bf Sep 25, 2017

@moritzheiber

This comment has been minimized.

Copy link

commented Sep 25, 2017

Okay, that makes sense, thank you for clarifying.

@varshavaradarajan varshavaradarajan merged commit e22537b into gocd:master Sep 27, 2017

8 checks passed

build-linux-PR/build-non-server
Details
build-linux-PR/build-server
Details
build-windows-PR/build-non-server
Details
build-windows-PR/build-server
Details
installers-PR/dist
Details
license/cla Contributor License Agreement is signed.
Details
plugins-PR/build
Details
trigger/do-nothing
Details

@ketan ketan deleted the ketan:feature/agent-status-endpoint branch Sep 27, 2017

ketan added a commit to ketan/api.go.cd that referenced this pull request Oct 4, 2017

ketan added a commit to ketan/api.go.cd that referenced this pull request Oct 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.