This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Jolokia based readiness and liveness probes #170

Merged
merged 1 commit into from
Feb 6, 2018

wallrj
Member

@wallrj wallrj commented Dec 6, 2017

  • Load Jolokia agent when Cassandra node starts.
  • A golang client for getting local Cassandra node status
  • Linked to the Pilot readiness and liveness health functions.

You can see the source of nodetool status, which this replaces, here:

Part of #169

Release note:

NONE

@jetstack-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: wallrj

Assign the PR to them by writing /assign @wallrj in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@wallrj wallrj changed the title from "Install and load Jolokia" to "WIP: Jolokia based readiness and liveness probes" Dec 6, 2017
@jetstack-ci-bot
Contributor

@wallrj PR needs rebase

@wallrj wallrj changed the base branch from 23-cassandra to master December 14, 2017 11:18
@munnerz
Contributor

munnerz commented Jan 8, 2018

/area cassandra

wallrj added a commit to wallrj/navigator that referenced this pull request Jan 31, 2018
* Use the Docker maintained Cassandra 3 image.
* Temporarily use TCP connect based readiness and liveness probes because this image doesn't contain a nodetool based `/ready-probe.sh` script.
* I found that these TCP connection checks succeed even when the process is SIGSTOPped; so I now simulate a node failure using `nodetool decommission` which makes cassandra stop listening on its CQL port.
* Cluster status aware readiness probes will be restored when jetstack#170 is merged.

Fixes: jetstack#222
jetstack-ci-bot added a commit that referenced this pull request Jan 31, 2018
Automatic merge from submit-queue.

Use the Docker maintained Cassandra 3 image

* Use the Docker maintained Cassandra 3 image.
* Use TCP connect based readiness and liveness probe because this image doesn't contain a nodetool based `/ready-probe.sh` script.
* These TCP connection checks will succeed even though the process is stopped; so I now simulate a node failure using `nodetool decommission` which makes cassandra stop listening on its CQL port.
* I can improve on this when #170 is merged. 

Fixes: #222 

**Release note**:
```release-note
NONE
```
@jetstack-ci-bot
Contributor

/lgtm cancel //PR changed after LGTM, removing LGTM. @kragniz @munnerz @wallrj

Host string
ID uuid.UUID
State NodeState
Status NodeStatus
Contributor

What's the difference between State and Status? I see they have different values (above), but I'm not sure what each of them means. A brief comment here would help.

Member Author

I've added comments in the source code.
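
For readers without the source to hand, the distinction can be sketched as follows, assuming the types mirror the nodetool status legend (Status is Up or Down; State is Normal, Leaving, Joining, or Moving). The `NodeStatus` constants match those shown in this review. The `NodeState` values, the `classify` and `contains` helpers, and the use of a plain string for `ID` are assumptions for illustration rather than the code in this PR.

```go
package main

import "fmt"

// NodeStatus: is the node reachable from the point of view of the local node?
type NodeStatus string

// NodeState: where is the node in its membership lifecycle within the ring?
type NodeState string

const (
	NodeStatusUnknown NodeStatus = "Unknown"
	NodeStatusUp      NodeStatus = "Up"
	NodeStatusDown    NodeStatus = "Down"
)

// Assumed values, taken from the nodetool status legend.
const (
	NodeStateNormal  NodeState = "Normal"
	NodeStateLeaving NodeState = "Leaving"
	NodeStateJoining NodeState = "Joining"
	NodeStateMoving  NodeState = "Moving"
)

type Node struct {
	Host   string
	ID     string // the PR uses uuid.UUID; a string keeps this sketch dependency-free
	State  NodeState
	Status NodeStatus
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// classify (hypothetical) derives both fields from the endpoint lists that
// Cassandra's StorageService exposes: Status from the live and unreachable
// lists, State from the joining, leaving, and moving lists.
func classify(host string, live, unreachable, joining, leaving, moving []string) Node {
	n := Node{Host: host, Status: NodeStatusUnknown, State: NodeStateNormal}
	switch {
	case contains(live, host):
		n.Status = NodeStatusUp
	case contains(unreachable, host):
		n.Status = NodeStatusDown
	}
	switch {
	case contains(joining, host):
		n.State = NodeStateJoining
	case contains(leaving, host):
		n.State = NodeStateLeaving
	case contains(moving, host):
		n.State = NodeStateMoving
	}
	return n
}

func main() {
	// A node that is reachable but in the middle of decommissioning.
	n := classify("10.0.0.1", []string{"10.0.0.1"}, nil, nil, []string{"10.0.0.1"}, nil)
	fmt.Printf("%+v\n", n)
}
```

The two fields vary independently: a node can be Up while Leaving, or Down while still nominally Normal.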

}

type Interface interface {
Status() (NodeMap, error)
Contributor

Could we rename Status to Nodes? Status is becoming quite an overloaded term in this package.

Member Author

Good idea, but again, I wanted to try and match nodetool as closely as possible.
This is intended to give a structured version of the nodetool status output.
Let's leave it as is for now and I can refactor later when I've got a better feeling for what operations we need to perform.

Contributor

👍

NodeStatusUnknown NodeStatus = "Unknown"
NodeStatusUp NodeStatus = "Up"
NodeStatusDown NodeStatus = "Down"
)
Contributor

Will these strings be surfaced via our API (i.e. on a pilot.status field)?

If so, we should earmark them to move into pkg/apis/navigator/v1alpha1. Not important for merge though as afaict, they are not being used on API types currently.

jolokiaPort,
jolokiaContext,
),
},
Contributor

This is okay for now, but I think this environment variable should be set/managed by Pilot in future. We've got a similar question/issue coming up in the ES controller, as we need to see the heap size (via env var) too.

Contributor

(line comment went weird - this is specifically around setting the JAVA_OPTS env var)
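
For context, a value of the kind being discussed might be assembled roughly as below. This is a sketch under assumptions, not the code in the diff: the `jolokiaJavaOpts` helper, the agent jar path, and the host are hypothetical, and only the `jolokiaPort` and `jolokiaContext` names appear in the hunk above. The `-javaagent:<jar>=key=value,...` form and the `host`, `port`, and `agentContext` options are the Jolokia JVM agent's documented syntax.

```go
package main

import "fmt"

// jolokiaJavaOpts (hypothetical helper) builds the JVM flag that loads the
// Jolokia agent when the Cassandra process starts.
func jolokiaJavaOpts(agentPath, host string, port int, context string) string {
	return fmt.Sprintf(
		"-javaagent:%s=host=%s,port=%d,agentContext=%s",
		agentPath, host, port, context,
	)
}

func main() {
	// All values below are illustrative assumptions, not those used in the PR.
	opts := jolokiaJavaOpts("/path/to/jolokia-jvm-agent.jar", "127.0.0.1", 8778, "/jolokia")
	fmt.Println("JAVA_OPTS=" + opts)
}
```

If Pilot owned this variable, as suggested above, the string would be computed in one place rather than baked into each controller's pod template.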

@munnerz
Contributor

munnerz commented Feb 6, 2018

/lgtm
/approve

@jetstack-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kragniz, munnerz

@munnerz
Contributor

munnerz commented Feb 6, 2018

/retest

@jetstack-ci-bot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@jetstack-ci-bot
Contributor

Automatic merge from submit-queue.

@jetstack-ci-bot jetstack-ci-bot merged commit 6b49423 into jetstack:master Feb 6, 2018
@wallrj wallrj deleted the 169-install-jolokia branch February 7, 2018 09:35
5 participants