Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cassandra: JMX prometheus exporter #3228

Merged
merged 1 commit into from
Feb 3, 2020

Conversation

nabokihms
Copy link
Contributor

@nabokihms nabokihms commented May 28, 2019

Export prometheus metrics for cassandra in sidecar container

Signed-off-by: Maksim Nabokikh maksim.nabokikh@flant.com

Description of your changes:
Add jmx exporter with standard configuration to sidecar container.

Test steps:

  • Start minikube with tests/scripts/minikube.sh up
  • Deploy cassandra from cluster/examples/kubernetes/cassandra
  • Get the metrics with curl

This dashboard seems ok https://grafana.com/dashboards/5408 for monitoring

Which issue is resolved by this Pull Request:
Resolves #2530

Checklist:

  • Documentation has been updated, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md

[test cassandra]

@nabokihms nabokihms force-pushed the cassandra-jmx-metrics branch 3 times, most recently from 7415fb1 to d02f91f Compare July 4, 2019 13:09
Copy link
Member

@galexrt galexrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if you could verify the Prometheus annotations position with, e.g., prometheus-operator to see that Prometheus picks the Pods up.

pkg/operator/cassandra/controller/util/resource.go Outdated Show resolved Hide resolved
@nabokihms
Copy link
Contributor Author

There is new option in cluster.cassandra CRD, which provides the ability to specify ConfigMap for jmx-prometheus-exporter for Cassandra Rack.

@nabokihms
Copy link
Contributor Author

@travisn @galexrt I will appreciate any feedback for this PR.

@yanniszark
Copy link
Contributor

@nabokihms this is excellent!
Thanks a lot for your effort, I just found out about this PR.
I will put some cycles of my spare time in reviewing this PR so we can get it merged asap 😄

Copy link
Contributor

@yanniszark yanniszark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nabokihms thanks a lot for your work!
I have a few comments on the sidecar code, but other than that it looks great.

In addition, ConfigMap updates are currently not handled. This means that if a ConfigMap is updated, changes will be propagated but the Pods won't be restarted (which I assume is necessary for jmx_exporter to pick up the changes).

We can either:

  1. Implement an update mechanism. For example, save the ConfigMap SHA-256 hash in a label of each StatefulSet's PodTemplate. This way, a change in the ConfigMap hash will trigger a rolling update across all Racks.
  2. Leave that for later and instead document how to change the ConfigMap manually. The process is
    • Apply new ConfigMap.
    • Rolling restart all Racks:
      statefulsets=[...]
      for name in statefulsets:
          kubectl rollout restart statefulset $name -n $namespace

pkg/operator/cassandra/sidecar/config.go Outdated Show resolved Hide resolved
pkg/operator/cassandra/test/test.go Outdated Show resolved Hide resolved
@yanniszark
Copy link
Contributor

@nabokihms any update on this?
Do you plan to continue the work on this PR? Would be awesome to get this in! 😄

@nabokihms
Copy link
Contributor Author

@yanniszark I will finish this till weekends.

@travisn
Copy link
Member

travisn commented Dec 3, 2019

@nabokihms How is this looking? We are starting to lock down for the 1.2 release by the end of the week. Thanks!

@nabokihms
Copy link
Contributor Author

Thanks for informing me. We want to merge it in 1.2. I will try my best to end work on the PR till tomorrow.

@nabokihms nabokihms force-pushed the cassandra-jmx-metrics branch 2 times, most recently from 5ae2029 to 24f17c7 Compare December 4, 2019 23:17
@nabokihms
Copy link
Contributor Author

I made the necessary changes and also updated the docs to make difficult cases clear.

@paulcostinean
Copy link

paulcostinean commented Jan 16, 2020

Is this getting any more love soon? Would be nice to have the option to use the jmx exporter and it looks like 90% of the work is already done 🚀

@yanniszark
Copy link
Contributor

@paulcostinean thanks for the ping! :)
I'll try to take a look this weekend.

@yanniszark
Copy link
Contributor

yanniszark commented Jan 25, 2020

@nabokihms thanks a lot for your effort!
Overall, this looks ready to merge, only very small things remaining:

  • Rebase on latest master and resolve merge conflict of Release Notes.

  • Since this PR introduces a new field, you have to regenerate the Cassandra client code (Deepcopy methods). The procedure I followed is:

rm -rf vendor
GO111MODULE=off make vendor
GO111MODULE=off make codegen
  • The test.go file has a newline has changed (newline) while it shouldn't be touched by this PR. Can you revert that change?

  • Add some commands to help the user restart their Cassandra Cluster in the event of a ConfigMap change:

NAMESPACE=<namespace>
CLUSTER=<cluster_name>

RACKS=$(kubectl get sts -n ${NAMESPACE} -l "cassandra.rook.io/cluster=${CLUSTER}")
echo ${RACKS} | xargs -n1 kubectl rollout restart -n ${NAMESPACE}

To make it more comfortable, I included these changes in a review branch, so you can see what I mean:
https://github.com/yanniszark/rook/tree/cassandra-jmx-metrics-review

After that, we should be ready to merge!

@ghost
Copy link

ghost commented Jan 25, 2020

There were the following issues with this Pull Request

  • Commit: 7d9a62b
    • ✖ message may not be empty
    • ✖ type may not be empty

You may need to change the commit messages to comply with the repository contributing guidelines.


🤖 This comment was generated by commitlint[bot]. Please report issues here.

Happy coding!

@nabokihms
Copy link
Contributor Author

@yanniszark Thanks a lot for the PR review and especially for the review branch! It helps a lot. I merged all the changes to the current branch.

Copy link
Contributor

@yanniszark yanniszark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nabokihms thanks for all the great work!
I tested and everything looks good.
You just have a small bug that was caught by a unit test (that's why the CI was failing).

More specifically, you need to check if the jmxExporterConfigMapName field is nil before dereferencing.
I have included comments in the relevant parts.

Other than that, we're good to go!
/lgtm
/approve

If needed, I will approve again after the necessary changes.
Let's get this through the door!

pkg/operator/cassandra/controller/util/resource.go Outdated Show resolved Hide resolved
pkg/operator/cassandra/controller/util/resource.go Outdated Show resolved Hide resolved
@yanniszark
Copy link
Contributor

I see errors in two tests: TestCephHelmSuite and TestCephBlockSuite.
I suspect those are flakes and shouldn't be failing as the code changes in this PR are irrelevant to them.
@nabokihms can you make sure this is rebased on latest master?

cc @travisn @galexrt

Export prometheus metrics for cassandra in sidecar container

Signed-off-by: Maksim Nabokikh <maksim.nabokikh@flant.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[cassandra] Add support for Prometheus/Grafana monitoring
5 participants