New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding prometheus endpoints on port 7777 for worker and master #35
Conversation
i'm curious if there is an advantage to using agent-bond over the prometheus official jmx exporter? it seems like the prometheus version has more recent activity: is there a detail i'm missing here? |
Its the same thing. It exposes all jmx beans to export the csv format that prometheus can scrap. |
should we be using the official prometheus exporter then since it appears to have more recent work? (it seems like agent-bond hasn't been touched in awhile) |
I could swap out the agent. Let me see. |
i'm just curious if we should be using one or the other, i don't know much about the upstream development. thanks for checking! |
I was thinking to go with agent-bond cause the maintainer is someone we can ping internally. Would it be good to send an email to ask him? |
i think it would be worthwhile to get in touch, if only to understand the state of the projects. edit: also, i agree, having someone to ping is a great feature =) |
@rhuss What is the state of the agent-bond project. Can we use it for our project which will expose prometheus endpoint in openshift pod? |
@zmhassan agent-bond is pretty stable and used in our community base images at https://github.com/fabric8io-images/java and it works quite nicely. Its community supported but afair agent-bond did not made it to the product, where jolokia only is used. |
Could be that an update of the included agents in agent-bond might be needed, but thats should not be a big deal. |
@rhuss thanks for the responses, do you know if there is any difference between the agent-bond exporter and the prometheus exporter? and is there any advantage to one over the other? |
Agent-bond includes jmx_exporter. Actually agent-bond's sole purpose is to have only a single agent instead of two when using both jolokia and jmx_exporter, simplifying the configuration. Its a simple delegate to the entry points to each of the agents, with a configuration which also dispatches to both agents. Agent-bond could be easily extended to more agents. So if you want tmore flexibility you should add both agents separately (so you can update them separately and cutting down deps). If you want a simplified setup, you can use agent-bond (but you would need to update agent-bond when you want to update either jolokia or jmx_exporter). |
ok, makes perfect sense to me now. thanks for the in-depth explanation @rhuss! |
this PR lgtm, i have not tested against a prometheus setup but it seems reasonable. thanks @zmhassan |
@elmiko I've tested this against prometheus and it works. If there is nothing else left then please merge. If you would like to give it a test run. You can use this template: |
@elmiko if there is nothing else left then lets merge this PR |
lgtm too. I think the only issue here is to figure out how to tag and merge it. If we have folks already using jolokia-based metrics with the openshift-spark image, then merging this would break their application, no, because we swith from jolokia to prometheus and switch ports but by default it would still be tagged opensihft-spark:2.2-latest. Our images have been identified primarily by spark version (2.2), we need to figure out how to label this. |
build-spark-push.sh
Outdated
@@ -0,0 +1,6 @@ | |||
docker build -t openshift-spark . | |||
export SPARK_IMAGE=docker.io/analyticsci/openshift-spark:v2.1.0-metrics-prometheus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this has PR has been waiting for awhile, spark has moved to 2.2.0. This tag should probably change to 2.2.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make the following changes I think:
- remove the jolokia jar, it's not used anymore
- metrics/metrics/properties is not needed, the identical file already exists in scripts/spark/added
Do the following so that the metrics configuration can be overridden as part of the spark config using a config map (these changes will cause the following files to ultimately live in $SPARK_HOME/conf where they can be overridden using the configmap mechanism)
- move metrics/agent.properties to scripts/spark/added
- move metrics/config.yaml to scripts/spark/added/agent-config.yaml
This will make the configuration files that govern prometheus metrics configurable via a configmap containing spark configuration files (just like other spark configs) Also, remove the jolokia jar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks reasonable to me
Carry jolokia as the default with a deprecation message and support prometheus as well. This will be backward compatible for any users currently using jolokia metrics.
willb and I discussed and decided the best course was to default but deprecate jolokia for some time, then ultimately carry only prometheus going forward. This should accomplish that (and the jar size is small). I tested this with a locally built image and passed it to the template with -p SPARK_IMAGE, it worked in all cases. The only outstanding question is default image values in the template and in the build/push script. |
spark-metrics-template.yaml
Outdated
parameters: | ||
- name: SPARK_IMAGE | ||
description: Name of the Spark master/worker image | ||
value: analyticsci/openshift-spark:v2.1.0-metrics-jolokia | ||
value: analyticsci/openshift-spark:v2.1.0-metrics-prometheus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should change to just a analyticsci/openshift-spark:v2.2.0 value I think, since metrics will be carried in all of our images. In fact, why not radanalytics.io/openshift-spark:2.2-latest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that
Since these templates are here for general consumption, set the default image value to our official upstream image (it can always be overridden locally)
Added agent to expose prometheus endpoint for spark master and workers when environment variable "SPARK_METRICS_ON" is enabled.