Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding prometheus endpoints on port 7777 for worker and master #35

Merged
merged 7 commits into from Jan 2, 2018
Merged

Adding prometheus endpoints on port 7777 for worker and master #35

merged 7 commits into from Jan 2, 2018

Conversation

zak-hassan
Copy link
Member

Added agent to expose prometheus endpoint for spark master and workers when environment variable "SPARK_METRICS_ON" is enabled.

@elmiko
Copy link
Contributor

elmiko commented Nov 9, 2017

i'm curious if there is an advantage to using agent-bond over the prometheus official jmx exporter?

it seems like the prometheus version has more recent activity:
https://github.com/fabric8io/agent-bond
https://github.com/prometheus/jmx_exporter

is there a detail i'm missing here?

@zak-hassan
Copy link
Member Author

Its the same thing. It exposes all jmx beans to export the csv format that prometheus can scrap.

@elmiko
Copy link
Contributor

elmiko commented Nov 9, 2017

should we be using the official prometheus exporter then since it appears to have more recent work? (it seems like agent-bond hasn't been touched in awhile)

@zak-hassan
Copy link
Member Author

I could swap out the agent. Let me see.

@elmiko
Copy link
Contributor

elmiko commented Nov 9, 2017

i'm just curious if we should be using one or the other, i don't know much about the upstream development. thanks for checking!

@zak-hassan
Copy link
Member Author

I was thinking to go with agent-bond cause the maintainer is someone we can ping internally. Would it be good to send an email to ask him?

@elmiko
Copy link
Contributor

elmiko commented Nov 9, 2017

i think it would be worthwhile to get in touch, if only to understand the state of the projects.

edit: also, i agree, having someone to ping is a great feature =)

@zak-hassan
Copy link
Member Author

@rhuss What is the state of the agent-bond project. Can we use it for our project which will expose prometheus endpoint in openshift pod?

@rhuss
Copy link

rhuss commented Nov 10, 2017

@zmhassan agent-bond is pretty stable and used in our community base images at https://github.com/fabric8io-images/java and it works quite nicely. Its community supported but afair agent-bond did not made it to the product, where jolokia only is used.

@rhuss
Copy link

rhuss commented Nov 10, 2017

Could be that an update of the included agents in agent-bond might be needed, but thats should not be a big deal.

@elmiko
Copy link
Contributor

elmiko commented Nov 10, 2017

@rhuss thanks for the responses, do you know if there is any difference between the agent-bond exporter and the prometheus exporter?

and is there any advantage to one over the other?

@rhuss
Copy link

rhuss commented Nov 10, 2017

Agent-bond includes jmx_exporter. Actually agent-bond's sole purpose is to have only a single agent instead of two when using both jolokia and jmx_exporter, simplifying the configuration. Its a simple delegate to the entry points to each of the agents, with a configuration which also dispatches to both agents.

Agent-bond could be easily extended to more agents.

So if you want tmore flexibility you should add both agents separately (so you can update them separately and cutting down deps). If you want a simplified setup, you can use agent-bond (but you would need to update agent-bond when you want to update either jolokia or jmx_exporter).

@elmiko
Copy link
Contributor

elmiko commented Nov 10, 2017

ok, makes perfect sense to me now. thanks for the in-depth explanation @rhuss!

@elmiko
Copy link
Contributor

elmiko commented Nov 28, 2017

this PR lgtm, i have not tested against a prometheus setup but it seems reasonable. thanks @zmhassan

@zak-hassan
Copy link
Member Author

zak-hassan commented Nov 29, 2017

@elmiko I've tested this against prometheus and it works. If there is nothing else left then please merge. If you would like to give it a test run. You can use this template:

https://raw.githubusercontent.com/zmhassan/openshift-spark/0e06f634c62fc08443892dd378a67d8abab33628/spark-metrics-template.yaml

@zak-hassan
Copy link
Member Author

@elmiko if there is nothing else left then lets merge this PR

@tmckayus
Copy link
Collaborator

tmckayus commented Dec 5, 2017

lgtm too. I think the only issue here is to figure out how to tag and merge it. If we have folks already using jolokia-based metrics with the openshift-spark image, then merging this would break their application, no, because we swith from jolokia to prometheus and switch ports but by default it would still be tagged opensihft-spark:2.2-latest.

Our images have been identified primarily by spark version (2.2), we need to figure out how to label this.

@zak-hassan zak-hassan changed the title Adding prometheus endpoints on port 9779 for worker and master Adding prometheus endpoints on port 7777 for worker and master Dec 12, 2017
@zak-hassan
Copy link
Member Author

@tmckayus @elmiko I've changed to port to 7777 that way when you deploy metrics then we don't need to change the template.

@@ -0,0 +1,6 @@
docker build -t openshift-spark .
export SPARK_IMAGE=docker.io/analyticsci/openshift-spark:v2.1.0-metrics-prometheus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this has PR has been waiting for awhile, spark has moved to 2.2.0. This tag should probably change to 2.2.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

Copy link
Collaborator

@tmckayus tmckayus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make the following changes I think:

  • remove the jolokia jar, it's not used anymore
  • metrics/metrics/properties is not needed, the identical file already exists in scripts/spark/added

Do the following so that the metrics configuration can be overridden as part of the spark config using a config map (these changes will cause the following files to ultimately live in $SPARK_HOME/conf where they can be overridden using the configmap mechanism)

  • move metrics/agent.properties to scripts/spark/added
  • move metrics/config.yaml to scripts/spark/added/agent-config.yaml

This will make the configuration files that govern prometheus
metrics configurable via a configmap containing spark
configuration files (just like other spark configs)

Also, remove the jolokia jar
Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks reasonable to me

Carry jolokia as the default with a deprecation message and
support prometheus as well. This will be backward compatible
for any users currently using jolokia metrics.
@tmckayus
Copy link
Collaborator

tmckayus commented Dec 14, 2017

@elmiko @willb @zmhassan ptal

willb and I discussed and decided the best course was to default but deprecate jolokia for some time, then ultimately carry only prometheus going forward. This should accomplish that (and the jar size is small).

I tested this with a locally built image and passed it to the template with -p SPARK_IMAGE, it worked in all cases. The only outstanding question is default image values in the template and in the build/push script.

parameters:
- name: SPARK_IMAGE
description: Name of the Spark master/worker image
value: analyticsci/openshift-spark:v2.1.0-metrics-jolokia
value: analyticsci/openshift-spark:v2.1.0-metrics-prometheus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should change to just a analyticsci/openshift-spark:v2.2.0 value I think, since metrics will be carried in all of our images. In fact, why not radanalytics.io/openshift-spark:2.2-latest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that

Since these templates are here for general consumption, set
the default image value to our official upstream image (it
can always be overridden locally)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants