Docker engine swarm api service discovery #1766
brian-brazil added the kind/enhancement label Jul 13, 2016
This will need to be bypassed for Prometheus service discovery. We may want to wait for a release or two for this to stabilise before adding it, and ensure there's sufficient interest to justify the maintenance effort of another SD.
bvis commented Aug 2, 2016
This feature would be amazing. It would allow us to simplify some dependencies we currently need to manage to maintain a dynamic Prometheus environment.
michaelharrer commented Sep 7, 2016
You could use dns_sd_configs. It's a workaround, but functional.
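For anyone landing here: Swarm's built-in DNS resolves tasks.&lt;service&gt; to one A record per running task, which dns_sd_configs can scrape directly. A minimal sketch (the service name node-exporter and the shared overlay network are assumptions):

```yaml
scrape_configs:
  - job_name: node-exporter
    dns_sd_configs:
      - names:
          - tasks.node-exporter   # Swarm DNS: one A record per task (assumed service name)
        type: A
        port: 9100                # node_exporter's default port
        refresh_interval: 30s
```

Note that A records carry only task IPs, not node identity, which is the limitation raised below.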
Cas-pian commented Dec 30, 2016
Any progress on this? I'm really looking forward to using this feature.
genki commented Dec 30, 2016
@michaelharrer Unfortunately, there is no way to determine which node a node_exporter instance is running on. Only node_exporter itself knows, but there's no option to expose that in its metrics (prometheus/node_exporter#319).
joonas-fi commented Dec 30, 2016
I just hacked together a proof-of-concept that syncs tasks from the Swarm manager to Prometheus: https://github.com/function61/prometheus-docker-swarm. The current limitation is that Prometheus has to be running on a Swarm manager node.
bvis commented Dec 31, 2016 (edited)
@genki, @joonas-fi: I've updated the description of the image I created for getting the metrics: https://github.com/bvis/docker-prometheus-swarm. It's not perfect, but it is very useful and the best I've seen so far.

@joonas-fi I'll try your solution when I get some time; it's probably a better alternative. And you don't need to have it running on a swarm manager node if you expose the metrics to the cluster thanks to a proxy. A similar approach to:

Or:

That gets the docker swarm events and exposes them. On the other hand, I've tried it but didn't get it to work, as it tries to obtain the data from the ingress network instead of the specific network where both services are attached. I think you should allow defining that as well; do you want me to open an issue in your project?
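A rough sketch of the proxy idea being described, as a stack file: a socat service pinned to a manager node re-exposes the Docker socket over TCP on a shared overlay network, so a scraper elsewhere in the cluster can query the Swarm API (the image, network name, and port are assumptions):

```yaml
version: "3"
services:
  docker-proxy:
    image: alpine/socat            # assumed socat image
    # Forward TCP 2375 on the overlay network to the local Docker socket
    command: TCP-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - monitoring                 # assumed overlay network shared with the scraper
    deploy:
      placement:
        constraints:
          - node.role == manager   # the Swarm API only answers on managers
networks:
  monitoring:
    driver: overlay
```

A scraper on the same network can then reach the Swarm API at http://docker-proxy:2375 by service name, without publishing the port publicly.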
genki commented Jan 3, 2017 (edited)
@bvis I have implemented your second suggestion: genki@2f49d37. It injects the meta labels "__domain", "__service", "__task" and "__host" after query execution, using the Docker API.
bvis commented Jan 4, 2017 (edited)
@genki Do you have a prometheus image ready for use? It works! At least it's a first approach to a system that provides the host. Nice work! What I've seen is that these values do not appear in the "Console" column; that's why I didn't see them. If you fix that, it would be nice to have a public image with your changes. Could this be acceptable as a PR in this project?
genki commented Jan 4, 2017 (edited)
@bvis Thank you for reporting :)
joonas-fi commented Jan 4, 2017
@bvis: oh man, thanks for the tip regarding creating a service that exposes the manager Docker socket (via constraint) over TCP. I didn't think of that as a way to loosen the requirement of running it on a Swarm node. :) I will make the Docker URL given to the Docker client configurable, as you pointed out!

I'm not sure what you mean by "as it tries to obtain the data from the ingress network". To my understanding the ingress network is only for published ports and the routing mesh. So if you publish the socat port, it will be public and therefore visible both from the ingress network AND via the container's IP itself. Publishing seems unnecessary, as the port shouldn't be public anyway (security issue), and you can reach the socat service just by its name without the port being public (provided that the socat service and monitoring are on the same network), if I understand correctly. :)

I haven't given much thought/research to services running on different networks (business services and monitoring on different networks). Currently my assumption is that everything's running on the same network. I'll document that caveat. It might be easy to implement, I just don't know yet.

Just to be super clear to everyone, mine and @bvis's projects achieve different things:
bvis commented Jan 4, 2017
@genki The problem I see with your solution is that it does not allow filtering any query based on these values, so I cannot use it in my dashboard to get values from one or some hosts.

@joonas-fi You are right that it's unnecessary to publish the ports of the exporters in the routing mesh; I used that just for debugging purposes. The moment I removed the "--publish" option on cadvisor and node-exporter, your system started to scrape the values correctly. But to use it under different environments and conditions, I suggest you implement the network-selection feature. Another suggestion: it would be better to split your "docker-prometheus-bridge" binary into another image to allow process isolation; with both services running in the same container, problems could come up. Or try to add it to prometheus itself. On the other hand, my dashboard shows the container metrics cadvisor provides, and it's easy to extend. It would be good to allow me to create issues in your project for better follow-up.

And a 3rd option: create a prometheus fork adding both your features, @joonas-fi and @genki. It could be very useful until the Prometheus project adds support for Docker swarm service discovery, or maybe they could accept your changes; that's one of the best things about the open source model. ;)
genki commented Jan 4, 2017 (edited)
@bvis The injected labels are only usable for things like legend labels, because they are not real labels in the scope of a query. Prometheus uses labels as the identifier of targets, so inserting something into them causes duplication of targets when containers are recreated. My motivation was just to use the injected labels as legend labels in Grafana, like "{{__host}}".
Tharnas referenced this issue Nov 22, 2017: Service discovery for cadvisor and node-exporter #80 (merged)
jmendiara commented Nov 23, 2017 (edited)
Based on the Swarm discovery from @ContainerSolutions, I've coded a PoC that is working OK in our staging env:

It takes some of the great ideas from the original solution, but tries to fit better in a deployment where prometheus is executed on a (dedicated) swarm worker without mounting shared volumes between workers/masters (which is fairly complex with some cloud providers), and it provides more swarm metadata. It also removes the "autoconnection to swarm networks" feature, leaving that responsibility to the swarm operator that interconnects services (although this feature can easily be brought back).

The original motivation was using the hostname from the worker as a label. The client/server duality could be simplified by dropping the client completely if prometheus implemented a generic SD interface.

Please let me know what you think about this approach.
llitfkitfk commented Dec 30, 2017
cuigh commented Jan 8, 2018 (edited)
After several months of waiting, I have implemented a simple Swarm discovery in my fork repo; maybe you guys also need it:

Or download the image directly:
Configuration

For prometheus:

```yaml
- job_name: swarm
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  swarm_sd_configs:
    - api_server: http://docker-proxy:2375
      # api_version: 1.32
      # group: xxx
      # network: xxx
      # refresh_interval: 10s
      # timeout: 15s
  relabel_configs:
    # Add a service label
    - source_labels: [__meta_swarm_service]
      target_label: service
    # Add a node ip label
    - source_labels: [__meta_swarm_node_ip]
      target_label: node_ip
    # Add a node name label
    - source_labels: [__meta_swarm_node_name]
      target_label: node_name
```

For Swarm services, you can add several labels to control scraping:
krjensen commented Jan 18, 2018
We really need this as well. Could we get an indication from the Prometheus team whether they want to include the functionality provided by @cuigh?
I'm closing the issue as, unfortunately, we are currently not accepting new integrations. ContainerSolutions/prometheus-swarm-discovery is listed in the Prometheus documentation to integrate Docker Swarm via the file service discovery.

We can only provide the stability and performance we want to provide if we can properly maintain the codebase. This includes, amongst others, testing integrations in an automated and scalable fashion. For this reason, we suggest people integrate with the help of our generic interfaces. We have an integrations page on which integrations using our generic interfaces are listed.

Even if existing integrations can not be tested in an automated fashion, we will not remove them for reasons of compatibility. This also means that any additions we take on, or any changes to existing integrations we make or accept, will mean maintaining and testing these until at least the next major version, realistically even beyond that.

Feel free to question this answer on our developer mailing list, but be aware it's unlikely that you will get a different answer.
simonpasquier closed this Aug 1, 2018
bborysenko commented Aug 2, 2018 (edited)
Be aware that ContainerSolutions/prometheus-swarm-discovery is not yet ready for production usage, due to file descriptor leaks (ContainerSolutions/prometheus-swarm-discovery#9).
joonas-fi commented Dec 20, 2018 (edited)
I updated my old proof of concept to use a better strategy: https://github.com/function61/promswarmconnect

Previously it used the file service discovery type, dynamically writing the file to disk based on the info in Swarm. Its drawback was that we had to make changes to the Prometheus container, overriding the entrypoint and launching the file-synchronizer binary AND Prometheus. This is not robust, because we would have had to write logic to deal with either of the binaries crashing.

My new approach emulates the API of the existing Triton service discovery, so we can run the released Prometheus container from Docker Hub 100% unchanged. All you have to do is write configuration for the Triton SD in the Prometheus config file.
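For reference, the integration point this targets is Prometheus' stock triton_sd_configs. A sketch of what pointing it at such a shim could look like (the service name, the ignored account value, and the TLS handling are assumptions here; check the promswarmconnect README for the real values):

```yaml
scrape_configs:
  - job_name: swarm-services
    triton_sd_configs:
      - account: anything            # required by the Triton SD schema; assumed ignored by the shim
        dns_suffix: promswarmconnect # assumed service name of the shim on the shared network
        endpoint: promswarmconnect   # assumed: the shim answers the Triton discovery API here
        version: 1
        tls_config:
          insecure_skip_verify: true # assumed: the shim serves a self-signed certificate
```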
SuperQ reopened this Dec 22, 2018
Docker Swarm Mode is popular enough that we can make an exception to the SD moratorium. We also previously discussed adding support for it, according to @brian-brazil.
I see no reason to make any exceptions; we continue to have issues maintaining what we already have. We also previously decided not to support it, and it sounds like what exists now is not what existed then.
pdambrauskas commented Feb 12, 2019
Why continue supporting other SDs then? @brian-brazil, can you close this issue if your team is not considering accepting an implementation for it? I'm going with your suggested file-based configuration for now :)
I highly disagree with this all-or-nothing approach and the indefinite moratorium on features/improvements to one of our core features. I think we should accept an implementation of Swarm SD as long as it's reasonably well done.
pdambrauskas commented Feb 12, 2019
It is not "all or nothing"; I see it more like a responsibility split. You could keep existing integrations in the default bundle of prometheus, but make it possible to add other integrations as plugins. (Like I mentioned, in Kafka's case you just add your implementation to the classpath and set a config property.)
joonas-fi commented Feb 12, 2019
I think the file-based approach suffers from the fact that you need to have the SD agent write files in the same container Prometheus is in. Problems:

A cleaner approach would be to make the file SD also support HTTP URLs from which the discovery file is fetched. Whether this is a new SD plugin or a change to the file SD plugin is beside the point; this remote file (maybe JSON) support would get us very far, and the actual SD implementation would be a separate app/container.
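For context, the generic interface under discussion is file_sd_configs plus a target file that an external agent rewrites; Prometheus picks up changes to the file automatically. A minimal sketch of the two pieces (paths and labels are placeholders):

```yaml
# prometheus.yml fragment: watch a target file maintained by an external SD agent
scrape_configs:
  - job_name: swarm
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/swarm.yml   # assumed path the agent writes to
        refresh_interval: 30s
---
# /etc/prometheus/targets/swarm.yml: file_sd accepts YAML (or JSON) lists of target groups
- targets: ["10.0.0.5:9100", "10.0.0.6:9100"]   # placeholder task addresses
  labels:
    swarm_service: node-exporter                # placeholder label
```

The complaint above still stands: both files must be visible inside Prometheus' container, which is exactly the shared-volume/sidecar requirement being objected to.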
pdambrauskas commented Feb 12, 2019
I agree an API SD source would be much easier to integrate with than a file-based one. However, it forces you to run an extra adapter service for your integration. Having the adapter on the Prometheus side looks a bit nicer, but as you mentioned, then we'd need to have a custom image or a mounted libs folder.
There was talk a while back about a "standard discovery API" within the CNCF. AFAIK this work is stalled. Having file_sd_configs support URLs is something I've proposed several times.
@pdambrauskas unfortunately there is no practical plugin option in Go; otherwise I guess that pluggable SD would have been done a long time ago...
darkl0rd commented Feb 23, 2019
Have you guys seen cuigh's post above? His implementation (https://github.com/cuigh/prometheus) is complete, fully integrated and confirmed working. Considering that he already did all the heavy lifting, why not simply integrate his implementation? By the looks of things he even seems more than happy to maintain it.
I think it would be great. @cuigh Would you be willing to open a PR to add it?
WTFKr0 commented Mar 5, 2019
Hey, I just tested the @cuigh fork and it fills my needs, but I'd prefer to stay on the main prom repo. As I understand it, the right solution is to create a new custom SD mechanism like the example here: https://github.com/prometheus/prometheus/tree/master/documentation/examples/custom-sd, using the code from the @cuigh fork. Has anybody started working on this?
I would propose re-submitting #3687 as a new PR. We can take an official vote on the Prometheus developers list to decide if it's good enough to merge, rather than having one person on prometheus-team object.
cuigh commented Mar 6, 2019
I still don't think it's a good idea to implement swarm_sd based on file_sd, unless HTTP is supported in file_sd.
WTFKr0 commented Mar 6, 2019
Yeah, agreed.
See https://prometheus.io/blog/2018/07/05/implementing-custom-sd/
No, we don't want to remove SD from the core. We do want to make it easier to add new methods outside the core.
WTFKr0 commented Mar 11, 2019
OK, so who can resubmit the PR for a vote? @cuigh, I would like to improve some of the code in your fork; can you enable issues on your fork so we can exchange on that?
We discussed this at our monthly meeting today; the moratorium remains. Currently we're awaiting integration testing for a good swathe of our existing SDs, and any new SD would be expected to follow in their footsteps.
joonas-fi commented Mar 11, 2019
@brian-brazil could you then please add HTTP support to the file SD (so the SD JSON can be fetched over HTTP), so we'd at least get a clean point of integration for adding SD agents running outside of Prometheus' container? See the use case of https://github.com/function61/promswarmconnect - this would be much cleaner if it could produce JSON compatible with the file SD agent!
We have a moratorium on new SDs, and we already have a clean generic interface for integrations.
joonas-fi commented Mar 11, 2019
That interface just passes complexity management on to the users. With that interface I need to have the SD binary (let's say promswarmconnect) running either:

I ask again: is all this complexity justified just because you don't want to add remote JSON support to the file SD? I can totally understand not wanting to add 138 different SD plugins for the trendiest service platform of the week, but we're asking for an olive branch here, because what you're suggesting is far from elegant, and especially not in line with the microservice philosophy which Prometheus otherwise fits so elegantly.

TL;DR: a generic HTTP-based SD integration is the only elegant way we'll be able to build SD integrations outside of Prometheus' tree.
The sidecar model is pretty standard, and not something you can really avoid if you're using Prometheus. We assume a POSIX system, and that includes processes being able to share filesystems, send each other signals, etc.

I've done it in the past; the bash scripting is a little finicky, but it's quite doable, especially if you can use a non-ancient version of bash.

I disagree here, and there are many out there that build fine on what we have. Writing code and deploying it are separate concerns, and I don't think we should be adding features just because one particular deployment system happens to lack a basic feature.
cuigh commented Mar 12, 2019 (edited)
The PR was already merged, and I enabled the issues setting also. Thanks.

F21 commented Jun 26, 2016 (edited)
In docker 1.12, the docker engine will ship with swarm mode built in. This means that it is now possible to stand up a swarm cluster using a bunch of nodes with just docker installed. In addition, swarm mode comes with DNS and health checks built in, negating the need to run consul or some other service discovery mechanism. More info here: https://docs.docker.com/engine/swarm/

It would be nice if prometheus could directly use the new services API to discover services running in a swarm cluster: https://docs.docker.com/engine/reference/api/docker_remote_api_v1.24/#3-8-services

Perhaps the config option could be called docker_swarm_sd.
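Purely as an illustration of the proposal (no such option exists in Prometheus in this thread's timeframe; every field below is invented, loosely mirroring the shape of other SD configs):

```yaml
scrape_configs:
  - job_name: swarm
    docker_swarm_sd_configs:              # hypothetical stanza, not a real option
      - host: unix:///var/run/docker.sock # or a tcp:// socket proxy on a manager node
        refresh_interval: 30s
```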