Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SD interface and binary for spitting out file-sd #3658

Closed
gouthamve opened this Issue Jan 5, 2018 · 33 comments

Comments

Projects
None yet
6 participants
@gouthamve
Copy link
Member

gouthamve commented Jan 5, 2018

We get a lot of requests and PRs for new SD mechanisms and the current moratorium doesn't help people wanting new SD mechanisms.

We need a simple daemon that people can just plug in their SD implementation and it spits out file-sd. For example, the main file could be simply:

// boilerplate that initializes everything like flags.

sd.Register(NoopSD())

sd.Run(outputfile, other options)

And people can just run: ./sd --output.file=<path>

And they can write their implementation and swap out sd.Register(NoopSD()) with something like:

// More flags for the SD configs.
sdImpl := NewCustomSD(options...)
sd.Register(sdImpl)

And then they can have file-sd like: ./sd <custom-flags for SD config> --output.file=<path>

Now if this file-sd mechanism is being used by enough people, adding it to prometheus-server would be simple as its the same interface.

/cc @cstyan

@gouthamve gouthamve changed the title SD interface and binary SD interface and binary for spitting out file-sd Jan 5, 2018

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Jan 5, 2018

started

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 5, 2018

The SD provider/Discoverer should send only the updates and not all target groups on every update. in my tests currently Consul sends all updates where k8s send only the updates.

@brancz

This comment has been minimized.

Copy link
Member

brancz commented Jan 5, 2018

We still want to keep the existing SDs compiled in to the release binary, right? This is primarily to have a better answer than "implement a file sd generator", if I understand correctly.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jan 5, 2018

Yes. Yes, this is a potential path to make things easier to swap in/out of Prometheus.

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Jan 14, 2018

Should the SD implementations that people write as part of using the sd tool follow the same rule of only sending updates on the channel, and not the entire list of targets/labels?

@gouthamve

This comment has been minimized.

Copy link
Member Author

gouthamve commented Jan 14, 2018

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 28, 2018

how about using go plugins for this?
https://golang.org/pkg/plugin/

just pass the plugin name when you start prometheus using an argument like --discovery-plugin='docker-swarm.so'
In theory the prometheus codebase shouldn't need any changes after the initial implementation.

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 28, 2018

or another idea is to use local socket communication and the Discoverer can be written on any language and just send its targets on the socket.

Not sure how it will work on Windows though.
maybe tcp sockets can be used for a cross platform solution.

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 28, 2018

and another idea is to use grpc

@gouthamve

This comment has been minimized.

Copy link
Member Author

gouthamve commented Jan 28, 2018

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 28, 2018

I don't know just seems a bit awkward approach.

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 28, 2018

might be worth some cons and pros for the different approaches before taking a final decision.

one con I see is that file based SD assumes the Discoverer has access to the same file system as Prometheus.

one Pro for grps is that it is quite popular and this might make it more clear how to use and implement.
Just an example proto file which can be used to generate client for any language that support grpc.

prometheus can provide a single grpc endpoint and any Discoverer would call that to update the targets list.

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 28, 2018

What do you think if I start a google doc and ask for some feedback from all users that have opened a PR/issue to request for a new Discoverer implementation?

The idea is to get some real use-cases and ideas how to solve this with the absolute minimum work needed to implement and minimum support overhead.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jan 28, 2018

The plugin discussion is over in #2037. I remain against plugins, having a clear demarcation point makes support much easier. Files are what we should use here, not inventing another generic integration point.

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 30, 2018

@cstyan the way I understand @gouthamve's suggestion and my personal preference is that this tool would just spit out a file with the targets and should not be coupled with the Discovery manager in any way.

The only resemblance or a requirement would be to implement the Discoverer interface so that if it gets popular it can be plugged in with a minimal effort.

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Jan 30, 2018

@krasi-georgiev we discussed further on irc, but I had initially implemented the way you're referring to. However, we realized that Manager was already doing a lot of the heavy lifting to track and store target groups/labels properly, and dedup some things. Note that the adapter makes use of it's own instance of Manager and not the same Manager as your Prometheus process. The file that the adapter writes can then be picked up by Prometheus via the config.

IMO doing all of this properly without using Discovery.Manager would result in a lot of duplicated logic. Unless there's something I'm missing, I think we should proceed with #3720

@krasi-georgiev

This comment has been minimized.

Copy link
Member

krasi-georgiev commented Jan 30, 2018

@cstyan yes I see what you mean.
I think we need to think a bit more if there could be a better way to solve this.

I opened a group post so lets wait a bit to read some more ideas.
https://groups.google.com/forum/#!topic/prometheus-developers/JcE5nSbCe4k

@CAFxX

This comment has been minimized.

Copy link

CAFxX commented Feb 8, 2018

Wouldn't it make more sense, and be more in line with the Prometheus pull model, to have a generic-purpose HTTP SD? Something identical to the file SD (maybe even easier, just pick one of JSON/YAML/TOML), but over HTTP(S) instead of the local file system. This would effectively decouple Prometheus from the infrastructure being monitored (the file SD requires colocating something to Prometheus, HTTP would lift this requirement)

This ticket would simply transition to provide the skeleton of a daemon providing the HTTP endpoint to query (instead of the daemon that spits out files)

Alternatively, if the pull model is not deemed a good fit for SD, then an HTTP push SD endpoint exposed by Prometheus would do just fine.

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Feb 8, 2018

@CAFxX that's been discussed and decided against already #3602 afaik, the consensus is that we'll support common SD mechanisms that already exist (Consul, k8s, zookeeper, etc.) plus file sd.

@CAFxX

This comment has been minimized.

Copy link

CAFxX commented Feb 8, 2018

@cstyan the solution @brian-brazil is suggesting still assumes it's ok to colocate stuff on the prometheus server... beside this there is no "discussion" in that issue :/

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Feb 8, 2018

it was also discussed in #2514 and #1675, as well as multiple dev mailing list threads

@gouthamve

This comment has been minimized.

Copy link
Member Author

gouthamve commented Feb 8, 2018

@CAFxX The decision is to support one generic mechanism and not multiple. Having a HTTP based one would require people run a daemon while file_sd poses no such requirements. Having restrictions on the access to the prometheus server seems like a self-induced issue :)

Having said that, there is a slight chance we would also support for HTTP based mechanism, but imo, file_sd works pretty well.

@CAFxX

This comment has been minimized.

Copy link

CAFxX commented Feb 8, 2018

Having restrictions on the access to the prometheus server seems like a self-induced issue :)

It wouldn't be an issue if a more generic method that didn't require file system access was supported :)

Having said that, there is a slight chance we would also support for HTTP based mechanism, but imo, file_sd works pretty well.

I'm pretty sure that, when applicable, file_sd works well. The problem is that it's not always applicable (as our "self-induced" issue clearly proves)

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Feb 8, 2018

You already have to colocate the Prometheus configuration file, why is a file_sd file an issue? Is there something I'm missing?

@CAFxX

This comment has been minimized.

Copy link

CAFxX commented Feb 8, 2018

@cstyan off the top of my head, setting a URL in a configuration file (assuming we're talking about HTTP pull) and having a daemon (albeit a simple one) running are two pretty different things, no?

In addition, if instead we used an HTTP push model (more similar to all existing SDs) that URL in the configuration would not even be required, and we would get as close as possible to full decoupling.

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Feb 8, 2018

Sorry, I thought you were referring to colocation of the files, not the processes. You can still run the file_sd adapter within the same container or k8s pod.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Feb 8, 2018

In addition, if instead we used an HTTP push model (more similar to all existing SDs) that URL in the configuration would not even be required, and we would get as close as possible to full decoupling.

That would require clients to know all Prometheus servers that want to monitor them which loses one of the big benefits of the top-down SD approach we take. If you want that there's already Consul&friends.

@CAFxX

This comment has been minimized.

Copy link

CAFxX commented Feb 8, 2018

@brian-brazil I wasn't talking about clients. I'm talking about service discovery. We would like to plug in our service discovery mechanisms without colocating processes on the prometheus server. So only the service discovery "system" would need to be aware of the url of the prometheus server. You don't lose any benefit by doing this.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Feb 8, 2018

That has the same disadvantages, someone can't spin up a random Prometheus and have SD work. I'm also not aware of any SD system that works this way, and we don't want to invent one. There's file_sd if you want to do this yourself for some reason.

@CAFxX

This comment has been minimized.

Copy link

CAFxX commented Feb 8, 2018

@brian-brazil at risk of repeating myself, that would work if I was allowed to write a file to the filesystem of that specific prometheus server every time the services to be monitored change. In our organization that unfortunately is not an option, yet we have to make this work somehow.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Feb 8, 2018

As Goutham says, this is a self-induced issue that I'm afraid we can't help you with.

@cstyan

This comment has been minimized.

Copy link
Contributor

cstyan commented Jun 5, 2018

I think we can close this since #3720 is merged

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.