
Support sharing registry across multiple workers (where possible) #30

Closed
discordianfish opened this issue Apr 22, 2015 · 33 comments

@discordianfish
Member

Similar to prometheus/client_ruby#9: when using the Python client library with workers that are load balanced by something like gunicorn or uwsgi, each scrape only hits one worker, since the workers can't share state with each other.

At least uwsgi supports sharing memory: http://uwsgi-docs.readthedocs.org/en/latest/SharedArea.html
This should be used to share the registry across all workers. Maybe gunicorn supports something similar.

@brian-brazil
Contributor

Thinking on this, we'd want to implement the sharing mechanism ourselves so it could be used across any system. It will be limited to counter-based metrics.

@brian-brazil
Contributor

https://github.com/brian-brazil/client_python/tree/multi-process is a first pass at this, which seems to work. It only really handles the counter types at the moment, and comes with a lot of caveats.

@justyns

justyns commented Jul 30, 2015

Have you come up with a solution to this yet by any chance? We're just starting to look at using Prometheus in some of our Python apps, and most of them are using uwsgi with multiple workers/processes.

@brian-brazil
Contributor

You can see how this will work in my previous update; it works, but needs cleaning up to handle all the edge cases.

@justyns

justyns commented Jul 30, 2015

I'm still looking through the code to see how everything works, but I haven't been able to get the multi-process branch you linked to to work yet.

I tried something like this:

from prometheus_client import Summary, Counter, Gauge, start_http_server, REGISTRY, CollectorRegistry, MultiProcessCollector
registry = MultiProcessCollector()
# Prometheus stats
http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'], registry=registry)

and get this error:

  File "./app/metrics.py", line 4, in <module>
    http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'], registry=registry)
  File "/usr/lib/python2.7/site-packages/prometheus_client/__init__.py", line 257, in init
    registry.register(collector)
AttributeError: 'MultiProcessCollector' object has no attribute 'register'

I did set the prometheus_multiproc_dir env variable.

Am I attempting to do this correctly by creating a new registry from MultiProcessCollector and passing that to the individual metrics?

Or is MultiProcessCollector supposed to be used automatically if the prometheus_multiproc_dir env variable is set?

I could be doing something else wrong, but I tried this originally:

from prometheus_client import Summary, Counter, Gauge, start_http_server

# Prometheus stats
http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'])
...
start_http_server(9080, '0.0.0.0')

When I tried the above, I saw the *.db shelve files being created, but when I went to port 9080, only the ProcessCollector metrics showed up.

EDIT: In case it wasn't obvious, I am also using http_requests.labels(request.method, request.path, resp.status_code).inc() in the code. I see all of the metrics if I run the Flask app in a single process (without the multiproc env variable) instead of using uwsgi.

@brian-brazil
Contributor

The MultiProcessCollector() is a collector; you need to add it on its own to a registry and use that registry to generate the metric text output. MultiProcessCollector() is how you get the metrics out.
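
Concretely, generating the output looks something like this (a sketch against the multiproc branch; the exact import path and the metrics_text wrapper here are illustrative, not the branch's own code):

from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client import multiprocess

def metrics_text():
    # a registry used only for exposition; the collector reads the per-process files on each call
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)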

@justyns

justyns commented Aug 3, 2015

Thanks for the help, I was able to get it working in one app as a test. The only caveat so far is that I had to add a command to clean out $prometheus_multiproc_dir/*.db when the service stops.
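
For anyone else doing the same, the cleanup can be as simple as this sketch, run when the service stops (it assumes prometheus_multiproc_dir is set in the environment; a plain rm in the stop script works just as well):

import glob
import os

# remove the per-process files left behind by the workers
for f in glob.glob(os.path.join(os.environ['prometheus_multiproc_dir'], '*.db')):
    os.remove(f)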

@brian-brazil
Contributor

Yeah, that's the intended way to use it.

@brian-brazil
Contributor

I've done some additional work which is up at https://github.com/prometheus/client_python/tree/multiproc

In terms of metrics, everything except gauges is supported.

@brian-brazil
Contributor

I've completed gauge support and added docs. Please try out the multiproc branch.

@justyns

justyns commented Oct 26, 2015

Thanks for the update @brian-brazil! I'll test it out when I get a chance soon

@rvrignaud

Is this specific to gunicorn? We are using uwsgi.

@brian-brazil
Contributor

It shouldn't be, but I've only tested it with gunicorn.

@grobie
Member

grobie commented Oct 27, 2015

What are the performance implications? Have you done any benchmarks?


@brian-brazil
Contributor

I haven't done any benchmarks; there's an fdatasync in there that I need to eliminate, though.

@taynaud

taynaud commented Oct 28, 2015

Hi,

We are trying to use your multi-process implementation with a Flask application behind uwsgi.

As I understand your example for Gunicorn, you define two functions (see the sketch after this list):

  • worker_exit -> perform some cleanup when a worker process exits
  • app -> expose the metrics, replacing start_http_server
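
A rough sketch of the worker_exit hook as I read it (mark_process_dead is my assumption based on the client library and may be named differently on the multiproc branch); the app part is essentially the route I show below:

# in the gunicorn config file
from prometheus_client import multiprocess

def worker_exit(server, worker):
    # tell the client library that this worker's pid is gone so its
    # per-process files can be dealt with
    multiprocess.mark_process_dead(worker.pid)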

I have discarded the worker_exit hook for my PoC; I use only basic metrics. I have implemented a route in my application to expose the metrics:

from flask import Response
from prometheus_client import CollectorRegistry, generate_latest, CONTENT_TYPE_LATEST
from prometheus_client import multiprocess

@main.route('/metrics')
def metrics():
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    response_headers = [
        ('Content-type', CONTENT_TYPE_LATEST),
        ('Content-Length', str(len(data)))
    ]
    return Response(response=data, status=200, headers=response_headers)

There is an issue in generate_latest: I get the exception "_gdbm.error: [Errno 11] Resource temporarily unavailable". This is because the .db files are already open for writing. If I put a try/except around that and open the db read-only, it presents metrics, but only those of dead processes. As long as a db stays open in a worker, I get the exception and thus cannot get the metrics of the current processes.

It cannot work; the following snippet shows the same behaviour:

>>> import shelve as s
>>> fw = s.open("./test.db")                                                                                                                                                                                                   
>>> fw["test"] = "test"
>>> fw.sync()
>>> fr = s.open("./test.db")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/lib/python3.5/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/lib/python3.5/dbm/__init__.py", line 94, in open
    return mod.open(file, flag, mode)
_gdbm.error: [Errno 11] Resource temporarily unavailable
>>> fw.close()
>>> fr = s.open("./test.db")
>>> fr
<shelve.DbfilenameShelf object at 0x7fe65de82d30>

Do you have any ideas how to work around that? Am I doing something wrong?

Best,

@brian-brazil
Contributor

It looks like you're using Python 3.5 and gdbm, while I tested on Python 2.7 using hashdb. This change will need some work to also run on Python 3.5.

@taynaud

taynaud commented Oct 29, 2015

My last snippet does indeed work with Python 2.7, but not with Python 3.4.3.

The shelve documentation, in both Python 2 and 3, states:
"The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used."

Thus I think you should use another DB.

I have also tried replacing the default shelve with https://www.jcea.es/programacion/pybsddb.htm
Basically, I have replaced
import shelve
with
import bsddb3.dbshelve as shelve

after a pip install bsddb3, since it is no longer in the standard library.
I also had to encode/decode the keys between str and bytes.
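
Roughly, the substitution looks like this (a sketch; the real call sites inside the client need the same treatment, and the file name is made up):

import bsddb3.dbshelve as shelve  # drop-in for: import shelve

db = shelve.open('counter_1234.db')        # hypothetical per-process file
db['http_requests_total'.encode()] = 1.0   # keys must be bytes with bsddb3
db.sync()
db.close()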

It seems to work, but I do not know if the documentation statement about concurrency holds.

I may provide a patch if required.

@brian-brazil
Contributor

That bit of documentation only applies to shelve. I was thinking we could use bsddb directly, which will also give us a bit more control performance-wise.

@gjcarneiro

Excuse me, I am just looking at Prometheus for the first time, but my first impression is that this whole "sharing" design you guys are attempting is over-engineered.

In my opinion, there should be no sharing. Instead, each gunicorn (or whatever) worker should start listening on a random port, and the prometheus server would (somehow) discover those ports and start scraping individual workers. Aggregation would then be done at a higher level.

I wish the client libraries would be allowed to register themselves with a Prometheus server, to tell the server that they are alive and which port they can be scraped on. Hm, maybe that is what a service like Consul is all about, but I haven't looked into it in detail yet...

@brian-brazil
Contributor

That's how Prometheus generally works, but isn't compatible with gunicorn designs which tend to bring processes up/down too often for that to work well.

@gjcarneiro

If Prometheus isn't compatible with this, then I think I'm better off with statsd.

But I'm not convinced! Gunicorn doesn't bring processes up and down too often. Worker processes tend to remain up for a very long time.

I really think this (using Consul for discovering worker processes) has a fair chance of working, and I'm willing to invest some time to see if I can really make it work well.

@brian-brazil
Contributor

Worker processes tend to remain up for a very long time.

From my research, support for dynamically changing the number of processes, for example in response to load, is not uncommon. There's also the max_requests setting.

The approach we use here will be applied to other languages such as Ruby.

@gjcarneiro

From the Prometheus docs:

CAUTION: Remember that every unique key-value label pair represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

So maybe you have a point. My approach of having per-process metrics would require having the pid or something similar as a label, which seems to go against the recommendation above.

OTOH, processes sharing data via a db file is a terrible solution, as is any WSGI-specific solution (we have not only gunicorn workers, but also celery workers, and even plain multiprocessing module subprocesses).

Unfortunately I can't think of any good, efficient, and simple solution. Here's an idea I just had, but take it more as a thought process than a finished idea:

  • Each worker tries to bind its well-known fixed http port;
  • If it successfully binds, it becomes the representative of all the workers and will be scraped by Prometheus;
  • If it cannot bind that port, it means another worker became representative first, so instead it binds a random port and then registers with the representative worker, telling it "hey, I am another sibling worker, and here's my port";
  • Whenever Prometheus sends an HTTP request to scrape the main worker, it knows how many "siblings" it has and so scrapes each sibling as well. It aggregates the siblings' metrics with its own metrics, before returning them to Prom.
  • To allow for the representative worker to die and be replaced, as well as for additional workers dying, each sibling worker periodically attempts to become the main worker: it either binds the fixed port and becomes the main worker, or else registers itself with the main worker again. The main worker removes from its list of siblings any workers that have not registered with it in a long time.

I think this solution scales pretty well, since aggregation of data is done only at each scrape interval, not for every sample. The downside is that it is a bit complicated. :(
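
To make the election step concrete, a bare-bones sketch (the port number is made up, and the sibling registration and aggregation parts are left out):

import socket

FIXED_PORT = 9100  # hypothetical well-known metrics port

def try_become_representative():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(('0.0.0.0', FIXED_PORT))
    except socket.error:
        # another worker got there first: bind a random port and register with it instead
        s.close()
        return None
    s.listen(5)
    return s  # we are the representative; serve /metrics and accept sibling registrations here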

@brian-brazil
Contributor

It aggregates the siblings' metrics with its own metrics, before returning them to Prom.

This is where things fall down. How do you aggregate data from processes that no longer exist?

A file per process avoids this problem, as the files survive the process.

@gjcarneiro

Well, if a process dies, the http socket to it will fail with connection refused and we ignore it. We will not aggregate data from it for the last scrape interval. So you lose a few seconds of that worker's metric data, which in this case is not a big deal. Again, I am not convinced worker processes start and die very often; they tend to remain rather stable.

@brian-brazil
Contributor

We will not aggregate data from it for the last scrape interval.

That would break all counters, you need that data.

@gjcarneiro

So what if, in a web app, you don't count the last few HTTP requests that came through one of the workers; it's not a big deal if it happens rarely. It's not like Prometheus will be used for banking transactions or anything... If it is still a problem, an atexit handler that proactively sends the last data to the elected worker could solve it. It's still better than writing to a DB file every time you update a metric, which is terrible for performance...

Anyway, I should probably stop spamming this thread, I don't want to be annoying. A poor solution is better than no solution at all. If I find time to prototype something, we can reopen the discussion. It's usually easy to think of solutions that often break down when you try to code them, I know...

@oliver006

Any updates or solutions to this problem?
I'm running into this issue when trying to add metrics to a flask/uwsgi python 2.7 app.
Has anyone tried using e.g. the statsd exporter as a work-around?

@bendemaree

@oliver006 After abandoning #70, moving to the statsd exporter is exactly what we did; it works just fine!

@oliver006

Thanks, Ben. I went the statsd_exporter route, deploying one locally alongside each of my Python apps, and so far that's been working well.

@rud

rud commented Sep 27, 2016

https://github.com/korfuri/django-prometheus/blob/master/documentation/exports.md#exporting-metrics-in-a-wsgi-application-with-multiple-processes is a design for letting each worker process listen on a distinct Prometheus port. It does assume the collector will scrape all of the allowed reporting ports to catch 'em all.
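
The same idea with the plain Python client would look roughly like this (the port range and retry loop are illustrative, not django-prometheus's actual code):

from prometheus_client import start_http_server

for port in range(9101, 9121):  # hypothetical agreed-upon range the collector scrapes
    try:
        start_http_server(port)
        break
    except OSError:
        continue  # port already taken by a sibling worker; try the next one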

@gjcarneiro

Cool, I quite like the django-prometheus solution!
