Support sharing registry across multiple workers (where possible) #30

Closed
discordianfish opened this Issue Apr 22, 2015 · 33 comments

@discordianfish (Member) commented Apr 22, 2015

Similar to prometheus/client_ruby#9, when using the Python client library on workers that get load balanced by something like gunicorn or uwsgi, each scrape hits only one worker, since the workers can't share state with each other.

At least uwsgi supports sharing memory: http://uwsgi-docs.readthedocs.org/en/latest/SharedArea.html
This should be used to share the registry across all workers. Maybe gunicorn supports something similar.

@brian-brazil (Member) commented Apr 22, 2015

Thinking on this, we'd want to implement the sharing mechanism ourselves so it could be used across any system. It will be limited to counter-based metrics.

@brian-brazil (Member) commented May 5, 2015

https://github.com/brian-brazil/client_python/tree/multi-process is a first pass at this, which seems to work. It only really handles the counter-style types at the moment, and comes with a lot of caveats.

@justyns commented Jul 30, 2015

Have you come up with a solution to this yet, by any chance? We're just starting to look at using Prometheus in some of our Python apps, and most of them are using uwsgi with multiple workers/processes.

@brian-brazil (Member) commented Jul 30, 2015

You can see how this will work in my previous update; it works but needs cleaning up to handle all the edge cases.

@justyns commented Jul 30, 2015

I'm still looking through the code to see how everything works, but I haven't been able to get the multi-process branch you linked to to work yet.

I tried something like this:

from prometheus_client import Summary, Counter, Gauge, start_http_server, REGISTRY, CollectorRegistry, MultiProcessCollector
registry = MultiProcessCollector()
# Prometheus stats
http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'], registry=registry)

and get this error:

  File "./app/metrics.py", line 4, in <module>
    http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'], registry=registry)
  File "/usr/lib/python2.7/site-packages/prometheus_client/__init__.py", line 257, in __init__
    registry.register(collector)
AttributeError: 'MultiProcessCollector' object has no attribute 'register'

I did set the prometheus_multiproc_dir env variable.

Am I attempting to do this correctly by creating a new registry from MultiProcessCollector and passing that to the individual metrics?

Or is MultiProcessCollector supposed to be used automatically if the prometheus_multiproc_dir env variable is set?

I could be doing something else wrong, but I tried this originally:

from prometheus_client import Summary, Counter, Gauge, start_http_server

# Prometheus stats
http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'])
...
start_http_server(9080, '0.0.0.0')

When I tried the above, I saw the *.db shelve files being created, but when I went to port 9080, only the ProcessCollector metrics showed up.

EDIT: In case it wasn't obvious, I am using http_requests.labels(request.method, request.path, resp.status_code).inc() in the code as well. I see all of the metrics if I run the Flask app in a single process without the multiproc env variable instead of using uwsgi.

@brian-brazil (Member) commented Jul 30, 2015

MultiProcessCollector() is a collector; you need to add it on its own to a registry and use that registry to generate the metric text output. MultiProcessCollector() is how you get the metrics out.
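Putting that advice together, a minimal sketch might look like this (assuming a prometheus_client version that ships the multiprocess module; the temp directory stands in for a real prometheus_multiproc_dir, and the metric is a placeholder):

```python
import os
import tempfile

# The multiproc directory must be configured before any metrics are created.
# Newer client versions read PROMETHEUS_MULTIPROC_DIR, older ones the
# lowercase name, so set both here to be safe.
mp_dir = tempfile.mkdtemp()
os.environ['prometheus_multiproc_dir'] = mp_dir
os.environ['PROMETHEUS_MULTIPROC_DIR'] = mp_dir

from prometheus_client import CollectorRegistry, Counter, generate_latest, multiprocess

# Workers create and update metrics against the default registry as usual.
http_requests = Counter('http_requests_total', 'Total HTTP requests')
http_requests.inc()

# The exposition side builds a *separate* registry holding only the
# MultiProcessCollector, which merges every worker's on-disk values.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
```

In a Flask or uwsgi app the last three lines would go in the /metrics handler rather than at module level.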

@justyns commented Aug 3, 2015

Thanks for the help, I was able to get it working in one app as a test. The only caveat so far is that I had to add a command to clean out $prometheus_multiproc_dir/*.db when the service stops.
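For reference, that cleanup can live in the service's start script; everything here (the directory path, the commented-out uwsgi line) is illustrative rather than prescribed:

```shell
#!/bin/sh
# Hypothetical pre-start hook: the per-process files accumulate across
# restarts, so wipe them before the workers boot.
prometheus_multiproc_dir=/tmp/prom_multiproc_demo
export prometheus_multiproc_dir
mkdir -p "$prometheus_multiproc_dir"
rm -f "$prometheus_multiproc_dir"/*.db
# exec uwsgi --ini app.ini    # service start would go here
```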

@brian-brazil (Member) commented Aug 3, 2015

Yeah, that's the intended way to use it.

@brian-brazil (Member) commented Oct 26, 2015

I've done some additional work which is up at https://github.com/prometheus/client_python/tree/multiproc

In terms of metrics, everything except gauges is supported.

@brian-brazil (Member) commented Oct 26, 2015

I've completed gauge support and added docs. Please try out the multiproc branch.

@justyns commented Oct 26, 2015

Thanks for the update @brian-brazil! I'll test it out when I get a chance soon

@rvrignaud commented Oct 27, 2015

Is this specific to gunicorn? We are using uwsgi.

@brian-brazil (Member) commented Oct 27, 2015

It shouldn't be; I've only tested it with gunicorn.

@grobie (Member) commented Oct 27, 2015

What are the performance implications? Have you done any benchmarks?


@brian-brazil (Member) commented Oct 27, 2015

I haven't done any benchmarks; there's a fdatasync in there that I need to eliminate, though.

@taynaud commented Oct 28, 2015

Hi,

We are trying to use your multi-process implementation with a Flask application behind uwsgi.

As I understand your example for Gunicorn, you define two functions:

  • worker_exit -> perform some cleaning at the end of the process
  • app -> expose the metrics to replace the start_http_server

I have discarded the worker_exit for my PoC; I use only basic metrics. I have implemented a route in my application to display the metrics:

@main.route('/metrics')
def metrics():
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    response_headers = [
        ('Content-type', CONTENT_TYPE_LATEST),
        ('Content-Length', str(len(data)))
    ]
    return Response(response=data, status=200, headers=response_headers)

There is an issue in generate_latest. I get the exception "_gdbm.error: [Errno 11] Resource temporarily unavailable". This is because the .db files are open. I put a try/except around that and tried to open the db as read only, and it presents metrics, but only those of the dead process. As long as the db stays open, I get the exception and thus cannot get the metrics of the current processes.

It cannot work; the next snippet presents the same behavior:

>>> import shelve as s
>>> fw = s.open("./test.db")                                                                                                                                                                                                   
>>> fw["test"] = "test"
>>> fw.sync()
>>> fr = s.open("./test.db")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/lib/python3.5/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/lib/python3.5/dbm/__init__.py", line 94, in open
    return mod.open(file, flag, mode)
_gdbm.error: [Errno 11] Resource temporarily unavailable
>>> fw.close()
>>> fr = s.open("./test.db")
>>> fr
<shelve.DbfilenameShelf object at 0x7fe65de82d30>

Do you have ideas how to work around that? Am I doing something wrong?

Best,

@brian-brazil (Member) commented Oct 28, 2015

It looks like you're using Python 3.5 and gdbm, while I tested on Python 2.7 using hashdb. It looks like this change will need some more work to also support Python 3.5.

@taynaud commented Oct 29, 2015

My last snippet does indeed work with Python 2.7, but not with Python 3.4.3.

The documentation of shelve, in python 2 and 3 states that:
"The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used."

Thus I think you should use another DB.

I have also tried replacing the default shelve with https://www.jcea.es/programacion/pybsddb.htm
Basically, I replaced
import shelve
with
import bsddb3.dbshelve as shelve

after a pip install bsddb3, since it is no longer in the standard library.
I also had to encode/decode the keys from str to bytes.

It seems to work, but I do not know if the documentation statement about concurrency holds.

I may provide a patch if required.

@brian-brazil (Member) commented Oct 29, 2015

That bit of documentation only applies to shelve; I was thinking we could use bsddb directly, which will also give us a bit more control performance-wise.

bendemaree referenced this issue Jan 3, 2016: WIP: uWSGI Multiprocess Support #70 (closed)
@gjcarneiro commented Jun 8, 2016

Excuse me, I am just looking at Prometheus for the first time, but my first impression is that I think this whole "sharing" design you guys are attempting is just over-engineered.

In my opinion, there should be no sharing. Instead, each gunicorn (or whatever) worker should start listening on a random port, and the prometheus server would (somehow) discover those ports and start scraping individual workers. Aggregation would then be done at a higher level.

I wish the client libraries would be allowed to register themselves with a Prometheus server, to tell the server that they are alive and which port they can be scraped on. Hm, maybe that is what a service like Consul is all about, but I haven't looked in detail yet...

@brian-brazil (Member) commented Jun 8, 2016

That's how Prometheus generally works, but isn't compatible with gunicorn designs which tend to bring processes up/down too often for that to work well.

@gjcarneiro commented Jun 8, 2016

If Prometheus isn't compatible with this, then I think I'm better off with statsd.

But I'm not convinced! Gunicorn doesn't bring processes up and down too often. Worker processes tend to remain up for very long time.

I really think this (using Consul for discovering worker processes) has a fair chance of working, and I'm willing to invest some time to see if I can really make it work well.

@brian-brazil (Member) commented Jun 8, 2016

Worker processes tend to remain up for very long time.

From my research, support for dynamically changing the number of processes, such as in response to load, is not uncommon. There's also the max_requests setting.

The approach we use here will be applied to other languages such as Ruby.

@gjcarneiro commented Jun 9, 2016

From the Prometheus docs:

CAUTION: Remember that every unique key-value label pair represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

So maybe you have a point. My approach of having per-process metrics would require having the pid number or some such as a label, which seems to go against the recommendation above.

OTOH, processes sharing data via a db file is a terrible solution, as well as any WSGI-specific solution (we have not only gunicorn workers, but also celery workers, and even simple multiprocessing module subprocesses).

Unfortunately I can't think of any good, efficient, and simple solution. Here's an idea I just had, but take it more as a thought process than a finished idea:

  • Each worker tries to bind its well known fixed http port;
  • If it successfully binds, he becomes the representative of all the workers and will be scraped by Prometheus;
  • If it cannot bind that port, it means another worker has become representative first, so, instead, it will bind a random port, and then register with the representative worker telling it "hey, I am another sibling worker, and here's my port";
  • Whenever Prometheus sends an HTTP request to scrape the main worker, it knows how many "siblings" it has and so, scrapes each sibling as well. It aggregates the siblings' metrics with its own metrics, before returning them to Prom.
  • To allow for the representative worker to die and be replaced, as well as additional workers dying, each sibling worker will attempt to become main worker periodically: bind the fixed port and becomes main worker, or else register itself with the main worker again. The main worker removes from its list of siblings the other workers that have not registered with it in a long time.

I think this solution scales pretty well, since aggregation of data is done only at each scrape interval, not for every sample. The downside is that it is a bit complicated. :(
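The first two bullets (first-come binding of a well-known port) can be sketched with plain sockets; the host and port number here are arbitrary choices for illustration:

```python
import socket

FIXED_PORT = 29090  # hypothetical well-known metrics port

def become_representative_or_sibling(fixed_port=FIXED_PORT):
    """Try to bind the well-known port; fall back to a kernel-chosen one.

    Returns (socket, is_representative, bound_port).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(('127.0.0.1', fixed_port))
        return s, True, fixed_port
    except OSError:
        # Another worker already holds the fixed port; take a random port
        # and (in the full scheme) register it with the representative.
        s.bind(('127.0.0.1', 0))
        return s, False, s.getsockname()[1]

leader, is_rep, port = become_representative_or_sibling()
sibling, is_rep2, port2 = become_representative_or_sibling()
print(is_rep, is_rep2)  # first caller wins the fixed port
leader.close()
sibling.close()
```

The periodic re-election in the last bullet would just retry this bind on a timer.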

@brian-brazil (Member) commented Jun 9, 2016

It aggregates the siblings' metrics with its own metrics, before returning them to Prom.

This is where things fall down. How do you aggregate data from processes that no longer exist?

A file per process avoids this problem, as the files survive the process.
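That property can be shown with a toy, stdlib-only version of the file-per-process scheme (the JSON layout here is invented for illustration; the actual branch uses its own on-disk format):

```python
import glob
import json
import os
import tempfile

# Each worker owns one file keyed by its pid; the scrape-time collector
# sums all files, including files left behind by workers that have died,
# so counter totals never go backwards.
multiproc_dir = tempfile.mkdtemp()  # stands in for prometheus_multiproc_dir

def inc(pid, name, amount=1.0):
    """Record an increment in the calling worker's private file."""
    path = os.path.join(multiproc_dir, 'counters_%d.json' % pid)
    counts = {}
    if os.path.exists(path):
        with open(path) as f:
            counts = json.load(f)
    counts[name] = counts.get(name, 0.0) + amount
    with open(path, 'w') as f:
        json.dump(counts, f)

def collect():
    """Aggregate every worker's file at scrape time."""
    totals = {}
    for path in glob.glob(os.path.join(multiproc_dir, 'counters_*.json')):
        with open(path) as f:
            for name, value in json.load(f).items():
                totals[name] = totals.get(name, 0.0) + value
    return totals

inc(101, 'http_requests_total')  # worker 101 serves a request
inc(102, 'http_requests_total')  # worker 102 serves a request, then dies
print(collect())                 # the dead worker's count still contributes
```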

@gjcarneiro commented Jun 9, 2016

Well, if a process dies, the http socket to it will fail with connection refused and we ignore it. We will not aggregate data from it for the last scrape interval. So you lose a few seconds of the metric data of that worker, in this case, not a big deal. Again, I am not convinced worker processes start and die very often, they tend to remain rather stable.

@brian-brazil (Member) commented Jun 9, 2016

We will not aggregate data from it for the last scrape interval.

That would break all counters, you need that data.

@gjcarneiro commented Jun 9, 2016

So what if, in a web app, you don't count the last few HTTP requests that come through one of the workers; it's not a big deal if it happens rarely. It's not like Prometheus will be used for banking transactions or anything... If it is still a problem, an atexit handler that proactively sends the last data to the elected worker could solve it. It's still better than saving a DB file every time you update a metric, which is terrible for performance...

Anyway, I should probably stop spamming this thread, I don't want to be annoying. A poor solution is better than no solution at all. If I find time to prototype something, we can reopen the discussion. It's usually easy to think of solutions that often break down when you try to code them, I know...

@oliver006 commented Sep 6, 2016

Any updates or solutions to this problem?
I'm running into this issue when trying to add metrics to a flask/uwsgi python 2.7 app.
Has anyone tried using e.g. the statsd exporter as a work-around?

@bendemaree commented Sep 7, 2016

@oliver006 After abandoning #70 moving to the statsd exporter is exactly what we did; works just fine!

@oliver006 commented Sep 7, 2016

Thanks Ben. I went the statsd_exporter route, deploying one locally with each of my python apps and so far that's been working well.

@rud commented Sep 27, 2016

https://github.com/korfuri/django-prometheus/blob/master/documentation/exports.md#exporting-metrics-in-a-wsgi-application-with-multiple-processes is a design for letting each worker process listen on a distinct Prometheus port. It does assume the collector will scrape all the allowed reporting ports to catch 'em all.

@gjcarneiro commented Sep 28, 2016

Cool, I quite like the django-prometheus solution!
