
wsgi daemon mode #5

Closed
thatcher opened this issue Apr 6, 2018 · 30 comments

Comments

@thatcher

thatcher commented Apr 6, 2018

I run my stateless Flask apps with mod_wsgi/Apache in daemon mode like this:

WSGIDaemonProcess foo-services python-home=/opt/my_org/foo-services/_env processes=8 threads=48 maximum-requests=10000 display-name=%{GROUP}
WSGIApplicationGroup %{GLOBAL}
WSGISocketPrefix /var/run/wsgi


Alias /foo-services "/opt/my_org/foo-services/wsgi.py"
<Location "/foo-services">
SetHandler wsgi-script
Options +ExecCGI
FileETag None
ExpiresActive On
ExpiresDefault "access plus 1 year"
WSGIProcessGroup foo-services
</Location>

Which means that when a request gets to the service it could be hitting 1 of 8 daemon processes, each of which has its own memory in isolation from the others. Does the metrics endpoint store the Prometheus data in a way that is shared across these daemons?
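
As a quick (hypothetical) illustration of that isolation, a route that just returns the worker PID answers with a different value depending on which daemon process handles the request; this sketch is not part of my actual service:

import os

from flask import Flask

app = Flask(__name__)


@app.route('/whoami')
def whoami():
    # with WSGIDaemonProcess ... processes=8 the PID changes between refreshes,
    # showing that each daemon process keeps its own state
    return str(os.getpid())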

I can create some tests to verify whether that's the case or not; just curious if the answer is already known.

Thanks,
Thatcher

@rycus86
Owner

rycus86 commented Apr 6, 2018

Hi Thatcher,

I think I've seen both thread- and process-based aggregation in the official Prometheus Python library that's used here, so it might work, but I'll have to check.
If you beat me to finding the answer, it would be great if you could share it on this issue! :)

Thanks,
Viktor

@thatcher
Author

thatcher commented Apr 6, 2018

I can definitely verify that I can start my micro-service, run a handful of requests, and when I keep refreshing my metrics endpoint I get different responses from each daemon, which makes sense.

It looks like prometheus_client added support for a multiprocess collector, as in the examples provided here:

prometheus/client_python#122

I think we could patch this by checking os.environ.get('prometheus_multiproc_dir') at configuration time and, instead of using the default registry, doing something like:

import os

from prometheus_client import multiprocess
from prometheus_client import generate_latest, CollectorRegistry

...
registry = DEFAULT_REGISTRY
if os.environ.get('prometheus_multiproc_dir'):
    # aggregate samples written by all worker processes into one registry
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)

...

@app.route('/metrics')
def metrics():
    data = generate_latest(registry)
    return data

It looks like I can test this out just by passing my own registry to PrometheusMetrics. I'll let you know.

@thatcher
Author

thatcher commented Apr 6, 2018

Yup, it worked! Here's the gist:

wsgi.py:

import os
...
# this has to be set before the Flask app (and prometheus_client) is imported
os.environ["prometheus_multiproc_dir"] = "/tmp/my_app.stats"
...
from my_app import app as application

In my Flask app:

import os
...
from flask import Flask
from prometheus_client import multiprocess
from prometheus_client import CollectorRegistry
from prometheus_flask_exporter import PrometheusMetrics
from prometheus_flask_exporter import DEFAULT_REGISTRY
...
app = Flask(__name__)

registry = DEFAULT_REGISTRY
if os.environ.get('prometheus_multiproc_dir'):
    stats_dir = os.environ.get('prometheus_multiproc_dir')
    if not os.path.exists(stats_dir):
        os.makedirs(stats_dir)
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)

metrics = PrometheusMetrics(app, registry=registry)

That's it! You could add this as an introspected feature or just document the recipe for others.
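
One caveat worth adding (my assumption, based on the official client's multiprocess documentation rather than anything specific to prometheus_flask_exporter): the stats directory should be emptied on startup so samples from a previous run don't linger, and the env var must be set before prometheus_client is imported. A sketch of wsgi.py with that added:

import glob
import os

stats_dir = "/tmp/my_app.stats"
# has to be set before prometheus_client gets imported via the app
os.environ["prometheus_multiproc_dir"] = stats_dir

if not os.path.exists(stats_dir):
    os.makedirs(stats_dir)
# clear out metric files left over from a previous run
for stale in glob.glob(os.path.join(stats_dir, "*.db")):
    os.remove(stale)

from my_app import app as application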

Thanks rycus86!

Thatcher

@thatcher thatcher closed this as completed Apr 6, 2018
@rycus86
Owner

rycus86 commented Apr 7, 2018

Awesome, thanks a lot!
I'll give it a go to see if I can add this as a configurable feature.
If that fails, I'll add this info to the README or some docs.
Very useful to know, thanks for the investigation!

@snegivulcan

@rycus86 & @thatcher

I was playing with this library and, even with a single instance, when I hit the /metrics endpoint I get 3 different results.

I am following the steps outlined in the conversation here.

registry = CollectorRegistry()
if not os.path.exists(config.PROMETHEUS_MULTIPROC_DIR):
    os.makedirs(config.PROMETHEUS_MULTIPROC_DIR)
multiprocess.MultiProcessCollector(registry, path=config.PROMETHEUS_MULTIPROC_DIR)
metrics = PrometheusMetrics(app, registry=registry)

@app.route('/metrics')
def metrics():
    data = generate_latest(registry)
    return data

So why "3" different responses ? And how to resolve the issue ? I am using 0.2.2 version of prometheus_flask_exporter

@thatcher
Author

thatcher commented May 3, 2018

@snegivulcan I discovered the same issue and realized it's related to some of the code in prometheus_flask_exporter, but I didn't have time to dig into it. I'd like to eventually figure out how to get it to play nicely with prometheus_flask_exporter. I can confirm that the issue does not occur if I use the Prometheus client directly. Here is how I am using it for now in production:

import os
from timeit import default_timer

from flask import Flask
from flask import request
from prometheus_client import CONTENT_TYPE_LATEST
from prometheus_client import CollectorRegistry
from prometheus_client import Counter
from prometheus_client import Histogram
from prometheus_client import generate_latest
from prometheus_client import multiprocess

app = Flask(__name__)

stats_dir = os.environ.get('prometheus_multiproc_dir')
if not os.path.exists(stats_dir):
    os.makedirs(stats_dir)
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
histogram = Histogram(
    'flask_http_request_duration_seconds',
    'Flask HTTP request duration in seconds',
    ('method', 'endpoint', 'status'),
)
counter = Counter(
    'flask_http_request_total',
    'Total number of HTTP requests',
    ('method', 'status')
)

def before_request():
    request.start_time = default_timer()


def after_request(response):
    total_time = max(default_timer() - request.start_time, 0)
    histogram.labels(
        request.method,
        request.endpoint,
        response.status_code
    ).observe(total_time)

    counter.labels(request.method, response.status_code).inc()

    return response


app.before_request(before_request)
app.after_request(after_request)


@app.route('/metrics')
def metrics():
    headers = {'Content-Type': CONTENT_TYPE_LATEST}
    return generate_latest(registry), 200, headers

@rycus86
Owner

rycus86 commented May 4, 2018

Hi @snegivulcan and @thatcher ,

Have you had a look at the small example in https://github.com/rycus86/prometheus_flask_exporter/tree/master/examples/wsgi ?
I don't have much experience with running wsgi apps, but the results seemed to indicate it's working as intended.
Please let me know if that's not the case and I'll try to have another look.

Thanks!

@rycus86 rycus86 reopened this May 4, 2018
@jpds

jpds commented May 17, 2018

I'm using the same code as @snegivulcan from his Flask example, and I noticed from the comment headers that one set of the exposed metrics relates to a single worker while the other belongs to the multiprocess metrics:

# HELP flask_http_request_duration_seconds Flask HTTP request duration in seconds
# TYPE flask_http_request_duration_seconds histogram
flask_http_request_duration_seconds_bucket{le="0.005",method="GET",path="/_healthcheck/",status="200"} 17.0
flask_http_request_duration_seconds_bucket{le="0.01",method="GET",path="/_healthcheck/",status="200"} 17.0
flask_http_request_duration_seconds_bucket{le="0.025",method="GET",path="/_healthcheck/",status="200"} 18.0
flask_http_request_duration_seconds_bucket{le="0.05",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.075",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.1",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.25",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.5",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.75",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="1.0",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="2.5",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="5.0",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="7.5",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="10.0",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="+Inf",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_count{method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_sum{method="GET",path="/_healthcheck/",status="200"} 0.07988429069519043
# HELP flask_http_request_duration_seconds Multiprocess metric
# TYPE flask_http_request_duration_seconds histogram
flask_http_request_duration_seconds_bucket{le="0.01",method="GET",path="/_healthcheck/",status="200"} 67.0
flask_http_request_duration_seconds_bucket{le="0.1",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="5.0",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_count{method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.075",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="1.0",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.5",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_sum{method="GET",path="/_healthcheck/",status="200"} 0.375255823135376
flask_http_request_duration_seconds_bucket{le="0.25",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.75",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="+Inf",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="2.5",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.05",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="7.5",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="10.0",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.005",method="GET",path="/_healthcheck/",status="200"} 63.0
flask_http_request_duration_seconds_bucket{le="0.025",method="GET",path="/_healthcheck/",status="200"} 76.0

@jpds

jpds commented May 17, 2018

I found a fix from:

https://github.com/jonashaag/prometheus-multiprocessing-example/blob/master/yourapp.py

Where I define the following as a /metrics endpoint:

from flask import Response
from prometheus_client import CONTENT_TYPE_LATEST, CollectorRegistry, generate_latest, multiprocess

@app.route('/metrics')
def metrics():
    # build a fresh registry per scrape so only the aggregated multiprocess data is exposed
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    return Response(data, mimetype=CONTENT_TYPE_LATEST)

@rycus86
Owner

rycus86 commented May 17, 2018

Hi @jpds,

Thanks for sharing your findings! The project you mentioned also links the multiprocessing section of the official Prometheus Python client library: https://github.com/prometheus/client_python#multiprocess-mode-gunicorn
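
For Gunicorn specifically, that section also describes marking exited workers as dead so their live-gauge files can be cleaned up; roughly (a hypothetical gunicorn config sketch adapted from those docs, not something this library does for you):

# gunicorn_conf.py
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)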

I'll try to have a look at whether this could be supported better by this library.

Thanks!

@rycus86
Owner

rycus86 commented Aug 25, 2018

Hi,

There's a new release with some multiprocessing changes from @elephantum - https://pypi.org/project/prometheus-flask-exporter/0.2.3/

It would be great if you could give it a try and check whether the previous problem still happens with this version.

Thanks!

@rycus86
Owner

rycus86 commented Dec 17, 2018

I've added some more multiprocessing changes in version 0.5.0, mainly targeted at Gunicorn and uWSGI, but they should work in a generic way - see the README for more info.

@rycus86 rycus86 closed this as completed Dec 17, 2018
@float34

float34 commented Sep 9, 2019

@rycus86
Doesn't seem to work for me with uwsgi and the latest versions of prometheus_client/prometheus_flask_exporter (0.7.1/0.9.1).
I just don't get any metrics when I try to implement it as:

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry, path='/tmp')

metrics = PrometheusMetrics(app, registry=registry)

But I do get them when I don't use the multiprocess module (albeit with different values for each process, of course).

@rycus86
Owner

rycus86 commented Sep 9, 2019

Hi @Torquerrr
Have a look at https://github.com/rycus86/prometheus_flask_exporter/blob/master/README.md - there are some additional support classes for uwsgi integration with this library.
See also https://github.com/rycus86/prometheus_flask_exporter/blob/master/examples/uwsgi/server.py for a small example with uwsgi.

Let me know if you still think it's not working; things may have changed since the last release.

@float34

float34 commented Sep 10, 2019

@rycus86 I was able to narrow the problem down to registry = CollectorRegistry(). When I use this class I don't get any metrics, but if I simply use REGISTRY from the prometheus_client library, I do have metrics available (though with values from different processes). Also, the .db file for Counter/Gauge (my additional metrics) is not created in /tmp for some reason, and the permissions seem fine.
It's probably a problem specific to my setup.

@rycus86
Owner

rycus86 commented Sep 10, 2019

Have you tried the multiprocess support classes from this library?

from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics

That should take care of setting up the multiprocessing-ready registries and so on; you can see it in the example I linked above.

@float34

float34 commented Sep 10, 2019

@rycus86 Yes, now I am running it on the same host as the main Flask app (host explicitly specified) and I get:

root@xxxxxxxxxxx:/app# curl -X GET http://localhost:9100/metrics
curl: (7) Failed to connect to localhost port 9100: Connection refused

@rycus86
Owner

rycus86 commented Sep 10, 2019

@Torquerrr see the example I linked above; the uwsgi metrics object needs an explicit call to enable the metrics endpoint on a port:

metrics = UWsgiPrometheusMetrics(app)
metrics.start_http_server(9100)
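
Put together, a minimal (hypothetical) setup might look like the sketch below; the route and port are just placeholders, and the uwsgi example linked earlier in the repo is the authoritative version:

from flask import Flask
from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics

app = Flask(__name__)

metrics = UWsgiPrometheusMetrics(app)
# exposes the aggregated metrics on a separate port, independent of the Flask routes
metrics.start_http_server(9100)


@app.route('/test')
def index():
    return 'Hello world'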

@float34

float34 commented Sep 11, 2019

@rycus86 I've added that call to start_http_server; the only difference is that my Flask app is initialized inside another function in the same module (a sort of lazy loading), so I call start_http_server before app.run().

@rycus86
Owner

rycus86 commented Sep 11, 2019

@float34

float34 commented Sep 11, 2019

@rycus86 Does this example imply that I also need to set lazy-apps = true in my uwsgi config?

@rycus86
Owner

rycus86 commented Sep 11, 2019

I tested it like that; I'm not 100% sure whether it's required or not.

@rycus86
Owner

rycus86 commented Sep 11, 2019

Also, I wanted to say: if you can provide a stripped-down example, I'm happy to have a look and see whether the library perhaps needs changing.

@float34

float34 commented Sep 11, 2019

@rycus86 Thank you for your help, I will try this approach and come back with results :)
Yes, I will try to prepare such an example. I want to reproduce this locally, because so far I only see the issue in a Kubernetes pod :(

@float34

float34 commented Sep 11, 2019

@rycus86 By the way - I can't find examples of how to add custom metrics like Counter/Gauge; are there any?
What I do is simply declare them in the Flask app module scope, and then increase/set their values.
And it sort of works, i.e. metrics are produced and exported (when testing locally, of course), but I don't think I understand how those Counter/Gauge instances should be connected with multiprocess.MultiProcessCollector or UWsgiPrometheusMetrics; the connection is not clear to me :-)

@rycus86
Owner

rycus86 commented Sep 11, 2019

You can decorate your functions with the metrics helper, as shown in the readme: https://github.com/rycus86/prometheus_flask_exporter/blob/master/README.md

@app.route('/something')
@metrics.gauge('in_progress', 'Something in progress')
def some_handler():
    pass

@rycus86
Owner

rycus86 commented Sep 11, 2019

I don't think I understand how those Counter/Gauge instances should be connected with multiprocess.MultiProcessCollector or UWsgiPrometheusMetrics; the connection is not clear to me :-)

The UWsgiPrometheusMetrics and the other multiprocess classes in this library are meant to handle the multiprocessing-related boilerplate and setup, so you shouldn't need to use multiprocess.MultiProcessCollector and the like directly.
The examples folder in this repo has some samples of how to use them with different multiprocessing-capable systems, like uwsgi.
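
Roughly like this (a hypothetical sketch combining the two; the endpoint and metric names are made up):

from flask import Flask
from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics

app = Flask(__name__)
metrics = UWsgiPrometheusMetrics(app)
metrics.start_http_server(9100)


@app.route('/orders', methods=['POST'])
@metrics.counter('orders_created_total', 'Number of orders created')
def create_order():
    # the counter is registered against the multiprocess-aware registry that
    # UWsgiPrometheusMetrics sets up, so no direct MultiProcessCollector usage is needed
    return 'created', 201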

@float34

float34 commented Sep 12, 2019

@rycus86
Turns out I had to set prometheus_multiproc_dir: /tmp in the Kubernetes Pod's spec.
After I did that, everything works as expected: the .db files are created in the expected location and the metrics from multiprocess seem to be grouped by process_id.
Thank you so much for your help!

@rycus86
Owner

rycus86 commented Sep 12, 2019

Oh, so sorry I forgot to mention that! :/
Yes, you do need that. Maybe I should make it fail more aggressively; currently it's only checked in a few places, and obviously your code path didn't run into it.
https://github.com/rycus86/prometheus_flask_exporter/blob/master/prometheus_flask_exporter/multiprocess.py#L12

Glad you managed to work it out! 👍

@float34

float34 commented Sep 12, 2019

@rycus86 In fact, I was using the multiprocess.MultiProcessCollector class, and it checks for that path differently. So when I simply init the class with path='/tmp', that other check passes - but the env var is still missing, so the .db files with metrics are never created :)
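
In other words (an illustrative sketch of that difference, not code from either library):

import os

# Writing side: Counter/Gauge/Histogram only switch to the file-backed
# multiprocess storage when this env var is set before prometheus_client
# is imported - passing path='/tmp' to the collector does not affect it.
os.environ['prometheus_multiproc_dir'] = '/tmp'

from prometheus_client import CollectorRegistry, Counter, multiprocess

# Reading side: the collector can be pointed at the directory explicitly.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry, path='/tmp')

counter = Counter('example_total', 'Example counter')
counter.inc()  # now writes to a per-process .db file under /tmp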
