
wsgi daemon mode #5

Closed
thatcher opened this issue Apr 6, 2018 · 30 comments

Comments

@thatcher

thatcher commented Apr 6, 2018

I run my stateless Flask apps with mod_wsgi/Apache in daemon mode like this:

WSGIDaemonProcess foo-services python-home=/opt/my_org/foo-services/_env processes=8 threads=48 maximum-requests=10000 display-name=%{GROUP}
WSGIApplicationGroup %{GLOBAL}
WSGISocketPrefix /var/run/wsgi


Alias /foo-services "/opt/my_org/foo-services/wsgi.py"
<Location "/foo-services">
SetHandler wsgi-script
Options +ExecCGI
FileETag None
ExpiresActive On
ExpiresDefault "access plus 1 year"
WSGIProcessGroup foo-services
</Location>

Which means that when a request gets to the service it could be hitting 1 of 8 daemon processes, each of which has its own memory in isolation from the others. Does the metrics endpoint store the Prometheus data in a way that is shared across these daemons?
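
As a quick (hypothetical) illustration of that isolation, a route that just returns the worker PID answers with a different value depending on which daemon process handles the request; this sketch is not part of my actual service:

import os

from flask import Flask

app = Flask(__name__)


@app.route('/whoami')
def whoami():
    # with WSGIDaemonProcess ... processes=8 the PID changes between refreshes,
    # showing that each daemon process keeps its own state
    return str(os.getpid())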

I can create some tests to verify whether that's the case or not; just curious if the answer is already known.

Thanks,
Thatcher

@rycus86
Owner

rycus86 commented Apr 6, 2018

Hi Thatcher,

I think I've seen both thread- and process-based aggregation in the official Prometheus Python library that's used here, so it might work, but I'll have to check.
If you beat me to finding the answer, it would be great if you could share it on this issue! :)

Thanks,
Viktor

@thatcher
Author

thatcher commented Apr 6, 2018

I can definitely verify that I can start my micro-service, run a handful of requests, and when I keep refreshing my metrics endpoint I get different responses from each daemon, which makes sense.

It looks like prometheus_client added support for a multiprocess collector, as in the examples provided here:

prometheus/client_python#122

I think we could patch this by checking os.environ.get('prometheus_multiproc_dir') at configuration time and, instead of using the default registry, doing something like:

import os

from prometheus_client import multiprocess
from prometheus_client import generate_latest, CollectorRegistry

...
registry = DEFAULT_REGISTRY
if os.environ.get('prometheus_multiproc_dir'):
    # aggregate samples written by all worker processes into one registry
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)

...

@app.route('/metrics')
def metrics():
    data = generate_latest(registry)
    return data

It looks like I can test this out just by passing my own registry to PrometheusMetrics. I'll let you know.

@thatcher
Author

thatcher commented Apr 6, 2018

Yup, it worked! Here's the gist:

wsgi.py:

import os
...
# this has to be set before the Flask app (and prometheus_client) is imported
os.environ["prometheus_multiproc_dir"] = "/tmp/my_app.stats"
...
from my_app import app as application

In my Flask app:

import os
...
from flask import Flask
from prometheus_client import multiprocess
from prometheus_client import CollectorRegistry
from prometheus_flask_exporter import PrometheusMetrics
from prometheus_flask_exporter import DEFAULT_REGISTRY
...
app = Flask(__name__)

registry = DEFAULT_REGISTRY
if os.environ.get('prometheus_multiproc_dir'):
    stats_dir = os.environ.get('prometheus_multiproc_dir')
    if not os.path.exists(stats_dir):
        os.makedirs(stats_dir)
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)

metrics = PrometheusMetrics(app, registry=registry)

That's it! You could add this as an introspected feature or just document the recipe for others.
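
One caveat worth adding (my assumption, based on the official client's multiprocess documentation rather than anything specific to prometheus_flask_exporter): the stats directory should be emptied on startup so samples from a previous run don't linger, and the env var must be set before prometheus_client is imported. A sketch of wsgi.py with that added:

import glob
import os

stats_dir = "/tmp/my_app.stats"
# has to be set before prometheus_client gets imported via the app
os.environ["prometheus_multiproc_dir"] = stats_dir

if not os.path.exists(stats_dir):
    os.makedirs(stats_dir)
# clear out metric files left over from a previous run
for stale in glob.glob(os.path.join(stats_dir, "*.db")):
    os.remove(stale)

from my_app import app as application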

Thanks rycus86!

Thatcher

@thatcher thatcher closed this as completed Apr 6, 2018
@rycus86
Owner

rycus86 commented Apr 7, 2018

Awesome, thanks a lot!
I'll give it a go to see if I can add this as a configurable feature.
If that fails, I'll add this info to the README or some docs.
Very useful to know, thanks for the investigation!

@snegivulcan

@rycus86 & @thatcher

I was playing with this library and, even with a single instance, when I hit the /metrics endpoint I get 3 different results.

I am following the steps outlined in the conversation here.

registry = CollectorRegistry()
if not os.path.exists(config.PROMETHEUS_MULTIPROC_DIR):
    os.makedirs(config.PROMETHEUS_MULTIPROC_DIR)
multiprocess.MultiProcessCollector(registry, path=config.PROMETHEUS_MULTIPROC_DIR)
metrics = PrometheusMetrics(app, registry=registry)

@app.route('/metrics')
def metrics():
    data = generate_latest(registry)
    return data

So why "3" different responses ? And how to resolve the issue ? I am using 0.2.2 version of prometheus_flask_exporter

@thatcher
Author

thatcher commented May 3, 2018

@snegivulcan I discovered the same issue and realized it's related to some of the code in prometheus_flask_exporter, but I didn't have time to dig into it. I'd like to eventually figure out how to get it to play nicely with prometheus_flask_exporter. I can confirm that the issue does not occur if I use the Prometheus client directly. Here is how I am using it for now in production:

import os
from timeit import default_timer

from flask import Flask
from flask import request
from prometheus_client import CONTENT_TYPE_LATEST
from prometheus_client import CollectorRegistry
from prometheus_client import Counter
from prometheus_client import Histogram
from prometheus_client import generate_latest
from prometheus_client import multiprocess

app = Flask(__name__)

stats_dir = os.environ.get('prometheus_multiproc_dir')
if not os.path.exists(stats_dir):
    os.makedirs(stats_dir)
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
histogram = Histogram(
    'flask_http_request_duration_seconds',
    'Flask HTTP request duration in seconds',
    ('method', 'endpoint', 'status'),
)
counter = Counter(
    'flask_http_request_total',
    'Total number of HTTP requests',
    ('method', 'status')
)

def before_request():
    request.start_time = default_timer()


def after_request(response):
    total_time = max(default_timer() - request.start_time, 0)
    histogram.labels(
        request.method,
        request.endpoint,
        response.status_code
    ).observe(total_time)

    counter.labels(request.method, response.status_code).inc()

    return response


app.before_request(before_request)
app.after_request(after_request)


@app.route('/metrics')
def metrics():
    headers = {'Content-Type': CONTENT_TYPE_LATEST}
    return generate_latest(registry), 200, headers

@rycus86
Owner

rycus86 commented May 4, 2018

Hi @snegivulcan and @thatcher ,

Have you had a look at the small example in https://github.com/rycus86/prometheus_flask_exporter/tree/master/examples/wsgi ?
I don't have much experience with running wsgi apps, but the results seemed to indicate it's working as intended.
Please let me know if that's not the case and I'll try to have another look.

Thanks!

@rycus86 rycus86 reopened this May 4, 2018
@jpds

jpds commented May 17, 2018

I'm using the same code as @snegivulcan from his Flask example, and I noticed from the comment headers that one set of the exposed metrics relates to a single worker while the other belongs to the multiprocess metrics:

# HELP flask_http_request_duration_seconds Flask HTTP request duration in seconds
# TYPE flask_http_request_duration_seconds histogram
flask_http_request_duration_seconds_bucket{le="0.005",method="GET",path="/_healthcheck/",status="200"} 17.0
flask_http_request_duration_seconds_bucket{le="0.01",method="GET",path="/_healthcheck/",status="200"} 17.0
flask_http_request_duration_seconds_bucket{le="0.025",method="GET",path="/_healthcheck/",status="200"} 18.0
flask_http_request_duration_seconds_bucket{le="0.05",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.075",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.1",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.25",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.5",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="0.75",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="1.0",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="2.5",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="5.0",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="7.5",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="10.0",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_bucket{le="+Inf",method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_count{method="GET",path="/_healthcheck/",status="200"} 19.0
flask_http_request_duration_seconds_sum{method="GET",path="/_healthcheck/",status="200"} 0.07988429069519043
# HELP flask_http_request_duration_seconds Multiprocess metric
# TYPE flask_http_request_duration_seconds histogram
flask_http_request_duration_seconds_bucket{le="0.01",method="GET",path="/_healthcheck/",status="200"} 67.0
flask_http_request_duration_seconds_bucket{le="0.1",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="5.0",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_count{method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.075",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="1.0",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.5",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_sum{method="GET",path="/_healthcheck/",status="200"} 0.375255823135376
flask_http_request_duration_seconds_bucket{le="0.25",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.75",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="+Inf",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="2.5",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.05",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="7.5",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="10.0",method="GET",path="/_healthcheck/",status="200"} 80.0
flask_http_request_duration_seconds_bucket{le="0.005",method="GET",path="/_healthcheck/",status="200"} 63.0
flask_http_request_duration_seconds_bucket{le="0.025",method="GET",path="/_healthcheck/",status="200"} 76.0

@jpds

jpds commented May 17, 2018

I found a fix from:

https://github.com/jonashaag/prometheus-multiprocessing-example/blob/master/yourapp.py

Where I define the following as a /metrics endpoint:

from flask import Response
from prometheus_client import CONTENT_TYPE_LATEST, CollectorRegistry, generate_latest, multiprocess

@app.route('/metrics')
def metrics():
    # build a fresh registry per scrape so only the aggregated multiprocess data is exposed
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    return Response(data, mimetype=CONTENT_TYPE_LATEST)

@rycus86
Owner

rycus86 commented May 17, 2018

Hi @jpds,

Thanks for sharing your findings! The project you mentioned also links the multiprocessing section of the official Prometheus Python client library: https://github.com/prometheus/client_python#multiprocess-mode-gunicorn
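
For Gunicorn specifically, that section also describes marking exited workers as dead so their live-gauge files can be cleaned up; roughly (a hypothetical gunicorn config sketch adapted from those docs, not something this library does for you):

# gunicorn_conf.py
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)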

I'll try to have a look at whether this could be supported better by this library.

Thanks!

@rycus86
Owner

rycus86 commented Aug 25, 2018

Hi,

There's a new release with some multiprocessing changes from @elephantum - https://pypi.org/project/prometheus-flask-exporter/0.2.3/

It would be great if you could give it a try and check whether the previous problem still happens with this version.

Thanks!

@rycus86
Owner

rycus86 commented Dec 17, 2018

I've added some more multiprocessing changes in version 0.5.0, mainly targeted at Gunicorn and uWSGI, but they should work in a generic way - see the README for more info.

@rycus86 rycus86 closed this as completed Dec 17, 2018
@float34

float34 commented Sep 9, 2019

@rycus86
Doesn't seem to work for me with uwsgi and the latest versions of prometheus_client/prometheus_flask_exporter (0.7.1/0.9.1).
I just don't get any metrics when I try to implement it as:

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry, path='/tmp')

metrics = PrometheusMetrics(app, registry=registry)

But I do get them when I don't use the multiprocess module (albeit with different values for each process, of course).

@rycus86
Owner

rycus86 commented Sep 9, 2019

Hi @Torquerrr
Have a look at https://github.com/rycus86/prometheus_flask_exporter/blob/master/README.md - there are some additional support classes for uwsgi integration with this library.
See also https://github.com/rycus86/prometheus_flask_exporter/blob/master/examples/uwsgi/server.py for a small example with uwsgi.

Let me know if you still think it's not working; things may have changed since the last release.

@float34

float34 commented Sep 10, 2019

@rycus86 I was able to narrow the problem down to registry = CollectorRegistry(). When I use this class I don't get any metrics, but if I simply use REGISTRY from the prometheus_client library, I do have metrics available (though with values from different processes). Also, the .db file for Counter/Gauge (my additional metrics) is not created in /tmp for some reason, and the permissions seem fine.
It's probably a problem specific to my setup.

@rycus86
Owner

rycus86 commented Sep 10, 2019

Have you tried the multiprocess support classes from this library?

from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics

That should take care of setting up the multiprocessing-ready registries and so on; you can see it in the example I linked above.

@float34

float34 commented Sep 10, 2019

@rycus86 Yes, now I am running it on the same host as the main Flask app (host explicitly specified) and I get:

root@xxxxxxxxxxx:/app# curl -X GET http://localhost:9100/metrics
curl: (7) Failed to connect to localhost port 9100: Connection refused

@rycus86
Owner

rycus86 commented Sep 10, 2019

@Torquerrr see the example I linked above; the uwsgi metrics object needs an explicit call to enable the metrics endpoint on a port:

metrics = UWsgiPrometheusMetrics(app)
metrics.start_http_server(9100)
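
Put together, a minimal (hypothetical) setup might look like the sketch below; the route and port are just placeholders, and the uwsgi example linked earlier in the repo is the authoritative version:

from flask import Flask
from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics

app = Flask(__name__)

metrics = UWsgiPrometheusMetrics(app)
# exposes the aggregated metrics on a separate port, independent of the Flask routes
metrics.start_http_server(9100)


@app.route('/test')
def index():
    return 'Hello world'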

@float34

float34 commented Sep 11, 2019

@rycus86 I've added that call to start_http_server; the only difference is that my Flask app is initialized inside another function in the same module (a sort of lazy loading), so I call start_http_server before app.run().

@rycus86
Owner

rycus86 commented Sep 11, 2019

@float34

float34 commented Sep 11, 2019

@rycus86 Does this example imply that I also need to set lazy-apps = true in my uwsgi config?

@rycus86
Owner

rycus86 commented Sep 11, 2019

I tested it like that; I'm not 100% sure whether it's required or not.

@rycus86
Owner

rycus86 commented Sep 11, 2019

Also, I wanted to say: if you can provide a stripped-down example, I'm happy to have a look and see whether the library perhaps needs changing.

@float34

float34 commented Sep 11, 2019

@rycus86 Thank you for your help, I will try this approach and come back with results :)
Yes, I will try to prepare such an example. I want to reproduce this locally, because so far I only see the issue in a Kubernetes pod :(

@float34

float34 commented Sep 11, 2019

@rycus86 By the way - I can't find examples of how to add custom metrics like Counter/Gauge; are there any?
What I do is simply declare them in the Flask app module scope, and then increase/set their values.
And it sort of works, i.e. metrics are produced and exported (when testing locally, of course), but I don't think I understand how those Counter/Gauge instances should be connected with multiprocess.MultiProcessCollector or UWsgiPrometheusMetrics; the connection is not clear to me :-)

@rycus86
Owner

rycus86 commented Sep 11, 2019

You can decorate your functions with the metrics helper, as shown in the readme: https://github.com/rycus86/prometheus_flask_exporter/blob/master/README.md

@app.route('/something')
@metrics.gauge('in_progress', 'Something in progress')
def some_handler():
    pass

@rycus86
Owner

rycus86 commented Sep 11, 2019

I don't think I understand how those Counter/Gauge instances should be connected with multiprocess.MultiProcessCollector or UWsgiPrometheusMetrics; the connection is not clear to me :-)

The UWsgiPrometheusMetrics and the other multiprocess classes in this library are meant to handle the multiprocessing-related boilerplate and setup, so you shouldn't need to use multiprocess.MultiProcessCollector and the like directly.
The examples folder in this repo has some samples of how to use them with different multiprocessing-capable systems, like uwsgi.
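
Roughly like this (a hypothetical sketch combining the two; the endpoint and metric names are made up):

from flask import Flask
from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics

app = Flask(__name__)
metrics = UWsgiPrometheusMetrics(app)
metrics.start_http_server(9100)


@app.route('/orders', methods=['POST'])
@metrics.counter('orders_created_total', 'Number of orders created')
def create_order():
    # the counter is registered against the multiprocess-aware registry that
    # UWsgiPrometheusMetrics sets up, so no direct MultiProcessCollector usage is needed
    return 'created', 201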

@float34

float34 commented Sep 12, 2019

@rycus86
Turns out I had to set prometheus_multiproc_dir: /tmp in the Kubernetes Pod's spec.
After I did that, everything works as expected: the .db files are created in the expected location and the metrics from multiprocess seem to be grouped by process_id.
Thank you so much for your help!

@rycus86
Owner

rycus86 commented Sep 12, 2019

Oh, so sorry I forgot to mention that! :/
Yes, you do need that. Maybe I should make it fail more aggressively; currently it's only checked in a few places, and obviously your code path didn't run into it.
https://github.com/rycus86/prometheus_flask_exporter/blob/master/prometheus_flask_exporter/multiprocess.py#L12

Glad you managed to work it out! 👍

@float34

float34 commented Sep 12, 2019

@rycus86 In fact, I was using the multiprocess.MultiProcessCollector class, and it checks for that path differently. So when I simply init the class with path='/tmp', that other check passes - but the env var is still missing, so the .db files with metrics are never created :)
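
In other words (an illustrative sketch of that difference, not code from either library):

import os

# Writing side: Counter/Gauge/Histogram only switch to the file-backed
# multiprocess storage when this env var is set before prometheus_client
# is imported - passing path='/tmp' to the collector does not affect it.
os.environ['prometheus_multiproc_dir'] = '/tmp'

from prometheus_client import CollectorRegistry, Counter, multiprocess

# Reading side: the collector can be pointed at the directory explicitly.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry, path='/tmp')

counter = Counter('example_total', 'Example counter')
counter.inc()  # now writes to a per-process .db file under /tmp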
