Add multi-process support #66

Merged
merged 3 commits into master from multiproc on Oct 10, 2016
Conversation

brian-brazil (Contributor)

This works by having a shelve per process that is continuously
updated with metrics, and a collector that reads from them.

Histogram buckets need accumulation and special handling for _count,
and for gauges we need to offer a number of options.

Fixes #30

# This needs to be chosen before the first metric is constructed,
# and as that may be in some arbitrary library the user/admin has
# no control over we use an environment variable.
if 'prometheus_multiproc_dir' in os.environ:

Rather than hard-coding the choice of _ValueClass here, could the class be defined by the users and loaded from anywhere on the path? #70 could be implemented in a separate package entirely and be developed/updated on its own schedule. The shelf implementation can certainly ship with client_python, but users of other multi-process Python servers would be free to experiment without needing to land code here.

Something like the following, shamelessly stolen from django/utils/module_loading.py with error handling stripped dangerously:

from importlib import import_module
import os

module_path, class_name = os.environ.get('PROMETHEUS_VALUE_CLASS',
                                         'prometheus_client.core._MutexValue').rsplit('.', 1)
module = import_module(module_path)
_ValueClass = getattr(module, class_name)

Then if #70 was in its own package prometheus_uwsgi, you might start the server with PROMETHEUS_VALUE_CLASS=prometheus_uwsgi.UwsgiCacheValue, or with PROMETHEUS_VALUE_CLASS=prometheus_client.core.MultiProcessValue and prometheus_multiproc_dir=/path/to/multiproc_dir to use the shelf implementation described in this PR (removing the _ since it would become part of the public API).

Those _ValueClass implementations would be responsible for configuring themselves from the environment or wherever, which feels non-pythonic. Somehow configuring the system explicitly at process startup seems like the best path, and developers could pass in whatever class they want, but I haven't dug into the rest of the implementation enough to suggest what that API should look like.

brian-brazil (Contributor, Author)

This seems like a reasonable idea.

Somehow configuring the system explicitly at process startup seems like the best path

That won't work, as by the time your code is running all this code has likely already been imported and many metrics already exist. That's why I'm using environment variables, as they're set at exec() time before any python code has a chance to run.

That won't work, as by the time your code is running all this code has likely already been imported and many metrics already exist.

That's true assuming you want to set the _ValueClass once and only once and never have it specified in code. You could circumvent this with a client pattern or by parameterizing metric instantiation with a _ValueClass as well. I think it would make the interface nicer to use the environment variable by default, but also to allow:

REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request', value_class=prometheus_client.ShelveValue)

or something of the sort.

@tlc commented Apr 11, 2016

What is the status of this? Is it going to be pulled?

I need the functionality and will have to download the branch if not.

Thanks!

@brian-brazil (Contributor, Author)

The core functionality is fine; however, the performance isn't good enough for production use. I haven't had the spare time to find a better backing store for it.

This works by having a shelve per process that is continuously
updated with metrics, and a collector that reads from them.

Histogram buckets need accumulation and special handling for _count,
and for gauges we need to offer a number of options.

This uses an mmapped file to store the data. Fsync is not called,
and concurrent readers are permitted (which some shelve backends
did not permit).

Microbenchmarks indicate 1.2-1.5us per instrumentation event,
compared to 3-13ms for the shelve solution: roughly three orders
of magnitude better.

The 1.2-1.5us is about double that of normal (non-multiprocess)
instrumentation, due to the second mutex required. Python doesn't
offer an RWLock, which would let this be reduced.

Overall this performance is acceptable: anyone doing something
extremely performance-critical with instrumentation wouldn't be
using Python anyway.
@brian-brazil brian-brazil deleted the multiproc branch October 10, 2016 15:33
@brian-brazil brian-brazil restored the multiproc branch October 10, 2016 15:33
@brian-brazil brian-brazil reopened this Oct 10, 2016
@brian-brazil brian-brazil merged commit ac77d9b into master Oct 10, 2016
@brian-brazil brian-brazil deleted the multiproc branch October 10, 2016 15:34
@jonashaag

FWIW, I started working on performance improvements to your branch last week and simply went with JSON instead of shelve/bsddb, which also gave an improvement of a few orders of magnitude. (I don't know the exact number right now, but it was somewhere around a 2x slowdown compared to non-multiprocessing.) Maybe that's an alternative that involves less custom code.

@brian-brazil (Contributor, Author)

That's about the same slowdown as this code, which is presumably dominated by the two mutexes in use (why does taking a mutex cost most of a microsecond?). Hopefully we'll get that faster at some point, but I doubt you managed it in much less than the 80 lines of code it took me.

@jonashaag

I doubt you got too much better than the 80 lines of code I managed it in.

Sorry, I didn't mean to say that your code was bad. Actually, the binary format is very simple and the code is easy to understand. My point is that if it doesn't make a difference in terms of performance, it may be more future-proof and pragmatic to simply go with JSON, if only so that future readers of the code don't have to learn a new serialisation format. It's not a big deal at this point, but it could become one if features are added to the serialisation format.

@brian-brazil (Contributor, Author)

Let's see where it goes. It's all self-contained, and we don't need to worry about backwards compatibility. The tricky stuff is all in the filenames anyway.
