WIP: uWSGI Multiprocess Support #70
Conversation
Files may be in contention without this, depending on the platform and DBM implementation.
Still a large number of issues to build out, but several new concepts:

- Store floats as ints in the uWSGI cache framework with a certain "resolution" so we can cram floats into ints.
- Use the same "partition" idea to divide uWSGI cache values by pid as well. This is largely because there is no numeric, atomic uWSGI cache set op.
- Rename things that were "multiprocess" to "partitioned".
- Store a list of uWSGI workers in a magic cache key; this will likely be unused, as the entire cache is crawled for relevant keys.
- Dead-or-alive PIDs are checked against the uWSGI worker status dict.
- An extremely awkward test harness was added for uWSGI that currently must be run _under uWSGI_ as an app...

Notes:

- uWSGI cache operations in the Python interface are largely undocumented. There is no way to "set" a number in the cache.
- The metrics framework is unable to declare new metrics at runtime.
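The float-as-int "resolution" idea can be sketched roughly like this (the names and the multiplier here are illustrative, not the PR's actual code):

```python
# Illustrative sketch of the "resolution" trick: an integer-only store
# (like the uWSGI cache) holds floats scaled by a fixed multiplier.
RESOLUTION = 10 ** 9  # assumed multiplier; larger means finer precision


def encode_float(value: float) -> int:
    """Scale a float into an int for an integer-only cache."""
    return int(round(value * RESOLUTION))


def decode_float(stored: int) -> float:
    """Undo the scaling on the way out of the cache."""
    return stored / RESOLUTION
```

The obvious caveat, and why this gets called a hack later in the thread, is that precision is capped by the multiplier and large values can overflow the 64-bit range.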
Extends the uWSGI test suite to spawn uWSGI, run that same test file as a WSGI app, then make an HTTP request against the running instance and assert on the response code: 204 if the uWSGI tests pass, 500 if they fail. This is done because there does not seem to be a way to mock the uWSGI Python module short of running uWSGI itself. The uWSGI tests must be run within a virtualenv, which is used to determine the location of pytest and uwsgi.
Also, #66 should go in before this.
Also fixes the locking around the value so the lock is instantiated outside the class. Additionally, the resolution multiplier was bumped up significantly for greater precision.
Thanks for the change, this is an important area for the Python client. The goal here is to have something that'll work for all the major uses of multi-process instrumentation in Python, with semantics as close as we can get them to the threaded equivalent.

I'm not too familiar with uWSGI's cache. My initial research indicates that it's not a good match. Firstly, it's a cache, not a datastore, so we need to worry about expiry. Secondly, it appears to support modes meant for multi-machine synchronisation, including UDP and Django's database. If a user is using one of these, then the semantics and performance will not be as desired. Finally, it only works for those using uWSGI, so even if we can handle the previous issues, we're left with all the other use cases.

We do need to support floats. As the cache can only store a 64-bit integer and we're using 64-bit floats, one approach is to put the float's machine representation in the cache.

At a high level I think this could be an option for some of our users, but it's not a full solution for the multi-process problem. As it stands, the only thing preventing #66 from being a full solution is getting a shelve equivalent that supports concurrency and that doesn't call fsync on every write to disk.
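The "machine representation" approach mentioned here is straightforward in Python with the `struct` module (a sketch of the idea, not code from this PR):

```python
import struct


def float_to_int64(value: float) -> int:
    """Reinterpret a 64-bit float's bits as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", value))[0]


def int64_to_float(bits: int) -> float:
    """Reinterpret the stored bits back into the original float."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]
```

Unlike a resolution multiplier, this round-trips every finite float losslessly, since the bit pattern itself is what gets stored.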
I don't feel that this moves away from this goal. It offers an adapter to bring a "built-in" speed and efficiency increase given that users are running their Python WSGI app under one of the main WSGI servers used in production environments. Were Gunicorn to have similar features, it would make sense to optimize for them as well, I feel.

That said, I see now that the uWSGI "adapter" should be opt-in, rather than automatic if

The thesis of this PR is just: the tradeoff of the universality of using
The default settings for the uWSGI cache have expiration disabled. Additionally, I think that if a user is concerned with the number of cache entries, cache block size, etc., that's a deployment concern the user takes on. They should be weighing it just as they consider the implications of writing shelve files to disk.
It does, but there's no reason to use these features in a Prometheus setup. If you have multiple nodes exposing metrics, it's better to have your Prometheus server scrape multiple nodes than to side-aggregate all of the metrics using a uWSGI feature and expose them on a single node. These features are not on by default, and caches can be named and kept distinct. The user could opt to have one named cache use UDP sync and not another. I will update the PR to support a named cache; that's an oversight.
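For illustration, a dedicated named cache can be declared in the uWSGI config, separate from any cache that enables network synchronisation (the name and sizing here are assumptions, not a tested configuration):

```ini
[uwsgi]
; a named cache used only for metrics, with no UDP/network sync options
cache2 = name=prometheus,items=1000
```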
Again, I think this should be viewed as an optimization, not a catch-all. For those using uWSGI (I am assuming that's not a small number), it should be much faster and won't hit disk.
This is what I'd like to do. The multiplication thing is a hack.
I spent some time thinking this through before I embarked on the PR. What are the ways other programs cross the process divide? There's a standalone database, shared disk, shared memory, or a shared network resource, more or less. Databases and shared network resources are no good because they impose a significant dependency on the end user for what should be a lightweight library. Shared disk is well-known, but slow. Shared memory is fast but requires management and allocation. Arguably, shared disk is one of the few things that is somewhat universally available in a Python implementation. You don't want to assume you have shared memory across the user's processes, because that would be managed "above" Python.

So I think the disk-based route is your baseline, and it serves your broadest audience, unless a shared-memory solution is available (and uWSGI provides one). There are still caveats with disk, even: Google App Engine's Python runtime doesn't allow writing to disk, for example. My opinion is that in this scenario it makes sense to offer additional "adapters" that optimize for a certain environment, so that some users can benefit from better performance but nearly everyone can use it one way or another.
I don't think I know of one of these that isn't a database rather than a lightweight persistence utility. You essentially need to buffer before disk, and you want concurrent access to that buffer. There are options here, like locking the shelve database and flushing periodically, but at that point you're looking like a database again, and it still has to flush periodically. And between flushes, the exported metrics will be stale.
What's needed is a database that will write to the disk via mmap, but not call fsync. Worst case, we can implement it on top of mmap ourselves. I used shelve initially to avoid having to come up with a binary file format for the proof of concept.
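A minimal sketch of that idea (illustrative only, with assumed names; not the implementation discussed here): back a value with a memory-mapped file and never call fsync on the write path, leaving it to the kernel to flush dirty pages.

```python
import mmap
import os
import struct


class MmapValue:
    """A float stored in a memory-mapped file. Writes touch only the page
    cache; the kernel writes dirty pages to disk on its own schedule."""

    def __init__(self, path: str, size: int = mmap.PAGESIZE):
        fd = os.open(path, os.O_CREAT | os.O_RDWR)
        os.ftruncate(fd, size)           # reserve the file's space up front
        self._mm = mmap.mmap(fd, size)   # shared mapping, visible across processes
        os.close(fd)                     # the mapping keeps the file alive

    def set(self, value: float) -> None:
        self._mm[0:8] = struct.pack("<d", value)  # plain memory write, no fsync

    def get(self) -> float:
        return struct.unpack("<d", self._mm[0:8])[0]
```

As noted below, growing the file is the awkward part: the mapping has a fixed size, so adding new series means extending the file and remapping.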
I still think those are optimizations to a file-based "metric backend." Why not a uWSGI metric backend? If a user would prefer
It'll work roughly the same as now; they're files on disk. The only new problem is dealing with growing the file.
It's a possibility, but it only covers some users. This PR is also far bigger than adding a uWSGI backend. I'm very hesitant to accept a significant re-architecture that provides only a partial solution. If it worked with the existing code I'd be willing to accept it (though we'll need to figure out something for the tests).
That should never be a backend. Talking to the network on each instrumentation call would be very limiting performance-wise.
I won't argue there; I said it myself at the outset. 😉 Would you be open to a different PR that removes any uWSGI concerns? The bulk of the work is to open up the logic of combining metrics from different PIDs to be more accessible and not

If you're open to an architectural PR as a critique aimed at being able to extend what comes with the base client, that might help others out too. For example, if that can get in, I'd be able to make my own subclass that uses the uWSGI cache as storage, and it wouldn't need to be in this project.
I don't know if that's true. A local UNIX socket would be very fast. I won't assume that's right until I can test it, though.
Your code is just under 4 times longer than what it's replacing. I think a straight factoring out of the merging code in
It's intended to be extensible, though there's currently no way to hook in
That's a syscall for every event versus a mutex for mmap. There'll be orders of magnitude difference in latency.
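A rough way to see the gap (an illustrative micro-benchmark, not a rigorous one): time a round-trip over a local datagram socket pair against an 8-byte store into an anonymous mmap.

```python
import mmap
import socket
import struct
import timeit

payload = struct.pack("<d", 1.0)

# one instrumentation "event" over a local socket: a send plus a recv,
# i.e. at least two syscalls
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)


def via_socket() -> None:
    a.send(payload)
    b.recv(64)


# the same event as a write into shared memory: no syscall at all
mm = mmap.mmap(-1, mmap.PAGESIZE)


def via_mmap() -> None:
    mm[0:8] = payload


socket_s = timeit.timeit(via_socket, number=100_000)
mmap_s = timeit.timeit(via_mmap, number=100_000)
print(f"socket: {socket_s:.4f}s  mmap: {mmap_s:.4f}s")
```

On a typical Linux box the mmap path is far cheaper per event, though the exact ratio depends on the kernel and hardware.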
We've been looking into Prometheus and were almost immediately bitten by #30. We brainstormed some solutions and I've made some major changes on top of @brian-brazil's work in the multiproc branch. They are pretty large, so I'm opening this now, unfinished, to gather feedback and (hopefully) finish this out.

High-level list of changes:

- `MultiProcess` has been swapped for `Partitioned`, like `PartitionedCollector` for example.
- `Metric` and `Submetric` are classes that know how to recreate themselves from otherwise mushy data (the key JSON, for example). I'm still unhappy with these, as they confound the `Metric` in the main part of the library, and I think the two should be merged into a single kind, but in trying to limit scope, I held off.
- `PartitionedMetric`s hold the logic for building their list of samples, rather than having branches and further inner loops to handle each case. I think there is more that can be done here.
- The value storage has been decoupled from `shelve` so that it is more generic. This opens the door to...
- A value implementation backed by the uWSGI cache rather than `shelve`. There are numerous sticky things about this implementation still, and I think this is more of a POC than anything, but I'm now convinced it will work. See the uWSGI notes at the bottom for more...

Minor changes:

Still left to do:

- (`UWSGIValue` constructor)

So, again, I'm hoping for feedback on the direction this is going, other ideas, critiques, and nits. I do apologize for the huge PR, but I hope it's welcome!

uWSGI notes: