async file I/O #20
Because I've used and am very familiar with aiofiles, I thought I'd work on this. Instead of duplicating aiofiles though, I thought I'd try to be fancy and dynamically recreate the entire IOBase class hierarchy. I came up with this monstrosity, but it doesn't work because:
I think I learned enough from this experiment that I can come up with something much better and actually working. I also agree with aiofiles that async wrapper methods should probably be generated at class creation time. Attributes, however, should be handled in …

Other ideas:
I welcome feedback on any of this.

Should this be a new …
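The "generate async wrapper methods at class creation time" idea can be sketched in plain Python. This is only an illustration, not aiofiles' or trio's actual implementation: it uses asyncio.to_thread as a stand-in for trio's run_in_worker_thread, and AsyncFileWrapper and _WRAPPED_METHODS are hypothetical names.

```python
import asyncio

# Explicit list of blocking methods to wrap (illustrative, not exhaustive).
_WRAPPED_METHODS = ("read", "write", "seek", "tell", "flush", "close")

def _make_async_method(name):
    async def method(self, *args, **kwargs):
        # Delegate the blocking call to a worker thread.
        return await asyncio.to_thread(getattr(self._wrapped, name), *args, **kwargs)
    method.__name__ = name
    return method

class AsyncFileWrapper:
    """Wraps a synchronous file object; async methods are attached below."""
    def __init__(self, wrapped):
        self._wrapped = wrapped

# Generate the async wrappers once, at class creation time, rather than
# introspecting on every attribute access.
for _name in _WRAPPED_METHODS:
    setattr(AsyncFileWrapper, _name, _make_async_method(_name))
```

The one-time loop at class creation avoids the per-call `__getattr__` dispatch that makes the "dynamically recreate the IOBase hierarchy" approach so hairy.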
Hey, sorry for the slow response! I'm still getting caught up again after PyCon.

In general, my preference is to avoid magic introspection, dispatch, subclassing, and similar things... I like delegation and explicit lists :-). In theory it's extra typing and they can get out of sync with Python, but in practice I think it's worth it, because even if it does go wrong sometimes, it's much easier to fix than complicated magic. Example: in trio's socket wrapper, here are the attributes that get delegated directly, and here's one that gets wrapped. I guess in this case the problem is that the …

I think this is probably a fundamental and only-one-way-to-do-it enough feature that it makes sense to put into trio itself. As public API, maybe all we need is …
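The "delegation and explicit lists" style described above might look something like this sketch. It is an assumption-laden illustration: asyncio.to_thread stands in for trio's run_in_worker_thread, and ThreadDelegate with its `_forwarded` set are made-up names, not trio's socket-wrapper code.

```python
import asyncio

class ThreadDelegate:
    # Explicit list of attributes forwarded verbatim: cheap, non-blocking
    # accessors that never need to touch a thread.
    _forwarded = {"name", "mode", "closed"}

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        if name in self._forwarded:
            return getattr(self._wrapped, name)
        raise AttributeError(name)

    # Blocking methods get explicit, hand-written async wrappers, one per
    # method -- no introspection magic.
    async def read(self, size=-1):
        return await asyncio.to_thread(self._wrapped.read, size)
```

The explicit lists can drift out of sync with new Python releases, but a missing attribute fails loudly with an AttributeError, which is exactly the "easy to fix when it goes wrong" property argued for above.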
Note for posterity: one of the basic design questions here is whether we want to support native async I/O primitives when they're available, or whether we want to commit to using threads for everything, all the time. This makes a big difference, because using native async I/O primitives requires pretty much throwing out the stdlib …

The options are: Windows has some async file I/O options (basically just …). Microsoft says that "async" file I/O still sometimes blocks (a 2014 comment reports the same thing). I've heard that the same is true for Linux's native AIO (I think what happens is that the actual block writing can be async, but stuff like walking the extent tree to figure out where the blocks will go is still synchronous; basically, Linux's AIO is really only intended for use on raw block devices, by the kind of mini operating system that masquerades as an RDBMS).

In addition, various reports suggest that doing synchronous I/O from a thread pool is actually faster than doing native async I/O. There's nothing magic about kernel async I/O implementations; a kernel thread and a kernel state machine are essentially the same thing, plus the synchronous paths get way more attention from maintainers. (You may have heard that internally the Windows kernel is all async, so async operations will always be just as fast as synchronous ones. It turns out this isn't true: they have a special fast path for synchronous I/O on ordinary files!) User-space threads are a bit more expensive than kernel threads, but with an efficient thread pool it doesn't make much difference.

Probably the main benefit that native kernel support could potentially provide is that the kernel can detect when the data being requested is already in RAM, and return it immediately without any kind of thread overhead at all. But in practice there aren't any actually-usable APIs that reliably work like this.
Presumably for these reasons, libuv uses a thread pool for disk I/O in all configurations, and this seems to be universally agreed to be the right solution when writing C programs.

Does trio being written in Python make a difference? Well, our thread-synchronization overhead is much (much) worse than a well-tuned C thread pool's. Right now it's especially silly, because we actually spawn a new thread for every operation instead of caching them the way a thread pool does; but then, the reason we do that is that finding a thread in the cache and scheduling a job onto it is expensive enough from Python that spawning a new thread each time is nearly as good. In any case, we can certainly improve …

OTOH, the stdlib …

In conclusion, I think going all-in on …

Interesting reference: http://blog.libtorrent.org/2012/10/asynchronous-disk-io/
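A bare-bones version of the thread-pool approach discussed above, using stdlib asyncio rather than trio so it stands alone; `async_read` and `_POOL` are illustrative names, and the pool size of 4 matches libuv's default thread-pool size.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Four workers mirrors libuv's default thread-pool size.
_POOL = ThreadPoolExecutor(max_workers=4)

def _blocking_read(path):
    # Plain synchronous I/O; the "async" part is purely the thread hop.
    with open(path, "rb") as f:
        return f.read()

async def async_read(path):
    # Schedule the blocking read onto the cached pool and await its result.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_POOL, _blocking_read, path)
```

A cached pool like this is what the comment above contrasts with trio's then-current spawn-a-thread-per-operation behavior: the threads are reused, so only the job hand-off is paid per call.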
I'd actually love tinkering with this once we get the catch-all thread implementation done.

I haven't read too deeply into the implementation, but isn't the actual buffer hidden from normal Python code? How would this work?

Would it be especially crazy to integrate with libuv, if they already do this well?

This sounds like it could be fun.
Here's another update on async I/O in Linux: https://lwn.net/Articles/724198/ Basically, it confirms that the existing native async I/O routines are currently useless for our purposes.
Done in #180.
Maybe a wrapper around pathlib.Path that wraps everything in run_in_worker_thread, and where open returns a file-like object whose methods have again been wrapped in run_in_worker_thread? (#10 and #6 are relevant.)

On Windows it'd be nice to go the extra step and use the IOCP methods for actual file I/O.
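The pathlib.Path wrapper idea above could be sketched roughly like this. It is only a sketch under assumptions: AsyncPath is a hypothetical name, and asyncio.to_thread stands in for run_in_worker_thread so the example runs without trio.

```python
import asyncio
import pathlib

class AsyncPath:
    """Thin async facade over pathlib.Path (name is illustrative)."""

    def __init__(self, *args):
        self._path = pathlib.Path(*args)

    def __getattr__(self, name):
        attr = getattr(self._path, name)
        if not callable(attr):
            # Plain attributes like .name or .suffix pass straight through.
            return attr

        async def wrapper(*args, **kwargs):
            # Every method call runs in a worker thread, in the spirit of
            # run_in_worker_thread.
            return await asyncio.to_thread(attr, *args, **kwargs)
        return wrapper
```

This dynamic `__getattr__` version is the quick-and-dirty form; per the maintainer's comment above, an explicit-list version (one wrapper per method, written out by hand) would fit trio's style better, and that is the shape the feature took when it landed in #180.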