Asyncio support #837

Closed
subiol opened this issue Jan 28, 2017 · 5 comments

@subiol

subiol commented Jan 28, 2017

Is there any chance that h5py will add asyncio support?

I have searched here and on the mailing list and found no mention of asyncio support. Now that Python has an official async library, asyncio, it would make sense for h5py to support it, given how I/O-heavy h5py is.

@aragilar
Member

So there are two separate issues with adding asyncio support:

  1. asyncio explicitly does not support filesystem I/O at this time; see e.g. https://github.com/python/asyncio/wiki/ThirdParty#filesystem, https://groups.google.com/forum/#!topic/python-tulip/MvpkQeetWZA, https://stackoverflow.com/questions/87892/what-is-the-status-of-posix-asynchronous-i-o-aio, and https://github.com/Tinche/aiofiles, which is the closest to what you'd want.
  2. All I/O is done through HDF5 (the library), so whatever async support you'd want to add would need support in HDF5 (the library).

This basically means that h5py is unlikely to ever support asyncio. If you've got a specific use-case you may want to explain it on the h5py mailing list (or even the hdf5 mailing lists), to see if other people have run into similar problems and have suggestions.

@subiol
Author

subiol commented Jan 29, 2017

Thanks for the answer. After reading the links you provided, what I get is:

  1. There is limited support for async file I/O, and what exists is inconsistent across operating systems; some do not have it at all.

  2. Given that, the realistic option for handling file I/O within an async loop is to use threads. So with h5py, when I have to read from or write to an HDF5 file, I should run that operation in a thread so that it does not block my asyncio thread. Is there any other way to avoid blocking the asyncio thread?

@aragilar
Member

You could try running things in a thread, though there are no guarantees it will work well: as I mentioned, HDF5 controls the I/O, and you will want to make sure you don't run into any of its locking controls. You will probably want to understand which file mode mentioned at http://docs.h5py.org/en/latest/high/file.html#file-drivers will work best for you. I've got no other useful advice (other than to ask on the mailing lists), as using h5py/HDF5 asynchronously isn't something I've specifically dug into. I'd suggest thinking about whether your current approach is the best way, though. If you're using HDF5 just for message passing, there are quite a few alternatives (e.g. https://en.wikipedia.org/wiki/MessagePack) which are simpler than HDF5. If you're using asyncio just because it's asyncio, maybe you could consider other alternatives such as multiprocessing or concurrent.futures?
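For readers landing here, the thread-based workaround discussed above might look roughly like the sketch below. It is not an h5py feature. The names `run_blocking` and `read_dataset` are hypothetical helpers, and the single-worker executor rests on the assumption that serializing all HDF5 calls on one thread is an acceptable way to keep clear of concurrent access to the same file.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# One worker thread, so every HDF5 call submitted here runs one at a time.
# (Assumption: serializing access is acceptable for the workload.)
_hdf5_executor = ThreadPoolExecutor(max_workers=1)


def read_dataset(path, name):
    """Blocking read of a whole dataset; meant to run on the worker thread."""
    import h5py  # imported lazily so run_blocking() is usable without h5py

    with h5py.File(path, "r") as f:
        return f[name][()]  # load the full dataset into memory


async def run_blocking(func, *args):
    """Run a blocking callable on the worker thread and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_hdf5_executor, func, *args)


async def main(path, name):
    # The event loop stays free to run other tasks while the read happens.
    return await run_blocking(read_dataset, path, name)
```

Calling `asyncio.run(main("data.h5", "my_dataset"))` (both names are placeholders) would read the dataset without blocking the event loop. The read itself is no faster; it simply happens off the loop's thread.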

@subiol
Author

subiol commented Jan 30, 2017

concurrent.futures gives me threads, which is what I was planning to use until asyncio support arrived (which I now know will probably not happen). The issue is that even with a pool of threads, you incur some overhead compared with polling a file descriptor.
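One way to put a number on that thread hand-off overhead is to time a no-op dispatched through an executor. The sketch below is only a rough, machine-dependent measurement. `noop` and `mean_dispatch_seconds` are hypothetical names, and it measures only the round trip to the worker thread, not any HDF5 work.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def noop():
    """Stand-in for a blocking call that does no real work."""
    return None


async def mean_dispatch_seconds(n=1000):
    """Average round-trip time of handing a no-op to a worker thread."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        for _ in range(n):
            await loop.run_in_executor(pool, noop)
        elapsed = time.perf_counter() - start
    return elapsed / n


if __name__ == "__main__":
    per_call = asyncio.run(mean_dispatch_seconds())
    print(f"{per_call * 1e6:.1f} microseconds per dispatch")
```

If the per-call figure is small relative to the time a typical read or write takes, the overhead of the thread pool is unlikely to matter in practice.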

@zhangyingmath

It looks like some new progress is being made in the community: https://github.com/hpc-io/vol-async
Is there any chance h5py can engage with this and make it easier for Python users to take advantage of it? I am thinking of a use case where I write/read a list of datasets in "async" mode. Thanks!
