Modernize multiprocessing (convert to asyncio) #643
Comments
|
Yes, that's definitely possible, since our work is mainly I/O-bound. My earlier PR #186 could also be optimized, both algorithmically and with Python 3-specific features. I'll help if anyone is interested. |
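For anyone who wants a concrete picture of why asyncio suits I/O-bound work, here is a minimal sketch. It is not code from this project: `fetch`, `fetch_all` and the `max_concurrent` limit are hypothetical names, and `asyncio.sleep` stands in for a real download or file read. It runs several waits concurrently on one event loop while capping how many are in flight at once.

```python
import asyncio


async def fetch(name, limiter, stats):
    """Simulate one I/O-bound job (hypothetical stand-in for a download)."""
    async with limiter:  # wait here if too many jobs are already in flight
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the real network or disk wait
        stats["active"] -= 1
    return f"{name}: done"


async def fetch_all(names, max_concurrent=2):
    """Run all jobs concurrently, with at most max_concurrent active at a time."""
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    # gather() returns results in the same order as the input names
    results = await asyncio.gather(*(fetch(n, limiter, stats) for n in names))
    return results, stats["peak"]
```

Calling `asyncio.run(fetch_all(["a", "b", "c"]))` returns the results in input order plus the peak number of simultaneous jobs. While one job waits, the event loop runs the others, which is where the speedup for I/O-bound work comes from.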
|
The following resources might be helpful for this: |
|
Possible optimizations:
Do we still want to maintain two versions: 1) multithreaded and 2) single-threaded? |
|
Nice. I think you'll find that running the checkers is something that might be good to have threaded, but the rest will probably be fine within the main event loop. I'd recommend sticking with aiohttp over parfive: a larger user base means we can count on it to stay maintained. |
|
I think running the checkers is a CPU-bound task, because it involves searching for regex patterns in every line of the lines array (parsed by the strings module). Running it with multiple threads won't bring any benefit because of Python's GIL. It would be nice to run the checkers truly in parallel, so I think a process pool is the right choice. |
|
As for aiohttp vs. parfive, I am going to benchmark both modules. If there isn't much of a difference, I will stick with aiohttp. With parfive, we probably won't have to worry about maintenance, because it's a submodule of sunpy.
Yes, a process pool is the right choice. |
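To illustrate the process-pool idea discussed above, here is a rough sketch of handing CPU-bound regex scanning to an executor from inside the event loop. It is an assumption about how this could look, not this project's actual checker code: `PATTERNS`, `scan_lines`, `run_checkers` and the sample lines are all hypothetical. It uses `asyncio.get_running_loop()`, which needs Python 3.7 or later.

```python
import asyncio
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical checker table (name -> regex source), not the project's real patterns.
PATTERNS = {
    "openssl": r"OpenSSL (\d+\.\d+\.\d+[a-z]?)",
    "zlib": r"zlib (\d+\.\d+\.\d+)",
}


def scan_lines(name, pattern, lines):
    """CPU-bound part: run one regex over every line.

    Defined at module level so a process pool can pickle it.
    """
    regex = re.compile(pattern)
    versions = []
    for line in lines:
        match = regex.search(line)
        if match:
            versions.append(match.group(1))
    return name, versions


async def run_checkers(lines, patterns, executor):
    """Submit one job per checker to the executor and await them all."""
    loop = asyncio.get_running_loop()
    jobs = [
        loop.run_in_executor(executor, scan_lines, name, pattern, lines)
        for name, pattern in patterns.items()
    ]
    return dict(await asyncio.gather(*jobs))


async def main():
    lines = ["built with OpenSSL 1.1.1g", "inflate zlib 1.2.11", "unrelated"]
    # Worker processes sidestep the GIL; the event loop stays free for I/O.
    with ProcessPoolExecutor() as pool:
        print(await run_checkers(lines, PATTERNS, pool))
```

To try it, call `asyncio.run(main())` under an `if __name__ == "__main__":` guard, which process pools require on platforms that spawn worker processes. Because `run_checkers` accepts any executor, swapping in a `ThreadPoolExecutor` is a one-line change, which makes it easy to benchmark threads against processes.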
|
Do we still want to provide a multithreaded option, now that we are going to use asyncio everywhere? It won't be practical to maintain two versions, and I don't think it would serve any purpose. Why would anyone want to use the slower synchronous version? I propose we maintain only async functions for everything. |
|
I agree, make everything |
|
+1 to not maintaining non-async variants of the same code. We should try to be self-aware about making sure the debugging log messages are easy to follow, as people often have more trouble debugging async code, but that's probably a given anyhow. |
|
Update: extractor, strings, file and cvedb are asynchronous now. I will continue working on this in my free time, and I am hoping to complete it before the version 2.0 release, but it may take a while since my classes have resumed. |
|
I'm listing this as "future" in case parts of it don't get done before 2.0, but anything that can be done for this release would be great. |
|
I think most of this has been done and this issue can be closed. If anyone notices specific areas that can be improved, we can open separate issues for them. |
I can't remember the details, but I know @wzao1515 had to make some compromises with the multiprocessing in order to get it to behave on Python 2.7. If someone's feeling ambitious, now that we support only 3.6+, it's possible that we can do the multiprocessing more elegantly.