Add support for an async api of the package #77

Closed
oldani opened this issue Mar 3, 2018 · 2 comments
@oldani
Member

oldani commented Mar 3, 2018

When scraping at large scale, performance matters, and asyncio helps with that. Since this library only supports Python 3.6+, we could implement this without hacks.

Since the project is already used by a lot of people, the idea is that anyone can use the package in both async and sync ways. How to support both versions without duplicating the codebase has been debated many times. Since the codebase in this case is not that large, I think we could rewrite everything in async and then add wrappers to the API for sync support. Users in sync mode could then use the library as normal, even though behind the scenes it would be running asynchronously. I have achieved this in other projects by creating a sync function which calls the async version inside a decorator function that handles the loop and any other async details.
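For illustration, here is a minimal sketch of the kind of decorator described above. It is not code from this library: the names `sync`, `fetch_title_async` and `fetch_title` are hypothetical, and the coroutine only simulates I/O with `asyncio.sleep`.

```python
import asyncio
import functools


def sync(async_fn):
    """Wrap a coroutine function so it can be called from synchronous code."""

    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        # Create a private event loop for this call and close it afterwards.
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(async_fn(*args, **kwargs))
        finally:
            loop.close()

    return wrapper


async def fetch_title_async(url):
    # Hypothetical stand-in for real async network I/O.
    await asyncio.sleep(0.01)
    return f"title of {url}"


# The sync API is just the async implementation behind the wrapper.
fetch_title = sync(fetch_title_async)
```

Sync users would call `fetch_title(url)` directly, while async users would `await fetch_title_async(url)`. One caveat of this sketch: the wrapper cannot be called from code that is already running inside an event loop, since `run_until_complete` raises `RuntimeError` in that case.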

Since this library depends on requests, which does not support async yet, I see two options if we go the way proposed above: keep using requests and run it in a ThreadPoolExecutor (though this won't allow really high concurrency), or use aiohttp, whose interface is barely similar to requests.
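The first option could look roughly like the sketch below, which hands blocking calls to a thread pool through `loop.run_in_executor`. This is only an assumption about the shape of such code: `blocking_get`, `fetch_all` and `run` are hypothetical names, and `blocking_get` uses `time.sleep` as a stand-in for a blocking call such as `requests.get(url)`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_get(url):
    # Hypothetical stand-in for a blocking call such as requests.get(url).
    time.sleep(0.05)
    return f"response for {url}"


async def fetch_all(urls, max_workers=8):
    loop = asyncio.get_event_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each blocking call runs in a worker thread; the loop stays free.
        futures = [loop.run_in_executor(pool, blocking_get, url) for url in urls]
        # gather() returns results in the same order as the input URLs.
        return await asyncio.gather(*futures)


def run(urls):
    # Drive the coroutine from synchronous code with a private event loop.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fetch_all(urls))
    finally:
        loop.close()
```

The number of requests in flight is capped by `max_workers`, which is the concurrency limit mentioned above: every pending request occupies a thread.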

@kennethreitz let me know what you think.

@kennethreitz
Collaborator

+1 ThreadPoolExecutor

See https://github.com/requests/requests-threads

@oldani
Member Author

oldani commented Mar 5, 2018

Hey @kennethreitz

I started by adding the async session in #101. I ran some tests locally and it's a bit faster (for 100 requests: sync 30s, async 25s, async with uvloop 22s). When requests gets async support, it can be even faster.

Let's keep this issue open so I can continue with the HTMLResponse async interface. I don't know whether you already want to include the async session in the docs if the PR gets merged.

2 participants