AioSpider

This is a little toy I wrote while learning Python's asynchronous programming.
It is modeled on the Node.js package node-spider, and the two share a similar API.

But because of the differences between Python and JavaScript, I couldn't simply translate node-spider's splice-based task removal when I wanted to cap the maximum number of concurrent tasks.

Fortunately I found this crawler-demo. It simply keeps a list of workers whose length is exactly the maximum number of concurrent tasks, like this:

# Spawn exactly max_tasks workers; the pool size is what bounds concurrency.
workers = [asyncio.Task(work(), loop=loop) for _ in range(max_tasks)]
await queue.join()   # block until every queued item has been processed
for w in workers:    # the queue is drained, so stop the now-idle workers
    w.cancel()
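
For context, here is a minimal, self-contained sketch of that worker-pool pattern. The work() body and the URLs are placeholders (not aiospider code), and it uses the modern asyncio.run/create_task API instead of the loop-based calls above:

import asyncio

async def work(queue):
    # Each worker loops forever, pulling URLs until it is cancelled.
    while True:
        url = await queue.get()
        try:
            print("processing", url)  # real fetching/parsing would go here
        finally:
            queue.task_done()  # lets queue.join() finish once all items are done

async def main(max_tasks=3):
    queue = asyncio.Queue()
    for url in ('https://a.example', 'https://b.example', 'https://c.example'):
        queue.put_nowait(url)
    # The size of the worker pool, not the queue, bounds concurrency.
    workers = [asyncio.create_task(work(queue)) for _ in range(max_tasks)]
    await queue.join()  # wait until every queued URL has been processed
    for w in workers:
        w.cancel()  # all work is done; shut the idle workers down
    await asyncio.gather(*workers, return_exceptions=True)  # reap cancellations

asyncio.run(main())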

Install:

    pip install -r requirements.txt
    python3 setup.py install

Example:

from aiospider import Spider

with Spider() as ss:
    async def parse_page(response):
        '''
        Callback: for now, the response is just a plain aiohttp.ClientResponse object.
        '''
        print("request url is %s, response status is %d" % (response.url, response.status))

    ss.start('https://www.python.org/', parse_page)

# Output: request url is https://www.python.org/, response status is 200
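
Since the callback is a coroutine and receives the raw aiohttp.ClientResponse, you should also be able to read the body inside it. This is a sketch under the assumption that the spider awaits the callback before releasing the response:

async def parse_page(response):
    # response.text() reads and decodes the body asynchronously.
    html = await response.text()
    print("fetched %d characters from %s" % (len(html), response.url))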

TODO

  1. Handle exceptions raised by requests and callbacks
  2. Let taskqueue call tasks with multiple parameters
  3. Wrap the request: add proxy support, etc.
  4. Wrap the response
