Donkey is a simple but extensible web scraper.
## Installation
Install via pip:

```
pip install donkey_scraper
```
Embarrassingly, I still don't 100% understand PyPI and distutils, especially for complex modules like lxml, so you'll need to install those dependencies separately.
Dependencies needed:

- lxml
- jmespath

Both should be available on pip. For lxml, good luck...
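Assuming both have builds for your platform, one command covers them:

```
pip install lxml jmespath
```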
## Usage
The core Donkey library covers the simplest of scraping workflows:

- perform an HTTP request
- do some kind of processing on the response
### Basic Usage
By default, the Query object uses the request grabber (the only one that comes as standard) and the XPATH handler.
```
>>> from donkey import query
>>> q = query.Query()
>>> q.fetch(
...     url='http://example.com'
... ).handle(
...     title='//title//text()'
... ).data
0: {'title': ['Example Domain']}
>>>
```
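If the XPATH handler accepts more than one keyword argument at a time (an assumption; the examples here only ever pass one), a single query could extract several fields:

```python
# Hypothetical sketch: assumes handle() takes one kwarg per output field.
q = query.Query()
result = q.fetch(url='http://example.com').handle(
    title='//title//text()',
    links='//a/@href',   # hypothetical second field: all link hrefs on the page
).data
```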
The other standard handler is the JMESPATH handler, for querying JSON objects. Without any handling arguments, it will return the full JSON object:
```
>>> q = query.Query(
...     handler='JMESPATH'
... )
>>> q.fetch(
...     url='http://echo.jsontest.com/insert-key-here/insert-value-here/key/value',
... ).handle().data
1: {u'insert-key-here': u'insert-value-here', u'key': u'value'}
>>> q = query.Query(
...     handler='JMESPATH'
... )
>>> q.fetch(
...     url='http://echo.jsontest.com/insert-key-here/insert-value-here/key/value',
... ).handle(
...     a='key'
... ).data
2: {'a': u'value'}
>>>
```
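Handler values are JMESPath expressions, so anything jmespath supports should work. A sketch, assuming each keyword argument is handed straight to jmespath.search:

```python
# Assumed behaviour: each handle() kwarg is a JMESPath expression
# evaluated against the fetched JSON document.
q = query.Query(handler='JMESPATH')
result = q.fetch(
    url='http://echo.jsontest.com/insert-key-here/insert-value-here/key/value',
).handle(
    names='keys(@)',   # JMESPath built-in: list the object's top-level keys
).data
```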
Donkey caches requests in a SQLite database. How far back in the cache to look for a valid response is controlled by the freshness parameter when instantiating a query.
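A minimal sketch, assuming freshness is a maximum cache age in seconds (the units aren't documented here, so the value is illustrative):

```python
# Hypothetical: a cached response younger than the freshness window is
# reused; anything older triggers a new HTTP request.
q = query.Query(freshness=3600)
q.fetch(url='http://example.com')   # first call hits the network
q.fetch(url='http://example.com')   # likely served from the SQLite cache
```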
## Coming soon!
- More grabbers
- More handlers
- Web interface
- Automated scraping jobs