A client interface for Scrapinghub's API
Latest commit 0fefb69 (Jun 6, 2016): @dangra merged pull request #24 from jesuslosada/update-url (Update API base URL)


Client interface for Scrapinghub API



The scrapinghub module is a Python library for communicating with the Scrapinghub API.

First, you connect to Scrapinghub:

>>> from scrapinghub import Connection
>>> conn = Connection('APIKEY')
>>> conn
Connection('APIKEY')

You can list the projects available to your account:

>>> conn.project_ids()
[123, 456]

And select a particular project to work with:

>>> project = conn[123]
>>> project
Project(Connection('APIKEY'), 123)

To schedule a spider run (it returns the job id):

>>> project.schedule('myspider', arg1='val1')
u'123/1/1'

To get the list of spiders in the project:

>>> project.spiders()
[{u'id': u'spider1', u'tags': [], u'type': u'manual', u'version': u'123'},
 {u'id': u'spider2', u'tags': [], u'type': u'manual', u'version': u'123'}]
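Since spiders() returns a plain list of dicts, you can filter it with ordinary Python. A minimal sketch, using sample data shaped like the output above in place of a live call (the 'broken' tag is hypothetical, for illustration only):

```python
# Stand-in for project.spiders(); in real use this list would come
# from a live Connection.
spiders = [
    {u'id': u'spider1', u'tags': [], u'type': u'manual', u'version': u'123'},
    {u'id': u'spider2', u'tags': [u'broken'], u'type': u'manual', u'version': u'123'},
]

# Collect the ids of spiders that do not carry the 'broken' tag.
healthy = [s[u'id'] for s in spiders if u'broken' not in s[u'tags']]
# healthy == ['spider1']
```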

To get all finished jobs:

>>> jobs ='finished')

jobs is a JobSet. JobSet objects are iterable and yield Job objects when iterated, so you typically use them like this:

>>> for job in jobs:
...     # do something with job

Or, if you just want to get the job ids:

>>> [ for x in jobs]
[u'123/1/1', u'123/1/2', u'123/1/3']
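As the output above suggests, job ids are strings of the form project/spider/job. If you need the numeric parts, a small helper like the following works; note that parse_job_id is illustrative only, not something the library provides:

```python
def parse_job_id(job_id):
    """Split a 'project/spider/job' id string into three ints.

    Hypothetical helper, not part of the scrapinghub library.
    """
    project_id, spider_id, job_num = (int(part) for part in job_id.split('/'))
    return project_id, spider_id, job_num

# One of the ids listed above:
parts = parse_job_id(u'123/1/2')
# parts == (123, 1, 2)
```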

To select a specific job:

>>> job = project.job(u'123/1/2')

To retrieve all scraped items from a job:

>>> for item in job.items():
...     # do something with item (it's just a dict)
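Because each item is just a dict, dumping a job's items to a JSON Lines file takes only a few lines. A sketch, with a sample list of dicts standing in for job.items() (which needs a live connection); the item fields here are invented for illustration:

```python
import json

# Stand-in for job.items(); each item is a plain dict.
items = [
    {u'title': u'First post', u'url': u''},
    {u'title': u'Second post', u'url': u''},
]

# Write one JSON object per line (JSON Lines format).
with open('items.jl', 'w') as f:
    for item in items:
        f.write(json.dumps(item) + '\n')
```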

To retrieve all log entries from a job:

>>> for logitem in job.log():
...     # logitem is a dict with logLevel, message, time
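To surface only errors, filter the entries on logLevel. A sketch over sample dicts in place of job.log(); the numeric levels below assume the API uses Python's logging scale (where ERROR is 40), and the messages and timestamps are invented, so verify against your own data:

```python
import logging

# Stand-in for job.log(): dicts with logLevel, message, time.
entries = [
    {u'logLevel': 20, u'message': u'Spider opened', u'time': 1464700000000},
    {u'logLevel': 40, u'message': u'HTTP 500 on page 3', u'time': 1464700005000},
]

# Keep only entries at ERROR level or above.
errors = [e for e in entries if e[u'logLevel'] >= logging.ERROR]
for e in errors:
    print(e[u'message'])
```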

To get job info:

>>> job.info['spider']
u'myspider'

To mark a job with tag consumed:

>>> job.update(add_tag='consumed')

To mark several jobs with tag consumed (JobSet also supports the update() method):

>>>'finished').update(add_tag='consumed')

To delete a job:

>>> job.delete()

To delete several jobs (JobSet also supports the delete() method):

>>>'finished').delete()