113 changes: 59 additions & 54 deletions docs/client/overview.rst
@@ -20,7 +20,8 @@ for access to client projects.
Projects
--------

You can list the projects available to your account::
You can list the :class:`~scrapinghub.client.projects.Projects` available to your
account::

>>> client.projects.list()
[123, 456]
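
Each element is a numeric project id; with an id in hand you can fetch a project
object (a brief sketch, mirroring the quickstart)::

>>> project = client.get_project(123)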
@@ -67,31 +68,6 @@ For example, to schedule a spider run (it returns a
<scrapinghub.client.Job at 0x106ee12e8>


Settings
--------

You can work with project settings via :class:`~scrapinghub.client.projects.Settings`.

To get a list of the project settings::

>>> project.settings.list()
[(u'default_job_units', 2), (u'job_runtime_limit', 24)]

To get a project setting value by name::

>>> project.settings.get('job_runtime_limit')
24

To update a project setting value by name::

>>> project.settings.set('job_runtime_limit', 20)

Or update a few project settings at once::

>>> project.settings.update({'default_job_units': 1,
... 'job_runtime_limit': 20})


Spiders
-------

@@ -160,17 +136,17 @@ Use ``run`` method to run a new job for project/spider::

Scheduling logic supports different options, like

- job_args to provide arguments for the job
- units to specify amount of units to run the job
- job_settings to pass additional settings for the job
- priority to set higher/lower priority of the job
- add_tag to create a job with a set of initial tags
- meta to pass additional custom metadata
- **job_args** to provide arguments for the job
- **units** to specify amount of units to run the job
- **job_settings** to pass additional settings for the job
- **priority** to set higher/lower priority of the job
- **add_tag** to create a job with a set of initial tags
- **meta** to pass additional custom metadata

For example, to run a new job for a given spider with custom params::

>>> job = spider.jobs.run(units=2, job_settings={'SETTING': 'VALUE'},
priority=1, add_tag=['tagA','tagB'], meta={'custom-data': 'val1'})
>>> job = spider.jobs.run(units=2, job_settings={'SETTING': 'VALUE'}, priority=1,
... add_tag=['tagA','tagB'], meta={'custom-data': 'val1'})
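
The **job_args** option, not shown above, takes a dictionary of spider arguments;
a minimal sketch (the argument names here are made up)::

>>> job = spider.jobs.run(job_args={'arg1': 'val1', 'arg2': 'val2'})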

Note that if you run a job on the project level, the spider name is required::
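
>>> # a minimal sketch; 'spider1' stands in for a real spider name in the project
>>> job = project.jobs.run('spider1', job_args={'arg1': 'val1'})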

@@ -216,7 +192,7 @@ ones::
>>> job_summary = next(project.jobs.iter())
>>> job_summary.get('spider', 'missing')
'foo'
>>> jobs_summary = project.jobs.iter(jobmeta=['scheduled_by', ])
>>> jobs_summary = project.jobs.iter(jobmeta=['scheduled_by'])
>>> job_summary = next(jobs_summary)
>>> job_summary.get('scheduled_by', 'missing')
'John'
@@ -235,8 +211,9 @@ To get jobs filtered by tags::

>>> jobs_summary = project.jobs.iter(has_tag=['new', 'verified'], lacks_tag='obsolete')

List of tags has ``OR`` power, so in the case above jobs with 'new' or
'verified' tag are expected.
The list of tags in **has_tag** is combined with ``OR`` semantics, so in the case
above, jobs with either the ``new`` or the ``verified`` tag are returned (while the
list of tags in **lacks_tag** is combined with ``AND`` semantics).
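
For instance, to keep only jobs that have neither of two tags, multiple values can
be passed to **lacks_tag** (a short sketch; the ``broken`` tag name is made up)::

>>> jobs_summary = project.jobs.iter(lacks_tag=['obsolete', 'broken'])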

To get a certain number of the last finished jobs for some spider::
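
>>> # a sketch; the ``spider``/``state``/``count`` parameters are assumed here
>>> jobs_summary = project.jobs.iter(spider='foo', state='finished', count=3)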

@@ -250,10 +227,10 @@ for filtering by state:
- finished
- deleted
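
For instance, to iterate only over jobs that are currently running (a brief sketch;
it assumes ``running`` is one of the states accepted by the ``state`` filter)::

>>> running_jobs = project.jobs.iter(state='running')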

Dict entries returned by ``iter`` method contain some additional meta,
but can be easily converted to ``Job`` instances with::
Dictionary entries returned by the ``iter`` method contain some additional meta,
but can be easily converted to :class:`~scrapinghub.client.jobs.Job` instances with::

>>> [Job(x['key']) for x in jobs]
>>> [Job(client, x['key']) for x in jobs]
[
<scrapinghub.client.Job at 0x106e2cc18>,
<scrapinghub.client.Job at 0x106e260b8>,
@@ -290,6 +267,25 @@ It's also possible to get last jobs summary (for each spider)::

Note that there can be a lot of spiders, so the method above returns an iterator.


update_tags
^^^^^^^^^^^

Tags are a convenient way to mark specific jobs (for better search, post-processing etc.).


To mark all spider jobs with tag ``consumed``::

>>> spider.jobs.update_tags(add=['consumed'])

To remove existing tag ``existing`` for all spider jobs::

>>> spider.jobs.update_tags(remove=['existing'])
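
Both operations can be combined in a single call (a sketch, assuming ``add`` and
``remove`` accept lists in the same request)::

>>> spider.jobs.update_tags(add=['consumed'], remove=['existing'])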

Modifying tags is available at the :class:`~scrapinghub.client.spiders.Spider` and
:class:`~scrapinghub.client.jobs.Job` levels.


Job
---

@@ -310,6 +306,10 @@ To delete a job::

>>> job.delete()

To mark a job with tag ``consumed``::

>>> job.update_tags(add=['consumed'])

.. _job-metadata:

Metadata
@@ -422,13 +422,12 @@ To post a new activity event::
Or post multiple events at once::

>>> events = [
{'event': 'job:completed', 'job': '123/2/5', 'user': 'john'},
{'event': 'job:cancelled', 'job': '123/2/6', 'user': 'john'},
]
... {'event': 'job:completed', 'job': '123/2/5', 'user': 'john'},
... {'event': 'job:cancelled', 'job': '123/2/6', 'user': 'john'},
... ]
>>> project.activity.add(events)



Collections
-----------

@@ -559,24 +558,30 @@ Frontiers are available on project level only.

.. _job-tags:

Tags
----

Tags is a convenient way to mark specific jobs (for better search, postprocessing etc).
Settings
--------

To mark a job with tag ``consumed``::
You can work with project settings via :class:`~scrapinghub.client.projects.Settings`.

>>> job.update_tags(add=['consumed'])
To get a list of the project settings::

To mark all spider jobs with tag ``consumed``::
>>> project.settings.list()
[(u'default_job_units', 2), (u'job_runtime_limit', 24)]

>>> spider.jobs.update_tags(add=['consumed'])
To get a project setting value by name::

To remove existing tag ``existing`` for all spider jobs::
>>> project.settings.get('job_runtime_limit')
24

>>> spider.jobs.update_tags(remove=['existing'])
To update a project setting value by name::

>>> project.settings.set('job_runtime_limit', 20)

Modifying tags is available on spider/job levels.
Or update a few project settings at once::

>>> project.settings.update({'default_job_units': 1,
... 'job_runtime_limit': 20})


Exceptions
10 changes: 5 additions & 5 deletions docs/legacy/hubstorage.rst
@@ -130,7 +130,7 @@ If it used, then it's up to the user to list all the required fields, so only fe
>>> metadata = next(project.jobq.list())
>>> metadata.get('spider', 'missing')
u'foo'
>>> jobs_metadata = project.jobq.list(jobmeta=['scheduled_by', ])
>>> jobs_metadata = project.jobq.list(jobmeta=['scheduled_by'])
>>> metadata = next(jobs_metadata)
>>> metadata.get('scheduled_by', 'missing')
u'John'
@@ -150,7 +150,7 @@ List of tags has ``OR`` power, so in the case above jobs with 'new' or 'verified

To get a certain number of the last finished jobs for some spider::

>>> jobs_metadata = project.jobq.list(spider='foo', state='finished' count=3)
>>> jobs_metadata = project.jobq.list(spider='foo', state='finished', count=3)

There are 4 possible job states, which can be used as values for filtering by state:

@@ -167,7 +167,7 @@ To iterate through items::

>>> items = job.items.iter_values()
>>> for item in items:
# do something, item is just a dict
... # do something, item is just a dict

Logs
^^^^
Expand All @@ -176,7 +176,7 @@ To iterate through 10 first logs for example::

>>> logs = job.logs.iter_values(count=10)
>>> for log in logs:
# do something, log is a dict with log level, message and time keys
... # do something, log is a dict with log level, message and time keys

Collections
^^^^^^^^^^^
@@ -246,4 +246,4 @@ Module contents
:undoc-members:
:show-inheritance:

.. _scrapinghub.ScrapinghubClient: ../client/overview.html
.. _scrapinghub.ScrapinghubClient: ../client/overview.html
4 changes: 2 additions & 2 deletions docs/quickstart.rst
@@ -36,7 +36,7 @@ Work with your projects::
Run new jobs from the client::

>>> project = client.get_project(123)
>>> project.jobs.run('spider1', job_args={'arg1':'val1'})
>>> project.jobs.run('spider1', job_args={'arg1': 'val1'})
<scrapinghub.client.Job at 0x106ee12e8>

Access your jobs data::
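
>>> # a rough sketch; the job key is made up and ``items.iter()`` is assumed here
>>> job = client.get_job('123/1/2')
>>> for item in job.items.iter():
...     print(item)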
@@ -69,7 +69,7 @@ By default, tests use VCR.py ``once`` mode to:
This means that if you add new integration tests and run all tests as usual,
only new cassettes will be created; all existing cassettes will stay unmodified.

To ignore existing cassettes and use real service, please provide a flag::
To ignore existing cassettes and use real services, please provide a flag::

py.test --ignore-cassettes

11 changes: 5 additions & 6 deletions scrapinghub/client/__init__.py
@@ -1,9 +1,8 @@
from scrapinghub import Connection as _Connection
from scrapinghub import HubstorageClient as _HubstorageClient

from .exceptions import _wrap_http_errors
from .projects import Projects
from .exceptions import wrap_http_errors

from .utils import parse_auth
from .utils import parse_project_id, parse_job_key

@@ -13,14 +12,14 @@

class Connection(_Connection):

@wrap_http_errors
@_wrap_http_errors
def _request(self, *args, **kwargs):
return super(Connection, self)._request(*args, **kwargs)


class HubstorageClient(_HubstorageClient):

@wrap_http_errors
@_wrap_http_errors
def request(self, *args, **kwargs):
return super(HubstorageClient, self).request(*args, **kwargs)

@@ -71,9 +70,9 @@ def get_project(self, project_id):
return self.projects.get(parse_project_id(project_id))

def get_job(self, job_key):
"""Get Job with a given job key.
"""Get :class:`~scrapinghub.client.jobs.Job` with a given job key.

:param job_key: job key string in format 'project_id/spider_id/job_id',
:param job_key: job key string in format ``project_id/spider_id/job_id``,
where all the components are integers.
:return: a job instance.
:rtype: :class:`~scrapinghub.client.jobs.Job`
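
Usage (a brief sketch; the job key here is made up)::

>>> job = client.get_job('123/1/2')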
28 changes: 17 additions & 11 deletions scrapinghub/client/activity.py
@@ -1,7 +1,7 @@
from __future__ import absolute_import

from .utils import _Proxy
from .utils import parse_job_key
from .proxy import _Proxy
from .utils import parse_job_key, update_kwargs


class Activity(_Proxy):
@@ -31,23 +31,29 @@ class Activity(_Proxy):
- post a new event::

>>> event = {'event': 'job:completed',
'job': '123/2/4',
'user': 'jobrunner'}
... 'job': '123/2/4',
... 'user': 'jobrunner'}
>>> project.activity.add(event)

- post multiple events at once::

>>> events = [
{'event': 'job:completed', 'job': '123/2/5', 'user': 'jobrunner'},
{'event': 'job:cancelled', 'job': '123/2/6', 'user': 'john'},
]
... {'event': 'job:completed', 'job': '123/2/5', 'user': 'jobrunner'},
... {'event': 'job:cancelled', 'job': '123/2/6', 'user': 'john'},
... ]
>>> project.activity.add(events)

"""
def __init__(self, *args, **kwargs):
super(Activity, self).__init__(*args, **kwargs)
self._proxy_methods([('iter', 'list')])
self._wrap_iter_methods(['iter'])
def iter(self, count=None, **params):
"""Iterate over activity events.

:param count: limit the amount of elements.
:return: a generator object over a list of activity event dicts.
:rtype: :class:`types.GeneratorType[dict]`
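
Usage (a sketch; it assumes activity events like the ones shown above exist)::

>>> events = project.activity.iter(count=1)
>>> next(events)
{'event': 'job:completed', 'job': '123/2/4', 'user': 'jobrunner'}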
"""
update_kwargs(params, count=count)
params = self._modify_iter_params(params)
return self._origin.list(**params)

def add(self, values, **kwargs):
"""Add new event to the project activity.