Merged
98 changes: 79 additions & 19 deletions README.rst
@@ -95,7 +95,7 @@ Project instance also has the following fields:

- activity - access to project activity records
- collections - work with project collections (see ``Collections`` section)
- frontier - using project frontier (see ``Frontier`` section)
- frontiers - work with project frontiers (see ``Frontiers`` section)
- settings - interface to project settings
- spiders - access to spiders collection (see ``Spiders`` section)

@@ -411,44 +411,104 @@ Usual workflow with `Collections`_ would be::

Collections are available on project level only.

Frontier
--------
Frontiers
---------

Typical workflow with `Frontier`_::

>>> frontier = project.frontier
>>> frontiers = project.frontiers

Add a request to the frontier::
Get an iterator over all frontiers of a project::

>>> frontiers.iter()
<list_iterator at 0x103c93630>

List all frontiers::

>>> frontiers.list()
['test', 'test1', 'test2']

Get a frontier by name::

>>> frontier = frontiers.get('test')
>>> frontier
<scrapinghub.client.Frontier at 0x1048ae4a8>

Get an iterator over the slots of a frontier::

>>> frontier.iter()
<list_iterator at 0x1030736d8>

List all slots::

>>> frontier.list()
['example.com', 'example.com2']

Get a frontier slot by name::

>>> slot = frontier.get('example.com')
>>> slot
<scrapinghub.client.FrontierSlot at 0x1049d8978>

Add a request to the slot::

>>> slot.queue.add([{'fp': '/some/path.html'}])
>>> slot.flush()
>>> slot.newcount
1

``newcount`` is defined per slot, but also available per frontier and globally::

>>> frontier.add('test', 'example.com', [{'fp': '/some/path.html'}])
>>> frontier.flush()
>>> frontier.newcount
1
>>> frontiers.newcount
3

Add only fingerprints to the slot::

>>> slot.fingerprints.add(['fp1', 'fp2'])
>>> slot.flush()

There are convenient shortcuts: ``f`` for ``fingerprints`` and ``q`` for ``queue``.

Add requests with additional parameters::

>>> frontier.add('test', 'example.com', [{'fp': '/'}, {'fp': 'page1.html', 'p': 1, 'qdata': {'depth': 1}}])
>>> frontier.flush()
>>> frontier.newcount
2
>>> slot.q.add([{'fp': '/'}, {'fp': 'page1.html', 'p': 1, 'qdata': {'depth': 1}}])
>>> slot.flush()
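
The request dictionaries in this example share a small set of keys. As a minimal sketch, a hypothetical helper could build them; ``make_request`` is not part of the client, and the sketch assumes that ``fp`` is required, that ``p`` denotes a priority, and that ``qdata`` is optional extra data:

```python
def make_request(fp, priority=None, qdata=None):
    """Build a request dict (hypothetical helper, not a client API)."""
    request = {'fp': fp}
    if priority is not None:
        # 'p' is the key used in the README example; treating it as a
        # priority is an assumption of this sketch.
        request['p'] = priority
    if qdata is not None:
        # Copy so that later changes by the caller do not leak in.
        request['qdata'] = dict(qdata)
    return request


batch = [
    make_request('/'),
    make_request('page1.html', priority=1, qdata={'depth': 1}),
]
```

The resulting ``batch`` has the same shape as the list passed to ``slot.q.add`` in the example.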

To delete the slot ``example.com`` from the frontier::
To retrieve all requests for a given slot::

>>> frontier.delete_slot('test', 'example.com')
>>> reqs = slot.q.iter()

To retrieve requests for a given slot::
To retrieve all fingerprints for a given slot::

>>> reqs = frontier.read('test', 'example.com')
>>> fps = slot.f.iter()

To list all the requests, use the ``list()`` method (``fingerprints`` works the same way)::

>>> reqs = slot.q.list()

To delete a batch of requests::

>>> frontier.delete('test', 'example.com', '00013967d8af7b0001')
>>> slot.q.delete('00013967d8af7b0001')
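
One plausible way to combine reading and deleting is to handle every request in a batch and delete the batch only afterwards. The sketch below works on plain data and callables rather than on the client. ``process_batches`` is a hypothetical helper, and the ``id`` key on each batch is an assumption; only the ``requests`` key appears in this README:

```python
def process_batches(batches, handle_request, delete_batch):
    """Handle every request in each batch, then delete the batch by id.

    Hypothetical helper. Assumes each batch is a dict shaped like
    {'id': <batch id>, 'requests': [...]}.
    """
    processed = 0
    for batch in batches:
        for request in batch['requests']:
            handle_request(request)
            processed += 1
        # Delete only after every request in the batch has been handled.
        delete_batch(batch['id'])
    return processed


# Stand-in data and callables used in place of a real slot.
fake_batches = [
    {'id': '00013967d8af7b0001', 'requests': ['request-1', 'request-2']},
]
handled, deleted = [], []
count = process_batches(fake_batches, handled.append, deleted.append)
```

With the client, ``batches`` would presumably come from ``slot.q.iter()`` and ``delete_batch`` would be ``slot.q.delete``; check the actual batch structure before relying on the ``id`` key.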

To retrieve fingerprints for a given slot::
To delete the whole slot from the frontier::

>>> fps = [req['requests'] for req in frontier.read('test', 'example.com')]
>>> slot.delete()

Flush data of the given frontier::

>>> frontier.flush()

Flush data of all frontiers of a project::

>>> frontiers.flush()

Close batch writers of all frontiers of a project::

>>> frontiers.close()

Frontier is available on project level only.
Frontiers are available on project level only.
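
To make the relationship between the three ``newcount`` levels concrete, here is a minimal in-memory sketch. The classes are hypothetical stand-ins, not the client implementation, and the sketch assumes that only previously unseen fingerprints count as new and that counts are updated on ``flush()``:

```python
from collections import defaultdict


class SlotSketch:
    """Hypothetical stand-in for a frontier slot."""

    def __init__(self):
        self._pending = []   # requests added but not flushed yet
        self._seen = set()   # fingerprints already written
        self.newcount = 0    # new requests written by this slot

    def add(self, requests):
        self._pending.extend(requests)

    def flush(self):
        # Assumption: a fingerprint counts as new only the first time.
        for request in self._pending:
            if request['fp'] not in self._seen:
                self._seen.add(request['fp'])
                self.newcount += 1
        self._pending = []


class FrontierSketch:
    """Hypothetical stand-in for a frontier: a named group of slots."""

    def __init__(self):
        self._slots = defaultdict(SlotSketch)

    def get(self, name):
        return self._slots[name]

    def flush(self):
        for slot in self._slots.values():
            slot.flush()

    @property
    def newcount(self):
        # Per-frontier count is the sum over its slots.
        return sum(slot.newcount for slot in self._slots.values())


class FrontiersSketch:
    """Hypothetical stand-in for all frontiers of a project."""

    def __init__(self):
        self._frontiers = defaultdict(FrontierSketch)

    def get(self, name):
        return self._frontiers[name]

    def flush(self):
        for frontier in self._frontiers.values():
            frontier.flush()

    @property
    def newcount(self):
        # Global count is the sum over all frontiers.
        return sum(f.newcount for f in self._frontiers.values())


frontiers = FrontiersSketch()
frontiers.get('test').get('example.com').add([{'fp': '/some/path.html'}])
frontiers.get('test1').get('example.com').add([{'fp': '/'}, {'fp': '/'}])
frontiers.flush()
```

In this sketch the duplicate ``'/'`` fingerprint is counted once, so each frontier reports one new request and the global count is their sum.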

Tags
----