Add a generic IndexSpider class #337

Closed
yolile opened this issue Mar 24, 2020 · 2 comments
Labels: existing spider, framework-spiders (Relating to common spider functionality)

Comments

@yolile
Member

yolile commented Mar 24, 2020

The Honduras API returns JSON like:

  "releases": 991869,
  "pages": 99187,
  "page": "1",
  "next": "http://www.contratacionesabiertas.gob.hn/api/v1/record/?format=json&page=2",
  "previous": null,
  "recordPackage": {...}

We can use the pages field to build the URL of every remaining page in a loop, and then make the requests in parallel instead of following the next link one page at a time.
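A minimal sketch of that idea, assuming the first response has already been parsed into a dict shaped like the sample above. The function name build_page_urls is hypothetical, and the sketch only builds the URLs; in a Scrapy spider, each URL would then be yielded as its own request so the downloader can fetch them concurrently.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(first_page: dict) -> list[str]:
    """Return the URLs of pages 2..N, derived from the first page's metadata."""
    total_pages = int(first_page["pages"])
    next_url = first_page["next"]
    if not next_url:
        # A single-page result has no "next" link, so there is nothing to add.
        return []

    # Reuse the "next" URL as a template and only swap the page parameter.
    parts = urlsplit(next_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for page in range(2, total_pages + 1):
        query["page"] = str(page)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


# Metadata copied from the sample response (recordPackage omitted).
sample = {
    "releases": 991869,
    "pages": 99187,
    "page": "1",
    "next": "http://www.contratacionesabiertas.gob.hn/api/v1/record/?format=json&page=2",
    "previous": None,
}
urls = build_page_urls(sample)
```

Because every URL is known after the first response, no request has to wait for the previous page to arrive.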

@jpmckinney
Member

jpmckinney commented Jun 1, 2020

Let's also check the other spiders that inherit from LinksSpider, in case any return the total count of pages/results.

In doing so, we can create a new superclass for all spiders that can generate requests based on a total count (maybe name it IndexSpider). These include at least:

  • canada_montreal
  • chile_base (maybe – pagination is done in the URL path instead of the query string)
  • kenya_makueni
  • mexico_administracion_publica_federal
  • mexico_quien_es_quien
  • uganda_releases
  • uk_contracts_finder

We can add a note to either the docstring or writing-spiders.rst to say that we prefer using IndexSpider instead of LinksSpider where possible, since generating all requests at once will lead to a faster collection.
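One possible shape for such a superclass, as a rough sketch only: it is not the class that was eventually merged, and it leaves out Scrapy entirely so that the URL generation can be read on its own. The class names, attribute names (total_pages_field, count_field, limit, url_pattern) and the example.org URLs are all hypothetical. It covers both cases mentioned above, a total page count and a total result count, and pagination in either the query string or the URL path.

```python
from __future__ import annotations

import math


class IndexSketch:
    """Hypothetical base class: derive every page URL from a total in the first response."""

    # Subclasses set one of these to the name of the JSON field holding the total.
    total_pages_field = None
    count_field = None
    limit = 100  # results per page, used only together with count_field
    url_pattern = ""  # must contain a "{page}" placeholder

    def page_count(self, data: dict) -> int:
        if self.total_pages_field:
            return int(data[self.total_pages_field])
        if self.count_field:
            # A result count is converted to a page count, rounding up.
            return math.ceil(int(data[self.count_field]) / self.limit)
        raise NotImplementedError("set total_pages_field or count_field")

    def page_urls(self, data: dict, first_page: int = 1) -> list[str]:
        last_page = first_page + self.page_count(data) - 1
        return [self.url_pattern.format(page=p) for p in range(first_page, last_page + 1)]


class QueryStringPages(IndexSketch):
    # The API reports the number of pages; the page number goes in the query string.
    total_pages_field = "pages"
    url_pattern = "https://example.org/api/records?format=json&page={page}"


class PathPages(IndexSketch):
    # The API reports the number of results; the page number goes in the URL path.
    count_field = "total"
    limit = 50
    url_pattern = "https://example.org/api/records/{page}"


query_urls = QueryStringPages().page_urls({"pages": 3})
path_urls = PathPages().page_urls({"total": 120})
```

Keeping the total-to-page-count step in one method means a subclass only declares which field to read and how its URLs are laid out.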

@jpmckinney jpmckinney added the framework-spiders label Jun 1, 2020
@jpmckinney jpmckinney changed the title from "Update honduras_portal scrapers to allow parallel downloads" to "Add a generic IndexSpider class" Jun 2, 2020
@romifz romifz self-assigned this Aug 31, 2020
@romifz romifz mentioned this issue Sep 17, 2020
@romifz
Contributor

romifz commented Sep 21, 2020

Closed by #497

@romifz romifz closed this as completed Sep 21, 2020

3 participants