Add Zaragoza spider #510

Merged
merged 3 commits into from
Oct 6, 2020
51 changes: 51 additions & 0 deletions kingfisher_scrapy/spiders/spain_zaragoza.py
@@ -0,0 +1,51 @@
import json

import scrapy

from kingfisher_scrapy.base_spider import SimpleSpider
from kingfisher_scrapy.util import components, handle_http_error


class SpainZaragoza(SimpleSpider):
    """
    Swagger API documentation
      https://www.zaragoza.es/docs-api_sede/
    Spider arguments
      sample
        Downloads the first release returned by the API release endpoint.
      from_date
        Download only data from this date onward (YYYY-MM-DDTHH:mm:ss format).
        If ``until_date`` is provided, defaults to '2000-01-01T00:00:00'.
      until_date
        Download only data until this date (YYYY-MM-DDTHH:mm:ss format).
        If ``from_date`` is provided, defaults to today.
    """
    name = 'spain_zaragoza'
    data_type = 'release_list'
    date_format = 'datetime'
    default_from_date = '2000-01-01T00:00:00'
    url = 'https://www.zaragoza.es/sede/servicio/contratacion-publica/ocds/contracting-process/'

    def start_requests(self):
        # Set the rows parameter to 100000 to get all releases.
        url = self.url + '?rf=html&rows=100000'

        # If both date parameters are set, format them as "yyyy-MM-dd'T'HH:mm:ss'Z'".
        if self.from_date and self.until_date:
            after = self.until_date.strftime("%Y-%m-%dT%H:%M:%SZ")
            before = self.from_date.strftime("%Y-%m-%dT%H:%M:%SZ")
            url = url + '&before={}&after={}'.format(before, after)
Member

Shouldn't this be the other way around, with before set from until_date and after set from from_date?

Contributor Author

Yes, it should be the other way around, but I think that is how the service behaves: after=2020-10-05 and before=2000-01-01 returns data, while after=2000-01-01 and before=2020-10-05 returns no data.

Member

Can you add a comment in the code saying that the before and after query string parameters behave opposite to expectations? Otherwise, I'm sure a future reader will ask the same question.
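A minimal sketch of the kind of comment the reviewer is asking for, written as a stand-alone helper so the query string can be checked without Scrapy. `build_list_url` is a hypothetical name; the base URL, the `rows` value and the date format are copied from the spider above, and the inverted `before`/`after` mapping reflects only the contributor's observation in this thread, not documented API behavior.

```python
from datetime import datetime

# Base URL copied from the spider in this PR.
BASE_URL = 'https://www.zaragoza.es/sede/servicio/contratacion-publica/ocds/contracting-process/'


def build_list_url(from_date=None, until_date=None):
    """Build the list URL (hypothetical helper, not the merged spider code)."""
    url = BASE_URL + '?rf=html&rows=100000'
    if from_date and until_date:
        # NOTE: The API's "before" and "after" query string parameters behave
        # opposite to expectations (as observed by the contributor): "after"
        # receives the END of the range and "before" receives the START.
        after = until_date.strftime('%Y-%m-%dT%H:%M:%SZ')
        before = from_date.strftime('%Y-%m-%dT%H:%M:%SZ')
        url += '&before={}&after={}'.format(before, after)
    return url


print(build_list_url(datetime(2000, 1, 1), datetime(2020, 10, 5)))
# ...?rf=html&rows=100000&before=2000-01-01T00:00:00Z&after=2020-10-05T00:00:00Z
```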


        yield scrapy.Request(url, meta={'file_name': 'list.json'}, callback=self.parse_list)

    @handle_http_error
    def parse_list(self, response):
        ids = json.loads(response.text)
        for contracting_process_id in ids:

            # A JSON array of ids strings
Member

jpmckinney, Oct 5, 2020

I don't see any JSON arrays of strings. Maybe delete the comment?
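The reviewer's point can be checked with a short sketch. Because the spider calls `.get('id')` on each parsed element and concatenates the result with a string, each element appears to be an object with an `"id"` key rather than a bare string. The response body below is made up to match that assumption; the real API payload may carry other fields.

```python
import json

# Hypothetical response body with made-up ids. Because the spider calls
# .get('id') on each element, this sketch assumes an array of objects
# with an "id" key rather than an array of strings.
body = '[{"id": "1001"}, {"id": "1002"}]'

ids = json.loads(body)
for contracting_process in ids:
    # Each element is a dict, so the id string sits under its "id" key.
    print(contracting_process['id'])
```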

            url = self.url + contracting_process_id.get('id')
Member

Why get and not []? If the id key isn't set, then get returns None, and str + None raises a more confusing error (TypeError) than [] would (KeyError).
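A small sketch of the difference the reviewer describes, using a hypothetical element that lacks the `"id"` key. `error_name` is an illustrative helper, not part of the spider; it only reports which exception each lookup style triggers when building the URL.

```python
BASE_URL = 'https://www.zaragoza.es/sede/servicio/contratacion-publica/ocds/contracting-process/'

# Hypothetical element that lacks the "id" key.
item = {'identifier': '1001'}


def error_name(lookup):
    """Return the name of the exception raised while building the detail URL."""
    try:
        BASE_URL + lookup(item)
    except (TypeError, KeyError) as exc:
        return type(exc).__name__
    return None


print(error_name(lambda d: d.get('id')))  # TypeError: str + None, key not named
print(error_name(lambda d: d['id']))      # KeyError: names the missing 'id' key
```

With `[]`, the traceback points directly at the missing key, which is the more useful failure when the API response changes shape.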

            yield self.build_request(url, formatter=components(-1))

            if self.sample:
                return
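The `if self.sample: return` placed after the first `yield` is what limits a sample run to a single contracting process. The control flow can be checked in isolation with a plain generator; `requests_for` is a hypothetical stand-in for `parse_list` that yields URLs instead of Scrapy requests, and it reads `item['id']` as the reviewer suggests.

```python
BASE_URL = 'https://www.zaragoza.es/sede/servicio/contratacion-publica/ocds/contracting-process/'


def requests_for(items, sample=False):
    """Yield one detail URL per contracting process (hypothetical stand-in for parse_list)."""
    for item in items:
        yield BASE_URL + item['id']
        # Returning after the first yield ends the generator, so a sample
        # run produces exactly one URL.
        if sample:
            return


items = [{'id': '1001'}, {'id': '1002'}]  # made-up ids
print(len(list(requests_for(items))))               # 2
print(len(list(requests_for(items, sample=True))))  # 1
```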