Commit 851831f

spiders(united_kingdom_contracts_finder): download record packages instead of release packages

Ravf95 committed Apr 5, 2022
1 parent 37fff6b
Showing 6 changed files with 286 additions and 3 deletions.
14 changes: 14 additions & 0 deletions .idea/inspectionProfiles/Project_Default.xml

Some generated files are not rendered by default.

6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

4 changes: 4 additions & 0 deletions .idea/misc.xml

6 changes: 6 additions & 0 deletions .idea/vcs.xml

250 changes: 250 additions & 0 deletions .idea/workspace.xml

9 changes: 6 additions & 3 deletions kingfisher_scrapy/spiders/united_kingdom_contracts_finder.py
@@ -14,10 +14,9 @@ class UnitedKingdomContractsFinder(IndexSpider):

     # BaseSpider
     ocds_version = '1.0'  # uses deprecated fields
-    root_path = 'results.item'

     # SimpleSpider
-    data_type = 'release_package'
+    data_type = 'record_package'

     # IndexSpider
     total_pages_pointer = '/maxPage'
@@ -30,7 +29,11 @@ def start_requests(self):

     def parse(self, response, **kwargs):
         if self.is_http_success(response):
-            yield from super().parse(response)
+            for result in response.json()['results']:
+                for release in result['releases']:
+                    ocid = release["ocid"]
+                    url = f'https://www.contractsfinder.service.gov.uk/Published/OCDS/Record/{ocid}'
+                    yield scrapy.Request(url, meta={'file_name': f'{ocid}.json'}, callback=super().parse)
         else:
             request = response.request.copy()
             wait_time = int(response.headers.get('Retry-After', 1))

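The core of the change is the URL construction in `parse`: instead of passing each list page to the parent class, the spider now walks every release in the page's `results` and requests that OCID's record package from the Contracts Finder OCDS record endpoint. A minimal sketch of that logic, extracted from the Scrapy machinery so it can be exercised on its own (the `record_requests` helper and the sample OCIDs below are illustrative, not part of the actual spider):

```python
# Record endpoint used by the spider (from the diff above); each OCID maps
# to one record package URL and one output file name.
RECORD_URL = 'https://www.contractsfinder.service.gov.uk/Published/OCDS/Record/{}'


def record_requests(results):
    """Yield (url, file_name) pairs for every OCID on one results page.

    `results` mirrors the `results` array of the JSON list response: each
    item holds a `releases` list whose entries carry an `ocid`. In the real
    spider these pairs become scrapy.Request objects with the file name in
    request.meta.
    """
    for result in results:
        for release in result['releases']:
            ocid = release['ocid']
            yield RECORD_URL.format(ocid), f'{ocid}.json'
```

Under this sketch, a page with two releases produces two record requests, one per OCID, each saved to `<ocid>.json`.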
0 comments on commit 851831f
