Use JSON Schema to validate items #403

Closed
jpmckinney opened this issue Jun 1, 2020 · 0 comments · Fixed by #420
Labels
framework-items Relating to how we process items

Comments

@jpmckinney
Member

Right now there's a simple validate method to check for required fields: https://github.com/open-contracting/kingfisher-collect/blob/master/kingfisher_scrapy/items.py

We can instead write a JSON Schema to validate the data. We'll need the jsonschema and rfc3987 packages, and we can then do:

import json

from jsonschema import FormatChecker
from jsonschema.validators import Draft4Validator as validator

# Initialize the schema once, in the item pipeline's `__init__` method
with open('path/to/schema.json') as f:
    schema = json.load(f)

self.validator = validator(schema, format_checker=FormatChecker())

# Then, to validate (raises jsonschema.ValidationError if the item is invalid):
self.validator.validate(item)
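
To make the placement concrete, here is a minimal sketch of how the snippet above might sit inside an item pipeline. The class name `ValidatePipeline` and the `schema_path` argument are assumptions for illustration, not names from the project. The item is converted with `dict(item)` because jsonschema's "object" type check expects a plain dict.

```python
import json

from jsonschema import FormatChecker
from jsonschema.exceptions import ValidationError
from jsonschema.validators import Draft4Validator


class ValidatePipeline:
    """Hypothetical item pipeline; the name and constructor argument are assumptions."""

    def __init__(self, schema_path):
        # Load the schema and build the validator once, not once per item.
        with open(schema_path) as f:
            schema = json.load(f)
        # Fail early if the schema itself is malformed.
        Draft4Validator.check_schema(schema)
        self.validator = Draft4Validator(schema, format_checker=FormatChecker())

    def process_item(self, item, spider):
        # Raises ValidationError if the item does not match the schema.
        self.validator.validate(dict(item))
        return item
```

A caller could catch `ValidationError` around `process_item` to log or drop the item instead of letting the exception propagate; which of those is appropriate is left open here.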

The JSON Schema can use:

  • An enum for data_type
  • "format": "uri" for url
  • "pattern": "^[^/]+$" for file_name (no path separators)
  • "minLength": 1 for required strings
  • "minimum": 1 for number
  • appropriate "type" for all
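
As a rough illustration of the keywords listed above, the schema could look something like the following, written as a Python dict so it can be tried directly. The field names come from the list; the enum values and the choice of required fields are placeholders, not the project's actual ones. The file_name pattern is written with a `+` so that names longer than one character match. Note that the "uri" format is only checked when a supporting package such as rfc3987 is installed. `error_messages` is a hypothetical helper.

```python
from jsonschema import FormatChecker
from jsonschema.validators import Draft4Validator

# Enum values and required fields are placeholders for illustration.
ITEM_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["data_type", "url", "file_name"],
    "properties": {
        "data_type": {"type": "string", "enum": ["release_package", "record_package"]},
        "url": {"type": "string", "format": "uri", "minLength": 1},
        # One or more characters, none of which is a path separator.
        "file_name": {"type": "string", "pattern": "^[^/]+$"},
        "number": {"type": "integer", "minimum": 1},
    },
}

# Check that the schema itself is valid Draft 4 before using it.
Draft4Validator.check_schema(ITEM_SCHEMA)
validator = Draft4Validator(ITEM_SCHEMA, format_checker=FormatChecker())


def error_messages(item):
    """Return all validation messages for an item (empty list if valid)."""
    return sorted(error.message for error in validator.iter_errors(item))
```

Using `iter_errors` instead of `validate` reports every problem with an item at once, which can be handier when debugging a spider; `validate` stops at a single error.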
@jpmckinney jpmckinney added the framework-items Relating to how we process items label Jun 1, 2020
@jpmckinney jpmckinney added this to To do in CDS 2020-05/2021-02 Jun 1, 2020
@yolile yolile moved this from To do to Priority [12 max] in CDS 2020-05/2021-02 Jun 2, 2020
@yolile yolile self-assigned this Jun 18, 2020
@yolile yolile moved this from Priority [12 max] to In progress [6 max] in CDS 2020-05/2021-02 Jun 22, 2020
CDS 2020-05/2021-02 automation moved this from In progress [6 max] to Done Jun 24, 2020