A datapackage-pipelines processor to validate tabular resources using goodtables.
# clone the repo and install it with pip
git clone https://github.com/frictionlessdata/datapackage-pipelines-goodtables.git
cd datapackage-pipelines-goodtables
pip install -e .
Add the following to the pipeline-spec.yml configuration to validate each resource in the datapackage. A report is written to the logger.
...
- run: goodtables.validate
  parameters:
    fail_on_error: True
    reports_path: 'path/to/datapackage/reports'  # where reports will be written
    datapackage_reports_path: 'reports'  # relative to datapackage.json
    write_report: True
    goodtables:
      <key>: <value>  # options passed to goodtables.validate()
fail_on_error: An optional boolean to determine whether the pipeline should fail on validation error (default is True).
reports_path: An optional string to define where Goodtables reports should be written (default is reports).
datapackage_reports_path: An optional string to define the path to the report, relative to the datapackage.json (see note below).
write_report: An optional boolean to determine whether a goodtables validation report should be written to reports_path (default is True).
goodtables: An optional object passed to goodtables.validate() to customise its behaviour. See goodtables.validate() for available options.
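The defaults described above can be summarised in a short sketch. This is an illustration only, not the processor's actual code: the helper name resolve_parameters and the DEFAULTS dictionary are hypothetical, and two values are assumptions (datapackage_reports_path is treated as unset when omitted, and goodtables as an empty object).

```python
# Hypothetical helper: mirrors the documented defaults, not the processor's source.
DEFAULTS = {
    "fail_on_error": True,
    "reports_path": "reports",
    "datapackage_reports_path": None,  # assumption: unset unless provided
    "write_report": True,
    "goodtables": {},  # assumption: no extra options by default
}


def resolve_parameters(parameters=None):
    """Return the step parameters with the documented defaults filled in."""
    resolved = dict(DEFAULTS)
    resolved["goodtables"] = {}  # fresh dict so callers never share state
    resolved.update(parameters or {})
    return resolved


params = resolve_parameters({"fail_on_error": False})
print(params["fail_on_error"], params["reports_path"], params["write_report"])
```

Only the keys given in the pipeline step override the defaults; everything else keeps its documented value.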
If reports are written, and datapackage_reports_path is defined, a reports property will be added to the datapackage, detailing the path to the report for each resource:
...
"reports": [
  {
    "resource": "my-resource",
    "reportType": "goodtables",
    "path": "path/to/my-resource.json"
  }
]
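As a rough sketch of how such an entry could be assembled: the function name build_report_entry is hypothetical, and naming each report file after its resource is an assumption inferred from the example above rather than confirmed behaviour of the processor.

```python
import json
import posixpath


def build_report_entry(resource_name, datapackage_reports_path):
    """Build one item of the `reports` list for a validated resource."""
    return {
        "resource": resource_name,
        "reportType": "goodtables",
        # Assumption: each report file is named after its resource.
        "path": posixpath.join(datapackage_reports_path, resource_name + ".json"),
    }


entries = [build_report_entry("my-resource", "path/to")]
print(json.dumps({"reports": entries}, indent=2))
```

posixpath is used so the path keeps forward slashes on every platform, which matches how paths are written in datapackage.json.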
It is recommended that datapackage_reports_path is used to define a relative path, from the datapackage.json file, that represents where the report was written. datapackage_reports_path does not define where the reports will be written, but helps ensure a correct path is defined in the reports property in datapackage.json. This is useful when the pipeline concludes with a dump_to.path processor.
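One way to work out a suitable datapackage_reports_path is to express reports_path relative to the directory that will hold datapackage.json. The sketch below assumes a hypothetical layout in which the datapackage is dumped to output/my-package and reports are written to output/my-package/reports; the helper name relative_reports_path is illustrative and not part of the processor.

```python
import posixpath


def relative_reports_path(reports_path, datapackage_dir):
    """Return reports_path expressed relative to the datapackage.json directory."""
    return posixpath.relpath(reports_path, start=datapackage_dir)


# Hypothetical layout: datapackage.json is dumped into "output/my-package",
# and reports_path is set to "output/my-package/reports".
print(relative_reports_path("output/my-package/reports", "output/my-package"))
```

With this layout the result is the value to use for datapackage_reports_path, while reports_path keeps pointing at the location where the files are actually written.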