From 48291c2acfb8901ce1e4d5c4fd17aeada11a0e8a Mon Sep 17 00:00:00 2001 From: "Luke W. Johnston" Date: Mon, 17 Nov 2025 11:04:29 +0100 Subject: [PATCH 1/4] docs: :memo: revise guide docs to use `example_*` (plus minor edits) --- docs/guide/check.qmd | 49 +++++-------- docs/guide/config.qmd | 157 +++++++++++++++++------------------------- 2 files changed, 83 insertions(+), 123 deletions(-) diff --git a/docs/guide/check.qmd b/docs/guide/check.qmd index c01388e8..6b6eeedb 100644 --- a/docs/guide/check.qmd +++ b/docs/guide/check.qmd @@ -10,12 +10,12 @@ metadata---stored in its `datapackage.json` file---complies with the the available properties at each level of the `datapackage.json`, which ones are required, and what values are allowed. -This guide shows you how to use the main function `check()` to run these -checks. Each section walks you through a different part of `check()`, -starting with its basic usage with the `properties` argument, -introducing the default checks, how to configure which checks you want -to run with the `config` argument, and how to handle failed checks with -the `error` argument. +This guide shows you how to use the main function +[`check()`](/docs/reference/check.qmd) to run these checks. Each section +walks you through a different part of `check()`: its basic usage with +the `properties` argument, the default checks, how to configure which +checks to run with the `config` argument, and how to handle failed +checks with the `error` argument. ::: callout-tip For the full reference of the `check()` function, see the [reference @@ -42,33 +42,21 @@ section below. Let's look at an example. The code below defines a `package_properties` dictionary that includes all the required properties in a correct -format. When we call `check()` on these properties, it returns an empty -list: +format.
The example looks like this (from the +[`example_package_properties()`](/docs/reference/example_package_properties.qmd) function): ```{python} import check_datapackage as cdp +import pprint -package_properties = { - "name": "woolly-dormice", - "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.", - "description": """ - This scoping review explores the hibernation physiology of the - woolly dormouse, drawing on data collected over a 10-year period - along the Taurus Mountain range in Turkey. - """, - "id": "123-abc-123", - "created": "2014-05-14T05:00:01+00:00", - "version": "1.0.0", - "licenses": [{"name": "odc-pddl"}], - "resources": [ - { - "name": "woolly-dormice-2015", - "title": "Body fat percentage in the hibernating woolly dormouse", - "path": "resources/woolly-dormice-2015/data.parquet", - } - ], -} +package_properties = cdp.example_package_properties() +pprint.pp(package_properties) +``` + +When we call `check()` on these properties, it returns an empty list: + +```{python} cdp.check(properties=package_properties) ``` @@ -89,8 +77,9 @@ and one for the `name` field of the wrong type. By default, `check()` runs the standard checks defined as `MUST`s in the Data Package standard. These include checking that all required properties are present and that their values have the correct types and -formats. This happens through a default `Config` object passed to the -`config` argument of `check()`. +formats. This happens through a default +[`Config`](/docs/reference/Config.qmd) object passed to the `config` +argument of `check()`. If you want to configure which checks are performed, you can provide your own `Config` object in `check()`.
With this object you can exclude diff --git a/docs/guide/config.qmd b/docs/guide/config.qmd index 490700b1..0db20425 100644 --- a/docs/guide/config.qmd +++ b/docs/guide/config.qmd @@ -4,9 +4,10 @@ jupyter: python3 order: 3 --- -You can pass a `Config` object to `check()` to customise the checks done -on your Data Package's properties. The following configuration options -are available: +You can pass a [`Config`](/docs/reference/Config.qmd) object to +[`check()`](/docs/reference/check.qmd) to customise the checks done on +your Data Package's properties. The following configuration options are +available: - `version`: The version of Data Package standard to check against. Defaults to `v2`. @@ -38,18 +39,22 @@ the `required` check by defining an `Exclusion` object with this `type`: ```{python} from textwrap import dedent +import pprint import check_datapackage as cdp exclusion_required = cdp.Exclusion(type="required") +exclusion_required ``` To exclude checks of a specific field or fields, you can use a [JSON path](https://en.wikipedia.org/wiki/JSONPath) in the `jsonpath` -attribute of an `Exclusion` object. For example, you can exclude all -checks on the `name` field of the Data Package properties by writing: +attribute of an [`Exclusion`](/docs/reference/Exclusion.qmd) object. For +example, you can exclude all checks on the `name` field of the Data +Package properties by writing: ```{python} exclusion_name = cdp.Exclusion(jsonpath="$.name") +exclusion_name ``` Or you can use the wildcard JSON path selector to exclude checks on the @@ -57,54 +62,59 @@ Or you can use the wildcard JSON path selector to exclude checks on the ```{python} exclusion_path = cdp.Exclusion(jsonpath="$.resources[*].path") +exclusion_path ``` -The `type` and `jsonpath` arguments can also be combined: +The `type` and `jsonpath` arguments can also be combined, so we can +ignore [`Issue`](/docs/reference/Issue.qmd) from a specific type on a +specific field. 
For example, to exclude checks that the `created` field +be in a date format (`type="format"`), we can use: ```{python} -exclusion_desc_required = cdp.Exclusion(type="required", jsonpath="$.resources[*].description") +exclusion_created_format = cdp.Exclusion(type="format", jsonpath="$.created") +exclusion_created_format ``` -This will exclude required checks on the `description` field of Data -Resource properties. - To apply your exclusions when running the `check()`, you add them to the -`Config` object passed to the `check()` function: +`Config` object passed to the `check()` function. First, let's make an +example that has three `Issue` items: the package `name` is a number, +the `created` field is not a date, and the resource `path` doesn't point +to a data file (isn't a real path). So we'll modify our example +`package_properties` from +[`example_package_properties()`](/docs/reference/example_package_properties.qmd) +to make these `Issue`s appear: ```{python} -package_properties = { - "name": 123, - "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.", - "id": "123-abc-123", - "created": "2014-05-14T05:00:01+00:00", - "version": "1.0.0", - "licenses": [{"name": "odc-pddl"}], - "resources": [ - { - "name": "woolly-dormice-2015", - "title": "Body fat percentage in the hibernating woolly dormouse", - "path": "https://en.wikipedia.org/wiki/Woolly_dormouse", - } - ], -} - -config = cdp.Config(exclusions=[exclusion_required, exclusion_name, exclusion_path]) -cdp.check(properties=package_properties, config=config) +package_properties = cdp.example_package_properties() +package_properties["name"] = 123 +package_properties["created"] = "not-a-date" +package_properties["resources"][0]["path"] = "\\not/a/path" +pprint.pp(package_properties) ``` -In the example above, we would expect four `Issue` items: the package -`name` is a number, the required `description` field is missing in both -the package and resource properties, and the resource `path`
doesn't -point to a data file. However, as we have defined exclusions for all of -these, the function will flag no issues. +When we run `check()` on these properties, we get the four issues: + +```{python} +cdp.check(properties=package_properties) +``` + +Now let's exclude these Issues so that `check()` finds no issues by +adding our exclusions to a `Config` object and giving it to `check()`: + +```{python} +config = cdp.Config(exclusions=[exclusion_name, exclusion_path, exclusion_created_format]) +cdp.check(properties=package_properties, config=config) +``` ## Adding extensions It is possible to add checks in addition to the ones defined in the Data Package standard. We call these additional checks *extensions*. There -are currently two types of extensions supported: `CustomCheck` and -`RequiredCheck`. You can add as many `CustomCheck`s and `RequiredCheck`s -to your `Config` as you want to fit your needs. +are currently two types of extensions supported: +[`CustomCheck`](/docs/reference/CustomCheck.qmd) and +[`RequiredCheck`](/docs/reference/RequiredCheck.qmd). You can add as +many `CustomCheck`s and `RequiredCheck`s to your `Config` as you want to +fit your needs. ### Custom checks @@ -123,39 +133,16 @@ license_check = cdp.CustomCheck( ) ``` -For more details on what each parameter means, see the -[`CustomCheck`](/docs/reference/custom_check.qmd) documentation. -Specific to this example, the `type` is setting the identifier of the -check to `only-mit` and the `jsonpath` is indicating to only check the -`name` property of each license in the `licenses` property of the Data -Package. +For more details on what each parameter means, see the `CustomCheck` +documentation. In this example, `type` sets the identifier of the +check to `only-mit`, and `jsonpath` restricts the check to the `name` +property of each license in the `licenses` property of the Data +Package.
To register your custom checks with the `check()` function, you add them to the `Config` object passed to the function: ```{python} -#| eval: false -package_properties = { - "name": "woolly-dormice", - "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.", - "description": dedent(""" - This scoping review explores the hibernation physiology of the - woolly dormouse, drawing on data collected over a 10-year period - along the Taurus Mountain range in Turkey. - """), - "id": "123-abc-123", - "created": "2014-05-14T05:00:01+00:00", - "version": "1.0.0", - "licenses": [{"name": "odc-pddl"}, {"name": "mit"}], - "resources": [ - { - "name": "woolly-dormice-2015", - "title": "Body fat percentage in the hibernating woolly dormouse", - "path": "resources/woolly-dormice-2015/data.parquet", - } - ], -} - config = cdp.Config(extensions=cdp.Extensions(custom_checks=[license_check])) cdp.check(properties=package_properties, config=config) ``` @@ -172,7 +159,6 @@ with a `RequiredCheck`. For example, if you want to make the `RequiredCheck` like this: ```{python} -#| eval: false description_required = cdp.RequiredCheck( jsonpath="$.description", message="The 'description' field is required in the Data Package properties.", @@ -183,10 +169,13 @@ See the [`RequiredCheck`](/docs/reference/required_check.qmd) documentation for more details on its parameters. To apply this `RequiredCheck`, it should be added to the `Config` object -passed to `check()` like shown below: +passed to `check()` as shown below.
We'll create a +`package_properties` without a `description` field to see the effect of +this check: ```{python} -#| eval: false +package_properties = cdp.example_package_properties() +del package_properties["description"] config = cdp.Config(extensions=cdp.Extensions(required_checks=[description_required])) cdp.check(properties=package_properties, config=config) ``` @@ -196,33 +185,15 @@ cdp.check(properties=package_properties, config=config) The Data Package standard includes properties that "MUST" and "SHOULD" be included and/or have a specific format in a compliant Data Package. By default, `check()` only the `check()` function only includes "MUST" -checks. To include "SHOULD" checks, set the `strict` argument to `True`. +checks. To include "SHOULD" checks, set the `strict` argument to `True` +in the `Config` object. + For example, the `name` field of a Data Package "SHOULD" not contain special characters. So running `check()` in strict mode (`strict=True`) -on the following properties would output an issue. +on the following properties would output an Issue: ```{python} -#| eval: false -package_properties = { - "name": "Woolly Dormice (Toros Dağları)", - "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.", - "description": dedent(""" - This scoping review explores the hibernation physiology of the - woolly dormouse, drawing on data collected over a 10-year period - along the Taurus Mountain range in Turkey. 
- """), - "id": "123-abc-123", - "created": "2014-05-14T05:00:01+00:00", - "version": "1.0.0", - "licenses": [{"name": "odc-pddl"}], - "resources": [ - { - "name": "woolly-dormice-2015", - "title": "Body fat percentage in the hibernating woolly dormouse", - "path": "resources/woolly-dormice-2015/data.parquet", - } - ], -} - -cdp.check(properties=package_properties, strict=True) +package_properties = cdp.example_package_properties() +package_properties["name"] = "data-package!@#" +cdp.check(properties=package_properties, config=cdp.Config(strict=True)) ``` From fb7f18ed8a4f2e67d8658e14ab9a4ad25e378c87 Mon Sep 17 00:00:00 2001 From: "Luke W. Johnston" Date: Thu, 20 Nov 2025 09:45:37 +0100 Subject: [PATCH 2/4] docs: :pencil2: some minor edits for clarity MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: martonvago <57952344+martonvago@users.noreply.github.com> Co-authored-by: Signe Kirk Brødbæk --- docs/guide/config.qmd | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/guide/config.qmd b/docs/guide/config.qmd index c9a6b769..4f0e4d0d 100644 --- a/docs/guide/config.qmd +++ b/docs/guide/config.qmd @@ -67,9 +67,9 @@ exclusion_path ``` The `type` and `jsonpath` arguments can also be combined, so we can -ignore [`Issue`](/docs/reference/Issue.qmd) from a specific type on a -specific field. For example, to exclude checks that the `created` field -be in a date format (`type="format"`), we can use: +ignore an [`Issue`](/docs/reference/Issue.qmd) of a specific type on a +specific field. 
For example, to exclude checks of whether the `created` field +is in a specific format (`type="format"`), we can use: ```{python} exclusion_created_format = cdp.Exclusion(type="format", jsonpath="$.created") @@ -93,13 +93,13 @@ package_properties["resources"][0]["path"] = "\\not/a/path" pprint.pp(package_properties) ``` -When we run `check()` on these properties, we get the four issues: +When we run `check()` on these properties, we get the three expected issues: ```{python} cdp.check(properties=package_properties) ``` -Now let's exclude these Issues so that `check()` finds no issues by +Now let's exclude these `Issue`s so that `check()` finds no issues by adding our exclusions to a `Config` object and giving it to `check()`: ```{python} @@ -191,7 +191,7 @@ in the `Config` object. For example, the `name` field of a Data Package "SHOULD" not contain special characters. So running `check()` in strict mode (`strict=True`) -on the following properties would output an Issue: +on the following properties would output an `Issue`: ```{python} package_properties = cdp.example_package_properties() From f305802445c3f0641a5e4a5e41f7aa4342a7838a Mon Sep 17 00:00:00 2001 From: "Luke W. Johnston" Date: Thu, 20 Nov 2025 09:46:05 +0100 Subject: [PATCH 3/4] docs: :pencil2: remove duplicate text Co-authored-by: martonvago <57952344+martonvago@users.noreply.github.com> --- docs/guide/config.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guide/config.qmd b/docs/guide/config.qmd index 4f0e4d0d..316ed29b 100644 --- a/docs/guide/config.qmd +++ b/docs/guide/config.qmd @@ -185,7 +185,7 @@ cdp.check(properties=package_properties, config=config) The Data Package standard includes properties that "MUST" and "SHOULD" be included and/or have a specific format in a compliant Data Package. -By default, `check()` only the `check()` function only includes "MUST" +By default, `check()` only includes "MUST" checks. 
To include "SHOULD" checks, set the `strict` argument to `True` in the `Config` object. From ff740492ad80979871654f3a40f7a8d23c3e6352 Mon Sep 17 00:00:00 2001 From: "Luke W. Johnston" Date: Thu, 20 Nov 2025 09:48:18 +0100 Subject: [PATCH 4/4] docs: :pencil2: one issue, not two --- docs/guide/check.qmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/guide/check.qmd b/docs/guide/check.qmd index 78ea25fb..f7ec4747 100644 --- a/docs/guide/check.qmd +++ b/docs/guide/check.qmd @@ -70,8 +70,8 @@ package_properties["name"] = 123 cdp.check(properties=package_properties) ``` -The output now lists two issues: one for the missing `description` field -and one for the `name` field of the wrong type. +The output now lists one `Issue` for the `name` field being of the +wrong type. ## Default checks and configuration (`config`)
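The exclusion examples in these guide changes select fields with JSONPath expressions such as `$.name`, `$.created`, and `$.resources[*].path`, where `[*]` matches every element of a list. The sketch below shows how such a path picks out one value per resource. It is a minimal Python resolver for a small JSONPath subset (`$`, `.key`, `[n]`, and `[*]`). `match_jsonpath` is a hypothetical helper written only for illustration. It is not part of check-datapackage, and the sketch makes no claim about how that package evaluates paths internally. The second resource in the sample properties is made up.

```python
import re

# Hypothetical helper for illustration only; check-datapackage does not provide it.
# Supported syntax: a leading "$", then any mix of ".key", "[n]" and "[*]".
_TOKEN = re.compile(r"\.([A-Za-z_][\w-]*)|\[(\*|\d+)\]")


def match_jsonpath(path, data):
    """Return the list of values in `data` selected by `path`."""
    if not path.startswith("$"):
        raise ValueError("a JSON path must start with '$'")
    nodes = [data]  # values matched so far, starting at the root object
    pos = 1
    while pos < len(path):
        token = _TOKEN.match(path, pos)
        if token is None:
            raise ValueError(f"unsupported JSON path syntax: {path[pos:]!r}")
        key, index = token.groups()
        selected = []
        for node in nodes:
            if key is not None:
                # ".key": keep the value if this node is an object with that key.
                if isinstance(node, dict) and key in node:
                    selected.append(node[key])
            elif isinstance(node, list):
                if index == "*":
                    # "[*]": the wildcard keeps every element of the list.
                    selected.extend(node)
                elif int(index) < len(node):
                    # "[n]": keep only the element at position n, if it exists.
                    selected.append(node[int(index)])
        nodes = selected
        pos = token.end()
    return nodes


# Sample properties based on the guide's woolly dormouse example;
# the second resource is made up for illustration.
properties = {
    "name": "woolly-dormice",
    "licenses": [{"name": "odc-pddl"}, {"name": "mit"}],
    "resources": [
        {"name": "woolly-dormice-2015", "path": "resources/woolly-dormice-2015/data.parquet"},
        {"name": "woolly-dormice-2016", "path": "resources/woolly-dormice-2016/data.parquet"},
    ],
}

print(match_jsonpath("$.name", properties))               # ['woolly-dormice']
print(match_jsonpath("$.resources[*].path", properties))  # both resource paths
print(match_jsonpath("$.licenses[*].name", properties))   # ['odc-pddl', 'mit']
```

Under this simplified reading, the wildcard path returns one location for each resource. That is consistent with the guide's description of `$.resources[*].path` as targeting the `path` of all resources at once, and of `$.licenses[*].name` as targeting each license's `name` in the custom check.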