Merged
2 changes: 1 addition & 1 deletion _quarto.yml
Original file line number Diff line number Diff line change
@@ -68,8 +68,8 @@ quartodoc:
- name: check
- name: read_json
- name: Config
- name: CustomCheck
- name: Exclude
- name: Rule
- name: Issue
- name: example_package_descriptor
- name: example_resource_descriptor
10 changes: 5 additions & 5 deletions docs/design/interface.qmd
@@ -132,11 +132,11 @@ A subitem of `Config` for expressing checks to ignore.

See the help documentation with `help(Exclude)` for more details.

#### {{< var wip >}} `Rule`
#### {{< var wip >}} `CustomCheck`

Expresses a custom check.

See the help documentation with `help(Rule)` for more details.
See the help documentation with `help(CustomCheck)` for more details.

### {{< var wip >}} `Issue`

@@ -163,7 +163,7 @@ flowchart TD
read_config["read_config()"]

config[/Config/]
rules[/Rules/]
custom_check[/CustomCheck/]
exclude[/Exclude/]
check["check()"]
issues[/"list[Issue]"/]
@@ -173,8 +173,8 @@ flowchart TD

descriptor_file --> read_json --> descriptor
config_file --> read_config --> config
rules & exclude --> config
rules & exclude -.-> config_file
custom_check & exclude --> config
custom_check & exclude -.-> config_file

descriptor & config --> check --> issues --> explain --> messages
```
48 changes: 24 additions & 24 deletions docs/guide/config.qmd
@@ -12,8 +12,8 @@ on the descriptor. The following configuration options are available:
- `version`: The version of Data Package standard to check against.
Defaults to `v2`.
- `exclude`: The list of checks to exclude.
- `rules`: The list of custom checks to run in addition to the checks
defined in the standard.
- `custom_checks`: The list of custom checks to run in addition to the
checks defined in the standard.
- `strict`: Whether to run recommended checks in addition to required
ones. Defaults to `False`.

@@ -90,16 +90,16 @@ the package and resource properties, and the resource `path` doesn't
point to a data file. However, as we have defined exclusions for all of
these, the function will flag no issues.

## Adding custom check rules
## Adding custom checks

It is possible to create custom rules in addition to the ones defined in
the Data Package standard.
It is possible to create custom checks in addition to the ones defined
in the Data Package standard.

Let's say your organisation only accepts Data Packages licensed under
MIT. You can express this requirement in a `Rule` as follows:
MIT. You can express this requirement in a `CustomCheck` as follows:

```{python}
license_rule = cdp.Rule(
license_check = cdp.CustomCheck(
type="only-mit",
jsonpath="$.licenses[*].name",
message=dedent("""
@@ -112,19 +112,19 @@ license_rule = cdp.Rule(

Here's a breakdown of what each argument does:

- `type`: An identifier for your rule. This is what will show up in
error messages and what you will use if you want to exclude your
rule. Each `Rule` should have a unique `type`.
- `jsonpath`: The location of the field or fields, expressed in [JSON
path](https://en.wikipedia.org/wiki/JSONPath) notation, to which the
rule applies. This rule applies to the `name` field of all package
licenses.
- `message`: The message that is shown when the rule is violated.
- `check`: A function that expresses how compliance with the rule is
checked. It takes the value at the `jsonpath` location as input and
returns true if the rule is met, false if it isn't.

To register your custom rules with the `check()` function, you add them
- `type`: An identifier for your custom check. This is what will show
up in error messages and what you will use if you want to exclude
your check. Each `CustomCheck` should have a unique `type`.
- `jsonpath`: The location of the field or fields the custom check
applies to, expressed in [JSON
path](https://en.wikipedia.org/wiki/JSONPath) notation. This check
applies to the `name` field of all package licenses.
- `message`: The message that is shown when the check is violated.
- `check`: A function that expresses the custom check. It takes the
value at the `jsonpath` location as input and returns true if the
check is met, false if it isn't.

To register your custom checks with the `check()` function, you add them
to the `Config` object passed to the function:

```{python}
@@ -149,18 +149,18 @@ package_descriptor = {
],
}

config = cdp.Config(rules=[license_rule])
config = cdp.Config(custom_checks=[license_check])
issues = cdp.check(descriptor=package_descriptor, config=config)
print(issues)
```

We can see that the custom rule was applied: `check()` returned one
We can see that the custom check was applied: `check()` returned one
issue flagging the first license attached to the Data Package.

## Strict mode

The Data Package standard has both required and recommended rules. By
default, `check()` checks only required rules. Recommended rules can be
The Data Package standard has both requirements and recommendations. By
default, `check()` only checks requirements. Recommendations can be
turned on by setting the `strict` argument to `True`. The example below
violates the recommendation that the package `name` should contain no
special characters.
4 changes: 2 additions & 2 deletions src/check_datapackage/__init__.py
@@ -2,17 +2,17 @@

from .check import check
from .config import Config
from .custom_check import CustomCheck
from .examples import example_package_descriptor, example_resource_descriptor
from .exclude import Exclude
from .issue import Issue
from .read_json import read_json
from .rule import Rule

__all__ = [
"Config",
"Exclude",
"Issue",
"Rule",
"CustomCheck",
"example_package_descriptor",
"example_resource_descriptor",
"check",
8 changes: 4 additions & 4 deletions src/check_datapackage/check.py
@@ -7,6 +7,7 @@

from check_datapackage.config import Config
from check_datapackage.constants import DATA_PACKAGE_SCHEMA_PATH, GROUP_ERRORS
from check_datapackage.custom_check import apply_custom_checks
from check_datapackage.exclude import exclude
from check_datapackage.internals import (
_add_package_recommendations,
@@ -17,7 +18,6 @@
)
from check_datapackage.issue import Issue
from check_datapackage.read_json import read_json
from check_datapackage.rule import apply_rules


def check(
@@ -46,7 +46,7 @@ class for more details, especially about the default values.
_add_resource_recommendations(schema)

issues = _check_object_against_json_schema(descriptor, schema)
issues += apply_rules(config.rules, descriptor)
issues += apply_custom_checks(config.custom_checks, descriptor)
issues = exclude(issues, config.exclude, descriptor)

return sorted(set(issues))
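
The `sorted(set(issues))` call above only works if `Issue` instances are both hashable and orderable. A minimal sketch of that requirement, assuming a frozen, ordered dataclass. The `Issue` stand-in below is hypothetical: its field names follow the `Issue(...)` call in this diff, but the real class in `issue.py` is not shown here and may differ.

```python
from dataclasses import dataclass


# Hypothetical stand-in for Issue (assumption: the real class may be defined differently).
# frozen=True makes instances hashable; order=True generates comparison methods.
@dataclass(frozen=True, order=True)
class Issue:
    jsonpath: str
    type: str
    message: str


issues = [
    Issue("$.licenses[1].name", "only-mit", "Only MIT is allowed."),
    Issue("$.licenses[0].name", "only-mit", "Only MIT is allowed."),
    Issue("$.licenses[1].name", "only-mit", "Only MIT is allowed."),  # duplicate
]

# set() drops the duplicate; sorted() orders by (jsonpath, type, message).
unique_sorted = sorted(set(issues))
print([issue.jsonpath for issue in unique_sorted])
```

Deduplicating here means an issue reported by both the schema check and a custom check, with identical fields, would appear only once in the output.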
@@ -82,9 +82,9 @@ class SchemaError:
Attributes:
message (str): The error message generated by `jsonschema`.
type (str): The type of the error.
schema_path (str): The path to the violated rule in the JSON schema.
schema_path (str): The path to the violated check in the JSON schema.
Path components are separated by '/'.
jsonpath (str): The JSON path to the field that violates the rule.
jsonpath (str): The JSON path to the field that violates the check.
parent (Optional[SchemaError]): The error group the error belongs to, if any.
"""

12 changes: 6 additions & 6 deletions src/check_datapackage/config.py
@@ -1,8 +1,8 @@
from dataclasses import dataclass, field
from typing import Literal

from check_datapackage.custom_check import CustomCheck
from check_datapackage.exclude import Exclude
from check_datapackage.rule import Rule


@dataclass
@@ -12,8 +12,8 @@ class Config:
Attributes:
exclude (list[Exclude]): Any issues matching any of these exclusions will be
ignored (i.e., removed from the output of the check function).
rules (list[Rule]): Custom checks listed here will be done in addition
to checks defined in the Data Package standard.
custom_checks (list[CustomCheck]): Custom checks listed here will be done in
addition to checks defined in the Data Package standard.
strict (bool): Whether to run recommended as well as required checks. If
True, recommended checks will also be run. Defaults to False.
version (str): The version of the Data Package standard to check against.
@@ -24,17 +24,17 @@ class Config:
import check_datapackage as cdp

exclude_required = cdp.Exclude(type="required")
license_rule = cdp.Rule(
license_check = cdp.CustomCheck(
type="only-mit",
jsonpath="$.licenses[*].name",
message="Data Packages may only be licensed under MIT.",
check=lambda license_name: license_name == "mit",
)
config = cdp.Config(exclude=[exclude_required], rules=[license_rule])
config = cdp.Config(exclude=[exclude_required], custom_checks=[license_check])
```
"""

exclude: list[Exclude] = field(default_factory=list)
rules: list[Rule] = field(default_factory=list)
custom_checks: list[CustomCheck] = field(default_factory=list)
strict: bool = False
version: Literal["v1", "v2"] = "v2"
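
The `field(default_factory=list)` defaults give every `Config` instance its own empty list, and dataclasses reject a bare `[]` default outright. A small sketch of both behaviours, using a hypothetical, simplified `MiniConfig` stand-in that holds plain strings instead of `Exclude` and `CustomCheck` objects:

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-in for Config (assumption: strings replace the real item types).
@dataclass
class MiniConfig:
    exclude: list[str] = field(default_factory=list)
    custom_checks: list[str] = field(default_factory=list)
    strict: bool = False


first = MiniConfig()
second = MiniConfig()
first.custom_checks.append("only-mit")

# Each instance owns its list, so `second` is unaffected by the append.
print(first.custom_checks, second.custom_checks)

# A mutable default written directly is rejected when the class is defined.
try:

    @dataclass
    class BadConfig:
        custom_checks: list[str] = []

except ValueError as error:
    print(f"ValueError: {error}")
```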
92 changes: 92 additions & 0 deletions src/check_datapackage/custom_check.py
@@ -0,0 +1,92 @@
from dataclasses import dataclass
from typing import Any, Callable

from check_datapackage.internals import (
    _filter,
    _flat_map,
    _get_fields_at_jsonpath,
    _map,
)
from check_datapackage.issue import Issue


@dataclass
class CustomCheck:
    """A custom check to be done on a Data Package descriptor.

    Attributes:
        jsonpath (str): The location of the field or fields the custom check applies to,
            expressed in [JSON path](https://jg-rp.github.io/python-jsonpath/syntax/)
            notation (e.g., `$.resources[*].name`).
        message (str): The message shown when the check is violated.
        check (Callable[[Any], bool]): A function that expresses the custom check.
            It takes the value at the `jsonpath` location as input and
            returns true if the check is met, false if it isn't.
        type (str): An identifier for the custom check. It will be shown in error
            messages and can be used to exclude the check. Each custom check
            should have a unique `type`.

    Examples:
        ```{python}
        import check_datapackage as cdp

        license_check = cdp.CustomCheck(
            type="only-mit",
            jsonpath="$.licenses[*].name",
            message="Data Packages may only be licensed under MIT.",
            check=lambda license_name: license_name == "mit",
        )
        ```
    """

    jsonpath: str
    message: str
    check: Callable[[Any], bool]
    type: str = "custom"


def apply_custom_checks(
    custom_checks: list[CustomCheck], descriptor: dict[str, Any]
) -> list[Issue]:
    """Checks the descriptor for all custom checks and creates issues if any fail.

    Args:
        custom_checks: The custom checks to apply to the descriptor.
        descriptor: The descriptor to check.

    Returns:
        A list of `Issue`s.
    """
    return _flat_map(
        custom_checks,
        lambda custom_check: _apply_custom_check(custom_check, descriptor),
    )


def _apply_custom_check(
    custom_check: CustomCheck, descriptor: dict[str, Any]
) -> list[Issue]:
    """Applies the custom check to the descriptor.

    If any fields fail the custom check, this function creates a list of issues
    for those fields.

    Args:
        custom_check: The custom check to apply to the descriptor.
        descriptor: The descriptor to check.

    Returns:
        A list of `Issue`s.
    """
    matching_fields = _get_fields_at_jsonpath(custom_check.jsonpath, descriptor)
    failed_fields = _filter(
        matching_fields, lambda field: not custom_check.check(field.value)
    )
    return _map(
        failed_fields,
        lambda field: Issue(
            jsonpath=field.jsonpath,
            type=custom_check.type,
            message=custom_check.message,
        ),
    )
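
The find, filter, and map flow in `_apply_custom_check` can be tried without the package installed. The sketch below is a self-contained approximation, not the library's implementation: `resolve` is a hypothetical helper that handles only `$`, `.key`, and `[*]`, standing in for `_get_fields_at_jsonpath`, and the `Issue` class and the exact format of the reported paths are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CustomCheck:
    jsonpath: str
    message: str
    check: Callable[[Any], bool]
    type: str = "custom"


# Hypothetical stand-in for the real Issue class.
@dataclass(frozen=True)
class Issue:
    jsonpath: str
    type: str
    message: str


def resolve(path: str, descriptor: Any) -> list[tuple[str, Any]]:
    """Resolve a tiny JSONPath subset (`$`, `.key`, `[*]`) to (location, value) pairs."""
    matches = [("$", descriptor)]
    # "$.licenses[*].name" becomes the steps "licenses", "[*]", "name".
    steps = path.removeprefix("$").replace("[*]", ".[*]").split(".")
    for step in filter(None, steps):
        next_matches = []
        for location, value in matches:
            if step == "[*]" and isinstance(value, list):
                next_matches += [
                    (f"{location}[{index}]", item) for index, item in enumerate(value)
                ]
            elif isinstance(value, dict) and step in value:
                next_matches.append((f"{location}.{step}", value[step]))
        matches = next_matches
    return matches


def apply_custom_checks(
    custom_checks: list[CustomCheck], descriptor: dict[str, Any]
) -> list[Issue]:
    """Create one issue per matched field whose value fails its check."""
    return [
        Issue(jsonpath=location, type=custom_check.type, message=custom_check.message)
        for custom_check in custom_checks
        for location, value in resolve(custom_check.jsonpath, descriptor)
        if not custom_check.check(value)
    ]


license_check = CustomCheck(
    type="only-mit",
    jsonpath="$.licenses[*].name",
    message="Data Packages may only be licensed under MIT.",
    check=lambda license_name: license_name == "mit",
)
descriptor = {"licenses": [{"name": "odc-pddl"}, {"name": "mit"}]}

issues = apply_custom_checks([license_check], descriptor)
print(issues)
```

Only the first license fails the predicate, so a single issue is created for it; fields that the path does not match produce no issues at all.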