Add more checks & extend validation. Support vocabulary configuration. #131

Merged
merged 13 commits into from
Jul 19, 2023

@dalito (Member) commented Jul 17, 2023

This PR adds several checks that are run when converting vocabularies. Validation is now also much stricter (#130). Some code could be removed by making better use of pydantic's features. For example, the conversion from CURIE to URI (prefix expansion) now happens as part of the model validation.
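A minimal sketch of prefix expansion performed during validation. It uses a stdlib dataclass with `__post_init__` as a stand-in for a pydantic field validator. The prefix map, the `Concept` class and `expand_curie` are hypothetical names for illustration, not voc4cat's actual model.

```python
from dataclasses import dataclass

# Hypothetical prefix map; in a real vocabulary the prefixes come from its metadata.
PREFIXES = {
    "ex": "https://example.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_curie(value: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:local' to a full IRI; return full http(s) IRIs unchanged."""
    if value.startswith(("http://", "https://")):
        return value
    prefix, sep, local = value.partition(":")
    if not sep:
        raise ValueError(f"neither a CURIE nor an IRI: {value!r}")
    if prefix not in prefixes:
        raise ValueError(f"unknown prefix {prefix!r} in {value!r}")
    return prefixes[prefix] + local


@dataclass
class Concept:
    uri: str
    pref_label: str

    def __post_init__(self) -> None:
        # Validation step: after construction, uri always holds the expanded IRI.
        self.uri = expand_curie(self.uri, PREFIXES)
```

With pydantic, the same logic would typically sit in a field validator that runs before type validation (`@validator(..., pre=True)` in v1, `@field_validator(..., mode="before")` in v2), so the model never stores a bare CURIE.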

A config module to share a common configuration across all modules of voc4cat had to be added first. Sharing is done by importing config.py where needed; config.py reads the config file (idranges.toml) upon import. The config is also represented as a pydantic model and validated. For thorough validation, the PR adds pydantic model fields for ORCID and ROR.
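The PR does not spell out what the ORCID field checks. As an assumption, a thorough ORCID field could verify both the format and the check digit, which ORCID computes with ISO 7064 MOD 11-2. A minimal sketch; `validate_orcid` and `orcid_check_digit` are hypothetical names, and accepting both the bare and the URL form is a design choice of this sketch.

```python
import re

# Bare ORCID or https://orcid.org/ URL; the last character may be the check "X".
ORCID_PATTERN = re.compile(r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def validate_orcid(value: str) -> str:
    """Return the ORCID as an https://orcid.org/ URL or raise ValueError."""
    match = ORCID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"malformed ORCID: {value!r}")
    orcid = match.group(1)
    digits = orcid.replace("-", "")
    if orcid_check_digit(digits[:15]) != digits[15]:
        raise ValueError(f"invalid ORCID checksum: {value!r}")
    return f"https://orcid.org/{orcid}"
```

For example, `validate_orcid("0000-0002-1825-0097")` returns `"https://orcid.org/0000-0002-1825-0097"`, while changing the last digit to 8 raises `ValueError`.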

The content of idranges.toml was extended and its format was adjusted. The file format should be more stable from now on.

General checks that do not need a config:

  • For all IRIs, the pydantic model now requires URLs instead of str. Values that are not URLs are rejected.
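voc4cat relies on pydantic's URL types for this. The following stdlib-only sketch shows the kind of values that are now rejected; the function name `require_url` is hypothetical and its rules are a simplification of what pydantic checks.

```python
from urllib.parse import urlparse


def require_url(value: str) -> str:
    """Accept only absolute http(s) URLs with a host; reject everything else."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a URL: {value!r}")
    if any(ch.isspace() for ch in value):
        raise ValueError(f"URL contains whitespace: {value!r}")
    return value
```

A plain string such as `"example.org/0001"` or an unexpanded CURIE like `"ex:0001"` fails this check, whereas `"https://example.org/0001"` passes.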

Checks that require a config (idranges.toml):

  • Verify that the processed vocabulary has at least one id_range defined.
  • Check that all concept/collection IRIs use the location set in the config (permanent_iri_part).
  • Check whether the ID part of the IRI is allowed for the actor. The first entry in provenance is used as the relevant actor. For this actor, the allowed ID range(s) are read from the config and compared to the ID used.
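The three config-based checks above can be sketched as follows. The function names, the `IdRange` fields, and the assumption that the provenance entry matches an ID range's `gh_name` are hypothetical; the PR only states that the first provenance entry is used as the actor.

```python
from dataclasses import dataclass


@dataclass
class IdRange:  # hypothetical field names
    first_id: int
    last_id: int
    gh_name: str


def check_has_id_range(id_ranges: list[IdRange]) -> None:
    """The vocabulary must define at least one id_range."""
    if not id_ranges:
        raise ValueError("vocabulary has no id_range defined")


def check_permanent_iri_part(iri: str, permanent_iri_part: str) -> str:
    """Return the ID part of the IRI if it starts with permanent_iri_part."""
    if not iri.startswith(permanent_iri_part):
        raise ValueError(f"{iri} does not start with {permanent_iri_part}")
    return iri[len(permanent_iri_part):]


def check_id_allowed(
    iri: str, provenance: list[str], permanent_iri_part: str, id_ranges: list[IdRange]
) -> None:
    """The numeric ID must fall into one of the ranges of the first provenance actor."""
    if not provenance:
        raise ValueError(f"no provenance for {iri}")
    actor = provenance[0]
    id_part = check_permanent_iri_part(iri, permanent_iri_part)
    if not id_part.isdigit():
        raise ValueError(f"ID part {id_part!r} of {iri} is not numeric")
    number = int(id_part)
    allowed = [r for r in id_ranges if r.gh_name == actor]
    if not any(r.first_id <= number <= r.last_id for r in allowed):
        raise ValueError(f"ID {number} is outside the range(s) of {actor!r}")
```

With ranges 1-100 for "alice" and 101-200 for "bob", the IRI `https://example.org/myvocab_0000042` passes for provenance `["alice"]` and fails for `["bob"]`.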

Checks for running on PRs in gh-actions (also require a config):

These checks are useful for maintaining SKOS vocabularies with NFDI4Cat/voc4cat-template.

  • If the config sets single_vocab = true, check that the inbox contains no more than one file.
  • Check that the file names in the "inbox" (for xlsx) and "vocabularies" (for SKOS/turtle) directories are consistent with the options in the configuration. This prevents PRs from putting files into these directories that do not belong there.
  • Check whether concepts or collections were removed between two turtle files, i.e. whether a PR deletes any. Depending on the configuration (allow_delete parameter), a removal is either only logged or treated as a failure/error.
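A sketch of these PR checks, operating on plain file names and IRI sets so it stays independent of the file system and of RDF parsing. It assumes the caller lists the directory contents and extracts the concept and collection IRIs from both turtle files beforehand (for example with rdflib). The function names and the `<vocab name><suffix>` naming rule are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def check_inbox_count(inbox_files: list[str], single_vocab: bool) -> None:
    """With single_vocab = true, allow at most one file in the inbox."""
    if single_vocab and len(inbox_files) > 1:
        raise ValueError(f"single_vocab is set but the inbox has {len(inbox_files)} files")


def check_file_names(file_names: list[str], suffix: str, vocab_names: set[str]) -> None:
    """Each file must be named <vocab name><suffix>, e.g. myvocab.xlsx or myvocab.ttl."""
    for name in file_names:
        stem, dot, ext = name.rpartition(".")
        if not dot or "." + ext != suffix or stem not in vocab_names:
            raise ValueError(f"unexpected file: {name}")


def check_no_removals(old_iris: set[str], new_iris: set[str], allow_delete: bool) -> set[str]:
    """Report IRIs present in the old turtle file but missing from the new one."""
    removed = old_iris - new_iris
    if removed:
        msg = f"removed concepts/collections: {sorted(removed)}"
        if allow_delete:
            logger.info(msg)  # deletion permitted: only log it
        else:
            raise ValueError(msg)
    return removed
```

Using a set difference means that added concepts never trigger the check; only IRIs missing from the new file count as removals.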

Still to do:

  • Add tests for new config module
  • Add tests for new checks module
  • Add test for extended model validation

Closes #116

setup_logging did not correctly initialize a basic config.
This also changes the structure of the config. All vocabularies
are stored under a common namespace "vocabs".
Tests are not yet included. test_checks.py is only a template.

Other changes:
- add type annotation for class vars in fields.py
Some validations use parameters from idranges.toml if present.

Other changes:
- renamed home_vocab_uri to source_vocab (to match xlsx template)
- add vocab_name to parameter list of extract-functions to enable
   validations that need to look up vocabulary-specific options in config
@dalito changed the title from "Extend checks: Make use of information in config ()" to "Extend checks: Make use of information in config (idranges.toml)" Jul 17, 2023
@dalito changed the title from "Extend checks: Make use of information in config (idranges.toml)" to "Add more checks, extend validation & support for config" Jul 19, 2023
@dalito changed the title from "Add more checks, extend validation & support for config" to "Add more checks & extend validation. Support vocabulary configuration." Jul 19, 2023
@dalito merged commit 955f4c5 into main Jul 19, 2023
3 checks passed
@dalito deleted the issue116 branch July 19, 2023 09:59
@dalito added this to the 0.5.0 milestone Jul 20, 2023
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add checks that make use of information in idranges.toml
1 participant