Data Resource JSON schema definition for `schema` seems to contradict specification #645
Caution: there's a lot of different uses of "schema" coming, so bear with me...
In the JSON schema definition file for a resource, data-resource.json, the formal definition for a
This allows only JSON-objects as
This seemingly contradicts the human-readable specification of a
I believe the section should read:
I am no specialist for JSON schema, but if I change it, the Java implementation works with
How to reproduce
Reproduce the buggy behavior
Result: validation will fail with the following error:
Reproduce the proposed fix
Result: no validation error will be shown.
Fix to YAML source
I created a pull request; please check it and merge it if it works.
The idea behind this decision is that this JSON schema has to be used after:
For example in Python:
Because of these preparatory steps, there is a guarantee that
It works fine for dynamic languages but probably creates problems for statically typed languages like Java (I'm not sure). If it does, I would recommend adding
In that scenario, obviously we'll only have objects there. I don't quite understand the stipulation that the schema is to be applied after those steps, though. IMO, this kind of undermines the rigidity of having a JSON schema in the first place, as general-purpose schema validators like https://www.jsonschemavalidator.net/ will mark perfectly valid DataPackages as invalid.
I'd always validate first and only then de-reference the dependencies as part of the parsing step. Is the reason that a validator might not load the referenced resources, either because they are local to the file system or because it will not want to load any old URL, and is therefore unable to fully validate the schema?
If that's the rationale, there should be an explanation (or maybe I just missed it). The Java code I took over does not de-reference resources before validation. Instead, it first validates the DataPackage descriptor JSON and then lazily validates each resource if and when it is queried, which is how I fell over the issue mentioned above. It's of course perfectly possible in Java to first resolve the URLs/file descriptors and only then go for validation, but the behavior of the Java implementation seems more correct to me.
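To make the two orderings concrete, here is a minimal Python sketch. It is not the real profile or any library's API: `passes_object_only_check` is a stand-in for a rule that accepts only objects for `schema`, and `dereference_schema` is a hypothetical helper that only handles local file paths (remote URLs are ignored). The file names are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def dereference_schema(resource, base_dir):
    """Return a copy of `resource` with a string `schema` replaced by the parsed file.

    Hypothetical helper: only local paths relative to `base_dir` are handled.
    """
    resolved = dict(resource)
    schema = resolved.get("schema")
    if isinstance(schema, str):
        schema_path = Path(base_dir) / schema
        resolved["schema"] = json.loads(schema_path.read_text(encoding="utf-8"))
    return resolved


def passes_object_only_check(resource):
    """Stand-in for a strict rule: `schema`, when present, must be a JSON object."""
    return "schema" not in resource or isinstance(resource["schema"], dict)


def demo():
    """Return (result when validating first, result when dereferencing first)."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "schema.json").write_text(
            json.dumps({"fields": [{"name": "id", "type": "integer"}]}),
            encoding="utf-8",
        )
        # The resource points at its schema by path instead of embedding it.
        resource = {"name": "example", "path": "data.csv", "schema": "schema.json"}
        validate_first = passes_object_only_check(resource)
        dereference_first = passes_object_only_check(
            dereference_schema(resource, tmp)
        )
    return validate_first, dereference_first


if __name__ == "__main__":
    print(demo())
```

Under these assumptions, the same descriptor is rejected when the check runs on the raw descriptor and accepted when it runs after de-referencing, which is the discrepancy a general-purpose validator would trip over.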
I think the point was to validate the schema as well, because a data package can turn out to be invalid after dereferencing. I agree this area is not handled very well in the specs, and some kind of composite approach is probably better, e.g. with validation happening on different levels (package/resource/schema), although the current approach seems the easiest way to do the trick.
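One way to picture such a composite approach is the Python sketch below. It is only an illustration under assumed, heavily simplified rules (the real profiles check far more), and every function name here is hypothetical. The caller supplies `load_schema`, which resolves a reference string to a dict, so the resource level can accept a string or an object while the schema level runs only on the de-referenced object.

```python
def validate_package(descriptor):
    """Package level: check only the package's own shape."""
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        return ["package: 'resources' must be a non-empty list"]
    return []


def validate_resource(resource):
    """Resource level: `schema` may be a reference string or an object."""
    errors = []
    if not isinstance(resource.get("name"), str):
        errors.append("resource: 'name' must be a string")
    if "schema" in resource and not isinstance(resource["schema"], (str, dict)):
        errors.append("resource: 'schema' must be a string or an object")
    return errors


def validate_table_schema(schema):
    """Schema level: meant to run on a de-referenced schema object."""
    if not isinstance(schema, dict) or not isinstance(schema.get("fields"), list):
        return ["schema: 'fields' must be a list"]
    return []


def validate_all(descriptor, load_schema):
    """Run the three levels in order and collect error messages."""
    errors = validate_package(descriptor)
    if errors:
        return errors
    for resource in descriptor["resources"]:
        if not isinstance(resource, dict):
            errors.append("resource: must be an object")
            continue
        errors.extend(validate_resource(resource))
        schema = resource.get("schema")
        if isinstance(schema, str):
            schema = load_schema(schema)  # de-reference only at this point
        elif not isinstance(schema, dict):
            continue  # absent, or already reported at the resource level
        errors.extend(validate_table_schema(schema))
    return errors
```

With this split, a descriptor that references its schema by path passes the package and resource levels as written, and any problem that only shows up after de-referencing is reported at the schema level.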