Support --continue to validate all JSONL entries beyond the first error#727
…rror Fixes: #726 Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
🤖 Augment PR Summary
Summary: This PR adds a new
Changes:
Technical Notes: Exit code behavior remains consistent (expected-failure exit code when any entry fails), while JSONL reporting becomes "fail-fast" only when
jsonschema validate path/to/my/schema.json path/to/my/dataset.jsonl
```

### Validate a JSONL dataset reporting all failures
Does it report all failures, or only the first validation error for each failing record?
I'll clarify. It reports only the first failure per record. JSON Schema, as per the standard, doesn't really describe how to proceed past the first validation error. It could maybe be done, but it might be very complex in certain cases. Definitely a longer track of work.
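To illustrate the behaviour discussed here, the following is a minimal Python sketch, not the CLI's actual implementation. It contrasts stopping at the first failing JSONL entry with continuing through the whole dataset while still reporting only the first error per entry. The names `first_error`, `validate_jsonl`, and `keep_going` are hypothetical, and `first_error` is a toy stand-in for a real JSON Schema validator.

```python
import json
from typing import Callable, Iterable, List, Optional, Tuple


def first_error(record: object) -> Optional[str]:
    """Toy stand-in for a real validator (assumption, not JSON Schema).

    Returns the first problem found in a record, or None if it is valid.
    """
    if not isinstance(record, dict):
        return "expected an object"
    if "id" not in record:
        return "missing required property 'id'"
    if not isinstance(record["id"], int):
        return "property 'id' must be an integer"
    return None


def validate_jsonl(
    lines: Iterable[str],
    check: Callable[[object], Optional[str]],
    keep_going: bool,
) -> List[Tuple[int, str]]:
    """Return (line number, first error) pairs for failing entries."""
    failures: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        error = check(json.loads(line))
        if error is not None:
            failures.append((number, error))
            if not keep_going:
                break  # fail-fast: stop at the first failing entry
    return failures
```

With `keep_going=True`, every failing entry shows up once in the result, so a caller can still derive a non-zero exit status from a non-empty list.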
If I remember correctly, ajv has this feature, at least as the option --all-errors. The main reason we are experimenting with jsonschema is its support for JSONL.
Yeah, sadly AJV is one of the worst ones in terms of JSON Schema standard compliance we know of (and pretty much abandoned at this point). See https://bowtie.report for the official ranking we maintain.
I'll take a note of it. In theory it is possible to keep going despite errors, but I think we would need to be careful with how we present errors and not spit out nonsensical stack traces. We'll see!
> The main reason we are experimenting with jsonschema is the support for jsonl.
Can you share more about the use case, out of curiosity?
Sure, I work for hbz, which provides digital infrastructure for scientific libraries in the state of North Rhine-Westphalia, Germany.
One of our services is lobid-resources, a search index of the libraries' union catalogue: https://lobid.org/resources
The index data is created by an ETL transformation from the library data format MARC21 to JSON-LD.
We also created a JSON Schema for our index data: https://github.com/hbz/lobid-resources/tree/master/src/test/resources/schemas
We already validate our individual test files (around 200) with ajv. But we also want to test larger portions of our index, e.g. current updates (several thousand up to a hundred thousand records) or the whole index (22 million records), as JSONL dump files. The search for a validator that supports JSONL led us to your project, jsonschema.
So far our tests with the update files look promising. Support for compressed JSONL files would be nice, too.
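Until something like this exists in the CLI, a compressed dump can be read record by record with a small helper. The sketch below assumes gzip compression and a `.gz` file extension, neither of which is confirmed in this thread, and `iter_jsonl` is a hypothetical name. It streams one record at a time, so the file is never loaded into memory as a whole.

```python
import gzip
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[object]:
    """Yield parsed records from a JSONL file, plain or gzip-compressed.

    Assumption: compressed files are gzip and end in ".gz".
    """
    opener = gzip.open if path.endswith(".gz") else open
    # Text mode decompresses and decodes on the fly, line by line.
    with opener(path, "rt", encoding="utf-8") as stream:
        for line in stream:
            if line.strip():  # ignore blank lines
                yield json.loads(line)
```

Another option, under the same assumption, is to decompress to a temporary plain `.jsonl` file and validate that file as usual.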
Support for reporting all errors would also be nice, since it would help us improve our transformation with regard to all errors. Our update files change daily, and if a record has multiple errors, we would spot the next error only when the record gets updated again.
On hbz, very nice! Let me know how we can help. You might find the schema linter interesting, and https://one.sourcemeta.com is a nice way to visualise and serve JSON Schema data models. Overall, we are trying hard to produce a next-level JSON Schema ecosystem, so any feedback you have is much appreciated.
> Maybe a support for compressed jsonl files would be nice.
That's probably not very hard to implement. Can you submit an issue? What kind of compression are you using?
> We already validate our single test files around 200 with ajv.
In general, we at the JSON Schema TSC advise against AJV given its compliance issues. We offer a test command in this project to set up JSON Schema unit tests. Would that be a match here?
> The support for reporting all errors would also be nice, since it would help us to improve our transformation with regard to all errors.
Right. Makes sense. Let me think more about it!
Fixes: #726
Signed-off-by: Juan Cruz Viotti jv@jviotti.com