
Support --continue to validate all JSONL entries beyond the first error #727

Merged: jviotti merged 4 commits into main from jsonl-continue on Apr 23, 2026


@jviotti (Member) commented Apr 23, 2026

Fixes: #726
Signed-off-by: Juan Cruz Viotti jv@jviotti.com

jviotti added 2 commits April 23, 2026 09:41
…rror

Fixes: #726
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@jviotti jviotti marked this pull request as ready for review April 23, 2026 13:50
@augmentcode Bot commented Apr 23, 2026

🤖 Augment PR Summary

Summary: This PR adds a new --continue/-c flag to the jsonschema validate command to keep validating JSONL inputs after the first failing entry.

Changes:

  • Introduced a continue_on_error option and threaded it through the JSONL/streamed validation loop.
  • Adjusted multi-document (JSONL) control flow so failures no longer stop iteration when --continue is set.
  • Added spacing/separation in verbose and error outputs when reporting multiple JSONL failures.
  • Registered the new flag in the CLI option parser for the validate subcommand.
  • Updated validation documentation to describe the new behavior and provide an example.
  • Added new Unix shell tests covering continue behavior across normal, verbose, and JSON output modes.

Technical Notes: Exit code behavior remains consistent (expected-failure exit code when any entry fails), while JSONL reporting becomes "fail-fast" only when --continue is not provided.
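The fail-fast versus --continue control flow described above can be sketched roughly as follows. This is a minimal, hypothetical C++ sketch, not the code from src/main.cc: `validate_stream`, `validate_jsonl_text`, `StreamResult`, the skipping of blank lines, and the exit code value 2 are all assumptions made for illustration. Only the behaviour comes from the summary: stop at the first failing entry by default, keep going when `continue_on_error` is set, and fail the run if any entry fails.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical exit code for "at least one entry failed"; the real CLI may
// use a different value.
constexpr int kHypotheticalFailureExitCode = 2;

// Outcome of validating a JSONL stream (hypothetical type).
struct StreamResult {
  std::vector<std::size_t> failed_lines; // 1-based line numbers that failed
  int exit_code = 0;                     // stays 0 only if every entry passed
};

// Validates each non-empty line with `validate_entry`. Without
// `continue_on_error` the loop stops at the first failure (fail-fast);
// with it, every entry is checked. Either way, one failure fails the run.
StreamResult validate_stream(
    std::istream &input,
    const std::function<bool(const std::string &)> &validate_entry,
    const bool continue_on_error) {
  assert(validate_entry); // the caller must supply a validator
  StreamResult result;
  std::string line;
  std::size_t line_number = 0;
  while (std::getline(input, line)) {
    ++line_number;
    if (line.empty()) {
      continue; // assumption: blank lines are skipped
    }
    if (validate_entry(line)) {
      continue;
    }
    result.failed_lines.push_back(line_number);
    result.exit_code = kHypotheticalFailureExitCode;
    if (!continue_on_error) {
      break; // default behaviour: stop at the first failing entry
    }
  }
  return result;
}

// Convenience wrapper over an in-memory JSONL string.
StreamResult validate_jsonl_text(
    const std::string &jsonl,
    const std::function<bool(const std::string &)> &validate_entry,
    const bool continue_on_error) {
  std::istringstream input{jsonl};
  return validate_stream(input, validate_entry, continue_on_error);
}
```

For a stream whose second and fourth entries fail, this sketch reports only line 2 when `continue_on_error` is false and lines 2 and 4 when it is true; in both cases the run ends with the failure exit code. Each failing entry is recorded once, which mirrors the per-record behaviour discussed in the review thread.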


@augmentcode Bot left a comment: Review completed. 1 suggestion posted.

Comment thread src/main.cc
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@cubic-dev-ai Bot left a comment: No issues found across 12 files.

Comment thread docs/validate.markdown
```
jsonschema validate path/to/my/schema.json path/to/my/dataset.jsonl
```

### Validate a JSONL dataset reporting all failures

Does it report all failures, or only the first validation error of each failing record?

Member Author

I'll clarify. Only the first failure of each failing record is reported. The JSON Schema standard doesn't really describe how to proceed past the first validation error. It could maybe be done, but it might be very complex in certain cases. Definitely a longer track of work.


If I remember correctly, ajv has this feature, at least as the --all-errors option. The main reason we are experimenting with jsonschema is the support for jsonl.

Member Author

Yeah, sadly AJV is one of the worst implementations we know of in terms of JSON Schema standard compliance (and it is pretty much abandoned at this point). See https://bowtie.report for the official ranking we maintain.

I'll take note of it. In theory it is possible to keep going despite errors, but we would need to be careful with how we present them and avoid spitting out nonsensical stack traces. We'll see!

> The main reason we are experimenting with jsonschema is the support for jsonl.

Can you share more about the use case, out of curiosity?


Sure, I work for hbz, which provides digital infrastructure for academic libraries in the state of North Rhine-Westphalia, Germany.

One of our services is lobid-resources (https://lobid.org/resources), a search index of the libraries' union catalogue.

The index data is created by an ETL transformation from the library data format MARC21 to JSON-LD.

We also created a JSON Schema for our index data: https://github.com/hbz/lobid-resources/tree/master/src/test/resources/schemas

We already validate our individual test files (around 200) with ajv. But we also want to test larger portions of our index as JSONL dump files, e.g. the current updates (from several thousand up to a hundred thousand records) or the whole index (22 million records). The search for a validator that supports JSONL led us to your project, jsonschema.

So far our tests with the update files look promising. Support for compressed JSONL files might also be nice.


Support for reporting all errors would also be nice, since it would let us address all errors in our transformation at once. Our update files change daily, and if a record has multiple errors, we would only spot the next error once that record is updated again.

Member Author

Oh, hbz, very nice! Let me know how we can help. You might find the schema linter interesting, and https://one.sourcemeta.com is a nice way to visualise and serve JSON Schema data models. Overall, we are trying hard to build a next-level JSON Schema ecosystem, so any feedback you have is much appreciated.

> Support for compressed JSONL files might also be nice.

That's probably not very hard to implement. Can you submit an issue? What kind of compression are you using?

> We already validate our individual test files (around 200) with ajv.

In general, we at the JSON Schema TSC advise against AJV given its compliance issues. This project offers a test command to set up JSON Schema unit tests. Would that be a good fit here?

> Support for reporting all errors would also be nice, since it would let us address all errors in our transformation at once.

Right. Makes sense. Let me think more about it!

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@jviotti jviotti merged commit f18026d into main Apr 23, 2026
14 checks passed
@jviotti jviotti deleted the jsonl-continue branch April 23, 2026 14:34

Linked issue: Report all validation failures in a jsonl dataset

2 participants