More than a million validation errors crashes the application. #199

Closed
khiftikhar opened this issue Oct 16, 2019 · 6 comments

Comments

@khiftikhar
Contributor

Hi,

We have an application that allows customers to send a 100-megabyte JSON document. In one case, a customer sent a file in which almost every field had validation errors. The validator gathers all the errors and reports them at the end, but in this case there were so many errors that the application ran out of memory and crashed. It would be nice to introduce a kind of fail-early behavior with a configurable threshold. Reaching the threshold would stop further processing and report the errors gathered up to that point.
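As an illustration of the threshold idea, here is a minimal sketch of a collector that stops accepting errors once a configurable limit is reached. The names `BoundedErrorCollector`, `BoundedValidationDemo`, and `maxErrors` are hypothetical and are not part of the library; the non-negative check stands in for real schema validation.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical collector that stops accepting errors once a threshold is reached. */
final class BoundedErrorCollector {
    private final int maxErrors;
    private final Set<String> errors = new LinkedHashSet<>();

    BoundedErrorCollector(int maxErrors) {
        if (maxErrors < 1) {
            throw new IllegalArgumentException("maxErrors must be >= 1");
        }
        this.maxErrors = maxErrors;
    }

    /** Records an error; returns false once the threshold has been reached. */
    boolean add(String message) {
        if (errors.size() < maxErrors) {
            errors.add(message);
        }
        return errors.size() < maxErrors;
    }

    Set<String> errors() {
        return Collections.unmodifiableSet(errors);
    }
}

final class BoundedValidationDemo {
    /** Stand-in validation: every value must be non-negative. */
    static Set<String> validate(int[] values, int maxErrors) {
        BoundedErrorCollector collector = new BoundedErrorCollector(maxErrors);
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0 && !collector.add("$[" + i + "]: must be >= 0")) {
                break; // threshold reached: stop processing the rest of the input
            }
        }
        return collector.errors();
    }
}
```

With this shape, memory use is bounded by `maxErrors` regardless of how many fields in the input are invalid.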

@stevehu
Contributor

stevehu commented Oct 17, 2019

@khiftikhar I think it is a good idea and I am a big fan of fail-fast. If we can make it configurable, I don't think anybody will complain about it. Would you like to submit a PR for this feature? Thanks.

@khiftikhar
Contributor Author

@stevehu Sure, I can start looking into it. Do you have any suggestions for where the code should go, or which classes would be good to start with? I was thinking of something like this:

final Set<ValidationMessage> validate = JsonSchemaFactory
            .builder()
            .objectMapper(objectMapper)
            .maxAllowedErrors(maxErrors)
            .build()
            .getSchema(schemaStream)
            .validate(rootNode);

@stevehu
Contributor

stevehu commented Oct 18, 2019

@khiftikhar I think it is a good starting point. Previously, I was thinking of returning on the first error, but making the maximum number of errors configurable might be better, as other users might have that requirement. It can also prevent the validator from crashing on too many errors. We need to set the default maxErrors to a number big enough that existing use cases won't be affected. How about 100?

From the processing efficiency perspective, it might be easier to handle one/all scenarios because we don't need to compare with the maxErrors in each validator.

@khiftikhar
Contributor Author

@stevehu What if a user expects more than 100 errors in one of their test cases? I was thinking of setting the default value to Integer/Long.MAX_VALUE, so that it doesn't introduce any breaking changes. Personally, though, I would like the default value to be <= 100.

> From the processing efficiency perspective, it might be easier to handle one/all scenarios because we don't need to compare with the maxErrors in each validator.

Regarding this, can you clarify what you mean? I was thinking that the validator should stop processing once the current error count has reached maxErrors. Can you give me an example of where you think the comparison should happen?

@stevehu
Contributor

stevehu commented Oct 18, 2019

I think it will definitely affect performance when there are more than 100 errors, though most use cases will have far fewer than 100. As you said, I would like a small number, but I am afraid of changing the current behavior.

What I mean regarding efficiency is that you have to compare the current number of errors in the error set with maxErrors each time a new error is added. With pure fail-fast, you return immediately when the first error is encountered.
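One way to get the "return immediately" behavior through nested validators without any counting is to abort with an exception. The sketch below assumes that approach; it is not something the thread settled on. `FirstErrorException` and `FailFastWalker` are hypothetical names, and the "every leaf must be a string" rule stands in for real schema checks.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical unchecked exception used to abort validation at the first error. */
final class FirstErrorException extends RuntimeException {
    FirstErrorException(String message) {
        // Suppression and stack trace are disabled to keep the throw cheap.
        super(message, null, false, false);
    }
}

final class FailFastWalker {
    /** Walks nested maps and lists; throws on the first leaf that is not a String. */
    static void check(Object node, String path) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                check(entry.getValue(), path + "." + entry.getKey());
            }
        } else if (node instanceof List) {
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                check(items.get(i), path + "[" + i + "]");
            }
        } else if (!(node instanceof String)) {
            throw new FirstErrorException(path + ": expected a string");
        }
    }

    /** Returns the first error message, or null when the document is valid. */
    static String firstError(Object root) {
        try {
            check(root, "$");
            return null;
        } catch (FirstErrorException e) {
            return e.getMessage();
        }
    }
}
```

No error set is maintained here, so there is nothing to compare against a limit; the exception unwinds every nested call at once.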

@khiftikhar
Contributor Author

@stevehu Performance is the main reason most people use this library in the first place, so I completely understand and acknowledge your concern. We can have fail-fast without maxErrors in the first version, i.e. processing stops when failFast() is configured and the first validation error is encountered.

final Set<ValidationMessage> validate = JsonSchemaFactory
            .builder()
            .objectMapper(objectMapper)
            .failFast()
            .build()
            .getSchema(schemaStream)
            .validate(rootNode);
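To show how an opt-in flag like the proposed `failFast()` could leave existing callers unaffected, here is a minimal self-contained sketch. `ValidatorSettings` and `RuleRunner` are hypothetical stand-ins, not the library's classes, and each rule is modeled as a function that returns null on success or an error message on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical immutable settings object; failFast is off unless requested. */
final class ValidatorSettings {
    final boolean failFast;

    private ValidatorSettings(boolean failFast) {
        this.failFast = failFast;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private boolean failFast = false; // default keeps the collect-all behavior

        Builder failFast() {
            this.failFast = true;
            return this;
        }

        ValidatorSettings build() {
            return new ValidatorSettings(failFast);
        }
    }
}

final class RuleRunner {
    /** Applies each rule in order; a rule returns null on success or an error message. */
    static List<String> run(List<Function<String, String>> rules,
                            String input,
                            ValidatorSettings settings) {
        List<String> errors = new ArrayList<>();
        for (Function<String, String> rule : rules) {
            String error = rule.apply(input);
            if (error != null) {
                errors.add(error);
                if (settings.failFast) {
                    return errors; // stop at the first failure
                }
            }
        }
        return errors;
    }
}
```

Because the flag defaults to false, a caller that never invokes `failFast()` still receives every error, which matches the no-breaking-change goal discussed above.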

khiftikhar added a commit to khiftikhar/json-schema-validator that referenced this issue Oct 24, 2019
khiftikhar added a commit to khiftikhar/json-schema-validator that referenced this issue Oct 24, 2019
khiftikhar added a commit to khiftikhar/json-schema-validator that referenced this issue Oct 27, 2019
khiftikhar added a commit to khiftikhar/json-schema-validator that referenced this issue Oct 27, 2019
stevehu added a commit that referenced this issue Oct 28, 2019
@stevehu stevehu closed this as completed Oct 28, 2019