Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return also errors when calling Ajv.validate (for stateless usage) #256

Closed
Clemens85 opened this issue Aug 1, 2016 · 11 comments
Closed

Comments

@Clemens85
Copy link

Clemens85 commented Aug 1, 2016

In the current version all errors of a validation are not returned, but are saved in the ajv instance, thus if I want to access them I need something like that:

ajv.addSchema(schema, 'mySchema'); var valid = ajv.validate('mySchema', data); var errors = ajv.errors; // All validation errors are written into this property

This is of course no problem in a normal javascript environment. We have however a quite advanced scenario where we call AJV in our java backend and we have only one AJV instance (that we initialize on server startup) for all requests. As far as I can tell this works, except for retrieving the specific error properties which are always written to the used AJV instance and is hence not stateless.

Now my question: Is it possible to create e.g. a new method (also for not breaking the existing API) like Ajv.validateWithErrorResult(...) that return a result object like e.g. following:
{ valid: false, errors: { .... } }

Hopefully this may be only a change in API (?)

@whitfin
Copy link

whitfin commented Aug 1, 2016

Gigantic +1 for this, I don't understand the logic in attaching errors to an instance when the point of ajv is that it caches schemas, making it perfect for being used as a singleton.

I have an application which may perform thousands of validations per second and it's unrealistic to expect to never hit a race condition in which ajv.errors has not changed since the validation call.

@epoberezkin
Copy link
Member

Thanks for the issue. It's similar to #242.

JavaScript is always single threaded so you won't have any race conditions in asynchronous environment if you use the errors in the same execution block - see the referenced issue.
You also can create your own api wrapper, which is a common practice anyway as it often combines schema retrieval, or use json-schema-consolidate if an additional function call is not a concern. I don't think an additional method doing the same validation should be added to ajv.

It's also better to use compiled validation function rather than accessing it via ajv.validate method.

@Clemens85
Copy link
Author

Thanks for your quick reply. You are of course right when speaking about JavaScript.
We have however a very special polygot case, where we call the JavaScript from a multithreaded environment (we use Java which provides its own JavaScript engine ("nashorn") and thus we can invoke JS functions from a running Java-Thread). This is of course sort of a mismatch (single-threaded environment called by a multi-threaded environment). I think however that by doing so it is indeed possible that JS functions are called concurrently (I am however not 100% sure).
I thought that it might be anyway "cleaner" to return all validation result immedietaly and not writing it in the ajv (or compiled) instance variable. The current API design makes the error handling even in plain javascript a little bit more complicated when dealing with promises due to the error state in AJV. If you would just return it, it would be more "clean" for a caller.
But maybe there are implementation-specific reasons why the error-state needs to be written back to the ajv instance, but if this not the case and it's only the API design I would again challenge this API design decision.

Please correct me if I am wrong, but the compiled validation function has exactly the same behaviour in which the error result is written back to the compiled validation function instance.
From my understanding the compiled validation functions improve performance, but when calling ajv.compile('schema') several times, I get always the same compiled validation function instance.

@epoberezkin
Copy link
Member

epoberezkin commented Aug 3, 2016

we call the JavaScript from a multithreaded environment (we use Java which provides its own JavaScript engine ("nashorn") and thus we can invoke JS functions from a running Java-Thread). This is of course sort of a mismatch (single-threaded environment called by a multi-threaded environment). I think however that by doing so it is indeed possible that JS functions are called concurrently (I am however not 100% sure).

Interesting... If the same JavaScript function (the same instance of function) is indeed used from multiple threads (not sure if that is possible) then changing Ajv instance API may not be enough to solve the problem. I would suggest checking if the validation function you access is the same instance (by marking it with some unique property e.g.). Even if it's the same function, it's worth checking if they are called concurrently. I would still expect there should be some locking mechanism in this case, that would prevent the same instance of function to be concurrently executed from multiple threads. Could you please check it?

I thought that it might be anyway "cleaner" to return all validation result immedietaly and not writing it in the ajv (or compiled) instance variable. The current API design makes the error handling even in plain javascript a little bit more complicated when dealing with promises due to the error state in AJV. If you would just return it, it would be more "clean" for a caller.

There are both historical reasons (other fast validators have the same api) and performance reasons not to return an object. Given that convenience is a subjective thing and also that the majority of users end up having their custom wrapper anyway (to retrieve schemas, to process validation errors, etc.) I am not convinced that convenience is strong enough reason to change or to extend the api.

By the way, are you aware that ajv supports asynchronous validation where you can add your own custom keywords (or formats) having some async calls in them? Even without custom keywords, if you add $async: true keyword to the schema root the validation function will return promise that either resolves with true or rejects with an Error instance that has errors property.

From my understanding the compiled validation functions improve performance, but when calling ajv.compile('schema') several times, I get always the same compiled validation function instance.

Correct, compiled function is cached using stable-stringified schema as a key. When you use ajv.validate you also use the same function under the hood. What I meant is that by calling a compiled function directly you avoid extra function call.

@Clemens85
Copy link
Author

Sorry for my very late reply now, I have been on vacation and somehow I also forgot this issue ;-)
But anyway here are my remarks:

There are both historical reasons (other fast validators have the same api) and performance reasons not to return an object. Given that convenience is a subjective thing and also that the majority of users end up having their custom wrapper anyway (to retrieve schemas, to process validation errors, etc.) I am not convinced that convenience is strong enough reason to change or to extend the api.

We have also our own wrapper around AJV, so of course the convenience thing matters just inside the wrapper implementation ;-) Anyway it was more about being stateless vs having a state in general, independently of the single-thread model of javascript. To be honest, I never thought that returning a simple object vs returning a boolean would be a performance bottleneck (I thought this wouldn't even be measurable after all).
But I understand your concerns regarding an API change.

Interesting... If the same JavaScript function (the same instance of function) is indeed used from multiple threads (not sure if that is possible) then changing Ajv instance API may not be enough to solve the problem. I would suggest checking if the validation function you access is the same instance (by marking it with some unique property e.g.). Even if it's the same function, it's worth checking if they are called concurrently. I would still expect there should be some locking mechanism in this case, that would prevent the same instance of function to be concurrently executed from multiple threads. Could you please check it?

I know this is a little bit strange due to we have a very special scenario.
Even by marking such an instance with a unique property, it is really quite hard to reproduce such a scenario with mutliple threads accessing the same instance concurrently.
I link instead to an article about the whole issue: https://blogs.oracle.com/nashorn/entry/nashorn_multi_threading_and_mt
It clearly states that there exist no locking mechanism or something like that, so it might definitely happen that a javascript instance is accessed concurrently from our java environment, thus they strongly recommend to not share objects after all (or at least if doing so, having the objects stateless).

But at the end you are right, this might be more a general problem due to as soon we use an object with a state we cannot hold one object between several threads.
We solved the issue for now by creating a pool in our environment of such javascript-engine-objects with each engine associated an own AJV instance. So evey thread gets its own exclusive AJV instance from the pool. This works, but adds of course also a bit extra complexity to our server-side application...

@epoberezkin
Copy link
Member

Thank you, I will read.
The difference in performance was quite measurable, I don't remember the exact difference though...
Changing this api would be a much bigger pain than you think; it is used internally as well as externally and if the same objects/functions are indeed concurrently accessed and executed in nashorn from the multiple threads there may be race conditions - simply adding API wrapper would not solve the potential problem as internal functions would still have the state.
It would be interesting to find out if this is the case :)...

@epoberezkin
Copy link
Member

I've read the post. Using JavaScript with JVM seems to be fun :) Using the pool of objects seems to be the best approach, it will work in general case. Given that JS is inherently single threaded any library, even with a stateless api, could have some state under the hood, it would be too difficult to find out whether it's the case. Using pool of single threaded resources solves this issue in any case.

@Bessonov
Copy link

Bessonov commented Nov 1, 2019

@epoberezkin from my point of view it's a bad design to have something global if there is no reason to have it global. Furthermore, to assume, that JavaScript is always single threaded, is just a speculation. You can see, that there are runtimes, which are not single threaded. I'm fine with argumentation, that you don't want to spend time to make your library resistent against race condition, but still, such design bothers me and makes the usage counterintuitive and error prone.

@epoberezkin
Copy link
Member

Furthermore, to assume, that JavaScript is always single threaded, is just a speculation

This is not a speculation - this is a standard for JavaScript implementations: https://www.ecma-international.org/ecma-262/10.0/index.html#sec-agents

An object can only be accessed by one thread; even though there may be multiple threads in the environments, but they cannot access the same objects. So no normal objects are shared between the treads.

that you don't want to spend time to make your library resistent against race condition

I believe it is resilient to race conditions as is - it follows from JS spec. Happy to be proven otherwise by a specific code sample that demonstrates race condition you are talking about in at least one JavaScript environment that complies with JS spec.

such design bothers me and makes the usage counterintuitive and error prone.

It may be the case, and it can be addressed by a single wrapper around the library to minimise the chances of any errors. Maintaining backward compatibility and avoiding unnecessary API changes takes priority over individual preferences to the API design.

@adriano-tirloni
Copy link

@epoberezkin would you mind providing an example of such wrapper?
I'm trying to use the same cached instance instead of compiling again on each validation call, is it possible?
or the way to go is to compile the schema inside myWrapperValidate in order to get the expected result in the below snippet? [1,2,3] (unordered)

import Ajv from 'ajv'

const ajv = new Ajv({ verbose: true })
ajv.addSchema({$id: 'test', type: 'object', properties: { number: {type: 'number', maximum: -1} }})

async function myWrapperValidate(data, sleepTime) {
  //Get cached compiled validator
  let compiledSchemaValidator = ajv.getSchema('test')
  
  //validate data
  compiledSchemaValidator(data)

  //simulate some delay
  await new Promise(r => setTimeout(r, sleepTime));
  
  //get error data information
  return compiledSchemaValidator.errors[0].data
}

let returned = await Promise.all([
  myWrapperValidate({number: 1}, 300), 
  myWrapperValidate({number: 2}, 200),
  myWrapperValidate({number: 3}, 100)
])

// returned => [3,3,3]

@adriano-tirloni
Copy link

Regarding this question, it is subtle and counter intuitive: Using async/await blurs evens more the boundary of execution block:

Async/await with errors and validate on different execution block.
image

Using thenables it gets more clear:
image

I would wrap it in something like this before using:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

6 participants
@Bessonov @Clemens85 @epoberezkin @whitfin @adriano-tirloni and others