
[Spike] Validation and type conversion at REST/Model/Database layer #1057

Closed
dhmlau opened this issue Feb 27, 2018 · 20 comments
@dhmlau
Member

dhmlau commented Feb 27, 2018

Timeboxed to 2 weeks

Overview

Validation

  • 1st tier: property/parameter
    • Validate against types and other constraints
    • For example for string type, other constraints could be regex, length, email format
  • 2nd tier: model
    • i.e. validation across multiple properties/parameters
    • For example for REST calls, validate:
      • query/path/headers/form parameters
      • request body - content-type e.g. application/json, application/xml
  • 3rd tier: model collection level
    • For example, to make sure email is unique in Customer model
  • 4th tier: across models/systems
    • For example, when placing order with customerId, need to check if customerId exists.
    • business logic needs to be involved in this kind of validation
    • Spike: evaluate the possibility, i.e. whether we need to open up an extension point for this purpose. E.g. in LB3, we have operation hooks, and our users use those for validation purposes.
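The 1st-tier constraints listed above (type, regex, length, email format) can be sketched as a small per-property check. This is purely illustrative; `StringConstraints` and `validateString` are hypothetical names, not LoopBack API:

```typescript
// Hypothetical sketch of tier-1 (property-level) validation: a string
// property is checked against its type plus extra constraints such as
// a regex pattern, a length bound, or an email format.
interface StringConstraints {
  pattern?: RegExp;
  maxLength?: number;
  format?: 'email';
}

function validateString(value: unknown, c: StringConstraints): string[] {
  const errors: string[] = [];
  if (typeof value !== 'string') return ['value is not a string'];
  if (c.pattern && !c.pattern.test(value)) errors.push('pattern mismatch');
  if (c.maxLength !== undefined && value.length > c.maxLength)
    errors.push('too long');
  // A deliberately simple email check, for illustration only.
  if (c.format === 'email' && !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value))
    errors.push('not an email');
  return errors;
}
```

The 2nd tier would then run checks like this across several properties of the same object at once.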

Type Conversion

  • How to deserialize (parse the incoming REST request into JavaScript values) and serialize (write JavaScript values back as the REST response)?
  • How to serialize and deserialize from JavaScript to database
    • Consider cases like storing a boolean value - some databases store it as yes/no.
  • How to serialize and deserialize between different database type
    • For example, Customer stored in MongoDB where customerId is stored using objectId. Order stored in MySQL which doesn't have this native type. How to store customer using objectId as PK, referencing that from order in MySQL which has to be a string.
  • How does it work with different schema language, e.g. swagger, openapi v3, graphQL, xml schema.
    • how to go from REST call to database call
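The boolean case mentioned above can be sketched as a pair of converters for a database that stores booleans as yes/no strings. The function names are hypothetical, not connector API:

```typescript
// Illustrative converters for a database that represents booleans
// as the strings 'yes'/'no' instead of a native boolean type.
function toDbBoolean(value: boolean): string {
  return value ? 'yes' : 'no';
}

function fromDbBoolean(raw: string): boolean {
  if (raw === 'yes') return true;
  if (raw === 'no') return false;
  throw new Error(`unexpected database value: ${raw}`);
}
```

A real connector would register converters like these per property type, so the model layer always sees a JavaScript boolean.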

Acceptance Criteria

  • What is the expected UX? May include pseudo code.
    • for application developers
    • for extension developers
  • High level design - slides/diagram to illustrate how things work
    • what we can leverage from LB3
  • How does it work in LB3 vs LB4? What should be improved and thrown away?
  • Feasibility
  • Complexity
  • Any open questions?

Questions to Answer

Validation

  • How much can we leverage the LB3 juggler?
    The goal is to leverage the legacy juggler as much as we can.
  • For 4th tier (across models) validation, evaluate the possibility.
    Do we need to open up extension for this purpose. e.g. in LB3, we have operational hooks that are used by users for additional validation.

Note: Due to the different aspects of this spike, the spike owner can go through the user experience and present it to the team first before going further.

References

One of the user scenarios in here: https://strongloop.com/strongblog/examples-of-validations-for-loopback-models/

cc @raymondfeng

@shimks
Contributor

shimks commented Apr 13, 2018

For validation, I was thinking of letting users provide JSON schemas to validate their controller parameters, and thought about using decorators for that purpose like this:

class Controller{
  createANumber(@validate(numberSchema) num: number) {
    return num;
  }
}

Unfortunately, I don't think it's possible to access the arguments from a parameter decorator, so it needs to be done in a method decorator. Then the question to answer is, do we want this feature to be available at the OpenAPI operation decorators, or should this feature live in its own method decorator?

In other words, how should validation be done from the following two options?:

@get() // incorporate into `openapi-v3`
createANumber(@validate(schema) num: number) {}
@validatable()
createANumber(@validate(schema) num: number) {}

For reference, OpenAPI spec does not seem to support useful fields that are present in JSON Schemas like pattern, so the information probably can't be directly integrated into the OpenAPI spec being generated.
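Since a parameter decorator cannot see the runtime arguments, the enforcement has to happen when the method is invoked. That mechanism can be sketched without decorator syntax as a higher-order wrapper; a method decorator would install the same wrapper on the property descriptor. All names here are illustrative:

```typescript
// A validating wrapper: validators are positional, one slot per
// parameter, and run before the original method is called.
type Validator = (arg: unknown) => void;

function withValidation<A extends unknown[], R>(
  validators: Array<Validator | undefined>,
  method: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    validators.forEach((v, i) => v && v(args[i]));
    return method(...args);
  };
}

const mustBeNumber: Validator = arg => {
  if (typeof arg !== 'number') throw new Error('expected a number');
};

// What @validate(numberSchema) on the first parameter would amount to.
const createANumber = withValidation([mustBeNumber], (num: number) => num);
```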


Additionally, a very rough PoC on type coercion is up on #1256. Please let me know if you like the approach or have any questions about it.

Thoughts @strongloop/lb-next-dev @raymondfeng @bajtos?

@b-admike
Contributor

@shimks IIUC we are talking about Tier 1 validation. I think your approach sounds reasonable, and I like the choice of JSON Schema representation for the validation. +1 for using method decorators for parameters; I'm not aware of any drawbacks to this. Based on what you said about the lack of support for regex patterns for types in the OAI spec https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md#data-types, I'd go with option 2, having a separate method decorator. Have you looked into LoopBack 3 and how it does this type of validation?

@shimks
Contributor

shimks commented Apr 16, 2018

@b-admike This is what we have for LoopBack 3 so far https://loopback.io/doc/en/lb2/Validating-model-data.html#using-validation-methods. Based on my short briefing with @raymondfeng, apparently this isn't good enough and we should be looking to replace the validation functions with something more robust.

@jannyHou
Contributor

jannyHou commented Apr 16, 2018

@shimks For the validation part, I think the job of parameter decorator and method decorator would be:

  • param decorator: register the validation function for a certain argument in Reflect metadata
  • method decorator: retrieve the validation functions and apply them on arguments

This is not related to the path decorator IMO, so let's keep it as a separate decorator. e.g.

@validate()
createANumber(@validateLength(schema) num: number) {}

And +1 for @b-admike 's comment

IIUC we are talking about Tier 1 validation.

Reading the Overview....is this spike for the whole validation Epic?

I may need more time to catch up on the progress of type coercion.
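The param/method decorator split described above can be sketched with a plain Map standing in for Reflect metadata. The registration step is what a param decorator like `@validateLength(schema)` would do, and the apply step is what a method decorator like `@validate()` would do at invocation time. All names are hypothetical:

```typescript
// Per-method registry of parameter validators, keyed by parameter index.
type Check = (value: unknown) => void;
const paramValidators = new Map<string, Map<number, Check>>();

// Param decorator's job: record a validator for one argument position.
function registerValidator(methodName: string, index: number, check: Check) {
  const byIndex = paramValidators.get(methodName) ?? new Map<number, Check>();
  byIndex.set(index, check);
  paramValidators.set(methodName, byIndex);
}

// Method decorator's job: retrieve the validators and apply them to
// the actual arguments before the original method runs.
function applyValidators(methodName: string, args: unknown[]) {
  const byIndex = paramValidators.get(methodName);
  if (!byIndex) return;
  byIndex.forEach((check, index) => check(args[index]));
}

registerValidator('createANumber', 0, v => {
  if (typeof v !== 'number') throw new Error('arg 0 must be a number');
});
```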

@jannyHou
Contributor

@shimks OpenAPI spec supports pattern in SchemaObject, see https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md#schema-object

And could you share more details like what's the usage of pattern in validation?

@shimks
Contributor

shimks commented Apr 16, 2018

@jannyHou I can't believe I missed that (pattern). pattern is essentially just regex for limiting whatever you're validating (see https://spacetelescope.github.io/understanding-json-schema/reference/regular_expressions.html#example).

I think since pattern exists for OpenAPI, it's possible to use OpenAPI schemas to do validation, but I feel it'd be harder to do individual validation of each property with only a part of the OpenAPI spec, so I think using JSON Schemas to do validation is the way to go.

@raymondfeng
Contributor

Here is how I see OpenAPI parameter validation working:

  1. How to provide the metadata for validation

    • Each parameter has a TypeScript/JavaScript type (from the decorator or inference)
    • The OpenAPI spec for parameter can specify its type and format (subtype)
    • We can either extend OpenAPI spec with more constraints or decorate parameters to supply such metadata
  2. How validation is enforced at runtime

    • The method decorator can return a proxy that does the validation before calling the original method
    • An OpenAPI parameter validation action to perform the checks
    • A parameter resolver to trigger validation for the params
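The first runtime option above (a proxy that validates before calling the original method) can be sketched with a JavaScript Proxy. The controller class and the validation rule here are stand-ins, not LoopBack API:

```typescript
// A controller whose method calls we want to guard.
class NumberController {
  createANumber(num: number) {
    return num;
  }
}

// Wrap a target object so that every method call validates its
// arguments before delegating to the original method.
function validatingProxy<T extends object>(target: T): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof value !== 'function') return value;
      return (...args: unknown[]) => {
        // Stand-in rule for illustration: every argument must be a number.
        for (const a of args) {
          if (typeof a !== 'number') throw new Error('invalid argument');
        }
        return value.apply(obj, args);
      };
    },
  });
}

const controller = validatingProxy(new NumberController());
```

This is the approach whose performance and troubleshooting implications are debated in the comments below.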

@bajtos
Member

bajtos commented Apr 17, 2018

As for Tier 1 validation, my understanding is that such validation rules will be declared at controller method level. In that case, I'd prefer to allow developers to use transport-specific decorators like the OpenAPI spec @param to define validation rules. If needed, we can extend the OpenAPI spec with our custom constraints, as mentioned by @raymondfeng.

Upsides:

  • It's easy to support design-first (top-down) approach, where validations are described directly in the OpenAPI spec/gRPC .proto document.
  • It's easy to generate API spec (e.g. OpenAPI spec or gRPC .proto documents) from the decorated controller methods, because the metadata used in decorators maps 1:1 to the spec.
  • Developers have full control over the API spec, it's easy to use any protocol-specific features.

Downsides:

  • When writing a server that provides multiple API protocols (REST/OpenAPI, gRPC, etc.), developers have to learn validation syntax of each protocol.

Regarding the enforcement of validation at runtime: as long as we are talking about validation of inputs for controller methods (or function handler based routes), I'd personally prefer to implement the validation in the REST layer, most likely as part of parseParams sequence action.

The method decorator can return a proxy that does the validation before calling the original method

While this is doable, I am concerned about the ramifications of generating proxy methods:

  • Performance impacts.
  • How easy it will be to troubleshoot proxied methods.
  • We may lose some flexibility in the controller method API - for example, can proxy methods support optional positional arguments?
  • How are we going to generate proxy methods in JavaScript codebase, where decorators are not available?

@shimks
Contributor

shimks commented Apr 17, 2018

Here's what I have down so far for tier 1 validation: 6b40a33. Please look at the test file for the use case.

The PoC does validation with JSON Schema, which I feel lines up better with our current support for the bottom-up approach. This way we'd also have access to popular validators such as AJV.

I do agree that when looking at validation from the top-down approach, it's much more streamlined to do validation with an OpenAPI schema. We'd just need to either use an OpenAPI schema validator, or use an OpenAPI schema -> JSON Schema converter and then a JSON Schema validator. This would take care of both tier 1 and tier 2 validations.

The PoC also shows validation done through a proxy method using decorators. (EDIT: I didn't know there was an actual Proxy object that could be used to do validation. I think the way it's done in my PoC so far serves a similar purpose though)

performance

I don't think using decorators will impact performance more than it already does since all we're doing is running validation on the given arguments and then passing them back to the original method.

troubleshooting

I also don't see problems with troubleshooting the proxied methods unless there are gaps in my knowledge.

flexibility

I may need to investigate more for positional arguments stuff.

JavaScript

I think for JavaScript codebases, we can just offer simple validation functions that the users can incorporate into their controller methods.
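The plain-function fallback suggested above for decorator-less codebases would look roughly like this; `validateRange` is a hypothetical helper, not a shipped API:

```typescript
// A simple reusable validation function that the user calls
// explicitly inside the controller method body.
function validateRange(value: number, min: number, max: number): void {
  if (value < min || value > max) {
    throw new Error(`value ${value} out of range [${min}, ${max}]`);
  }
}

class OrderController {
  setQuantity(quantity: number): number {
    validateRange(quantity, 1, 100); // explicit call instead of a decorator
    return quantity;
  }
}
```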

@shimks
Contributor

shimks commented Apr 18, 2018

I'm about to start investigation for tier 3 validation and was wondering if uniqueness is the only example of it? I've checked https://loopback.io/doc/en/lb2/Validating-model-data.html and uniqueness is the only one that seems to fall within this tier.

@bajtos
Member

bajtos commented Apr 19, 2018

An example that comes to my mind - I guess it would have to be implemented as a custom validator:

Let's say we have a Price model that has a property Currency of type string enum. Let's say we maintain the list of valid currencies in another table, where the currency name is NOT a primary key. In that case we may want to write a validator to ensure price.currency always contains one of the enum values listed in Currency table/model.

Maybe the example above is not a realistic one? Let's change it a bit:

The Price model remains the same. The currency table uses the currency name as the primary key. Conceptually, we want to say that "price.currency" is constrained to values (PKs) of "Currency" table. While this can be modeled by belongsTo relation and enforced at the database level, I think it would be great to have this kind of validation available at Tier 3 level. Few reasons for that:

  • We don't want to expose "belongsTo" relation API neither to JS code nor the REST API.
  • A Tier 3 constraint can be converted to SQL constraints by our automigrate/autoupdate layer
  • Not all databases support referential integrity, a LoopBack Tier3 validation could be a good solution for them. A prime example: our memory connector. This feature is even more important if we still want to claim that LoopBack applications can seamlessly switch between backing databases.

I did a bit of googling around SQL constraints and referential integrity and could not find any use cases beyond the two that we have already identified: uniqueness and referential integrity.
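The currency example above can be sketched as a tier-3 check: `price.currency` must match one of the primary keys of the Currency collection. The in-memory array stands in for a model collection; nothing here is juggler API:

```typescript
// Minimal stand-in models for the example in the comment above.
interface Currency { code: string; }
interface Price { amount: number; currency: string; }

// Tier-3 style check: validate one instance against the contents of
// another collection (here, the set of valid currency codes).
function validateCurrency(price: Price, currencies: Currency[]): boolean {
  return currencies.some(c => c.code === price.currency);
}

const currencies: Currency[] = [{code: 'USD'}, {code: 'EUR'}];
```

Databases with referential integrity would enforce this as a foreign-key constraint; a check like this is what LoopBack could run for backends (such as the memory connector) that cannot.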

@dhmlau dhmlau added the DP3 label Apr 19, 2018
@shimks
Contributor

shimks commented Apr 19, 2018

I think referential integrity can be accomplished at tier 2 (model) level as long as the schema provided to do the validation for the model contains all of the information. Do you think the spike should still investigate implementation for referential integrity at tier 3 validation?

For the case of validation of uniqueness, @raymondfeng and I have found that it should be possible to update the juggler definition to support uniqueness for the DBs that support the constraint. For DBs that don't support it, we think it's ok to defer the validation feature for uniqueness. If the users need it, I think they can use our tier 4 validation feature to do it themselves.

Thinking about these validations in our described 3rd tier, I feel that they can be accomplished at different tiers (2nd tier for referential integrity and 4th tier for non-native uniqueness support)

Thoughts @bajtos and @raymondfeng?

EDIT: we've found that support for UNIQUE constraint cannot be added in without access to autoupdate/automigrate.

@jannyHou
Contributor

jannyHou commented Apr 23, 2018

Hope I didn't misunderstand the whole spike: I think we want to apply validators when given the OpenAPI spec, not the other way around (generating the OpenAPI spec according to validation rules).

As for Tier1 validation, my understanding is that such validation rules will be declared at controller method level. In that case, I'd prefer to allow developers to use transport-specific decorators like OpenAPI spec @param to define validation rules.

@bajtos I still feel as the initial implementation, it would be good to have a general validator instead of an OpenAPI specific one. How about using composed decorator? e.g.

async createUser(
  @validateEmailFormat()
  @requestBody()
  newUser: User
): Promise<User>

The OpenAPI decorator gets called first, so the validator decorator can consume the information in the spec. It guarantees that spec generation and validation target the same parameter, and it is also easy to switch to other protocols. Moreover, we also have a chance to modify the spec in the validator.

I am a little confused with tier 3, what does model collection level mean? What would be the validation that tier 1 and 2 could not do and have to pass to tier 3?

@shimks
Contributor

shimks commented Apr 23, 2018

@jannyHou What we want to do is set a baseline for other potential APIs to do their validation on (or at least that was my approach to it). It should be possible to integrate @validate() with @param or @operation to have the validation automatically done.

@shimks
Contributor

shimks commented Apr 23, 2018

@raymondfeng @bajtos What kind of UX are we looking for in tier 4 validation? From what it sounds like, we need to be able to give users access to the arguments from the controller method to have them add in their own logic, but the easiest solution to this I feel like is for the users to just do this type of validation in the method itself, both for the users and for us. What am I missing?

Was the purpose to see if we can find a way to implement tools for extension devs to provide users with out-of-the-box validation functions (like uniqueness, presence of FK, etc.)?

@jannyHou
Contributor

Had a talk with @shimks regarding tiers 3 and 4. Taking email validation as an example, there could be two kinds of validation:

  • validate format
  • validate uniqueness (needs to call a backend API)

If we can pass the model and repository into the decorator function, then those two use cases are no different. Moreover, if we can access the related models and repositories in the decorator function, that's even better: tiers 3 and 4 can be merged into tiers 1 and 2, so we only need two tiers, property level and method level.

So the question would be: given a controller class constructor/prototype, can we get the context like its models and repositories? Or just pass them in as the inputs of decorator function?
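The idea above can be sketched like this: once a validator receives a repository, a format check and a uniqueness check have the same shape. The repository interface here is a minimal stand-in, not @loopback/repository:

```typescript
// Minimal stand-in for a repository the validator could receive.
interface UserRepositoryLike {
  findByEmail(email: string): Promise<{email: string} | undefined>;
}

// One validator covering both cases from the comment above:
// a local format check and a backend-calling uniqueness check.
async function validateUniqueEmail(
  email: string,
  repo: UserRepositoryLike,
): Promise<void> {
  if (!/^[^@\s]+@[^@\s]+$/.test(email)) throw new Error('bad email format');
  if (await repo.findByEmail(email)) throw new Error('email already taken');
}
```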

@bajtos
Member

bajtos commented Apr 26, 2018

For the case of validation of uniqueness, @raymondfeng and I have found that it should be possible to update the juggler definition to support uniqueness for the DBs that support the constraint. For DBs that don't support it, we think it's ok to defer the validation feature for uniqueness. If the users need it, I think they can use our tier 4 validation feature to do it themselves.

Thinking about these validations in our described 3rd tier, I feel that they can be accomplished at different tiers (2nd tier for referential integrity and 4th tier for non-native uniqueness support)

Sounds great 👍

Hope I didn't misunderstand the whole spike, I think we want to apply validator when given the OpenAPI spec, not the other way around, generate the OpenAPI spec according to validating rules.

I think we actually need both. Consider the case where a controller method is accepting a model data. In this case, most (if not all) validation rules are already described in LDL model definition. IMO, repository-json-schema should be able to take LDL/juggler validation rules and convert them to JSON Schema/OpenAPI spec.

Implementation wise, I agree the REST layer should apply validation based on the OpenAPI spec provided by individual endpoints (controller methods, route handlers, etc.)

@raymondfeng
Contributor

Some thoughts around the 4 tiers of validation:

Tier 1:

  • Per property of models
  • Per parameter of api method calls

Tier 2:

  • multiple properties within the same model instance
  • multiple parameters of the same api method calls

Tier 3:

  • multiple instances of the same model collection (class/store)

Tier 4:

  • custom validations that rely on business logic, with possible async access of external resources outside of LB4's control. Have the ability to apply validation logic at certain extension points such as model persistence and method calls.

Validations can be declarative (declaring constraints) and imperative (writing custom validator functions).

The overlapping between parameter value and model instance validation is interesting:

For example, CustomerController.create(customer) probably hints two types of validations here:

  1. Make sure customer parameter is valid. One use case is to disallow customer.id to be passed in during creation.
  2. Make sure customer is a valid instance of Customer model, such as email is required.
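The two overlapping checks described for CustomerController.create(customer) can be sketched together; the types and rules here are illustrative only:

```typescript
// Stand-in for the creation payload of a Customer model.
interface NewCustomer { id?: number; email?: string; name?: string; }

// (1) Parameter-level rule: id must not be supplied on creation.
// (2) Model-level rule: email is required on a valid Customer.
function validateNewCustomer(data: NewCustomer): string[] {
  const errors: string[] = [];
  if (data.id !== undefined) errors.push('id must not be set on create');
  if (!data.email) errors.push('email is required');
  return errors;
}
```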

@shimks
Contributor

shimks commented Apr 27, 2018

Spike findings

Coercion:

  • link to PoC PR [PoC - WIP] very rough PoC on request parameter type coercion/validation #1256 (ignore the commits on validation)
  • We should leverage types in @loopback/repository to do string/JSON coercion for us.
    • @loopback/types will act as a global typing system for serialization/deserialization/coercion
    • Coercion will be based on the parameter type of the OpenAPI spec that the route corresponds to
    • done in parseParams
  • Coerce according to Data types (NOTE: coercion logic in @loopback/types still needs to be verified)
    • integer/long/float/double -> number
    • string -> string
    • byte/binary -> Buffer
    • boolean -> boolean
    • date/date-time -> Date
    • password -> N/A
    • null -> null (still need to create a LoopBack type for this)
    • any -> any (don't coerce)
    • object and Reference Objects -> object
      • no Model validation should be done here
    • array -> if only one item, coerce accordingly, but don't coerce if the array can have more than one item; too ambiguous
// in parseParams
let params: any[];
// ... parameter parsing
const coercedParams: any[] = [];
for (const param of params) {
  const lbType = getLoopBackTypeBasedOnParamSpec(param, paramSpec);
  coercedParams.push(lbType.coerce(param, paramSpec));
}
return coercedParams;

Validation:

  • link to PoC PR [PoC] feat(validator): add in a JSON Schema validator decorator #1294
  • Use @validate (parameter level decorator) and @validatable (method level decorator) (names subject to change) to validate the parameters against these two options:
    • JSON Schema: for validation on queries/models alike using AJV
    • functions/async functions: for custom validation using logic; intended for making manual (asynchronous) back-end calls for validation
  • AJV can be used to do coercion as well, but it is not as flexible as we'd like it to be
    • for example, we cannot coerce a string into a Date type
  • can be integrated into openapi-v3 decorators by taking the spec information (manually input) from @operation or @param, converting it into a JSON Schema, and then calling the decorator functions
  • Weighing the approach of property descriptor manipulation through method-level decorator
    • pros:
      • logically clean; validation is done on the parameters as the method is being invoked
      • extension friendly; not tied down to rest, and any other potential future servers can use this
    • cons:
      • need to address edge cases like replacing the function's length property (the parameter is not a rest parameter with a length of 0)
      • with current implementation, all functions that are decorated with @validatable (and by extension @operation) now return promises due to needing to address the possibility of the validator being an asynchronous function. Can be fixed by iterating through the list of validators and checking if any of them return promises.
        • a controller method decorated with an asynchronous function will still return a promise however, no matter what the original method returned
      • no access to Context; validation would be done isolated
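The Date coercion noted above as something AJV cannot do can be sketched as a custom type converter. `coerceDateTime` is a hypothetical helper, not @loopback/types API:

```typescript
// Custom coercion step: turn an OpenAPI `string` value with format
// `date-time` into a JavaScript Date, rejecting unparseable input.
function coerceDateTime(raw: string): Date {
  const d = new Date(raw);
  if (isNaN(d.getTime())) {
    throw new Error(`cannot coerce "${raw}" to a Date`);
  }
  return d;
}
```

An extensible typing layer would let extension developers register converters like this for formats the built-in coercion does not cover.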

Miscellaneous suggestions from the spike

  • Validating uniqueness of a key should be done after a juggler revamp, so that we could leverage the existing validatesUniquenessOf
  • Referential Integrity should be done on the applicable relation decorators without explicitly tying them into the new relation engine

Proposed discussion points regarding the spike results

Validation:

  • decide on method-level decorator approach vs baked-into-router approach
  • decide on shortcut validator decorators if we want any (like @email)

Proposed issues to come out of the spike

  • Fully flesh out the coercion PR (if agreed to be the approach we want to take)
    • implement the http-js layer coercion
      • both deserialization and serialization
    • for queries/paths, string should be coerced into the right javascript runtime type
    • for JSON/xml, reject if given json value is not the right type, except for data types like Date
    • Make @loopback/types extensible so that extension developers can add in their own type converters
      • focus on the conversion case for Date as an extension example to flesh out @loopback/types
    • Come up with a controller that utilizes both geoservice and Date as a form of acceptance test
    • Find a way for users to provide the minimum metadata needed to determine the coercion type; what is the dev exp? What can we provide to the devs?
  • Polish and merge the @validate PR (if agreed to be the approach we want to take)
  • Add in full JSON Schema support for @model and @property

@shimks
Contributor

shimks commented May 7, 2018

Follow up issue for this spike has been created in #1306.

@shimks shimks closed this as completed May 7, 2018
@shimks shimks removed the in-progress label May 7, 2018
@shimks shimks added this to the April 2018 milestone May 14, 2018
@shimks shimks mentioned this issue May 29, 2018
@bajtos bajtos added the p1 label Aug 13, 2018