# RFC: Custom data validators in schemas #797
## Comments
Clean proposal! These validators are definitely useful!
Regex can introduce security issues (vulnerability to ReDoS attacks, 1, 2). I doubt this will have any practical impact. Our infra will not be impacted because every room is sandboxed. But maybe we should let the developer know of the risks? 🤔
I like the syntax!
Definitely rejecting. Schema validation should never modify the behavior of an incoming operation.
As discussed earlier, not sure the …
Great point, I forgot to mention that in the proposal. I've added the Precedence section.
It would avoid the potential name clashes. But I personally think it would look unnecessarily alien and unfamiliar to users. Example:

```
type Storage {
  name: string
  age: positiveint
  email: email
  list: LiveList<int[0..10] | iso8601>
}
```

I think using capital casing here is more common and familiar and communicates that these are "just types", except that they happen to be defined elsewhere, and not in this document. But I fully admit it's a matter of taste.
Great point, I've added a warning callout about it.
Thanks for confirming my hunch 🙏! I've removed the open question from the document and replaced it with this decision.
## Motivation
When designing a schema for a data format, it is often necessary to perform specific validations on the data, beyond what is possible with the basic `string` or `number` types. For example, a schema may need to validate that a given field is an integer, or that another field contains a valid email address. In such cases, it is common to define custom validation functions or use regular expressions to perform these validations.

However, defining custom validation functions can be time-consuming, error-prone, and may result in non-standard schemas. Using regular expressions can also be complex and may not cover all possible cases. Moreover, it can be challenging to share these validation functions or regexes across multiple schemas or projects.
To address these issues, this proposal suggests the addition of a selected set of globally available and well-known types that can be freely used in the schema language to perform more specific validations. These types could include commonly used types such as `Int`, `PositiveInt`, `Email`, and regexes.

## Proposal
This proposal suggests the addition of a set of built-in types that can be used in the schema language to perform more specific validations. These types would be globally available and well-known, and could include the following:

- `Positive`: a positive number (>= 0)
- `Int`: an integer number
- `PositiveInt`: a positive integer number
- `Email`: a valid email address
- `ISO8601`[^1]: a valid ISO-formatted date string

An example:
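A sketch of how these types could appear in a schema (field names are illustrative):

```
type Storage {
  name: string
  age: PositiveInt
  email: Email
  createdAt: ISO8601
}
```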
Users would be able to use those types as if they were built-in types, but they are part of a standard library of "pluggable types" that we would offer and document.
## Precedence
True built-ins like `string`, `number`, etc. cannot be overridden in the language, i.e. a document that redefines `type string { ... }` would be invalid.

But for these new data validation types, we may need to be a bit more flexible. It may be a bit of a contrived example, but suppose someone already has defined this schema:
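A hedged sketch of such a pre-existing schema, defining its own `Phone` object type (field names are illustrative):

```
type Phone {
  countryCode: string
  number: string
}

type Storage {
  contact: Phone
}
```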
And then at one point in the future we would introduce a `Phone` validator, for validating phone numbers. The question now is: how do we interpret this old, already-existing, and working schema? In this case, for this document, `Phone` should mean the locally-defined object type. But in a document without a `type Phone { ... }` definition, it would mean the custom data validator.

In other words: these data validators should be allowed to be redefined in a schema document, in which case you can no longer refer to the custom validator that the Liveblocks runtime provides.
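The lookup rule described above can be sketched as follows; the validator list and function names here are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of the precedence rule: a locally defined object
// type shadows a pluggable validator with the same name.
const pluggableValidators = new Set(["Int", "PositiveInt", "Email", "ISO8601", "Phone"]);

function resolveTypeName(
  name: string,
  localDefinitions: Set<string>,
): "local" | "validator" | "unknown" {
  if (localDefinitions.has(name)) return "local"; // local definitions win
  if (pluggableValidators.has(name)) return "validator";
  return "unknown";
}

console.log(resolveTypeName("Phone", new Set(["Phone"]))); // "local"
console.log(resolveTypeName("Phone", new Set()));          // "validator"
```

Because local definitions always shadow the pluggable validators, introducing a new validator later can never change the meaning of an existing, working schema.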
## Regex literals
A special case in the syntax of the language would be reserved for regexes, because those would have to be parameterized, so the proposal is to allow using regex literals directly in type positions, like in the `postalCode` field in the example.

What is the expected behavior if the regex anchors are not present? I.e. what is the expected behavior of `/\d{5}/` vs `/^\d{5}$/`? Is it expected that `/\d{5}/` will match `"abc123456xyz"`? Because it will in JavaScript.

## Allowing ranges for numeric types
A common case is to allow numeric values within certain ranges, i.e. a number between 0 and 100, or -100 to 100. It may be possible to allow such ranges on any numeric type, i.e. on `number`, `Int`, `Positive`, etc.

The proposal is to allow optional range specifiers, that look like:
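A sketch of what such range specifiers could look like, including on custom validators (field names are illustrative):

```
type Storage {
  percentage: number[0..100]
  temperature: number[-100..100]
  diceRoll: Int[1..6]
}
```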
The syntax is:

- `<minbound>` or `<maxbound>` must be provided, or both. The range `[..]` is not valid syntax. (That would be the default range.)
- `<minbound>` may be prefixed with a `>` sign to exclude the min bound from the range.
- `<maxbound>` may be prefixed with a `<` sign to exclude the max bound from the range.

Allowing this syntax for all numeric types has the benefit that it works even for custom data validators, like `Int` or `PositiveInt`.

Note: with this syntax, we don't need a specific type named `Positive` or `PositiveInt`. You could use `number[0..]` and `Int[0..]` respectively, which would be exactly the same.

## Reject, don't clamp
Suppose you have this schema:
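For instance (a minimal sketch):

```
type Storage {
  age: Int
}
```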
If a client does `root.set('age', 33.5)`, then we will reject this message. We will not automatically round numbers to the nearest integer, because there can be cases where automatically changing data would have pretty terrible consequences depending on the app.

## Ranges on string types
Range syntax can also be useful on strings, to express a minimum or maximum string length. The syntax is similar:
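A sketch of string length ranges (field names are illustrative):

```
type Storage {
  nickname: string[3..20]
  bio: string[..500]
}
```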
The difference with numeric ranges is that on string types, ranges must only use positive numbers, and there is no "exclusion" operator (i.e. you cannot do `string[>3..<8]`, only `string[4..7]`).

Like with numeric types, this syntax would automatically also work on all string-like types, although it's questionable how useful it would be:
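For example (a hedged sketch, assuming `Email` is string-like as proposed above):

```
type Storage {
  email: Email[..50]
}
```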
Not super useful for these cases, but… it would be a free side-benefit from this design ¯\_(ツ)_/¯
## Implementation details
An interesting implementation detail is in which part of the system to implement these types. Ultimately, a type like `Email` is a runtime validator, so intuitively it may belong to our private backend repo.

However, the language itself has to know if `Email` is going to be a string-like type or a number-like type. For the language (= parser + checker), it's important to know that `Email` is a string-like type, because it must be able to reject a union like this:

Similarly, it must know which types are number-like types:
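A sketch of the kinds of unions that would be rejected, assuming `Email` is string-like and `Int` is number-like (field names are illustrative):

```
type Storage {
  a: string | Email
  b: number | Int
}
```

Both unions would presumably be rejected because the two arms overlap: every valid `Email` value is also a valid `string`, and every valid `Int` is also a valid `number`, so the union arms cannot be told apart at runtime.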
It depends on the runtime implementation (in our private backend repo) what meaning is given to these global types, but the language will have to know upfront what sort of data those pluggable parts will produce.
Therefore, this proposal suggests making this configuration part of the `parse()` call, as an argument, passing the knowledge about these types down to the language.

With this configuration, the parser and type checker will have enough knowledge to correctly interpret the unknown global types that are potentially found in a schema text, while not having to know much else about them:
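A sketch of what that configuration argument could look like; the option and function names are illustrative assumptions, not the actual Liveblocks API:

```typescript
// Hypothetical: the parser receives, per pluggable type name, whether
// it behaves as a string-like or a number-like type.
type Kind = "string-like" | "number-like";

interface ParseOptions {
  // Maps each pluggable global type name to the kind of data it produces.
  pluggableTypes: Record<string, Kind>;
}

// Stub showing how the checker could classify an unknown global name.
function classify(name: string, options: ParseOptions): Kind | undefined {
  return options.pluggableTypes[name];
}

const options: ParseOptions = {
  pluggableTypes: {
    Int: "number-like",
    PositiveInt: "number-like",
    Email: "string-like",
    ISO8601: "string-like",
  },
};

console.log(classify("Email", options));   // "string-like"
console.log(classify("Unknown", options)); // undefined

// The real call might then look something like:
//   parse(schemaText, options)
```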
Take this schema text as an example:
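For instance (a sketch; `Foo` stands for a global name the schema does not define locally):

```
type Storage {
  x: Foo
}
```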
Then:
And:
But:
The parser will not return a `StringType` AST node for that `Foo` instance, but a `StringLikeType` AST node, which will carry the alias `"Foo"` as payload. The schema validation runtime then has all the knowledge to know how to interpret that type, and to build a decoder for it that performs the adequate validation.

## Footnotes
[^1]: Deliberately not using the `Date` type here, because it might suggest that JS `Date` instances would go to/from the server, which is not the case. These are strings that would be in the ISO8601 format.