Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regular expression string validation in Rholang #19

Open
fabcotech opened this issue Jun 2, 2020 · 3 comments
Open

Regular expression string validation in Rholang #19

fabcotech opened this issue Jun 2, 2020 · 3 comments
Labels
Priority-Low Technical Technical Proposal

Comments

@fabcotech
Copy link

fabcotech commented Jun 2, 2020

Introduction

We must have the possibility to validate strings in rholang.

This is a needed feature for Dappy main network (no urgent though).

rchain-names is a name system coded in rholang, Dappy uses this name system to manage name purchases/selling/renewals, just like for the domain name system.

Right now the rholang code names.rho checks only the String type, therefore anyone is able to register names like "aaa", "aaaa123", "aBC", "a?!..", "eeꀀ€", and maybe even non-latin characters "的的的".

So we must be able to validate strings in a regexp-like way.

Note: this proposal is dedicated to string validations and not string manipulation like .toLowerCase().

Danger

Regexp validation can be very much resource intensive, so if we provide a transparent rholang -> scala bridge, it will be a very dangerous situatin, moreover we do not have an easy way to price a regexp operation.

See https://snyk.io/blog/redos-and-catastrophic-backtracking/

Motivating Examples

Strings in rholang already have functions like .slice(0, 3). We cannot do a one-to-one bridge like the following "abc".test(/[a-z]/g)

So my proposition is a limited set of regexp patterns, and therefore validation capabilities much more limited than regexp, but for which a pricing mechanism is easier to setup.

"abc".test(["a-z"]) // true
"abcD".test(["a-z"]) // false
"abcD".test(["a-z", "A-Z"]) // true
"abcD8".test(["a-z", "A-Z"]) // false
"abcD8".test(["a-z", "A-Z", "0-9"]) // true
// from now the patterns that are not one of "a-z", "A-Z", "0-9" must be a litteral match
"abcD8?".test(["a-z", "A-Z", "0-9"]) // false
"abcD8?".test(["a-z", "A-Z", "0-9", "?"]) // true
// etc...

There is no way to validate that a string starts with x or ends with x. Or that a string respects a certain pattern followed by another pattern, just dead-simple string validation.

Design

Just like every changes in rholang, this implies a "hard-fork" and that all nodes upgrade. It could be triggered at a specific block height, independently of release number.

Drawbacks

We must be very careful https://snyk.io/blog/redos-and-catastrophic-backtracking/

Alternatives

"aa".containsLowercases() // true
"aa".containsNonAscii() // false
"aâ".containsNonAscii() // true
"aa".containsUppercases() // false
"aa".containsNumbers() // false
"aa12".containsNumbers() // true
"aa12".containsSpecialCharacters() // false
"aa12?".containsSpecialCharacters() // true
@9rb 9rb added the Technical Technical Proposal label Jun 4, 2020
@tgrospic tgrospic changed the title RChip 12 - regular expression string validation in rholang Regular expression string validation in Rholang Jun 4, 2020
@jimscarver
Copy link
Collaborator

jimscarver commented Jul 30, 2020

I suggest file pattern (glob) matching to avoid redos attacks

@9rb
Copy link
Collaborator

9rb commented Jul 30, 2020

According to @nzpr we need to consider how this interacts with cost accounting. i.e. can we cost account for this easily ?

@fabcotech
Copy link
Author

fabcotech commented Aug 4, 2020

I think it is not that difficult to account.

If we do it the first way "abcD".test(["a-z", "A-Z"]) then the cost could be [cost proportionnal to length] * [number of validations]. In this case their are 2 regexp validations to perform.

If we do it the second way "abcD".containsUppercases() then their is a cost proportionnal to string length each time the function is called.

How about that ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority-Low Technical Technical Proposal
Projects
None yet
Development

No branches or pull requests

3 participants