
Improve schema compilation performance for schemas with large strings (review regular expressions) #557

Open
cristianstaicu opened this issue Sep 5, 2017 · 4 comments


@cristianstaicu

commented Sep 5, 2017

The following regular expression used in compiling the JSON-schema is vulnerable to ReDoS:

 /if\s*\([^)]+\)\s*\{\s*\}(?!\s*else)/g

The slowdown is moderate: for an input of 40,000 characters, matching takes around 4 seconds. However, I would still suggest one of the following:

  • remove the regex,
  • anchor the regex,
  • limit the number of characters that can be matched by the repetition,
  • limit the input size.

If needed, I can provide an actual example showing the slowdown.
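For reference, a minimal standalone sketch of the slowdown (my own construction, not taken from ajv; the adversarial string and the sizes are illustrative):

```javascript
// The regex under discussion. On "if(if(...x)))" input, every "if("
// offers a fresh start position, and [^)]+ then scans ahead from each
// one, so total work grows roughly quadratically with input length.
const re = /if\s*\([^)]+\)\s*\{\s*\}(?!\s*else)/g;

function adversarial(n) {
  return 'if('.repeat(n) + 'x' + ')'.repeat(n);
}

for (const n of [2000, 4000, 8000]) {
  const input = adversarial(n);
  const start = Date.now();
  re.lastIndex = 0;
  re.test(input); // never matches: there is no "{" in the input
  console.log('n=' + n + ': ' + (Date.now() - start) + ' ms');
}
```

Doubling `n` should roughly quadruple the reported time, which is the polynomial-backtracking signature.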

@epoberezkin

Owner

commented Sep 5, 2017

This particular regular expression is only used during schema compilation. The schema is not supplied by users, so the regex runs only once (assuming you compile your schemas once). Also, the generated code never contains more than one consecutive space, so the actual match will never be that slow.
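To illustrate what a regex of this shape is for (a hypothetical sketch, not ajv's actual source; the helper name and inputs are invented): it strips empty `if (...) {}` blocks out of generated code while leaving `if`/`else` pairs alone, which is what the `(?!\s*else)` lookahead guards:

```javascript
// Hypothetical cleanup pass over generated validator code.
const stripEmptyIf = (code) =>
  code.replace(/if\s*\([^)]+\)\s*\{\s*\}(?!\s*else)/g, '');

console.log(stripEmptyIf('if (errs > 0) {} return true;'));
console.log(stripEmptyIf('if (x) {} else { y(); }')); // untouched: else follows
```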

Could you demonstrate the actual schema that is very slow to compile because of this regular expression (i.e. when removing it would make schema compilation noticeably faster)?

@epoberezkin epoberezkin added the question label Sep 5, 2017

@cristianstaicu

Author

commented Sep 5, 2017

function genstr(len, chr) {
    var result = "";
    for (var i = 0; i < len; i++) { // declare i; "<=" would produce len + 1 copies
        result += chr;
    }
    return result;
}

var Ajv = require('ajv');
var ajv = new Ajv();

var validate = ajv.compile({
    "type": "object",
    "properties": {
        "foo": {
            "type": "string",
            "oneOf": [
                { "pattern": genstr(12000, "if(") + "x" + genstr(12000, ")") }
            ]
        }
    }
});

This code blocks the main Node.js event loop on my PC for 5 seconds. But if the schema is indeed never under the user's control, I agree this is not a problem. My only concern is that I am not sure whether the 850 modules that depend on this package are aware of that assumption.
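For what it's worth, the "limit the repetition" suggestion from the list above could look like this (the 2048 bound is an arbitrary illustrative value, not a recommendation from ajv):

```javascript
// Bounding [^)]+ caps the scan done from each start position, so the
// worst case becomes O(n * bound) instead of roughly O(n^2).
const boundedRe = /if\s*\([^)]{1,2048}\)\s*\{\s*\}(?!\s*else)/g;

const attack = 'if('.repeat(12000) + 'x' + ')'.repeat(12000);
console.log(boundedRe.test(attack)); // → false, and quickly
```

The bound only needs to exceed the longest condition the generated code can legitimately contain for the cleanup to keep working.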

@coderextreme


commented Sep 5, 2017

Thank god my schemas are not provided by users. And I was just thinking about generating a schema based on data in the JSON document (profile, component). Naughty users.

More about my use case: I have certain objects that only appear under certain profiles (an enumeration) and components (an enumeration plus an integer level). The combinatorics are such that generating all possible schemas is impractical. It is acceptable to accept a Full profile and check against the entire schema, but that won't tell you whether a particular profile or component is required for a given object; some objects valid under a Full profile probably shouldn't pass under a lesser one. Does something like this sound possible in draft-06, or even draft-04? I know I can use oneOfs for profiles, but components that can be removed from documents at will, possibly invalidating some objects, seem much harder to validate. This is the equivalent of providing different views of a schema, I think.
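If I understand the profile idea correctly, one hedged draft-06 sketch (all names here — profile, Core, Full, extras — are invented for illustration) gates the allowed properties on the profile value inside oneOf branches:

```javascript
// Each oneOf branch pins the profile value and the property set allowed
// under it. draft-06 boolean schemas let a branch forbid a property
// outright with `false`.
var profileSchema = {
  type: 'object',
  required: ['profile'],
  oneOf: [
    {
      properties: {
        profile: { enum: ['Core'] },
        extras: false             // not permitted in the Core profile
      }
    },
    {
      properties: {
        profile: { enum: ['Full'] },
        extras: { type: 'array' } // permitted (and typed) in Full
      }
    }
  ]
};
```

Components with integer levels could follow the same shape with per-branch `enum`/`maximum` constraints, though the combinatorics described above would still multiply the branches.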

@epoberezkin

Owner

commented Sep 5, 2017

@cristianstaicu thank you

The compilation time in this particular case does not change if I remove the regex you mention (and a couple of others), so it needs further investigation. It may be that some other regular expressions used during schema compilation are responsible.

@epoberezkin epoberezkin changed the title Vulnerable Regular Expression Improve schema compilation performance (review regular expressions) Sep 5, 2017

@epoberezkin epoberezkin changed the title Improve schema compilation performance (review regular expressions) Improve schema compilation performance for schemas with large strings (review regular expressions) Sep 5, 2017

@epoberezkin epoberezkin removed the question label Sep 30, 2017
