Fix invalid regex in unconstrained arrays for json_schema.py #919
regexes = [
    to_regex(resolver, t, whitespace_pattern) for t in legal_types
]
return rf"\[{whitespace_pattern}({'|'.join(regexes)})(,{whitespace_pattern}({'|'.join(regexes)})){num_repeats}{allow_empty}{whitespace_pattern}\]"
@lapp0 - Right now the array must have at least one element. I think a pair of parentheses is missing: we want the entire contents of the array to be optional, not just the elements that start with a comma,
i.e.
rf"\[({whitespace_pattern}({'|'.join(regexes)})(,{whitespace_pattern}({'|'.join(regexes)})){num_repeats}){allow_empty}{whitespace_pattern}\]"
Note the opening parenthesis after rf"\[ and the closing parenthesis before {allow_empty}.
Wdyt?
I've refactored a lot of this code so we can do constrained YAML. It's a lot more liberal in its use of parentheses. Do you think this fixes the problem?
def format_object_with_additional_properties(self, value_pattern, min_properties=None, max_properties=None):
    if max_properties == 0:
        return rf"\{{{self.ws}\}}"
    inner = self._regex_repeat_elem(
        elem_pattern=f"{STRING}{self.ws}:{self.ws}{value_pattern}",
        separator_pattern=f"{self.ws},{self.ws}",
        min_elem=min_properties,
        max_elem=max_properties,
        pad=self.ws
    )
    return rf'{{{inner}}}'

def _regex_repeat_elem(self, elem_pattern, separator_pattern, min_elem=None, max_elem=None, pad=""):
    """Creates a pattern allowing between min_elem and max_elem occurrences of elem_pattern"""
    if max_elem == 0:
        return ""
    base_pattern = f"{elem_pattern}"
    suffix_pattern = f"(?:{separator_pattern}{elem_pattern})"
    min_suffix_repeats = "" if min_elem is None else max(0, int(min_elem) - 1)
    max_suffix_repeats = "" if max_elem is None else max_elem - 1
    if max_suffix_repeats == 0:
        pattern = base_pattern
    else:
        pattern = f"{base_pattern}({suffix_pattern}){{{min_suffix_repeats},{max_suffix_repeats}}}"
    padded_pattern = f"({pad}{pattern}{pad})"
    if not min_elem:
        return f"({padded_pattern}|{pad})"
    else:
        return padded_pattern
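To see how a helper of this shape treats the empty case, here is a minimal standalone sketch. It adapts the repeat logic above into a free function; the names repeat_elem, ELEM and array_regex are hypothetical, whitespace handling is left out, and non-capturing groups are used throughout. It is an illustration under those assumptions, not the refactored code itself.

```python
import re

def repeat_elem(elem, sep, min_elem=None, max_elem=None, pad=""):
    """Standalone adaptation of the repeat helper quoted above (hypothetical name)."""
    if max_elem == 0:
        return ""
    # One element is always emitted first, so the suffix bounds are shifted by one.
    lo = "" if min_elem is None else max(0, int(min_elem) - 1)
    hi = "" if max_elem is None else max_elem - 1
    if hi == 0:
        pattern = elem
    else:
        pattern = f"{elem}(?:{sep}{elem}){{{lo},{hi}}}"
    padded = f"(?:{pad}{pattern}{pad})"
    # With no minimum (None or 0), the whole element list becomes optional.
    return padded if min_elem else f"(?:{padded}|{pad})"

ELEM = r"(?:true|false)"  # assumed element pattern, for illustration only

def array_regex(min_items=None, max_items=None):
    """Wrap the repeated elements in escaped square brackets."""
    return re.compile(r"\[" + repeat_elem(ELEM, ",", min_items, max_items) + r"\]")

optional = array_regex(0, 3)   # empty array allowed, at most three items
required = array_regex(2, 3)   # between two and three items

print(bool(optional.fullmatch("[]")))            # empty list is accepted
print(bool(required.fullmatch("[]")))            # empty list is rejected
print(bool(required.fullmatch("[true,false]")))  # two items are accepted
```

Because the alternation wraps the entire padded element list, the empty array is accepted whenever no minimum is set, which is the behaviour the extra parentheses in the review suggestion are meant to achieve.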
Sorry, I didn't follow. I just meant that right now the regex is this:
rf"\[{whitespace_pattern}({'|'.join(regexes)})(,{whitespace_pattern}({'|'.join(regexes)})){num_repeats}{allow_empty}{whitespace_pattern}\]"
and I think it should be this:
rf"\[({whitespace_pattern}({'|'.join(regexes)})(,{whitespace_pattern}({'|'.join(regexes)})){num_repeats}){allow_empty}{whitespace_pattern}\]"
For example, it's [(true|false)(,(true|false)){0,}?] vs [((true|false)(,(true|false)){0,})?]. The former won't allow [], whereas the latter will.
Does this make sense?
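The difference between the two patterns can be checked directly with Python's re module. This is a minimal sketch: it assumes boolean elements only, an empty whitespace pattern, and escaped square brackets; the names ELEM, current and proposed are illustrative and not taken from the PR.

```python
import re

ELEM = r"(true|false)"  # assumed element pattern, for illustration only

# Current form: the trailing "?" applies only to the ",elem" repetition,
# so the first element is still mandatory.
current = re.compile(rf"\[{ELEM}(,{ELEM}){{0,}}?\]")

# Proposed form: the extra parentheses make the whole element list optional.
proposed = re.compile(rf"\[({ELEM}(,{ELEM}){{0,}})?\]")

for text in ["[]", "[true]", "[true,false]"]:
    print(text, bool(current.fullmatch(text)), bool(proposed.fullmatch(text)))
```

Both patterns accept non-empty arrays; only the proposed one accepts the empty array.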
('{"a": 1, "b": null}', True),
('{"a": {"z": {"g": 4}}, "b": null}', True),
("1234", False),  # not an object
('["a", "a"]', False),  # not an array
@lapp0 are we missing a test case for arrays inside an object?
Thanks for the comments! It's important that we create patterns that follow the JSON Schema spec's expectations. I'll ensure the two conditions you've referenced are tested in the refactor here: lapp0#36
Fixes #913
Enables complete version of PR #914 which allows arrays
Problems
Unconstrained arrays (no items types specified) would always result in an illegal regex
Change
depth key to prevent infinite recursion