Fix regexp mutation p{Latin} #1234
Merged
* Previously, the default was to return the generic node mutator on lookup misses.
* This is fine for the main mutation registry, but not for the external regexp AST transformations.
* This change makes the default instance specific, so the transform registry can fail on unknown nodes rather than return the generic mutator, which hides the real issue.
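The idea of an instance-specific default can be sketched as follows. This is a minimal illustration, not the actual `Mutant::Registry` code: the `Registry` class below, its block-based constructor, the `RegistryError` name, and all registered keys and handler symbols are hypothetical and chosen only for the example.

```ruby
# Hypothetical sketch of a registry whose lookup-miss behavior is chosen
# per instance. Names here are illustrative, not Mutant's real API.
class Registry
  RegistryError = Class.new(StandardError)

  # The block is called with the missing key whenever a lookup misses.
  def initialize(&default)
    @default  = default
    @contents = {}
  end

  # Register a handler for a node type, rejecting duplicates.
  def register(type, handler)
    if @contents.key?(type)
      raise RegistryError, "Duplicate type registration: #{type.inspect}"
    end

    @contents[type] = handler
    self
  end

  # Return the registered handler, or defer to this instance's default.
  def lookup(type)
    @contents.fetch(type) { @default.call(type) }
  end
end

GENERIC = :generic_mutator

# Main registry: unknown node types fall back to a generic handler.
mutators = Registry.new { |_type| GENERIC }

# Transform registry: unknown node types fail loudly instead of being hidden.
transforms = Registry.new do |type|
  raise Registry::RegistryError, "No regexp transformation registered for: #{type.inspect}"
end

mutators.register(:send, :send_mutator)
transforms.register(:regexp_alternation_meta, :alternation_transform)
```

With this shape, `mutators.lookup(:unknown)` quietly returns the generic handler, while `transforms.lookup(:regexp_latin_property)` raises, which surfaces an incomplete mapping instead of masking it.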
dgollahon added a commit that referenced this pull request on Nov 7, 2021
- This is up for early review because I'm not sure about the dynamic creation of the table of Unicode properties. I tried just creating a list of them, but it was so slow for my editor to process that I couldn't even format the giant lookup table. I suspect that if we want to "bake" these to avoid however long it takes to compute the table, and maybe avoid any unexpected drift, it might make sense to dump them to YAML or something like that. I'm not sure of the best approach.
- I'm also guessing there's a better option than just dumping all the regexp node types into the other list of supported regexp nodes.
- We should probably do this for other regex types; we might be missing some of the POSIX classes, for instance (I have not checked yet).
- Prevents crashes when the source contains an unsupported property type.
- Related to #1234 (which was a very partial fix).
- Note that this turns our `\p{Latin}` formatting into `\p{latin}`. We could fix this with some very simple inflection, but I wanted to do the simplest approach first to demonstrate the problem, since this seems to be semantically equivalent. The Ruby docs use the uppercase form. I have a text file from the upstream regex toolkit that we could use to confirm inflection rules if we want to.
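The claim that `\p{Latin}` and `\p{latin}` are semantically equivalent can be spot-checked with a small script. This is only a sanity check over a handful of hand-picked sample characters (an assumption of this sketch), not a proof over all of Unicode, and it does not exercise mutant itself.

```ruby
# Compare the capitalized and lowercase spellings of the same property.
upper = /\p{Latin}/
lower = /\p{latin}/

# Hand-picked samples: Latin letters, a Cyrillic letter, a Greek letter,
# a digit, and a space. Escapes avoid depending on the file's encoding.
samples = ["a", "Z", "\u00E9", "\u044F", "\u03BB", "1", " "]

upper_matches = samples.select { |char| upper.match?(char) }
lower_matches = samples.select { |char| lower.match?(char) }

# Both spellings should select exactly the same samples.
puts upper_matches == lower_matches
```

If the two lists ever differed, the downcased output would change the meaning of the pattern and the inflection fix mentioned above would become necessary rather than cosmetic.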
Fixes the `p{Latin}` regexp constructs that used to crash mutant, as reported in #1231. Also refactors the `Mutant::Registry` class to allow use-case-specific default behavior, making it easier to address incomplete mappings in the future.