Implement thinning consistently #2

thatch · 2014-05-04T22:30:10Z

When the user passes in a charset, it's currently only used for dot. I'm expanding this to use the intersection between the charset passed in and categories like \w, \s, and \d as well, but I don't intend to apply it to literals.

Should it apply to character classes? I'm not sure.

For some, like [^\w] it's pretty clear it should (once Unicode support lands), but others like [a-z_] are already fairly limited.
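To make the intersection idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `thin_category` is a hypothetical helper name, and it assumes a category (or a negated class like [^\w]) can be thinned by keeping only the characters of the user's charset that the escape matches.

```python
import re


def thin_category(category, charset):
    """Return the characters of `charset` matched by `category`.

    `category` is a single-character regex such as r"\w", r"\d", r"\s"
    or a class like r"[^\w]". Hypothetical helper, for illustration only.
    """
    matcher = re.compile(category)
    # Keep charset order; a character survives only if the category matches it.
    return [ch for ch in charset if matcher.fullmatch(ch)]


if __name__ == "__main__":
    charset = "a_ 1-"
    print(thin_category(r"\w", charset))     # ['a', '_', '1']
    print(thin_category(r"[^\w]", charset))  # [' ', '-']
```

Under this reading, [^\w] thins naturally because the charset bounds what "everything else" means, while a literal never consults the charset at all.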


jayvdb · 2020-09-22T11:13:51Z

#27 takes a stab at this by allowing the caller to provide different subsets for each category, which if used sanely allows control over expansion of \w, \d and \s.

This helps a little towards re.UNICODE, as it brings charset= into the internals in a way that can be described and reasoned about when the categories have multiple possible values.

It doesn't address explicit subsets like [a-z_]. I think there is more pain than joy in using charset= as a way to reduce the result space in cases like this; it could only be used if the input regex is quite predictable to the developer.
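As a rough illustration of per-category subsets, the sketch below lets a caller override \w, \d and \s individually. The names `DEFAULT_CATEGORIES` and `resolve_categories` are made up for this example and are not taken from #27; the validation step is one possible reading of "used sanely", namely that an override may only contain characters the category actually matches.

```python
import re
import string

# Assumed ASCII defaults for each category (illustrative, not the library's).
DEFAULT_CATEGORIES = {
    r"\d": string.digits,
    r"\w": string.ascii_letters + string.digits + "_",
    r"\s": " \t\n\r\f\v",
}


def resolve_categories(overrides=None):
    """Merge caller-supplied subsets over the default category contents."""
    resolved = dict(DEFAULT_CATEGORIES)
    for category, subset in (overrides or {}).items():
        if category not in resolved:
            raise KeyError("unknown category: %r" % category)
        # Reject characters that the category itself would never match.
        bad = [ch for ch in subset if not re.fullmatch(category, ch)]
        if bad:
            raise ValueError("%r not in category %s" % (bad, category))
        resolved[category] = subset
    return resolved


if __name__ == "__main__":
    cats = resolve_categories({r"\d": "01", r"\w": "abc"})
    print(cats[r"\d"], cats[r"\w"])
```

Explicit classes like [a-z_] are deliberately left out of this sketch, matching the comment above: they are not categories, so no override applies to them.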

This was referenced May 4, 2014

Support flags=re.UNICODE #3

Open

Support flags = re.IGNORECASE #4

Open

jayvdb linked a pull request Sep 22, 2020 that will close this issue

WIP: Allow overriding all categories #27

Open

jayvdb mentioned this issue Sep 23, 2020

Random values #29

Open
