The current implementation of re.LOCALE support for Unicode strings is nonsensical. It works correctly only with Latin-1 locales, because the Unicode string is interpreted as if it were Latin-1-decoded bytes, and all characters outside the UCS1 range are treated as non-word characters. With other locales it gives strange and useless results.
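To make the problem concrete, here is a minimal Python sketch that approximates the old word-character test for a single character. It assumes, as described above, that a code point below 256 is treated as a byte value and that anything above 0xFF is always a non-word character. The helper `legacy_is_word` is hypothetical, and decoding the byte with the locale's charset is only a stand-in for C `isalnum()` in a single-byte locale, not the actual `_sre` code.

```python
def legacy_is_word(ch: str, locale_encoding: str) -> bool:
    """Hypothetical approximation of the old re.LOCALE word test for a str char.

    Assumption: the code point is reinterpreted as a byte value (as if the
    string were Latin-1-decoded bytes); code points above 0xFF never match.
    """
    cp = ord(ch)
    if cp > 0xFF:
        return False  # outside the UCS1 range: always a non-word character
    if ch == "_":
        return True
    # Stand-in for C isalnum() in a single-byte locale: decode the byte with
    # the locale's charset and classify the resulting character.
    decoded = bytes([cp]).decode(locale_encoding, errors="replace")
    return decoded.isalnum()
```

With a Latin-1 charset the result matches what a user expects ("é" is a word character, "×" is not). With KOI8-R, however, the Cyrillic letter "а" (U+0430) is rejected, while the multiplication sign "×" (U+00D7) is accepted, because byte 0xD7 is "в" in KOI8-R.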
The proposed patch fixes re.LOCALE support for Unicode strings by using the wide-character equivalents of the C character-handling functions (towlower(), iswalpha(), etc.).
The problem is that these functions do not exist in C89; they were added by the 1995 amendment to C89 (C95) and are part of C99. GCC supports them, but other compilers still need to be checked. These functions are, however, already used on FreeBSD and MacOS.
I don't think we should fix this in 2.x: some people may rely on the old behaviour, and a change would be difficult for them to debug.
In 3.x, I simply propose we deprecate re.LOCALE for unicode strings and make it a no-op.
Here is a simple patch that only deprecates use of the re.LOCALE flag with str patterns. It also deprecates combining the re.LOCALE flag with the re.ASCII flag (for bytes patterns) and adds some re.LOCALE-related tests.
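The two deprecated flag combinations can be sketched as a simple check. This is a hypothetical helper, `check_locale_flags`, not the code from the patch: it only emits a `DeprecationWarning` and returns the flags unchanged, so existing behaviour is preserved.

```python
import re
import warnings


def check_locale_flags(pattern, flags):
    """Hypothetical helper: warn on the re.LOCALE uses being deprecated.

    Flags are returned unchanged; only a DeprecationWarning is emitted.
    """
    if isinstance(pattern, str) and flags & re.LOCALE:
        # re.LOCALE with a str pattern
        warnings.warn("re.LOCALE flag with a str pattern is deprecated",
                      DeprecationWarning, stacklevel=2)
    elif isinstance(pattern, bytes) and flags & re.LOCALE and flags & re.ASCII:
        # re.LOCALE combined with re.ASCII for a bytes pattern
        warnings.warn("re.LOCALE and re.ASCII flags together are deprecated",
                      DeprecationWarning, stacklevel=2)
    return flags
```

Plain `re.LOCALE` with a bytes pattern stays silent, since that is the case where byte-oriented C locale classification still makes sense.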