Invalid matching with Cyrillic TLDs #32
Thanks for reporting this bug. It seems to happen with all non-ASCII TLDs; I was able to reproduce it with Chinese and Korean ones. We never had a test for the case where a TLD is directly followed by a space, so I think this never worked.
Thanks for the reply, @mvdan
The issue is that \b (the word boundary) isn't Unicode-aware, so it treats Chinese and other non-ASCII letters as non-word characters. I don't think there is a good fix here. The \b was used so that foo.com isn't matched inside foo.comgarbage; we might have to lose that feature, or restrict it to ASCII TLDs only.
Hi!
It seems like there is a problem with Cyrillic TLDs: if any character, even whitespace, follows a Cyrillic domain, it no longer matches.
I tried to fix the issue and suspect the cause is in the `\b` part of the pattern string, but I'm not sure. I tried using `|\b|\B` instead, but some tests failed. Thanks!