-
-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
avoid Relaxed from matching trailing TLDs without a word break #49
Comments
Indeed, this is a known problem, and I attempted to fix it a few months ago but didn't actually finish it. I'll take another look. |
mvdan
changed the title
False positive on text url parsing when sentences are connected without leading space.
avoid Relaxed from matching trailing TLDs without a word break
May 19, 2021
Should be fixed in master now. |
gopherbot
pushed a commit
to golang/tools
that referenced
this issue
Mar 4, 2022
Update to v2.4.0, which brings optimizations and minor improvements. Upstream fixed the issue with word breaks in v2.3.0, released back in July 2021. For more, see: mvdan/xurls#49 In short, ASCII TLDs now require a word break in Relaxed again. Note that non-ASCII TLDs don't require one, as that just breaks them. So, in that sense, this is making non-ASCII TLDs work in gopls again. Plus, we avoid wasting CPU cycles at init time by rebuilding the regexp. Change-Id: Iddb34827b61594fa1f911143f1d7e2c67a6658c9 Reviewed-on: https://go-review.googlesource.com/c/tools/+/389814 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Robert Findley <rfindley@google.com> gopls-CI: kokoro <noreply+kokoro@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
When leveraging
rxRelaxed.FindAllString
to find all urls in SMS text, we found out that when sentences were connected without a blank after a dot it might mislead the regex and be detected as a legitimate url.For example: a SMS text like
ups has informed us that your equipment has been delivered.The ups tracking number is 1zraxxxxxxx to track the status of your delivery click here hxxps://abc.com/yz3e4a2p|70994
Reproduction steps
What did you expect to happen?
Relax method works as strict method, only one result comes out
What actually happened?
[delivered.th https://abc.com/yz3e4a2p] for relax method
[hxxps://abc.com/yz3e4a2p] for strict method
Environment
The text was updated successfully, but these errors were encountered: