Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent autoLinkUrls from including ending punctuation in the URL #1947

Merged
merged 2 commits into from Feb 12, 2015

Conversation

nwsw
Copy link
Contributor

@nwsw nwsw commented Jan 24, 2015

The regular expression that implements autoLinkUrls includes an ending character match that is designed to prevent inclusion of ending punctuation in the URL. For some reason, this last character match has been short circuited by adding a final question mark meta-character, which renders it useless. This patch corrects this mistake, and adds the same utf8 Letter mode matching that is permitted before the final character in the URL.

It looks like the regex for ftp also suffers from this problem, but this update does not fix that one.

Signed-off-by: Noteworthy Software, Inc online@noteworthysoftware.com

The regular expression that implements autoLinkUrls includes an ending character match that is designed to prevent inclusion of ending punctuation in the URL. For some reason, this last character match has been short circuited by adding a final question mark meta-character, which renders it useless. This patch corrects this mistake, and adds the same utf8 Letter mode matching that is permitted before the final character in the URL.

It looks like the regex for ftp also suffers from this problem, but this update does not fix that one.

Signed-off-by: Noteworthy Software, Inc <online@noteworthysoftware.com>
@nwsw
Copy link
Contributor Author

nwsw commented Jan 24, 2015

It looks like this issue was originally created by an old @Spuds update: SimpleMachines/SMF@97cda43

There are others ways to fix the problems with autoLinkUrls. For example, final characters could be detected using something like (?<![\.)>"\']) at the end of the expression. If this is done, the ending character check in the existing expression should just be removed, since it no longer has the desired effect.

@emanuele45
Copy link
Contributor

Relevant discussion:
http://www.elkarte.net/community/index.php?topic=499.0
autolinking is not an easy thing, it has several edge cases and context-based cases and perception-based cases.
If the idea is to cover them all somehow, a regexp is not enough, it would require a function to catch most of them.

@nwsw
Copy link
Contributor Author

nwsw commented Jan 24, 2015

Thanks. I did not see that topic before creating this one. I generally view the autoLinkUrls mechanism as a casual courtesy, and don't think that edge cases should override basic user expectations.

Regardless, the existing character match [/\w\-_\~%@\?;=#}\\\\]? is broken.

@emanuele45
Copy link
Contributor

That's the main problem, is not "strictly speaking" broken, but it's not always doing what people would expect. 😜 Yes, I know I'm nitpicking. 👼

Out of curiosity, what the \p{L} does?

@nwsw
Copy link
Contributor Author

nwsw commented Jan 24, 2015

Actually, it is absolutely broken as currently written (or, at the least, it is dead code that does nothing but make the regex look more intimidating).

The \p{L} matches utf8 letters.

@emanuele45
Copy link
Contributor

ohh... okay, regular expressions are not really my "competence area". lol

@Spuds you are the one that changed it, what do you think?

@nwsw
Copy link
Contributor Author

nwsw commented Jan 28, 2015

Since this seems to be stagnating, I'll just add/repeat a final thought.

In my view, autoLinkUrls should not be greedy. Unusual URLs and edge cases should not be the focus, as these can always be linked by a BBC when (rarely) necessary. The behavior of autoLinkUrls should not force the use of BBC when engaging in the common practice of enclosing a URL in parentheses.

Perhaps a much simplified regex that just avoids the inclusion of common punctuation in the auto-linked URL is a better fix. However, the existing regex does have the benefit of standing the test of time (used in SMF 1.1 and 2.0).

@Spuds
Copy link
Contributor

Spuds commented Feb 4, 2015

Sorry I've not been able to chime in on this at all. I think its fine to revert the ? back out and go back to what was there. I think this part [/\p{L}\w could be change to [/\p{L}\d as well, I don't think other than digits there is anything \w is going to cover that \p{L} will not, plus the _ is already in there that \w would have covered.

@nwsw
Copy link
Contributor Author

nwsw commented Feb 4, 2015

Thanks for the comment. I'll review this based on your comment.

@emanuele45 emanuele45 added this to the 1.1 Beta milestone Feb 4, 2015
Signed-off-by: Noteworthy Software, Inc online@noteworthysoftware.com
@nwsw
Copy link
Contributor Author

nwsw commented Feb 11, 2015

I agree with your comment. I went ahead and used the Unicode equivalent \p{N} to match utf8 numbers, instead of using \d. I also reverted the ftp autolink, since it has essentially the same issue.

The Travis CI error is 'Your last search was less than 5 seconds ago' which I do not believe is related to this PR.

@Spuds
Copy link
Contributor

Spuds commented Feb 12, 2015

Yeah sometimes travis goes bump, restarted the job to clear it up. Thanks for checking in to what \p{N} covers ... looks good !

emanuele45 added a commit that referenced this pull request Feb 12, 2015
Prevent autoLinkUrls from including ending punctuation in the URL
@emanuele45 emanuele45 merged commit 3cdbd87 into elkarte:development Feb 12, 2015
@emanuele45
Copy link
Contributor

I guess it would be good to backport it to 1.0.3, right?

@emanuele45
Copy link
Contributor

The problem with travis at the moment is that elk.net doesn't like too many searches:

Your last search was less than 5 seconds ago. Please try again later.

in the Curl tests.

@Spuds
Copy link
Contributor

Spuds commented Feb 14, 2015

I guess it would be good to backport it to 1.0.3, right?

Yes

Your last search was less than 5 seconds ago

🍅

@nwsw nwsw deleted the autolinkfix branch February 15, 2015 05:05
@emanuele45 emanuele45 added the bug label Mar 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants