Parenthesis and URLs #6121

kensanata · 2017-12-28T12:47:11Z

Parentheses are legitimate characters in a URL. RFC 2396: «Data characters that are allowed in a URI ... "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"» Despite this, Mastodon treats the following link as ending at "background" and leaves out the remaining ().html: https://coolguy.website/writing/the-future-will-be-technical/background().html
I know that Markdown uses parentheses to enclose a URL, but this is a problematic idea. I think we should include parentheses in the URL and exclude a closing parenthesis only if it is at the end of the string, before whitespace, or before punctuation that is followed by whitespace or the end of the status.
Something like the following:
http://example.com/foo() includes all
http://example.com/foo().html includes all
http://example.com/foo(). excludes period
(see http://example.com/foo()) excludes last parenthesis
(See http://example.com/foo().) excludes last period and parenthesis
(see http://example.com/foo()). excludes last parenthesis and period
(see http://example.com/foo().html). excludes last parenthesis and period
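
The examples above can be reproduced with a small sketch. This is not Mastodon's or twitter-text's actual code. It assumes one reading of the proposal: grab everything up to the next whitespace, then peel off trailing punctuation, and drop a trailing ")" only when the URL has more closing than opening parentheses. The names `trim_url`, `extract_urls`, and `TRAILING_PUNCT` are made up for illustration.

```python
import re

# Hypothetical candidate matcher: scheme plus everything up to whitespace.
CANDIDATE = re.compile(r"https?://\S+")

# Assumption: these characters count as sentence punctuation at the end.
TRAILING_PUNCT = ".,;:!?"


def trim_url(candidate):
    """Strip trailing punctuation and unmatched closing parentheses."""
    url = candidate
    while url:
        last = url[-1]
        if last in TRAILING_PUNCT:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # More ")" than "(": the last one belongs to the surrounding text.
            url = url[:-1]
        else:
            break
    return url


def extract_urls(text):
    """Return every trimmed URL found in text."""
    return [trim_url(m.group(0)) for m in CANDIDATE.finditer(text)]


if __name__ == "__main__":
    for status in [
        "http://example.com/foo() includes all",
        "http://example.com/foo(). excludes period",
        "(see http://example.com/foo().html). excludes both",
    ]:
        print(extract_urls(status))
```

Counting parentheses instead of extending the regular expression keeps the rule readable, at the cost of a second pass over each candidate.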

I searched or browsed the repo’s other issues to ensure this is not a duplicate.
This bug happens on a tagged release and not on master (If you're a user, don't worry about this).


kensanata · 2017-12-28T13:07:17Z

I guess this should be an issue for twitter-text according to what I saw elsewhere. Something is wrong about these lines.

kensanata · 2017-12-28T13:20:12Z

And the source code leads me to RFC 3986, which reduced the list of unreserved characters. But looking at the Path section, we see that a path consists of segments, which consist of pchars, where pchar = unreserved / pct-encoded / sub-delims / ":" / "@", and parentheses are part of sub-delims.
Oh well. Now that I look at the code that generates the regular expression, I feel that maybe this isn't worth it. Uuaaagh. In this particular case it failed because there was nothing inside the balanced parentheses.
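
To make the grammar concrete, here is a rough sketch of the RFC 3986 `pchar` rule as a Python regular expression. The character sets are copied from the RFC's definitions of `unreserved` and `sub-delims`. The helper name `is_valid_segment` is invented for this example and is not part of twitter-text or Mastodon.

```python
import re

# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = r"A-Za-z0-9\-._~"
# RFC 3986: sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
SUB_DELIMS = r"!$&'()*+,;="

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"

# segment = *pchar
SEGMENT = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment):
    """Check whether a single path segment matches the RFC 3986 grammar."""
    return SEGMENT.fullmatch(segment) is not None


if __name__ == "__main__":
    print(is_valid_segment("background().html"))  # parentheses are sub-delims
    print(is_valid_segment("foo bar"))            # a raw space is not a pchar
```

So the segment `background().html` is valid under the newer RFC as well, even though the parentheses moved from the unreserved set to sub-delims.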

kensanata closed this as completed Dec 28, 2017

GeopJr mentioned this issue Apr 1, 2024

[Bug]: Does not break hyperlinks based on URI-illegal special characters (like apostrophes) GeopJr/Tuba#883

Closed
