Parentheses are legitimate characters in a URL. RFC 2396: «Data characters that are allowed in a URI ... "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"» Despite this, Mastodon treats the following link as ending at "background" instead of including the remaining `().html`: https://coolguy.website/writing/the-future-will-be-technical/background().html
I know that Markdown uses parentheses to enclose a URL, but this is a problematic idea. I think we should include parentheses in the link and exclude a closing parenthesis only if it is at the end of the string, or before whitespace, or before punctuation that is itself followed by whitespace or the end of the status.
Something like the following:

- `http://example.com/foo()` includes all
- `http://example.com/foo().html` includes all
- `http://example.com/foo().` excludes period
- `http://example.com/foo().html` includes all
- `(see http://example.com/foo())` excludes last parenthesis
- `(See http://example.com/foo().)` excludes last period and parenthesis
- `(see http://example.com/foo()).` excludes last parenthesis and period
- `(see http://example.com/foo().html).` excludes last parenthesis and period
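The cases above can be sketched in a few lines of Python. This is only an illustration of the proposed rule, not Mastodon's actual linkifier (which is written in Ruby). The names `trim_url`, `extract_urls`, `CANDIDATE` and `TRAILING_PUNCTUATION` are made up for the example, and the "strip a closing parenthesis only when it is unbalanced" check is my reading of what the examples imply rather than something stated outright.

```python
import re

# Assumption: these are the punctuation marks we never want at the end of a link.
TRAILING_PUNCTUATION = ".,;:!?"

# Assumption: a candidate link is any run of non-whitespace after the scheme.
CANDIDATE = re.compile(r"https?://\S+")


def trim_url(candidate: str) -> str:
    """Strip trailing punctuation and unbalanced closing parentheses."""
    url = candidate
    while url:
        last = url[-1]
        if last in TRAILING_PUNCTUATION:
            # A trailing period, comma, etc. belongs to the sentence, not the link.
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # More ")" than "(": the last one closes a parenthesis opened
            # outside the link, e.g. "(see http://example.com/foo())".
            url = url[:-1]
        else:
            break
    return url


def extract_urls(text: str) -> list[str]:
    """Return every trimmed link found in a status text."""
    return [trim_url(match.group(0)) for match in CANDIDATE.finditer(text)]
```

For example, `extract_urls("(see http://example.com/foo().html).")` keeps `http://example.com/foo().html` and drops the trailing `).`. Counting parentheses over the whole candidate is crude, but it is enough to cover every case in the list.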
I searched or browsed the repo’s other issues to ensure this is not a duplicate.
This bug happens on a tagged release and not on master (If you're a user, don't worry about this).
And the source code leads me to RFC 3986, which reduced the list of unreserved characters. But looking at the Path section, we see that a path consists of segments, and each segment consists of pchars, where `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`. Parentheses are part of sub-delims, so they are still allowed in a path.
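To double-check the grammar, here is a rough Python transcription of the `pchar` rule from RFC 3986. The variable and function names are mine, and it only validates a single path segment, nothing more.

```python
import re

# Character sets transcribed from RFC 3986 (sections 2.1 to 2.3 and 3.3).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="          # note: "(" and ")" are in here
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# segment = *pchar
SEGMENT = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment: str) -> bool:
    """True if the whole string is a valid RFC 3986 path segment."""
    return SEGMENT.fullmatch(segment) is not None
```

With this, `is_valid_segment("background().html")` is true, so the last segment of the link in the report is a perfectly valid path segment under RFC 3986 too.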
Oh well. Now that I look at the code that generates the regular expression, I feel that maybe this isn't worth it. Uuaaagh. In this particular case the match failed because there was nothing inside the balanced parentheses.
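A toy reproduction of that failure mode, in Python. These two patterns are hypothetical and heavily simplified. They are not the regular expression Mastodon generates. They only show how requiring at least one character inside balanced parentheses (`+`) cuts the link off at `background`, while allowing an empty pair (`*`) does not.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATH_CHARS = r"[A-Za-z0-9_\-./]"

# Balanced parentheses must contain at least one path character.
STRICT = re.compile(rf"https?://[^\s/]+(?:{PATH_CHARS}|\({PATH_CHARS}+\))*")

# Balanced parentheses may be empty.
LENIENT = re.compile(rf"https?://[^\s/]+(?:{PATH_CHARS}|\({PATH_CHARS}*\))*")

URL = "https://coolguy.website/writing/the-future-will-be-technical/background().html"


def matched_link(pattern: re.Pattern, text: str) -> str:
    """Return the part of the text that the pattern would turn into a link."""
    match = pattern.match(text)
    return match.group(0) if match else ""


strict_link = matched_link(STRICT, URL)    # stops right before "()"
lenient_link = matched_link(LENIENT, URL)  # covers the whole URL
```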