Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URLs with parentheses are not clickable #1196

Closed
pragma- opened this issue Jan 9, 2023 · 33 comments
Closed

URLs with parentheses are not clickable #1196

pragma- opened this issue Jan 9, 2023 · 33 comments

Comments

@pragma-
Copy link

pragma- commented Jan 9, 2023

Hello again!

Thanks for the work on OpeningMod to support clickable URLs. It's been working wonderfully!

I've been noticing a minor annoyance with URLs that contain parentheses, notably Wikipedia URLs. Would it be possible to include parentheses in the URL regex? Perhaps even an option to expose the regex as a configurable item if you're willing to go that far.

@mintty
Copy link
Owner

mintty commented Jan 9, 2023

URL selection follows the same pattern as word selection, so this request is a bit tricky; I don't think you'd like to select "abc (def)" as one word with double-click. You can adapt the character set with option WordChars for experimenting.

@pragma-
Copy link
Author

pragma- commented Jan 10, 2023

I don't think you'd like to select "abc (def)"

Rather than "abc (def)" it's actually "abc_(def)" with an underscore. But the URL selection stops at the first opening parenthesis. I think if a parenthesis follows an underscore or non-whitespace character, the URL selection ought to consider those parentheses as part of the URL.

Looking at the documentation for WordChars, it says "By default, this string setting is empty, in which case double-click word selection uses the default algorithm that is geared towards picking out file names and URLs." Since parentheses are valid characters in URLs and can be valid in filenames too if quoted, then parentheses that follow an underscore or non-whitespace characters ought to be included as part of the word.

In the meantime, I've set WordChars=[a-zA-Z0-9()_:/?#%&=.] which seems to work in my limited experimentation, but does not feel ideal.

@pragma-
Copy link
Author

pragma- commented Jan 10, 2023

Can you provide an example command or text file to reproduce the issue?

An example URL would be https://en.wikipedia.org/wiki/Class_(set_theory)

@mintty
Copy link
Owner

mintty commented Jan 10, 2023

It wasn't actually too tricky to distinguish URL and word selection. I've uploaded a tweak that allows parentheses (also square and curly) in URLs.

@pragma-
Copy link
Author

pragma- commented Jan 10, 2023

Thank you!

Please do not include <> in URLs -- these characters are often used to wrap URLs to identify them as URLs in some text files and IRC channels, i.e. <https://en.wikipedia.org/wiki/Class_(set_theory)>.

It seems that https://www.rfc-editor.org/rfc/rfc3986#section-2 means that ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/%?#[]@!$&'()*+,;= are the characters that are valid in a URL, according to a quick Internet search.

@pragma-
Copy link
Author

pragma- commented Jan 10, 2023

It would be ideal if the URL-specific pattern could ignore or discard trailing ?!.,;: from a URL. These are the punctuation characters that often follow a URL in text or conversation that 99.9% of the time oughtn't be included as clickable. :-)

@mintty
Copy link
Owner

mintty commented Jan 10, 2023

<> are not included, before or after the patch.
! must be included, like ?&=, as they add parameters to a URL which are an essential part.

@mintty
Copy link
Owner

mintty commented Jan 10, 2023

OK, you say trailing ! etc, again a bit tricky though.

@pragma-
Copy link
Author

pragma- commented Jan 10, 2023

I mentioned <> because the comment in the commit asked whether we should consider them.

Ignoring or stripping certain trailing punctuation would be very helpful. The specific subset I listed, ?!.,;: I think could be safely ignored or stripped from the end of 99.99% of URLs without any issue. Please consider the following argument.

It is extremely, extremely rare to see one of ?!.,;: as the final characters of a URL. Within a URL, yes, but almost never ever as the trailing characters, if ever. On the other hand, in text files and chats people frequently often forget to put a space between the URL and conversational-punctuation like ?!.,;:. I think the lesser annoyance, given the rarity of ?!.,;: as the final characters of a URL and the frequency of them being used in text files and chats, would be to ignore or strip these characters from the end of URLs.

@pragma-
Copy link
Author

pragma- commented Jan 26, 2023

I've stumbled up on a related issue.

Given a URL like https://metacpan.org/pod/Thread::Csp, if I mouse-over the URL the whole thing underlines. But if I click the URL before the :: part, it opens https://metacpan.org/pod/Thread.

If I click on the URL after the :: (i.e. clicking on on Csp) then it opens the full URL to https://metacpan.org/pod/Thread::Csp.

@mintty
Copy link
Owner

mintty commented Jan 30, 2023

If I hover or click before ::, only the prefix is underlined or opened.
If I hover or click behind ::, the whole URL is underlined or opened, consistently.

@pragma-
Copy link
Author

pragma- commented Jan 30, 2023

I suspect it is the WordChars=[a-zA-Z0-9()_:/?#%&=.] I had configured in an attempt to have URLs with () be clickable. I've removed this line and now I have the same behavior with :: as you.

@goyalyashpal
Copy link

hey @pragma- , what's your use case for this?

depending on use case, the urls may also be following by closing parenthesis, which will require even more tuning to get right. but thankfully, the situation i am thinking of for that doesn't generally have to deal with terminals

so, i am interested to know what is your use case for urls in terminal?

@pragma-
Copy link
Author

pragma- commented Feb 1, 2023

@yashpalgoyal1304 Mostly IRC. But also manpages, READMEs, comments in source files, and more.

@pragma-
Copy link
Author

pragma- commented Feb 1, 2023

If the URL is followed by a closing parenthesis but does not contain an opening parenthesis, then the closing parenthesis ought to be discarded/ignored, I think? I personally have never ever seen a URL that has a ) in it at the end without a ( within the URL.

@pragma-
Copy link
Author

pragma- commented Feb 1, 2023

Perhaps instead of trying to hardcode a URL parser it may make sense instead to use a lightweight regexp library and expose a URL regex option so users can configure their own regexes for matching URLs, as many popular terminal emulators do?

@goyalyashpal
Copy link

without a ( within the URL.

no, i meant like this: this was discussed at that issue (https://example.org) - here, the trailing closing parenthesis doesn't belong to URL. Is it already lexed as intended?

@pragma-
Copy link
Author

pragma- commented Feb 8, 2023

Another URL with weird hover/click behavior:

https://encrypted-tbn0.gstatic.com/images?q=tbn:therestoftheurl&usqp=CAU

If I mouse-over the part before the : in tbn: it underlines https://encrypted-tbn0.gstatic.com/images?q=tbn and if I mouse over after the tbn: it underlines tbn:therestoftheurl&usqp=CAU. I can't click this URL to open it properly.

Screen recording: https://i.imgur.com/jYdsyAv.mp4

@mintty
Copy link
Owner

mintty commented Feb 8, 2023

(Such issues can easily be reproduced, screen recording not needed.)
That's likely because ":" separates the protocol prefix of a URL. I am aware that URL identification is very heuristic. There are border cases, however, that make it hardly conceivable that all cases can reasonably be caught, e.g. a URL with a (...) section vs a URL completely put into parentheses (see above).
To open any problematic link, you can always simply select it with the mouse, then right-click and choose "Open".

@pragma-
Copy link
Author

pragma- commented Feb 8, 2023

I addressed how to easily disambiguate a URL with a (...) section vs a URL wrapped in parentheses already. Disappointing.

@pragma- pragma- closed this as not planned Won't fix, can't repro, duplicate, stale Feb 8, 2023
@goyalyashpal
Copy link

just sharing for reference:

@mintty
Copy link
Owner

mintty commented Feb 9, 2023

I addressed how to easily disambiguate a URL

If it's so easy, you are welcome to propose an implementation.

@BrianInglis
Copy link

Source MIT so cloneable.
Just happy there is an Open link option: just click or drag to select -- easier for humans -- than dragging down the terminal with excessive, or really any, code for non-core functions!
Still prefer "Running Light Without Overbyte!" ;^>

@pragma-
Copy link
Author

pragma- commented Feb 12, 2023

@yashpalgoyal1304 I've switched to wezterm. See https://github.com/pragma-/dotfiles/blob/c6c9498f53f812ea872fdf94e044f63956862756/wezterm/wezterm.lua#L70-L85 for how I have URLs clickable in exactly the way I want.

edit: Updated URL regexes to be more robust.

@pragma-
Copy link
Author

pragma- commented Feb 12, 2023

@yashpalgoyal1304 also note https://github.com/pragma-/dotfiles/blob/c6c9498f53f812ea872fdf94e044f63956862756/wezterm/wezterm.lua#L9 for starting Cygwin instead of CMD. wezterm supports WSL, as well, but I don't use that.

@goyalyashpal
Copy link

for starting Cygwin instead of CMD

so, u dont use Msys2? i have just set my Environment variable ComSpec to be <address of bash.exe> i.e. D:\msys64\bin\bash.exe and wezterm respects that and starts in it by default

@pragma-
Copy link
Author

pragma- commented Feb 12, 2023

@yashpalgoyal1304 Nice! No, I like Cygwin's package manager and the ease it provides me in installing other things. I use xwin.exe -multiwindow along with ssh -Y to have some of my Linux desktop apps x-forwarded to my Windows machine, for instance. X-forwarding isn't really that great, but I'm used to it and it works out of the box. It's fine for apps that are very light in their graphical usage.

@mintty
Copy link
Owner

mintty commented Feb 13, 2023

URLs are now matched with proper detection of parentheses nesting.
Different styles of parentheses are not distinguished however; should that be necessary, it'll be easy to add.

@mintty
Copy link
Owner

mintty commented Feb 13, 2023

The current patch also fixes the inconsistency around :: when setting WordChars.

@pragma- pragma- reopened this Feb 14, 2023
@pragma-
Copy link
Author

pragma- commented Feb 14, 2023

I previously closed this issue as Not planned. I did not realize development would continue on this. I've reopened it so that it may be closed as Completed when done.

@mintty
Copy link
Owner

mintty commented Feb 14, 2023

It could be considered done for now, unless parentheses style matching is desired or someone finds another problem with it.
I usually close issues with the next release.

@pragma- pragma- closed this as completed Feb 26, 2023
@mintty
Copy link
Owner

mintty commented Mar 25, 2023

Released 3.6.4.

@clayne
Copy link

clayne commented Apr 2, 2023

Related: #1208

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants