Profanity fails to properly detect URLs wrapped in <> #1877

mdosch · 2023-08-23T19:10:11Z

If a URL is wrapped in <>, e.g.:

Fürst Pückler bezeichnet: (<https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>)

Profanity's /url open <tab> will try to expand to /url open https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>).

I think < and > should not be considered part of a URL, as they are unsafe characters that must be escaped:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; […]
https://datatracker.ietf.org/doc/html/rfc1738
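
To illustrate the point of the RFC excerpt, here is a minimal sketch in plain C that treats the three characters named above (space, "<" and ">") as terminators when measuring a URL. The function name `url_prefix_len` is hypothetical and is not taken from Profanity's code; the set covers only the characters quoted here, not the RFC's full list of unsafe characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Profanity's code): return the length of the
 * leading part of `s` that contains none of the characters named in the
 * RFC 1738 excerpt above: space, '<' and '>'. */
size_t url_prefix_len(const char *s)
{
    /* strcspn() counts the bytes before the first occurrence of any
     * character from the reject set, or the whole string if none occurs. */
    return strcspn(s, " <>");
}
```

With the example from this report, the returned length would cover the URL up to, but not including, the trailing `>)`.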

Expected Behavior

/url open <tab> with the previous example should expand to /url open https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler.

Current Behavior

/url open <tab> expands to /url open https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>).

Possible Solution

Steps to Reproduce (for bugs)

See explanation at the beginning.

Environment

Debian Testing/Unstable

profanity --version
Profanity, version 0.14.0dev.master.492efc43
Copyright (C) 2012 - 2019 James Booth <boothj5web@gmail.com>.
Copyright (C) 2019 - 2023 Michael Vetter <jubalh@iodoru.org>.
License GPLv3+: GNU GPL version 3 or later <https://www.gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Build information:
XMPP library: libstrophe
Desktop notification support: Enabled
OTR support: Disabled
PGP support: Enabled (libgpgme 1.18.0)
OMEMO support: Enabled
C plugins: Enabled
Python plugins: Disabled
GTK icons/clipboard: Disabled
GDK Pixbuf: Enabled


jubalh · 2023-08-24T06:51:31Z

I tried parsing it with g_uri_parse() and then turning it back into a string with g_uri_to_string(), but unfortunately that turns https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>) into https://de.wikipedia.org/wiki/Fürst_Pückler%3E). So it percent-encodes the `>` instead of dropping it. Maybe there is no GLib function for this and we need to ignore invalid characters on our own?

First I tried with g_uri_parse() and g_uri_to_string(), but then I learned that the GUri validation API is only for things that are part of a proper URL. Then used `g_utf8_strchr()` to cut the string at `>`, since URLs are sometimes enclosed in `<>`. Thanks to @sjaeckel for providing a proper regex from https://stackoverflow.com/questions/43588699/regex-for-matching-any-url-character Fix #1877
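
The "cut the string at `>`" idea from the commit message can be sketched roughly as follows. This is not the code from the actual fix: the helper name `cut_url_at_gt` is hypothetical, and plain `strchr()` stands in for `g_utf8_strchr()` so the example builds without GLib. Because `>` is a single ASCII byte, a byte-wise search should be safe on UTF-8 input.

```c
#include <string.h>

/* Hypothetical helper, not Profanity's actual code: truncate `url` in place
 * at the first '>' so that trailing text such as ">)" is dropped.
 * Strings without a '>' are left unchanged. */
void cut_url_at_gt(char *url)
{
    char *gt = strchr(url, '>');
    if (gt != NULL) {
        *gt = '\0'; /* end the string where the '>' was */
    }
}
```

Applied to a writable copy of `https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>)`, this leaves the URL without the trailing `>)`, which matches the expected behaviour described in the report.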

jubalh added the bug label Aug 24, 2023

jubalh added this to the next milestone Aug 24, 2023

jubalh mentioned this issue Aug 28, 2023

Cut URLs before adding to automcomp at > #1883

Merged

jubalh closed this as completed in #1883 Sep 5, 2023

jubalh self-assigned this Sep 5, 2023
