Profanity fails to properly detect URLs wrapped in <> #1877

Closed
mdosch opened this issue Aug 23, 2023 · 1 comment · Fixed by #1883

mdosch (Contributor) commented Aug 23, 2023

If a URL is wrapped in <>, e.g.:

Fürst Pückler bezeichnet: (<https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>)

Profanity's /url open <tab> will try to expand to /url open https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>).

I think < and > should not be considered part of a URL, as they are unsafe characters that must be escaped:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; […]
https://datatracker.ietf.org/doc/html/rfc1738
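
The RFC's point about delimiters can be illustrated with a minimal sketch in plain C. It is not Profanity's code: the helper name `url_cut_at_delimiter` is hypothetical, and the delimiter set (`<`, `>`, `"` and whitespace) is an assumed subset of the characters RFC 1738 calls unsafe.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Profanity: returns a newly allocated copy
 * of `candidate`, truncated at the first character from an assumed subset of
 * the RFC 1738 "unsafe" characters ('<', '>', '"' and whitespace).
 * The caller frees the result. Returns NULL on allocation failure. */
char *
url_cut_at_delimiter(const char *candidate)
{
    /* strcspn() returns the length of the prefix free of delimiter chars. */
    size_t len = strcspn(candidate, "<>\" \t\r\n");
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, candidate, len);
    out[len] = '\0';
    return out;
}
```

Applied to the candidate from the example above, which ends in `>).`, the copy stops just before the `>`.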

Expected Behavior

/url open <tab> with the previous example should expand to /url open https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler.

Current Behavior

/url open <tab> expands to /url open https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>).

Possible Solution

Steps to Reproduce (for bugs)

  1. See explanation at the beginning.

Environment

  • Debian Testing/Unstable
profanity --version
Profanity, version 0.14.0dev.master.492efc43
Copyright (C) 2012 - 2019 James Booth <boothj5web@gmail.com>.
Copyright (C) 2019 - 2023 Michael Vetter <jubalh@iodoru.org>.
License GPLv3+: GNU GPL version 3 or later <https://www.gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Build information:
XMPP library: libstrophe
Desktop notification support: Enabled
OTR support: Disabled
PGP support: Enabled (libgpgme 1.18.0)
OMEMO support: Enabled
C plugins: Enabled
Python plugins: Disabled
GTK icons/clipboard: Disabled
GDK Pixbuf: Enabled

jubalh (Member) commented Aug 24, 2023

I tried parsing it with g_uri_parse() and then turning it back into a string with g_uri_to_string(), but unfortunately that gives: https://de.wikipedia.org/wiki/F%C3%BCrst_P%C3%BCckler>) -> https://de.wikipedia.org/wiki/Fürst_Pückler%3E). So it percent-encodes the stray > instead of dropping it. Maybe there is no GLib function for this and we need to ignore invalid characters on our own?

@jubalh jubalh added the bug label Aug 24, 2023
@jubalh jubalh added this to the next milestone Aug 24, 2023
jubalh added a commit that referenced this issue Aug 28, 2023
First I tried with g_uri_parse() and g_uri_to_string() but then I
learned that GUri validation API is only for things that are part of a
proper URL.

Let's cut the string at `>` since they are sometimes enclosed in `<>`.

Fix #1877
jubalh added a commit that referenced this issue Sep 4, 2023
First I tried with g_uri_parse() and g_uri_to_string() but then I
learned that GUri validation API is only for things that are part of a
proper URL.

Then used `g_utf8_strchr()` to cut the string at `>` since they are
sometimes enclosed in `<>`.

Thanks to @jaeckel for providing a proper regex.

Fix #1877
jubalh added a commit that referenced this issue Sep 5, 2023
First I tried with g_uri_parse() and g_uri_to_string() but then I
learned that GUri validation API is only for things that are part of a
proper URL.

Then used `g_utf8_strchr()` to cut the string at `>` since they are
sometimes enclosed in `<>`.

Thanks to @jaeckel for providing a proper regex from
https://stackoverflow.com/questions/43588699/regex-for-matching-any-url-character

Fix #1877
jubalh added a commit that referenced this issue Sep 5, 2023
First I tried with g_uri_parse() and g_uri_to_string() but then I
learned that GUri validation API is only for things that are part of a
proper URL.

Then used `g_utf8_strchr()` to cut the string at `>` since they are
sometimes enclosed in `<>`.

Thanks to @sjaeckel for providing a proper regex from
https://stackoverflow.com/questions/43588699/regex-for-matching-any-url-character

Fix #1877
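
The regex-based approach mentioned in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: it uses POSIX `regex.h` instead of GLib, the helper name `first_url_in_message` is hypothetical, and the pattern is an assumed character class built from the characters RFC 3986 permits in a URI.

```c
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed pattern for illustration only: "http" or "https", then "://", then
 * one or more characters that RFC 3986 allows in a URI (plus '%').
 * In a POSIX bracket expression ']' must come first and '-' last. */
#define URL_PATTERN "https?://[][A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+"

/* Hypothetical helper, not part of Profanity: returns a newly allocated copy
 * of the first URL found in `message`, or NULL if there is no match or an
 * error occurs. The caller frees the result. */
char *
first_url_in_message(const char *message)
{
    regex_t re;
    regmatch_t m;
    char *out = NULL;

    if (regcomp(&re, URL_PATTERN, REG_EXTENDED) != 0) {
        return NULL;
    }
    if (regexec(&re, message, 1, &m, 0) == 0) {
        /* rm_so/rm_eo are byte offsets of the match within `message`. */
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, message + m.rm_so, len);
            out[len] = '\0';
        }
    }
    regfree(&re);
    return out;
}
```

Because `<` and `>` are not in the character class, a match on the message from the report ends before the closing `>`. Parentheses are valid URI characters and are in the class, so a URL wrapped only in `()` would still pick up the trailing `)` with this pattern.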
@jubalh jubalh self-assigned this Sep 5, 2023