Urls with special characters (like ÆØÅ) don't become clickable links with preview #4810

Open
1 task done
goibon opened this issue Jan 13, 2021 · 24 comments

@goibon

goibon commented Jan 13, 2021

  • I have searched open and closed issues for duplicates

Bug Description

When pasting a url (e.g. https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/), the url doesn't become a clickable link and no preview of the url is shown. If I remove the two occurrences of the letter 'Æ' (which thankfully still leaves a valid url for the Reddit post) and then make another change, a preview appears.
If I send a message with the original url then that message contains a clickable link in the iOS app but not on desktop.
If I paste the original url in the iOS app then it generates a preview and a clickable link just fine.
So it seems the desktop version has a slightly different way of handling url detection.
I've attached a brief screen recording demonstrating the issue:

special_chars_in_url.mov

Steps to Reproduce

  1. Paste a url containing special characters, in my case https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/, into the message input field

Actual Result:

The url remained plain text, i.e. not a clickable link, and no preview was produced.

Expected Result:

I expected the url to be transformed into a clickable link and a preview to be attached to my message.

Screenshots

Platform Info

Signal Version: v1.39.4

Operating System: macOS Catalina 10.15.7

Linked Device Version: 5.0.3.0 (iOS)

Link to Debug Log

https://debuglogs.org/c367cd3a70c3af7567d28c9f9df47c42ca8e85ef7faa1631760d5b1379656628

@EvanHahn-Signal
Contributor

Thanks for reporting. I'm not sure what we should do here.

We have some code that ensures that all characters in a URL's path are valid characters based on the URI standard. æ is not one of those characters, so we don't show a link preview.

Both Firefox and Chrome copy the URL like this:

https://www.reddit.com/r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/

All of these characters are valid, so we show a link preview. But Reddit's "Copy Link" behavior copies it with those invalid characters.

To fix this, our options are:

  1. Make our link preview checking more lenient. We would allow characters like æ in the URL, even though those are not technically valid. This might open up some other security issues which we'd need to evaluate.
  2. Update the mobile apps to be consistent with Desktop (and the spec), rejecting URLs with characters like æ.
  3. Do nothing, and accept that the apps are slightly different here.

Not sure what to do here, but I've filed this as a bug. We'll think about it.
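For illustration, here is a minimal sketch of how a strict path check and the percent-encoded form relate. It is not Signal's actual code: the `VALID_PATH` pattern and the `isStrictlyValidPath` helper are hypothetical names, and the pattern is only an approximation of the path characters RFC 3986 allows. One way to approach option 1 might be to run the pasted text through the standard `URL` parser first, since that parser percent-encodes non-ASCII path characters.

```typescript
// Hypothetical approximation of the RFC 3986 path characters:
// unreserved, sub-delims, ":", "@", "/", and percent-encoded triplets.
const VALID_PATH = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper, not taken from the Signal codebase.
function isStrictlyValidPath(path: string): boolean {
  return VALID_PATH.test(path);
}

const raw = "https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/";
const rawPath = "/r/Denmark/comments/kwc0jz/æble_eller_pære/";

// The WHATWG URL parser percent-encodes "æ" as its UTF-8 bytes (%C3%A6).
const normalized = new URL(raw).href;

console.log(isStrictlyValidPath(rawPath));               // false
console.log(normalized);
console.log(isStrictlyValidPath(new URL(raw).pathname)); // true
```

Whether normalizing before validating is acceptable from a security standpoint is exactly the open question in option 1; the sketch only shows that the two forms differ in encoding, not in the resource they name.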

@bentolor

I just stumbled over the same issue here with an Am*zon URL containing a German umlaut ('ä') in the URL.

I swear, I directly copied and pasted the URL from Firefox stable into Signal. After @EvanHahn-Signal's comment, I tried to reproduce it, but this time the URL got quoted correctly.

It seems Firefox only quotes the URL if you copy the complete URL. If you (like me) only select part of the URL (like up to the /dp/xxxxxxx part), Firefox no longer quotes the special characters in the selected part of the URL.

@goibon
Author

goibon commented Feb 13, 2021

@bentolor that is an interesting find so I tried it myself and here's what I found:

All of these tests were performed on macOS Catalina 10.15.7

Safari (14.0.3)

  • Selecting complete url: https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/
  • Selecting partial url: /r/Denmark/comments/kwc0jz/æble_eller_pære/

Firefox (85.0.2)

  • Selecting complete url: https://www.reddit.com/r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/
  • Selecting partial url: /r/Denmark/comments/kwc0jz/æble_eller_pære/

Chrome (88.0.4324.150)

  • Selecting complete url: https://www.reddit.com/r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/
  • Selecting partial url: /r/Denmark/comments/kwc0jz/æble_eller_pære/

So it seems that Safari is the only browser of the three that doesn't encode the url when copying 🤔
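As a small illustration that the two clipboard forms above name the same URL, the standard `encodeURI` and `decodeURI` functions convert between them. The variable names are only for this sketch.

```typescript
// The form Safari copies, and the form Firefox and Chrome copy when the
// complete URL is selected (both taken from the lists above).
const unencoded = "https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/";
const encoded = "https://www.reddit.com/r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/";

// encodeURI percent-encodes "æ" as its UTF-8 bytes and leaves ":" and "/" alone.
console.log(encodeURI(unencoded) === encoded); // true

// decodeURI reverses the encoding.
console.log(decodeURI(encoded) === unencoded); // true
```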

@SeriousMatters

SeriousMatters commented Jun 14, 2021

So it seems that Safari is the only browser of the three that doesn't encode the url when copying 🤔

On Windows 10:
Firefox (89.0)

  • https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/
  • /r/Denmark/comments/kwc0jz/æble_eller_pære/

Chrome (91.0)

  • https://www.reddit.com/r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/
  • /r/Denmark/comments/kwc0jz/æble_eller_pære/

Opera (76.0)

  • https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/
  • /r/Denmark/comments/kwc0jz/æble_eller_pære/

Edge (91.0)

  • https://www.reddit.com/r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/
  • /r/Denmark/comments/kwc0jz/æble_eller_pære/

@SeriousMatters

We have some code that ensures that all characters in a URL's path are valid characters based on the URI standard. æ is not one of those characters, so we don't show a link preview.

@EvanHahn-Signal, is there a reason to enforce the pure URI standard while most messenger apps, browsers, and major websites support UTF-8 URLs?

@henkka-fi

henkka-fi commented Oct 20, 2022

@EvanHahn-Signal, is there a reason to enforce the pure URI standard while most messenger apps, browsers, and major websites support UTF-8 URLs?

Also wondering this. Most modern messaging apps and browsers support UTF-8 URLs, and the Signal mobile app also works correctly with them. Why is the strict URI standard enforced only on Signal Desktop when UTF-8 URLs work basically everywhere else?

@yuvalne

yuvalne commented Mar 18, 2023

Any update regarding this, by the way? It's still present in 6.x versions.

@habi

habi commented Jan 30, 2024

The issue still persists in Signal 6.58.0.7 on iOS.

davidhaberthür.ch is not linked, davidhaberthuer.ch is.

IMG_6841

@nehemiagurl

@habi you should probably open a ticket in the iOS repo for this one, as it's not Desktop

@bentolor

bentolor commented Jan 30, 2024

@habi I would argue that in your case, where the special character is in the domain name, this is a good thing and should be kept by design to mitigate https://en.wikipedia.org/wiki/IDN_homograph_attack

On another note: What is http? I know about the secure hypertext transfer protocol https. I suspect http might be some obscure IBM mainframe legacy technology from the last century, no?! 😸

@nehemiagurl

@bentolor the way to stop IDN homograph attacks is with Punycode, not by blocking support for all non-ascii characters in urls.
people outside the anglosphere exist.
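To make the Punycode point concrete, here is a minimal sketch using the standard `URL` parser, which converts an internationalized hostname to its ASCII ("xn--") form. The `asciiHostname` helper is a hypothetical name, and the `https://` scheme is added only so the example domains from this thread parse as URLs.

```typescript
// Hypothetical helper: return the ASCII (Punycode) hostname of a URL,
// or null if the input cannot be parsed.
function asciiHostname(input: string): string | null {
  try {
    return new URL(input).hostname;
  } catch {
    return null;
  }
}

// "\u00fc" is "ü"; written as an escape so the exact code point is unambiguous.
const idn = asciiHostname("https://davidhaberth\u00fcr.ch");
const ascii = asciiHostname("https://davidhaberthuer.ch");

console.log(idn);   // an ASCII hostname whose first label starts with "xn--"
console.log(ascii); // davidhaberthuer.ch
```

An app could show the ASCII form next to, or instead of, the Unicode form when deciding how to render a link, which is roughly what browsers do for suspicious domains.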

also stop being obnoxious about https. the person was just demonstrating behaviour in the client, not sending nuclear launch codes. http is perfectly fine for that.

@bentolor

@nehemiagurl Please refrain from your toxic behaviour.

@nehemiagurl

@bentolor the one shaming people unnecessarily and dismissing their concerns is you.

@bentolor

bentolor commented Jan 30, 2024

Dear @nehemiagurl

  1. I'm giving feedback on why the case demonstrated might not fit the topic of this ticket, and explaining the reasoning behind it.
  2. My comment on https was a humorous comment explicitly marked as a joke with a smiley. Obviously I failed to add enough emojis to enable everybody to recognize it as a joke. For my failure I was rewarded with your downvote(s) and, minutes later, a follow-up attacking me.
  3. You claim Punycode would be the way to fix this. I'd say this is just wrong, as this would imply that Signal modifies the user's input. And altering user content and input is something that Signal definitely shouldn't do at all.
  4. It's pointless to attack people on the internet over trivial jokes and comments. The only result is both of us having a bad day right now…

@habi

habi commented Jan 30, 2024

@habi you should probably open a ticket in the iOS repo for this one, as it's not Desktop

In the iOS repository, these issues about IDN already exist:

signalapp/Signal-iOS#5543 links to signalapp/libsignal#511, which is related to #5237 which is closed as a duplicate of this issue here.

@habi

habi commented Jan 30, 2024

@habi I would argue that in your case, where the special character is in the domain name, this is a good thing and should be kept by design to mitigate https://en.wikipedia.org/wiki/IDN_homograph_attack

You know, there are people with an umlaut in their name who would actually benefit from having their personal URL linked in software they like to use.

iMessage links my URL, Threema does, Element does. I cannot test WhatsApp, as I don't have an account with them anymore.

@habi

habi commented Jan 30, 2024

On another note: What is http? I know about the secure hypertext transfer protocol https. I suspect http might be some obscure IBM mainframe legacy technology from the last century, no?! 😸

I tried to 'minimize' the issue and copy-pasted several versions of my URL.
I personally think this part of your comment is unnecessary for the issue.
I also think it's very hard to balance humorous text with emojis, but don't think it's necessary to talk about this part more here.

@bentolor

bentolor commented Jan 30, 2024

You know, there are people with an umlaut in their name who would actually benefit from having their personal URL linked in software they like to use.

I know: I guess nearly everybody in this issue is here because they wanted to share a URL containing some innocent, local characters like an umlaut.

My point is: I think there is a significant difference in the security impact of deploying a homoglyph attack as part of the URL path vs. the domain name.

iMessage links my URL, Threema does, Element does. I cannot test WhatsApp, as I don't have an account with them anymore.

I'm not sure that "the others do" is good reasoning. I think in the case of domain names, it's really a trade-off between security and "least surprise for the user".

Are we on the same page that rendering homoglyphs in domain names poses a significant security threat to users? As a Signal user, how would you be able to tell that a message from your colleague (whose phone has been stolen) asking you to change your password on https://account.mᎥcrosoft.com/ instead of https://account.microsoft.com/ is a fraud?
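To show how close the two hostnames are, here is a minimal sketch that flags hostname labels containing non-ASCII characters. The `nonAsciiLabels` helper is a hypothetical name, and the sketch assumes the look-alike character in the example is U+13A5 (CHEROKEE LETTER V). A real homograph check would need more than an ASCII test, for example mixed-script detection.

```typescript
// Hypothetical helper: list the labels of a hostname that contain
// at least one non-ASCII character.
function nonAsciiLabels(hostname: string): string[] {
  return hostname.split(".").filter((label) => /[^\x00-\x7f]/.test(label));
}

const genuine = "account.microsoft.com";
// Assumption: "\u13A5" (CHEROKEE LETTER V) is the character standing in for "i".
const lookalike = "account.m\u13A5crosoft.com";

console.log(genuine === lookalike);            // false
console.log(nonAsciiLabels(genuine).length);   // 0
console.log(nonAsciiLabels(lookalike).length); // 1
```

A check like this cannot tell a homograph from a legitimate name such as one containing an umlaut; it only identifies which labels would need closer inspection or a Punycode rendering.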

@bentolor

Reading through the linked issues: the general gist is that the current behavior is confusing, as some special characters get linkified on some platforms but not on others.

OK: that's definitely confusing and not helpful. Signal should decide on one of the two ways and follow through: rendering IDNs, or not rendering them at all.

@nehemiagurl

How would you be able to understand that the message from your colleague (whose phone has been stolen) asking you to change your password on https://account.mᎥcrosoft.com/ instead of https://account.microsoft.com/ is a fraud?

if the person you're chatting with is an adversary trying to phish you via Signal, you've got bigger problems on your hands. protecting from homograph attacks is the job of the browser - just like Signal can't protect you from sending incriminating messages without massively degrading the app (and even then a more sophisticated adversary can get around that anyway), Signal can't protect you from accepting a message request from someone who's phishing you.

even if they say "screw you" to anyone who uses the internet outside of the Anglosphere and block IDNs from rendering, a phisher can always just send the link in a separate message and wait for the victim to copy-paste it in their browser.
do you really think that if someone who's unsuspecting would look at a series of messages like:

Hi, you need to urgently change your Microsoft password. Go to "my account" in this link and sign in, then configure a new password:

https://account.mᎥcrosoft.com/

they will see that the second link isn't clickable and immediately understand they're being phished?

most browsers implement Punycode conversion by default, as well as some other protections, and that's excellent, because that's the way to actually combat this. but you won't see browsers blocking IDNs altogether, and neither should Signal.

@bentolor

Thank you, @habi and @nehemiagurl !

As I mentioned: it's a trade-off. Now I see the merits of both points of view and would be OK with either approach, as long as the behaviour is consistent across Signal platforms and applications.

@habi

habi commented Jan 30, 2024 via email

@SeriousMatters

most browsers implement Punycode conversion by default, as well as some other protections, and that's excellent, because that's the way to actually combat this. but you won't see browsers blocking IDNs altogether, and neither should Signal.

Exactly! We don't forbid the sale of all knives and all alcohol just because they can potentially enable criminal activity. Besides, people still fall for scams even if the url is nothing like the original.

How about converting to Punycode automatically on paste or on send?

@Nowaker

Nowaker commented Apr 26, 2024

This!

image

9 participants