Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emoji domain not recognised as link #11247

Closed
4 tasks done
stophecom opened this issue Apr 29, 2021 · 11 comments
Closed
4 tasks done

Emoji domain not recognised as link #11247

stophecom opened this issue Apr 29, 2021 · 11 comments

Comments

@stophecom
Copy link


Bug description

Links that contain emojis are not recognised as proper links.

Steps to reproduce

  • Open Signal
  • Go to any chat conversation or Note to Self
  • Share a link that contains an emoji. E.g. https://🤫.st

Actual result:
There is no link preview, the URL is not recognised as a proper link (not clickable)

Expected result: Valid domains should be handled as proper links.

It's a bit edge case bug, and might even count as feature request. However I'd be happy for a fix - which most likely goes here somewhere:

cc @greyson-signal @alan-signal Unfortunately I'm not familiar with Java, otherwise I'd do a PR myself. Let me know if I can help nonetheless.

Screenshots

Screenshot_20210429-224722

Device info

Device: OnePlus 6
Android version: 10
Signal version: 5.7.1

Link to debug log

Nothing to log

@hiqua
Copy link

hiqua commented May 2, 2021

With your URL this variable is false:

      boolean validCharacters = ALL_ASCII_PATTERN.matcher(cleanedDomain).matches() ||
                                ALL_NON_ASCII_PATTERN.matcher(cleanedDomain).matches();

because cleanedDomain ends up being 🤫st. Not sure what's the reasoning behind this check though...

@mayjs
Copy link

mayjs commented Jun 4, 2021

@hiqua I noticed a similar issue with domains containing umlauts (e.g. https://üei.de). Could this be related?

@hiqua
Copy link

hiqua commented Jun 4, 2021

I would say so.

@stale
Copy link

stale bot commented Jan 26, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix label Jan 26, 2022
@stophecom
Copy link
Author

Yes, it is this still relevant. Links with emoji or umlauts (ä, ö, ü) are not recognized as proper links.

@stale stale bot removed the wontfix label Jan 26, 2022
@stale
Copy link

stale bot commented Mar 27, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix label Mar 27, 2022
@ckujau
Copy link

ckujau commented Mar 27, 2022

Jup, it's still relevant. Btw, this affects the desktop version as well.
Screenshot from 2022-03-27 19-55-10

@stale stale bot removed the wontfix label Mar 27, 2022
@greyson-signal
Copy link
Contributor

We use the stock android linkifier. I don't anticipate that we'll be making a custom one anytime soon, apologies.

@ckujau
Copy link

ckujau commented Mar 28, 2022

Hm, both links work in Google Chat (formerly "Hangouts") just fine. And also in other applications (e.g. "Slack").
Screenshot_20220328-232524~2

@greyson-signal
Copy link
Contributor

Ah, I'm sorry, I misspoke. So we do use the default linkifier, but we filter out some results that fail certain rules. Right now, the main rule is that we don't linkify links whose domain has a mix of ascii and non-ascii characters. This is to help prevent homograph attacks. Our rule for that is a little coarse, but honestly given the huge character space we're trying to play it safe here.

So summary:

  • If a link is linkified, it's because the default linkifier identified it as a link
  • If it's not linkified, it's because the default linkifier missed it or it violates one of the rules we setup to prevent homograph attacks.

That said, I still think the outcome for this ticket is the same: I don't anticipate we'll do more nuanced stuff here just to be safe. But apologies for the previous incorrect reasoning.

@stophecom
Copy link
Author

stophecom commented Mar 30, 2022

Thanks for the explanation.
I would argue it's a nice trick and this rule definitely helps. But it's also a bit arbitrary. And with newer TLDs this "trick" might become less effective. E.g.

image

(You hardly notice that his is now within unicode-land)

And what about my emoji domain 😭😂😂😂. So sad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants