Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autolinker vulnerable to RTLO URL spoofing attacks #377

Open
dbrgn opened this issue Mar 30, 2022 · 26 comments · Fixed by #386 · May be fixed by #388, #390 or #391
Open

Autolinker vulnerable to RTLO URL spoofing attacks #377

dbrgn opened this issue Mar 30, 2022 · 26 comments · Fixed by #386 · May be fixed by #388, #390 or #391

Comments

@dbrgn
Copy link

dbrgn commented Mar 30, 2022

When a URL string contains a unicode RTLO (right-to-left override) character, this results in two separate links being generated.

Context: https://www.bleepingcomputer.com/news/security/url-rendering-trick-enabled-whatsapp-signal-imessage-phishing/

PoC (screenshot, because GitHub would otherwise show the rest of the line as right-to-left):

screenshot-20220330-141920

Note that my terminal ignores the RTLO, but it's there. In the browser (Chrome in this case), this link looks like this:

screenshot-20220330-142044

The left and right part are separate links. When the user clicks on the right part, shown as aisa.mp4, they are redirected to http://4pm.asia/ instead.

I'm not sure what the ideal solution would be. If autolinker would strip all RTLO characters passed to it, then legitimate text (that's not part of an URL) might break. However, an RTLO should not be part of the actual URL. (Autolinker does basically no URL encoding whatsoever. Maybe that should be changed? An urlencoded RTLO would be handled (and ignored) by the browser.)

@dbrgn dbrgn changed the title Autolinker vulnerable to RTLO phishing attacks Autolinker vulnerable to RTLO URL spoofing attacks Mar 30, 2022
@caleb15
Copy link

caleb15 commented Aug 1, 2022

@gregjacobs wondering if you saw this? Snyk lists this as a high-level vulnerability.

@ShaharAdskAcc
Copy link

Any idea when it's going to be fixed?

@Gilgahex
Copy link

Gilgahex commented Aug 2, 2022

@gregjacobs
@olafleur

Please see this pull request

#382

@moblezin
Copy link

moblezin commented Aug 8, 2022

Hey, thanks for the fix, when will you be able to publish it to npm?

@Gilgahex
Copy link

Gilgahex commented Aug 9, 2022

I'd love to but I don't have write access, so I'd have to fork it and create a new pkg on npm

@RomekRJM
Copy link

As mentioned by @caleb15 this is high on SNYKs vulnerability risk. Any idea when the next Autolinker release is going to be? I see past 2 years it's been released annually.

@gregjacobs
Copy link
Owner

Hey guys, waiting on a viable PR for this. The PR linked in this thread doesn't seem to do anything (although I'm surprised to see that the test passed. Perhaps there is a difference in which node version is running?)

@yadue
Copy link
Contributor

yadue commented Sep 6, 2022

@gregjacobs so you're saying that the fix provided doesn't work when you run it but units went correctly and fixed the original issue? I simply cannot understand that. Provided PR does exactly what it is supposed to do and the community is waiting for this PR for ages already.

Comment out
url.replace('\u202E', '');
and you'll see the result. Provided PR works fine, please merge it and deploy new version.

@xfournet
Copy link

xfournet commented Sep 6, 2022

@yadue when i test it, the PR is not working, the added test test fails with or without the patch in url-match.ts (see comments in PR)

@yadue
Copy link
Contributor

yadue commented Sep 7, 2022

@xfournet yeah indeed, yesterday it worked simply fine for me using this PR, today it's not which is extremely weird. I'll work on fix.

@gregjacobs
Copy link
Owner

Hey guys, this has been fixed and released in v3.16.1 👍

@yadue
Copy link
Contributor

yadue commented Sep 7, 2022

@gregjacobs Thanks a lot, sorry for broken units, I’m pretty sure I didn’t have any broken tests only local env. Anyway it would be good to think about running GitHub actions also for PRs so issues like that could be easily avoided.

@gregjacobs
Copy link
Owner

Yeah, big time on GitHub actions. I had that set up at some point actually but seems to have broken.

Just thinking about this issue a little more now actually: is it the best thing to do to simply strip out these RTL characters? Or should we just include them in the generated link?

@gregjacobs
Copy link
Owner

Ugh, just thinking about this more: with this fix of stripping out RTLO characters, we're basically preventing Autolinker from being used on any string with RTLO characters at all.

Might have to revert the fix. Any other ideas on how to solve? My only thought is to include the RTLO characters in the link itself so that it doesn't become two separate links

@gregjacobs gregjacobs reopened this Sep 7, 2022
@gregjacobs
Copy link
Owner

Reverted with v3.16.2 for now

@dbrgn
Copy link
Author

dbrgn commented Sep 7, 2022

Yeah, that's this part of the issue description:

I'm not sure what the ideal solution would be. If autolinker would strip all RTLO characters passed to it, then legitimate text (that's not part of an URL) might break.

Maybe this would be a feasible approach?

Autolinker does basically no URL encoding whatsoever. Maybe that should be changed? An urlencoded RTLO would be handled (and ignored) by the browser.

@yadue
Copy link
Contributor

yadue commented Sep 8, 2022

@gregjacobs @dbrgn here's another proposition, another could be a simple configuration flag to enable stripping for the ones who don't need directional change characters.

@yadue yadue linked a pull request Sep 8, 2022 that will close this issue
@yadue
Copy link
Contributor

yadue commented Sep 8, 2022

@gregjacobs Here you have 2 separate solutions for this problem and at this stage that's all I could spend time on that. Do whatever you think is best for this library but remember that multiple companies have troubles as it is high severity issue. They are not ideal but they fix somehow the original issue.

BTW with recent changes you completely destroyed the execution time of unit tests as because of repeated tests almost 20000 test cases are being generated.

@dbrgn
Copy link
Author

dbrgn commented Sep 8, 2022

but remember that multiple companies have troubles as it is high severity issue

But is it? From what I can tell, this is a trick to make certain types of phishing / user deception a bit easier. It is no vulnerability in itself. Pulling off a successful attack with this is still not trivial.

For half a year, nobody cared about the issue. Then Snyk lists it as high severity (I disagree about the classification), and all of a sudden a lot of people think it's highly critical because they want their company processes to show "green" and not "red" 🙃

For a proper fix, questions I'd ask would be:

  • Are RTLO characters ever allowed in an URL? May an URL contain a path with Hebrew or Arabic content? (If yes, we can't simply strip them without breaking some legitimate URLs.)
  • If RTLO characters can be stripped, do we need to strip them from only the URL (<a href="{here}">...>/a>) or also from the link text (<a href="...">{here}</a>)? Can there be funny effects if they're stripped or not stripped from the link text?
  • Instead of stripping the characters, can we urlencode them in the URLs? (This appears to be the cleanest solution to me, but Autolinker does no encoding whatsoever right now, so this would be a change in strategy/architecture that @gregjacobs would need to decide on.)

Thanks @gregjacobs for the maintenance! URLs are a hairy business. You owe us nothing, but I appreciate your work (be it implementation or review) 🙂

(Edit: @yadue I commented both your PRs. Thanks for taking the time to propose a fix!)

@yadue
Copy link
Contributor

yadue commented Sep 8, 2022

@dbrgn for me it’s also funny it’s ranked high and to be honest I don’t know how the fix should looks like to make snyk happy. Remember that for complaint companies (like the one I work for) it’s absolutely critical to have all external dependencies secure. Maybe it would be worth to reach out snyk and explain it’s actually expected result. If no other solution will be done here we’ll do a fork with one of my solutions as we simply cannot stay with “vulnerable” library and it’s better to simply remove them then being “hacked” in that or another way. We simply cannot afford it.

@gregjacobs
Copy link
Owner

Hey @dbrgn, @yadue, thanks for the comments and the PRs - really appreciate it.

I actually like the idea of URL-encoding the RTLO characters even though Autolinker doesn't currently encode other characters. Seems like the safest thing and we won't be accidentally stripping out legitimate uses of RTLO characters elsewhere in the text. Thoughts?

@dbrgn
Copy link
Author

dbrgn commented Sep 8, 2022

@gregjacobs I'm not sure how Autolinker.js is structured, but would it be feasible to run every URL (the part inside the href attribute) through the following logic?

function encodeUrl(rawUrl: string): string | undefined {
    try {
        return new URL(rawUrl).href;
    } catch (e) {
        // Do not linkify this URL, it's invalid
        return;
    }
}

Browser support seems to be fine: https://caniuse.com/url

It seems to do the trick, and will let the browser handle the URL:

screenshot-20220908-165146

(Would need to be verified first within Autolinker.)

And, if you'd like a laugh, here's how the Chromium debug console handles RTLO in the hostname part 😂

screenshot-20220908-165359

@gregjacobs
Copy link
Owner

It would actually part of the parser that we'd need to detect the RTLO characters so that we don't cut off one URL and then start a new one. @yadue's PR #388 is actually quite close. @yadue, would you like to take a stab at URL-encoding the RTLO chars rather than rejecting the match?

@yadue
Copy link
Contributor

yadue commented Sep 8, 2022

@gregjacobs sure, Ill have a look. Do you have an idea where this escaping should be done? Also please take a look at units and number of generated test cases, not sure if it was expected result but there are almost 20k cases coming from last commits and for some reason it takes literally ages when run locally.

@gregjacobs
Copy link
Owner

Hah, it actually generates upwards of 100,000 test cases :) I may have gone overboard on that but with rewriting the URL parser from scratch, I wanted to make sure that every possible case was handled. I'll see if I can reduce it down a bit at this point. I'm not sure why jasmine just sits and hangs for a while even after running a single test case though - maybe it's time to switch to mocha or something and see if there's still the same issue.

For encoding the RTLO chars, try adding this conversion into the UrlMatch class's getAnchorHref and getAnchorText methods.

@yadue yadue linked a pull request Sep 9, 2022 that will close this issue
@kevin-from-atlassian
Copy link

kevin-from-atlassian commented Sep 26, 2022

Hi there, @gregjacobs. Are there any updates to this issue? Does #391 work as a fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment