Add links to company websites for breach resolution #2961

Merged
flozia merged 26 commits into main from MNTOR-1504-Add-links-to-websites on Apr 19, 2023

Conversation

flozia
Collaborator

@flozia flozia commented Mar 31, 2023

References:

Jira: MNTOR-1504

Description

Adds a link to the breach resolution when passwords or security questions were involved.

Screenshot

[Screenshot: MNTOR-1504]

How to test

  1. Log in with an account or add an email address that was exposed in a breach.
  2. Visit /user/breaches
  3. If the breach includes passwords or security questions, there should be a link to the company website in the associated resolution step.

Checklist (Definition of Done)

  • Localization strings (if needed) have been added.
  • Commits in this PR are minimal and have descriptive commit messages.
  • I've added or updated the relevant sections in the README and/or code comments.
  • I've added a unit test to test for potential regressions of this bug.
  • Product Owner accepted the User Story (demo of functionality completed) or waived the privilege.
  • All acceptance criteria are met.
  • Jira ticket has been updated (if needed) to match changes made during the development process.
  • Jira ticket has been updated (if needed) with suggestions for QA when this PR is deployed to stage.

@flozia flozia requested a review from flodolo as a code owner March 31, 2023 15:01
@flozia flozia force-pushed the MNTOR-1504-Add-links-to-websites branch from 6eaf2e5 to 858f19f Compare March 31, 2023 15:01
@flozia flozia force-pushed the MNTOR-1504-Add-links-to-websites branch from 858f19f to fdac319 Compare March 31, 2023 15:06
@flozia flozia requested review from Vinnl and toufali March 31, 2023 15:10
Collaborator

@Vinnl Vinnl left a comment

Left one blocking comment, unfortunately, though probably fairly easy to resolve :)

src/utils/breach-resolution.js Outdated
@flozia flozia self-assigned this Mar 31, 2023
@pdehaan
Collaborator

pdehaan commented Mar 31, 2023

Not sure if I mentioned this in another thread, but there is a small-ish risk of linking to external sites. A lot of the domains via HIBP are no longer resolving.

I looked at one point and think my scripts were guessing about 25% of the outbound domains were invalid.
I had a few minutes so I ended up rebuilding my tool: https://github.com/pdehaan/blurts-https-stats which conveniently pipes the report out to README.md if you just want a quick scroll. Or, the raw "alive"-vs-"dead" JSON can be found in https://github.com/pdehaan/blurts-https-stats/blob/main/checker.json.

Some very rough stats, if you like grepping JSON files:

  • "link": occurs 624× (which accounts for empty domains or duplicated domains)
  • "status": "dead" occurs 198× (31.7%)
  • "status": "alive" occurs 426× (68.2%)

BUT the misleading thing is that a lot of the 31% of links reported as dead still work; the npm module is just reporting bad HTTPS certificates as broken (or getting caught by Cloudflare captchas).


PROTIP: If you want to play a fun game, we could WHOIS all the broken domains and see which domains are available to register, then pick them all up and redirect them to https://monitor.firefox.com.

Collaborator

@flodolo flodolo left a comment

Requesting changes to make sure the rationale is clarified.

locales/en/breaches.ftl Outdated
@flozia
Collaborator Author

flozia commented Apr 3, 2023

Not sure if I mentioned this in another thread, but there is a small-ish risk of linking to external sites. A lot of the domains via HIBP are no longer resolving.

I looked at one point and think my scripts were guessing about 25% of the outbound domains were invalid. I had a few minutes so I ended up rebuilding my tool: https://github.com/pdehaan/blurts-https-stats which conveniently pipes the report out to README.md if you just want a quick scroll. Or, the raw "alive"-vs-"dead" JSON can be found in https://github.com/pdehaan/blurts-https-stats/blob/main/checker.json.

Some very rough stats, if you like grepping JSON files:

  • "link": occurs 624× (which accounts for empty domains or duplicated domains)
  • "status": "dead" occurs 198× (31.7%)
  • "status": "alive" occurs 426× (68.2%)

BUT the misleading thing is that a lot of the 31% of links reported as dead still work; the npm module is just reporting bad HTTPS certificates as broken (or getting caught by Cloudflare captchas).

PROTIP: If you want to play a fun game, we could WHOIS all the broken domains and see which domains are available to register, then pick them all up and redirect them to https://monitor.firefox.com.

That’s interesting. Thank you so much for the tool @pdehaan and for listing the stats! As @flodolo mentioned, there has been some back and forth on whether we want to show those links or not. We communicated the risks to PM, but I’m raising this again to make sure we are aligned.

@pdehaan
Collaborator

pdehaan commented Apr 3, 2023

@flozia To be clear, the broken links are already on the site on the details page, ie: https://monitor.firefox.com/breach-details/LeakedReality
If I click on the domain near the top I see a "Hmm. We’re having trouble finding that site." error page in Firefox Nightly. A WHOIS search shows the domain is registered, but maybe DNS just isn't set up anymore. Even curl shows an error trying to inspect the page:

curl -fL https://leakedreality.com # HTTPS
curl: (6) Could not resolve host: leakedreality.com

curl -fL http://leakedreality.com # try HTTP
curl: (6) Could not resolve host: leakedreality.com

echo $? # 6

… but I can see how adding a new [broken] CTA link would be frustrating and confusing to end users.


It took a few clicks, but https://monitor.firefox.com/breach-details/Abandonia2022 is another example. Clicking the domain link on the details page takes me to a "Unable to connect. An error occurred during a connection to abandonia.com." error page in Nightly. Changing https:// to vanilla http:// fixes the broken link, but probably not intuitive.

Or https://monitor.firefox.com/breach-details/GGCorp external domain link takes me to a scary error page (presumably because of a bad/invalid HTTPS cert):

Warning: Potential Security Risk Ahead
Nightly detected a potential security threat and did not continue to ggcorp.me. If you visit this site, attackers could try to steal information like your passwords, emails, or credit card details.

@mansaj
Collaborator

mansaj commented Apr 3, 2023

@pdehaan Looking at the JSON generated by the script, I noticed that lots of the sites marked as “dead” / 403s are accessible. Are these false positives?

@flozia flozia marked this pull request as draft April 3, 2023 16:09
@flozia flozia added the needs-PM label Apr 3, 2023
@pdehaan
Collaborator

pdehaan commented Apr 3, 2023

@mansaj #2961 (comment) "I noticed that lots of the sites marked as “dead” / 403s are accessible? Are these false positives?"

I think the link-check module I used might be considering certificate errors to be "dead" links. So there were sites that were reported as dead, but clicking them showed me the website, with some console logging and/or certificate issues that most people wouldn't really notice. Per tcort/link-check#47 (comment) there might also be some 403 issues if sites are behind Cloudflare or maybe other proxy services as well.

Per link-check README:

A link is said to be 'alive' if an HTTP HEAD or HTTP GET for the given URL eventually ends in a 200 OK response. To minimize bandwidth, an HTTP HEAD is performed. If that fails (e.g. with a 405 Method Not Allowed), an HTTP GET is performed. Redirects are followed.

GitHub search showed this as one of the very few results for "dead" in their codebase: https://github.com/tcort/link-check/blob/430027f6be03b904db508042945bf2a75472d330/lib/LinkCheckResult.js#L10

I can't think of another great solution. I tried using curl or native fetch instead, but neither was great; both were super slow (these scripts take 10-20 minutes to scrape 640 domains w/ 10s timeouts). Another option might be using something like Playwright and trying to click the link from the /breach-details/* page and then taking a screenshot if it's a non-2xx response. If the screenshot shows something, maybe we don't care. But if we get no response/bytes/pixels, maybe we consider that a bad link. Although I imagine that's still prone to Cloudflare sandboxes and other gotchas.
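
A minimal sketch of how the false positives discussed above could be triaged, assuming each probe has already produced either an HTTP status or a Node.js error code. The function name `classifyProbe` and the `needs-review` category are hypothetical; they are not part of link-check or of this PR. The idea is to treat DNS failures as dead, but to send certificate errors and 403 responses to manual review, since those sites often still load in a browser.

```javascript
// Hypothetical triage helper; names and categories are illustrative only.
// The error codes listed are standard Node.js DNS/TLS error codes.
const CERT_ERROR_CODES = new Set([
  'CERT_HAS_EXPIRED',
  'DEPTH_ZERO_SELF_SIGNED_CERT',
  'ERR_TLS_CERT_ALTNAME_INVALID',
  'UNABLE_TO_VERIFY_LEAF_SIGNATURE',
]);

function classifyProbe({ status, errorCode } = {}) {
  // DNS does not resolve, e.g. curl's "Could not resolve host".
  if (errorCode === 'ENOTFOUND') return 'dead';
  // Bad certificate: the site may still be reachable, so do not call it dead.
  if (CERT_ERROR_CODES.has(errorCode)) return 'needs-review';
  // Any other network error (refused, timed out, ...) counts as dead.
  if (errorCode) return 'dead';
  // 2xx and 3xx responses count as alive.
  if (status >= 200 && status < 400) return 'alive';
  // Assumption: a 403 is often a bot challenge (e.g. Cloudflare), not a dead site.
  if (status === 403) return 'needs-review';
  return 'dead';
}
```

Only the entries that end up as `dead` would be candidates for a blocklist; the `needs-review` ones would need a manual click-through.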

@flozia flozia force-pushed the MNTOR-1504-Add-links-to-websites branch from d7a1f84 to e930adb Compare April 5, 2023 14:41
locales/en/breaches.ftl Outdated Show resolved Hide resolved
@flozia flozia removed the needs-PM label Apr 5, 2023
@flozia flozia force-pushed the MNTOR-1504-Add-links-to-websites branch 3 times, most recently from 3e51d02 to a0290e5 Compare April 5, 2023 17:14
@flozia flozia force-pushed the MNTOR-1504-Add-links-to-websites branch from a0290e5 to 1b25b2c Compare April 5, 2023 17:23
@flozia
Collaborator Author

flozia commented Apr 18, 2023

Thanks for your input, everyone! After this PR was on hold while aligning with PM and UX, I’m moving this out of “draft” mode.

  1. We decided to compile a first blocklist from @pdehaan’s https://github.com/pdehaan/blurts-domains-playwright/blob/main/stats-https.json. We will use the env variable HIBP_BREACH_LINK_BLOCKLIST, which will list the domains that do not respond with a 200 status.
  2. @flodolo The strings were updated and include b tags and a marker breached-company-link that can be replaced by a link — or stripped if it is one we do not want to show.
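
A rough sketch of how such a blocklist could be read and applied, not the code merged in this PR. The helper names `parseBlocklist` and `getBreachedCompanyLink` are hypothetical, and the variable name `HIBP_BREACH_DOMAIN_BLOCKLIST` is taken from the sample env file in this PR's diff. It assumes a comma-separated list of domains and HIBP breach objects with a `Domain` field.

```javascript
// Hypothetical helpers; only the env format (comma-separated domains)
// and the `Domain` field come from the PR discussion.

// Parse a comma-separated env value into a set of lowercase domains.
function parseBlocklist(raw = '') {
  return new Set(
    raw
      .split(',')
      .map((domain) => domain.trim().toLowerCase())
      .filter(Boolean)
  );
}

// Return the outbound URL for a breach, or '' when no link should be shown
// (empty domain, or domain present in the blocklist).
function getBreachedCompanyLink(breach, blocklist) {
  const domain = (breach.Domain || '').trim().toLowerCase();
  const showLink = domain !== '' && !blocklist.has(domain);
  return showLink ? `https://${domain}` : '';
}

// An unset variable yields an empty blocklist.
const blocklist = parseBlocklist(process.env.HIBP_BREACH_DOMAIN_BLOCKLIST);
```

For example, with `a-blocked-domain.com` in the list, a breach for that domain gets an empty link, while any other non-empty domain gets an `https://` URL.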

@flozia flozia marked this pull request as ready for review April 18, 2023 13:45
@flodolo
Collaborator

flodolo commented Apr 18, 2023

2. @flodolo The strings were updated and include b tags and a marker breached-company-link that can be replaced by a link — or stripped if it is one we do not want to show.

When you say stripped, you mean that the markup is removed but the text remains, i.e., it becomes inactive text?

Collaborator

@flodolo flodolo left a comment

(never mind my previous comment, the code is clear enough)

Collaborator

@Vinnl Vinnl left a comment

Only easy fixes or things that can be ignored, I think :)

src/utils/breachResolution.js
src/utils/breachResolution.js Outdated
src/utils/breachResolution.js Outdated
src/utils/breachResolution.js
Comment on lines 188 to 189
case BreachDataTypes.Passwords:
case BreachDataTypes.SecurityQuestions: {
Collaborator

Personal preference, so feel free to ignore, but I'd keep the code simple here and just replace <breached-company-link> in every string, rather than just the passwords and security questions resolutions.

Collaborator Author

No hard preference from my side: I thought being a bit more specific here was OK since we were cautious about how and where we link out to. Less complex is also a good thing, though: d574bee.
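
To illustrate the simpler approach, here is a small sketch of replacing the `<breached-company-link>` marker in any localized string. The function names and the sample string in the usage note are hypothetical, not the code in d574bee. With a link, the marker becomes an anchor; without one (for example a blocklisted domain), the markup is stripped and the text stays as plain text.

```javascript
// Hypothetical helper; not the implementation from this PR.

// Escape characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace every <breached-company-link>…</breached-company-link> marker.
// If `href` is empty, keep only the inner text (inactive text, no link).
function resolveCompanyLink(message, href) {
  const marker = /<breached-company-link>(.*?)<\/breached-company-link>/gs;
  return message.replace(marker, (_match, text) =>
    href
      ? `<a href="${escapeAttr(href)}" target="_blank" rel="noopener noreferrer">${text}</a>`
      : text
  );
}
```

Called on a made-up string such as `Change it on <breached-company-link>the company website</breached-company-link>.`, this returns the sentence with an anchor when `href` is set and the bare sentence otherwise. Strings without the marker pass through unchanged, which is what makes applying it to every resolution string safe.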

// There should be a resolution for `BreachDataTypes.Phone`,
// `BreachDataTypes.Passwords` and `BreachDataTypes.SecurityQuestions`.
// The last two should fall back to a more generic header string that does not
// include the breached company's domain, which we don't know:
Collaborator

I don't know how easy it is to mock AppConstants, but if it's easy (but only then - you've been working on this PR for long enough), maybe a test for the blocklist would be a good addition?

Collaborator Author

@flozia flozia Apr 18, 2023

Good point! I'll create an issue for this and address it in a follow-up so as not to block this PR.

tsconfig.json Outdated
@@ -58,6 +58,8 @@ HIBP_THROTTLE_DELAY=2000
HIBP_THROTTLE_MAX_TRIES=5
# Authorization token for HIBP to present to /hibp/notify endpoint
HIBP_NOTIFY_TOKEN=unsafe-default-token-for-dev
# Domains we prefer to not link to
HIBP_BREACH_DOMAIN_BLOCKLIST=a-blocked-domain.com,another-blocked-domain.org
Collaborator

Is there a limit on how long an ENV var can be?
It feels like this could get REALLY long if we end up blocking more than 20-30 domains.

Collaborator Author

According to SRE, there is no limit that we would hit.

Collaborator

I'm wondering if it's better as an ENV var that has to be coordinated w/ SRE, versus maybe some JSON file that lives in the repo that is a single source of truth that we can audit occasionally. (unless there are reasons to keep it as a secret/ENV). 🤷
Although I guess I could technically recreate the blocklist locally by scraping the 650 breaches on the Monitor site and see if the outbound link is a link or not.

Collaborator Author

I’m not sure about that either, but one good argument for handling the list in the env is that we would be able to make adjustments without a release. That matters especially in the beginning, when we might need to test and audit the sites manually.

const args = {
companyName: b.Name,
breachedCompanyLink: showLink ? `https://${b.Domain}` : '',
Collaborator

aside: in my personal testing, I think http:// had better results than https:// (somewhere between 5-10% more 2xx/3xx results).

Collaborator Author

Interesting, thanks for the note. Since we are trying to be cautious about where we link out to, I think I’d feel more comfortable linking out to https://.

@flozia
Collaborator Author

flozia commented Apr 19, 2023

(never mind my previous comment, the code is clear enough)

Thank you for your renewed review @flodolo. Unfortunately, there has been another change to the strings with two additions: A note on using 2FA and the link to Firefox Password Manager — sorry to burn your cycles on these.

@flozia flozia merged commit 268df94 into main Apr 19, 2023
10 checks passed
@flozia flozia deleted the MNTOR-1504-Add-links-to-websites branch April 19, 2023 08:41