Phishing API #84

Closed
mslipper opened this issue Sep 11, 2017 · 9 comments

@mslipper

Hi all,

My company, Spectrum Labs, has an anti-phishing API that's tuned for ICO sites and BTC wallets. I spoke briefly with @FlySwatter last week about integrating it with eth-phishing-detect; he seemed interested and suggested that I open an issue here.

The API works using a combination of document similarity algorithms and machine learning. It's able to recognize new phishing threats regardless of the URL that's hosting them. For example, the API was able to detect yesterday's new 'linknetwork.co' phishing site without that site being on a blacklist or having a similar URL to 'smartcontract.com/link', the real site.

My team is perfectly happy to perform all the integration work required to get this into the MetaMask plugin. Are there any requirements (beyond blocking phishing attacks, of course) that I should be aware of? I understand that privacy is a concern. How can I best answer any privacy questions you may have?

Best,

Matthew

@409H
Collaborator

409H commented Sep 12, 2017

The API works using a combination of document similarity algorithms and machine learning. It's able to recognize new phishing threats regardless of the URL that's hosting them.

I'm interested in how this is done. I assume you have an active whitelist that you're using? If so, how is that maintained?

@mslipper
Author

@409H yes, part of our system uses an active whitelist. We periodically crawl trusted ICO aggregators to build the whitelist and manually prune/add entries as necessary. We're also working with @FlySwatter to add new ICO sites to the whitelist as MetaMask learns about them. Is there anything specific you'd like to know?

The value of the API comes not from our whitelist per se but rather the fact that once a site is whitelisted we can detect scams the moment users encounter them. This avoids the need to manage an ever-growing blacklist that can be trivially circumvented.
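To make the "whitelist plus document similarity" idea concrete, here is a minimal sketch. It is not Spectrum Labs' model, which is not described in this thread. It assumes a simple word-shingle Jaccard similarity as a stand-in for "document similarity algorithms". A page is flagged when its text closely matches a whitelisted site's text but it is served from a different hostname. The names `WhitelistEntry`, `shingles`, `jaccard`, `checkPage` and the 0.8 threshold are all hypothetical.

```typescript
// Hypothetical sketch: flag pages that imitate a whitelisted site's content
// while being hosted elsewhere. Jaccard similarity over word shingles is a
// stand-in for the real (unpublished) detector.

interface WhitelistEntry {
  hostname: string; // legitimate site, e.g. "smartcontract.com"
  text: string;     // reference text crawled from that site
}

// Split text into overlapping k-word shingles (lower-cased, punctuation dropped).
function shingles(text: string, k = 3): Set<string> {
  const words = text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  const out = new Set<string>();
  for (let i = 0; i + k <= words.length; i++) {
    out.add(words.slice(i, i + k).join(" "));
  }
  return out;
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  a.forEach((s) => {
    if (b.has(s)) inter++;
  });
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Returns the whitelisted hostname being imitated, or null if nothing matches.
// Exact hostname comparison is a simplification (no subdomain handling).
function checkPage(
  pageHostname: string,
  pageText: string,
  whitelist: WhitelistEntry[],
  threshold = 0.8, // hypothetical cut-off
): string | null {
  // The genuine, whitelisted site is never flagged against itself.
  if (whitelist.some((e) => e.hostname === pageHostname)) return null;
  const page = shingles(pageText);
  for (const entry of whitelist) {
    if (jaccard(page, shingles(entry.text)) >= threshold) return entry.hostname;
  }
  return null;
}
```

In this sketch, a clone of a whitelisted page on a lookalike domain scores near 1.0 and is flagged regardless of its URL. This is the property described above: no blacklist entry is needed for the new domain.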

@409H
Collaborator

409H commented Sep 12, 2017

Ahh, that's great!

How would you envision us (not just MetaMask) using your blacklists? On every request we send data to your server, or something else? And would there be any cost or other factors to think about?

@mslipper
Author

The current thought is that each request would ping our server with the URL that's currently being viewed. I understand that there are some privacy concerns around that, so I think it'd make sense to make the feature opt-in. In the future (and, of course, depending on your buy-in) we might also do some client-side preprocessing of the onscreen data to improve performance. Rest assured, though, we'll explain what's being sent to our server every step of the way.
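A minimal sketch of the opt-in, per-page lookup described above follows. The endpoint URL, the `Authorization` header, the JSON body shape and the `apiKey` setting are all placeholders, not Spectrum Labs' actual API. The function only builds the request, and returns `null` when the user has not opted in, so nothing is sent by default.

```typescript
// Hypothetical sketch: build the per-page check request only if the user opted in.
// Endpoint, header and payload shape are placeholders, not a real API.

interface CheckRequest {
  endpoint: string;
  method: "POST";
  headers: Record<string, string>;
  body: string; // JSON payload containing only the page URL
}

interface PhishingSettings {
  optedIn: boolean; // off by default; the user must turn the feature on
  apiKey: string;   // assumed organisation-level key, not tied to an individual user
}

const CHECK_ENDPOINT = "https://phishing-api.example.com/v1/check"; // placeholder

function buildCheckRequest(pageUrl: string, settings: PhishingSettings): CheckRequest | null {
  // Respect the opt-in: no request is built, so no data leaves the browser.
  if (!settings.optedIn) return null;
  return {
    endpoint: CHECK_ENDPOINT,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${settings.apiKey}`,
    },
    body: JSON.stringify({ url: pageUrl }),
  };
}
```

A caller would pass the result to `fetch(req.endpoint, { method: req.method, headers: req.headers, body: req.body })`. Keeping request construction separate from the network call makes it easy to audit exactly what is sent.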

Regarding cost - we're currently in beta and you would be one of our first users. We won't charge anyone until we're confident that we're providing tangible value to your users. Once that happens we can discuss pricing. In sum: it won't be free forever but it won't cost anything until we know it's good :).

There's one last point I should mention: it would really help us if we could include a mechanism in this plugin that allows users to report false positives/negatives if they encounter them. That will help us update our algorithms as threats evolve. Does that sound like something we should include here or in the MetaMask plugin itself?
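The false positive/negative reporting mechanism could carry a payload along the lines of the hypothetical sketch below. The field names and the `makeFeedbackReport` helper are illustrative assumptions, not an agreed format. A report is produced only when the user disagrees with the detector's verdict.

```typescript
// Hypothetical feedback payload for reporting misclassifications.
type Verdict = "phishing" | "safe";

interface FeedbackReport {
  url: string;
  apiVerdict: Verdict;  // what the detector said
  userVerdict: Verdict; // what the user believes is correct
  kind: "false_positive" | "false_negative";
  reportedAt: string;   // ISO-8601 timestamp
}

// Returns null when the user agrees with the verdict (nothing to report).
function makeFeedbackReport(
  url: string,
  apiVerdict: Verdict,
  userVerdict: Verdict,
  now: Date = new Date(),
): FeedbackReport | null {
  if (apiVerdict === userVerdict) return null;
  return {
    url,
    apiVerdict,
    userVerdict,
    // Flagged as phishing but actually safe => false positive; the reverse => false negative.
    kind: apiVerdict === "phishing" ? "false_positive" : "false_negative",
    reportedAt: now.toISOString(),
  };
}
```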

@409H
Collaborator

409H commented Sep 12, 2017

Ah, thanks for the information.

This should be a discussion for this repository, as it's a library loaded by the MetaMask plugin; it's not MetaMask's intended purpose (AFAIK; I don't work on MetaMask, I just maintain the blacklists).

cc: @kumavis (to read above discussion)

@kumavis
Member

kumavis commented Sep 12, 2017

Privacy is a big concern. Perhaps that could be improved by (1) making it opt-in and (2) limiting the info sent, such as only asking about web3-enabled sites.
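The two mitigations could be combined in a small gate, sketched here under explicit assumptions. The rule that a site counts as "web3-enabled" once the page first touches the injected provider is a guess at how that could be detected, not a MetaMask mechanism. Likewise, sending only the origin and doing one lookup per origin are illustrative choices. `Web3OnlyGate` and `onProviderAccess` are hypothetical names.

```typescript
// Hypothetical gate: (1) only when the user has opted in, and
// (2) only for sites that actually use web3 (assumed here to mean the page
// has accessed the injected provider at least once).

class Web3OnlyGate {
  private checkedOrigins = new Set<string>();

  constructor(
    private readonly optedIn: () => boolean,
    private readonly check: (origin: string) => void, // e.g. queries the phishing API
  ) {}

  // Intended to be called from the provider shim the first time a page uses it.
  onProviderAccess(pageUrl: string): void {
    if (!this.optedIn()) return;
    // Limit the info sent: origin only, no path, query string or fragment.
    const origin = new URL(pageUrl).origin;
    if (this.checkedOrigins.has(origin)) return; // at most one lookup per origin
    this.checkedOrigins.add(origin);
    this.check(origin);
  }
}
```

Pages that never touch the provider would never trigger a lookup. This keeps the data sent proportional to actual web3 usage.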

MetaMask is free software, so it's unlikely that we would pay for the service in the long term. We would need a way for users to pay for the service themselves.

Also, I think MetaMask is early enough that we shouldn't be focusing much effort on off-by-default features right now.

@mslipper
Author

Got it. Seems like it's not the right time for you guys to integrate something like this. I'm going to close this issue out - let's stay in touch!

kumavis closed this as completed Sep 22, 2017
@danfinlay
Contributor

We've talked about this more, and the biggest issue right now is that we're about to launch a new MetaMask platform, which requires security work that is even more pressing for our team than phishing.

That said, the right balance of user opt-in and privacy features could still make this an appealing path in the future, so I'm re-opening the issue to represent a desire to have an ongoing conversation, and continue exploring how your service could help use cases like ours.

A few other things we're interested in:

  • How much data is required to be sent to the API per site? How confidential is that data potentially?
  • How much of this system could be decentralized? Could we ship an obfuscated webassembly bundle that represents a recent form of your detector, so that the logic remains client-side?

danfinlay reopened this Sep 22, 2017
@mslipper
Author

We only need the URL of the site that the user is visiting and an API key to authenticate with our system. We don't collect or store any data that could be used to identify individual people. Some URLs could potentially be sensitive; for example, a query string could contain search terms that de-anonymize a user. However, that can easily be mitigated by sanitizing URLs client-side before they are sent. API keys belong to a single customer, which in this case would be the MetaMask organization, not your individual users.
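Client-side URL sanitization could be as simple as the hypothetical helper below. It uses the standard WHATWG `URL` API to drop the query string, the fragment and any embedded credentials before the URL is sent. The name `sanitizeUrl` is illustrative. The helper does not address identifiers embedded in the path itself, which would need site-specific handling.

```typescript
// Hypothetical sanitizer: strip the parts of a URL most likely to carry
// personal data (credentials, query string, fragment) before sending it.
function sanitizeUrl(raw: string): string {
  const u = new URL(raw);
  u.username = ""; // e.g. "user" in https://user:pw@host/
  u.password = "";
  u.search = "";   // drops "?q=..." search terms
  u.hash = "";     // drops "#..." fragments
  return u.toString();
}
```

For example, `sanitizeUrl("https://user:pw@example.com/search?q=my+name#top")` yields `"https://example.com/search"`.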

Of course, we take data confidentiality extremely seriously. I come from a security background, so keeping sensitive data away from prying eyes is near and dear to my heart. We build products that follow industry-standard security practices. I'm happy to go into more depth here if you'd like, just not in a public GitHub comment since I don't want to make the job of any potential attackers easier.

It's going to be very difficult for us to bundle the detector model without essentially open-sourcing our entire product. The model also uses a number of features that are expensive to generate and cache client-side, which could degrade the user experience of the plugin.
