Phishing API #84

mslipper · 2017-09-11T17:30:39Z

Hi all,

My company, Spectrum Labs, has an anti-phishing API that's tuned for ICO sites and BTC wallets. I spoke briefly with @FlySwatter last week about integrating it with eth-phishing-detect; he seemed interested and suggested that I open an issue here.

The API works using a combination of document similarity algorithms and machine learning. It's able to recognize new phishing threats regardless of the URL that's hosting them. For example, the API was able to detect yesterday's new 'linknetwork.co' phishing site without that site being on a blacklist or having a similar URL to 'smartcontract.com/link', the real site.

My team is perfectly happy to perform all the integration work required to get this into the MetaMask plugin. Are there any requirements (beyond blocking phishing attacks, of course) that I should be aware of? I understand that privacy is a concern. How can I best answer any privacy questions you may have?

Best,

Matthew

409H · 2017-09-12T11:11:12Z

> The API works using a combination of document similarity algorithms and machine learning. It's able to recognize new phishing threats regardless of the URL that's hosting them.

I'm interested in how this is done. I assume you have an active whitelist that you're using? If so, how is that maintained?

mslipper · 2017-09-12T17:01:42Z

@409H yes, part of our system uses an active whitelist. We periodically crawl trusted ICO aggregators to build the whitelist and manually prune/add entries as necessary. We're also working with @FlySwatter to add new ICO sites to the whitelist as MetaMask learns about them. Is there anything specific you'd like to know?

The value of the API comes not from our whitelist per se but rather the fact that once a site is whitelisted we can detect scams the moment users encounter them. This avoids the need to manage an ever-growing blacklist that can be trivially circumvented.
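To illustrate the general idea of flagging a page by its content rather than its URL, here is a minimal sketch. It is not Spectrum Labs' actual algorithm, which is not described in this thread. It assumes a hypothetical whitelist that maps each trusted host to a snapshot of its page text, and it swaps in plain word-shingle Jaccard similarity for whatever combination of document similarity and machine learning the API really uses. The names `shingles`, `jaccard`, `looks_like_clone`, and the `0.6` threshold are all made up for this example.

```python
import re
from typing import Dict, Optional, Set, Tuple
from urllib.parse import urlsplit

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles found in text (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_like_clone(
    url: str,
    page_text: str,
    whitelist: Dict[str, str],  # hypothetical: trusted host -> snapshot of its text
    threshold: float = 0.6,     # arbitrary cut-off chosen for illustration
) -> Optional[str]:
    """Return the trusted host this page appears to imitate, or None."""
    host = (urlsplit(url).hostname or "").lower()
    if host in whitelist:
        # The page is served from a trusted host, so it is not a clone.
        return None
    page = shingles(page_text)
    for trusted_host, trusted_text in whitelist.items():
        if jaccard(page, shingles(trusted_text)) >= threshold:
            return trusted_host
    return None
```

Under these assumptions, a page served from an unknown host whose text closely matches a whitelisted site's snapshot is reported as imitating that site, even if its URL looks nothing like the real one and appears on no blacklist.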

409H · 2017-09-12T17:58:17Z

Ahh, that's great!

How would you envision us (not just MetaMask) using your blacklists? Would we send data to your server on every request, or something else? And would there be any cost or other factors to think about?

mslipper · 2017-09-12T18:39:42Z

The current thought is that each request would ping our server with the URL that's currently being viewed. I understand that there are some privacy concerns around that, so I think it'd make sense to make the feature opt-in. In the future (and, of course, depending on your buy-in) we might also do some client-side preprocessing of the onscreen data to improve performance. Rest assured, though, we'll explain what's being sent to our server every step of the way.

Regarding cost - we're currently in beta and you would be one of our first users. We won't charge anyone until we're confident that we're providing tangible value to your users. Once that happens we can discuss pricing. In sum: it won't be free forever but it won't cost anything until we know it's good :).

There's one last point I should mention: it would really help us if we could include a mechanism in this plugin that allows users to report false positives/negatives if they encounter them. That will help us update our algorithms as threats evolve. Does that sound like something we should include here or in the MetaMask plugin itself?

409H · 2017-09-12T18:52:21Z

Ah, thanks for the information.

This should be a discussion for this repository, as it's a library loaded by the MetaMask plugin. It's not MetaMask's intended purpose (AFAIK; I don't work on MetaMask, I just maintain the blacklists).

cc: @kumavis (to read above discussion)

kumavis · 2017-09-12T19:09:11Z

privacy is a big concern, perhaps that could be improved by (1) making it opt-in and (2) limiting the info sent such as only asking about web3-enabled sites

metamask is free software, it's unlikely that we would pay for the service in the long term. would need a way for users to pay for the service themselves.

also, i think metamask is early enough that we shouldn't be focusing much effort on off-by-default features right now
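The two privacy mitigations suggested in this comment (opt-in, and only asking about web3-enabled sites) could be combined into a single gate that runs before any URL leaves the browser. The sketch below is only an illustration: `PhishingCheckSettings`, `should_query_api`, and the `page_uses_web3` flag are hypothetical names, and how a plugin would actually decide that a page is "web3-enabled" is not specified in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhishingCheckSettings:
    """Hypothetical user preferences for the remote phishing check."""
    opted_in: bool = False         # off by default: nothing is sent unless enabled
    web3_sites_only: bool = True   # limit lookups to pages that use web3


def should_query_api(settings: PhishingCheckSettings, page_uses_web3: bool) -> bool:
    """Return True only if the current page's URL may be sent to the remote API."""
    if not settings.opted_in:
        return False
    if settings.web3_sites_only and not page_uses_web3:
        return False
    return True
```

With the defaults, `should_query_api(PhishingCheckSettings(), page_uses_web3=True)` returns `False`, so the feature stays off until the user explicitly enables it.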

mslipper · 2017-09-15T23:55:55Z

Got it. Seems like it's not the right time for you guys to integrate something like this. I'm going to close this issue out - let's stay in touch!

danfinlay · 2017-09-22T19:09:14Z

We've talked about this more, and the biggest issue right now is that we're about to launch a new MetaMask platform, which requires even more pressing security work for our team than phishing.

That said, the right balance of user opt-in and privacy features could still make this an appealing path in the future, so I'm re-opening the issue to represent a desire to have an ongoing conversation, and continue exploring how your service could help use cases like ours.

A few other things we're interested in:

- How much data is required to be sent to the API per site? How confidential is that data potentially?
- How much of this system could be decentralized? Could we ship an obfuscated webassembly bundle that represents a recent form of your detector, so that the logic remains client-side?

mslipper · 2017-09-22T21:33:01Z

We only need the URL of the site that the user is visiting and an API key to authenticate with our system. We don't collect or store any data that could be used to identify individual people. Some URLs could potentially be sensitive (for example, a query string could contain search terms that de-anonymize a user); however, that can be easily mitigated by performing client-side sanitization of those URLs. API keys belong to a single customer, which in this case would be the MetaMask organization, not your individual users.
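As a rough sketch of what the client-side sanitization mentioned above might look like, the function below drops the query string, the fragment, and any embedded credentials before a URL is sent anywhere, keeping only the scheme, host, port, and path. Which parts should be kept is an assumption here (a path can also carry sensitive data), and `sanitize_url` is a hypothetical helper, not part of eth-phishing-detect or of the API discussed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url: str) -> str:
    """Strip credentials, query string, and fragment from url.

    Keeps scheme, lowercased host, explicit port (if any), and path.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""   # hostname excludes any user:password@ prefix
    if ":" in host:
        # urlsplit removes the brackets around IPv6 literals; restore them.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Empty query and fragment components are omitted from the result.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))
```

For example, under these assumptions a URL such as `https://smartcontract.com/link?ref=abc` would be reduced to its scheme, host, and path before being included in a lookup request.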

Of course, we take data confidentiality extremely seriously. I come from a security background, so keeping sensitive data away from prying eyes is near and dear to my heart. We build products that follow industry-standard security practices. I'm happy to go into more depth here if you'd like, just not in a public GitHub comment since I don't want to make the job of any potential attackers easier.

It's going to be very difficult for us to bundle the detector model without essentially open-sourcing our entire product. The model also uses a number of features that are expensive to generate and cache client side, which could degrade the user experience of the plugin.

409H added the discussion label Sep 12, 2017

kumavis closed this as completed Sep 22, 2017

danfinlay reopened this Sep 22, 2017

mslipper closed this as completed Sep 25, 2017
