[Enhancement] Add IPFS URL heuristic #4310

twesterhever · 2022-10-15T11:53:06Z

Given IPFS's popularity among miscreants for phishing hosting and malware dissemination, a URL containing both "ipfs" and a random string reminiscent of an IPFS content identifier is a strong sign of maliciousness (I have never seen a legitimate IPFS URL so far, certainly not in mail traffic).

Please note that while IPFS CIDv0s are easy to parse due to their fixed syntax, CIDv1s have neither a fixed length nor a fixed character set. To keep miscreants from bypassing this heuristic by switching to hashing algorithms with larger output, the CIDv1 regexp is rather fuzzy, catching anything alphanumeric between 45 and 256 characters. Most CIDv1s seen so far, however, stayed between 60 and 120 characters.

See https://docs.ipfs.tech/concepts/content-addressing/ for details on CIDs, and how to parse them.
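
To make the two alternatives concrete, here is a minimal Python sketch of the heuristic as described above. It is not the rspamd rule itself: the function name, the return labels, and the tokenization on non-alphanumeric characters are assumptions made for illustration, whereas the actual rule applies an unanchored, case-insensitive regexp to the URL.

```python
import re
from typing import Optional

# Patterns mirroring the two alternatives proposed in this PR.
# CIDv0: "Qm" followed by 44 alphanumeric characters (46 in total).
CIDV0_RE = re.compile(r"qm[a-z0-9]{44}", re.IGNORECASE)
# CIDv1 (fuzzy): any alphanumeric run of 45 to 256 characters.
CIDV1_FUZZY_RE = re.compile(r"[a-z0-9]{45,256}", re.IGNORECASE)


def classify_ipfs_url(url: str) -> Optional[str]:
    """Return 'cidv0', 'cidv1-candidate', or None (hypothetical labels)."""
    # The heuristic only applies to URLs that mention "ipfs" at all.
    if "ipfs" not in url.lower():
        return None
    # Assumption: split the URL into alphanumeric tokens and test each one.
    for token in re.split(r"[^A-Za-z0-9]+", url):
        # Check CIDv0 first, because the fuzzy pattern would match it as well.
        if CIDV0_RE.fullmatch(token):
            return "cidv0"
        if CIDV1_FUZZY_RE.fullmatch(token):
            return "cidv1-candidate"
    return None
```

The example hosts used to exercise this sketch would be synthetic; no real gateway or CID is implied.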

vstakhov

The rule itself looks good, but I'd suggest thinking about its performance.

vstakhov · 2022-10-19T20:24:14Z

rules/regexp/misc.lua

+-- characters (CIDv0), or a CIDv1 of an alphanumerical string of unspecified length,
+-- depending on the hash algorithm used.
+local ipfs_cid = '/(qm[a-z0-9]{44}|[a-z0-9]{45,256})/{url}i'


This regexp will perform very badly in Hyperscan (and probably in PCRE as well). I'd appreciate it if we could use something other than {45,256} here.

twesterhever · 2022-10-28

Thank you. I will look into this and try to come up with a less costly regexp for parsing IPFS CIDv1s.

As requested by @vstakhov in rspamd#4310 (review), try to limit the performance impact of this regular expression. However, given that there does not seem to be a hard limit for CIDv1s in IPFS itself, using a hashing algorithm with large output may permit miscreants to get around this rule.

twesterhever · 2022-11-06T14:52:12Z

The rule itself looks good, but I'd suggest thinking about its performance.

Having worked through the CIDv1 specification, the only things we can do against the performance costs of this regexp getting out of hand are:

Check whether a possible CIDv1 string starts with a multibase prefix
Limit the anticipated size of the total CIDv1 to something like 128 bytes (I have seen 110-byte CIDv1s in the wild, so anything shorter does not seem to make sense).
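
The two measures above can be sketched in Python as follows. This is an illustration under stated assumptions, not the regexp from the commits: the prefix set is a small subset of the multibase table restricted to alphanumeric encodings (b = base32, f = base16, k = base36, z = base58btc), and the names `MULTIBASE_PREFIXES`, `MAX_CID_LEN`, and `looks_like_cidv1` are hypothetical.

```python
import re

# Assumption: illustrative subset of multibase prefixes whose alphabets are
# purely alphanumeric. The rule in this PR may use a different list.
MULTIBASE_PREFIXES = "bfkz"
# Upper bound on the total CIDv1 length, as suggested in the comment above.
MAX_CID_LEN = 128
# Lower bound on the total length, kept from the original {45,256} pattern.
MIN_CID_LEN = 45

# One prefix character, then an alphanumeric body. The body bounds are the
# total bounds minus the single prefix character.
CIDV1_RE = re.compile(
    r"[%s][a-z0-9]{%d,%d}" % (MULTIBASE_PREFIXES, MIN_CID_LEN - 1, MAX_CID_LEN - 1),
    re.IGNORECASE,
)


def looks_like_cidv1(token: str) -> bool:
    """Return True if the whole token is a plausible CIDv1 under these assumptions."""
    return CIDV1_RE.fullmatch(token) is not None
```

Requiring a known leading character and capping the repetition both narrow what the engine has to consider, at the cost already noted in this thread: a CIDv1 longer than the cap, or one using a prefix outside the chosen set, would not be matched.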

I added commits implementing these changes. @vstakhov: What do you think of it?

vstakhov · 2022-11-06T17:53:36Z

@citrin has some thoughts about this rule, so he will probably comment a little later. Thank you for working on that!

twesterhever · 2022-12-04T21:27:29Z

ping -

twesterhever · 2023-02-17T14:28:13Z

Are any further changes needed from my end? I continue to observe IPFS phishing frequently, and would love to see this rule make it into rspamd to provide better detection for its users.

[Enhancement] Add IPFS URL heuristic

39aeb39

vstakhov requested changes Oct 19, 2022

twesterhever added 3 commits November 6, 2022 14:31

[Minor] Clarify that IPFS *gateway* URLs are likely considered malicious

9ac1a75

[Minor] Implement multibase prefixes for IPFS gateway URL rule

ac6d1a6

twesterhever added 2 commits November 6, 2022 17:49

[Minor] Fix rule comment

b178156

[Minor] Regexp is case-insensitive, omit redundant characters

2a9abee

Merge branch 'master' into temp-add-ipfs-heuristics

4dfb85f

vstakhov merged commit d31dde9 into rspamd:master Feb 20, 2023
