A few people have noted that SHA-1 is probably overkill.
EKR points out:
Considering the fact that N and P are both defined to be 5 bits, it would be preferable to use a hash function that emits a 62-bit or longer value.
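To make the arithmetic concrete: if the N and P exponents are each 5 bits, then N and P can each be as large as 2^31, so an entry consumes up to log2(N*P) = 62 hash bits. A small sketch (the truncation scheme here is my assumption of how the draft's hashing works, not quoted from it; SHA-256 is just a placeholder for whatever hash we pick):

```python
import hashlib


def digest_bits(n: int, p: int) -> int:
    """Hash bits consumed per digest entry: log2(N * P).
    Assumes N and P are powers of two, per their exponent encoding."""
    return (n * p).bit_length() - 1


def hash_url(url: str, n: int, p: int) -> int:
    """Truncate a wide hash to log2(N*P) bits. SHA-256 is a stand-in;
    this thread is about what should replace it."""
    wide = int.from_bytes(hashlib.sha256(url.encode()).digest()[:8], "big")
    return wide >> (64 - digest_bits(n, p))
```

With both exponents maxed out, `digest_bits(2**31, 2**31)` is 62, which is where the 62-bit requirement comes from.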
xxHash seems good, but see Cyan4973/xxHash#49.
Likewise, the other hashes mentioned there are also defined only by their source code, not by a spec.
I think our options are:
I'm kind of leaning towards (4), possibly leaving extensibility for other hash algorithms (although, as discussed before, it's not clear how we'd negotiate for that; maybe #22 could help).
Don't know what the relevance of #22 is.
I would say that this seems like a good basis for option 2: https://github.com/aappleby/smhasher/wiki/MurmurHash3 These hashes are necessarily very simple programs, so they're easy to specify; a draft shouldn't be hard to write if one is needed. Of course, I'm lazy just like you.
I take it you're volunteering, then...
"Internet-Draft seeks editor, must like long walks on the beach, sunsets and have poor impulse control."
Fat fingers, bah.
I lean towards (4).
If we are actually going to choose a simpler algorithm than SHA, I think the choice should be made based on the popularity and the quality of the hash function.
In that respect, SipHash-2-4 might be a good candidate. It is known to be fairly fast, and is used as a basis by many hash table implementations (thanks to its resistance to hash-flooding collision attacks).
I am not sure how collisions can be used to attack Cache Digests, but it might still be a good idea to have collision resistance, considering that an attacker can insert an entry into the digest by steering the victim's browser to a specific resource on the target's website (using IMG links, etc.).
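For what it's worth, SipHash-2-4 is small enough that specifying it in a draft would be straightforward. A minimal pure-Python sketch (initialization constants and round function per the SipHash paper):

```python
MASK = 0xFFFFFFFFFFFFFFFF  # keep everything in 64 bits


def _rotl(x: int, b: int) -> int:
    return ((x << b) & MASK) | (x >> (64 - b))


def siphash24(key: bytes, data: bytes) -> int:
    """SipHash-2-4 of `data` under a 16-byte `key`, as a 64-bit int."""
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:16], "little")
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    def rounds(n: int) -> None:
        nonlocal v0, v1, v2, v3
        for _ in range(n):
            v0 = (v0 + v1) & MASK; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
            v2 = (v2 + v3) & MASK; v3 = _rotl(v3, 16); v3 ^= v2
            v0 = (v0 + v3) & MASK; v3 = _rotl(v3, 21); v3 ^= v0
            v2 = (v2 + v1) & MASK; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)

    # Compress full 8-byte blocks (2 rounds each).
    tail = len(data) % 8
    body_end = len(data) - tail
    for i in range(0, body_end, 8):
        m = int.from_bytes(data[i:i + 8], "little")
        v3 ^= m
        rounds(2)
        v0 ^= m

    # Final block: remaining bytes, with the message length in the top byte.
    last = ((len(data) & 0xFF) << 56) | int.from_bytes(data[body_end:], "little")
    v3 ^= last
    rounds(2)
    v0 ^= last

    # Finalization: 4 rounds.
    v2 ^= 0xFF
    rounds(4)
    return v0 ^ v1 ^ v2 ^ v3
```

The 64-bit output could be truncated to log2(N*P) bits as with any other candidate. Note that SipHash is keyed: we'd have to either fix the 128-bit key in the spec (giving up the collision resistance) or negotiate it, which is its own design question.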
Any attack on cache digests is likely to be an advance on cache-timing attacks where the attacker tries to learn whether a particular resource is in the cache.
Say that we have a secret in a URI path and the attacker wants to learn that secret. If the resource is ordinarily pushed alongside a parent document, AND neither the parent document nor the secret resource is cacheable, AND the 404 page on the site IS cacheable, then you have a situation that can be exploited.
The attacker includes the parent document in an iframe and measures the time to load the document when the secret is pushed. The attacker then iteratively tests different URIs by forcing a load of a non-existent, but potentially colliding resource then reloading the iframe. A collision will be apparent when the iframe load takes significantly longer (one round trip). At that point, the attacker need only generate hash pre-images (since none of the fast hashes claims to have pre-image resistance, this isn't that difficult given sufficient constraints on the shape of the secret) until it finds the secret.
This has a false positive rate that would have to be corrected by a more rigorous test on potential candidates, but it seems like it could be feasible given enough time. Everyone give thanks to @w3c/hr-time for making this possible!
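The collision-search step above is cheap to demonstrate. In this sketch, truncated SHA-256 stands in for whatever narrow hash the digest ends up using, and the URLs are made up; with a b-bit digest, a collision takes about 2^b attempts on average:

```python
import hashlib
import itertools


def truncated_digest(url: str, bits: int) -> int:
    """First `bits` bits of SHA-256(url) -- a stand-in for the (narrow)
    digest hash; real cache digests use few bits per entry."""
    wide = int.from_bytes(hashlib.sha256(url.encode()).digest()[:8], "big")
    return wide >> (64 - bits)


def find_collision(target_url: str, bits: int, probe_prefix: str) -> str:
    """Brute-force a probe URL whose truncated digest collides with
    target_url's. The attacker forces loads of such URLs to probe the cache."""
    target = truncated_digest(target_url, bits)
    for i in itertools.count():
        candidate = f"{probe_prefix}{i}"
        if candidate != target_url and truncated_digest(candidate, bits) == target:
            return candidate
```

For a 12-bit digest entry this returns after a few thousand hash evaluations, i.e. essentially instantly; only the final pre-image search over the secret's constrained shape costs real work.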
Similarly, if the pushed resource is cached, then the attacker might find a way to have the server serve (and conditionally push) resources with a URI component that is attacker-controlled. That would enable a faster attack too.
This attack might be used to recover whose Twitter feed someone has viewed. Even though user images are bucketed in size to resist traffic analysis, this sort of attack relies on a different side channel that I doubt has been properly safeguarded.
This is rendered less effective by choosing a hash with pre-image resistance. It's still relatively easy to engineer a collision (particularly since cache digests are necessarily very narrow), but the hash can't be reversed to find the URL. It could still be used as a confirmation step. While the space of potential Twitter image URLs is obscenely large, the set of accounts that are interesting to an attacker might be easily enumerable (if not all accounts, for sufficiently resourceful attackers).
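To put rough numbers on that confirmation step: with a b-bit digest entry and K candidate URLs to test, the expected number of spurious matches is K / 2^b (the figures below are illustrative, not measurements):

```python
def expected_false_positives(candidates: int, digest_bits: int) -> float:
    """Expected spurious matches when testing `candidates` URLs against one
    `digest_bits`-bit digest entry, assuming the hash output is uniform."""
    return candidates / 2 ** digest_bits


# e.g. ten million candidate image URLs against a 20-bit entry leaves
# only on the order of ten candidates needing a slower confirmation check.
```

So even a narrow digest prunes an enumerable candidate set down to a handful of URLs, which is why pre-image resistance alone doesn't kill the attack, it just slows the initial guessing.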
Code & results: