This repository has been archived by the owner on Jun 6, 2023. It is now read-only.

make sphinx replay detection more time efficient #52

Open
david415 opened this issue Feb 17, 2019 · 6 comments

@david415
Member

very obviously the current sphinx replay detection is not as fast as it could be. certainly the current design is very simple; that is its only virtue.

@burdges

burdges commented Feb 17, 2019

At some point, I concluded a cuckoo filter might support inserts into a large but still in-memory table faster than a Bloom filter, because an insert touches only two candidate buckets (two cache misses), while a Bloom filter insert always requires k probes.
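For concreteness, here is a minimal, illustrative cuckoo filter sketch in Python (not the katzenpost implementation; all names and parameters are invented for this example). It shows the property the comment above points at: every item has exactly two candidate buckets, so a lookup or insert touches at most two cache lines. One caveat for replay detection: when the table gets too full an insert can fail or displace a fingerprint past the kick limit, so a production design has to size the table so that never happens if zero false negatives are required.

```python
import hashlib
import random

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

class CuckooFilter:
    """Toy cuckoo filter using the partial-key trick: the alternate
    bucket of an item is i ^ hash(fingerprint), which is an involution
    when num_buckets is a power of two."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        assert num_buckets & (num_buckets - 1) == 0, "power of two required"
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: bytes) -> int:
        return (_h(b"fp" + item) & 0xFFFF) or 1  # non-zero 16-bit tag

    def _indexes(self, item: bytes, fp: int):
        i1 = _h(item) % self.num_buckets
        i2 = (i1 ^ _h(fp.to_bytes(2, "big"))) % self.num_buckets
        return i1, i2

    def insert(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        for i in (i1, i2):                      # at most two bucket probes
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = random.choice((i1, i2))             # both full: start evicting
        for _ in range(self.max_kicks):
            victim = self.buckets[i].pop(random.randrange(len(self.buckets[i])))
            self.buckets[i].append(fp)
            fp = victim
            i = (i ^ _h(fp.to_bytes(2, "big"))) % self.num_buckets
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # table too full; a fingerprint was dropped

    def contains(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]
```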

@david415
Member Author

fair enough, that's a worthwhile optimization and we can do it for sure... however i am thinking of a much more involved optimization. currently our sphinx replay detection is described in full detail in this document here:
https://github.com/katzenpost/docs/blob/master/specs/sphinx_replay_detection.rst

currently we actually fully process the sphinx packet before doing replay detection, whereas we could first match the entire packet via some kind of hash map first... but only insert into the replay cache after the DH and MAC validation. in other words, we can have multiple fast paths which are faster than the current fastest path.
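A minimal sketch of that idea in Python (all function names below are invented stand-ins, not the real katzenpost sphinx API): a cheap whole-packet digest lookup rejects byte-identical replays before any DH work, while both caches are written only after the DH and MAC have validated.

```python
import hashlib

# Hypothetical stand-ins for the expensive sphinx processing steps.
def dh_and_mac_verify(packet: bytes):
    # toy stand-in for the DH + MAC validation; returns (replay_tag, ok)
    tag = hashlib.sha256(b"dh" + packet).digest()[:16]
    return tag, True

def sprp_decrypt(packet: bytes) -> bytes:
    return packet  # toy stand-in for the wide-block SPRP decryption

seen_packets = set()   # fast path: digests of whole packets already seen
replay_cache = set()   # existing authoritative cache, keyed by replay tag

def process(packet: bytes):
    digest = hashlib.blake2b(packet, digest_size=16).digest()
    if digest in seen_packets:
        return None        # fast path: byte-identical replay, no DH/MAC done
    tag, ok = dh_and_mac_verify(packet)
    if not ok or tag in replay_cache:
        return None        # forged packet, or a replayed key exchange
    seen_packets.add(digest)   # commit to both caches only after validation
    replay_cache.add(tag)
    return sprp_decrypt(packet)
```

An attacker who re-randomizes the packet bytes misses the digest map, but still hits the existing replay-tag cache after the DH, so the fast path adds a shortcut without weakening the authoritative check.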

@burdges

burdges commented Feb 18, 2019

Oh? I'd believed replay protection should check exclusively the key exchange result because replay attacks might be on the key exchange itself, and attackers must obtain the correct key exchange result anyways. I think tricks like HDKD cannot be secure for mixnets anyways, so you cannot gain anything by doing the whole header.

I suppose MAC validation and decryption could happen concurrently, maybe improving performance, but one could imagine instead doing key exchange and replay protection in SGX or whatever, and later doing everything else outside SGX, so maybe I'd structure the code that way.

@david415
Member Author

what does HDKD mean?
actually we do gain something by matching the whole header, and maybe even the whole packet: a fast path. we can cheaply hash the packet and compare that digest against a hash map. that would be way faster than doing a DH and then a MAC.

as stated in that document i linked above, what we currently do now is completely process the entire sphinx packet, including the payload, and only then match the replay tag. slow as fuck.

and yes we would also check for replays of the key exchange. nothing i said is mutually exclusive with matching replays of the DH shared secrets; that's what we do now.

@Yawning
Contributor

Yawning commented Feb 18, 2019

Sigh.

So, the design rationale behind the current algorithm includes but is not limited to:

  • It's the simplest possible algorithm that:
    • Has zero false negatives (probably a non-negotiable invariant).
    • Has zero false positives (this might be ok to change).
    • Supports an extremely large number of entries (for uses where this isn't a requirement, performance doesn't matter anyway).
  • Certain people flipped out when I proposed faster alternatives back in the design phase.
  • An early-reject fast-path is only a net gain when the packet is a replay (because that saves work). If the fraction of packets that are replays is low (or non-existent), adding such a thing is a waste of CPU and memory, because you need to do the full packet processing regardless.
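That last point can be made quantitative with a one-line expected-cost model (the cost numbers below are invented purely for illustration): hashing every packet for the fast-path lookup is pure overhead unless the replay fraction exceeds the ratio of the hash cost to the full-processing cost.

```python
# Illustrative, made-up per-packet costs in microseconds:
C_HASH = 2.0   # hash the whole packet and probe the fast-path map
C_FULL = 60.0  # full DH + MAC (+ SPRP) processing

def cost_with_fastpath(p_replay: float) -> float:
    """Expected cost when every packet is hashed and looked up first,
    and only non-replays get the full processing."""
    return C_HASH + (1.0 - p_replay) * C_FULL

def cost_without_fastpath(p_replay: float) -> float:
    return C_FULL  # every packet is fully processed regardless

# The fast path only wins when the replay fraction exceeds this:
breakeven = C_HASH / C_FULL  # here: about 3.3% of traffic being replays
```

With no replays at all, the fast path strictly adds C_HASH to every packet, which is exactly Yawning's objection.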

@david415
Member Author

@Yawning I like the current design without the early-reject fast-path; however, the one obvious way it could be sped up is to do the replay cache lookup/insert after the DH and MAC but before the SPRP.

alternatively the packet's public key could be used for the replay cache lookup before the DH and MAC... but only inserted into the replay cache after the DH and MAC. it would probably make the code a bit messy; maybe that's why there was resistance from certain people during the design phase.
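A sketch of that second variant (again with invented stand-in functions, and the ASSUMPTION that the ephemeral public key sits at the front of the packet header): the lookup happens before any expensive work, but the insert is deferred until after authentication, so a forged packet cannot poison the cache.

```python
replay_cache = set()  # authoritative cache, keyed here by the public key

def mac_verify(packet: bytes, pub: bytes) -> bool:
    # toy stand-in for the expensive DH + MAC validation step
    return len(packet) > 32

def sprp_decrypt(packet: bytes) -> bytes:
    return packet  # toy stand-in for the wide-block SPRP decryption

def process(packet: bytes):
    # ASSUMPTION for this sketch: the ephemeral public key is the
    # first 32 bytes of the packet header.
    pub = packet[:32]
    if pub in replay_cache:
        return None           # replay rejected before any DH or MAC work
    if not mac_verify(packet, pub):
        return None           # forged packet: drop without touching the cache
    replay_cache.add(pub)     # insert only after authentication succeeds
    return sprp_decrypt(packet)   # SPRP runs last, after the replay check
```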

@burdges um yeah we could use a cuckoo filter... we certainly do not need the deletion feature, but having faster insertions would be good. I don't understand why you mentioned doing MAC validation and decryption concurrently. I've tried previously to explain to you that in a proper software router all of these operations should happen in a thread pool, so there is already plenty of concurrency happening. i'll say it again: if you are writing routers with tokio or whatever async thing then you are doing it wrong. here, read this: http://www.sosp.org/2001/papers/welsh.pdf
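The staged, thread-pooled design from the SEDA paper linked above can be sketched like this (stage handlers are trivial stand-ins for the replay-check, crypto, and forwarding work; this is an illustration of the architecture, not katzenpost code): each stage owns an input queue and a small worker pool, and hands results to the next stage's queue.

```python
import queue
import threading

def make_stage(handler, out_q, workers=2):
    """Spawn a SEDA-style stage: `workers` threads drain an input queue,
    apply `handler`, and push non-None results to the next stage."""
    in_q = queue.Queue()

    def loop():
        while True:
            item = in_q.get()
            if item is None:       # shutdown sentinel
                in_q.put(None)     # let sibling workers see it too
                return
            result = handler(item)
            if result is not None and out_q is not None:
                out_q.put(result)

    for _ in range(workers):
        threading.Thread(target=loop, daemon=True).start()
    return in_q

results = queue.Queue()
# three stages wired back to front: replay check -> crypto -> forward
forward_q = make_stage(lambda pkt: pkt + b"|fwd", results)
crypto_q = make_stage(lambda pkt: pkt + b"|crypto", forward_q)
ingress_q = make_stage(lambda pkt: None if pkt == b"replay" else pkt, crypto_q)

for p in (b"a", b"replay", b"b"):
    ingress_q.put(p)
```

The point of the design is that concurrency lives in the per-stage pools, so individual operations like MAC validation stay simple sequential code.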
