Should we support fast batch validation #1

Open
arj03 opened this issue Aug 3, 2022 · 5 comments
Comments

@arj03
Member

arj03 commented Aug 3, 2022

As discussed here ssb:message/sha256/Xf2JIAYPmJJ_0SRVoVQn3NwlC636pQbXmJW1HjLIKQ4=, it could be interesting to add support for validating only the signature of the last message in a batch. This is similar to what buttwoo does. The question is how we go about this. Should it be a config option?

It is significantly faster to sync with this trick. See the numbers here compared with this:

batching 1378
batching 1663
batching 1956
batching 2083
batching 2003
ok 1 wait for replication to complete
after 3000 ms bob has 9083
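
For readers unfamiliar with the trick, here is a minimal sketch of why checking only the last signature can be enough for an append-only hash chain. The message shape, `hashMsg`, `unsignedBytes`, `appendMsg` and `validateBatchFast` are all hypothetical and much simpler than the real classic format; only the Node.js `crypto` calls are real. It also does not check that the first message in the batch links to what is already stored locally.

```javascript
const crypto = require('node:crypto');

// Hypothetical, simplified message: { previous, sequence, content, signature }.
// The hash covers the whole message, including its signature.
function hashMsg(msg) {
  return crypto.createHash('sha256').update(JSON.stringify(msg)).digest('base64');
}

// Bytes that get signed: the message without its signature field.
function unsignedBytes(msg) {
  const { signature, ...rest } = msg;
  return Buffer.from(JSON.stringify(rest));
}

// Appends a signed message that links to the previous one by hash.
function appendMsg(feed, content, privateKey) {
  const prev = feed[feed.length - 1];
  const msg = {
    previous: prev ? hashMsg(prev) : null,
    sequence: feed.length + 1,
    content,
  };
  msg.signature = crypto.sign(null, unsignedBytes(msg), privateKey).toString('base64');
  feed.push(msg);
  return msg;
}

// Checks every hash link and sequence number, but verifies only the last signature.
function validateBatchFast(batch, publicKey) {
  if (batch.length === 0) return true;
  for (let i = 1; i < batch.length; i++) {
    if (batch[i].previous !== hashMsg(batch[i - 1])) return false;
    if (batch[i].sequence !== batch[i - 1].sequence + 1) return false;
  }
  const last = batch[batch.length - 1];
  const sig = Buffer.from(last.signature, 'base64');
  return crypto.verify(null, unsignedBytes(last), publicKey, sig);
}

const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
const feed = [];
for (let i = 0; i < 5; i++) appendMsg(feed, { text: `post ${i}` }, privateKey);

// Someone other than the author edits a message in the middle of the batch.
const tampered = feed.map((m) => ({ ...m }));
tampered[2].content = { text: 'edited' };

console.log(validateBatchFast(feed, publicKey)); // true
console.log(validateBatchFast(tampered, publicKey)); // false, the hash link to message 3 breaks
```

Because the last signature covers the hash of the message before it, which in turn covers the one before that, a third party cannot alter an earlier message without breaking a link. What this does not catch is an author whose own client produced a bad signature on an intermediate message.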
@staltz
Member

staltz commented Aug 3, 2022

Isn't this just an internal optimization of validateBatch? I mean, if the API doesn't change, and there are no side effects (are there no side effects?) other than faster perf, why not do it?

@arj03
Member Author

arj03 commented Aug 3, 2022

The biggest problem I can think of is that someone writes a new client and messes up the signatures so that in rare cases they don't validate. If we sync with this new validateBatch we might not detect those invalid messages, while another client that does full validation will stop replication at the first invalid message. In any case, all clients using this buggy version would also have the full chain. So in a way I don't really see the big problem. Forking a feed is a much bigger problem that we can't really do anything about.

To expand on that a bit, in both cases you have a portion of the network with one version of a feed and another with a different version. And the only way to allow that feed to continue (if that is even a goal?) would be to come to some conclusion of what the common root chain was and then delete everything after that. This is what people do in the forking situation (since they are in the minority :-)).

@staltz
Member

staltz commented Aug 4, 2022

validateBatch

Could the new validateBatch be:

  1. Validate only the last signature
  2. If it's valid, end.
  3. Else, loop over all messages in the batch and validate each signature
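
The three steps above could be sketched roughly like this. It is only an illustration of the control flow: `validateBatchWithFallback` and the injected `verifySignature` callback are hypothetical names, not the library's API, and the hash-link checks a real validator needs are left out.

```javascript
// Hypothetical sketch: `verifySignature(msg)` stands in for a real signature check
// and is injected so the control flow can be shown on its own.
function validateBatchWithFallback(batch, verifySignature) {
  if (batch.length === 0) return { valid: true, firstInvalid: -1 };

  // Steps 1 and 2: check only the last signature and stop if it is valid.
  if (verifySignature(batch[batch.length - 1])) {
    return { valid: true, firstInvalid: -1 };
  }

  // Step 3: the fast path failed, so check each message to find the first bad one.
  const firstInvalid = batch.findIndex((msg) => !verifySignature(msg));
  return { valid: false, firstInvalid };
}

// Fake messages whose `sigOk` flag plays the role of a signature result.
let calls = 0;
const verify = (msg) => {
  calls++;
  return msg.sigOk;
};

const good = [
  { seq: 1, sigOk: true },
  { seq: 2, sigOk: true },
  { seq: 3, sigOk: true },
];
const bad = [
  { seq: 1, sigOk: true },
  { seq: 2, sigOk: false },
  { seq: 3, sigOk: false },
];

const r1 = validateBatchWithFallback(good, verify);
const callsGood = calls; // the fast path needed a single verification
const r2 = validateBatchWithFallback(bad, verify);

console.log(r1, callsGood);
console.log(r2);
```

The fallback only costs extra work when the last signature fails, so the common case stays as fast as validating one signature per batch.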

Forking

Ah, I see what you mean. Replication happens before validation, and the peer who uploaded the messages will now mark the downloader peer as being at message N. Worse, the downloader could forward the invalid messages to another EBT peer, before (???) validation.

We can pretty easily handle the case of an incorrect SSB app by applying validateSingle on the first ~10 messages of the feed, and then validateBatch on all the remaining ones. If you coded signature generation incorrectly, it's probably very rare that signatures are sometimes correct, sometimes incorrect.
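
A rough sketch of that idea, assuming messages carry a `sequence` number. `validateFeedStart`, `FULL_CHECK_COUNT` and the injected `validateSingle` / `validateBatch` callbacks are hypothetical stand-ins and not the real functions.

```javascript
// Hypothetical sketch: fully validate the first messages of a feed,
// then use the fast batch check for everything after them.
const FULL_CHECK_COUNT = 10; // illustrative value for "the first ~10 messages"

function validateFeedStart(msgs, validateSingle, validateBatch) {
  // Messages with sequence <= FULL_CHECK_COUNT get a per-message check.
  const head = msgs.filter((m) => m.sequence <= FULL_CHECK_COUNT);
  const tail = msgs.filter((m) => m.sequence > FULL_CHECK_COUNT);

  if (!head.every((m) => validateSingle(m))) return false;
  return tail.length === 0 || validateBatch(tail);
}

// Usage with counting stubs in place of real validators.
let singleCalls = 0;
let batchCalls = 0;
const okSingle = () => {
  singleCalls++;
  return true;
};
const okBatch = () => {
  batchCalls++;
  return true;
};

const msgs = Array.from({ length: 25 }, (_, i) => ({ sequence: i + 1 }));
const accepted = validateFeedStart(msgs, okSingle, okBatch);

// A client that signs incorrectly is rejected before the batch path is reached.
const rejected = validateFeedStart(msgs, () => false, okBatch);

console.log(accepted, singleCalls, batchCalls);
console.log(rejected);
```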

@arj03
Member Author

arj03 commented Aug 4, 2022

> Could the new validateBatch be:
>
> 1. Validate only the last signature
> 2. If it's valid, end.
> 3. Else, loop over all messages in the batch and validate each signature

Well, if the last one fails, then we are screwed anyway. No need to try and salvage any of the other messages, I think. The nice thing about the throttle is that we basically get a random sampling of points on the chain.

> Ah, I see what you mean. Replication happens before validation, and the peer who uploaded the messages will now mark the downloader peer as being at message N.

Well, there would always have to be local validation first in any case. But yes, as for replication: if you send seq 100 for feed A to another peer, you assume that peer will have it, so you don't send it again. But the next time you connect with that peer, it will tell you that it never got the messages (this is similar to a crash before saving), so you send them again. Again, this is not really a big problem I think; the same could happen if you forked your feed.

> Worse, the downloader could forward the invalid messages to another EBT peer, before (???) validation.

I'm pretty sure we only send messages that we have in our db (meaning validated).

> We can pretty easily handle the case of an incorrect SSB app by applying validateSingle on the first ~10 messages of the feed, and then validateBatch on all the remaining ones. If you coded signature generation incorrectly, it's probably very rare that signatures are sometimes correct, sometimes incorrect.

Sure, we could do some validation of the first messages in the batch as well, but no matter what we do there is still the chance that we would accept messages that don't validate, unless we validate everything. I'm just arguing that the risk of this is rather low, especially for a format like classic, and that there are already other, bigger problems like forking, where some way of signalling to the other end that a feed is borked would be nice.

@staltz
Member

staltz commented Aug 4, 2022

Yeah, I got a bunch of details wrong, sorry.

About forking, I remember Aljoscha saying that fork recovery shouldn't be a thing because then you gain the ability to freely rewrite the past in whatever way you want, but I'm thinking that this is a very theoretical possibility, and maybe we should after all just implement fork recovery with a simple strategy like longest-fork-wins.
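
To make "longest-fork-wins" concrete, here is one possible reading of it as a sketch. The fork representation (`{ sequence, id }` arrays), the `resolveFork` name and the tie-breaking rule are all assumptions for illustration, not anything that exists in the codebase.

```javascript
// Hypothetical sketch: each fork is an array of { sequence, id } ordered by sequence,
// where `id` stands in for the message hash.
function resolveFork(forkA, forkB) {
  // Length of the shared prefix, i.e. the common root chain.
  let commonLength = 0;
  while (
    commonLength < forkA.length &&
    commonLength < forkB.length &&
    forkA[commonLength].id === forkB[commonLength].id
  ) {
    commonLength++;
  }

  // The longer fork wins. On a tie this sketch arbitrarily keeps forkA.
  const winner = forkB.length > forkA.length ? forkB : forkA;
  return { commonLength, winner };
}

const toFork = (ids) => ids.map((id, i) => ({ sequence: i + 1, id }));
const forkA = toFork(['m1', 'm2', 'a3']);
const forkB = toFork(['m1', 'm2', 'b3', 'b4']);

const result = resolveFork(forkA, forkB);
console.log(result.commonLength, result.winner.map((m) => m.id));
```

A peer holding the losing fork would delete everything after `commonLength` and replicate the winner from there. A real strategy would need a deterministic tie-break that every peer agrees on.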


2 participants