Data Validation #314
-
Hi! In the current lib, we already guard against some attacks that exploit the compression algorithms through fuzz testing. Scenarios where clients covertly store data just to occupy storage can be handled outside of this lib or the sync protocol; a simple approach is to cap the data size per document or per user. In p2p scenarios, it is best to combine this with signatures to verify editors' permissions.

The primary guarantee should be consistency, so that other users and administrators can detect malicious behavior. With traceable history, we can undo the malicious actions to create a new document, and administrators can continue collaboration on that new document after excluding the malicious users.

Even without malicious intent, schema violations can still occur. Schema validation is probably best done at the application layer rather than by ignoring the offending update, because dropping updates can easily break consistency. For example, suppose a schema requires a list to have only three elements and the list already holds two, while A and B concurrently insert new elements a and b. If B's insert is rejected according to some rule, there might also be an operation concurrent with B that deletes a, which should have exposed b again. In this scenario, the application layer can enforce the schema on read by slicing the first three elements (see the sketch below).

In a centralized scenario, with validations such as "user A does not have permission to change the x field" or "each element within a list should be text", we can run user-provided closures in the import function to reject certain imports. Do you have a strong demand for such requirements?
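For illustration, here is a minimal TypeScript sketch of both ideas. It deliberately avoids any Loro-specific API: it operates on a plain JSON snapshot of the document (whatever shape your toJSON-style export produces), and the `DocJson` type, `readItems`, and the two closures are hypothetical names, just to show the pattern.

```ts
// Hypothetical application-layer enforcement for the examples above.
// Everything operates on plain JSON snapshots of the document
// (whatever the CRDT library exports), not on library internals.

type DocJson = {
  items: unknown[]; // schema: at most three elements, each must be text
  x?: string;       // schema: only certain users may change this field
};

// "Enforce on read": instead of rejecting a concurrent insert (which can
// break convergence), render only the first three elements.
function readItems(doc: DocJson): unknown[] {
  return doc.items.slice(0, 3);
}

// Centralized import check: user-provided closures that compare the
// document state before and after applying a peer's update and decide
// whether to accept it. How `before`/`after` are obtained (e.g. by applying
// the update to a scratch copy of the doc) depends on the library version.
type ImportCheck = (before: DocJson, after: DocJson, author: string) => boolean;

const checks: ImportCheck[] = [
  // Rule: user "A" is not allowed to change the `x` field.
  (before, after, author) => author !== "A" || before.x === after.x,
  // Rule: every element in the list must be text.
  (_before, after) => after.items.every((el) => typeof el === "string"),
];

function acceptImport(before: DocJson, after: DocJson, author: string): boolean {
  return checks.every((check) => check(before, after, author));
}
```

The point of rejecting an import wholesale, rather than partially applying it, is that every peer or server running the same checks ends up in the same state, which preserves the consistency guarantee described above.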
-
Let me start by saying: impressive work on loro by the team!
I have been researching some other CRDT and OT libraries, and one of the main issues I found with all of them is data validation. One of the risks of the p2p model is that you just can't trust other peers, so you have to do validation on the client and/or on a central authoritative server. One risk is invalid data; another is a peer filling a data type with garbage just to blow up storage space.
So to check whether the update from a peer contains valid data, we must apply the update to the whole dataset, check it against our validation (JSON/zod) schema, and then accept or reject it (see the sketch below). The same goes for size limitations. This takes a lot of CPU time for frequent real-time updates.
I'm wondering what your approach to these is, and whether you are going to tackle both of these problems at the protocol level?
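To make the flow concrete, here is a small sketch of that accept/reject step using zod. Everything except the zod calls is an assumption: `applyToScratchCopy` stands in for "apply the update to a throwaway fork of the doc and export its JSON", and its exact calls depend on the CRDT library and version.

```ts
import { z } from "zod";

// Illustrative schema for the document's JSON export.
const DocSchema = z.object({
  items: z.array(z.string()).max(3),
});

const MAX_UPDATE_BYTES = 64 * 1024; // example per-update size cap

// Decide whether to accept a peer's update. `applyToScratchCopy` is a
// placeholder for applying the update to a scratch copy of the current doc
// and exporting its JSON; running this for every real-time update is the
// CPU cost described above.
function acceptUpdate(
  updateBytes: Uint8Array,
  applyToScratchCopy: (update: Uint8Array) => unknown
): boolean {
  if (updateBytes.byteLength > MAX_UPDATE_BYTES) return false;
  const nextJson = applyToScratchCopy(updateBytes);
  return DocSchema.safeParse(nextJson).success;
}
```

The byte-length check covers the size-limitation case; the schema parse covers data validity.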