Data Validation #314
-
Hi! In the current lib, we already guard against some attacks that exploit the compression algorithms through fuzz testing. Scenarios where clients covertly store data just to occupy storage can be handled outside of this lib or the sync protocol; a simple approach is to cap the data size per document or per user. In p2p scenarios, it is best to combine this with signatures to verify editors' permissions.

The primary guarantee should be consistency, so that other users and administrators can detect malicious behavior. With traceable history, we can undo the malicious actions to create a new document, and administrators can continue collaboration on that new document after excluding the malicious users.

Even without malicious intent, schema violations can still occur. Schema validation is probably best done at the application layer rather than by ignoring the offending update, because dropping updates can easily break consistency. For example, suppose a schema requires a list to have only three elements and the list already holds two, while A and B concurrently insert new elements a and b. If B's insert is rejected according to some rule, there might also be an operation concurrent with B that deletes a, which should have exposed b again. In this scenario, the application layer can enforce the schema on read by slicing the first three elements (see the sketch below).

In a centralized scenario, with validations such as "user A does not have permission to change the x field" or "each element within a list should be text", we can run user-provided closures in the import function to reject certain imports. Do you have a strong demand for such requirements?
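For illustration, here is a minimal TypeScript sketch of both ideas. It deliberately avoids any Loro-specific API: it operates on a plain JSON snapshot of the document (whatever shape your toJSON-style export produces), and the `DocJson` type, `readItems`, and the two closures are hypothetical names, just to show the pattern.

```ts
// Hypothetical application-layer enforcement for the examples above.
// Everything operates on plain JSON snapshots of the document
// (whatever the CRDT library exports), not on library internals.

type DocJson = {
  items: unknown[]; // schema: at most three elements, each must be text
  x?: string;       // schema: only certain users may change this field
};

// "Enforce on read": instead of rejecting a concurrent insert (which can
// break convergence), render only the first three elements.
function readItems(doc: DocJson): unknown[] {
  return doc.items.slice(0, 3);
}

// Centralized import check: user-provided closures that compare the
// document state before and after applying a peer's update and decide
// whether to accept it. How `before`/`after` are obtained (e.g. by applying
// the update to a scratch copy of the doc) depends on the library version.
type ImportCheck = (before: DocJson, after: DocJson, author: string) => boolean;

const checks: ImportCheck[] = [
  // Rule: user "A" is not allowed to change the `x` field.
  (before, after, author) => author !== "A" || before.x === after.x,
  // Rule: every element in the list must be text.
  (_before, after) => after.items.every((el) => typeof el === "string"),
];

function acceptImport(before: DocJson, after: DocJson, author: string): boolean {
  return checks.every((check) => check(before, after, author));
}
```

The point of rejecting an import wholesale, rather than partially applying it, is that every peer or server running the same checks ends up in the same state, which preserves the consistency guarantee described above.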
-
Let me start by saying: impressive work on loro by the team!
I have been researching some other CRDT and OT libraries, and one of the main issues I found with all of them is data validation. One of the risks of the p2p model is that you just can't trust other peers, so you have to do validation on the client and/or on a central authoritative server. One risk is invalid data; another is a peer filling a data type with garbage just to blow up storage space.
So to check whether the update from a peer contains valid data, we must apply the update to the whole dataset, check it against our validation (JSON/zod) schema, and then accept or reject it (see the sketch below). The same goes for size limitations. This takes a lot of CPU time for frequent real-time updates.
I'm wondering what your approach to these is, and whether you are going to tackle both of these problems at the protocol level?
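To make the flow concrete, here is a small sketch of that accept/reject step using zod. Everything except the zod calls is an assumption: `applyToScratchCopy` stands in for "apply the update to a throwaway fork of the doc and export its JSON", and its exact calls depend on the CRDT library and version.

```ts
import { z } from "zod";

// Illustrative schema for the document's JSON export.
const DocSchema = z.object({
  items: z.array(z.string()).max(3),
});

const MAX_UPDATE_BYTES = 64 * 1024; // example per-update size cap

// Decide whether to accept a peer's update. `applyToScratchCopy` is a
// placeholder for applying the update to a scratch copy of the current doc
// and exporting its JSON; running this for every real-time update is the
// CPU cost described above.
function acceptUpdate(
  updateBytes: Uint8Array,
  applyToScratchCopy: (update: Uint8Array) => unknown
): boolean {
  if (updateBytes.byteLength > MAX_UPDATE_BYTES) return false;
  const nextJson = applyToScratchCopy(updateBytes);
  return DocSchema.safeParse(nextJson).success;
}
```

The byte-length check covers the size-limitation case; the schema parse covers data validity.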