Specify how we deal with new operations on deleted documents #172

adzialocha · 2022-05-28T07:54:54Z

What happens when a deleted document receives update operations? As it is deleted, its fields have no value by definition, but does the view id still change? Or is it not possible to apply any operation to a deleted document?
What if there are multiple delete operations? Does each one create a new document view id, or is the document "frozen" after the first one?

Originally posted by @cafca in #170 (review)


adzialocha · 2022-05-28T07:55:53Z

Maybe the rule should be: consider a document deleted as soon as any DELETE operation is encountered during materialization.

Another rule: it is not allowed to publish a DELETE or UPDATE operation that points at a DELETE operation as its previous operation.
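
A minimal sketch of these two rules, assuming a simplified in-memory operation model. The `Operation`, `Action`, `validate_previous` and `is_deleted` names are hypothetical and not p2panda's actual API:

```rust
use std::collections::HashMap;

/// Operation kinds in this simplified, hypothetical model (not the p2panda crate's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Create,
    Update,
    Delete,
}

#[derive(Debug, Clone)]
pub struct Operation {
    pub id: String,
    pub action: Action,
    /// Ids this operation points at as its previous operations (empty for CREATE).
    pub previous: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    UnknownPrevious(String),
    PreviousIsDelete(String),
}

/// Second rule: an UPDATE or DELETE must not point at a DELETE as a previous operation.
pub fn validate_previous(
    op: &Operation,
    known: &HashMap<String, Operation>,
) -> Result<(), ValidationError> {
    for prev_id in &op.previous {
        match known.get(prev_id) {
            None => return Err(ValidationError::UnknownPrevious(prev_id.clone())),
            Some(prev) if prev.action == Action::Delete => {
                return Err(ValidationError::PreviousIsDelete(prev_id.clone()));
            }
            Some(_) => {}
        }
    }
    Ok(())
}

/// First rule: the document counts as deleted as soon as any DELETE is seen
/// while walking its operations during materialization.
pub fn is_deleted<'a>(ops: impl IntoIterator<Item = &'a Operation>) -> bool {
    ops.into_iter().any(|op| op.action == Action::Delete)
}
```

Note that `validate_previous` only constrains a single branch; two authors can still each append a DELETE on separate branches, which is the case discussed next.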

With these rules there can't be two DELETE operations within one branch of the operation graph, nor an UPDATE after a DELETE, but it might still happen across separate branches. In this scenario the winning hash will be sorted last in the topological ordering, meaning that if further DELETE or UPDATE branches arrive later, they might change the order and indeed add another document view id.

We can't ignore future operations and freeze the id, as that would lead to non-deterministic behavior across nodes (nodes might wrongly arrive at different states even though they hold the same operations, just because they received them in a different order). Also, it should not be a problem, as document view ids are only relevant for pinned relations, as far as I can tell right now?
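
To illustrate why the result must depend only on which operations a node holds, here is a minimal sketch assuming a deterministic topological sort (Kahn's algorithm with ties broken by the smallest operation id) and modelling the document view id as the sorted set of graph tips. Both are assumptions for illustration; `Graph`, `topo_sort` and `view_id` are hypothetical names, not p2panda's implementation:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical operation graph: operation id -> ids of its previous operations.
pub type Graph = BTreeMap<String, Vec<String>>;

/// Kahn's algorithm. When several operations are ready at once, the smallest id goes
/// first, so the order depends only on which operations a node has, not on the order
/// in which it received them. Assumes every previous operation is present in `graph`.
pub fn topo_sort(graph: &Graph) -> Vec<String> {
    let mut in_degree: BTreeMap<&str, usize> = BTreeMap::new();
    let mut successors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (id, previous) in graph {
        *in_degree.entry(id.as_str()).or_insert(0) += previous.len();
        for prev in previous {
            successors.entry(prev.as_str()).or_default().push(id.as_str());
        }
    }
    // Operations without unsorted predecessors, kept ordered by id.
    let mut ready: BTreeSet<&str> = in_degree
        .iter()
        .filter(|(_, degree)| **degree == 0)
        .map(|(id, _)| *id)
        .collect();
    let mut sorted = Vec::new();
    while let Some(id) = ready.pop_first() {
        sorted.push(id.to_string());
        for succ in successors.get(id).into_iter().flatten() {
            let degree = in_degree.get_mut(succ).expect("previous operations are known");
            *degree -= 1;
            if *degree == 0 {
                ready.insert(*succ);
            }
        }
    }
    sorted
}

/// Assumption for this sketch: the document view id is the sorted list of graph tips,
/// i.e. operations that no other operation points at.
pub fn view_id(graph: &Graph) -> Vec<String> {
    let referenced: BTreeSet<&String> = graph.values().flatten().collect();
    graph.keys().filter(|id| !referenced.contains(id)).cloned().collect()
}
```

If a later DELETE branch `d` arrives on top of `c` in a graph with tips `b` and `c`, every node that has `d` computes the new view id `[b, d]`; freezing the id at the first DELETE would instead depend on arrival order.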

It's an interesting behavior, as "deleted" documents can somehow still be "alive". It doesn't change the document itself, nor does it affect relations, so I think all is good here: they will just be deleted in any case. Pinned relations are different though: they might keep pointing at a branch of the operation graph which has never seen a DELETE operation, even though one exists in another branch. Pinned relations are immutable, and we can't suddenly make data "deleted" retroactively in that case.

We can, and maybe should, add a third rule: a node rejects new entries from a client if it knows the related document has been deleted. That's mainly just a nice warning though; if the node doesn't know about the deletion yet, we might still arrive at the scenario above anyhow.
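
A minimal sketch of that third rule as a best-effort check on the publishing node, assuming a hypothetical `Node` that only remembers which document ids it has already seen a DELETE for (`Node`, `mark_deleted`, `check_publish` and `PublishError` are illustrative names, not p2panda's API):

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum PublishError {
    DocumentDeleted(String),
}

/// Hypothetical node-side state: ids of documents this node has seen a DELETE for.
#[derive(Default)]
pub struct Node {
    deleted_documents: HashSet<String>,
}

impl Node {
    /// Called when materialization encounters a DELETE for this document.
    pub fn mark_deleted(&mut self, document_id: &str) {
        self.deleted_documents.insert(document_id.to_string());
    }

    /// Third rule: refuse new entries for documents known to be deleted. This is only
    /// a best-effort warning: a node that has not received the DELETE yet accepts them.
    pub fn check_publish(&self, document_id: &str) -> Result<(), PublishError> {
        if self.deleted_documents.contains(document_id) {
            Err(PublishError::DocumentDeleted(document_id.to_string()))
        } else {
            Ok(())
        }
    }
}
```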

I guess with this you could engineer a scenario where a schema consists of pinned relations, which users might exploit to keep documents alive even though they are officially deleted.

Originally posted by @adzialocha in #170 (comment)

sandreae · 2022-05-29T18:24:27Z

Yeh, soooo much to talk about in this topic. It's really interesting!

I wonder what situations cause a DELETE operation to arrive, and what the expected behaviour (from the user side) would be. If I wrote a wiki page, edited it a few times, realised I didn't like it, and pressed "delete", I would expect it to be deleted and removed from wherever it was being stored (with a little "are you sure you want to delete this?" warning).

The way we discuss it above is more like a "freeze" than a delete. As in "this document doesn't accept updates anymore", "but if you know the last document-view-id then it is still alive". I'm the first to say how amazing immutable document-views are, but maybe the expected behaviour is actually that we remove the views as well? It might not be so incompatible in the case of application data (obvs, deleting published schemas is a bit different...). Flagging a document as deleted could trigger a request to delete all payloads except for the final DELETE (would that actually work?? Not sure...).

sandreae · 2022-05-29T18:33:05Z

Makes me think there should be two variants: "DELETE" and "PURGE" (or something like that), the first being a fairly light affair, you don't see the document anymore, but all operations are retained, the latter being a network-wide removal request, all operations should eventually be deleted for this document, meaning document-views are also inaccessible.

cafca · 2022-06-01T10:59:48Z

This whole thing feels like a question that asks us to return to unwritten first principles. What is guiding us in p2panda design in general?

One principle that comes to mind in this context is permissionlessness, that is: a public key doesn't need permission to publish any operation. And the flipside: materialisers can choose to ignore any contribution they don't want. Both of these might still have social repercussions.

Thinking about the simplest scenario, publishing a delete operation signals an intention to outright remove this content. If other peers in the network choose to honor this request fully, they purge, which means they:

- remove all materialised views for this document
- cease to materialise updates on this document
- delete all entries and payloads pertaining to it
- replicate the deletion request to other peers
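
The four purge steps map onto a single store operation. A minimal sketch, assuming a hypothetical in-memory `Store` keyed by document id (not p2panda's storage API), where "cease to materialise" is modelled as a purged set that the materialiser consults:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory node store; names are illustrative only.
#[derive(Default)]
pub struct Store {
    /// document id -> materialised view ids
    pub views: HashMap<String, Vec<String>>,
    /// document id -> hashes of entries (and their payloads)
    pub entries: HashMap<String, Vec<String>>,
    /// documents this node no longer materialises
    pub purged: HashSet<String>,
}

/// Message handed to the replication layer so other peers can honour the deletion.
#[derive(Debug, PartialEq, Eq)]
pub struct DeletionRequest {
    pub document_id: String,
}

impl Store {
    pub fn purge(&mut self, document_id: &str) -> DeletionRequest {
        // Remove all materialised views for this document.
        self.views.remove(document_id);
        // Cease to materialise updates on it.
        self.purged.insert(document_id.to_string());
        // Delete all entries and payloads pertaining to it.
        self.entries.remove(document_id);
        // Return a request to replicate to other peers.
        DeletionRequest { document_id: document_id.to_string() }
    }

    /// Materialiser hook: operations for purged documents are dropped.
    pub fn should_materialise(&self, document_id: &str) -> bool {
        !self.purged.contains(document_id)
    }
}
```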

Aside: If applications have a use case for a lighter form of deletion where content is merely hidden instead of deleted (delete vs purge) I think that would be better served by having a hidden boolean in the schema that can be set in order to hide content without deleting it.

However, coming back to actual deletion there still are valid reasons for other peers to not take all or even any of these steps.

- The deleting public key is on a block list.
- The deleting public key belongs to a rogue author who is part of a key group and deleted a document without consensus in the group. Everyone else doesn't want to delete the document.
- There is a more authoritative signal to keep the content. The scenario I am thinking about is moderation: if you want to keep a transparent moderation log with reasons for moderation actions, you may also want to keep an original copy of the object of moderation. Say you are kicking someone out of the chatroom for writing an offensive message: you want to hide the offending message, but also keep a copy of it around in order to justify the kick to someone who wasn't around when the offensive message was visible. The message's author shouldn't be able to hide evidence by sending a delete operation.
- It is more important for integrity to keep at least some materialised view of the document than to honor the request. This is the npm left-pad scenario. Do we really want to give schema authors the power to render all content using a schema inaccessible by deleting the schema definition document?
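
These exceptions could be folded into one per-node decision function. A minimal sketch, assuming hypothetical `DeletionPolicy` and `DeleteRequest` types where key group consensus arrives as a precomputed boolean and the archived documents and protected schemas come from node configuration; the order of the checks is a choice made for illustration:

```rust
use std::collections::HashSet;

/// Hypothetical per-node context for deciding whether to honour a DELETE.
pub struct DeletionPolicy {
    pub blocked_keys: HashSet<String>,
    /// Documents flagged to keep (e.g. moderation evidence).
    pub archived_documents: HashSet<String>,
    /// Schemas whose documents this node never purges (e.g. schema definitions).
    pub protected_schemas: HashSet<String>,
}

pub struct DeleteRequest<'a> {
    pub author: &'a str,
    pub document_id: &'a str,
    pub schema_id: &'a str,
    /// True if the document has no key group or the group agreed to the deletion.
    pub group_consensus: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Honour,
    IgnoreBlockedAuthor,
    IgnoreNoConsensus,
    KeepArchived,
    KeepProtectedSchema,
}

impl DeletionPolicy {
    pub fn decide(&self, req: &DeleteRequest) -> Decision {
        if self.blocked_keys.contains(req.author) {
            return Decision::IgnoreBlockedAuthor;
        }
        if !req.group_consensus {
            return Decision::IgnoreNoConsensus;
        }
        if self.archived_documents.contains(req.document_id) {
            return Decision::KeepArchived;
        }
        if self.protected_schemas.contains(req.schema_id) {
            return Decision::KeepProtectedSchema;
        }
        Decision::Honour
    }
}
```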

I think that these scenarios show that we need to offer ways of telling the node "please don't honor deletion of these documents" but also we want to allow building nodes that don't have so much complexity and deal with deletion in a simpler way.

Suggestion

I would suggest having the following rules as SHOULD rather than MUST in the spec, always allowing implementations to just purge documents and their views outright (per the permissionless principle) when they see a deletion. We don't need to build all of this right away, but we put it in the spec because we want to build it eventually.

1) When a document contains a valid delete operation, the whole document is internally marked as "deletion requested", which clients can query through document metadata. The field contents of all of its views are no longer contained in regular listings and are only accessible by setting a query parameter to include recently deleted documents/views. This is because we expect good intentions by default, and most deletes will be honorable even if separate branches exist. Further update and delete operations on separate branches are still processed, for the reasons described by @adzialocha above.
2) A regular clean-up job purges documents marked for deletion, including their other branches, in a non-recoverable way 7 days after deletion, unless either:
   a) the deleting public key has been removed from the key group in the meantime, or their delete operation is moderated by the group; when that happens, the document is reinstated immediately.
   b) an "archive" flag for this document has been published by a node operator to prevent e.g. removal of evidence. Node operators are public keys that are set in the node configuration. We create a system schema for publishing these flags (either specifically for this, or as part of a more generic moderation system schema?). In this case the document can still be "deleted" by its author, which removes it from regular views, but it will never be "purged" by this specific node and stays accessible when setting the special query parameter described in 1).
3) Once a document has been purged, the publishEntry endpoint returns an error when trying to publish new entries for it, and replication also ceases. If an "archive" flag is received after purging, the node may pick up replication again to help with the archiving.
4) The node configuration allows overriding an author's deletion authority for specific schemas and documents. For example, it could be configured that schema definition documents can never be deleted. This should be public information, to enable a discussion about node configurations that don't honor deletion requests when they really should.
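
A minimal sketch of this lifecycle, assuming per-document metadata with the two exemptions from 2) stored as booleans and, for simplicity, checking reinstatement during the clean-up pass rather than at the moment the key is removed. `DeletionState`, `DocumentMeta`, `cleanup`, `visible` and `accepts_entries` are hypothetical names; only the 7-day grace period and the publishEntry behaviour come from the rules above:

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical deletion states following rules 1) to 3).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeletionState {
    Live,
    DeletionRequested { since: SystemTime },
    Purged,
}

/// Grace period before a requested deletion is purged (7 days).
pub const PURGE_GRACE: Duration = Duration::from_secs(7 * 24 * 60 * 60);

pub struct DocumentMeta {
    pub state: DeletionState,
    /// 2b) a node operator published an "archive" flag for this document.
    pub archived: bool,
    /// 2a) the deleting key left the key group or its delete was moderated.
    pub delete_revoked: bool,
}

/// One pass of the clean-up job for a single document.
pub fn cleanup(doc: &mut DocumentMeta, now: SystemTime) {
    if let DeletionState::DeletionRequested { since } = doc.state {
        if doc.delete_revoked {
            // Reinstate the document.
            doc.state = DeletionState::Live;
        } else if !doc.archived
            && now.duration_since(since).map_or(false, |age| age >= PURGE_GRACE)
        {
            // Archived documents stay hidden but are never purged by this node.
            doc.state = DeletionState::Purged;
        }
    }
}

/// Regular listings only show live documents; deletion-requested ones need the
/// "include recently deleted" query parameter; purged ones are gone.
pub fn visible(doc: &DocumentMeta, include_deleted: bool) -> bool {
    match doc.state {
        DeletionState::Live => true,
        DeletionState::DeletionRequested { .. } => include_deleted,
        DeletionState::Purged => false,
    }
}

/// publishEntry guard: new entries for a purged document are rejected.
pub fn accepts_entries(doc: &DocumentMeta) -> bool {
    doc.state != DeletionState::Purged
}
```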

adzialocha · 2022-08-23T08:32:17Z

- We currently reject new operations on deleted documents
- SHOULD is nicer (see cafca's comment)
- Later this can be made configurable

sandreae · 2022-08-31T09:47:25Z

This is my proposal for specifying how we deal with deletion at this point. It introduces DO5 & D10, which I think cover the points raised above in a simple way and can be expanded on later for more complex scenarios.

#248

adzialocha added this to the Brewing ideas hot pot 🍲 milestone May 28, 2022

adzialocha added the Operations label May 28, 2022

adzialocha mentioned this issue May 28, 2022 in Update documents-instances.md #170 (merged)

adzialocha mentioned this issue Jul 11, 2022 in Validation and data flows [TRACKING] #126 (closed, 8 tasks)

adzialocha mentioned this issue Aug 23, 2022 in Specify how materialization deals when detecting DELETE operations in document graph #194 (closed)

sandreae modified the milestones: Brewing ideas hot pot 🍲, Core specification Aug 31, 2022

sandreae mentioned this issue Aug 31, 2022 in Finally specify document deletion #251 (closed)

sandreae self-assigned this Aug 31, 2022

sandreae linked a pull request Aug 31, 2022 that will close this issue: Document deletion requirements #248 (merged)

adzialocha closed this as completed in #248 Sep 2, 2022

sandreae mentioned this issue Sep 14, 2022 in Specify how to handle entries & operations in a deleted document #258 (closed)
