Specify how we deal with new operations on deleted documents #172

Closed
Tracked by #126
adzialocha opened this issue May 28, 2022 · 6 comments · Fixed by #248

@adzialocha
Member

  • what happens when a deleted document receives update operations? as it is deleted its fields have no value by definition but does the view id still change? or is it not possible to apply any operation to a deleted document?
  • what if there are multiple delete operations? does each one create a new document view id or is the document "frozen" after the first one?

Originally posted by @cafca in #170 (review)

@adzialocha
Member Author

adzialocha commented May 28, 2022

  • what happens when a deleted document receives update operations? as it is deleted its fields have no value by definition but does the view id still change? or is it not possible to apply any operation to a deleted document?
  • what if there are multiple delete operations? does each one create a new document view id or is the document "frozen" after the first one?

Maybe the rule should be: Consider a document deleted as soon as any DELETE operation was sighted during materialization.

Another rule: It's not allowed to publish a DELETE or UPDATE operation pointing at a DELETE operation as the previous one.
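A minimal sketch of these two rules, not the actual p2panda implementation. The `Operation` type and the function names `is_deleted` and `may_publish` are hypothetical and only serve the illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    id: str               # hypothetical operation id (e.g. a hash)
    action: str           # "CREATE", "UPDATE" or "DELETE"
    previous: tuple = ()  # ids of the operations this one points at


def is_deleted(operations):
    """Rule 1: the document counts as deleted once any DELETE is sighted."""
    return any(op.action == "DELETE" for op in operations)


def may_publish(new_op, known_operations):
    """Rule 2: reject an operation whose `previous` points at a DELETE."""
    by_id = {op.id: op for op in known_operations}
    return not any(
        by_id[prev].action == "DELETE"
        for prev in new_op.previous
        if prev in by_id
    )
```

Note that an UPDATE pointing at an older operation in a concurrent branch still passes `may_publish`, which is exactly the multi-branch case discussed next.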

With these rules there can't be two DELETE operations within one operation graph branch, nor an UPDATE after a DELETE - but in separate, concurrent branches it might still happen. In this scenario our winning hash will be sorted last in the topological ordering, meaning that if some DELETE or UPDATE branches arrive later, they might change the order and indeed add another document view id.

We can't ignore future operations and freeze the id, as that would lead to non-deterministic behavior across nodes (nodes might wrongly arrive at different states even though they have the same operations, just because they came in in a different order). Also, it should not be a problem, as document view ids are something for pinned relations only, as far as I can tell right now?
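To illustrate the ordering argument, here is a rough sketch of a topological sort with a deterministic tie-break. It assumes operations are given as a mapping from operation id to the ids in `previous`, and that ties are broken by comparing ids; both are assumptions for this example, not a description of p2panda's actual algorithm:

```python
import heapq


def topo_order(ops):
    """Order operations so that the result depends only on the set of
    operations, never on the order in which they arrived.

    ops: dict mapping operation id -> list of previous operation ids.
    """
    indegree = {op_id: 0 for op_id in ops}
    children = {op_id: [] for op_id in ops}
    for op_id, previous in ops.items():
        for prev in previous:
            if prev in ops:
                indegree[op_id] += 1
                children[prev].append(op_id)

    # A min-heap keyed on the id gives the deterministic tie-break.
    ready = [op_id for op_id, degree in indegree.items() if degree == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        current = heapq.heappop(ready)
        order.append(current)
        for child in children[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, child)
    return order
```

Two nodes holding the same operations get the same order whatever the arrival order was, while a branch that arrives late can still become the last ("winning") operation.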

It's an interesting behavior as "deleted" documents can somehow still be "alive". It doesn't change the document itself, nor does it affect relations, so I think that all is good here, they will just be deleted in any case. Pinned relations are different though and might still go on by pointing at a branch of the operation graph which has never seen a DELETE operation, even though one exists somewhere else in another branch .. pinned relations are immutable and we can't suddenly make data "deleted" retroactively in that case.
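A small sketch of that situation, assuming a document view is identified by a set of graph tips and only "sees" the operations reachable from those tips. The dictionary layout and the function names are hypothetical:

```python
def reachable(tips, ops):
    """Collect every operation id reachable from the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        op_id = stack.pop()
        if op_id in seen:
            continue
        seen.add(op_id)
        stack.extend(ops[op_id]["previous"])
    return seen


def view_is_deleted(tips, ops):
    """A pinned view is deleted only if a DELETE lies in its own history."""
    return any(ops[op_id]["action"] == "DELETE" for op_id in reachable(tips, ops))


def document_is_deleted(ops):
    """The document is deleted if a DELETE exists anywhere in the graph."""
    return any(op["action"] == "DELETE" for op in ops.values())
```

With a CREATE `a`, an UPDATE `b` and a concurrent DELETE `c` both pointing at `a`, the document is deleted, yet a pinned relation to the view `["b"]` still resolves to a view that never saw the DELETE.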

We can and maybe should add a third rule which rejects new entries from a client if a node knows the related document got deleted, but that's mainly just a nice warning. Of course, if the node didn't know that yet, we might still arrive at the above scenario anyhow.

I guess with this you could engineer a scenario where a schema consists of pinned relations and users might use that to keep documents alive, even though they are officially deleted.

Originally posted by @adzialocha in #170 (comment)

@sandreae
Member

Yeh, soooo much to talk about in this topic. It's really interesting!

I wonder what situations cause a DELETE operation to arrive, and what the expected behaviour (from the user side) would be. If I wrote a wiki page, edited it a few times, realised I didn't like it, and pressed "delete" I would expect it to be deleted, removed from wherever it was being stored (with a little, "are you sure you want to delete this?" warning).

How we discuss it above is more like "freeze" than a delete. As in "this document doesn't accept updates anymore", "but if you know the last document-view-id then it is still alive". I'm the first to say how amazing immutable document-views are, but maybe the expected behaviour is actually that we remove the views as well? It might not be so incompatible in the case of application data (obvs, deleting published schemas is a bit different...). Flagging a document as deleted could trigger a request to delete all payloads except for the final DELETE (would that actually work?? Not sure...).

@sandreae
Member

Makes me think there should be two variants: "DELETE" and "PURGE" (or something like that). The first is a fairly light affair: you don't see the document anymore, but all operations are retained. The latter is a network-wide removal request: all operations should eventually be deleted for this document, meaning document-views are also inaccessible.

@cafca
Member

cafca commented Jun 1, 2022

This whole thing feels like a question that asks us to return to unwritten first principles. What is guiding us in p2panda design in general?

One principle that comes to mind in this context is permissionlessness, that is: a public key doesn't need permission to publish any operation. And the flipside: materialisers can choose to ignore any contribution they don't want. Both of these might still have social repercussions.

Thinking about the simplest scenario, publishing a delete operation signals an intention to outright remove this content. If other peers in the network choose to honor this request fully they purge, which means:

  • remove all materialised views for this document
  • cease to materialise updates on this document
  • delete all entries and payloads pertaining to it.
  • replicate the deletion request to other peers

Aside: If applications have a use case for a lighter form of deletion where content is merely hidden instead of deleted (delete vs purge) I think that would be better served by having a hidden boolean in the schema that can be set in order to hide content without deleting it.
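As a rough illustration of that aside, hiding could be handled entirely at the application level. The `hidden` field name follows the text; the document layout and the `visible_documents` helper are assumptions made up for this sketch:

```python
def visible_documents(documents, include_hidden=False):
    """Return documents for a regular listing, skipping hidden ones by default."""
    return [
        doc
        for doc in documents
        if include_hidden or not doc["fields"].get("hidden", False)
    ]
```

Setting `hidden` is an ordinary UPDATE in this model, so nothing is removed from the operation graph and the content can be shown again later.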

However, coming back to actual deletion there still are valid reasons for other peers to not take all or even any of these steps.

  • The deleting public key is on a block list.
  • The deleting public key belongs to a rogue author that's part of a key group and they deleted a document without consensus in the group. Everyone else doesn't want to delete the document.
  • There is a more authoritative signal to keep the content. The scenario I am thinking about is moderation, where if you want to keep a transparent moderation log with reasons for moderation action, you may want to also keep an original copy of the object of moderation. Say you are kicking someone out of the chatroom for writing an offensive message, you want to hide the offending message but also keep a copy of it around in order to justify the kicking out to someone who wasn't around when the offensive message was visible. The message's author shouldn't be able to hide evidence by sending a delete operation.
  • It is more important for integrity to keep at least some materialised view from the document than to honor the request. This is the npm left-pad scenario. Do we really want to give schema authors the power to render all content inaccessible that was using a schema by deleting the schema definition document?

I think that these scenarios show that we need to offer ways of telling the node "please don't honor deletion of these documents" but also we want to allow building nodes that don't have so much complexity and deal with deletion in a simpler way.

Suggestion

I would suggest having the following rules as should and not must in the spec and always allowing implementations to just outright purge documents and their views (per the permissionless principle) when they see a deletion. We don't need to build all this outright but we put it in the spec because we want to build it eventually.

  1. When a document contains a valid delete operation, the whole document is marked "deletion requested" internally, which is queryable by clients through document metadata. The field contents of all of its views are not contained in regular listings anymore and are only accessible by setting a query parameter to include recently deleted documents/views. This is because we expect good intentions by default and most deletes will be honorable even if separate branches exist. Further update and delete operations on separate branches are still processed for the reasons described by @adzialocha above.

  2. A regular clean-up job purges to-be-deleted documents, including other branches, in a non-recoverable way 7 days after deletion - unless either:
    a) the deleting public key has been removed from the key group in the meantime or their delete operation is moderated by the group. When that happens, the document is reinstated immediately.
    b) an "archive" flag for this document has been published by a node operator to prevent e.g. removal of evidence. Node operators are public keys that are set in the node configuration. We create a system schema for publishing these flags (either specifically for this or it becomes part of a more generic moderation system schema?). In this case the document can still be "deleted" by its author, which removes it from regular views, but it will never be "purged" by this specific node and stays accessible when setting a special query parameter as described in 1).

    Once a document has been purged the publishEntry endpoint returns an error when trying to publish new entries for it and replication also ceases. If an "archive" flag is received after purging, the node may pick up replication again to help with the archiving.

  3. The node configuration allows overriding author's deletion authority for specific schemas and documents. For example it could be configured that schema definition documents can never be deleted. This should be public information to enable a discussion about node configurations that don't honor deletion requests when they really should.
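The three suggested rules could be sketched roughly as follows. Everything here is hypothetical (the `DocumentRecord` type, the function names, the `PROTECTED_SCHEMAS` setting); the real `publishEntry` endpoint and node configuration may look quite different:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=7)           # "7 days after deletion"
PROTECTED_SCHEMAS = {"schema_definition"}  # hypothetical node config (rule 3)


@dataclass
class DocumentRecord:
    document_id: str
    schema: str
    deletion_requested_at: Optional[datetime] = None
    archived: bool = False  # a node operator published an "archive" flag (2b)
    purged: bool = False


def request_deletion(record, now):
    """Rule 1: mark as deletion requested, unless the node config forbids it."""
    if record.schema in PROTECTED_SCHEMAS:
        return False
    record.deletion_requested_at = now
    return True


def reinstate(record):
    """Rule 2a: the key group revoked or moderated the delete operation."""
    if not record.purged:
        record.deletion_requested_at = None


def clean_up(records, now):
    """Rule 2: purge documents whose grace period has passed."""
    purged = []
    for record in records:
        if record.purged or record.archived or record.deletion_requested_at is None:
            continue
        if now - record.deletion_requested_at >= GRACE_PERIOD:
            record.purged = True
            purged.append(record.document_id)
    return purged


def publish_entry(record):
    """After a purge, publishing new entries for the document is an error."""
    if record.purged:
        raise ValueError(f"document {record.document_id} has been purged")
```

An archived document keeps its "deletion requested" mark, so it stays out of regular listings but is never purged by this node.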

@adzialocha
Member Author

  1. We reject currently
  2. SHOULD is nicer (see cafca's comment)
  3. Later this can be configured

@sandreae
Member

This is my proposal for specifying how we deal with deletion at this point. It introduces DO5 & D10, which I think cover the points raised above in a simple way. It can be expanded on later for more complex scenarios.

#248
