Specify how to handle entries & operations in a deleted document #258

Closed
sandreae opened this issue Sep 4, 2022 · 3 comments

Comments

@sandreae
Member

sandreae commented Sep 4, 2022

When the entries in a log are part of a document which has been deleted, we want to remove as much of this log as possible, both to save storage and to support sparse replication. However, we still need to keep track of used logs, because we can't re-use this log id. So we need to garbage collect while retaining "used log information". Some ideas:

  • just delete all the operations but leave the entries
  • delete operations and all non-cert-pool entries
  • delete everything and consider how to deal with this when selecting the next log id (gaps in the existing log ids would have to be accounted for, as we would assume they mark where deleted logs were)
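
To make the constraint concrete, here is a minimal sketch of log id selection for a single author. It assumes log ids are integers that start at 0 and strictly increment; `next_log_id` is a hypothetical helper, not a p2panda API. It illustrates why the third idea needs extra care: a deleted log at the end of the range leaves no gap to detect.

```python
from typing import Set


def next_log_id(used_log_ids: Set[int]) -> int:
    """Return the next free log id, assuming ids start at 0 and only grow."""
    return max(used_log_ids) + 1 if used_log_ids else 0


# The author has used logs 0, 1 and 2; log 2 belongs to a deleted document.
used = {0, 1, 2}

# Ideas 1 and 2: something from log 2 is retained, so its id still counts as used.
assert next_log_id(used) == 3

# Idea 3: everything in log 2 is deleted and no record of the id remains.
after_full_delete = used - {2}
assert next_log_id(after_full_delete) == 2  # log id 2 would be handed out again
```

A gap in the middle of the range (for example, deleting log 1) is harmless for this helper, which suggests that "used log information" could be as small as the highest log id ever used per author.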
@sandreae
Member Author

sandreae commented Sep 14, 2022

I'd like to include in the spec a strategy for handling the entries and operations contained in a deleted document. I believe there is a good strategy which is a compromise between respecting data deletion requests and retaining the log id ordering we need. I'll flesh out my proposal below. I think what I'm suggesting is a little harder than we've been imagining, probably because we love pinned relations ;-p I like the idea of starting from "delete means delete" (as far as that is possible) and then refining that so pinned relations don't end up broken and annoying.

PROPOSAL:

When a document is found to contain a DELETE operation, a node SHOULD:

  • delete all operations associated with this document for every author
  • delete all entries except where seq_num == 1 for every author

The exception to this rule is documents which follow a system schema: these MUST NOT be purged as described above.
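
A rough sketch of the purge rule above, to show how little is left afterwards. Everything here is an assumption made for illustration: the `Store` layout, the `(author, log_id, seq_num)` key, the `system_documents` set and the function name are hypothetical and not taken from any p2panda implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# (author, log_id, seq_num) identifies one entry; hypothetical layout.
EntryKey = Tuple[str, int, int]


@dataclass
class Store:
    entries: Dict[EntryKey, str] = field(default_factory=dict)     # key -> document id
    operations: Dict[EntryKey, str] = field(default_factory=dict)  # key -> action
    system_documents: Set[str] = field(default_factory=set)        # follow a system schema


def purge_deleted_document(store: Store, document_id: str) -> bool:
    """Purge a document that contains a DELETE operation; return True if purged."""
    keys = [k for k, doc in store.entries.items() if doc == document_id]
    is_deleted = any(store.operations.get(k) == "DELETE" for k in keys)
    if not is_deleted or document_id in store.system_documents:
        return False  # not deleted, or a system schema document (MUST NOT purge)
    for key in keys:
        store.operations.pop(key, None)  # delete every operation, for every author
        if key[2] != 1:                  # keep only entries where seq_num == 1
            del store.entries[key]
    return True


store = Store()
for key, action in [
    (("alice", 0, 1), "CREATE"),
    (("bob", 3, 1), "UPDATE"),
    (("bob", 3, 2), "UPDATE"),
    (("alice", 0, 2), "DELETE"),
]:
    store.entries[key] = "doc_1"
    store.operations[key] = action

purge_deleted_document(store, "doc_1")
print(sorted(store.entries))  # [('alice', 0, 1), ('bob', 3, 1)]
```

One entry per author survives, which is enough to mark each log id as used, while no payloads remain.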

This would mean that the document is in effect "gone" (its CREATE operation has been deleted) and any incoming operations targeting this document would be rejected. A node would no longer replicate entries associated with this document on the network. I used SHOULD in the requirement above as this leaves a little flexibility for the node to retain, for example, pinned relations it knows about.
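
The rejection of incoming operations could look roughly like this. It is only a sketch under the assumption that a node accepts UPDATE and DELETE operations solely for documents whose CREATE operation it still holds; `should_accept` and its parameters are hypothetical names.

```python
from typing import Optional, Set


def should_accept(known_documents: Set[str], action: str, target: Optional[str]) -> bool:
    """Accept CREATE always; accept other actions only for documents still held."""
    if action == "CREATE":
        return True
    return target in known_documents


# "doc_1" was purged, so the node no longer holds its CREATE operation.
known = {"doc_2"}

assert should_accept(known, "UPDATE", "doc_2")
assert not should_accept(known, "UPDATE", "doc_1")
assert not should_accept(known, "DELETE", "doc_1")
```

With this check alone, the node cannot tell a purged document from one it has simply never seen: both are rejected the same way.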

PROS:

  • well.... deletion!
  • it's a simple approach but still rewards us with full payload deletion and a fair amount of entry deletion.
  • we don't break our rule for strictly incrementing log IDs

CONS:

  • we lose all document views for a deleted document, so pinned relations which refer to these views would break (this is a PRO depending on how you look at it, I suppose).
  • we lose metadata about the document (that it existed and is now deleted)

ALTERNATIVE APPROACHES:

  • retain the DELETE operation/payload and entry plus a route from it to the root of the document (this needs a lot more unpacking). This means we would retain metadata about a document (it existed once, and is now deleted). It means more complexity though, and could be looked at / added later.

FUTURE:

  • this could be adapted in the future to allow nodes to make more fine grained decisions about deletion, as discussed here

@sandreae changed the title from 'Specify log "trimming" strategy for deleted documents' to 'Specify how to handle entries & operations in a deleted document' on Sep 14, 2022
@sandreae
Member Author

sandreae commented Sep 14, 2022

Actually, in order to make sure the deletion request propagates/persists on the network, we probably need to retain the DELETE operation so it can be replicated further. This means the "alternative approach" might actually be required.
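
A small sketch of why the DELETE operation has to survive the purge. The operation layout and both helpers are hypothetical, and `replicate` is only a stand-in for real replication (it ignores entries, signatures and verification entirely).

```python
from typing import Dict, Tuple

# (author, log_id, seq_num) -> action; hypothetical layout.
Operations = Dict[Tuple[str, int, int], str]


def purge_keeping_delete(operations: Operations) -> Operations:
    """Drop every operation of a deleted document except the DELETE itself."""
    return {key: action for key, action in operations.items() if action == "DELETE"}


def replicate(sender: Operations, receiver: Operations) -> None:
    """Copy over operations the receiver is missing."""
    for key, action in sender.items():
        receiver.setdefault(key, action)


node_a = {("alice", 0, 1): "CREATE", ("alice", 0, 2): "UPDATE", ("alice", 0, 3): "DELETE"}
node_b = {("alice", 0, 1): "CREATE", ("alice", 0, 2): "UPDATE"}  # has not seen the DELETE

node_a = purge_keeping_delete(node_a)
replicate(node_a, node_b)

# node_b now learns about the deletion and can purge its own copy.
assert node_b[("alice", 0, 3)] == "DELETE"
```

Had node_a deleted the DELETE operation as well, node_b would keep its copy of the document indefinitely. Whether node_b can verify a lone DELETE entry without the route back to the root of the document is the part that still needs unpacking.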

@sandreae
Member Author

This is now specified under requirement OP9 (https://p2panda.org/specification/data-types/operations), so I will close this issue.

However, the specification needs further consideration and so I will open a new issue to discuss that.
