Concurrent version history - versidag #50
Note that this API is stateless. I will make another proposal using a stateful API shortly to see how they compare. |
Here's a proposal for a stateful version of the API:
versidag
import { createVersidag, resumeVersidag } from 'versidag';
// Creates a new instance, without any versions
const versidag = createVersidag({
  // deterministic comparison of versidag nodes, should return -1, 1 or 0
  // this is used to sort concurrent versions
  comparator: (version1, version2) => {},
  // reads a versidag node from the underlying storage
  read: (versidagNodeCid) => { /* returns the versidagNode */ },
  // writes a versidag node to the underlying storage
  write: (versidagNode) => { /* returns the versidagNodeCid */ },
});
// Creates a new instance, but resuming from a previously known versidagNodeCid
const versidag = resumeVersidag(versidagNodeCid, /* same config object as `createVersidag` */);
// Adds a new version, where `version` can be any type
versidag.add(version) // -> the new versidag node cid
// Merges two or more versidag nodes
// Useful when two replicas forked from a common version
versidag.merge([versidags... or versidagNodeCids...]) // -> the new versidag node cid
// Resolves all the versions, returning an array of versions in reverse order
// The `limit` option caps the number of versions to retrieve, reducing the number of traversal hops
// The `fromNodeCid` option allows fetching to start from that node
versidag.resolve({ limit: 5, fromNodeCid }) // -> { versions, nextNodeCid }
// Gets the current versidag cid
versidag.getNodeCid();
ipfs-versidag
Wrapper to `versidag` with the `write` and `read` functions already configured:
import { createVersidag, resumeVersidag } from 'ipfs-versidag';
const versidag = createVersidag({
// The ipfs instance
ipfs,
// Deterministic comparison, should return -1, 1 or 0
comparator: (version1, version2) => {},
});
I don't want to spoil anyone's feedback, but I prefer the stateful API. |
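To make the stateful shape above more concrete, here is a minimal in-memory sketch. It is not the library's implementation: `createMemoryStorage` and `createSketchVersidag` are hypothetical names, the node shape `{ parents, version }` is an assumption, the calls are synchronous for brevity (real storage would likely be asynchronous), and only linear history is handled (no `merge`).

```javascript
// Hypothetical in-memory storage: cids are just incrementing strings (assumption).
function createMemoryStorage() {
  const nodes = new Map();
  let counter = 0;
  return {
    // Stores a node and returns its made-up cid
    write: (node) => {
      const cid = `cid-${counter++}`;
      nodes.set(cid, node);
      return cid;
    },
    // Returns the node stored under `cid`
    read: (cid) => nodes.get(cid),
  };
}

// Minimal stateful sketch: keeps the current head and supports linear history only.
function createSketchVersidag({ read, write }, headCid = null) {
  let head = headCid;
  return {
    // Writes a new node pointing at the current head and makes it the new head
    add(version) {
      head = write({ parents: head ? [head] : [], version });
      return head;
    },
    getNodeCid: () => head,
    // Walks back from `fromNodeCid`, newest first, stopping after `limit` versions
    resolve({ limit = Infinity, fromNodeCid = head } = {}) {
      const versions = [];
      let cid = fromNodeCid;
      while (cid && versions.length < limit) {
        const node = read(cid);
        versions.push(node.version);
        cid = node.parents[0] || null;
      }
      return { versions, nextNodeCid: cid };
    },
  };
}

const storage = createMemoryStorage();
const sketch = createSketchVersidag(storage);
sketch.add('v1');
sketch.add('v2');
const headCid = sketch.add('v3');
const page = sketch.resolve({ limit: 2 });
```

Passing `page.nextNodeCid` back as `fromNodeCid` continues the traversal where the previous call stopped, which is the pagination idea behind the `limit` option.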
I like it, maybe use Re stateless vs stateful, don't have a strong opinion. Don't see huge wins in any. Stateful seems a bit more ergonomic when using on a daily basis. On the other hand, with the stateless version, there might be a marginal memory footprint improvement for cases where you have large amounts (thousands+) of versioned dags that share the same base configuration, maybe useful for some more advanced use cases of complex data structures. Even then, I would argue that in those cases, then we should probably go the extra mile, and allow providing the |
Could you say a little more about the use-case? Can you show how it'd be used in discussify? What would it look like if we used it in jenkins to track website release versions like https://github.com/ipfs/jenkins-libs/blob/9c3778503a6dc0ef1147dbff7cf043d032846bd1/vars/website.groovy#L113-L115 |
@olizilla sure, I will try to explain the Discussify use-case:
Let's say that I've created a comment with
Now, I could store this history/versions in the CRDT itself, but it would make the state unnecessarily large. The main role of the CRDT is to provide the comments hierarchy and not the history. That's why we don't store the comments themselves in the CRDT and instead simply store the cid for the comment, which in turn contains the
One solution would be to use another CRDT to store the history, but this has a problem: that CRDT could be out-of-sync with the main one. For instance,
@pgte suggested using some sort of side-chain for this, which culminated in embracing IPLD to store the versions. With this solution, we just store the heads in the main CRDT, keeping the CRDT small. In fact, the
I thought that this problem is common enough to be worth the effort of creating a library. Here's an example:
const versidag = createVersidag({ ... });
const versidagNodeCid = await versidag.add({ commentCid: 'xxx', timestamp: Date.now() });
// `versidagNodeCid` is then stored in the CRDT and will be the only head

const versidagNodeCid = /* read versidagNodeCid from CRDT */;
const versidag = resumeVersidag(versidagNodeCid, { ... });
const versidagNodeCid = await versidag.add({ commentCid: 'xxx', timestamp: Date.now() });
// The head will be updated to `versidagNodeCid` in the CRDT
At this point, the CRDT contains two different heads: X and Y. This is when For reference, the I hope I made things clearer. If not, we can jump into a call so that I can explain in more detail. |
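The two-heads situation can be sketched as follows. This is a simplified illustration, not versidag's actual algorithm: the node shape, the in-memory `write`/`read`, and the `add`, `merge` and `resolveAll` helpers are assumptions, and the whole reachable history is flattened with one deterministic sort rather than sorting only the concurrent parts.

```javascript
// Hypothetical node shape: { parents: [cid, ...], version? } (merge nodes carry no version).
const nodes = new Map();
let nextId = 0;
const write = (node) => {
  const cid = `cid-${nextId++}`;
  nodes.set(cid, node);
  return cid;
};
const read = (cid) => nodes.get(cid);

// Deterministic comparator: newest timestamp first, ties broken by commentCid.
const comparator = (a, b) =>
  Math.sign(b.timestamp - a.timestamp) ||
  Math.sign(a.commentCid.localeCompare(b.commentCid));

const add = (headCid, version) =>
  write({ parents: headCid ? [headCid] : [], version });

// A merge node only references the heads; sorting makes the parent order deterministic.
const merge = (headCids) => write({ parents: [...headCids].sort() });

// Collects every version reachable from `headCid`, then sorts with the comparator.
function resolveAll(headCid) {
  const seen = new Set();
  const versions = [];
  const stack = [headCid];
  while (stack.length > 0) {
    const cid = stack.pop();
    if (seen.has(cid)) continue;
    seen.add(cid);
    const node = read(cid);
    if (node.version) versions.push(node.version);
    stack.push(...node.parents);
  }
  return versions.sort(comparator);
}

const base = add(null, { commentCid: 'c0', timestamp: 1 });
const headX = add(base, { commentCid: 'cX', timestamp: 3 }); // written by one replica
const headY = add(base, { commentCid: 'cY', timestamp: 2 }); // written concurrently by another
const merged = merge([headX, headY]);
const history = resolveAll(merged).map((version) => version.commentCid);
```

Because the comparator only looks at data stored in the versions themselves, every replica that sees the same two heads ends up with the same ordering.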
I agree that a versioning API would likely make life easier for development. A couple of things to think about: You can already use a DAG as a versioned data structure with every sibling node being a branch/concurrent edit, this means that we don't need any new data structures just an API for examining and merging branches. I'm actually using this technique to synchronize changes in a multiwriter environment. I'm not sure how well I understand the Discussify use cases. If I understood correctly Discussify uses CRDTs, so the merge process is very simple (e.g. any time there's a branch it can be replaced with a single line of nodes ordered with any deterministic sort). If this is the case it might be more useful to create an event driven versioning API that wraps an event driven data channel API so that when you receive an update it is automatically merged and an event is emitted. Happy to talk more about this over GitHub or a call. |
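As a rough illustration of the event-driven idea, the sketch below tracks the current heads as updates arrive and notifies listeners after each batch. Everything here is hypothetical (`createEventedHeads`, `onUpdate`, the `{ cid, parents }` update shape); it only shows the shape such an API might take and does not perform any merging itself.

```javascript
// Hypothetical event-driven wrapper: tracks heads and notifies listeners on every batch.
function createEventedHeads() {
  const known = new Set(); // cids already processed
  const superseded = new Set(); // cids that some known node lists as a parent
  const heads = new Set(); // known cids with no known children
  const listeners = [];

  return {
    // Registers a listener that receives the sorted list of current heads
    onUpdate(listener) {
      listeners.push(listener);
    },
    // Receives a batch of updates of the (assumed) form { cid, parents }
    receiveUpdate(updates) {
      for (const { cid, parents } of updates) {
        if (known.has(cid)) continue; // duplicate delivery, ignore
        known.add(cid);
        for (const parent of parents) {
          superseded.add(parent);
          heads.delete(parent);
        }
        // A node delivered after one of its children must not become a head
        if (!superseded.has(cid)) heads.add(cid);
      }
      const snapshot = [...heads].sort();
      listeners.forEach((listener) => listener(snapshot));
    },
  };
}

const events = [];
const channel = createEventedHeads();
channel.onUpdate((currentHeads) => events.push(currentHeads));
channel.receiveUpdate([{ cid: 'a', parents: [] }]);
channel.receiveUpdate([
  { cid: 'b', parents: ['a'] },
  { cid: 'c', parents: ['a'] },
]);
channel.receiveUpdate([{ cid: 'd', parents: ['b', 'c'] }]);
```

More than one head in an emitted snapshot signals concurrent edits; a listener could react by writing a merge node, as happens in the last update above.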
That's exactly what this proposal uses underneath, a DAG.
Correct 👍
I haven't thought about using an event driven approach. Would you have some time to sketch it out? |
Sure, this week is turning into spec week for me. I'll try and get to it today/tomorrow. |
Also @magik6k may have some input from the IGiS side of things (where versions are tracked, but there is no automerge function). |
So to apply this to the git commit history use case I think this would need 2 things:
cc @dirkmc |
👍 For IGiS we also have the comments use case - when someone leaves comments on a PR or issue they can edit their own comment |
Also worth noting that there are some similarities with OrbitDB log: |
@dirkmc yes there are a lot of similarities. The differences I spot are:
|
@satazor I was thinking that an event-based versidag could be useful in the case of listening to a channel for updates and then processing them as if they were a versioned document. I'm not sure exactly which use cases are most applicable for you, so please chime in with any suggestions and changes.
versidag-event
import { createVersidag, resumeVersidag } from 'versidag';
// Creates a new instance, without any versions
const versidag = createVersidag({
  // deterministic comparison of versidag nodes, should return -1, 1 or 0
  // this is used to sort concurrent versions
  comparator: (version1, version2) => {},
  // reads a versidag node from the underlying storage
  read: (versidagNodeCid) => { /* returns the versidagNode */ },
  // writes a versidag node to the underlying storage
  write: (versidagNode) => { /* returns the versidagNodeCid */ },
  // receives updates of the form (parentCid, childCid) and performs writes
  update: (updater) => {},
  // registers an event handler that takes as input (parentCid, childCid, childSiblingNumber)
  addUpdateReceivingHandler: (handler) => {},
});
// Creates a new instance, but resuming from a previously known versidagNodeCid
const versidag = resumeVersidag(versidagNodeCid, /* same config object as `createVersidag` */);
// Adds a new version, where `version` can be any type
versidag.add(version) // -> the new versidag node cid
// Gets the current versidag cid
versidag.getNodeCid();
// Receives batches of updates
versidag.receiveUpdate([versidags... or versidagNodeCids...])
// Returns the versidagNode of the parent/previous version
versidag.previous() // -> the previous/parent versidag node
// Returns an array of the changes/children
versidag.next() // -> versidagNode[]
// Deep resolves all the versions, returning an array of versions in reverse order
// The `limit` option caps the number of versions to retrieve, reducing the number of traversal hops
// The `fromNodeCid` option allows fetching to start from that node
versidag.resolve({ limit: 5, fromNodeCid }) // -> { versions, nextNodeCid }
Notice that |
Hi everyone 👋
As the original author of ipfs-log, I would like to provide some thoughts on what's discussed here with the hope that, perhaps, there are good grounds to collaborate. What's been discussed is very much the same ideas and questions we've gone through over the years in
A log is a beautiful data structure and enables a multitude of use cases, comes with a bunch of flexibility for how things on top of a log can be implemented and the properties it provides are very useful in building p2p applications and systems. As with all data structures and algorithms, there are many trade-offs to be considered when designing an implementation. Reading the proposals and discussion here, there are many, almost direct, similarities with
First, I think it's important to identify and define what a "log" is: (a deterministic) ordering for events. That's all. However, when combined with Merkle-DAGs, we gain some extremely useful (and I would argue required) properties when it comes to peer-to-peer networks: integrity, verifiability and a replication method with automatic deduplication (of log entries). Anything else: other data structures or models, histories, CRDTs, "custom" or dedicated merge operations/events, etc., I believe, should be a level up in the abstractions.
Second, I think it's important to recognize that an implementation can have different user APIs, as in what and how users call the data structure to work with it. For example, a log can provide an iterator-based API, events, streams, and more. They all have different UX and ergonomics from the user perspective and may involve different trade-offs in terms of performance ("raw throughput"), memory usage, etc.
To compare
This was once asked in the ipfs-log issues, but I never got around to answering it, so here goes: why doesn't ipfs-log use IPLD? IPLD was still an idea (can't remember exact timings, perhaps there was a spec) when the work on Some of these, or all, may have changed and we'd be happy to explore the possibility of using IPLD in
In short, there's no reason
We use Lamport Clocks in Re. merging: what
You're correct, we keep the entries in memory. This is a conscious trade-off we've made, but again, there's no reason the log needs to be stored in memory. The trade-off is two-fold: 1) keeping them in-memory allows for faster reads (as we don't need to traverse the log and especially hit the disk to read entries from IPFS) and 2) keeping the entries in memory gives us the ability to do efficient comparisons of "known" and "unknown" entries when joining two logs together. We opted to optimize for high write-throughput at the cost of memory usage and read speed, and that made us choose to keep the log in memory. In real life the memory consumption hasn't become a problem yet, but obviously it can, and to enable more use cases we're currently considering changing this. This could be a very concrete thing to collaborate on, but again, there are trade-offs to be made and they need to be thoroughly and carefully considered. (As an aside, I personally feel the user should be able to make those trade-offs, but it's not always possible to provide that to users.)
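The "known" versus "unknown" comparison mentioned above can be pictured with a small sketch. This is not ipfs-log's code: the `join` helper and the `{ hash, payload }` entry shape are assumptions, used only to show why an in-memory index makes the check cheap.

```javascript
// Hypothetical in-memory log: a Map from entry hash to entry.
// Returns the hashes of the entries that were not known locally before the join.
function join(localLog, remoteEntries) {
  const added = [];
  for (const entry of remoteEntries) {
    if (localLog.has(entry.hash)) continue; // known entry: constant-time lookup, no storage read
    localLog.set(entry.hash, entry);
    added.push(entry.hash);
  }
  return added;
}

const localLog = new Map([['h1', { hash: 'h1', payload: 'a' }]]);
const added = join(localLog, [
  { hash: 'h1', payload: 'a' },
  { hash: 'h2', payload: 'b' },
]);
```

Without the in-memory index, each membership check would presumably require traversing the log or reading from storage, which is the cost the trade-off above avoids.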
If I understand the "read-only mode" here correctly, in relation to the
This is exactly what we do currently in ipfs-log's sorting function LastWriteWins.
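For readers unfamiliar with this kind of sorting function, here is a minimal last-write-wins comparator sketch. It is written from scratch for illustration and is not taken from ipfs-log: the entry shape `{ clock: { time, id }, payload }` and the tie-breaking rule on `id` are assumptions.

```javascript
// Hypothetical entry shape: { clock: { time, id }, payload }.
// Orders entries newest first; ties on logical time are broken by writer id,
// so every replica sorts the same set of entries identically.
function lastWriteWins(a, b) {
  if (a.clock.time !== b.clock.time) return a.clock.time > b.clock.time ? -1 : 1;
  if (a.clock.id !== b.clock.id) return a.clock.id > b.clock.id ? -1 : 1;
  return 0;
}

const entries = [
  { clock: { time: 1, id: 'A' }, payload: 'first' },
  { clock: { time: 2, id: 'A' }, payload: 'edit from A' },
  { clock: { time: 2, id: 'B' }, payload: 'edit from B' },
];
const ordered = [...entries].sort(lastWriteWins).map((entry) => entry.payload);
```

A function of this shape could be passed as the `comparator` option discussed earlier in the thread, since it returns -1, 1 or 0.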
While Considering all these perspectives, I believe there would be a lot of synergies in collaborating and improving It might also be interesting to think of the "version histories" as higher level "data model" that builds on a log (similar to how OrbitDB's stores are done): while the log has a deterministic and "opinionated" sorting, traversal, loading, pointers to the child nodes etc., a "VersionHistory" data model/structure could provide a more "version history" specific functionality and API. For example, the payload of a log entry could contain something like I hope this helps to understand I think it could be, and improving it would help the wider community tremendously. As many projects are building on it (by proxy), we'd be very happy and grateful to have more people collaborate on it ❤️ 🙏 |
@haadcode Thanks for chiming in on this issue, your comment was very valuable. I see an opportunity for both
If If this looks reasonable, we can collaborate and work together on I've finished the initial implementation of @aschmahmann This initial version is similar to the API I wrote above. Nevertheless, we may explore event-driven approaches and add them to the API, or even write a new module on top of this one. |
@satazor looks good to me. I need something similar on the Go side. I'm currently exploring interfaces as described at https://github.com/ipld/replication/pull/3/files and applying it to multiwriter IPNS as described https://github.com/aschmahmann/ipshare/blob/master/sync/MultiWriterIPNS.md. They look pretty similar though which is certainly a good thing. We should probably talk at some point about how closely we want the APIs to mirror each other. |
@satazor thanks for the reply! 👍
I tried to convey in my previous reply that I believe what you're after does exist today in
Where do you feel If you want, we can do a more real-time chat on Gitter to discuss details and clarify. Feel free to ping me on Gitter! |
In short, here's the requirements:
*1 Imagine the following dag:
If one tries to merge B, D and E, the merge node should NOT contain a reference to B. This is because D and B are non-concurrent heads. This allows us to have deterministic merges amongst replicas that just know their heads and not the full dag. If this wasn't done at all, they would be in an infinite loop of merging. @haadcode Could you please analyze if it's easy to change |
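The rule for dropping non-concurrent heads can be sketched as below. The dag used here is hypothetical (the original diagram is not reproduced in this thread): it assumes A is the root, B and C are children of A, D is a child of B, and E is a child of C. The helper names `isAncestor` and `concurrentHeads` are also assumptions.

```javascript
// Hypothetical dag (child -> parents); assumed because the original diagram is missing.
const parents = { A: [], B: ['A'], C: ['A'], D: ['B'], E: ['C'] };

// True when `ancestor` is reachable from `cid` by following parent links.
function isAncestor(ancestor, cid) {
  const stack = [...parents[cid]];
  const seen = new Set();
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === ancestor) return true;
    if (seen.has(current)) continue;
    seen.add(current);
    stack.push(...parents[current]);
  }
  return false;
}

// Keeps only the heads that are not ancestors of another head, in a deterministic order.
function concurrentHeads(heads) {
  const unique = [...new Set(heads)];
  return unique
    .filter((head) => !unique.some((other) => other !== head && isAncestor(head, other)))
    .sort();
}

const mergeParents = concurrentHeads(['B', 'D', 'E']);
```

Under these assumptions B is dropped because it is an ancestor of D, so the merge node would reference only D and E regardless of the order in which a replica lists its heads.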
Regarding this issue, both https://github.com/ipfs-shipyard/js-versidag and https://github.com/ipfs-shipyard/js-ipfs-versidag are ready! Should we close this issue and keep discussing in a new issue or just keep this open? |
@satazor that's great! I say we keep this one open to discuss collaboration with ipfs-log.. |
I've created a doodle so that we can agree on a time to have a call: https://doodle.com/poll/n7rpk36cy7x9wthk @haadcode and everyone interested, please fill it. 🕐 |
Would be happy to jump on a call and go through everything discussed here. Filled in the doodle, thanks @satazor for coordinating! Will answer your comment above in more detail either in the call or before/after in here (short, non-complete answer is: yes, it should be able to do all that right now, save for some possible changes/additions). Looking forward to discussing more! ❤️ 👍 |
We have a match for tomorrow 11:00 gmt, I will create an event at the calendar. See you tomorrow! |
@haadcode could you please email me or dm me on IRC/twitter your email? |
@satazor reached out to you on Twitter. |
First of all, sorry @aschmahmann and Mark, there was a confusion about the exact timezone and we had to shift one hour earlier otherwise @haadcode wouldn't make it. Me, @pgte and @haadcode chatted about this matter and there were two possible solutions:
We decided to go for the
@haadcode hopefully I didn't forget anything. How should we proceed next? Create an issue for each of those things in |
Let me know when those issues are created so that I can start contributing and also close this issue. |
Closing as the issues were created in ipfs-log. I will be examining the ipfs-log codebase and be contributing to those issues. Thanks everyone for participating in this. |
Hello!
A common requirement for projects is to have concurrent version history. This is a use-case in Discussify where one must have the ability to view a comment history (updates & removes). Instead of resolving this common problem in an adhoc manner, what if we created a module that can be reused across projects?
After brainstorming with @pgte, here's what I came up with:
versidag
A library that helps you create versions of a value of any type.
Underneath, it creates a DAG of versidag nodes. The way that concurrent versions get ordered is up to the developer via the `comparator` function. It's important that entries added via `.add()` contain information for you to use in the comparator.
Note that the library is agnostic to IPFS.
Here's what a normal versidag node looks like:
Here's what a merge versidag node looks like:
ipfs-versidag
Wrapper to `versidag` with the `write` and `read` functions already configured:
It would be awesome to get some feedback on this!