This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

Concurrent version history - versidag #50

Closed
satazor opened this issue Nov 20, 2018 · 32 comments

@satazor commented Nov 20, 2018

Hello!

A common requirement for projects is concurrent version history. This is a use case in Discussify, where one must be able to view a comment's history (updates & removes). Instead of solving this common problem in an ad-hoc manner, what if we created a module that can be reused across projects?

After brainstorming with @pgte, here's what I came up with:


versidag

A library that helps you create versions of a value of any type.

Underneath, it creates a DAG of versidag nodes. The way that concurrent versions get ordered is up to the developer via the comparator function. It's important that entries added via .add() contain information for you to use in the comparator.

Note that the library is agnostic to IPFS.

import createVersidag from 'versidag';

const versidag = createVersidag({
  // Deterministic comparison of versidag nodes; should return -1, 1 or 0.
  // This is used to sort concurrent versions.
  comparator: (version1, version2) => {},
  // Reads a versidag node from the underlying storage
  read: async (versidagNodeCid) => { /* return versidagNode */ },
  // Writes a versidag node to the underlying storage
  write: async (versidagNode) => { /* return versidagNodeCid */ }
});

// Adds a new version, where `version` can be of any type
// The `parentVersidagNodeCid` is optional, in case this is the first version
versidag.add(version, parentVersidagNodeCid) // -> The new versidagNodeCid

// Merges two or more versidag nodes
// Useful when two replicas have forked from a common version
versidag.merge([versidagNodeCids...]) // -> The new versidagNodeCid

// Resolves all the versions given a versidag node cid, returning a reverse-ordered array of versions
// The `limit` option caps the number of versions to retrieve, diminishing the number of traversal hops
versidag.resolve(versidagNodeCid, { limit: 5 }) // -> { versions, nextNodeCid }

Here's what a normal versidag node looks like:

{
    parents: [A],
    version: <version>,
}

Here's what a merge versidag node looks like:

{
    parents: [B, C],
}

ipfs-versidag

Wrapper around versidag with the write and read functions already configured:

import createIpfsVersidag from 'ipfs-versidag';

const versidag = createIpfsVersidag({
  // The ipfs instance
  ipfs,
  // Deterministic comparison, should return -1, 1 or 0
  comparator: (version1, version2) => {},
});

It would be awesome to get some feedback on this!

@satazor (Author) commented Nov 20, 2018

Note that this API is stateless. I will make another proposal using a stateful API shortly to see how they compare.

@satazor (Author) commented Nov 20, 2018

Here's a proposal for a stateful version of the API:

versidag

import { createVersidag, resumeVersidag } from 'versidag';

// Creates a new instance, without any versions
const versidag = createVersidag({
  // Deterministic comparison of versidag nodes; should return -1, 1 or 0.
  // This is used to sort concurrent versions.
  comparator: (version1, version2) => {},
  // Reads a versidag node from the underlying storage
  read: async (versidagNodeCid) => { /* return versidagNode */ },
  // Writes a versidag node to the underlying storage
  write: async (versidagNode) => { /* return versidagNodeCid */ }
});
// Alternatively, creates a new instance resuming from a previously known versidagNodeCid
const resumedVersidag = resumeVersidag(versidagNodeCid, /* same config object as `createVersidag` */);

// Adds a new version, where `version` can be any type
versidag.add(version) // -> the new versidag node cid

// Merges two or more versidag nodes
// Useful when two replicas forked from a common version
versidag.merge([versidags... or versidagNodeCids...]) // -> the new versidag node cid

// Resolves all the versions, returning a reverse-ordered array of versions
// The `limit` option caps the number of versions to retrieve, diminishing the number of traversal hops
// The `fromNodeCid` option allows fetching starting from that node
versidag.resolve({ limit: 5, fromNodeCid }) // -> { versions, nextNodeCid }

// Gets the current versidag cid
versidag.getNodeCid();
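As a usage sketch, retrieving a full history with this `resolve` signature becomes a pagination loop. The `versidag` below is a stub serving ten fake versions purely to illustrate the loop; a real instance would traverse the DAG and return real node cids.

```javascript
// Stub standing in for a real versidag instance; `nextNodeCid` is just an
// offset here, while a real implementation would return a node cid.
const fakeVersions = Array.from({ length: 10 }, (_, i) => `v${i}`);
const versidag = {
  resolve: async ({ limit, fromNodeCid = 0 }) => ({
    versions: fakeVersions.slice(fromNodeCid, fromNodeCid + limit),
    nextNodeCid: fromNodeCid + limit < fakeVersions.length ? fromNodeCid + limit : undefined
  })
};

// Pages through the history, 5 versions at a time, until exhausted.
const collectAll = async () => {
  const all = [];
  let fromNodeCid;

  do {
    const { versions, nextNodeCid } = await versidag.resolve({ limit: 5, fromNodeCid });
    all.push(...versions);
    fromNodeCid = nextNodeCid;
  } while (fromNodeCid !== undefined);

  return all;
};
```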

ipfs-versidag

Wrapper around versidag with the write and read functions already configured:

import { createVersidag, resumeVersidag } from 'ipfs-versidag';

const versidag = createVersidag({
  // The ipfs instance
  ipfs,
  // Deterministic comparison, should return -1, 1 or 0
  comparator: (version1, version2) => {},
});

I don't want to spoil anyone's feedback, but I prefer the stateful API.

@satazor changed the title from "versidag" to "Concurrent version history - versidag" on Nov 20, 2018
@marcooliveira (Contributor)

I like it. Maybe use create and resume instead of createVersidag and resumeVersidag? The import is already contextualised, so the prefix seems a bit redundant, and I can rename them when importing if I want.

Re stateless vs stateful, I don't have a strong opinion. I don't see huge wins in either.

Stateful seems a bit more ergonomic when using on a daily basis.

On the other hand, with the stateless version there might be a marginal memory-footprint improvement for cases where you have large amounts (thousands+) of versioned dags that share the same base configuration, which may be useful for some more advanced use cases with complex data structures. Even then, I would argue that in those cases we should probably go the extra mile and allow providing the comparator, read and write functions on a per-operation basis. We would completely lose the concept of a version-dag instance, making it more efficient at the expense of ergonomics.

@olizilla

Could you say a little more about the use case? Can you show how it'd be used in Discussify? And what would it look like if we used it in Jenkins to track website release versions, like https://github.com/ipfs/jenkins-libs/blob/9c3778503a6dc0ef1147dbff7cf043d032846bd1/vars/website.groovy#L113-L115

@satazor (Author) commented Nov 20, 2018

@olizilla sure, I will try to explain the Discussify use case:

Let's say that I've created a comment with foo. Later, I edited the comment concurrently on two different devices, generating two different versions: one with foo 1 and another with foo 2. By concurrent, I don't mean only at the same point in time but also other scenarios, such as when both devices were offline. When the devices meet again, they will converge by the rules of the underlying CRDT. In the case of Discussify, the most recent one wins and is the one that will be shown visually. Nevertheless, the history should contain 3 entries: foo, foo 1 and foo 2. More specifically, when the user clicks to view the history of that comment, they should see exactly those 3 entries, in that order.

Now, I could store this history/versions in the CRDT itself, but it would make the state unnecessarily large. The main role of the CRDT is to provide the comments hierarchy and not the history. That's why we don't store the comments themselves in the CRDT and instead simply store the cid for the comment, which in turn contains the author, body and other data. The history should also live outside of the main CRDT so that it remains as small as possible to diminish the sync time between replicas.

One solution would be to use another CRDT to store the history, but this has a problem: that CRDT could be out-of-sync with the main one. For instance, foo 2 could not yet be in your local replica, causing visual inconsistencies.

@pgte suggested using some sort of side-chain for this, which culminated in embracing IPLD to store the versions. With this solution, we just store the heads in the main CRDT, keeping the CRDT small. In fact, the cid and updatedAt fields can be removed as they are redundant.

I thought this problem is common enough to be worth the effort of creating a library. Here's an example:

  1. @satazor on device A enters a discussion
  2. @satazor on device A creates a new comment:

const versidag = createVersidag({ ... });

const versidagNodeCid = await versidag.add({ commentCid: 'xxx', timestamp: Date.now() });
// `versidagNodeCid` is then stored in the CRDT and will be the only head

  3. @satazor on device B enters the discussion and receives the state from device A
  4. After a few hours, and while offline, @satazor edits the comment on both devices:

const previousNodeCid = /* read versidagNodeCid from the CRDT */;

const versidag = resumeVersidag(previousNodeCid, { ... });
const newVersidagNodeCid = await versidag.add({ commentCid: 'yyy', timestamp: Date.now() });
// The head will be updated to `newVersidagNodeCid` in the CRDT

  5. After a day, both @satazor devices come online and sync together

At this point, the CRDT contains two different heads: X and Y. This is when .merge() gets called to generate a single head. The .merge() will occur on both devices, but since it's deterministic, the result of the merge will have the same versidag cid on both. Afterwards, the CRDT will be updated to contain just the merged versidag cid.

For reference, the comparator would compare the timestamps and fallback to compare the cids.
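Such a comparator could be sketched as follows, assuming versions shaped like the `{ commentCid, timestamp }` entries used in the example above:

```javascript
// Orders versions by timestamp; on a tie, falls back to comparing the
// CIDs lexicographically, so that concurrent versions still get a
// deterministic, replica-independent order.
const comparator = (version1, version2) => {
  if (version1.timestamp !== version2.timestamp) {
    return version1.timestamp < version2.timestamp ? -1 : 1;
  }
  if (version1.commentCid === version2.commentCid) {
    return 0;
  }
  return version1.commentCid < version2.commentCid ? -1 : 1;
};
```

The CID fallback is what guarantees every replica sorts two concurrent versions with equal timestamps the same way.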


I hope I made things clearer. If not, we can jump into a call so that I can explain in more detail.

@aschmahmann (Collaborator)

I agree that a versioning API would likely make life easier for development. A couple of things to think about:

You can already use a DAG as a versioned data structure, with every sibling node being a branch/concurrent edit. This means we don't need any new data structures, just an API for examining and merging branches. I'm actually using this technique to synchronize changes in a multiwriter environment.

I'm not sure how well I understand the Discussify use cases. If I understood correctly Discussify uses CRDTs, so the merge process is very simple (e.g. any time there's a branch it can be replaced with a single line of nodes ordered with any deterministic sort). If this is the case it might be more useful to create an event driven versioning API that wraps an event driven data channel API so that when you receive an update it is automatically merged and an event is emitted.

Happy to talk more about this over GitHub or a call.

@satazor (Author) commented Nov 20, 2018

You can already use a DAG as a versioned data structure with every sibling node being a branch/concurrent edit, this means that we don't need any new data structures just an API for examining and merging branches. I'm actually using this technique to synchronize changes in a multiwriter environment.

That's exactly what this proposal uses underneath, a DAG.

I'm not sure how well I understand the Discussify use cases. If I understood correctly Discussify uses CRDTs, so the merge process is very simple (e.g. any time there's a branch it can be replaced with a single line of nodes ordered with any deterministic sort).

Correct 👍

If this is the case it might be more useful to create an event driven versioning API that wraps an event driven data channel API so that when you receive an update it is automatically merged and an event is emitted.

I haven't thought about using an event-driven approach. Would you have some time to sketch it out?

@aschmahmann (Collaborator)

Sure, this week is turning into spec week for me. I'll try and get to it today/tomorrow.

@aschmahmann (Collaborator)

Also @magik6k may have some input from the IGiS side of things (where versions are tracked, but there is no automerge function).

@magik6k commented Nov 21, 2018

So, to apply this to the git commit history use case, I think this would need 2 things:

  • Read-only mode (merge shouldn't be needed as history already has common head)
  • Ability to filter non-commit objects out (as an option to versidag.resolve)

cc @dirkmc

@dirkmc (Collaborator) commented Nov 23, 2018

👍

For IGiS we also have the comments use case - when someone leaves comments on a PR or issue they can edit their own comment

@dirkmc (Collaborator) commented Nov 23, 2018

Also worth noting that there are some similarities with OrbitDB log:
https://github.com/orbitdb/ipfs-log

@satazor (Author) commented Nov 26, 2018

@dirkmc yes there are a lot of similarities. The differences I spot are:

  • versidag is agnostic to IPFS
  • versidag is not opinionated about how the entries are merged (ipfs-log uses vector clocks)
  • ipfs-log doesn't use IPLD but ipfs-versidag will; this allows us to use the DAG explorer to see the version history
  • ipfs-log seems to store more in memory than it needs, e.g. if 10k entries are appended to the log, they are kept in memory (I could be wrong, but it looks like it from analyzing the code)

@aschmahmann (Collaborator)

@satazor I was thinking that an event-based versidag could be useful in the case of listening to a channel for updates and then processing them as if they were a versioned document. I'm not sure exactly which use cases are most applicable for you, so please chime in with any suggestions and changes.

versidag-event

import { createVersidag, resumeVersidag } from 'versidag';

// Creates a new instance, without any versions
const versidag = createVersidag({
  // Deterministic comparison of versidag nodes; should return -1, 1 or 0.
  // This is used to sort concurrent versions.
  comparator: (version1, version2) => {},
  // Reads a versidag node from the underlying storage
  read: async (versidagNodeCid) => { /* return versidagNode */ },
  // Writes a versidag node to the underlying storage
  write: async (versidagNode) => { /* return versidagNodeCid */ },

  // Receives updates of the form (parentCid, childCid) and performs writes
  update: (updater) => {},
  // Registers an event handler that takes as input (parentCid, childCid, childSiblingNumber)
  addUpdateReceivingHandler: (handler) => {}
});

// Alternatively, creates a new instance resuming from a previously known versidagNodeCid
const resumedVersidag = resumeVersidag(versidagNodeCid, /* same config object as `createVersidag` */);

// Adds a new version, where `version` can be any type
versidag.add(version) // -> the new versidag node cid

// Gets the current versidag cid
versidag.getNodeCid();


// Receives batches of updates
versidag.receiveUpdate([versidags... or versidagNodeCids...])

// Returns the versidagNode of the parent/previous version
versidag.previous() // -> the previous/parent versidag node

// Returns an array of the changes/children
versidag.next() // -> { versidagNode[] }

// Deep-resolves all the versions, returning a reverse-ordered array of versions
// The `limit` option caps the number of versions to retrieve, diminishing the number of traversal hops
// The `fromNodeCid` option allows fetching starting from that node
versidag.resolve({ limit: 5, fromNodeCid }) // -> { versions, nextNodeCid }

Notice that merge is missing. The idea here is that merge is simply an extension of processing updates, one that is done differently in CRDTs, OT, and non-automatic resolving systems like Git. Instead, the more important features are stepping through the updates, being able to retrieve a list of next updates (or deeply resolve them), and pushing new updates.

@haadcode commented Dec 2, 2018

Hi everyone 👋

As the original author of ipfs-log, I would like to offer some thoughts on what's discussed here, with the hope that, perhaps, there are good grounds to collaborate. What's been discussed covers very much the same ideas and questions we've gone through over the years in ipfs-log, and I feel it'd be great if we, as members of the IPFS community, could find a way to combine our ideas, learnings and efforts to provide a solid building block for the wider community to use. I also don't want to impose my/our (orbitdb community) perspective, decisions and solutions on anyone, so take it all for what it's worth.

A log is a beautiful data structure and enables a multitude of use cases, comes with a bunch of flexibility for how things on top of a log can be implemented and the properties it provides are very useful in building p2p applications and systems. As with all data structures and algorithms, there are many trade-offs to be considered when designing an implementation.

Reading the proposals and discussion here, there are many, almost direct, similarities with ipfs-log. I'll try to offer some background and clarification as to what we've done in ipfs-log and why we've done it that way. It has served us well over the years, but there's still a lot we can and will improve.

First, I think it's important to identify and define what a "log" is: (a deterministic) ordering for events. That's all. However, when combined with Merkle-DAGs, we gain some extremely useful (and I would argue required) properties when it comes to peer-to-peer networks: integrity, verifiability and replication method with automatic deduplication (of log entries). Anything else: other data structures or models, histories, CRDTs, "custom" or dedicated merge operations/events, etc., I believe, should be a level up in the abstractions.

Second, I think it's important to recognize that an implementation can have different user APIs, as in what users call to work with the data structure and how. For example, a log can provide an iterator-based API, events, streams, and more. These all have different UX and ergonomics from the user's perspective and may involve different trade-offs in terms of performance ("raw throughput"), memory usage, etc.

To compare ipfs-log and what's proposed and discussed here, I'll go through some of the individual points or comments made.

ipfs-log doesn't use IPLD but ipfs-versidag will; this allows us to use the DAG explorer to see the version history

This was once asked in the ipfs-log issues, but I never got around to answering it, so here goes: why doesn't ipfs-log use IPLD?

IPLD was still an idea (I can't remember exact timings; perhaps there was a spec) when the work on ipfs-log started. It wasn't fully fleshed out and there wasn't an implementation to use. The IPFS object API provided the components needed to create the log and we've been happy to use it ever since. Second, IPLD comes with a specific resolver/fetching algorithm which (correct me if I'm wrong) can't be controlled at the user level, and we need the ability to implement different algorithms in order to make the trade-offs (and optimizations) we make, for example parallelizing the loading of two separate "branches" and verifying entry signatures as the log is traversed. Third, we never had a user need to, for example, "explore the DAG" like one can do with IPLD, so moving to IPLD was never a huge priority. Fourth, working with work-in-progress and alpha-stage software can be a huge hit on productivity at the user level, as you end up fixing things on lower levels; since IPLD was still being worked on at the time the main work on ipfs-log was done, for me personally it was more important to be able to move forward quickly.

Some of these, or all, may have changed and we'd be happy to explore the possibility of using IPLD in ipfs-log (given thorough consideration to the aforementioned, and other, trade-offs).

versidag is agnostic to IPFS

ipfs-log CAN also be agnostic to IPFS. We only use the ipfs.object.get and ipfs.object.put functions, which could simply be replaced with an abstraction that wraps IPFS or other systems in a more generic "content-address-storage" module (effectively: put and get). In fact, we had an attempt to move towards that in ipfs-log some time ago, but decided to go back to calling it ipfs (see orbitdb-archive/ipfs-log#182). The general idea, which was never followed through, is described here, and a prototype that shows how it could work and what it would give us was done here.

In short, there's no reason ipfs-log needs to be tied to IPFS specifically (other than the name maybe? :)).

versidag is not opinionated about how the entries are merged (ipfs-log uses vector clocks)

We use Lamport clocks in ipfs-log (a "weaker" version of vector clocks). This is another trade-off we made early on: vector clocks require having the clocks of all participants, and the clocks may grow huge over time, or depending on how many peers/nodes/users are updating a particular log.

Re. merging: what ipfs-log does is provide a deterministic ordering for events. This is different from how things are merged at higher levels. For example, a G-Counter has different merge semantics from a KV store (Map) and from Git. It is crucial to have determinism when it comes to the order of events, but at the same time the user-level data structures should be able to provide a custom merge/sort function to control their specific semantics. We don't currently do this in a convenient way in ipfs-log, and it's something we've been wanting to improve; we just haven't gotten to it so far. It could be done trivially by allowing the user to pass in a function (exactly like what is proposed above with the comparator), passing that to the sorting/traversal part, and effectively replacing the current "ultimate conflict resolution" function. By doing so, the user can provide their own logic for handling concurrent events and choose which event/entry is the "correct" one.

ipfs-log seems to store more in memory than it needs, e.g. if 10k entries are appended to the log, they are kept in memory (I could be wrong, but it looks like it from analyzing the code)

You're correct, we keep the entries in memory. This is a conscious trade-off we've made, but again, there's no reason the log needs to be stored in memory. The trade-off is two-fold: 1) keeping entries in memory allows for faster reads (as we don't need to traverse the log and, especially, hit the disk to read entries from IPFS) and 2) keeping the entries in memory gives us the ability to do efficient comparisons of "known" and "unknown" entries when joining two logs together.

We opted to optimize for high write throughput at the cost of memory usage and read speed, and that made us choose to keep the log in memory. In real life the memory consumption hasn't become a problem yet, but obviously it can, and to enable more use cases we're currently considering changing this. This could be a very concrete thing to collaborate on, but again, there are trade-offs to be made and they need to be thoroughly and carefully considered. (As an aside, I personally feel the user should be able to make those trade-offs, but it's not always possible to provide that to users.)

Read-only mode (merge shouldn't be needed as history already has common head)

If I understand the "read-only mode" here correctly, in relation to the merge proposed above: what we do is NOT have a dedicated merge event/entry that gets added to the log. Instead, ipfs-log keeps all known heads in an array. When a user adds a new entry, the new entry will point to all those heads (and replace them internally), which achieves the same semantics but doesn't require an "additional" entry in the data structure itself.

For reference, the comparator would compare the timestamps and fallback to compare the cids.

This is exactly what we do currently in ipfs-log's sorting function LastWriteWins.

versidag.resolve({ limit: 5, fromNodeCid }) // -> { versions, nextNodeCid }

While ipfs-log doesn't currently have a direct resolve() function, we do have traverse(rootHash), which delivers the same functionality, I believe. We're considering (as part of the work mentioned above re. read/write trade-offs) adding an iterator to ipfs-log, which we currently do in OrbitDB's EventStore. This would add further capabilities for the user to traverse the log between specified hashes (by specifying gt and lt, "greater than" and "less than" respectively). There's also been some discussion re. breadth-first vs. depth-first traversal, which again would allow greater flexibility for the user to decide how to interpret the log.


Considering all these perspectives, I believe there would be a lot of synergy in collaborating and improving ipfs-log to enable the "version histories" use case.

It might also be interesting to think of "version histories" as a higher-level "data model" that builds on a log (similar to how OrbitDB's stores are done): while the log has deterministic and "opinionated" sorting, traversal, loading, pointers to the child nodes etc., a "VersionHistory" data model/structure could provide more "version history"-specific functionality and API. For example, the payload of a log entry could contain something like {..., previousVersions: [<list of log entry hashes specific to the document/file>]}

I hope this helps to understand ipfs-log a little better and see if it would be a viable approach for what you're trying to achieve here.

I think it could be, and improving it would help the wider community tremendously. As many projects are building on it (by proxy), we'd be very happy and grateful to have more people collaborate on it ❤️ 🙏

@satazor (Author) commented Dec 5, 2018

@haadcode Thanks for chiming in on this issue, your comment was very valuable.

I see an opportunity for both versidag and ipfs-log to use a common module, perhaps named dag-log. It would essentially be a less opinionated version of ipfs-log/versidag and it would be capable of:

  • Adding new entries to a Merkle-DAG, in a deterministic way
  • Merging the current heads with another replica's heads, in a deterministic way
  • Traversing the Merkle-DAG in a consistent & deterministic order

If dag-log existed today, I would be able to build versidag on top of it by storing meaningful data for my use case, which is versions. I think that you could also leverage dag-log in ipfs-log the same way.

If this looks reasonable, we can collaborate and work together on dag-log. Let me know what your thoughts are.


I've finished the initial implementation of versidag: https://github.com/ipfs-shipyard/js-versidag. Feedback is welcome!

@aschmahmann This initial version is similar to the API I wrote above. Nevertheless, we may explore event-driven approaches and add them to the API, or even write a new module on top of this one.

@aschmahmann (Collaborator)

@satazor looks good to me. I need something similar on the Go side. I'm currently exploring interfaces as described at https://github.com/ipld/replication/pull/3/files and applying them to multiwriter IPNS as described at https://github.com/aschmahmann/ipshare/blob/master/sync/MultiWriterIPNS.md

They look pretty similar though which is certainly a good thing. We should probably talk at some point about how closely we want the APIs to mirror each other.

@haadcode

@satazor thanks for the reply! 👍

If dag-log existed today, I would be able to build versidag on top of it by storing meaningful data for my use case, which is versions.

I tried to convey in my previous reply that I believe what you're after does exist today in ipfs-log and thus wanted to propose to a) look into using it and b) collaborate to improve/fix where it falls short. It ticks all the boxes as per your requirements described in this issue.

It would essentially be a less opinionated version of ipfs-log/versidag and it would be capable of...

Where do you feel ipfs-log is too opinionated? Perhaps we can drill into the details to make sure I've not overlooked your requirements and see if (and what) changes would be needed to facilitate your use case.

If you want, we can do a more real-time chat on Gitter to discuss details and clarify. Feel free to ping me on Gitter!

@satazor (Author) commented Dec 11, 2018

I tried to convey in my previous reply that I believe what you're after does exist today in ipfs-log and thus wanted to propose to a) look into using it and b) collaborate to improve/fix where it falls short. It ticks all the boxes as per your requirements described in this issue.

In short, here are the requirements:

  • Add a new node to a merkle dag and attach data to it
  • Ability to merge the current merkle dag heads with other heads, creating a merge node and attaching data to it
  • It's important that the merge is deterministic, e.g. we need to make sure that non-concurrent heads are removed, otherwise the replicas won't converge to the same dag node *1
  • Ability to just concatenate heads, without creating any merge node
  • Ability to traverse the dag in a deterministic order, preserving the partial order + resolving concurrent entries based on a comparator that receives the data attached to the nodes
  • Ability to stop & resume the traversal of the dag at any point
  • Keep the memory footprint small, as this will be used in the browser and on low-end devices
  • It would be cool if all of this was agnostic to IPFS, but it's not really a requirement

*1 Imagine the following dag:

          A
    |-----|----|
    |     |    |
    B     C    E
    |--|--|
       |
       D

If one tries to merge B, D and E, the merge node should NOT contain a reference to B, because D and B are non-concurrent heads: B is an ancestor of D. This allows us to have deterministic merges amongst replicas that know only their heads and not the full dag. If this weren't done, replicas would end up in an infinite loop of merging.
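The reduction could be sketched like this, over the example DAG above (`readNode` stands in for a hypothetical lookup of a node by its cid):

```javascript
// The example DAG: A is the root, D merges B and C, E is a sibling.
const nodes = {
  A: { parents: [] },
  B: { parents: ['A'] },
  C: { parents: ['A'] },
  D: { parents: ['B', 'C'] },
  E: { parents: ['A'] }
};
const readNode = (cid) => nodes[cid];

// True if `ancestorCid` is reachable from `cid` (or is `cid` itself).
const isAncestor = (ancestorCid, cid) => {
  if (cid === ancestorCid) {
    return true;
  }
  return readNode(cid).parents.some((parent) => isAncestor(ancestorCid, parent));
};

// Drops any candidate head that is an ancestor of another candidate, so
// only truly concurrent heads end up referenced by the merge node.
const reduceHeads = (heads) =>
  heads.filter((head) => !heads.some((other) => other !== head && isAncestor(head, other)));
```

Merging ['B', 'D', 'E'] therefore reduces to ['D', 'E'] on every replica, so all replicas write the same merge node.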


@haadcode Could you please analyze if it's easy to change ipfs-log to meet these requirements, or even if they are met already?

@satazor (Author) commented Dec 11, 2018

Regarding this issue, both https://github.com/ipfs-shipyard/js-versidag and https://github.com/ipfs-shipyard/js-ipfs-versidag are ready! Should we close this issue and keep discussing in a new issue or just keep this open?

@pgte (Contributor) commented Dec 11, 2018

@satazor that's great! I say we keep this one open to discuss collaboration with ipfs-log.

@olizilla

@satazor it'd be rad to organise a call with @haadcode to see how we can collaborate on ipfs-log. Ideally we'd help improve the existing thing and support that.

@satazor (Author) commented Dec 11, 2018

I've created a doodle so that we can agree on a time to have a call: https://doodle.com/poll/n7rpk36cy7x9wthk

@haadcode and everyone interested, please fill it in. 🕐

@pgte (Contributor) commented Dec 11, 2018

@satazor thanks, filled in the doodle!

@haadcode we literally came up with the versidag requirements as @satazor was developing it, it was pretty much a learning experience.
Now that we think we know what we need, I'd like us to think seriously about merging efforts. :)

@haadcode

Would be happy to jump on a call and go through everything discussed here. Filled in the doodle; thanks @satazor for coordinating! Will answer your comment above in more detail either on the call or before/after here (the short, incomplete answer is: yes, it should be able to do all of that right now, save for some possible changes/additions).

Looking forward to discussing more! ❤️ 👍

@satazor (Author) commented Dec 11, 2018

We have a match for tomorrow at 11:00 GMT; I will create an event in the calendar. See you tomorrow!

@satazor (Author) commented Dec 11, 2018

@haadcode could you please email me or dm me on IRC/twitter your email?

@haadcode

@satazor reached out to you on Twitter.

@satazor (Author) commented Dec 13, 2018

First of all, sorry @aschmahmann and Mark: there was some confusion about the exact timezone and we had to shift one hour earlier, otherwise @haadcode wouldn't have been able to make it.

@pgte, @haadcode and I chatted about this matter and there were two possible solutions:

  1. Introduce some changes to ipfs-log in order to fulfil the versidag requirements, or
  2. Implement a common module that would be shared by ipfs-log and versidag

We decided to go for option 1. Here is a summary of the points we discussed; some of them will require further analysis and some will require changes in ipfs-log:

  • Remove Lamport clocks as the implicit tie breaker and instead make the tie breaker configurable. The reason is that Lamport clocks occupy a lot of space when there are many replicas, which is unnecessary in some use cases, such as the one in Discussify
  • Analyze the memory footprint of the log instance and make it smaller or configurable. There are scenarios where it's ok to trade more I/O for reduced memory usage. One example is precisely viewing the history of changes, which is a feature used sporadically.
  • Have a way to explicitly create a merge node when merging another replica's nodes. One use case is that when a certain number of heads is reached, we want to reduce them to just one (the merge node), so that they occupy less space inside a CRDT
  • Provide a way to efficiently and incrementally traverse the log. If I request the last 5 entries, there should be as little traversal and I/O as possible to accomplish that. There should also be some way to resume the traversal, for pagination use cases.
  • Support IPLD now that js-ipfs has it via the dag.put and dag.get functions. This allows us to use the IPLD query language and use explorer.ipld.io to debug.
  • Ensure that we have timeouts in place when reading/writing nodes in ipfs. At the moment, ipfs doesn't really handle this, so we must do that ourselves.
  • Ensure that the traversal of the tree happens as concurrently as possible. We can obviously improve later on.

@haadcode hopefully I didn't forget anything. How should we proceed next? Create an issue for each of those things in ipfs-log repo?

@aphelionz

@satazor No worries about the call time. Just spoke with @haadcode and opening up all those as issues is a great next step. Thanks for organizing all those thoughts.

@satazor (Author) commented Dec 13, 2018

Let me know when those issues are created so that I can start contributing and also close this issue.

@satazor (Author) commented Dec 13, 2018

Closing, as the issues were created in ipfs-log. I will be examining the ipfs-log codebase and contributing to those issues.

Thanks everyone for participating in this.

@satazor satazor closed this as completed Dec 13, 2018

9 participants