Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: More immutability in webnative #226

Closed
Tracked by #501
matheus23 opened this issue May 6, 2021 · 9 comments
Closed
Tracked by #501

Discussion: More immutability in webnative #226

matheus23 opened this issue May 6, 2021 · 9 comments

Comments

@matheus23
Copy link
Contributor

matheus23 commented May 6, 2021

(Came up in a call with @expede)

Recently a couple of concurrency issues surfaced. They stem from data races, so the root problem in these cases is that some mutable state is modified by multiple entities.

We should think about making more of our data structures immutable.

Concretely, this would mean e.g. making the MMPT class only have immutable members, and e.g. its add() method return a new reference to a new MMPT.

We also have data races - as the concurrency.test.ts sometimes reveals - in the private file system code in (I think) PrivateTree (I did some testing in the matheus23/fix-concurrency).

We should think about also making this data structure immutable.


This is obviously a bigger refactor, so we should plan this a little better. Maybe we do this incrementally?

Also, at some point we have to put a mutable API over the immutable code, as that's more in line of how e.g. an app developer thinks about data storage (duh!).

We think we might want to have a very thin "mutability" layer, think of it like an index into the individual, immutable private file system trees

I'm thinking of also exposing the immutable file system stuff, so it's still composable, because once you create a "mutable" layer on top of an immutable API, you'll never truly get back the immutable API. I don't want to lose the immutable API in case anyone wants to integrate it into an immutable-first codebase.

@expede
Copy link
Member

expede commented May 7, 2021

This is obviously a bigger refactor, so we should plan this a little better

Yeah, this is totally a "measure twice / cut once". It should probably also happen at the same time as the general FS spec revisions.

I've been doing some poking around. You're going to laugh when you see the state of the art: "In this paper we use optimistic concurrency techniques adapted from software transactional memory to develop a concurrent tree data structure." Soooo exactly what we were saying on the call 😆 Except that we're implementing something STM-like, so that doesn't really help us too much.

We think we might want to have a very thin "mutability" layer, think of it like an index into the individual, immutable private file system trees

Yeah, exactly. There's a few edge cases, but in theory a light reference index for the latest version feels like the right direction.

Maybe we do this incrementally?

That would be ideal. Let's get this fully specced out, and then see which bit can be done first 👍 Okasaki was surprisingly unhelpful on concurrent inserts, aside from the classic merge operation which doesn't totally work for use because we have namespace contention. I have some papers open; will share any findings!

@expede
Copy link
Member

expede commented May 7, 2021

⚠️ WIP WIP WIP ⚠️

Working thoughts on approaches:

Switching to a less mutable solution makes keeping track of various parts of the state much more straightforward, and we get to align with the immutability inherent in content addressing.

Linearization (Fully Immutable)

  • Fairly straightforward to implement: write-ahead log
  • Strips out the concurrency
    • i.e. doesn't solve for parallelism: take an arbitrary linear order, do batching
      • 🙋‍♀️ Need to check if true parallelism is actually impossible with js-ipfs (no web worker access?)
    • Lose locality of reference parallel performance advantages
  • Must hold all queued actions in memory / delayed IPFS flushing
  • Difficult to guarantee consistency between remote & local HEADs
    • Specifically: local app copy is a data structure, waiting log is action that materialize to a data structure
      • ...unless we build the local copy also off the same stream and check that the two implementations are equivalent 🤔
    • Can trade off performance to get consistency, but that's probably back to square one
  • Potential hidden complexity: multi-phase commits

Optimistic Fork/Merge (Fully Immutable)

A variant on linearization. This feels more sophisticated.

  • Works with separate local and remote HEADs
    • Probably makes sense to have one canonical version of what's committed
  • Fairly straightforward to implement
  • High number of retries, but with shrinking differences per retry
  • Strongly related to multi-device optimistic concurrency
    • e.g. merge algorithm will probably look the same

Rough outline:

  • Maintain an array of DAGs
    • Index 0 is the latest "clean"/ HEAD version
  • Fork the entire DAG on any update, push into the array
  • Add to IPFS, down to the root
    • NOTE may have multiple roots in local store
    • This is the expensive step, since we're copying the actual data to IPFS
  • Perform a DAG merge
    • If conflicts, pick the DAG with the lowest index (leftmost in index) as the "truer" version
      • Rebase the conflicting vnodes, (at same position in the array, mutably)
      • Thanks to locality of reference, conflicting nodes should be few & shallow on average
    • This may be associative, and thus parallelizable via web workers or similar if they have access to IPFS

Global Rebuilding (Fully Immutable)

  • Per Okasaki ~p102, Overmars, Hood & Melville realtime queues
  • Successful in the wild for realtime applications
  • Separation of working copy and queryable (finalized) structure
    • AKA persistent version of the classic clean and dirty states
      • Slightly cleaner than in the mutable case, because persistence
    • Leave in a Promise until available in finalized version?
  • Okasaki warns that these tend to be very difficult to implement correctly
    • ...for the same reasons that the our current implementation is suffering
      • (i.e. handlig multiple concurrent edits of the same node)

Immutable Data / Mutable Spine (Mostly Immutable)

Need to circle back to this; it's late and I'm forgetting why we liked this earlier 😛 The multiple hours of unrelated tasks between when Philipp and I discussed this and now probably doesn't help 😅 🧠

  • Essentially an STM index pointing at an immutable store
  • Doubles as a snapshot of the current local HEAD
  • Mutable tree simply provides HEAD name/path layout
  • Need to maintain dirty and clean segments
    • Dirty is the mutable portion, that's not yet resolved by IPFS

Links

@matheus23
Copy link
Contributor Author

I find the distinction between fully and mostly immutable confusing.

If the surface level API is supposed to look like this:

const permissions = ...
const { fs } = await webnative.initialise({ permissions })
fs.write("private/a.txt", "sth")
fs.write("private/b.txt", "sth")

Then we will always have to have a layer of mutability in there.

The only way we can ever be fully immutable is when we expose an API like this:

const permissions = ...
const { fs } = await webnative.initialise({ permissions })
const fs2 = fs.write("private/a.txt", "sth")
const fs3 = fs2.write("private/b.txt", "sth")

(At least that's how I am thinking about this in my head)

I must be misunderstanding what you refer to when you/Okazaki says "can be implemented fully immutable".

Also, I admit I haven't read the okazaki book. I should really do that :)

@matheus23
Copy link
Contributor Author

Ok, so I've been chewing on this for a bit.

Given that we need some kind of merging operation on filesystems when we implement conflict resolution (rebasing), I think it makes a lot of sense to implement the optimistic fork/merge strategy.

This reminds me that this discussion is probably also highly relevant for filesystem transactions in webnative. (isn't a transaction basically a fork & merge?)

@expede
Copy link
Member

expede commented May 7, 2021

I find the distinction between fully and mostly immutable confusing.

It is! This is fully implementation detail, and a question of if any part can update their fields, or if you're only constructing new data structures with pointers back into old data.

A surface level API is a totally separate concern; we can do that with any of these techniques underneath

@expede
Copy link
Member

expede commented May 7, 2021

This reminds me that this discussion is probably also highly relevant for filesystem transactions in webnative. (isn't a transaction basically a fork & merge?)

They're closely related! The atomic transactions are higher granularity than this: literally toss the existing tree, do a rebase (with the same merge) and replay the anonymous function(s).

Can we make this one mechanism? Probably! Needs a bit more thought, though.

@expede
Copy link
Member

expede commented May 7, 2021

(also FWIW I also have a gut reaction in favour of the fork/merge option)

@expede
Copy link
Member

expede commented May 7, 2021

Random thought while in an unrelated call, so just a quick sketch of a stray idea:

In fork/merge, if we're doing associative merges with history, we don't want to have to walk back in the linked list to fix all previous references when there's no contention.

It may make sense to do this with CPS / manual laziness while in the array, to make merging to an arbitrary spot more performant (where the continuation is the part pointing back at unknown history).

@icidasset
Copy link
Contributor

Closed by #500
Part of #501

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants