Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Add Irmin intro blog post #186

Merged
merged 2 commits into from

5 participants

Thomas Gazagnaire Arto Bendiken Daniel Bünzli Richard Mortier Anil Madhavapeddy
Thomas Gazagnaire
Owner

edit: One of the posts for mirage/mirage#257 - AC

Thomas Gazagnaire samoht referenced this pull request in mirage/mirage
Closed

Blog posts in the run up to OSCON #257

13 of 13 tasks complete
Arto Bendiken

You might consider adding a mention and/or reference to content-addressable storage (CAS), for anyone interested to learn more about the notion of block addresses as identity.

Also, I wouldn't exactly agree that CAS is just an implementation detail, as it's a pretty fundamental design decision (and a good one). Recall that old chestnut, "There are only two hard things in Computer Science: cache invalidation and naming things"; well, as it happens, CAS addresses both these hard things, greatly simplifying everything.

I'm done. As usual: interesting and understandable read.

Daniel Bünzli
Owner

if you are a pedantic writer of english, data is always considered a plural (cf the singular, datum). more modern usage has data as a singular collective noun (vs dataset).

i personally don't care so long as it's consistent in a given document :)

Ah, thanks for the tip.

Daniel Bünzli

This sentence doesn't "work" (e.g. I read "like all the components of Mirage OS irim is not a proper database engine"...). Split the sentence.

Daniel Bünzli

Again to many things in the sentence starting on this line. Split it in smaller chunks.

Daniel Bünzli

I'd rather say "what I explain below holds both for in-memory databases and for fancy disk serialization like browser local-storage or the Git format."

Owner

...below... (not bellow)

(and i'd say "what i explain below" not "what i will explain below")

Daniel Bünzli

immutable rather than read-only ?

Daniel Bünzli

"the popular Distributed Version Control System" is a distraction in this sentence.

Daniel Bünzli

Add commas after Semantics and Complexity

Daniel Bünzli

remove the "to" : and put it in each bullet after the commas, otherwise the continuous sentence is unreadable.

Daniel Bünzli

Name the thing directly after you present them, otherwise the sentences make it needlessly hard to connect the terminology. Something like:

In both cases, you need a global pool (the irmin block store) and a way to name values in that pool (the irmin tag store).

Richard Mortier

...this makes...

Daniel Bünzli

what is a principle point-of-view ?

Richard Mortier

dave is a dangling ref :)

See for instance, instance of what ? I don't think it's an instance of the previous assertion.

Daniel Bünzli

Or rather it seems to be tag store + block store = high-level irmin store, this should be made somehow clearer.

Daniel Bünzli

what is a store context ?

Daniel Bünzli

initial common state ?

Daniel Bünzli

is that sequence of updates ? if it's the case put it that way.

Daniel Bünzli

This sentence doesn't make sense, e.g. where are the edges. Maybe split it and kill a few parens.

Daniel Bünzli

persistent and immutable ? or are you allowed to rewrite history ?

Daniel Bünzli

operation (singular).

Daniel Bünzli

IIRC you changed raising exception to returning a value, so I'd rather not use "conflict exception" but just "conflict" or "conflict value".

Daniel Bünzli

Which functions exactly ? you were just talking about 3-way merge here.

Daniel Bünzli

heterogeneous here feels like if applied to backend (singular) different backends ? Also what does it take to write a new backend ? Just implement the Irmin Store sig ? Then also maybe say that the work to implement a backend is also rather trivial which should help aswel to have a lot of different backends.

Thomas Gazagnaire
Owner

Thanks you for your feedback! I tried to fix all the remarks ...

Anil Madhavapeddy avsm merged commit e19a15d into from
Arto Bendiken

The CAS link definition is missing, even in the final published article: http://openmirage.org/blog/introducing-irmin. As a result, that link erroneously shows up as "[content-addressable storage][cas]" in the rendered version. The link should probably be to http://en.wikipedia.org/wiki/Content-addressable_storage

Owner

woops. thx for reporting the issue, I'll fix that.

Owner

done in mirage#191

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jul 17, 2014
  1. Thomas Gazagnaire

    Add Irmin intro blog post

    samoht authored
  2. Thomas Gazagnaire

    Fix remarks from @dbuenzli @bendiken and @mor1

    samoht authored
    Thanks!!!
This page is out of date. Refresh to see the latest.
8 src/data.ml
View
@@ -250,6 +250,12 @@ module Blog = struct
subject = "Fitting the modular Mirage TCP/IP stack together";
body = "intro-tcpip.md";
permalink = "intro-tcpip";
+ };
+ { updated = date (2014, 07, 21, 13, 0);
+ authors = [thomas];
+ subject = "Irmin: a Git-like distributed database";
+ body = "introducing-irmin.md";
+ permalink = "introducing-irmin";
}
]
end
@@ -511,7 +517,7 @@ module Links = struct
date=day(2014,04,08);
stream=blog;
};
-
+
{ id="mindy-unikernel-1";
uri=Uri.of_string "http://www.somerandomidiot.com/blog/2014/03/14/its-a-mirage/";
title="It's a Mirage! (or, How to Shave a Yak.)";
292 tmpl/blog/introducing-irmin.md
View
@@ -0,0 +1,292 @@
+## Introducing Irmin
+
+> This is the first post in a series which will describe Irmin, the
+ new storage layer of Mirage 2.0. In this post, I will give an
+ high-level description on the project and its overall
+ architecture. Later posts will detail how to use Irmin in
+ real systems.
+
+[Irmin][irmin] is a library to persist and synchronize distributed
+data structures both on disk and in memory. It enables a style of
+programming very similar to the [Git](git) workflow, where
+distributed *agents* fork, fetch, merge and push data between
+each-other. The general idea is that you want every active "agent" to
+get a local (partial) copy of a global database and always be very
+explicit about how and when data are shared and migrated.
+
+Irmin is *not*, stricly speaking, a proper database engine. Indeed, it
+is, as all others components of the Mirage OS, a collection of
+libraries designed to solve different flavors of the challenges raised
+by the [CAP][cap] theorem. Each application can then pick the right
+combination of libraries to solve its specific use-case. More
+precisely, Irmin consists of a core of well-defined low-level
+constructs and design principles, which specify how data might persist
+and be shared. It also define algorithms to synchronize efficiently
+those distributed low-level constructs. And finally, it offers a
+collection of useful higher-level data structures that can be used
+transparently by developers, i.e., without having to know precisely
+how Irmin works underneath.
+
+Irmin does not make strong assumptions on how the low-level constructs
+are implemented. This makes the system very portable: what I explain
+below holds for in-memory databases and for fancy disk serialization
+like disk browser local-storage or the Git format.
+
+### Persistent Data Structures
+
+Persistent data structures are well known and used pervasively in many
+different areas. The *Programming Languages* community has
+investigated the concepts [widely][okasaki] -- and this is [not
+limited][shallow] to functional programming folks -- in the meantime,
+the *Systems* community experimented with various persistent
+strategies for memory and storage, such and [copy-on-write][cow]
+filesystems. In all of these systems, the main concern is to optimize
+the space complexity by maximizing the sharing of immutable
+sub-structures.
+
+Irmin design ideas share roots with previous works on persistent data
+structures, as it provides an efficient way to *fork* data structures,
+but it also explores new strategies and mechanisms to be able to
+efficiently *merge* back these forked structures. This offers
+programming constructs very similar to the Git workflow.
+ Irmin focuses on two main aspects:
+(i) *Semantics*, what properties the resulting merged objects should
+verify; and (ii) *Complexity*, how to design efficient merge and
+synchronization primitives, taking advantage of the immutable nature
+of the underlying objects.
+
+Although it is pervasively used, *data persistence* has a very broad and
+fuzzy meaning. In this blog post, I will refer to data persistence as
+a way:
+
+- For a single process, to lazily populate a process memory on startup --
+ you need this when you want the process to be able to resume while
+ holding part of its previous state if it crashes
+
+- For concurrent processes, to share references between objects living in
+ a global pool of data. Sharing references, as opposed to sharing
+ value, leads to better performance and allow different processes to
+ concurrently update a shared store.
+
+In both cases, you need a global pool of data (the Irmin *block store*)
+and a way to name values in that pool (the Irmin *tag store*).
+
+### The Block Store: a Virtual Heap
+
+Even high-level data structures need to be allocated in memory, and it
+is the purpose of the runtime to map such high-level constructs into
+low-level memory graph blocks. One of the strength of [OCaml][ocaml]
+is the very simple and deterministic mapping from high-level data
+structures to low-level block representations (the *heap*): see for
+instance, the excellent series of blog posts on [OCaml
+internals][runtime] by Richard W. Jones. An Irmin *block store* can be
+seen as a virtual OCaml heap, using a more abstract way of relating
+heap blocks together. Instead of using the concrete physical memory
+addresses of these, Irmin uses the hash of the block contents as an
+address. As for any [content-addressable storage][cas], this gives to
+Irmin block-stores a lot of nice properties and greatly simplify the
+way distributed stores can be synchronized.
+
+*Persistent* data structures are immutable: once a block is created in
+the block store, its contents will never change again. In that
+context, updating a data structure means returning a completely new
+structure, while trying to share common sub-parts to avoid the cost of
+making new allocations as much as possible. For instance, modifying a
+value in a persistent tree means creating a chain of new blocks, from
+the root of the tree to the modified leaf. For convenience (and also
+because it is hard, in a non-lazy pure language to generate complex
+cyclic value), Irmin only considers acyclic block graphs.
+
+Conceptually, an Irmin block store has the following signature:
+
+```ocaml
+type t
+(** The type for Irmin block store. *)
+
+type key
+(** The type for Irmin pointers *)
+
+type value = ...
+(** The type for Irmin blocks *)
+
+val read: t -> key -> value option
+(** [read t k] is the block stored at the location [k] of the
+store. It is [None] if no block is available at that location. *)
+
+val add: t -> key -> value -> t
+(** [add t k v] is the *new* store storing the block [v] at the
+location [k]. *)
+```
+
+Persistent data structure are very efficient to store in memory and on
+disk as you do not need [write barriers][barriers] and writes
+can be done [sequentially][seq-writes], easing the interaction with
+filesystem caches.
+
+### The Tag Store: Controlled Mutability and Concurrency
+
+So far, we have only discussed a purely functional data structure,
+where updating a structure means returning a pointer to a new
+structure in the heap, sharing most of its contents with the previous
+one. This style of programming is appealing when implementing complex
+[protocols][tls] as it leads to better compositional properties.
+
+However, in that context, sharing information between processes is
+difficult: you need a way to "inject" the state of one structure in an
+other process memory. In order to do so, Irmin borrows the concept of
+*branches* from Git: in practice every operation is related to a given
+branch name, and it can modify the tip of the branch if it has
+side-effects. The *tag store*, in Irmin, is the only mutable part of
+the whole system and it is responsible to map some global (branch)
+names to blocks in the block store -- these names can then be used to
+pass block references between different processes.
+
+It is then possible to combine a block store and a tag store to build
+a higher-level store (the Irmin store) with fine concurrency control
+and atomicity guarantees. As mutation happens only in the tag store,
+we can ensure that as long a given tag is not updated, no change done
+in the block store will be visible by anyone. This also gives a nice
+story for concurrency: as in Git, creating a concurrent view of the
+store is straightforward: it is simply a matter of creating a new tag,
+denoting a new branch. All concurrent operations can then happen on
+different branches:
+
+```ocaml
+type t
+(** The type for Irmin store. *)
+
+type tag
+(** Mutable tags *)
+
+type key = ...
+(** The type for user-defined keys (for instance a list of strings) *)
+
+type value = ...
+(** The type for user-defined values *)
+
+val read: t -> ?branch:tag -> key -> value option
+(** [read t ?branch k] reads the contents of the key [k] in the branch
+[branch] of the store [t]. If no branch is specified, then use the
+["HEAD"] one. *)
+
+val update: t -> ?branch:tag -> key -> value -> unit
+(** [update t ?branch k v] *updates* the branch [branch] of the store
+[t] the association of the key [key] to the value [value]. *)
+```
+
+Interactions between concurrent processes are completely explicit and
+need to happen via synchronization points and merge events (more on
+this below). It is also possible to emulate the behavior of
+transactions by recording the sequence of operations (`read`s and
+`update`s) on a given branch -- that sequence is used before a merge
+to check that all the operations are valid (i.e. that all reads in the
+transaction still return the same result on the current tip of the
+store) and it can be discarded after the merge takes place.
+
+### Merging Data Structures
+
+To merge two data structures in a consistent way, one has to compute
+the sequence of operations which leads, from an initial common state, to two
+diverging states (the ones that you want to merge). Once these two
+sequences of operations have been found, they must be combined (if
+possible) in a sensible way and then applied again back on the initial
+state, in order to get the new merged state. This mechanism sounds
+nice, but in practice it has two major drawbacks:
+
+1. It does not say how we find the initial state from two diverging
+ states -- this is generally not possible (think of diverging
+ counters); and
+2. It means we need to compute the sequence of `update`s which leads
+ from one state to an other -- which is easier than 1. but it is
+ generally not very efficient.
+
+In Irmin, we solve these problems using two mechanisms.
+
+First of all, an interesting observation is that that we can model the
+sequence of store tips as a purely functional data-structure. We model
+the partial order of tips as a directed acyclic graph where nodes are
+the tips and there is an edge between two tips if either (i) one is
+the result of applying a sequence of `update`s to the other, or (ii)
+one is the result of a merge operation between the other and some
+other tips. Practically speaking, that means that every tip should
+contains the list of its predecessors as well as the actual data it
+associated to. As it is purely functional, we can (and we do) store
+that graph in an Irmin block store.
+
+Having a persistent and immutable history is good for various obvious
+reasons, such as access to a nice forensic tool if an error occurs or
+snapshot and rollback features for free. But an other less obvious
+nice property is the fact that we can now find the greatest common
+ancestors of two data structures. This solves the first point above.
+
+The second mechanism is that we require the data structures used in
+Irmin to be equipped with a well-defined 3-way merge operation, which
+takes two diverging states, the corresponding initial state (computed
+using the previous mechanism) and that return either a new state or a
+conflict (similar to the `EAGAIN` exception that you get when you try
+to commit a conflicting transaction in more traditional transactional
+databases). Having access to the common ancestors makes a great
+difference when designing new merge functions, as usually no
+modification is required to the data-structure itself. At the
+opposite, the usual approach is more invasive as it requires to
+enrich the data-structure itself to carry more information about the
+operation history, see for instance the [conflict-free replicated
+datatypes][crdt], which relies on unbounded vector clocks.
+
+We started to design interesting data structure equipped with a 3-way merge, as
+counters, [queues][merge-queues] and ropes.
+
+This is what the implementation of distributed and mergeable counters
+looks like:
+
+```ocaml
+type t = int
+(** distributed counters are just normal integers! *)
+
+let merge ~old t1 t2 = old + (t1-old) + (t2-old)
+(** Merging counters means:
+ - computing the increments of the two states [t1] and [t2]
+ relatively to the initial state [old]; and
+ - and add these two increments to [old]. *)
+```
+
+### Conclusion
+
+From a designer point-of-view, having access to the history means it
+is easier to design complex data structure with good compositional
+properties, to use in an application using Mirage and Irmin. Moreover,
+as we made little assumptions on how the substrate of the low-level
+constructs needs to be implemented, we hope to see the Irmin
+engine to be ported to a lot of different backends: this is just a
+matter of implementing a rather trivial [signature][irmin-AO].
+
+From an end-user point of view, this means that the full history of
+operations is available to inspect, and that the history model is very
+similar to the Git workflow. So similar, in fact, that we've developed
+a bi-directional mapping between Irmin data structure and the Git
+format! See for instance what [Dave Scott][dave] did with the new version of
+[xenstore][xenstore], booting a Xen VM, where the internal database is
+stored in a prefix-tree Irmin data-structure and exposed as
+a Git repository which is live-updated:
+
+<center>
+<iframe width="480" height="360" src="//www.youtube-nocookie.com/embed/DSzvFwIVm5s" frameborder="0" allowfullscreen="1"> &nbsp; </iframe>
+</center>
+
+[irmin-AO]: https://github.com/mirage/irmin/blob/4b06467ddee1e20c35bad64812769587fb9fa8a4/lib/core/irminStore.mli#L61
+[dave]: http://dave.recoil.org/
+[shallow]: http://en.wikipedia.org/wiki/Object_copy
+[cap]: http://en.wikipedia.org/wiki/CAP_theorem
+[irmin]: https://github.com/mirage/irmin
+[ocaml]: http://ocaml.org
+[okasaki]: https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
+[cow]: http://en.wikipedia.org/wiki/Copy-on-write
+[git]: http://git-scm.com/
+[runtime]: http://rwmj.wordpress.com/2009/08/04/ocaml-internals/
+[crdt]: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
+[merge-queues]: https://github.com/mirage/merge-queues
+[barriers]: http://en.wikipedia.org/wiki/Write_barrier
+[seq-writes]: http://en.wikipedia.org/wiki/Write_amplification#Sequential_writes
+[tls]: http://openmirage.org/blog/ocaml-tls-api-internals-attacks-mitigation
+[xirminstore]: https://www.youtube.com/watch?v=DSzvFwIVm5s
+[xenstore]: http://wiki.xen.org/wiki/XenStoreReference
2  tmpl/blog/oscon13-trip-report.md
View
@@ -36,7 +36,7 @@ content and well done" and "one of the most excited projects I heard about"
(though we are suspicious that just refers to Anil's usual high-energy
presentation style...).
-<iframe align="right" style="margin-left: 10px;" width="420" height="235" src="//www.youtube-nocookie.com/embed/2Mx8Bd5JYyo" frameborder="0" allowfullscreen="1"> </iframe>
+<iframe align="right" style="margin-left: 10px;" width="420" height="235" src="//www.youtube-nocookie.com/embed/2Mx8Bd5JYyo" frameborder="0" allowfullscreen="1"> &nbsp; </iframe>
Probably the most interesting chat after the talk was with the Rust authors
at Mozilla ([@pcwalton](http://twitter.com/pcwalton) and
Something went wrong with that request. Please try again.