New API for 1.0 #397

Merged
merged 43 commits into from Feb 15, 2017

Projects

None yet

5 participants

@samoht
Member
samoht commented Jan 29, 2017 edited

I am opening the PR to prepare the 1.0 release with a cleaner API. I am already pretty happy with the state of that PR (I spent a couple of weeks working on that). Any feedback would be welcome, especially from existing users, e.g. @talex5, @kayceesrk, @avsm, @hnrgrgr, etc. but given the amount of changes, I won't except a detailed review :p

The main changes in the new API are:

  • no more mirage-tc to define new contents, but simpler type-based combinators based on depyt (note: I will probably just move depyt.ml inside irmin project so the combinators could be tailored for mergeable datastructures in the future)
  • subtrees now have first-class support in the API. No more mutable views, use immutable subtrees (with full caching and lazy writes), strongly inspired by #357 and the datakit tree API.

And lots of more minor improvement/fixes that I will document in the next few days. Check the raw diff for irmin.mli if you are impatient.

TODO:

  • document the changes
  • remove Irmin_git.LOCK => need a new release of ocaml-git to include test-and-set
  • move core of Depyt in Irmin.Type (remove read/write, just keep jsonm converters? the goal is to add protobof converters next)
samoht added some commits Jan 25, 2017
@samoht samoht Update Copyright years 5bdac5d
@samoht samoht En route pour 1.0: Major API revision
For Irmin 1.0, the main design principles are:
- the backends has to provide the smallest possible API
- the user-facing Irmin API can be different from the backend one and should
  be nice/easy to use.

In 0.* versions, one of the goal was to make the backend API a subset of the
Irmin API. This was an interesting design approach but it caused more issues
that expected. Be more relax now, and allow the two APIs to evolve separately:
the backend API could be as minimal as possible -- the Irmin API could be
as expressive as possible.
3819797
@samoht samoht Split the tests into their respective subpackages
This allows the backends to run some tests early instead of
waiting for irmin-unix to be installed and tested.
8bb75ee
@samoht samoht the tests for irmin compile and pass ec3901c
@samoht samoht git: adapt irmin-git to the API changes 669b253
@samoht samoht git: add tests for irmin-git 65ec2a9
@samoht samoht http: adapt irmin-http to the new API 662ad83
@samoht samoht http: add unit tests for irmin-http 7b71bc7
@samoht samoht unix: adpat the examples to use the new API 1178785
@samoht samoht mirage: adapt irmin-mirage to the new API
77e64e8
@samoht samoht unix: adapt irmin-unix to the new API fcf61e4
@samoht samoht unix: add unit-tests for irmin-unix bde8102
@hannesm
Member

any good reason not to use result here?

Member

to be compatible with Cmdliner (pp + of_string = Cmdliner.Arg.converter). But maybe that's indeed better to return a result.

Use can use Rresult.R.to_presult to convert them for cmdliner (and I some point I hope I can provide a path to use results in cmdliner.

@talex5

This certainly looks like a nice improvement over the old API. But it still seems much too easy to write racy, crashy software with it.

The main thing missing seems to be some kind of immutable commit object. Everything either works on commit IDs (which might not exist) or on branches (which can change under you at any time).

The only operations I really want on refs are:

  • Get the head commit.
  • (Test and) set the head commit.

Anything else (e.g. getting the parents, task or tree of a store) is asking for trouble because you can never know what commit the result applies to.

src/irmin.mli
- (** [with_contents t s c] replaces [t]'s contents for the step
- [s] by [c]. *)
+ val remove: t -> step -> t
+ (** [remove t s] is [t] where [find t v] is [None]. *)
@talex5
talex5 Jan 30, 2017 Contributor

Might be worth using a different name here, e.g. [remove t s] is [t2].

src/irmin.mli
- val node: t -> commit -> node Lwt.t
- (** Get the commit node.
+ val node: t -> commit -> node option Lwt.t
+ (** [node t c] is [c]'s commit node or [None] is [c] is not
@talex5
talex5 Jan 30, 2017 Contributor

"commit node" -> "root node"?
What if c is present in t but has no root node?
Do we need this function at all? The user can get the node from a commit easily enough using the commit API.

val contents_t: t -> Contents.t
val node_t: t -> Node.t
val commit_t: t -> Commit.t
- val ref_t: t -> Ref.t
+ val branch_t: t -> Branch.t
@talex5
talex5 Jan 30, 2017 Contributor

Why are we renaming ref to branch everywhere? It would be nice if Irmin could handle all types of ref (i.e. including tags).

@samoht
samoht Feb 7, 2017 Member

My plan is to expose the Git refs in the Git backend only. I am not totally sure how this would work, but I have some ideas that I'd like to test.

src/irmin.mli
module type S = sig
- (** {1 Irmin Store} *)
+ (** {1 IrminSstores}
@talex5
talex5 Jan 30, 2017 Contributor

Typo.

src/irmin.mli
+ (** The type of repository handles. *)
+
+ val v: config -> t Lwt.t
+ (** [v mk_task config] connects to a repository in a
@talex5
talex5 Jan 30, 2017 Contributor

Looks like mk_task has gone :-)

src/irmin.mli
+ [name]. Similar to [master], but use [name] instead
+ {!Branch.S.master}. *)
+
+ val of_commit: Repo.t -> commit -> t Lwt.t
@talex5
talex5 Jan 30, 2017 Contributor

Is this a commit or a commit ID? If it's a commit ID, I guess it should be returning an option (and maybe change the type name to be less confusing). If it's a commit, I'm not sure why we need to provide the repo (since a commit presumably knows where it's from).

+
+ val tree: t -> tree Lwt.t
+ (** [tree t] is [t]'s current tree. Contents is not allowed at the
+ root of the tree. *)
@talex5
talex5 Jan 30, 2017 Contributor

Should the return type be [> `Empty | `Node of node ] then?

@samoht
samoht Feb 7, 2017 Member

I've tried to do that, but then it starts to be annoying to cast subtypes everywhere. There is probably a more elegant solution to this, but I haven't found it yet.

@talex5
talex5 Feb 8, 2017 Contributor

Wouldn't OCaml handle that automatically, due to the >? i.e. if you want to use it as a tree, that will happen automatically.

src/irmin.mli
+ (** Managing the store's heads. *)
+ module Head: sig
+
+ val v: Repo.t -> task -> parents:commit list -> tree -> commit Lwt.t
@talex5
talex5 Jan 30, 2017 Contributor

I'd expect Head.v to create a head, not a commit.

+ [None] if the store has no contents. Similar to [git rev-parse
+ HEAD]. *)
+
+ val get: t -> commit Lwt.t
@talex5
talex5 Jan 30, 2017 Contributor

Can we get rid of these exception-throwing helpers? It's easy enough to write Head.find t c >|= crash_if_none if you really want that. Or maybe rename it to get_exn if it provides a useful error message. Otherwise, it's difficult to spot these in code reviews.

@samoht
samoht Feb 7, 2017 Member

I quite like to keep them, as they can be useful in some cases. The rule that I've picked is "find never throws" but "get" can throwInvalid_argument`".

src/irmin.mli
+ val merge: into:t -> task -> ?max_depth:int -> ?n:int -> commit ->
+ (unit, Merge.conflict) result Lwt.t
+
+ val parents: t -> commit list Lwt.t
@talex5
talex5 Jan 30, 2017 Contributor

This seems like it should be an operation on commits not stores.

src/irmin.mli
+ val parents: t -> commit list Lwt.t
+ (** [parents t] are [t]'s parent commits. *)
+
+ end
@talex5
talex5 Jan 30, 2017 Contributor

Generally, this API seems a bit confused about whether Store.Head is about stores (mutable refs) or commits (immutable values).

src/irmin.mli
+ in memory for efficiency, where reads are done lazily and
+ writes are done only when needed on commit: if you modify a
+ key twice, only the last change will be written to the store
+ when you commit. *)
@talex5
talex5 Jan 30, 2017 Contributor

Might be worth pointing out here that trees are immutable.

src/irmin.mli
+
+ end
+
+ (** {1 Reads} *)
@talex5
talex5 Jan 30, 2017 Contributor

Perhaps all of these functions could be moved to a sub-module called Not_threadsafe or similar, to prevent people using them by accident?

samoht added some commits Feb 2, 2017
@samoht samoht Fork Depyt into a stand-alone Irmin.Type module.
This will help to evolve that module into something specialized to mergeable
data-structures.
febd91c
@samoht samoht Fix doc typos 0f67db0
@samoht samoht merge: cleanup the merge combinators b39f8e0
@samoht samoht Use the result type everywhere
00d9aa7
@samoht samoht Add jsonm in .merlin 245bf25
@samoht samoht core: commits are immutable objects
Also rename Irmin.Task into Irmin.Info
faddd42
@samoht samoht test: update the test to the new commit API
b0283c4
@samoht
Member
samoht commented Feb 6, 2017 edited

First version of the API for immutable commits pushed. Still need to address the rest of the feedback.

src/ir_conf.ml
-type 'a parser = string -> [ `Error of string | `Ok of 'a ]
+open Result
+
+type 'a parser = string -> ('a, string) result
@dbuenzli
dbuenzli Feb 6, 2017

If you want to be compatible with what cmdliner will become this should be

type 'a parser = string -> ('a, [`Msg of string]) result
samoht added some commits Feb 7, 2017
@samoht samoht core: remove the Repo.t parameter when possible when manipulating com…
…mits
9872082
@samoht samoht core: use ('a, [`M`sg of string)) result` instead of `('a, string) re…
…sult)`

The will make the parser compatible with the next version of Cmdliner and more
generally more easily composable with Rresult.
1af393b
@samoht samoht core: simplify the base Sync signature 93522d2
@samoht samoht git: update irmin-git the the latest API updates b2ac297
@samoht samoht core: always sort the commit parents when building a commit object 6399ced
@samoht samoht git: speed-up translation of commit objects a862d78
@samoht samoht http: adapt irmin-http to latest API changes b7bf4de
@samoht samoht core: add pretty-printers for fetch/push errors f1879e8
@samoht samoht unix: update irmin-unix to the latest API changes
7dbcc30
@samoht samoht core: clarify some docs ad6f86a
@samoht samoht more API renaming
Use find_tree instead of getv and find_all instead of findm.
7f84c6a
@samoht samoht doc: fix bitrot of examples in the docs d10bf9c
@samoht samoht Fix more doc typos
05cbdc4
@samoht samoht More doc updates
78781ff
@samoht samoht mirage: update irmin-mirage to the latest API changes 3937671
@samoht samoht Complete renaming of task into info
20ffc97
@samoht
Member
samoht commented Feb 7, 2017

All the feedback have been taken into account, thanks for the reviews!

The last missing piece (that I discussed briefly with @talex5) is about avoiding the racy S.set. To address this, I have started to modify the API to implement:

  val set_tree: t -> ?allow_empty:bool -> info -> key -> (tree -> tree) ->
    unit Lwt.t
  (** [set_tree t k f] replaces {i atomically} the sub-tree [v] under
      [k] in the store [t] by the contents of the tree [f v], using
      the commit info [i]. If the subtree is modified between the
      initial tree read and the write, the transaction is aborted and
      tried again until it succeeds.

      If [allow_empty] is set (by default it is not), empty commits
      are allowed.

      {b Note:} Irmin transactions provides
      {{:https://en.wikipedia.org/wiki/Snapshot_isolation}snapshot
      isolation} guarantees: reads and writes are isolated in every
      transaction, but only write conflicts are visible on commi. *)

  val merge_tree: t -> info -> ?allow_empty:bool -> ?max_depth:int -> ?n:int ->
    key -> (tree -> tree) -> unit Lwt.t
  (** [merge_tree t i k f] is similar to {!set_tree} but try to merge
      the read and write sub-trees before aborting the transaction. *)
@kayceesrk
Contributor
kayceesrk commented Feb 10, 2017 edited

Thanks @samoht. Removing mirage-tc and using depyt is an excellent improvement. Looks like this could enable generic merges for arbitrary types if the changes are non-overlapping similar to merge-ropes. But why is depyt not a separate package which irmin depends on?

@samoht
Member
samoht commented Feb 10, 2017

I really would like to see that dynamic type library to specialise to mergeable data types - I am not sure this kind of library makes sense on its own, but I am fine to split it later when we understand the design space a bit better.

samoht added some commits Feb 8, 2017
@samoht samoht core: add with_tree for easy transactions
03585e3
@samoht samoht core: more consistent/simple info function 5473c76
@samoht samoht Update tests to use the more consistent info changes 51954cc
@samoht samoht git: adapt the changes in info functions ef42059
@samoht samoht unix: fix the compilation of the unix backend
3ba3cac
@samoht samoht Use mirage3 remote for CI 269c603
@samoht samoht update to ocaml-git 0.10.0
f815be2
@samoht samoht tests: use an init value when watching for changes
This avoids races. These sometimes happen when running the HTTP tests.
e6abb83
@samoht samoht merged commit 596d173 into mirage:mirage-dev Feb 15, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@samoht
Member
samoht commented Feb 15, 2017

All green, so merging!

I will make the remaining changes in separate PRs.

@samoht samoht deleted the samoht:1.0 branch Feb 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment