Versioning: Commit + Repo Datastructures #23

jbenet opened this issue Jul 2, 2015 · 13 comments

@jbenet jbenet commented Jul 2, 2015

Versioning has been a long time coming.

We need to construct the necessary data types before we start making any tooling around them.

The syntax of the "merkledag DSL" is still TBD (#22), so for now I'm using Go-like syntax.

First, some types we need:

// Any is any merkledag Node

type Identity struct {
  Key SigningKey // link to a signing key

  Data struct {
    Name string // the "name" of the identity
  }
}

type Authorship struct {
  Author Identity

  Data struct {
    Date string // ISO timestamp in UTC?
  }
}

type Signature struct {
  Object Any        // link to the signed object
  Key    SigningKey // link to the signing key

  Data struct {
    Signature []byte // the signature bytes
  }
}

// generic type that terminates in a certain other leaf type
type Tree<LEAF_TYPE> struct {
  NAME Link<Tree | LEAF_TYPE> // each named link points to a subtree or a leaf
}
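As an illustration of how Tree<LEAF_TYPE> might behave, here is a minimal sketch in real Go (generics, Go 1.18+). It is only one possible encoding: the split into Children and Leaves maps and the Resolve method are assumptions made for this example, not part of the proposal.

```go
package main

import "strings"

// Tree is a hypothetical Go rendering of Tree<LEAF_TYPE>: every named link
// points either to another Tree or to a leaf of type L.
type Tree[L any] struct {
	Children map[string]*Tree[L] // named links to subtrees
	Leaves   map[string]L        // named links to leaf values
}

// Resolve walks a slash-separated path such as "heads/master" and returns
// the leaf it ends at. The boolean is false when the path does not end at a leaf.
func (t *Tree[L]) Resolve(path string) (L, bool) {
	var zero L
	parts := strings.Split(path, "/")
	node := t
	// Every component except the last must name a subtree.
	for _, name := range parts[:len(parts)-1] {
		next, ok := node.Children[name]
		if !ok || next == nil {
			return zero, false
		}
		node = next
	}
	// The last component must name a leaf.
	leaf, ok := node.Leaves[parts[len(parts)-1]]
	return leaf, ok
}
```

With a Tree[string] whose "heads" subtree has a "master" leaf, Resolve("heads/master") returns that leaf, while a path through a missing subtree returns false.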

The versioning data types:

type Commit struct {
  Parents   []Commit   // "parent0" ... "parentN"
  Author    Authorship // link to an Authorship
  Committer Authorship // link to an Authorship
  Object    Any        // what we version ("tree" in git)

  Data struct {
    Comment string // describes the commit
  }
}

type VersionRepository struct {
  Refs Tree<Commit> // hierarchy of {branches, tags, heads, remotes, ... }
  Logs Tree<File>   // reflogs, etc... (maybe should be other than files...)
}
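To show what "link" means for the Commit type above, the following is a rough Go sketch under simplifying assumptions: a link is modelled as the hex SHA-256 of a node's JSON encoding (real merkledag links are multihashes over a different encoding), and the names Link and LinkTo are invented for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// Link stands in for a merkledag link: the hex SHA-256 of the target node's
// encoding. This is a simplification chosen for the sketch.
type Link string

// Commit mirrors the Commit type above, with every reference stored as a Link.
type Commit struct {
	Parents   []Link
	Author    Link
	Committer Link
	Object    Link
	Comment   string
}

// LinkTo returns the content address of a commit. JSON is used only because it
// is in the standard library; field order is fixed by the struct definition.
func LinkTo(c Commit) Link {
	enc, err := json.Marshal(c)
	if err != nil {
		panic(err) // not expected for a struct of strings and slices
	}
	sum := sha256.Sum256(enc)
	return Link(hex.EncodeToString(sum[:]))
}
```

Because a child stores LinkTo(parent) in Parents, editing any ancestor changes the link of every descendant, which is the property that makes the history tamper-evident.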

@chriscool chriscool commented Jul 2, 2015

It seems to me that in Git when you sign a commit the signature is part of the commit. So you cannot remove the signature without changing the commit sha1.

And commit trailers (like Signed-off-by) are very useful in Git and may deserve something special.

Also Tree and VersionRepository are defined but not used.

@christianlundkvist christianlundkvist commented Jul 9, 2015

Should we have email as part of Identity too, like Git does? We probably don't want that, since the key is a stronger identifier.
In the example Commit struct there is no Signature; I guess this would be optional, depending on whether you want to sign the commit or not. Having the Signature in the commit would make the signature part of the commit, as @chriscool mentioned.
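The difference between the two signing styles discussed here can be sketched in Go. This is an illustration under assumptions, not the proposed design: links are hex SHA-256 hashes of a JSON encoding, keys are Ed25519, and the helper names (LinkTo, NewKeyPair, SignDetached, SignEmbedded, VerifyDetached) are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// Link is a stand-in for a merkledag link (hex SHA-256 of a JSON encoding).
type Link string

// LinkTo hashes the JSON encoding of any node.
func LinkTo(v any) Link {
	enc, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(enc)
	return Link(hex.EncodeToString(sum[:]))
}

// Commit optionally carries an embedded, Git-style signature.
type Commit struct {
	Object    Link
	Comment   string
	Signature []byte `json:",omitempty"`
}

// Signature is the detached node: it links to the signed object and the key.
type Signature struct {
	Object Link   // link to the signed object
	Key    Link   // link to the signing key
	Data   []byte // the signature bytes
}

// NewKeyPair generates an Ed25519 key pair using crypto/rand.
func NewKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// SignDetached signs the commit's link and returns a separate node,
// leaving the commit (and therefore its link) untouched.
func SignDetached(c Commit, priv ed25519.PrivateKey) Signature {
	target := LinkTo(c)
	pub := priv.Public().(ed25519.PublicKey)
	return Signature{Object: target, Key: LinkTo(pub), Data: ed25519.Sign(priv, []byte(target))}
}

// VerifyDetached checks a detached signature against a public key.
func VerifyDetached(sig Signature, pub ed25519.PublicKey) bool {
	return ed25519.Verify(pub, []byte(sig.Object), sig.Data)
}

// SignEmbedded returns a copy of c with the signature stored inside it,
// so the signature becomes part of what the commit's link covers.
func SignEmbedded(c Commit, priv ed25519.PrivateKey) Commit {
	c.Signature = nil
	c.Signature = ed25519.Sign(priv, []byte(LinkTo(c)))
	return c
}
```

In this sketch a detached Signature can be added or dropped without changing the commit's link, whereas the embedded form yields a different link for the signed commit, which matches the Git behaviour @chriscool describes.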

@ion1 ion1 commented Sep 22, 2015

Please consider adding a variant of a merge commit whose meaning is history rewriting.

The first parent will point to the new history. User interfaces are supposed to act as if it was the only parent unless the user requests otherwise. Where possible, the rewrite commit could be rendered as something like an unobtrusive collapsed bar between commit messages.

The hidden secondary parent will point to the history that was “rewritten”.

This would alleviate much of the need Git has for forced pushes in development branches to keep the commit history clean. It would also let one view what changed in the “rewrite” unlike with Git rewrites.
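A small Go sketch of how a user interface might treat such a rewrite commit. The Rewrite flag, VisibleParents, and Log are hypothetical names for this illustration; the proposal does not specify how the variant would be marked.

```go
package main

// Commit is a pared-down commit for illustration. Rewrite is a hypothetical
// flag marking the "history rewriting" variant of a merge commit.
type Commit struct {
	ID      string
	Parents []*Commit // Parents[0] is the new history
	Rewrite bool      // if true, Parents[1:] point at the replaced history
}

// VisibleParents returns the parents a user interface should follow.
// For a rewrite commit only the first parent is shown unless the user
// asks to see the rewritten history.
func VisibleParents(c *Commit, showRewritten bool) []*Commit {
	if c.Rewrite && !showRewritten && len(c.Parents) > 0 {
		return c.Parents[:1]
	}
	return c.Parents
}

// Log lists commit IDs reachable from head, depth-first, visiting each once.
func Log(head *Commit, showRewritten bool) []string {
	var out []string
	seen := map[*Commit]bool{}
	var walk func(c *Commit)
	walk = func(c *Commit) {
		if c == nil || seen[c] {
			return
		}
		seen[c] = true
		out = append(out, c.ID)
		for _, p := range VisibleParents(c, showRewritten) {
			walk(p)
		}
	}
	walk(head)
	return out
}
```

The old history stays reachable from the rewrite commit, so nothing has to be force-pushed away; it is simply skipped in the default view.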

@jbenet jbenet commented Sep 23, 2015

@ion1 interesting idea!

@daviddias daviddias changed the title Commit + Repo Datastructures Versioning: Commit + Repo Datastructures Feb 23, 2016


@chriscool as a relevant note, I'd often argue against adding first-class tagging to a git-ish system.

(There's some relevant discussion on my own approach to avoiding that on top of Git itself, see ELLIOTTCABLE/.gitlabels.)

@nothingmuch nothingmuch commented May 25, 2016

Git's lack of multiple-author support is an oft-cited limitation. I think a logical AND of authorships would be useful to include, instead of a post hoc way of embedding that in the identity, since that would require parsing, etc.

@nothingmuch nothingmuch commented May 25, 2016

@ion1, @ELLIOTTCABLE I think the most appealing way to address that is to have more than just a "parent" relationship between commits, which ties this into the debate about first class tagging and potentially also various trailers in the comments.

Since there's nothing preventing the Object field from being a commit, parallel histories could be related by decorating both of them from the outside with a third one, for example, but that's far from the only approach.

@mcast mcast commented Oct 5, 2016

Do the data structures imply that the native Git objects would need to be translated when crossing the ipfs boundary?

I understand that the ipfs hashtree structure is different from the Git blobid, so the two aren't directly compatible. Is it necessary to generate a new id to store a git object (or pack, if the tools could find out what any of them might be called) in the DHT?

My concern is that if the data structures don't provide an exact isomorphism with the Git objects used in any given repo, there will be a lossy translation. It has to be lossless, doesn't it?

(On objectid->packid, serving something like a 302 Found or an extra returned header might help efficiency, then you only need DHT entries for the commits and the rest can come from a pack. Or maybe I need to read more about ipfs.)

@ghost ghost commented Oct 5, 2016

@mcast with CID and IPLD, we'll be able to just reference the unchanged git objects/packs/blobs/trees.
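To make the id mismatch raised above concrete, here is a short Go sketch. Git names a blob by the SHA-1 of a "blob <size>" header, a NUL byte, and the content, so hashing the bare content with another function gives an unrelated id. The function names are invented for this example, and using SHA-256 for the "plain" hash is an assumption for contrast only.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// GitBlobID returns the id Git assigns to a blob: SHA-1 over the header
// "blob <size>\x00" followed by the content.
func GitBlobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// RawSHA256 hashes the bare content with no Git header, for comparison.
func RawSHA256(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}
```

For example, GitBlobID([]byte("hello\n")) is ce013625030ba8dba906f756967f9e9ca394464a, the same id `git hash-object` reports for that content. Referencing git objects unchanged, as suggested above, presumably means addressing exactly these header-plus-content bytes so that no translation step is needed.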

@kehao95 kehao95 commented Jan 8, 2018

Hi, it's been a while since the last update. Is there any news on this topic? Thanks for all your hard work. We would like to try IPFS in our product, but we need the versioning feature to be ready. Where can I track the status of this feature?

@osarrouy osarrouy commented Feb 21, 2018

Hi everyone. Same question as @kehao95 here :)

Is there any way to track the status of this issue?

@Stebalien Stebalien commented Feb 21, 2018

Unfortunately, no. We don't have native versioning.

We do now have git object support in IPLD. However, that has some limitations (no sharding, for one).

@RubenKelevra RubenKelevra commented Jun 17, 2020

It would be nice if we can add a diff file to each commit. This would enable us to remove the pinning for the sub-cid of the older version and just keep the diff pinned.

You may know the creation of patches/diffs of large binary files as very resource-intensive, but zstd now supports creating diffs from two files of up to 2 GB, which is extremely space-efficient and fast.

The diffs can only be applied in one direction, so creating patches backward makes the most sense. This way IPFS can recreate older versions on the fly from the patch if necessary.
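The backward-patch idea can be sketched in Go as follows. The Patch format here is a deliberately naive stand-in (shared prefix and suffix plus a replaced middle), not zstd's delta format, and the History type with its Commit and Version methods is hypothetical.

```go
package main

// Patch is a toy one-directional delta: keep Prefix bytes from the start of
// the source, keep Suffix bytes from its end, and put Middle between them.
type Patch struct {
	Prefix, Suffix int
	Middle         []byte
}

// Diff builds a patch that turns from into to.
func Diff(from, to []byte) Patch {
	p := 0
	for p < len(from) && p < len(to) && from[p] == to[p] {
		p++
	}
	s := 0
	for s < len(from)-p && s < len(to)-p && from[len(from)-1-s] == to[len(to)-1-s] {
		s++
	}
	return Patch{Prefix: p, Suffix: s, Middle: append([]byte(nil), to[p:len(to)-s]...)}
}

// Apply rebuilds the target from the source the patch was made against.
func Apply(from []byte, d Patch) []byte {
	out := append([]byte(nil), from[:d.Prefix]...)
	out = append(out, d.Middle...)
	return append(out, from[len(from)-d.Suffix:]...)
}

// History keeps only the newest version in full. Each entry in Back turns a
// version into the one immediately before it.
type History struct {
	Latest []byte
	Back   []Patch
	count  int // number of versions committed so far
}

// Commit records a new version and replaces the previous full copy with a
// backward patch (new version -> previous version).
func (h *History) Commit(next []byte) {
	if h.count > 0 {
		h.Back = append(h.Back, Diff(next, h.Latest))
	}
	h.Latest = append([]byte(nil), next...)
	h.count++
}

// Version rebuilds the version stepsBack commits behind the latest one.
// It panics if stepsBack exceeds the number of stored patches.
func (h *History) Version(stepsBack int) []byte {
	cur := h.Latest
	for i := 0; i < stepsBack; i++ {
		cur = Apply(cur, h.Back[len(h.Back)-1-i])
	}
	return cur
}
```

Only Latest and the small backward patches need to stay pinned in this model; an older version costs one Apply call per step back.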
