Versioning: Commit + Repo Datastructures #23

Open
jbenet opened this Issue Jul 2, 2015 · 12 comments

Member

jbenet commented Jul 2, 2015

Versioning has been a long time coming.

We need to construct the necessary data types before we start building any tooling around it.

The syntax of the "merkledag DSL" is still TBD (#22), but for now this uses a Go-like notation.

First, some types we need:

// Any is any merkledag Node

type Identity struct {
  Key SigningKey // link to a signing key

  Data struct {
    Name string // the "name" of the identity
  }
}

type Authorship struct {
  Author Identity

  Data struct {
    Date string // ISO timestamp in UTC?
  }
}

type Signature struct {
  Object Any        // link to the signed object
  Key    SigningKey // link to the signing key

  Data struct {
    Signature []byte // the signature bytes
  }
}

// generic type that terminates in a certain other leaf type
type Tree<LEAF_TYPE> struct {
  NAME Link<Tree | LEAF_TYPE>
  ...
}

The versioning data types:

type Commit struct {
  Parents   []Commit     // "parent0" ... "parentN"
  Author    Authorship   // link to an Authorship
  Committer Authorship   // link to an Authorship
  Object    Any          // what we version ("tree" in git)

  Data struct {
    Comment string // describes the commit
  }
}

type VersionRepository struct {
  Refs Tree<Commit> // hierarchy of {branches, tags, heads, remotes, ... }
  Logs Tree<File>   // reflogs, etc... (maybe should be other than files...)
}
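
For illustration only, the Go-like definitions above can be approximated in real Go. This is a minimal sketch, not the proposed format: it assumes a link is the hex SHA-256 of a node's JSON encoding, and `Link` and `linkTo` are hypothetical names. The issue itself leaves both the DSL and the serialization open (#22).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Link is a hypothetical stand-in for a merkledag link: the hex SHA-256
// of the linked node's JSON encoding (an assumption, not the real format).
type Link string

// Authorship mirrors the Authorship type; Author links to an Identity.
type Authorship struct {
	Author Link   `json:"author"`
	Date   string `json:"date"` // ISO timestamp
}

// Commit mirrors the Commit type from the issue, with links in place of
// embedded nodes.
type Commit struct {
	Parents   []Link `json:"parents"`
	Author    Link   `json:"author"`
	Committer Link   `json:"committer"`
	Object    Link   `json:"object"`
	Comment   string `json:"comment"`
}

// linkTo hashes the JSON encoding of a node to produce a link to it.
func linkTo(node any) Link {
	b, err := json.Marshal(node)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return Link(hex.EncodeToString(sum[:]))
}

func main() {
	who := linkTo(Authorship{Author: "identity-link", Date: "2015-07-02T00:00:00Z"})
	root := Commit{Author: who, Committer: who, Object: "tree-link-1", Comment: "initial"}
	child := Commit{
		Parents:   []Link{linkTo(root)}, // the parent is referenced by hash
		Author:    who,
		Committer: who,
		Object:    "tree-link-2",
		Comment:   "second",
	}
	fmt.Println("root: ", linkTo(root))
	fmt.Println("child:", linkTo(child))
}
```

Because the child stores the hash of its parent, changing any field of `root` changes the link of every descendant, which is the property the Commit type relies on.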

chriscool Jul 2, 2015

It seems to me that in Git when you sign a commit the signature is part of the commit. So you cannot remove the signature without changing the commit sha1.

And commit trailers (like Signed-off-by) are very useful in Git and may deserve something special.

Also Tree and VersionRepository are defined but not used.
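
The difference between an embedded and a detached signature can be sketched as follows. This is only an illustration under stated assumptions: `hashOf` and `embedSignature` are hypothetical helpers, the object ID is taken to be a SHA-256 of the serialized bytes, and Ed25519 stands in for whatever signing scheme would actually be used.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashOf is a hypothetical object ID: the hex SHA-256 of the serialized bytes.
func hashOf(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// embedSignature appends the signature to the commit bytes, roughly the way
// Git stores a "gpgsig" header inside the commit object itself.
func embedSignature(commit, sig []byte) []byte {
	out := append([]byte{}, commit...) // copy so the input is not modified
	out = append(out, "\nsig "...)
	return append(out, hex.EncodeToString(sig)...)
}

func main() {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	commit := []byte("object tree-link\ncomment initial")
	sig := ed25519.Sign(priv, commit)

	// Detached (a separate Signature node linking to the commit):
	// the commit bytes, and therefore its ID, stay the same.
	fmt.Println("commit id:           ", hashOf(commit))
	fmt.Println("signature verifies:  ", ed25519.Verify(pub, commit, sig))

	// Embedded: the signature becomes part of the hashed bytes, so removing
	// or replacing it yields a different commit ID.
	fmt.Println("id with embedded sig:", hashOf(embedSignature(commit, sig)))
}
```

With the separate Signature type from the opening post, signatures can be added or dropped without touching the commit hash; with the Git approach, the signature is covered by the hash.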

christianlundkvist Jul 9, 2015

Should we have email as part of Identity too, like Git does? We probably don't want that, since the key is a stronger identifier.
In the example Commit struct there is no Signature. I guess this would be optional, depending on whether you want to sign the commit or not. Having the Signature in the commit would make the signature part of the commit, as @chriscool mentioned.

ion1 Sep 22, 2015

Please consider adding a variant of a merge commit whose meaning is history rewriting.

The first parent will point to the new history. User interfaces are supposed to act as if it was the only parent unless the user requests otherwise. Where possible, the rewrite commit could be rendered as something like an unobtrusive collapsed bar between commit messages.

The hidden secondary parent will point to the history that was “rewritten”.

This would alleviate much of the need Git has for forced pushes in development branches to keep the commit history clean. It would also let one view what changed in the “rewrite” unlike with Git rewrites.
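
A rough sketch of how a user interface might treat such a rewrite commit, assuming a hypothetical `Rewrite` flag on the commit and a hypothetical `history` helper (neither is part of the proposed types):

```go
package main

import "fmt"

// Commit is a simplified in-memory commit. Rewrite marks the hypothetical
// "history rewrite" variant of a merge commit.
type Commit struct {
	Comment string
	Parents []*Commit
	Rewrite bool
}

// history lists commit comments reachable from head, depth first.
// For a rewrite commit it follows only the first parent (the new history)
// unless showRewritten is set.
func history(head *Commit, showRewritten bool) []string {
	var out []string
	seen := map[*Commit]bool{}
	var walk func(c *Commit)
	walk = func(c *Commit) {
		if c == nil || seen[c] {
			return
		}
		seen[c] = true
		out = append(out, c.Comment)
		parents := c.Parents
		if c.Rewrite && !showRewritten && len(parents) > 1 {
			parents = parents[:1] // hide the rewritten history
		}
		for _, p := range parents {
			walk(p)
		}
	}
	walk(head)
	return out
}

func main() {
	base := &Commit{Comment: "base"}
	wip1 := &Commit{Comment: "wip 1", Parents: []*Commit{base}}
	wip2 := &Commit{Comment: "wip 2", Parents: []*Commit{wip1}}
	clean := &Commit{Comment: "clean feature", Parents: []*Commit{base}}
	rewrite := &Commit{Comment: "rewrite", Parents: []*Commit{clean, wip2}, Rewrite: true}

	fmt.Println(history(rewrite, false)) // [rewrite clean feature base]
	fmt.Println(history(rewrite, true))  // [rewrite clean feature base wip 2 wip 1]
}
```

The old commits stay reachable through the second parent, so nothing has to be force-pushed away, while the default view shows only the cleaned-up history.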

jbenet Sep 23, 2015

Member

@ion1 interesting idea!

jbenet referenced this issue in ipfs/archives on Oct 28, 2015: IPFS as a backend to a web archiving #28 (open)

dignifiedquire referenced this issue in ipfs/pm on Nov 30, 2015: Sprint Nov 30 #60 (closed)

diasdavid referenced this issue in ipfs/go-ipfs on Jan 2, 2016: Commit Data Structure #1188 (closed)

diasdavid referenced this issue in ipfs/http-api-spec on Jan 9, 2016: Add Group add #17 (merged)

diasdavid changed the title from "Commit + Repo Datastructures" to "Versioning: Commit + Repo Datastructures" on Feb 23, 2016

ELLIOTTCABLE Apr 1, 2016

@chriscool as a relevant note, I'd often argue against adding first-class tagging to a git-ish system.

(There's some relevant discussion on my own approach to avoiding that on top of Git itself, see ELLIOTTCABLE/.gitlabels.)

nothingmuch May 25, 2016

Git's lack of multiple-author support is an oft-cited limitation. I think a logical AND of authorships would be useful to include, instead of a post hoc way of embedding that in the identity, since that would require parsing, etc.

nothingmuch May 25, 2016

@ion1, @ELLIOTTCABLE I think the most appealing way to address that is to have more than just a "parent" relationship between commits, which ties this into the debate about first-class tagging and potentially also the various trailers in commit comments.

Since there's nothing preventing the Object field from being a commit, parallel histories could be related by decorating both of them from the outside with a third one, for example, but that's far from the only approach.

mcast Oct 5, 2016

Do the data structures imply that the native Git objects would need to be translated when crossing the ipfs boundary?

I understand that the ipfs hashtree structure is different from the Git blobid, so the two aren't directly compatible. Is it necessary to generate a new id to store a git object (or pack, if the tools could find out what any of them might be called) in the DHT?

My concern is that if the data structures don't provide an exact isomorphism with the Git objects used in any given repo, there will be a lossy translation. It has to be lossless, doesn't it?

(On objectid->packid, serving something like a 302 Found or an extra returned header might help efficiency, then you only need DHT entries for the commits and the rest can come from a pack. Or maybe I need to read more about ipfs.)

lgierth Oct 5, 2016

Member

@mcast with CID and IPLD, we'll be able to just reference the unchanged git objects/packs/blobs/trees.
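
To illustrate why no translation is needed, here is a minimal sketch using only the Go standard library. It computes the ID Git assigns to a blob and wraps the same SHA-1 digest in a CIDv1. The helper names are hypothetical, and the byte values are taken from the multicodec table as an assumption (0x78 for git-raw, 0x11 for sha1); this does not use the real go-cid or go-ipld-git APIs.

```go
package main

import (
	"crypto/sha1"
	"encoding/base32"
	"encoding/hex"
	"fmt"
	"strings"
)

// gitBlobObject returns the uncompressed Git object for a blob:
// the header "blob <size>\x00" followed by the content.
func gitBlobObject(content []byte) []byte {
	header := fmt.Sprintf("blob %d\x00", len(content))
	return append([]byte(header), content...)
}

// gitBlobID returns the SHA-1 object ID Git assigns to the blob.
func gitBlobID(content []byte) string {
	sum := sha1.Sum(gitBlobObject(content))
	return hex.EncodeToString(sum[:])
}

// gitCID wraps the same digest in a CIDv1:
// <version 0x01><codec git-raw 0x78><multihash: sha1 0x11, length 0x14, digest>,
// rendered as multibase base32 (prefix "b", lowercase, no padding).
func gitCID(content []byte) string {
	sum := sha1.Sum(gitBlobObject(content))
	cid := []byte{0x01, 0x78, 0x11, 0x14}
	cid = append(cid, sum[:]...)
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	return "b" + strings.ToLower(enc.EncodeToString(cid))
}

func main() {
	content := []byte("hello world\n")
	fmt.Println("git object id:", gitBlobID(content))
	fmt.Println("cid:          ", gitCID(content))
}
```

The object ID printed here is the same one `git hash-object` reports for that content. The CID carries that digest unchanged and only adds a self-describing prefix, so the Git object bytes do not need to be rewritten or re-hashed.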

kehao95 Jan 8, 2018

Hi, it's been a while since the last update. Is there any news on this topic? Thanks for all your hard work. We would like to try IPFS in our product, but we need the versioning feature to be ready. Where can I track the status of this feature?

osarrouy Feb 21, 2018

Hi everyone. Same question as @kehao95 here :)

Any way to track the status of this issue?

Stebalien Feb 21, 2018

Unfortunately, no. We don't have native versioning.

We do now have git object support in IPLD: https://github.com/ipfs/go-ipfs/blob/master/docs/plugins.md, https://github.com/ipfs/go-ipld-git/. However, that has some limitations (no sharding, for one).
