Good first issues / Help wanted & libgit2 #26

Closed
sameer opened this issue Dec 4, 2018 · 8 comments

Comments

@sameer
Contributor

sameer commented Dec 4, 2018

Hello,
I came across go-ipld-git while working on a university project for putting git repos on IPFS. For now, I simply add the repo as a folder via ipfs. I would like to use go-ipld-git instead and was wondering where I could get started with helping.

Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information? Some of the issues like #16 could be solved by parsing the date in the commit. Are you avoiding using the cgo compiler?

@Stebalien
Member

Are you avoiding using the cgo compiler?

This, mostly. We do a lot of cross compiling.

@sameer
Contributor Author

sameer commented Dec 4, 2018

Are you avoiding using the cgo compiler?

This, mostly. We do a lot of cross compiling.

Ok, that makes sense. Are there any issues that I could help out with? I've worked with go but not libp2p or ipld in particular before.

@Stebalien
Member

Caching the CID (#6/#21) is probably a good one. Or just other code cleanups.

@magik6k this is really your domain. Need help with something git related?

@magik6k
Member

magik6k commented Dec 5, 2018

putting git repos on IPFS

There is https://github.com/ipfs-shipyard/git-remote-ipld, which uses https://github.com/src-d/go-git and operates directly on IPLD objects (using this repo).

There is also https://github.com/larsks/git-remote-ipfs which is likely similar to what you are doing currently

Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information?

go-git should be able to provide 'proper' parsing; the main reason it's not used here is that this repo started as a 'quick hack' and was never properly rewritten or cleaned up.

A rewrite may eventually be needed, even if it means breaking a few things. If it happens, it will need to be coordinated with https://github.com/ipld/js-ipld-git, which isn't much better.

Also note that current ipld-git things don't touch anything related to pack-files, which creates huge overheads in some places (for the Linux kernel repo there is about a 40x size difference, IIRC). There is no nice way of integrating pack-files into the ipfs/ipld ecosystem (it may be possible with some extensions to IPLD selectors, which are themselves in the planning stage now).

As for good git/ipld-related issues to pick: there is what Stebalien mentioned, and for something more challenging there is ipfs-shipyard/git-remote-ipld#12 (a generalization of this idea to smaller objects/parts may help reduce the overhead problem, but can introduce new problems too).

@sameer
Contributor Author

sameer commented Dec 5, 2018

putting git repos on IPFS

There is https://github.com/ipfs-shipyard/git-remote-ipld, which uses https://github.com/src-d/go-git and operates directly on IPLD objects (using this repo).

I remember seeing this one -- so it is like adding a new type of remote to git, right?

There is also https://github.com/larsks/git-remote-ipfs which is likely similar to what you are doing currently

This looks pretty useful, thanks for sharing!

Is there a reason go-ipld-git doesn't use the go bindings for libgit2 or some other library for parsing git-related information?

go-git should be able to provide 'proper' parsing; the main reason it's not used here is that this repo started as a 'quick hack' and was never properly rewritten or cleaned up.

A rewrite may eventually be needed, even if it means breaking a few things. If it happens, it will need to be coordinated with https://github.com/ipld/js-ipld-git, which isn't much better.

Also note that current ipld-git things don't touch anything related to pack-files, which creates huge overheads in some places (for the Linux kernel repo there is about a 40x size difference, IIRC). There is no nice way of integrating pack-files into the ipfs/ipld ecosystem (it may be possible with some extensions to IPLD selectors, which are themselves in the planning stage now).

By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?

As for good git/ipld-related issues to pick: there is what Stebalien mentioned, and for something more challenging there is ipfs-shipyard/git-remote-ipld#12 (a generalization of this idea to smaller objects/parts may help reduce the overhead problem, but can introduce new problems too).

I can look into the ones Stebalien mentioned first to get started. Thanks for the guidance.

@magik6k
Member

magik6k commented Dec 5, 2018

By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?

Nope, I mean the overhead of individual objects vs pack files (git-remote-ipld deals with individual objects, as this is the only way to make this work without complex ipld selectors and potentially other complex extensions to ipld, which we don't have currently).

@sameer
Contributor Author

sameer commented Dec 6, 2018

By 40x size difference do you mean that keeping the pack file in ipld has that overhead? Could they just be unpacked into the individual objects?

Nope, I mean the overhead of individual objects vs pack files (git-remote-ipld deals with individual objects, as this is the only way to make this work without complex ipld selectors and potentially other complex extensions to ipld, which we don't have currently).

So not being able to store the pack files themselves leads to the overhead?

sameer closed this as completed Dec 19, 2018

@magik6k
Member

magik6k commented Dec 19, 2018

So not being able to store the pack files themselves leads to the overhead?

Yep, because pack-files can store diffs between objects. There is quite a good doc on how they do that at https://github.com/git/git/blob/master/Documentation/technical/pack-heuristics.txt
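To make the overhead concrete, the toy comparison below stores two nearly identical versions of a file either as two individually zlib-compressed objects or as one compressed base plus a compressed delta. This is only a sketch: `naiveDelta` is a hypothetical common-prefix/suffix diff, far simpler than git's real delta encoding and pack heuristics, and the random sample data is an assumption chosen so that zlib alone cannot hide the duplication.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"math/rand"
)

// zlibSize returns the size of data after zlib compression.
func zlibSize(data []byte) int {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	w.Write(data) // writes to a bytes.Buffer do not fail
	w.Close()
	return buf.Len()
}

// naiveDelta returns the part of next that lies between the prefix and suffix
// it shares with base. Toy stand-in for git's delta format.
func naiveDelta(base, next []byte) []byte {
	p := 0
	for p < len(base) && p < len(next) && base[p] == next[p] {
		p++
	}
	s := 0
	for s < len(base)-p && s < len(next)-p &&
		base[len(base)-1-s] == next[len(next)-1-s] {
		s++
	}
	return next[p : len(next)-s]
}

// sampleVersions builds two 64 KiB versions that differ in a single byte.
func sampleVersions() ([]byte, []byte) {
	r := rand.New(rand.NewSource(1))
	v1 := make([]byte, 64*1024)
	for i := range v1 {
		v1[i] = byte(r.Intn(256))
	}
	v2 := append([]byte(nil), v1...)
	v2[len(v2)/2] ^= 0xFF // flip one byte in the middle
	return v1, v2
}

func main() {
	v1, v2 := sampleVersions()
	individual := zlibSize(v1) + zlibSize(v2)
	packed := zlibSize(v1) + zlibSize(naiveDelta(v1, v2))
	fmt.Printf("individual objects: %d bytes\n", individual)
	fmt.Printf("base plus delta:    %d bytes\n", packed)
}
```

With individual objects, every revision of a file costs roughly its full compressed size; with deltas, a small change costs roughly the size of the change. Over a long history that difference compounds, which is consistent with the large gap mentioned for the Linux kernel repo.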
