
Add Archive operation to ObjectDatabase #246

Merged
merged 5 commits into libgit2:vNext on Dec 13, 2013

9 participants

@yorah
yorah commented Nov 14, 2012

Fix #228

@yorah yorah referenced this pull request Nov 14, 2012
Closed

Implement `git-archive` #228

@dahlbyk
libgit2 member
dahlbyk commented Nov 14, 2012

I think we can get by without a base class since the only thing that we really need from the archiver is a handler for a path/stream. Everything else is implementation-specific. How about something like this:

https://github.com/dahlbyk/libgit2sharp/compare/topic;git-archive
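A minimal sketch of that handler-only shape, assuming a hypothetical in-memory `Node` type in place of LibGit2Sharp's `Tree` and `Blob` (the linked branch's actual signatures are not reproduced here). The walker only hands each file's full path and a content stream to a caller-supplied `Action<string, Stream>`; everything format-specific stays in that callback.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for a git tree: directories have children, files have content.
public sealed class Node
{
    public string Name { get; }
    public byte[] Content { get; }                      // null for directories
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name, byte[] content = null)
    {
        Name = name;
        Content = content;
    }

    public bool IsDirectory => Content == null;
}

public static class HandlerArchiver
{
    // Invokes the handler once per file with its '/'-separated full path.
    // Directories are never reported: their structure is implied by the path segments.
    public static void Archive(Node root, Action<string, Stream> handler)
    {
        Walk(root, "", handler);
    }

    private static void Walk(Node dir, string prefix, Action<string, Stream> handler)
    {
        foreach (var child in dir.Children)
        {
            string path = prefix + child.Name;
            if (child.IsDirectory)
            {
                Walk(child, path + "/", handler);
            }
            else
            {
                // Read-only stream over the file content, disposed after the handler returns.
                using (var stream = new MemoryStream(child.Content, writable: false))
                {
                    handler(path, stream);
                }
            }
        }
    }
}
```

A tar or zip writer would be passed in as the callback. With a handler that only records paths, a tree holding `src/main.c` and `README` yields exactly those two paths, in walk order.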

@yorah
yorah commented Nov 14, 2012

I think I like it a lot. It definitely looks simpler without a base class, thanks! I guess I was over-engineering it a bit.

I'll push an update tomorrow or the day after.

@dahlbyk
libgit2 member
dahlbyk commented Nov 15, 2012

Refactored a bit, and switched to an in-memory test too...

@yorah
yorah commented Nov 16, 2012

c144fb3 is currently really dirty: I didn't have much time to play with it, so I focused on modifying tar-cs as little as possible to make it generate a tar file in our context (removed some code we don't need, changed some signatures, reorganized it into 2 files). It still needs a lot of work to generate a correct tar file (e.g. the user ID and the mode are currently hardcoded), but I wanted to share it before going further.

My current feeling is that to be confident in the tar implementation we propose, we'll need:

  • to spend some time looking at the tar format (which doesn't seem overly complicated, though) and make sure tar-cs implements everything we need
  • to make sure the implementation works correctly on Windows and Linux
  • to cover the implementation with tests

Basically, it would mean rewriting/refactoring a good chunk of tar-cs (which is nevertheless a good starting point).

@nulltoken do we want to go that way?

In the meantime, I'll keep playing with the tar-cs thing over the weekend; it's actually interesting 😄

@dougrathbone

When will this make it into a release?

@yorah
yorah commented Dec 3, 2012

I didn't have much time to spend on this topic. The basic infrastructure is there, as can be seen from the tests. What is missing is the tar plugin (it kind of works as is, but I'm not yet confident in the code or in how it behaves compared to what git.git does). This is the next item on my TODO list.

@nulltoken Do you want to release this feature without any archiving plugin, or do you prefer to wait for a default tar implementation?

@nulltoken
libgit2 member

@nulltoken Do you want to release this feature without any archiving plugin, or do you prefer to wait for a default tar implementation?

@yorah I'd prefer to provide at least one working plugin. However, it also depends on what @dougrathbone requires on his side ;-) If he doesn't intend to rely on tar-based archiving, this could be released more or less as is.

@nulltoken
libgit2 member

A few things came to mind and I thought I'd share them with you guys. The current handler only deals with Blob paths (the Tree structure being inferred from the path segments).

Besides files and directories, a TreeEntry may also point at:

  • a symlink or an executable file. Should we hint the handler about this in order to let the implementation potentially cope with this? (Surprisingly, Linux guys tend to be very excited about those tiny little bits ;-) )
  • a commit (submodule). Should its content be (optionally) part of the archive (à la git-archive-all, for instance)?

/cc @carlosmn

@carlosmn
libgit2 member
carlosmn commented Dec 7, 2012

This all depends on what the goal and needs are. For something that does what the git archive command does, you'd have to put symlinks and the correct mode in the tarball or zip archive and then ignore submodules, as they don't belong to the repo.

If you want to export everything including the submodules, mentioning git-archive might be a red herring, as what you really seem to want is a release script that runs on your dev machine (as you're unlikely to have all the required components anywhere else). Why bother with git-archive then? Loop through the submodules just as you would for the super-repo and tell your archiving library to add those files.
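For the `git archive`-like case described above, a hedged sketch of a hypothetical helper (not part of LibGit2Sharp) that maps a git tree entry mode to a tar type flag and permission bits. It is loosely modeled on git's archive-tar.c and assumes git's default `tar.umask` of 0002; submodules (gitlinks) are simply skipped here, following the suggestion above, so check archive-tar.c for what git itself emits for them.

```csharp
using System;

// Hypothetical helper: git tree entry mode -> tar typeflag + permission bits.
public static class TarModeMapper
{
    private static int Oct(string digits) => Convert.ToInt32(digits, 8);

    private static readonly int TypeMask    = Oct("170000");
    private static readonly int RegularType = Oct("100000"); // 100644 / 100755 blobs
    private static readonly int TreeType    = Oct("040000"); // directories
    private static readonly int SymlinkType = Oct("120000"); // symbolic links
    private static readonly int GitlinkType = Oct("160000"); // submodule commits

    // Assumption: git's default tar.umask of 0002.
    private static readonly int Umask = Oct("002");

    // Returns false for entries this sketch leaves out of the archive (submodules).
    public static bool TryMap(int gitMode, out char typeFlag, out int tarMode)
    {
        int type = gitMode & TypeMask;
        typeFlag = '\0';
        tarMode = 0;

        if (type == GitlinkType)
        {
            return false;                                   // skip submodules
        }

        if (type == TreeType)
        {
            typeFlag = '5';                                 // tar directory
            tarMode = Oct("777") & ~Umask;
        }
        else if (type == SymlinkType)
        {
            typeFlag = '2';                                 // tar symbolic link
            tarMode = Oct("777");
        }
        else if (type == RegularType)
        {
            // Executable blobs keep their execute bits; others become rw for everyone, minus umask.
            bool executable = (gitMode & Oct("100")) != 0;
            typeFlag = '0';                                 // tar regular file
            tarMode = (executable ? Oct("777") : Oct("666")) & ~Umask;
        }
        else
        {
            throw new ArgumentException("Unknown git file mode: " + Convert.ToString(gitMode, 8));
        }

        return true;
    }
}
```

Under these assumptions a `100644` blob becomes a regular entry with mode `0664`, a `100755` blob gets `0775`, and a `120000` entry becomes a symlink entry.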

@yorah yorah referenced this pull request in yorah/tar-cs-lb2s Jan 5, 2013
Open

Tiny bits #1

@martinwoodward
libgit2 member

This looks good to me. Code appropriately credits http://code.google.com/p/tar-cs/ and reproduces the BSD license in the source code as well as links back to the source project.

@yorah
yorah commented Jan 14, 2013

Sorry for being a bit slow on this topic; I didn't get as much time to work on it as I wanted, but I still managed to make some progress!

The code is not yet finished, as it needs:

  • some prettifying (especially the TarWriter and TarArchiver classes)
  • more tests, more tests, more tests

I'm pushing this in an unfinished state because I stumbled across an interesting issue: during a normal git-archive, the checkout filters must be applied (see this thread from René Scharfe, and also the code of archive-tar.c and archive.c). The problem is that when iterating a Tree in LibGit2Sharp, there is no way (afaik) to get the content with the checkout filters applied.
This means that, currently, the expected tar files (which were generated using git-archive on my Windows workstation) and the ones I generated through ObjectDatabase.Archive() do not have the same line endings (expected: \r\n, actual: \n). I think this is actually a blocking issue, so any feedback would be appreciated!
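To make the mismatch concrete, here is a simplified sketch (not git's or libgit2's actual filter code) of what an `eol=crlf`-style checkout conversion does to blob content: each bare LF becomes CRLF, while existing CRLF pairs are left alone. It deliberately ignores git's binary detection and the other attribute-driven rules, so treat it as an illustration of the line-ending difference only.

```csharp
using System.IO;

public static class CrlfFilter
{
    // Simplified LF -> CRLF conversion: every LF not already preceded by CR gets a CR inserted.
    public static byte[] ToCrlf(byte[] input)
    {
        using (var output = new MemoryStream(input.Length))
        {
            for (int i = 0; i < input.Length; i++)
            {
                byte b = input[i];
                bool bareLf = b == (byte)'\n' && (i == 0 || input[i - 1] != (byte)'\r');
                if (bareLf)
                {
                    output.WriteByte((byte)'\r');
                }
                output.WriteByte(b);
            }
            return output.ToArray();
        }
    }
}
```

A blob stored as `a\nb\r\nc\n` comes out as `a\r\nb\r\nc\r\n`, which is the kind of difference that shows up when raw blob contents are compared against an archive produced with the conversion applied.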

@carlosmn after playing a lot with archive-tar.c and archive-zip.c from git.git, I realized that if one wants to generate exactly the same archive files as the ones generated by git.git, it may make sense to have the archive functions in libgit2 rather than in higher layers. There are quite a few tricks applied when creating an archive (like ignoring the submodules, placing custom data in ustar fields when pax headers are generated, ...), and it seems a bit strange to have each libgit2 wrapper re-implement them when they could be centralized.
I may be mistaken here, but I wanted to share this feedback with you and hear what you think about all this.

@yorah
yorah commented Jan 14, 2013

Here is a workaround that would unblock the situation:

  • generate the expected tar files without applying the filters (@nulltoken proposal: .gitattributes with "* binary\n")
  • compare the generated tar files with those tweaked expected tar files

However, I'm not comfortable with this workaround, because it means that LibGit2Sharp wouldn't generate the exact same tar/zip files as the ones generated by git.git (no checkout filters applied).

@carlosmn
libgit2 member

What do you mean when you say "exactly"? A byte-for-byte equal archive is not IMO a realistic thing to try to get, as not even all git versions produce the same results. There were recently some changes to make the zipfile generator compatible with more tools wrt Unicode paths, and even already this year there was a patch that changes the output yet again (then there's something about timestamps, which apparently gitweb has an issue with).

So as "stable" output is already not something you can get, I worry about putting this into the library for two main reasons:

  1. Creating a tarball or zipfile has nothing to do with managing a git repository, which is what the library is about. The git tool has this as a porcelain feature, but there is no particular reason why it needs to.
  2. Fixes to the way you produce a zipfile/tarball (particularly wrt zip path encoding, which seems to be a wild-west scenario where each tool has its own slightly different way of specifying it) should not depend on the version of the git library you're using.

If the issue is that the result should have checkout filters applied and there's no way to do that from outside the library, then that's what we should fix.

@yorah
yorah commented Jan 21, 2013

@carlosmn Thanks for your answer; after taking a step back, I agree with you. I think I was focusing too much on the code and not enough on the bigger picture (what made it click for me was when you said "Fixes to the way you produce a zipfile/tarball [...] should not depend on the version of the git library you're using", along with the fact that even git.git doesn't produce "stable" output).

I will open an issue in libgit2 asking for a way to apply checkout filters from the outside.

Thanks again.

@nulltoken What would you like me to do with this PR in the meantime? Do you want to deliver the feature even if it's not 100% correct/complete, or should we wait for the filter thing in libgit2?

@yorah yorah referenced this pull request in libgit2/libgit2 Jan 21, 2013
Closed

Provide an API to apply filters #1264

@nulltoken
libgit2 member

@nulltoken What would you like me to do with this PR in the meantime? Do you want to deliver the feature even if it's not 100% correct/complete, or should we wait for the filter thing in libgit2?

@yorah I think I'd rather have it built on top of libgit2/libgit2#1264. Let's keep this PR open until that's ready.

@peterfearn

Hi, I'm new here but am interested in this archive feature making it. Can I do anything to help?

@dahlbyk
libgit2 member
dahlbyk commented Jul 29, 2013

Once libgit2/libgit2#1683 is merged, we can start binding the filter API and @yorah can update Archive() to use it.

@peterfearn

@dahlbyk ah ok, not a lot I can do there ;) - if I get some time though I might pick up an issue here

@nulltoken
libgit2 member

ZOMG tests are now passing!

@nulltoken nulltoken and 1 other commented on an outdated diff Dec 11, 2013
LibGit2Sharp.Tests/ArchiveTarFixture.cs
+using Xunit;
+
+namespace LibGit2Sharp.Tests
+{
+ public class ArchiveTarFixture : BaseFixture
+ {
+ [Fact]
+ public void CanArchiveACommitWithDirectoryAsTar()
+ {
+ var path = CloneBareTestRepo();
+ using (var repo = new Repository(path))
+ {
+ var sb = new StringBuilder();
+ sb.Append("* text eol=crlf\n");
+
+ Touch(Path.Combine(repo.Info.Path, "info"), "attributes", sb.ToString());
@nulltoken
libgit2 member
nulltoken added a note Dec 11, 2013

Please add a comment describing why a .gitattributes file is needed

@yorah
yorah added a note Dec 13, 2013

Done, please tell me if it is explicit enough.

@nulltoken nulltoken and 1 other commented on an outdated diff Dec 11, 2013
LibGit2Sharp/Core/TarWriter.cs
@@ -0,0 +1,445 @@
+/*
+ * BSD License
+ *
+ * Copyright (c) 2009, Vladimir Vasiltsov
@nulltoken
libgit2 member
nulltoken added a note Dec 11, 2013

@yorah Can you please add a link to the source repository?

@martinwoodward How do you feel about including third-party code like this?

@yorah
yorah added a note Dec 13, 2013

done

@nulltoken nulltoken and 1 other commented on an outdated diff Dec 11, 2013
LibGit2Sharp/TarArchiver.cs
+ public override void BeforeArchiving(Tree tree, ObjectId oid, DateTimeOffset modificationTime)
+ {
+ if (oid == null)
+ {
+ return;
+ }
+
+ // Store the sha in the pax_global_header
+ using (var stream = new MemoryStream(Encoding.ASCII.GetBytes(string.Format("52 comment={0}\n", oid.Sha))))
+ {
+ writer.Write("pax_global_header", stream, modificationTime, "666".OctalToInt32(),
+ "0", "0", 'g', "root", "root", "0", "0", oid.Sha, false);
+ }
+ }
+
+
@nulltoken
libgit2 member
nulltoken added a note Dec 11, 2013

E_TOOMANYNEWLINES

@yorah
yorah added a note Dec 13, 2013

done ;)
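The hard-coded `52` in the `pax_global_header` record above is not arbitrary: a pax extended-header record has the form `<length> <key>=<value>\n`, and `<length>` counts the whole record, its own digits included. For `comment=` followed by a 40-character SHA, that is 2 + 1 + 8 + 40 + 1 = 52 bytes. A sketch of a hypothetical helper (`PaxRecord.Build` is not in the PR) that computes the length instead of hard-coding it:

```csharp
using System.Globalization;
using System.Text;

public static class PaxRecord
{
    // Builds "<len> <key>=<value>\n", where <len> is the decimal byte length
    // of the whole record, including the digits of <len> itself.
    public static string Build(string key, string value)
    {
        string rest = " " + key + "=" + value + "\n";
        int restLength = Encoding.UTF8.GetByteCount(rest);

        // Adding the length digits can itself add a digit (e.g. 99 -> 101),
        // so iterate until the value is self-consistent.
        int length = restLength;
        while (true)
        {
            int candidate = restLength + length.ToString(CultureInfo.InvariantCulture).Length;
            if (candidate == length)
            {
                break;
            }
            length = candidate;
        }

        return length.ToString(CultureInfo.InvariantCulture) + rest;
    }
}
```

With a 40-character SHA as the value, `PaxRecord.Build("comment", sha)` produces a record starting with `52 comment=`, matching the literal used in `BeforeArchiving`.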

@nulltoken nulltoken and 1 other commented on an outdated diff Dec 11, 2013
acknoledgments.md
@@ -0,0 +1,2 @@
+LibGit2Sharp is making use of the following OSS projects:
@nulltoken
libgit2 member
nulltoken added a note Dec 11, 2013

Besides the typo in the filename, I wonder if there's some standard for this kind of file (such as README, LICENCE, COPYING, ...)

/cc @martinwoodward

@yorah
yorah added a note Dec 13, 2013

done!

@nulltoken
libgit2 member

I like this very much! Very neat job!

Two requests:

  • Could you please make sure your comments match the expectations of the CommentPolicyOffice (cf #548 #458) :trollface:
  • How complex would it be to add some kind of optional callback based progress report?
@nulltoken nulltoken commented on the diff Dec 12, 2013
LibGit2Sharp/TarArchiver.cs
@@ -0,0 +1,89 @@
+using System;
+using System.IO;
+using System.Text;
+using LibGit2Sharp.Core;
+
+namespace LibGit2Sharp
+{
+ /// <summary>
+ /// Logic for tar archiving (not the actual tar format, but the overal logic related to tar+git) is taken
+ /// from https://github.com/git/git/blob/master/archive-tar.c.
@nulltoken
libgit2 member
nulltoken added a note Dec 12, 2013
@nulltoken
libgit2 member

How complex would it be to add some kind of optional callback based progress report?

@yorah Scratch this silly request. As we recursively walk the Trees, I can't find an easy way to know beforehand the final number of objects to process.

@Aimeast
Aimeast commented Dec 12, 2013

I have a question: as I understand it, lg2# is a managed wrapper for lg2, so why is the Archiver implemented in lg2# rather than in lg2?

/cc @nulltoken @yorah

@ethomson
libgit2 member

@Aimeast I believe that this is answered by the discussion at libgit2/libgit2#918

Thanks!

@Aimeast
Aimeast commented Dec 12, 2013

@ethomson Thank you! I agree with the discussion in libgit2/libgit2#918. It's really easy to implement at the application level; I have my own custom Archiver, for instance.
So why does lg2# introduce the TarArchiver? I mean, I don't think this feature is a must-have.

@carlosmn
libgit2 member

As we recursively walk the Trees, I can't find an easy way to know beforehand the final number of objects to process.

There's probably some value in giving feedback on the number of files you've already archived and their cumulative size, even if you don't know the full size.

As for calculating the full number, you would essentially need two passes. For future reference, in case anybody cares enough to implement it, the way I'd go about it is to read the tree into an in-memory index, which means that you get a list of full paths, their mode and their id. The size of each file would need yet another pass, I think, unless you're filling the index yourself. You can then go on and work through the list, knowing how many files there are in total.
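A sketch of that two-pass approach, assuming a hypothetical in-memory `TreeItem` type instead of LibGit2Sharp's `Tree`, and a plain list of `(Path, Size)` tuples instead of a real index. Pass 1 flattens the tree so the total is known up front; pass 2 writes each entry and reports files done, the total, and cumulative bytes, which also covers the simpler "running count and size" feedback.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree item: Children is null for files.
public sealed class TreeItem
{
    public string Name { get; }
    public long Size { get; }
    public List<TreeItem> Children { get; }

    public TreeItem(string name, long size)
    {
        Name = name;
        Size = size;
    }

    public TreeItem(string name, params TreeItem[] children)
    {
        Name = name;
        Children = new List<TreeItem>(children);
    }
}

public static class TwoPassProgress
{
    // Pass 1: flatten into full paths (what reading the tree into an index would give).
    public static List<(string Path, long Size)> Flatten(TreeItem root)
    {
        var result = new List<(string Path, long Size)>();
        Collect(root, "", result);
        return result;
    }

    private static void Collect(TreeItem dir, string prefix, List<(string Path, long Size)> result)
    {
        foreach (var child in dir.Children)
        {
            if (child.Children == null)
                result.Add((prefix + child.Name, child.Size));
            else
                Collect(child, prefix + child.Name + "/", result);
        }
    }

    // Pass 2: write each entry, then report (filesDone, filesTotal, bytesDone).
    public static void Archive(
        IReadOnlyList<(string Path, long Size)> entries,
        Action<string> writeEntry,
        Action<int, int, long> onProgress)
    {
        long bytesDone = 0;
        for (int i = 0; i < entries.Count; i++)
        {
            writeEntry(entries[i].Path);
            bytesDone += entries[i].Size;
            onProgress?.Invoke(i + 1, entries.Count, bytesDone);
        }
    }
}
```

For a tree with `a.txt` (10 bytes) and `lib/b.bin` (32 bytes), the callback receives `1/2` with 10 bytes, then `2/2` with 42 bytes. In libgit2 terms, pass 1 would correspond to the in-memory index approach described above.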

@nulltoken
libgit2 member

For future reference, in case anybody cares enough to implement it, the way I'd go about it is to read the tree into an in-memory index, which means that you get a list of full paths, their mode and their id.

You're right! git_index_read_tree() to the rescue...

That said, I wouldn't put this PR on hold waiting for this to be implemented. @yorah once this is merged, would you please be so kind as to create an issue describing this enhancement?

yorah added some commits Nov 16, 2012
@yorah yorah Add Archive() to ObjectDatabase 2eb72b5
@yorah yorah Add tarring capabilities
This commit adds [tar-cs](http://code.google.com/p/tar-cs/) to LibGit2Sharp without any modifications to the original source code.
5701f42
@yorah yorah Trim down tar-cs to keep only the tarring capabilities needed for git-archive

The resulting `TarWriter` file is a low-level abstraction that can be used to write tar files.

Included:
- only keep a unique `TarWriter` file
- merging `UsTarHeader` and `TarHeader`, and adding them as private classes of `TarWriter`
- removal of unneeded `Write` overloads

Some changes to the tar format have been included to better comply with the one described here: http://en.wikipedia.org/wiki/Tar_%28computing%29#Format_details
f2d5b19
@yorah yorah Add archive as tar functionality
Implemented as an extension method of `ObjectDatabase`.
Lots of logic taken from [archive-tar.c](https://github.com/git/git/blob/master/archive-tar.c).
36d40bd
@yorah yorah Add tar archiving tests 88bf65e
@yorah
yorah commented Dec 13, 2013

Could you please make sure your comments match the expectations of the CommentPolicyOffice (cf #548 #458) :trollface:

Done!
Note to self: don't keep long-running PRs like this one 😉

@yorah once this is merged, would you please be so kind as to create an issue describing this enhancement?

Sure

@martinwoodward
libgit2 member

👍 Looks good to me! Thanks for your persistence @yorah

@nulltoken nulltoken merged commit 88bf65e into libgit2:vNext Dec 13, 2013

1 check passed

The Travis CI build passed
@nulltoken
libgit2 member

Achievement unlocked: Amazing PR which took an amazingly long time to land.

It was worth the wait! 💖

@nulltoken
libgit2 member

@yorah once this is merged, would you please be so kind as to create an issue describing this enhancement?

Sure

@yorah It's been merged. Feel free to create the issue 😉
