Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression, manually build tree is corrupt #2489

Closed
jdavid opened this issue Jul 29, 2014 · 12 comments
Closed

Regression, manually build tree is corrupt #2489

jdavid opened this issue Jul 29, 2014 · 12 comments

Comments

@jdavid
Copy link
Member

jdavid commented Jul 29, 2014

The code below works with libgit2 0.20.0 but does not with 0.21.0

When I have time I will rewrite the test script to C (and test against a checkout of libgit2, right now pygit2 does not build against a checkout).

To test the script:

$ python test.py
index.write_tree() 7d1d25de5e8a5dfaafda45bd4274908f94319c2e
manually           8b9c6b3b6aafa0751fed36f12a6de5199e86d24d
$ cd /tmp/test
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   foo.txt

The test code does not touch foo.txt at all. The tree built manually is wrong.

This is the script:

from os import mkdir
from os.path import join
from pygit2 import init_repository
from pygit2 import GIT_FILEMODE_BLOB, GIT_FILEMODE_TREE


def create_repo():
    repo = init_repository('/tmp/test')
    # Create file
    path = join(repo.workdir, 'foo.txt')
    open(path, 'w').write('foo.txt\n')
    # Stage changes
    index = repo.index
    index.add('foo.txt')
    index.write()
    # Make first commit
    author = repo.default_signature
    root = index.write_tree()
    repo.create_commit('HEAD', author, author, 'bar', root, [])
    # Ok
    return repo


if __name__ == '__main__':
    # Initialize the repo, the real test is just below
    repo = create_repo()

    # Make changes to the working copy
    path = join(repo.workdir, 'foo')
    mkdir(path)
    path = join(path, 'bar')
    open(path, 'w').write('bar\n')

    # Stage changes
    index = repo.index
    index.add('foo/bar')
    index.write()

    # Commit
    parent = repo.revparse_single('HEAD')
    author = repo.default_signature

    foo = repo.TreeBuilder()
    # Note: Change .id to .oid if you want to test with pygit2 0.20.3
    foo.insert('bar', index['foo/bar'].id, GIT_FILEMODE_BLOB)
    foo = foo.write()

    root = repo.TreeBuilder(parent.tree)
    root.insert('foo', foo, GIT_FILEMODE_TREE)
    root = root.write()

    print 'index.write_tree()', index.write_tree()
    print 'manually          ', root

    repo.create_commit('HEAD', author, author, 'bar', root, [parent.oid])
@carlosmn
Copy link
Member

Does git fsck in this repo complain about either tree? It's quite odd that we would see a difference, since the index uses the treebuilder to create its trees.

@jdavid
Copy link
Member Author

jdavid commented Jul 30, 2014

Yes:

$ git fsck
Checking object directories: 100% (256/256), done.
error in tree 8b9c6b3b6aafa0751fed36f12a6de5199e86d24d: not properly sorted

@jdavid
Copy link
Member Author

jdavid commented Jul 30, 2014

Below the test code in C.

#include <git2.h>
#include <fcntl.h>

#define FILEMODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)
#define DIRMODE (S_IRUSR | S_IWUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)


void main()
{
    git_repository *repo;
    git_index *index;
    const git_index_entry *ientry;
    git_oid commit_id, tree_id;
    git_commit *commit;
    git_tree *tree;
    git_signature *author;
    git_treebuilder *tb;
    int file;

    /* Initialize the repo */
    git_repository_init(&repo, "/tmp/test", 0);
    git_signature_default(&author, repo);
    /* Create file */
    file = creat("/tmp/test/foo.txt", FILEMODE);
    write(file, "foo\n", 4);
    close(file);
    /* Stage changes */
    git_repository_index(&index, repo);
    git_index_add_bypath(index, "foo.txt");
    git_index_write(index);
    /* First commit */
    git_index_write_tree(&tree_id, index);
    git_tree_lookup(&tree, repo, &tree_id);
    git_commit_create_v(&commit_id, repo, "HEAD", author, author, NULL, "foo", tree, 0);
    git_commit_lookup(&commit, repo, &commit_id);

    /* Make changes */
    mkdir("/tmp/test/foo", DIRMODE);
    file = creat("/tmp/test/foo/bar", FILEMODE);
    write(file, "bar\n", 4);
    close(file);
    /* Stage changes */
    git_index_add_bypath(index, "foo/bar");
    git_index_write(index);
    ientry = git_index_get_bypath(index, "foo/bar", 0);
    /* Create the foo tree manually */
    git_treebuilder_create(&tb, NULL);
    git_treebuilder_insert(NULL, tb, "bar", &(ientry->id), GIT_FILEMODE_BLOB);
    git_treebuilder_write(&tree_id, repo, tb);
    /* Create the root tree manually (BUG) */
    git_treebuilder_create(&tb, tree);
    git_treebuilder_insert(NULL, tb, "foo", &tree_id, GIT_FILEMODE_TREE);
    git_treebuilder_write(&tree_id, repo, tb);
    /* Commit */
    git_tree_lookup(&tree, repo, &tree_id);
    git_commit_create_v(&commit_id, repo, "HEAD", author, author, NULL, "bar", tree, 1, commit);
}

@jdavid
Copy link
Member Author

jdavid commented Jul 30, 2014

The bug is fixed in the master branch, apparently fixed by the work in the treebuilder-perf branch.

Now, this is in my opinion a serious enough bug to deserve a release.

@vmg
Copy link
Member

vmg commented Jul 30, 2014

Yes it is. We'll ship a minor point release.

@carlosmn
Copy link
Member

I can reproduce here, and worryingly enough it seems to be related to using git_vector_insert_sorted(). instead of git_vector_insert(). Reverting that change makes it work again.

When using _sorted(), we end up with "foo" being sorted before "foo.txt", which makes it look like _sorted() is buggy.

@arrbee
Copy link
Member

arrbee commented Jul 31, 2014

Is it just that the comparison function changes between what is used by git_vector_sort and git_vector_insert_sorted?

@carlosmn
Copy link
Member

We only have one comparison function, which is set at the beginning and (should be) used throughout.

@jacquesg
Copy link
Contributor

Shot in the dark here (didn't look at the code for this), does the comparison function satisfy "strict weak ordering" criteria?

@carlosmn
Copy link
Member

Creating a tree entry happens in two steps, first we allocate it, passing the filename so we can allocate enough space, and then we set the attributes and copy the id.

In the current code we set the attributes together with the id after inserting into the vector. This lets us use the same two lines to prepare a new entry and to revive an deleted one (which we keep around because we may have returned a pointer to the user, even though now we're changing it from under them, but that's a different story).

As we are now sorting on insert, we need to have the attributes correctly set, as the Git tree sorting order behaves as though tree entries had a trailing slash which is not in the entry's name. As we haven't set the entry to be a tree yet, we're sorting it as though it were a blob's name, leading to incorrect sorting.

@carlosmn
Copy link
Member

There's a fix in the cmn/treebuilder-set-attribute branch. We don't have a stable branch anymore, so there's no PR for this.

@arrbee arrbee mentioned this issue Jul 31, 2014
1 task
@jdavid
Copy link
Member Author

jdavid commented Aug 1, 2014

The fix from #2494 merged into the maint/v0.21branch works for me. Guess we can close this issue, and just follow the release at #2495. Thanks!

@jdavid jdavid closed this as completed Aug 1, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants