Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch does not have a reference to underlying blob(s) #754

Closed
brandonio21 opened this issue Dec 3, 2017 · 13 comments · Fixed by #761
Closed

Patch does not have a reference to underlying blob(s) #754

brandonio21 opened this issue Dec 3, 2017 · 13 comments · Fixed by #761

Comments

@brandonio21
Copy link
Contributor

Please feel free to comment with your thoughts, investigations, inquiries, etc. I'm not sure if what I am reporting is expected behavior or a bug, but I'd like to get to the cause anyway.

In #744 , tests were added which were broken for some users, as reported #748. In #753 I fixed the problem by not using create_blob, as create_blob causes the issue.

What happened? Why did this happen?

@jdavid
Copy link
Member

jdavid commented Dec 4, 2017

Could you post a simple script that produces the problem, so I can start investigating?

@brandonio21
Copy link
Contributor Author

@jdavid : Well that seems like an extremely reasonable suggestion.

Consider the following:

old = self.repo[self.repo.create_blob(OLD_CONTENT)]
new = self.repo[self.repo.create_blob(NEW_CONTENT)]

patch = old.diff(new)
print(patch.patch)

The expected outcome is something like:

diff --git a/file b/file
index a520c24..d675fa4 100644
--- a/file
+++ b/file
@@ -1,3 +1 @@
-hello world
-hola mundo
-bonjour le monde
+foo bar

Actual output is sometimes as expected, sometimes unprintable due to unicode/ascii conversion errors, and sometimes mangled as such:

diff --git a/file b/file
index a520c24..d675fa4 100644
--- a/file
+++ b/file
@@ -1,3 +1 @@
-

@brandonio21
Copy link
Contributor Author

@jdavid : Looking a little more into it, if I patch.c

@@ -199 +199 @@
- py_patch = to_unicode(buf.ptr, NULL, NULL);
+ py_patch = to_unicode(buf.ptr, "utf-8", "strict");

Then a crash occurs instead of this "corruption" due to a utf-8 conversion error. I don't know if this helps at all. This either says that to_unicode is broken (which it's not) or somehow the patch is getting corrupted.

If I further:

@@ -199 +199 @@
- py_patch = to_unicode(buf.ptr, NULL, NULL);
+ py_patch = to_unicode_n(buf.ptr, buf.size, "utf-8", "replace");

I get similar behavior.

Maybe this helps. If you have any information or any insight, I'd love to hear it!

@brandonio21
Copy link
Contributor Author

Okay, I started doing some gdb investigations. Granted, I'm not very good at any of this.. but I've determined that in the test_patch_multi_blob (in #758), when patch.patch is called from a patch that has been added to the list, the self->patch->lines.ptr[x] data is corrupt for the objects.

I thought this meant that we were missing a Py_INCREF somewhere and the actual content of the patch was being prematurely deallocated.. but I tried inserting a INCREF almsot everywhere and couldn't make a noticable change.

Hopefully this aids in progress towards a solution

@jdavid
Copy link
Member

jdavid commented Dec 9, 2017

Hi @brandonio21 , thanks for your research. Everything works here, including the news tests in #758 , but it may be pure luck. I am reviewing Patch.patch

@jdavid
Copy link
Member

jdavid commented Dec 9, 2017

Pushed a commit, could you try?

The error may be that git_patch_to_buf was called twice on success. Changing the parameters to to_unicode should have no effect, as the content is ascii.

@brandonio21
Copy link
Contributor Author

brandonio21 commented Dec 9, 2017

Great news! I've tracked down the problem. The repro code is:

def test_patch_delete_blob(self):                                        
    blob = self.repo[BLOB_OLD_SHA]                                       
    patch = pygit2.Patch.create_from(                                    
        blob,                                                            
        None,                                                            
        old_as_path=BLOB_OLD_PATH,                                       
        new_as_path=BLOB_NEW_PATH,                                       
    )                                                                    
                                                                         
    # Make sure that even after deleting the blob the patch still has the
    # necessary references to generate its patch                         
    del blob                                                             
    self.assertEqual(patch.patch, BLOB_PATCH_DELETED)                    

Alternatively,

def test_patch_delete_blob(self):                                        
    blob = self.repo[BLOB_OLD_SHA]                                       
    patch = blob.diff_to_buffer(None)                                                            
                                                                         
    # Make sure that even after deleting the blob the patch still has the
    # necessary references to generate its patch                         
    del blob                                                             
    self.assertEqual(patch.patch, BLOB_PATCH_DELETED)       

I will submit a patch to fix it now :^)

@brandonio21
Copy link
Contributor Author

Now that I understand the problem more fully, I added more tests to #758 which fail in CI due to the problem described in this issue.

@jdavid
Copy link
Member

jdavid commented Dec 10, 2017

Great job @brandonio21 ! Now that's something I can reproduce. Unfortunately the fix in #760 is not correct, PyObject_New already returns a new reference, incrementing it again will produce a leak. The bug is somewhere else.

@jdavid
Copy link
Member

jdavid commented Dec 10, 2017

Below is the equivalent C code reproducing the same behaviour. This surprises me a little bit, but I don't know so much about libgit2 internals. Maybe @carlosmn or @ethomson could enlighten us a bit? I thought objects in libgit2 were reference counted, if git_patch depends on git_blob shouldn't it keep a reference to it?

#include <stdio.h>
#include <git2.h>

#define REPO_PATH "."
#define OID "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"

void main()
{
    int err;

    git_libgit2_init();

    // Repository
    git_repository *repo;
    err = git_repository_open(&repo, REPO_PATH);
    if (err < 0) { printf("Error\n"); exit(1); }

    // Oid
    git_oid oid;
    err = git_oid_fromstr(&oid, OID);
    if (err < 0) { printf("Error\n"); exit(1); }

    // Blob
    git_blob *blob;
    err = git_blob_lookup(&blob, repo, &oid);
    if (err < 0) { printf("Error\n"); exit(1); }

    // Patch
    git_patch *patch;
    git_diff_options opts = GIT_DIFF_OPTIONS_INIT;
    err = git_patch_from_blob_and_buffer(&patch, blob, NULL, NULL, 0, NULL, &opts);
    if (err < 0) { printf("Error\n"); exit(1); }

    // Print patch (good)
    git_buf buf = {NULL};
    err = git_patch_to_buf(&buf, patch);
    if (err < 0) { printf("Error\n"); exit(1); }
    printf("%s", buf.ptr);
    git_buf_free(&buf);

    // Free blob
    printf("\n\n");
    git_blob_free(blob);

    // Print patch (bad)
    err = git_patch_to_buf(&buf, patch);
    if (err < 0) { printf("Error\n"); exit(1); }
    printf("%s", buf.ptr);
    git_buf_free(&buf);
}

@brandonio21
Copy link
Contributor Author

@jdavid Thanks for posting that. Now that we know more about the issue, I'm going to rename all the corresponding PRs and issues.

@brandonio21 brandonio21 changed the title [Investigation] Repository.create_blob patch problem Patch does not have a reference to underlying blob(s) Dec 10, 2017
@brandonio21
Copy link
Contributor Author

While waiting for the issue to be resolved on the libgit2 level, the following Python-level patch can be applied: https://github.com/brandonio21/pygit2/commit/c0eaa698d648af33a72c743ba296dd1427bcb81f

@jdavid
Copy link
Member

jdavid commented Dec 13, 2017

That looks good! Could you make a PR please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants