Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] OdbBackend: Python subclassing #948

Merged
merged 9 commits into from
Dec 21, 2019
Merged

Conversation

ddevault
Copy link
Contributor

@ddevault ddevault commented Oct 16, 2019

The basic approach here is to have backends created from C (loose, pack) circumvent the OdbBackend_init function. However, backends created from Python will not. We then use this backend to pull out the functions implemented in Python and rig them up through shim functions to the libgit2 git_odb_backend.

TODO:

  • read
  • read_prefix
  • read_header
  • write
  • exists
  • exists_prefix
  • refresh
  • foreach
  • Unit tests

Separate patch?

  • writestream
  • readstream
  • freshen
  • writepack
  • Custom refdbs
  • Create Repository with custom Odb and Refdb

@ddevault
Copy link
Contributor Author

All of the basics are here, would like some feedback on the approach. If this looks good, the remainder of the work is basically to repeat 52d7669 for each member of the git_odb_backend struct from libgit2.

I have a small script I've been using to "test" this:

from pygit2 import Odb, OdbBackend, OdbBackendPack

pack = OdbBackendPack(".git/objects")

class FooBackend(OdbBackend):
    def __init__(self):
        super().__init__(self)

    def read(self, oid):
        print("read called")
        return pack.read(oid)

    def read_prefix(self, oid):
        print("read_prefix called")
        # TODO: do this properly
        typ, data = pack.read(oid)
        return oid, typ, data

    def read_header(self, oid):
        print("read_header called")
        # TODO: do this properly
        typ, data = pack.read(oid)
        return typ, len(data)

fbe = FooBackend()
odb = Odb()
odb.add_backend(fbe, 1)
print(odb.read("209afad181e3f5f7b9f00088f2e9b6c63329a878"))

(This object ID can be found in pygit2). Proper unit tests will come prior to merge.

src/odb.c Show resolved Hide resolved
src/odb_backend.c Outdated Show resolved Hide resolved
src/odb_backend.c Outdated Show resolved Hide resolved
src/odb_backend.c Outdated Show resolved Hide resolved
@jdavid
Copy link
Member

jdavid commented Oct 20, 2019

It's a good start, but there're memory issues.

Maybe something like this?

struct pygit2_odb_backend
{
    git_odb_backend backend;
    void *payload;
};

typedef struct {
    PyObject_HEAD
    pygit2_odb_backend *odb_backend;
    PyObject *read_callable,
             *read_prefix_callable;
} OdbBackendUD;

(UD for User Defined)

So the C callbacks need a reference to the Python functions. Maybe libgit2's git_odb_backend should include a payload pointer, that's what is expressed above. Here payload would point to the OdbBackendUD instance, like in your code with py_odb_backend->self.

Note that we must Py_INCREF read_callable etc. and decref when deallocating. And free pygit2_odb_backend.

@ddevault
Copy link
Contributor Author

I'm not sure I understand the advantage (or difference) to the approach you've described here.

Would the current approach work if I made sure to incref the callables?

@jdavid
Copy link
Member

jdavid commented Oct 26, 2019

Most important is to handle memory correctly.

Otherwise the difference is small, it's about readability and maintenance:

  • Through the code we keep references to Python objects always in the type instance, so it
    would be consistent to keep the same programming style.

  • From the name py_odb_backend I'd say it's a Python type (which it's not), since we use the py_ prefix through the code (in variable names) to tell apart Python objects.

  • Then the self argument tells me again that it's a Python type, and that self points to himself?

So I'm not asking to change the approach, only about improving readability. A comment describing how pygit2_odb_backend works would help as well, basically you're passing a pygit2_odb_backend pointer to libgit2 functions that expect a git_odb_backend pointer.


Finally, the way I've written void *payload is stating that this maybe should be part of
git_odb_backend. That's a question for libgit2 developers, @carlosmn ? @pks-t ? @ethomson ?

We need a pointer back to Python to find our callbacks written in Python, this should be
useful not only for pygit2 but for other bindings as well.

Or maybe I've misunderstood something, or there's another way. Do other bindings support
writing custom backends in their languages?

@ethomson
Copy link
Member

👋 The custom backend types don’t add payloads since they’re extendable by the implementer. We’d encourage you to do what it looks like you recommended - with the git_odb_backend struct at the beginning of your own data struct. Then you can pass that to libgit2, who will read only into our backend data struct, but we’ll pass it back to you and you can refer to your bits at the end.

I’m not sure which bindings implement custom backends but the .NET bindings do. Others may? Not something that I’ve surveyed.

@ddevault
Copy link
Contributor Author

Regarding the use of an opaque pointer (e.g. void *payload), it seems libgit2 isn't designed for that. Like @ethomson stated, subtyping a struct is a common (but not universally understood) pattern in C which has a lot of advantages compared to opaque pointers. The approach I've used in this patch is based on that pattern.

Latest patchset includes the following changes:

  1. Rename py_odb_backend to pygit2_odb_backend to disambiguate between it and Python objects
  2. Add a comment in odb_backend.c explaining pygit2_odb_backend
  3. Rename pygit2_odb_backend.self to OdbBackend, which is more accurate.

I think that this settles the design constraints, if we like it then I think the approach can be generalized to the rest of the git_odb_backend members, tests can be written, and this patch will be ready for merge.

src/odb_backend.c Outdated Show resolved Hide resolved
@jdavid
Copy link
Member

jdavid commented Oct 27, 2019

Okay thanks @ethomson and @ddevault

Only made a tiny comment, you can go ahead @ddevault

@ddevault
Copy link
Contributor Author

Cheers! I should have this finished up tomorrow.

@ddevault
Copy link
Contributor Author

Actually, something else came up of a higher priority, I'm going to return to this in a few weeks at the latest.

@ddevault
Copy link
Contributor Author

ddevault commented Dec 1, 2019

Apologies for the wait - I'll be picking this up next week.

The basic approach here is to have backends created from C (loose, pack)
circumvent the OdbBackend_init function. However, backends created from
Python will not. We then use this backend to pull out the functions
implemented in Python and rig them up through shim functions to the
libgit2 git_odb_backend.
@ddevault
Copy link
Contributor Author

ddevault commented Dec 6, 2019

Okay, this should be ready for review. I've trimmed down the scope to just supporting read-only backends, added tests, and cleaned things up.

@ddevault
Copy link
Contributor Author

ddevault commented Dec 6, 2019

Next set of patches will add similar riggings for the refdb, and expanding both for read/write access separately.

@jdavid
Copy link
Member

jdavid commented Dec 8, 2019

Looks good, but tests are failing in AppVeyor.

@ddevault
Copy link
Contributor Author

ddevault commented Dec 8, 2019

Looks like some undecipherable Windows nonsense. I don't use or have any interest in supporting Windows, can someone who does take a look?

@jdavid
Copy link
Member

jdavid commented Dec 15, 2019

The error happens in the call to free:

    /* XXX: This assumes the default libgit2 allocator is in use and will
     * probably segfault and/or destroy the universe otherwise */
    free(data);

See https://ci.appveyor.com/project/jdavid/pygit2/builds/29550154

@ethomson
Copy link
Member

ethomson commented Dec 15, 2019 via email

@ddevault
Copy link
Contributor Author

Updated to use the new libgit2 functions, will rebase & revisit when the next libgit2 release lands.

@jdavid jdavid merged commit 053637c into libgit2:master Dec 21, 2019
@ddevault ddevault deleted the odb_backend_py branch December 21, 2019 13:49
@ddevault
Copy link
Contributor Author

Thanks! Happy holidays :)

@jdavid
Copy link
Member

jdavid commented Dec 22, 2019

you too!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Mar 1, 2020
1.1.0 (2020-03-01)
-------------------------

- Upgrade to libgit2 0.99
  `#959 <https://github.com/libgit2/pygit2/pull/959>`_

- Continued work on custom odb backends
  `#948 <https://github.com/libgit2/pygit2/pull/948>`_

- New ``Diff.patchid`` getter
  `#960 <https://github.com/libgit2/pygit2/pull/960>`_
  `#877 <https://github.com/libgit2/pygit2/issues/877>`_

- New ``settings.disable_pack_keep_file_checks(...)``
  `#908 <https://github.com/libgit2/pygit2/pull/908>`_

- New ``GIT_DIFF_`` and ``GIT_DELTA_`` constants
  `#738 <https://github.com/libgit2/pygit2/issues/738>`_

- Fix crash in iteration of config entries
  `#970 <https://github.com/libgit2/pygit2/issues/970>`_

- Travis: fix printing features when building Linux wheels
  `#977 <https://github.com/libgit2/pygit2/pull/977>`_

- Move ``_pygit2`` to ``pygit2._pygit2``
  `#978 <https://github.com/libgit2/pygit2/pull/978>`_

Requirements changes:

- Now libgit2 0.99 is required
- New requirement: cached-property

Breaking changes:

- In the rare case you're directly importing the low level ``_pygit2``, the
  import has changed::

    # Before
    import _pygit2

    # Now
    from pygit2 import _pygit2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants