Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved support for navigating trees: #903

Merged
merged 21 commits into from
Jun 26, 2019

Conversation

rcoup
Copy link
Contributor

@rcoup rcoup commented Apr 26, 2019

Navigating tree objects is pretty crude via Pygit2 at the moment, requiring a lot of boilerplate lookups.

This change incorporates the same / operator approach as Pathlib and GitPython:

  • my_tree / 'path' resolves to a TreeEntry (same as my_tree['path'] currently)
  • If a TreeEntry is of type tree, then:
    • my_tree_entry / 'path' resolves to a child TreeEntry
    • my_tree_entry['path'] and my_tree_entry[0] similarly
    • 'path' in my_tree_entry returns a boolean.
  • this enables my_tree / 'path' / 'path' navigation

Ideally I'd suggest that maybe Tree should probably be a subclass of TreeEntry so it kinda magically works how the user expects, but that would require quite a few changes.

@jdavid
Copy link
Member

jdavid commented Apr 27, 2019

Looks good! Could you fix the tests in Travis? Don't worry about AppVeyor, they're broken for a different reason.
And update the documentation please. Thanks!

@rcoup
Copy link
Contributor Author

rcoup commented Apr 30, 2019

Been using this the last couple of days... it's much better to navigate the tree, but still a bit awkward to actually get to blobs.

Before:

meta_tree = repo[commit.tree['meta'].id]
features_tree = repo[meta_tree['features'].id]
info_blob = repo[features_tree['info.json'].id]
info = json.loads(info_blob.data)

Now:

feature_info = commit.tree / 'meta' / 'features' / 'info.json'
info = json.loads(repo[feature_info.id].data)

With latest commit incorporating .blob and .tree properties.

feature_info = commit.tree / 'meta' / 'features' / 'info.json'
info = json.loads(feature_info.blob.data)

Thoughts?

Submodules end up with a TreeEntry.type=commit, but I can't find a way to lookup the submodule from the oid, only from the absolute path (which we don't have in the TreeEntry, only the relative name) — so I've left that alone.

@rcoup
Copy link
Contributor Author

rcoup commented Apr 30, 2019

@jdavid can you give the ref-counting/free'ing a review?

  1. I'm Incrementing the repo refcount when it's associated with a TreeEntry
  2. and clearing it when the TreeEntry is dealloc'd.
  3. AFAICT using the various wrap_ methods means the libgit2-alloc'd items will get cleaned up
  4. I'm freeing the temporary git_tree subtrees I use in getitem/truediv

src/tree.c Outdated Show resolved Hide resolved
src/tree.c Outdated Show resolved Hide resolved
@jdavid
Copy link
Member

jdavid commented May 4, 2019

Good work! The ref-counting/free'ing looks correct. Just made a couple of tiny comments.

About the user interface:

  • Would it be simpler for the user to have treeentry.obj instead of treentry.blob and treeentry.tree?
  • Why not to support treeentry / 0? That would make _getitem and _truediv to behave the same.

I think these changes would be nice for the user, and they would simplify a bit the code as well.

@dsully
Copy link
Contributor

dsully commented May 7, 2019

These changes look fantastic!

@rcoup
Copy link
Contributor Author

rcoup commented May 8, 2019

I considered both of these, I'll add my thoughts below. Happy to be challenged.

Would it be simpler for the user to have treeentry.obj instead of treentry.blob and treeentry.tree?

I like the elegance, but I can't think of an actual use-case for it? Seems like either I'd expect a blob or another tree for any particular case? "Explicit is better than implicit" and all.

Commits are another type possibility wrt submodules, but I've left that for someone else when they need it.

Why not to support treeentry / 0? That would make _getitem and _truediv to behave the same.

If I'm iterating through everything in a directory then I'd always use an iterator or for/getitem. I know it makes it simpler for the implementation, but can't see how it helps the user?

@jdavid
Copy link
Member

jdavid commented May 10, 2019

The fact that these two changes make the code simpler is a big advantage. It's difficult to stress it enough, 150+ developers have contributed changes to pygit2, and today I maintain those changes/code (and when I leave someone else will do).

On the other side, the fact that we cannot imagine a use case, doesn't mean it doesn't exist, and more importantly it doesn't hurt, it's a feature superset anyway.

@rcoup
Copy link
Contributor Author

rcoup commented Jun 6, 2019

@jdavid sorry, been distracted with other work. I've made both the changes you suggested. 👍

- `my_tree / 'path'` resolves to a TreeEntry (same as `my_tree['path']`)
- If `my_tree_entry` is of type `tree`: `my_tree_entry / 'path'`, `my_tree_entry['path']`, and `my_tree_entry[0]` all resolve to another TreeEntry (same as `repo[my_tree_entry.id]['path']`)

This enables `my_tree / 'path' / 'path'` navigation ala Pathlib & GitPython.
Saves doing another `repo[tree_entry.id]` lookup

Tree navigation: `blob = (tree / 'path' / 'file').blob`
- drop .blob & .tree in favour of .obj
- use getitem implementation for truediv: allows `T / 'a' / 0` to correspond to `T['a'][0]`
@rcoup
Copy link
Contributor Author

rcoup commented Jun 10, 2019

Rebased on latest master

@rcoup
Copy link
Contributor Author

rcoup commented Jun 10, 2019

Don't worry about AppVeyor, they're broken for a different reason.

hmm, not so sure about that?

...
C:\Users\appveyor\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\projects\pygit2\venv\include -IC:\Python27\include -IC:\Python27\PC /Tcsrc\tree.c /Fobuild\temp.win32-2.7\Release\src\tree.obj
tree.c
src\tree.c(199) : error C2143: syntax error : missing ';' before 'type'
src\tree.c(200) : error C2065: 'err' : undeclared identifier
src\tree.c(201) : error C2065: 'err' : undeclared identifier
src\tree.c(208) : error C2143: syntax error : missing '{' before '*'
src\tree.c(220) : error C2143: syntax error : missing ';' before 'type'
src\tree.c(221) : error C2065: 'err' : undeclared identifier
src\tree.c(222) : error C2065: 'err' : undeclared identifier
src\tree.c(226) : warning C4133: 'return' : incompatible types - from 'PyObject *' to 'int *'
src\tree.c(240) : warning C4133: 'return' : incompatible types - from 'int *' to 'PyObject *'
src\tree.c(385) : warning C4028: formal parameter 1 different from declaration
src\tree.c(385) : warning C4133: 'initializing' : incompatible types - from 'TreeEntry *(__cdecl *)(TreeEntry *,PyObject *)' to 'binaryfunc'
src\tree.c(539) : warning C4090: 'function' : different 'const' qualifiers
src\tree.c(836) : warning C4028: formal parameter 1 different from declaration
src\tree.c(836) : warning C4133: 'initializing' : incompatible types - from 'TreeEntry *(__cdecl *)(Tree *,PyObject *)' to 'binaryfunc'
...

@rcoup
Copy link
Contributor Author

rcoup commented Jun 10, 2019

@jdavid at least one of the Windows build configs passed now 👍 Current appveyor failure looks intermittent, can you trigger a rebuild?

Config.__init__() does not always set self._config, which can cause
trouble during __del__(). This avoids seemingly spurious and not
really helpful exceptions.
Fixes libgit2#916
@jdavid
Copy link
Member

jdavid commented Jun 13, 2019

Thanks!

Yes I fixed AppVeyor in master. I've re-run the incomplete jobs several times, but there're still failures.

@rcoup
Copy link
Contributor Author

rcoup commented Jun 13, 2019

@jdavid aha, there's an actual compile error there now. Will figure it out somehow :)

- `my_tree / 'path'` resolves to a TreeEntry (same as `my_tree['path']`)
- If `my_tree_entry` is of type `tree`: `my_tree_entry / 'path'`, `my_tree_entry['path']`, and `my_tree_entry[0]` all resolve to another TreeEntry (same as `repo[my_tree_entry.id]['path']`)

This enables `my_tree / 'path' / 'path'` navigation ala Pathlib & GitPython.
Saves doing another `repo[tree_entry.id]` lookup

Tree navigation: `blob = (tree / 'path' / 'file').blob`
rcoup and others added 6 commits June 14, 2019 15:56
- drop .blob & .tree in favour of .obj
- use getitem implementation for truediv: allows `T / 'a' / 0` to correspond to `T['a'][0]`
@jdavid
Copy link
Member

jdavid commented Jun 16, 2019

fixed in commit f0f8cc2

@rcoup
Copy link
Contributor Author

rcoup commented Jun 25, 2019

@jdavid pulled your changes into here. ✅

@jdavid jdavid merged commit 0bbd717 into libgit2:master Jun 26, 2019
@jdavid
Copy link
Member

jdavid commented Jun 26, 2019

Merged thanks!

I still want to give a 2nd look. Specifically this adds a reference to the repository from TreeEntry. I wonder whether it would be possible to implement the same feature without that reference, and in a way that is as friendly to the user. But have not yet stopped to think about it, I'll open an issue if I've some idea.

@rcoup
Copy link
Contributor Author

rcoup commented Jun 26, 2019

Thanks ❤️

this adds a reference to the repository from TreeEntry. I wonder whether it would be possible to implement the same feature without that reference

Yeah, needs the repo reference to be able to look up the associated objects, but is a small wart implementation-wise.

One alternative is to keep a reference to the parent/root tree, store one repository reference there, and access everything that way — but then TreeEntry's would need a parent reference instead: I decided the outcome wasn't much different, though it's arguably more elegant (I guess that would allow support for mytreeentry / ".." / "foo" too).

Right at the start I mentioned that I think ideally Tree should be a subclass of TreeEntry so no matter where you start everything is the same, maybe that's something to change as well. Though right now at least they work effectively the same.

@rcoup rcoup deleted the better-tree-nav branch June 26, 2019 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants