Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix repository crash if path passed is not a str #593

Merged
merged 2 commits into from
Feb 28, 2016
Merged

Fix repository crash if path passed is not a str #593

merged 2 commits into from
Feb 28, 2016

Conversation

thomwiggers
Copy link
Contributor

This perhaps isnt' the most elegant fix, but otherwise all callers need to be patched.

Closes #588

@thomwiggers
Copy link
Contributor Author

Ok so that build was a catastrophe, will fix it in a bit.

@jdavid
Copy link
Member

jdavid commented Feb 27, 2016

Hello, could you please provide a unit test? If possible the unit test should be the first commit, and the fix would be the second commit. Extra points for making a new branch on top of current master.

Note too that pygit2 supports both Python 2.7 and 3, and that str is not the same thing in Python 2 and 3.

Thanks

@thomwiggers
Copy link
Contributor Author

Note too that pygit2 supports both Python 2.7 and 3, and that str is not the same thing in Python 2 and 3.

Yeah, that difference is the whole point. If pygit2 is passed a path represented as bytes, it throws a TypeError (#588).

I'm actually not sure if this is pygit2's problem or the problem of the applications using pygit2's API. I have the feeling that lots of other parts of pygit2 will also fail if they are passed paths as bytes-strings. However, if you reach this decision, it should probably be documented clearly.

(I mainly made this PR and issue #588 because tracking down where the bytes came from in powerline was more difficult and I couldn't find a clear position in the pygit2 docs.)

@thomwiggers
Copy link
Contributor Author

I have added a test and rebased on master.

def __init__(self, *args, **kwargs):
super(Repository, self).__init__(*args, **kwargs)
def __init__(self, path, *args, **kwargs):
if not isinstance(path, str):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be something like six.string_types here? Because this will surely not work/break on py3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

six wasn't a dependency of this project so I was hesitant to include it.

It will actually not break though. On py2, you'll pass a str or unicode to this function. unicode has .decode and will decode to str. py2 str is accepted by Repository just fine. On py3 you'll pass a str or bytes to this function. bytes will .decode to str. py3 str is also accepted by Repository.

If you pass anything else then it'll break with a vague error about not having .decode, but this is the best I could come up with without six.

@jdavid
Copy link
Member

jdavid commented Feb 28, 2016

  • With Python 2 paths and the like are supposed to be str (bytes), but the API should accept text (unicode) too.
  • With Python 3 paths and the like are supposed to be str (unicode), but should accept bytes too
    (decoded as utf-8).

Basically pygit2 tries to be a good Python 2 citizen and good Python 3 citizen, which develops a kind of
schizophrenia.

At the C level, the relevant line is at src/repository.c:100:

if (!PyArg_ParseTuple(args, "s", &path))

In Python 2 this is interpreted as bytes or unicode. In Python 3 this only accepts unicode.

@jdavid
Copy link
Member

jdavid commented Feb 28, 2016

I have applied the failing test. Now about the fix.

@jdavid
Copy link
Member

jdavid commented Feb 28, 2016

+1 to @pypingou idea of using six. As the Python side of pygit2 grows six will be useful for this and other
situations.

Then yes the test would be:

if not isinstance(path, six.string_types):

I leave you, @pypingou or @thomwiggers some time to add six as dependency 😉 .. otherwise I will do it.

It would be interesting too, but not required, to add tests for paths with non ascii chars.

@thomwiggers
Copy link
Contributor Author

FYI: I've looked into fixing it in the C API instead of in the python code, but that does not seem possible because the needed converter is only available in py3.2+: https://docs.python.org/3.5/c-api/unicode.html#c.PyUnicode_FSDecoder

@thomwiggers
Copy link
Contributor Author

I've rebased on master, slightly modified my tests and am now using six.string_types.

Add unit test for bytes repository paths
Add a unicode path test for Repositories
Tries to decode any non-string objects (such as bytes)

Introduces `six` as a dependency

Closes #588
@jdavid jdavid merged commit 735510f into libgit2:master Feb 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repository crashes on bytes paths on Python 3
3 participants