Regression in 0.27.1: Commit.message no longer handles unspecified encodings correctly #839

@cjwatson

When upgrading an application built on top of pygit2 from 0.24.2 to 0.27.2, our test suite caught a regression in the handling of commit messages with unspecified encoding, which I tracked down to commit bbf4b79. You can see the problem with the very commit to git itself that's referenced in comments in to_unicode_n:

$ python
Python 2.7.6 (default, Nov 13 2018, 12:45:42)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygit2
>>> repo = pygit2.Repository('.')
>>> repo['c31820c2']
<_pygit2.Commit object at 0x7fa699661230>
>>> repo['c31820c2'].message
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf8 in position 126: invalid start byte

Would it perhaps make sense for Commit_message__get__ to call to_unicode(message, encoding, NULL) rather than to_unicode(message, encoding, "strict"), so that it continues to benefit from the fallback to "replace" even after this change?
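
To illustrate the behaviour being asked for, here is a minimal pure-Python sketch. It is not pygit2's C implementation. It assumes that when no encoding is recorded and no error handler is requested, the bytes are decoded as UTF-8 with "replace"; the helper name decode_message and the sample bytes are hypothetical.

```python
def decode_message(raw, encoding=None, errors=None):
    """Hypothetical helper (not part of pygit2) modelling the proposed fallback."""
    if encoding is None:
        # No encoding recorded in the commit: the bytes may not be UTF-8,
        # so decode leniently unless the caller explicitly asked otherwise.
        encoding = "utf-8"
        if errors is None:
            errors = "replace"
    elif errors is None:
        # An explicit encoding was recorded, so strict decoding is reasonable.
        errors = "strict"
    return raw.decode(encoding, errors)


# Made-up message bytes containing 0xf8, which is not a valid UTF-8 start byte.
raw = b"caf\xf8"

# Passing no error handler gives the lenient fallback.
lenient = decode_message(raw)

# Forcing "strict" with no recorded encoding reproduces the reported failure.
try:
    decode_message(raw, errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With the lenient call the undecodable byte becomes U+FFFD instead of raising, which matches the older behaviour described above.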
