Fix segfault in Signature.__repr__ & __str__ when the signature's encoding is incorrect or unknown (#1205) #1210
`to_unicode(..., ..., NULL)` is equivalent to `to_unicode(..., ..., "strict")`: it returns NULL if the string cannot be decoded with the requested encoding. `PyUnicode_FromFormat` crashes if NULL pointers are passed for the `%R` or `%U` format specifiers. Instead, let's use `to_unicode(..., ..., "replace")` so that we always get a non-NULL object to pass to `PyUnicode_FromFormat`. This ties in with libgit2#1155.
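As a Python-level analogy of the two error handlers, here is a minimal sketch using `bytes.decode`. It is not the C helper `to_unicode` itself, and the sample name is made up for illustration:

```python
# Hypothetical sample: a name stored as Latin-1 bytes (b"Jos\xe9").
raw_name = "José".encode("latin-1")

# "strict" (the default) fails when the bytes are not valid in the
# assumed encoding, comparable to the C helper returning NULL.
try:
    raw_name.decode("utf-8", "strict")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "replace" does not fail on bad bytes: each undecodable byte
# becomes U+FFFD, so there is always a str to format.
replaced = raw_name.decode("utf-8", "replace")

print(strict_ok)
print(replaced == "Jos\ufffd")
```

The point of the sketch is only that the "replace" handler trades fidelity for a guaranteed result, which is what makes it safe to feed into a formatting call.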
If, instead of calling str/repr, we access the name, we get an exception:
A similar error is raised when trying to get the message:
For consistency, shouldn't the same error be raised when calling str/repr?
For reference I wrote this test script:

    from pathlib import Path

    import pygit2


    def test_commit(commit_id):
        # `repo` is the module-level repository opened in the main block.
        commit = repo.get(commit_id)
        print(f'{commit_id} encoding={commit.message_encoding}')
        signature = commit.author

        # Fall back to the raw bytes when strict decoding fails.
        try:
            print('mail', signature.email)
        except UnicodeDecodeError:
            print('raw ', signature.raw_email)
            print('mail', signature.raw_email.decode('utf-8'))

        try:
            print('name', signature.name)
        except UnicodeDecodeError:
            print('raw ', signature.raw_name)
            print('name', signature.raw_name.decode('utf-8'))

        try:
            print(commit.message)
        except UnicodeDecodeError:
            print(commit.raw_message)
            print(commit.raw_message.decode('utf-8'))

        print('STR ', str(signature))
        print('REPR', repr(signature))
        print()


    if __name__ == '__main__':
        name = 'javaWeb-bookManagementSystem'
        if Path(name).exists():
            repo = pygit2.Repository(name)
        else:
            print('Clone repo...')
            url = f'https://github.com/LJF2402901363/{name}.git'
            repo = pygit2.clone_repository(url, name)
            print('OK')

        test_commit('7dc18ea6a765f778f136efa87c28eadef583ad60')  # no encoding, defaults to utf-8, is utf-8 (OK)
        test_commit('3e9d6b6f06d5abc25dd2a5b1b0f9fae10b09c20d')  # claims GBK but it's UTF8
        test_commit('4d47b509cdb3237cc5ea6841cfd707d7e55f8522')  # claims GBK, msg is GBK, sigs are UTF-8
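The try/except pattern in the script could be folded into one helper. The following is a rough sketch only: `decode_lenient` is a hypothetical name, not part of pygit2, and the fallback order (declared encoding, then UTF-8, then UTF-8 with replacement) is an assumption chosen for illustration:

```python
from typing import Optional


def decode_lenient(raw: bytes, declared: Optional[str]) -> str:
    """Decode raw bytes, trying the declared encoding first.

    Hypothetical helper: falls back to UTF-8, then to UTF-8 with
    replacement characters, so it always returns a str.
    """
    candidates = [declared, "utf-8"] if declared else ["utf-8"]
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong bytes for this codec, or an unknown codec name:
            # move on to the next candidate.
            continue
    return raw.decode("utf-8", "replace")


print(decode_lenient("José".encode("utf-8"), None))
print(decode_lenient("José".encode("utf-8"), "no-such-codec"))
```

One limitation worth keeping in mind: a wrong declared encoding does not always fail. UTF-8 bytes can decode "successfully" as GBK into unrelated characters, and a helper like this cannot detect that case.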
Good point. My thinking was that repr can be automatically invoked by debuggers and interpreters. In that context, a 'permissive' repr lets you know at a glance if there's anything salvageable in a signature (e.g. email, time) rather than erroring out completely. Either way, I don't feel strongly about raising an error or not, as long as we stop segfaulting ;)
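To make the trade-off concrete, here is a small pure-Python sketch of the two behaviours side by side. `SignatureSketch` is a hypothetical stand-in, not pygit2's `Signature`, and its output format is invented for the example:

```python
class SignatureSketch:
    """Hypothetical stand-in for a signature; not pygit2's Signature."""

    def __init__(self, raw_name: bytes, raw_email: bytes, encoding: str = "utf-8"):
        self.raw_name = raw_name
        self.raw_email = raw_email
        self.encoding = encoding

    @property
    def name(self) -> str:
        # Strict decoding: raises UnicodeDecodeError on bad bytes.
        return self.raw_name.decode(self.encoding)

    @property
    def email(self) -> str:
        return self.raw_email.decode(self.encoding)

    def __str__(self) -> str:
        # Permissive decoding: bad bytes become U+FFFD instead of raising
        # (an unknown codec name would still raise LookupError here).
        name = self.raw_name.decode(self.encoding, "replace")
        email = self.raw_email.decode(self.encoding, "replace")
        return f"{name} <{email}>"

    def __repr__(self) -> str:
        return f"SignatureSketch({str(self)!r})"


# The name is Latin-1 bytes, which are not valid UTF-8.
sig = SignatureSketch(b"Jos\xe9", b"jose@example.com")
print(repr(sig))
```

In this sketch, `sig.name` still raises `UnicodeDecodeError`, while `repr(sig)` keeps the readable email visible, which is the "salvageable at a glance" behaviour described above.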
Back... OK, I agree with you. There is only one small regression, seen with the test script above (which I've updated). On master the output is:
With this PR:
The regression is in the representation of the first commit, the one that is correct.
Thanks @jorio for another contribution
Sorry @jdavid, I had been meaning to take a look at
Please see issue #1205 for details about this bug.
The proposed fix avoids undefined behavior by never passing NULL to PyUnicode_FromFormat.