Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault in Signature.__repr__ & __str__ when the signature's encoding is incorrect or unknown (#1205) #1210

Merged
merged 2 commits into from
May 6, 2023

Conversation

jorio
Copy link
Contributor

@jorio jorio commented Apr 1, 2023

Please see issue #1205 for details about this bug.

The proposed fix avoids undefined behavior by never passing NULL to PyUnicode_FromFormat.

`to_unicode(..., ..., NULL)` is equivalent to `to_unicode(..., ..., "strict")`. It returns NULL if the string cannot be decoded to the desired encoding.

PyUnicode_FromFormat will crash if we pass NULL pointers to be formatted as %R or %U.

Instead, let's use `to_unicode(..., ..., "replace")` so that we always get a non-NULL object to pass to PyUnicode_FromFormat.

This ties in with libgit2#1155.
@jdavid
Copy link
Member

jdavid commented Apr 7, 2023

If instead of calling str/repr we access the name, then we get an exception:

>>> signature.name
[...]
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 2: illegal multibyte sequence

A similar error is raised if trying to get the message:

>>> commit.message
[...]
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 22: illegal multibyte sequence

For coherence should not the same error be raised when calling str/repr?

@jdavid
Copy link
Member

jdavid commented Apr 7, 2023

For reference I wrote this test script:

from pathlib import Path
import pygit2


def test_commit(commit_id):
    commit = repo.get(commit_id)
    print(f'{commit_id} encoding={commit.message_encoding}')

    signature = commit.author

    try:
        print('mail', signature.email)
    except UnicodeDecodeError:
        print('raw ', signature.raw_mail)
        print('mail', signature.raw_mail.decode('utf-8'))

    try:
        print('name', signature.name)
    except UnicodeDecodeError:
        print('raw ', signature.raw_name)
        print('name', signature.raw_name.decode('utf-8'))

    try:
        print(commit.message)
    except UnicodeDecodeError:
        print(commit.raw_message)
        print(commit.raw_message.decode('utf-8'))

    print('STR ', str(signature))
    print('REPR', repr(signature))
    print()

if __name__ == '__main__':
    name = 'javaWeb-bookManagementSystem'
    if Path('javaWeb-bookManagementSystem').exists():
        repo = pygit2.Repository(name)
    else:
        print('Clone repo...')
        url = f'https://github.com/LJF2402901363/{name}.git'
        repo = pygit2.clone_repository(url, name)
        print('OK')

    test_commit('7dc18ea6a765f778f136efa87c28eadef583ad60')  # no encoding, defaults to utf-8, is utf-8 (OK)
    test_commit('3e9d6b6f06d5abc25dd2a5b1b0f9fae10b09c20d')  # claims GBK but it's UTF8
    test_commit('4d47b509cdb3237cc5ea6841cfd707d7e55f8522')  # claims GBK, msg is GBK, sigs are UTF-8

@jorio
Copy link
Contributor Author

jorio commented Apr 7, 2023

For coherence should not the same error be raised when calling str/repr?

Good point.

My thinking was that repr can be automatically invoked by debuggers and interpreters. In that context, a 'permissive' repr lets you know at a glance if there's anything salvageable in a signature (e.g. email, time) rather than erroring out completely.

Either way, I don't feel strongly about raising an error or not, as long as we stop segfaulting ;)

@jdavid
Copy link
Member

jdavid commented Apr 23, 2023

Back...

Ok, agree with you. There is only a small regression, with the test script above (that I've updated).

In master the output is:

$ python test_issue1205.py 
7dc18ea6a765f778f136efa87c28eadef583ad60 encoding=None
mail 58659173+LJF2402901363@users.noreply.github.com
name 陌意随影
Update pom.xml

修复了fastjson版本过低存在的漏洞问题。
STR  陌意随影 <58659173+LJF2402901363@users.noreply.github.com>
REPR pygit2.Signature('陌意随影', '58659173+LJF2402901363@users.noreply.github.com', 1641446748, 480, None)

3e9d6b6f06d5abc25dd2a5b1b0f9fae10b09c20d encoding=GBK
mail 2402901363@qq.com
raw  b'\xe4\xbf\xad\xe9\x94\x8b \xe6\xa2\x81'
name 俭锋 梁
b'\xe8\xa7\xa3\xe5\x86\xb3\xe4\xba\x86\xe5\x9c\xa8\xe6\xb7\xbb\xe5\x8a\xa0\xe8\xaf\xbb\xe8\x80\x85\xe5\x92\x8c\xe6\x9b\xb4\xe6\x96\xb0\xe7\x94\xa8\xe6\x88\xb7\xe6\x97\xb6\xe5\x8f\x91\xe7\x94\x9f\xe7\xa9\xba\xe6\x8c\x87\xe9\x92\x88\xe5\xbc\x82\xe5\xb8\xb8\xe7\x9a\x84\xe6\x83\x85\xe5\x86\xb5\n'
解决了在添加读者和更新用户时发生空指针异常的情况

Segmentation fault (core dumped)

With this PR:

$ python test_issue1205.py
7dc18ea6a765f778f136efa87c28eadef583ad60 encoding=None
mail 58659173+LJF2402901363@users.noreply.github.com
name 陌意随影
Update pom.xml

修复了fastjson版本过低存在的漏洞问题。
STR  陌意随影 <58659173+LJF2402901363@users.noreply.github.com>
REPR pygit2.Signature('陌意随影', '58659173+LJF2402901363@users.noreply.github.com', 1641446748, 480, 'None')

3e9d6b6f06d5abc25dd2a5b1b0f9fae10b09c20d encoding=GBK
mail 2402901363@qq.com
raw  b'\xe4\xbf\xad\xe9\x94\x8b \xe6\xa2\x81'
name 俭锋 梁
b'\xe8\xa7\xa3\xe5\x86\xb3\xe4\xba\x86\xe5\x9c\xa8\xe6\xb7\xbb\xe5\x8a\xa0\xe8\xaf\xbb\xe8\x80\x85\xe5\x92\x8c\xe6\x9b\xb4\xe6\x96\xb0\xe7\x94\xa8\xe6\x88\xb7\xe6\x97\xb6\xe5\x8f\x91\xe7\x94\x9f\xe7\xa9\xba\xe6\x8c\x87\xe9\x92\x88\xe5\xbc\x82\xe5\xb8\xb8\xe7\x9a\x84\xe6\x83\x85\xe5\x86\xb5\n'
解决了在添加读者和更新用户时发生空指针异常的情况

STR  淇�閿� 姊� <2402901363@qq.com>
REPR pygit2.Signature('淇�閿� 姊�', '2402901363@qq.com', 1608521800, 480, 'GBK')

4d47b509cdb3237cc5ea6841cfd707d7e55f8522 encoding=GBK
mail 2402901363@qq.com
raw  b'\xe4\xbf\xad\xe9\x94\x8b \xe6\xa2\x81'
name 俭锋 梁
更新了启动项目页面直接跳转到首页不用再自动输入对应的网址。

STR  淇�閿� 姊� <2402901363@qq.com>
REPR pygit2.Signature('淇�閿� 姊�', '2402901363@qq.com', 1603540603, 480, 'GBK')

The regression is in the representation for the first commit, the one that is correct.
The encoding appears as 'None' instead of None.

@jdavid jdavid merged commit 7768099 into libgit2:master May 6, 2023
@jdavid
Copy link
Member

jdavid commented May 6, 2023

Thanks @jorio for another contribution

@jorio
Copy link
Contributor Author

jorio commented May 7, 2023

Sorry @jdavid, I had been meaning to take a look at 'None' but I haven't been able to make time for it before you fixed it up. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants