MD5SumTests.test_checksum_fodder fails on Windows #89216

Open
sobolevn opened this issue Aug 30, 2021 · 8 comments
Labels: 3.9 (only security fixes), 3.10 (only security fixes), 3.11 (only security fixes), tests (Tests in the Lib/test dir), topic-unicode, type-bug (An unexpected behavior, bug, or error)

Comments

@sobolevn
Member

BPO 45053
Nosy @vstinner, @ezio-melotti, @serhiy-storchaka, @sobolevn, @akulakov

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-08-30.18:22:59.032>
labels = ['type-bug', '3.9', '3.10', '3.11', 'tests', 'expert-unicode']
title = 'MD5SumTests.test_checksum_fodder fails on Windows'
updated_at = <Date 2021-11-10.19:51:57.602>
user = 'https://github.com/sobolevn'

bugs.python.org fields:

activity = <Date 2021-11-10.19:51:57.602>
actor = 'andrei.avk'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Tests', 'Unicode']
creation = <Date 2021-08-30.18:22:59.032>
creator = 'sobolevn'
dependencies = []
files = []
hgrepos = []
issue_num = 45053
keywords = []
message_count = 8.0
messages = ['400648', '400649', '400656', '400785', '400812', '401016', '401038', '406129']
nosy_count = 5.0
nosy_names = ['vstinner', 'ezio.melotti', 'serhiy.storchaka', 'sobolevn', 'andrei.avk']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue45053'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

@sobolevn
Member Author

While working on #28060 we've noticed that test.test_tools.test_md5sum.MD5SumTests.test_checksum_fodder fails on Windows:

======================================================================
FAIL: test_checksum_fodder (test.test_tools.test_md5sum.MD5SumTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\lib\test\test_tools\test_md5sum.py", line 41, in test_checksum_fodder
    self.assertIn(part.encode(), out)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: b'@test_1772_tmp\xc3\xa6' not found in b'd38dae2eb1ab346a292ef6850f9e1a0d @test_1772_tmp\xe6\\md5sum.fodder\r\n'

For now it is ignored.

Related issue: https://bugs.python.org/issue45042

@sobolevn sobolevn added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Aug 30, 2021
@sobolevn
Member Author

I would love to work on this issue :)

@serhiy-storchaka
Member

The test is failing because TESTFN now contains non-ASCII characters.

The path is written to stdout using the default stdout encoding on Windows (such as cp1252), but the test searches for the path encoded as UTF-8. This test should also fail on other platforms with a non-UTF-8 locale.
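The mismatch can be reproduced without running the script at all. The following is a minimal sketch that reuses the path fragment and checksum from the assertion message above; cp1252 is assumed as the stdout encoding only because it is the example named in this comment, and the actual code page on the CI machine may differ.

```python
# Path fragment and checksum copied from the assertion message in this issue.
part = "@test_1772_tmp\u00e6"
checksum = "d38dae2eb1ab346a292ef6850f9e1a0d"

# What the test searches for: str.encode() defaults to UTF-8.
expected = part.encode()

# What the script emits if stdout uses cp1252 (an assumption, see above).
written = (checksum + " " + part + "\\md5sum.fodder\r\n").encode("cp1252")

print(expected)                           # b'@test_1772_tmp\xc3\xa6'
print(expected in written)                # False
print(part.encode("cp1252") in written)   # True
```

The same character is two bytes in UTF-8 and one byte in cp1252, so a byte-level substring search can only succeed if both sides use the same encoding.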

The simplest way to "fix" the test is using TESTFN_ASCII instead of TESTFN.

But there is also an issue in the script itself. It fails or produces mojibake when the filesystem encoding and the stdout encoding do not match. There are similar issues in other scripts that output file names.

@vstinner
Member

> But there is also an issue in the script itself. It fails or produces a mojibake when the filesystem encoding and the stdout encoding do not match.

I don't know Tools/scripts/md5sum.py. Can you show an example which currently fails?

@serhiy-storchaka
Member

$ touch тест
$ ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e тест
$ LC_ALL=uk_UA.koi8u PYTHONIOENCODING=koi8-u ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e тест
$ LC_ALL=uk_UA.koi8u PYTHONIOENCODING=utf-8 ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e я┌п╣я│я┌
$ PYTHONIOENCODING=koi8-u ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e ����
$ PYTHONIOENCODING=latin-1 ./python Tools/scripts/md5sum.py тест
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 93, in <module>
    sys.exit(main(sys.argv[1:], sys.stdout))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 90, in main
    return sum(args, out)
           ^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 39, in sum
    sts = printsum(f, out) or sts
          ^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 53, in printsum
    sts = printsumfp(fp, filename, out)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 69, in printsumfp
    out.write('%s %s\n' % (m.hexdigest(), filename))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 33-36: ordinal not in range(256)
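The three failure modes in this session can be imitated in pure Python by encoding and decoding the file name by hand. This is only a sketch of the byte-level effect: it assumes a UTF-8 terminal, and it does not reproduce how the interpreter actually decodes `sys.argv` or configures `sys.stdout`.

```python
name = "тест"
checksum = "d41d8cd98f00b204e9800998ecf8427e"

# UTF-8 bytes from the terminal read as KOI8-U, then printed as UTF-8.
mojibake = name.encode("utf-8").decode("koi8-u")
print(mojibake)  # я┌п╣я│я┌

# KOI8-U bytes written to a terminal that expects UTF-8: every byte is invalid.
replaced = name.encode("koi8-u").decode("utf-8", errors="replace")
print(len(replaced))  # 4

# Latin-1 cannot represent Cyrillic at all, so encoding the output line fails.
line = "%s %s\n" % (checksum, name)
try:
    line.encode("latin-1")
    error_span = None
except UnicodeEncodeError as exc:
    # exc.end is exclusive; the error message reports an inclusive range.
    error_span = (exc.start, exc.end - 1)
print(error_span)  # (33, 36)
```

The reported positions 33-36 are the four Cyrillic characters that follow the 32-character digest and the separating space.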

@sobolevn
Member Author

sobolevn commented Sep 3, 2021

Yes, it was an encoding problem :)

Changing this line:

out.write('%s %s\n' % (m.hexdigest(), filename))

to the following solved it:

out.write('%s %s\n' % (m.hexdigest(), filename.encode(
        sys.getfilesystemencoding(),
    ).decode(sys.stdout.encoding)))

> The simplest way to "fix" the test is using TESTFN_ASCII instead of TESTFN.

I haven't changed this, because right now it should work for non-ASCII symbols as well. I can even add an explicit ASCII test if needed.

Shouldn't #28060 be merged before I submit a new PR, so we can be sure that the test now works? In its current state the failure will just be ignored.

@serhiy-storchaka
Member

It will not work in all cases, for example if the stdio encoding is UTF-8 and the filesystem encoding is Latin1, or if the stdio encoding is CP1251 and the filesystem encoding is UTF-8. I am also not sure that it gives us the result we want even when it doesn't fail.
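A small sketch of the two cases named here, using a hypothetical `transcode` helper that performs the same encode/decode round trip as the proposed change. The sample file names are made up for illustration.

```python
def transcode(filename: str, fs_encoding: str, out_encoding: str) -> str:
    """Hypothetical helper: the encode/decode round trip from the proposed change."""
    return filename.encode(fs_encoding).decode(out_encoding)


# Matching encodings: the round trip is a no-op.
unchanged = transcode("тест", "utf-8", "utf-8")

# Filesystem encoding Latin-1, stdio encoding UTF-8: the lone byte 0xE9 is
# not valid UTF-8, so the decode step raises.
try:
    transcode("caf\u00e9", "latin-1", "utf-8")
    latin1_case_failed = False
except UnicodeError:
    latin1_case_failed = True

# Filesystem encoding UTF-8, stdio encoding CP1251: no exception, but the
# result is a different, longer string rather than the original name.
garbled = transcode("тест", "utf-8", "cp1251")

print(unchanged == "тест")   # True
print(latin1_case_failed)    # True
print(garbled != "тест")     # True
```

The second case is the quieter problem: nothing fails, yet the text written to stdout no longer names the file.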

It is a general and complex issue, and every program that writes file names to stdout is affected.

For now I suggest just using TESTFN_ASCII instead of TESTFN. We will find a better solution in the future. I hesitate to merge PR 28060 because the test can also fail on some non-Windows buildbots with uncommon locale settings.
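To illustrate why an ASCII-only name sidesteps the problem, the sketch below encodes an ASCII name and a non-ASCII name under several encodings. The names and the list of encodings are assumptions chosen for illustration; they are not the values the test suite uses.

```python
ascii_name = "@test_1772_tmp"            # hypothetical ASCII-only name
non_ascii_name = ascii_name + "\u00e6"   # same name with a non-ASCII suffix

encodings = ["utf-8", "cp1252", "latin-1", "koi8-u", "cp1251"]

# Every encoding listed is ASCII-compatible, so the bytes are identical.
ascii_forms = {ascii_name.encode(enc) for enc in encodings}

# The non-ASCII name encodes differently, or not at all (recorded as None).
non_ascii_forms = set()
for enc in encodings:
    try:
        non_ascii_forms.add(non_ascii_name.encode(enc))
    except UnicodeEncodeError:
        non_ascii_forms.add(None)

print(len(ascii_forms))          # 1
print(len(non_ascii_forms) > 1)  # True
```

With a single byte representation across these encodings, the substring check in the test no longer depends on which encoding stdout happens to use.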

@akulakov
Contributor

This was fixed in dd7b816ac87; perhaps this issue should be closed as fixed?

It sounds like the general solution is beyond the scope of this issue and doesn't need to be tracked here.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022