MD5SumTests.test_checksum_fodder fails on Windows #89216
While working on #28060 we've noticed that
For now it is ignored. Related issue: https://bugs.python.org/issue45042 |
I would love to work on this issue :) |
The test is failing because TESTFN now contains non-ASCII characters. The path is written to stdout using the default stdout encoding on Windows (such as cp1252), but the test searches for the path encoded as UTF-8. The test should also fail on other platforms with a non-UTF-8 locale. The simplest way to "fix" the test is to use TESTFN_ASCII instead of TESTFN. But there is also an issue in the script itself: it fails or produces mojibake when the filesystem encoding and the stdout encoding do not match. There are similar issues in other scripts that output file names. |
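The mismatch described above can be reproduced without running the test suite. This is a minimal sketch, assuming a hypothetical file name containing `æ` (the real TESTFN value is not shown in this thread) and `cp1252` as the stdout encoding:

```python
# Hypothetical stand-in for a TESTFN-like name with one non-ASCII character.
name = "@test_tmp\u00e6"

# Bytes the script would emit if stdout used cp1252 (assumed encoding).
written = name.encode("cp1252")

# Bytes a test would look for if it encoded the expected path as UTF-8.
expected = name.encode("utf-8")

# The same character becomes one byte in cp1252 and two bytes in UTF-8,
# so a byte-level substring search cannot succeed.
found = expected in written

# An ASCII-only name encodes identically under both encodings.
ascii_name = "@test_tmp"
ascii_found = ascii_name.encode("utf-8") in ascii_name.encode("cp1252")

print(written, expected, found, ascii_found)
```

This is why switching the test to an ASCII-only name sidesteps the failure: the two encodings agree on every ASCII byte.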
I don't know Tools/scripts/md5sum.py. Can you show an example which currently fails? |
$ touch тест
$ ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e тест
$ LC_ALL=uk_UA.koi8u PYTHONIOENCODING=koi8-u ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e тест
$ LC_ALL=uk_UA.koi8u PYTHONIOENCODING=utf-8 ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e я┌п╣я│я┌
$ PYTHONIOENCODING=koi8-u ./python Tools/scripts/md5sum.py тест
d41d8cd98f00b204e9800998ecf8427e ����
$ PYTHONIOENCODING=latin-1 ./python Tools/scripts/md5sum.py тест
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 93, in <module>
    sys.exit(main(sys.argv[1:], sys.stdout))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 90, in main
    return sum(args, out)
           ^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 39, in sum
    sts = printsum(f, out) or sts
          ^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 53, in printsum
    sts = printsumfp(fp, filename, out)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Tools/scripts/md5sum.py", line 69, in printsumfp
    out.write('%s %s\n' % (m.hexdigest(), filename))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 33-36: ordinal not in range(256) |
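The garbled outputs in the session above are what appears when bytes produced under one encoding are interpreted under another. The sketch below reproduces the three problem cases with plain `str`/`bytes` methods. It does not run md5sum.py; the terminal's interpretation of the bytes is simulated with `bytes.decode`, which is an assumption about how the terminal behaves.

```python
name = "тест"

# UTF-8 output interpreted by a KOI8-U terminal (third command above).
utf8_as_koi8u = name.encode("utf-8").decode("koi8-u")

# KOI8-U output interpreted by a UTF-8 terminal (fourth command above).
# errors="replace" stands in for the terminal substituting U+FFFD.
koi8u_as_utf8 = name.encode("koi8-u").decode("utf-8", errors="replace")

# Latin-1 output: Cyrillic characters cannot be encoded at all.
try:
    name.encode("latin-1")
    latin1_error = None
except UnicodeEncodeError as exc:
    latin1_error = exc

# ascii() keeps this demo printable on any stdout encoding.
print(ascii(utf8_as_koi8u))   # escapes for 'я┌п╣я│я┌'
print(ascii(koi8u_as_utf8))   # four U+FFFD replacement characters
print(type(latin1_error).__name__)
```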
Yes, it was an encoding problem :) This change solved it (cpython/Tools/scripts/md5sum.py, line 69 in 6f8bc46):
out.write('%s %s\n' % (m.hexdigest(), filename.encode(
sys.getfilesystemencoding(),
).decode(sys.stdout.encoding)))
I haven't changed this, because right now it should work for non-ASCII symbols as well. I can even add an explicit ASCII test if needed. Shouldn't #28060 be merged before I submit a new PR, so we can be sure that the test now works? In its current state it will just be ignored. |
It will not work in all cases, for example if the stdio encoding is UTF-8 and the filesystem encoding is Latin1, or if the stdio encoding is CP1251 and the filesystem encoding is UTF-8. I am also not sure that it gives us the result we want when it doesn't fail. It is a general and complex issue, and every program which writes file names to stdout is affected. For now I suggest just using TESTFN_ASCII instead of TESTFN. We will find a better solution in the future. I hesitate to merge PR 28060 because it can also fail on some non-Windows buildbots with uncommon locale settings. |
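The two failure cases named above can be checked with a small sketch. `reencode` is a hypothetical helper that mirrors the `encode(...).decode(...)` expression proposed earlier, taking the two encodings as parameters instead of reading them from `sys`; the file names are illustrative only.

```python
def reencode(filename, fs_encoding, stdout_encoding):
    """Mirror filename.encode(fs_encoding).decode(stdout_encoding)."""
    return filename.encode(fs_encoding).decode(stdout_encoding)

# stdio is UTF-8, filesystem encoding is Latin-1: the lone byte 0xE9
# is not valid UTF-8, so decoding raises.
try:
    reencode("caf\u00e9", "latin-1", "utf-8")
    latin1_case = "ok"
except UnicodeDecodeError:
    latin1_case = "UnicodeDecodeError"

# stdio is CP1251, filesystem encoding is UTF-8: no exception, but each
# two-byte UTF-8 sequence turns into two unrelated CP1251 characters.
cp1251_case = reencode("тест", "utf-8", "cp1251")

# Matching encodings: the expression is a no-op round trip.
same_case = reencode("тест", "utf-8", "utf-8")

print(latin1_case, len(cp1251_case), same_case == "тест")
```

The expression only behaves when both encodings happen to agree, which is the case where it is not needed.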
This was fixed in dd7b816ac87; perhaps this should be closed as fixed? It sounds like the general solution is beyond the scope of this issue and doesn't need to be tracked here. |