email.generator.BytesGenerator corrupts data by changing line endings #63203
This is a follow-up to bpo-16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way leads to data corruption.

Repost of the update I posted in bpo-16564:

~/build/Python-3.3.2$ ./python --version

When modifying the test case in Lib/test/test_email/test_email.py like this:

--- Lib/test/test_email/test_email.py 2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py 2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
# Issue 16564: This does not produce an RFC valid message, since to be
# valid it should have a CTE of binary. But the below works in
# Python2, and is documented as working this way.
- bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+ bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
# Treated as a string, this will be invalid code points.
- self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+ # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
self.assertEqual(msg.get_payload(decode=True), bytesdata)
s = BytesIO()
g = BytesGenerator(s)
g.flatten(msg)
wireform = s.getvalue()
msg2 = email.message_from_bytes(wireform)
- self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+ # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
self.assertEqual(msg2.get_payload(decode=True), bytesdata)

then running:

./python ./Tools/scripts/run_tests.py test_email

results in:

======================================================================
Traceback (most recent call last):
File "/localdisk/kruppaal/build/Python-3.3.2/Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

The '\x0b' byte is incorrectly translated to '\x0b\n', i.e., a newline character is inserted.

Encoding the bytes array:

results in output data (MIME header stripped):

0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a ................

That is, a '\n' is inserted after '\x0b', '\x1c', '\x1d', and '\x1e'. I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings. |
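The suspected mechanism can be illustrated without the email package. The sketch below assumes that _write_lines() splits the payload with str.splitlines(keepends=True) and strips only '\r' and '\n' from each piece; that is an assumption about the affected versions, not a quote of the library source. The two helper functions are hypothetical and exist only for this comparison.

```python
import re

# The only three sequences a mail generator should treat as line endings.
NLCRE = re.compile(r'\r\n|\r|\n')

def rewrite_with_splitlines(text, nl='\n'):
    """Hypothetical stand-in for the suspected behavior of _write_lines()."""
    lines = text.splitlines(True)  # also splits after \x0b, \x0c, \x1c, \x1d, \x1e
    out = []
    for i, line in enumerate(lines):
        body = line.rstrip('\r\n')  # \x0b and friends are not stripped
        out.append(body)
        # Every piece except a final unterminated one is followed by nl.
        if i < len(lines) - 1 or body != line:
            out.append(nl)
    return ''.join(out)

def rewrite_with_regex(text, nl='\n'):
    """Hypothetical alternative: split on CRLF, CR and LF only."""
    return nl.join(NLCRE.split(text))

# Binary payload decoded the way the email package keeps bytes in a str.
payload = b'\x0b\xfa\xfb'.decode('ascii', 'surrogateescape')

mangled = rewrite_with_splitlines(payload).encode('ascii', 'surrogateescape')
intact = rewrite_with_regex(payload).encode('ascii', 'surrogateescape')
print(mangled)  # an extra newline follows \x0b
print(intact)   # payload unchanged
```

Under that assumption, the extra '\n' comes from str.splitlines() treating '\x0b' as a line boundary, which matches the AssertionError above. |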
Confirmed in Python 3.4.1. |
This patch added special behavior with MIMEApplication and may fix this issue. |
I can confirm this on 3.4.1, and it is really annoying. But the patch should set '_is_raw_payload' to False if the payload is set via 'set_payload' (the operations in 'set_raw_payload' need to be switched). |
New changeset c0f5702e0f10 by R David Murray in branch '3.5':
New changeset ccad4d142934 by R David Murray in branch 'default': |
I've fixed this to the extent that it is possible without adding support for the 'binary' CTE. That is, \r, \n, and \r\n are still replaced with the 'correct' line ending characters, which is the correct behavior under the RFCs even for binary data if the CTE is not 'binary'. bpo-18886 covers the enhancement of supporting the binary CTE. |
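The remaining limitation can be checked with a round trip through BytesGenerator. This is a minimal sketch that assumes an interpreter containing the fix described above; the roundtrip helper is hypothetical and not part of the email package.

```python
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO

def roundtrip(data):
    """Hypothetical helper: flatten a binary payload and parse it back."""
    msg = MIMEApplication(data, _encoder=encoders.encode_noop)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    parsed = email.message_from_bytes(buf.getvalue())
    return parsed.get_payload(decode=True)

# Bytes such as \x0b are expected to survive on a fixed interpreter.
print(roundtrip(b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'))

# A bare \r is still treated as a line ending and rewritten,
# because there is no 'binary' CTE to protect it.
print(roundtrip(b'\xfa\r\xfb'))
```

Payloads that must keep '\r' or '\r\n' intact therefore still need a transfer encoding such as base64 until the binary CTE is supported. |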