Email header folded incorrectly #80701

JonathanHorn · 2019-04-03T23:15:00Z

BPO	36520
Nosy	@warsaw, @bitdancer, @miss-islington, @websurfer5, @iritkatriel
PRs	#13608, #13610 [3.6], #13909 [3.8], #13910 [3.7] (all titled "bpo-36520: Email header folded incorrectly")
Files	bpo-36520-test.py: UTF-8 header encoding test cases

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2020-11-20.15:09:53.670>
created_at = <Date 2019-04-03.23:15:00.468>
labels = ['type-bug', '3.7', 'expert-email']
title = 'Email header folded incorrectly'
updated_at = <Date 2020-11-20.15:09:53.669>
user = 'https://bugs.python.org/JonathanHorn'

bugs.python.org fields:

activity = <Date 2020-11-20.15:09:53.669>
actor = 'iritkatriel'
assignee = 'none'
closed = True
closed_date = <Date 2020-11-20.15:09:53.670>
closer = 'iritkatriel'
components = ['email']
creation = <Date 2019-04-03.23:15:00.468>
creator = 'Jonathan Horn'
dependencies = []
files = ['48366']
hgrepos = []
issue_num = 36520
keywords = ['patch']
message_count = 10.0
messages = ['339419', '343267', '343268', '343606', '343612', '343730', '344863', '345287', '345288', '378380']
nosy_count = 6.0
nosy_names = ['barry', 'r.david.murray', 'miss-islington', 'Jonathan Horn', 'Jeffrey.Kintscher', 'iritkatriel']
pr_nums = ['13608', '13610', '13909', '13910']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue36520'
versions = ['Python 3.7']

JonathanHorn · 2019-04-03T23:15:00Z

I encountered a problem when replacing the 'Subject' header of an email. After serializing the message again, the UTF-8 encoded words in the header were malformed. The problem seems to occur when the internal header objects are folded.

Example:

> email.policy.default.fold_binary('Subject', email.policy.default.header_store_parse('Subject', 'Hello Wörld! Hello Wörld! Hello Wörld! Hello Wörld!Hello Wörld!')[1])
Expected output: b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?=\n Hello =?utf-8?q?W=C3=B6rld!Hello_W=C3=B6rld!?=\n' (or similar)
Actual output: b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?=\n Hello =?utf-8?=?utf-8?q?q=3FW=3DC3=3DB6rld!Hello=3F=3D_W=C3=B6rld!?=\n'

I'm running Python 3.7.3 on Arch Linux using Linux 5.0.
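One way to tell the two outputs above apart mechanically is to check whether every `=?` in the folded header starts a complete RFC 2047 encoded word. The following is a minimal sketch of such a check; the helper name `has_dangling_encoded_word` and the regular expression are illustrative assumptions, not part of the `email` package.

```python
import re

# RFC 2047 encoded word: =?charset?encoding?encoded-text?=
# (simplified pattern; an assumption made for this sketch)
_ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def has_dangling_encoded_word(raw: bytes) -> bool:
    """Return True if an '=?' remains after removing all complete encoded words."""
    leftover = _ENCODED_WORD.sub(b"", raw)
    return b"=?" in leftover


# The two byte strings quoted in the report.
EXPECTED = (b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?=\n'
            b' Hello =?utf-8?q?W=C3=B6rld!Hello_W=C3=B6rld!?=\n')
ACTUAL = (b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?=\n'
          b' Hello =?utf-8?=?utf-8?q?q=3FW=3DC3=3DB6rld!Hello=3F=3D_W=C3=B6rld!?=\n')

print(has_dangling_encoded_word(EXPECTED))  # False
print(has_dangling_encoded_word(ACTUAL))    # True
```

The check is purely structural: it flags the stray `=?utf-8?` left in front of the second encoded word, but it does not verify that the decoded text matches the original subject.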

bitdancer · 2019-05-23T01:34:59Z

Can you demonstrate the problem with an actual email object? header_store_parse is not meant to be called directly.

bitdancer · 2019-05-23T01:39:58Z

Never mind, I was testing with the wrong version of Python. This bug was introduced somewhere after 3.4 :(

>>> from email.message import EmailMessage
>>> m = EmailMessage()
>>> m['Subject'] = 'Hello Wörld! Hello Wörld! Hello Wörld! Hello Wörld!Hello Wörld!'
>>> bytes(m)
b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?=\n Hello =?utf-8?=?utf-8?q?q=3FW=3DC3=3DB6rld!Hello=3F=3D_W=C3=B6rld!?=\n\n'
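A round trip through the public API is another way to observe the problem: serialize a message, parse the bytes back, and compare the subject with what was set. The sketch below assumes a helper named `round_trip_subject` (hypothetical, introduced here only for illustration) and makes no claim about the exact folded bytes a given Python version produces.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def round_trip_subject(subject):
    """Serialize a message with the given Subject and parse it back.

    Returns the serialized bytes and the Subject as read from them.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    raw = bytes(msg)
    parsed = message_from_bytes(raw, policy=policy.default)
    return raw, str(parsed["Subject"])


subject = "Hello Wörld! Hello Wörld! Hello Wörld! Hello Wörld!Hello Wörld!"
raw, parsed = round_trip_subject(subject)
print(raw)
print(parsed == subject)  # result depends on the interpreter version in use
```

On an affected interpreter the parsed subject would be expected to differ from the original, since the second folded line no longer contains a well-formed encoded word.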

websurfer5 · 2019-05-27T03:43:45Z

To aid with debugging the code, the Subject line can be simplified:

>>> from email.message import EmailMessage
>>> m = EmailMessage()
>>> m['Subject'] = 'Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?= Hello Wörld!Hello Wörld!'
>>> print(bytes(m))
b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?=\n Hello =?utf-8?=?utf-8?q?q=3FW=3DC3=3DB6rld!Hello=3F=3D_W=C3=B6rld!?=\n\n'

websurfer5 · 2019-05-27T10:43:16Z

I uploaded a test script with some test cases (bpo-36520-test.py).

The failure mode occurs when:

1. line folding occurs
2. the first folded line has two or more words with UTF-8 characters
3. subsequent lines contain a word with UTF-8 characters located at a different offset than the last encoded substring in the first line

For example, the first folded and encoded line of 'Hello Wörld! Hello Wörld! Hello Wörld! Hello Wörld!Hello Wörld!' is

b'Subject: Hello =?utf-8?q?W=C3=B6rld!_Hello_W=C3=B6rld!_Hello_W=C3=B6rld!?='

and the second line should be

b' Hello =?utf-8?q?W=C3=B6rld!Hello_W=C3=B6rld!?='

but instead, it is

b' Hello =?utf-8?=?utf-8?q?q=3FW=3DC3=3DB6rld!Hello=3F=3D_W=C3=B6rld!?='

The function at fault is _refold_parse_tree() in Lib/email/_header_value_parser.py. In the first line, it encodes the first UTF-8 word and saves the starting offset in the output string (15). When it encounters the second UTF-8 word, it re-encodes the entire string starting at the saved offset. This is to help reduce the bloat added by multiple '=?utf-8?q?' start-of-encoding tokens. When it encodes the first UTF-8 word on the second line, it tries to store it at the saved offset in the second-line output string, but that offset is past the end of the string, so the word just gets appended. When it encounters the second UTF-8 word in the second line, it re-encodes the entire second-line string starting at the saved offset (15), which is in the middle of the first encoded UTF-8 string.

The failure mode is not triggered if there is at most one UTF-8 word in each folded line. It also is not triggered when folding occurs in the middle of a word instead of at whitespace because the code follows a different path.

The solution is to reset the saved starting offset to None whenever a new folded line is started at a whitespace fold point.
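The effect of the stale offset can be reproduced with a toy model. The sketch below is not the code in `_refold_parse_tree()`; the helpers `q_encode`, `q_decode`, `add_encoded`, and `fold_second_line` are hypothetical names invented for illustration, and the model only covers the second folded line of the example.

```python
import re

# Toy model only: these helpers are illustrative, not CPython's implementation.
EW = re.compile(r"=\?utf-8\?q\?([^?\s]*)\?=")


def q_encode(text):
    """Minimal 'Q' encoder: keep a small safe set, '_' for space, =XX otherwise."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte < 128 and (ch.isalnum() or ch in "!*+-/"):
            out.append(ch)
        elif ch == " ":
            out.append("_")
        else:
            out.append("=%02X" % byte)
    return "".join(out)


def q_decode(payload):
    """Inverse of q_encode for well-formed payloads."""
    raw = re.sub(r"=([0-9A-Fa-f]{2})",
                 lambda m: chr(int(m.group(1), 16)),
                 payload.replace("_", " "))
    return raw.encode("latin-1").decode("utf-8")


def add_encoded(line, word, last_ew):
    """Append word as an encoded word, merging with whatever starts at last_ew."""
    if last_ew is None:
        last_ew = len(line)
    # Complete encoded words in the tail are decoded; anything else is kept
    # verbatim, which is what corrupts the output when last_ew is stale.
    tail = EW.sub(lambda m: q_decode(m.group(1)), line[last_ew:])
    merged = "=?utf-8?q?" + q_encode(tail + word) + "?="
    return line[:last_ew] + merged, last_ew


def fold_second_line(last_ew):
    """Build the second folded line, starting from the given saved offset."""
    line = " Hello "
    for word in ("Wörld!Hello", " Wörld!"):
        line, last_ew = add_encoded(line, word, last_ew)
    return line


print(fold_second_line(15))    # offset carried over from the first line
print(fold_second_line(None))  # offset reset at the fold point
```

With the carried-over offset of 15, the model yields the malformed second line quoted above; with the offset reset to None, it yields the expected one.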

I will submit a pull request soon with a fix.

websurfer5 · 2019-05-28T04:49:19Z

The pull request has been submitted with both the code fix and tests.

warsaw · 2019-06-06T19:53:50Z

New changeset f6713e8 by Barry Warsaw (websurfer5) in branch 'master':
bpo-36520: Email header folded incorrectly (GH-13608)
f6713e8

miss-islington · 2019-06-11T23:27:16Z

New changeset 0745cc6 by Miss Islington (bot) (Abhilash Raj) in branch '3.7':
[3.7] bpo-36520: Email header folded incorrectly (GH-13608) (GH-13910)
0745cc6

miss-islington · 2019-06-11T23:28:18Z

New changeset 36eea7a by Miss Islington (bot) (Abhilash Raj) in branch '3.8':
[3.8] bpo-36520: Email header folded incorrectly (GH-13608) (GH-13909)
36eea7a

iritkatriel · 2020-10-10T10:37:54Z

This seems complete, can it be closed?

JonathanHorn mannequin added the 3.7 (EOL), topic-email, and type-bug labels Apr 3, 2019

iritkatriel closed this as completed Nov 20, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022
