gh-93158: Support obsolete email syntax, fieldnames that are followed by whitespace #93176

Open
wants to merge 8 commits into
base: main
from
7 changes: 4 additions & 3 deletions Doc/library/email.policy.rst
@@ -454,9 +454,10 @@ added matters. To illustrate::


The name is parsed as everything up to the '``:``' and returned
- unmodified. The value is determined by stripping leading whitespace off
- the remainder of the first line, joining all subsequent lines together,
- and stripping any trailing carriage return or linefeed characters.
+ stripped of trailing whitespace. The value is determined by stripping
+ leading whitespace off the remainder of the first line, joining all
+ subsequent lines together, and stripping any trailing carriage
+ return or linefeed characters.


.. method:: header_store_parse(name, value)
10 changes: 5 additions & 5 deletions Lib/email/_policybase.py
@@ -292,15 +292,15 @@ def _sanitize_header(self, name, value):

def header_source_parse(self, sourcelines):
"""+
- The name is parsed as everything up to the ':' and returned unmodified.
- The value is determined by stripping leading whitespace off the
- remainder of the first line, joining all subsequent lines together, and
- stripping any trailing carriage return or linefeed characters.
+ The name is parsed as everything up to the ':' and returned stripped
+ of any trailing whitespace. The value is determined by stripping leading
+ whitespace off the remainder of the first line, joining all subsequent
+ lines together, and stripping any trailing carriage return or linefeed characters.

"""
name, value = sourcelines[0].split(':', 1)
value = value.lstrip(' \t') + ''.join(sourcelines[1:])
- return (name, value.rstrip('\r\n'))
+ return (name.rstrip(' \t'), value.rstrip('\r\n'))

def header_store_parse(self, name, value):
"""+
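To make the effect of the `header_source_parse` change concrete, here is a minimal standalone sketch. It copies the three patched lines into a free function named `patched_header_source_parse` (a hypothetical name used only for this illustration, not part of the `email` package), so it does not depend on whether the installed Python includes this change.

```python
def patched_header_source_parse(sourcelines):
    """Standalone copy of the patched logic, for illustration only.

    `sourcelines` is the list of raw lines that make up one header:
    the first line plus any continuation lines.
    """
    # Split on the first colon only; the value may itself contain colons.
    name, value = sourcelines[0].split(':', 1)
    # Strip leading whitespace from the first line's value, then append
    # continuation lines unchanged.
    value = value.lstrip(' \t') + ''.join(sourcelines[1:])
    # The change under review: trailing spaces/tabs are stripped from the name.
    return (name.rstrip(' \t'), value.rstrip('\r\n'))


single = patched_header_source_parse(
    ["x-whitespace-after-fieldname : value\r\n"]
)
folded = patched_header_source_parse(["Subject \t: a\r\n", " b\r\n"])

print(single)  # ('x-whitespace-after-fieldname', 'value')
print(folded)  # ('Subject', 'a\r\n b')
```

Without the `name.rstrip(' \t')` call, the first result would carry the name `'x-whitespace-after-fieldname '` with a trailing space, which would not compare equal to the expected field name.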
2 changes: 1 addition & 1 deletion Lib/email/feedparser.py
@@ -34,7 +34,7 @@
NLCRE_crack = re.compile(r'(\r\n|\r|\n)')
# RFC 2822 $3.6.8 Optional fields. ftext is %d33-57 / %d59-126, Any character
# except controls, SP, and ":".
- headerRE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')
+ headerRE = re.compile(r'^(From |[\041-\071\073-\176]*[ \t]*:|[\t ])')
EMPTYSTRING = ''
NL = '\n'
boundaryendRE = re.compile(
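The difference between the two `headerRE` patterns can be checked in isolation. The sketch below compiles both patterns exactly as they appear in the hunk, under the local names `old_header_re` and `new_header_re` (names chosen for this example only), and matches them against the header line from the new test fixture.

```python
import re

# Pattern before the change: the field name must be immediately followed by ':'.
old_header_re = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')
# Pattern after the change: optional spaces/tabs are allowed before the ':'.
new_header_re = re.compile(r'^(From |[\041-\071\073-\176]*[ \t]*:|[\t ])')

line = "x-whitespace-after-fieldname : value\n"

old_match = old_header_re.match(line)
new_match = new_header_re.match(line)

print(old_match)           # None: the space before ':' prevents a match
print(new_match.group(1))  # 'x-whitespace-after-fieldname :'

# An ordinary header is matched identically by both patterns.
plain = "Subject: hi\n"
print(old_header_re.match(plain).group(1))  # 'Subject:'
print(new_header_re.match(plain).group(1))  # 'Subject:'
```

Because the old pattern returns no match for the line, a parser relying on it would not recognise that line as a header at all, which is why the regex change accompanies the `header_source_parse` change.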
7 changes: 7 additions & 0 deletions Lib/test/test_email/data/msg_48.txt
@@ -0,0 +1,7 @@
+ Subject: Regarding messages containing whitespace that follow field names
+ To: receiver@example.org
+ x-whitespace-after-fieldname : value
+ Date: Fri, 20 May 2022 18:13:19 +1200
+ From: sender@example.org
+
+ Field names can be followed by arbitrary whitespace
8 changes: 8 additions & 0 deletions Lib/test/test_email/test_email.py
@@ -431,6 +431,14 @@ def test_get_param_funky_continuation_lines(self):
msg = self._msgobj('msg_22.txt')
self.assertEqual(msg.get_payload(1).get_param('name'), 'wibble.JPG')

+ def test_whitespace_after_fieldname(self):
+ # As part of obsolete email syntax, fieldnames can be followed by arbitrary whitespace
+ msg = self._msgobj("msg_48.txt")
+
+ self.assertEqual(msg["x-whitespace-after-fieldname"], "value")
+ self.assertEqual(msg.get_payload(),
+ "Field names can be followed by arbitrary whitespace\n")
+
# test_headerregistry.TestContentTypeHeader.semis_inside_quotes
def test_get_param_with_semis_in_quotes(self):
msg = email.message_from_string(
@@ -0,0 +1,2 @@
+ The :mod:`email` library now parses messages that use obsolete email syntax where
+ header field names can be followed by whitespace.