email.parser header-only parsing records MultipartInvariantViolationDefect for valid multipart emails #106186

Open
me-and opened this issue Jun 28, 2023 · 3 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@me-and

me-and commented Jun 28, 2023

Bug report

A valid multipart email message, when parsed with email.parser.HeaderParser(policy=email.policy.default), records an email.errors.MultipartInvariantViolationDefect.

If the parser isn't going to attempt to parse the message body, it shouldn't report the unparsed multipart body as a defect.

Simple test script:

import email.parser
import email.policy

email_str = '''\
Date: 01 Jan 2001 00:01+0000
From: arthur@example.example
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=autocracy

--autocracy
Content-Type: text/plain

By hanging on to outdated imperialist dogma which perpetuates the economic and
social differences in our society.

--autocracy
Content-Type: text/html

<html><body><p>By hanging on to outdated imperialist dogma which perpetuates
the economic and social differences in our society.</p></body></html>

--autocracy--
'''

full_parser = email.parser.Parser(policy=email.policy.default)
parsed_email_full = full_parser.parsestr(email_str)
print(parsed_email_full.defects)  # Prints [] as expected

header_parser = email.parser.HeaderParser(policy=email.policy.default)
parsed_email_headers_only = header_parser.parsestr(email_str)
print(parsed_email_headers_only.defects)  # Prints [MultipartInvariantViolationDefect()]
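
For readers on an affected version, one possible workaround (not proposed anywhere in this thread) is to ignore this specific defect class when inspecting the result of a header-only parse. The sketch below is only an illustration: the helper name `defects_ignoring_multipart_invariant` and the sample message are assumptions, not part of the email package or of the report.

```python
import email.errors
import email.parser
import email.policy


def defects_ignoring_multipart_invariant(msg):
    """Return msg.defects without MultipartInvariantViolationDefect entries.

    Hypothetical helper: only sensible for messages parsed headers-only,
    where the body was deliberately left unparsed.
    """
    return [
        d for d in msg.defects
        if not isinstance(d, email.errors.MultipartInvariantViolationDefect)
    ]


# Hypothetical minimal multipart message, used only for this illustration.
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=sep\n"
    "\n"
    "--sep\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--sep--\n"
)

msg = email.parser.HeaderParser(policy=email.policy.default).parsestr(raw)
remaining = defects_ignoring_multipart_invariant(msg)
print(remaining)
```

Filtering by class keeps any other defects visible, so the helper should behave the same whether or not the interpreter records the defect in the first place.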

Your environment

  • Debian 12
  • Raspberry Pi 4 (arm64)
  • Python 3.11.2 (Debian package 3.11.2-1+b1)

Linked PRs

@me-and me-and added the type-bug An unexpected behavior, bug, or error label Jun 28, 2023
@htsedebenham
Contributor

I believe this is the expected behaviour. Per the documentation, HeaderParser acts like Parser with headersonly=True. With the test script modified as follows, Parser.parsestr(email_str, headersonly=True) also prints [MultipartInvariantViolationDefect()], matching HeaderParser.

email_str = '''\
Date: 01 Jan 2001 00:01+0000
From: arthur@example.example
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=autocracy

--autocracy
Content-Type: text/plain

By hanging on to outdated imperialist dogma which perpetuates the economic and
social differences in our society.

--autocracy
Content-Type: text/html

<html><body><p>By hanging on to outdated imperialist dogma which perpetuates
the economic and social differences in our society.</p></body></html>

--autocracy--
'''

full_parser = email.parser.Parser(policy=email.policy.default)
parsed_email_full = full_parser.parsestr(email_str)
print(parsed_email_full.defects)  # Prints [] as reported

full_parser = email.parser.Parser(policy=email.policy.default)
parsed_email_full = full_parser.parsestr(email_str, headersonly=True)
print(parsed_email_full.defects)  # Prints [MultipartInvariantViolationDefect()]

header_parser = email.parser.HeaderParser(policy=email.policy.default)
parsed_email_headers_only = header_parser.parsestr(email_str)
print(parsed_email_headers_only.defects)  # Prints [MultipartInvariantViolationDefect()]

@htsedebenham
Contributor

I see the issue, looking into it now.

@ambv
Contributor

ambv commented Jul 23, 2023

Per documentation of Parser.parse():

Optional headersonly is a flag specifying whether to stop parsing after reading the headers or not. The default is False, meaning it parses the entire contents of the file.

From this reading, the issue is valid and the fix in the attached PR is the correct bugfix.
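
To make the effect of headersonly concrete, here is a minimal sketch comparing a full parse with a header-only parse of the same message. The sample message and the helper name `summarize` are assumptions for illustration only. With headersonly=True the body is kept as one unparsed string, so is_multipart() returns False even though the Content-Type header says multipart, which is the condition the MultipartInvariantViolationDefect documentation describes. Whether the header-only parse records the defect depends on whether the interpreter includes the fix, so the sketch prints the defect names without assuming a value.

```python
import email.parser
import email.policy

# Hypothetical sample message, not taken from the report.
SAMPLE = (
    "From: arthur@example.example\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=sep\n"
    "\n"
    "--sep\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--sep\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--sep--\n"
)


def summarize(msg):
    """Collect the properties of a parsed message that headersonly affects."""
    return {
        "content_type": msg.get_content_type(),
        "is_multipart": msg.is_multipart(),
        "payload_type": type(msg.get_payload()).__name__,
        "defects": [type(d).__name__ for d in msg.defects],
    }


parser = email.parser.Parser(policy=email.policy.default)
full = summarize(parser.parsestr(SAMPLE))
headers_only = summarize(parser.parsestr(SAMPLE, headersonly=True))

print(full)          # body parsed into a list of sub-messages
print(headers_only)  # body left as a single string
```

Both parses report the same content type; only the payload shape differs, which is why a defect check based on is_multipart() alone cannot tell a malformed multipart message from one whose body was intentionally left unparsed.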

ambv pushed a commit that referenced this issue Jul 23, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 23, 2023
…alid multipart emails when parsing header only (pythonGH-107016)

(cherry picked from commit c65592c)

Co-authored-by: htsedebenham <31847376+htsedebenham@users.noreply.github.com>
ambv pushed a commit that referenced this issue Jul 23, 2023
…valid multipart emails when parsing header only (GH-107016) (#107111)

(cherry picked from commit c65592c)

Co-authored-by: htsedebenham <31847376+htsedebenham@users.noreply.github.com>
ambv pushed a commit that referenced this issue Jul 23, 2023
…valid multipart emails when parsing header only (GH-107016) (#107112)

(cherry picked from commit c65592c)

Co-authored-by: htsedebenham <31847376+htsedebenham@users.noreply.github.com>
jtcave pushed a commit to jtcave/cpython that referenced this issue Jul 23, 2023
mementum pushed a commit to mementum/cpython that referenced this issue Jul 23, 2023
@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 26, 2023