Double dots in quopri transported emails #88088

Closed
Julien00859 mannequin opened this issue Apr 23, 2021 · 6 comments
Labels
3.11 only security fixes topic-email type-feature A feature request or enhancement

Comments

@Julien00859
Mannequin

Julien00859 mannequin commented Apr 23, 2021

BPO 43922
Nosy @warsaw, @bitdancer, @Julien00859
PRs
  • bpo-43922: escape email line starting with a dot #25562
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-04-30.13:18:23.508>
    created_at = <Date 2021-04-23.14:36:50.407>
    labels = ['type-feature', 'expert-email', '3.11']
    title = 'Double dots in quopri transported emails'
    updated_at = <Date 2021-04-30.13:18:23.507>
    user = 'https://github.com/Julien00859'

    bugs.python.org fields:

    activity = <Date 2021-04-30.13:18:23.507>
    actor = 'Julien Castiaux'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-04-30.13:18:23.508>
    closer = 'Julien Castiaux'
    components = ['email']
    creation = <Date 2021-04-23.14:36:50.407>
    creator = 'Julien Castiaux'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43922
    keywords = ['patch']
    message_count = 6.0
    messages = ['391698', '391707', '391718', '391726', '392319', '392428']
    nosy_count = 4.0
    nosy_names = ['barry', 'r.david.murray', 'Julien Castiaux', 'jev2']
    pr_nums = ['25562']
    priority = 'normal'
    resolution = 'third party'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue43922'
    versions = ['Python 3.11']

    @Julien00859
    Mannequin Author

    Julien00859 mannequin commented Apr 23, 2021

    Hello,

    We received multiple bug reports about broken links in rich HTML emails. Sometimes, in some emails, a link like <a href="https://example.com"> would become <a href="https://example..com">; notice the double dot.

    After researching both the Python email source code and the RFCs, it turns out that Python correctly implements the standard but that the remote (non-Python) SMTP server used by some of our customers doesn't.

    The various email standards state the following:

    1. Since a line containing only a single dot (".", chr(0x2e)) ends the SMTP transmission, such dots must be escaped when they are part of the message. RFC 5321, section 4.5.2 requires escaping every dot that appears at the beginning of a line, using a dot as the escape character. That is, when the user message contains "\r\n.\r\n", it is escaped to "\r\n..\r\n". The receiving SMTP side is responsible for removing the extra dot.

    2. When we transport the email body using the quoted-printable encoding, RFC 2045 requires each line to have a maximum of 78 characters and defines a single equals sign ("=", chr(0x3d)) at the end of a line as the soft line break used to fold lines that are too long. The RFC only requires that the line be split outside of an encoded character (it cannot be split in the middle of "=2E"). As with any other character, splitting the line just before a dot is allowed.
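The dot escaping described in point 1 can be sketched as follows. This is a minimal illustration, not smtplib's actual code; `dot_stuff` and `dot_unstuff` are hypothetical helper names introduced here.

```python
import re


def dot_stuff(data: bytes) -> bytes:
    """Sender side: double any dot that starts a line (RFC 5321, section 4.5.2)."""
    return re.sub(rb"(?m)^\.", b"..", data)


def dot_unstuff(data: bytes) -> bytes:
    """Receiver side: drop the first dot of any line that starts with one."""
    return re.sub(rb"(?m)^\.", b"", data)


body = b"first line\r\n.\r\n.com\r\n"
wire = dot_stuff(body)
print(wire)                       # b'first line\r\n..\r\n..com\r\n'
print(dot_unstuff(wire) == body)  # True
```

A receiver that skips the `dot_unstuff` step keeps the extra dot in the delivered message.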

    Take the following example:

        from email.message import EmailMessage
        from email.policy import SMTP
    
        msg = EmailMessage(policy=SMTP)
        msg.set_content("Hello there, just need some text to reach that seventy-six character, example.com")
        #                                                                                             ^
        #                                                                                         78th char
    
        print(msg.as_string())
        # Content-Type: text/plain; charset="utf-8"
        # Content-Transfer-Encoding: quoted-printable
        # MIME-Version: 1.0
        #
        # Hello there, just need some text to reach that seventy-six character, example=
        # .com

    When the message is sent over SMTP, smtplib escapes the line ".com" to become "..com" as required by the RFC. So there is no problem in the Python implementation; it is the other side that is buggy.
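To see how a missing un-escaping step turns into a double dot in the decoded link, here is a small sketch. It assumes one plausible failure mode (the receiver never strips the extra dot); the actual third-party bug is not described in this thread, and `correct_receiver` and `buggy_receiver` are hypothetical names.

```python
import quopri

# What the sender puts on the wire after SMTP dot escaping:
# the soft line break falls just before the dot, so ".com" becomes "..com".
on_wire = b"see https://example=\r\n..com"


def correct_receiver(data: bytes) -> bytes:
    # Remove the escape dot from every line that starts with a dot,
    # then decode the quoted-printable body.
    unstuffed = b"\r\n".join(
        line[1:] if line.startswith(b".") else line
        for line in data.split(b"\r\n")
    )
    return quopri.decodestring(unstuffed)


def buggy_receiver(data: bytes) -> bytes:
    # Assumed bug: the escape dot is left in place before decoding.
    return quopri.decodestring(data)


print(correct_receiver(on_wire))  # b'see https://example.com'
print(buggy_receiver(on_wire))    # b'see https://example..com'
```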

    However, we can work around the bug on the other side. The problem is that it does not correctly parse lines starting with a dot, so a workaround is to ensure that no line starts with a dot. There are two ways to do so: (1) quoted-printable encode a dot when it is at the beginning of a line, or (2) prevent the line-folding code from splitting a line just before a dot.

    (1) is allowed by the RFC: any character can be quoted-printable encoded, even those that already have a safe ASCII representation. In our "example=\n.com" example above, we can encode the dot: "example=\n=2Ecom". The line starts with "=2E" instead of a literal dot and the content is the same.
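Option (1) can be sketched as a post-processing step on an already encoded body. This is only an illustration and not the code of the pull request; `escape_leading_dots` is a hypothetical helper.

```python
import quopri
import re


def escape_leading_dots(qp_body: str) -> str:
    """Replace a dot at the start of any line with its QP escape "=2E"."""
    return re.sub(r"(?m)^\.", "=2E", qp_body)


folded = "example=\n.com"
escaped = escape_leading_dots(folded)
print(repr(escaped))                                 # 'example=\n=2Ecom'
print(quopri.decodestring(escaped.encode("ascii")))  # b'example.com'
```

Note that the escape makes the line two characters longer, so a real implementation would have to apply it before folding, or re-check the line length afterwards.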

    (2) is allowed by the RFC: it only states that a line must be at most 78 characters long and that a line may be folded anywhere except inside a quoted-printable sequence, so it is safe to split a line earlier than the 78th character. In our "example=\n.com" example above, we could split the line at the 77th character: "exampl=\ne.com". The line starts with an "e" instead of a dot and the content is the same.
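Option (2) can be sketched as a folding routine that moves the cut point backwards when the next line would start with a dot. This is a simplified illustration, not the stdlib folding code; `fold_avoiding_leading_dot` is a hypothetical name, the input is assumed to be already quoted-printable encoded, and the 78-character limit is taken from the description above.

```python
import quopri


def fold_avoiding_leading_dot(line: str, maxlen: int = 78) -> list[str]:
    """Fold one encoded line with soft breaks, avoiding continuation
    lines that start with a dot and never cutting through an "=XX" escape."""
    chunks = []
    while len(line) > maxlen:
        stop = maxlen - 1  # leave room for the trailing "="
        while stop > 1:
            if line[stop] == ".":
                stop -= 1   # next line would start with a dot
            elif line[stop - 1] == "=":
                stop -= 1   # cut would fall right after "="
            elif line[stop - 2] == "=":
                stop -= 2   # cut would fall inside "=XX"
            else:
                break
        stop = max(stop, 1)  # always make progress
        chunks.append(line[:stop] + "=")
        line = line[stop:]
    chunks.append(line)
    return chunks


text = "Hello there, just need some text to reach that seventy-six character, example.com"
chunks = fold_avoiding_leading_dot(text)
print(chunks[-1])  # e.com
```

Pathological inputs, such as a long run of dots or a source line that itself begins with a dot, are not handled by this sketch.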

    A pull request is coming shortly.

    @Julien00859 Julien00859 mannequin added 3.11 only security fixes topic-email type-feature A feature request or enhancement labels Apr 23, 2021
    @bitdancer
    Member

    Since python is doing the right thing here, I don't see a particularly good reason to put a hack into the stdlib to fix the failure of third party software to adhere to standards. (On the output side. We do follow Postel's rule on input and try hard to handle broken but recoverable input.) I don't actually *object* to it, though, as long as it follows the standard on output, and is a *simple* change.

    Please note that you can fix this locally by implementing and using a custom content manager.

    @Julien00859
    Mannequin Author

    Julien00859 mannequin commented Apr 23, 2021

    Hello David, thank you for your quick answer. I tried to keep it minimal, with some unit tests. Could you point me to some resources to learn how to properly write a custom content manager?

    @bitdancer
    Member

    As far as I know the only resources are the content manager docs and the source code. The stdlib content manager can serve as a model. I have to admit that it was long enough ago that I wrote that code that I'd have to re-read the docs and code myself to figure it out :)
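For readers looking for a starting point, a custom content manager along these lines could look roughly like the sketch below. It is an untested-in-production illustration, not the approach of the pull request: `DotSafeContentManager` is a hypothetical class that delegates to the stdlib `raw_data_manager` and then rewrites leading dots in a quoted-printable payload as "=2E".

```python
import re
from email.contentmanager import ContentManager, raw_data_manager
from email.message import EmailMessage
from email.policy import SMTP


class DotSafeContentManager(ContentManager):
    """Hypothetical manager: reuse the stdlib one, then hide leading dots."""

    def get_content(self, msg, *args, **kw):
        return raw_data_manager.get_content(msg, *args, **kw)

    def set_content(self, msg, obj, *args, **kw):
        raw_data_manager.set_content(msg, obj, *args, **kw)
        cte = str(msg.get("Content-Transfer-Encoding", "")).lower()
        if cte == "quoted-printable":
            # "=2E" is the quoted-printable escape for "."
            msg.set_payload(re.sub(r"(?m)^\.", "=2E", msg.get_payload()))


policy = SMTP.clone(content_manager=DotSafeContentManager())
msg = EmailMessage(policy=policy)
text = "Hello there, just need some text to reach that seventy-six character, example.com"
msg.set_content(text, cte="quoted-printable")
print(msg.as_string())
```

The escape lengthens the affected line by two characters, so this sketch may push a line past the configured maximum; a more careful version would re-fold after escaping.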

    I'm afraid I don't really have time to do a complete review, but at a quick glance your patch doesn't look too complicated to me. Quick observation: the comment should explain why the dot check is done, and that it isn't needed for rfc compliance.

    @Julien00859
    Mannequin Author

    Julien00859 mannequin commented Apr 29, 2021

    Hello David,

    The third-party SMTP software that causes the trouble has been identified! We are still investigating how to fix the problem at its root; ultimately this "fix" would not even be necessary. I'll keep you informed; just don't review or close the PR yet.

    Regards,

    @Julien00859
    Mannequin Author

    Julien00859 mannequin commented Apr 30, 2021

    Fix deployed in the third-party software.

    @Julien00859 Julien00859 mannequin closed this as completed Apr 30, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022