Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

str serialization of Message object may mutate the payload and CTE. #63971

Closed
bitdancer opened this issue Nov 25, 2013 · 12 comments
Closed

str serialization of Message object may mutate the payload and CTE. #63971

bitdancer opened this issue Nov 25, 2013 · 12 comments
Labels
easy stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error

Comments

@bitdancer
Copy link
Member

BPO 19772
Nosy @warsaw, @bitdancer, @vajrasky
Files
  • fix_serialization_message_object_mutation_v1.patch
  • fix_serialization_message_object_mutation_v2.patch
  • fix_serialization_message_object_mutation_v3.patch
  • fix_serialization_message_object_mutation_v4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-02-08.17:10:30.720>
    created_at = <Date 2013-11-25.16:14:11.223>
    labels = ['easy', 'type-bug', 'library', 'expert-email']
    title = 'str serialization of Message object may mutate the payload and CTE.'
    updated_at = <Date 2014-02-10.03:11:31.230>
    user = 'https://github.com/bitdancer'

    bugs.python.org fields:

    activity = <Date 2014-02-10.03:11:31.230>
    actor = 'vajrasky'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-02-08.17:10:30.720>
    closer = 'r.david.murray'
    components = ['Library (Lib)', 'email']
    creation = <Date 2013-11-25.16:14:11.223>
    creator = 'r.david.murray'
    dependencies = []
    files = ['32865', '32866', '32867', '33987']
    hgrepos = []
    issue_num = 19772
    keywords = ['patch', 'easy']
    message_count = 12.0
    messages = ['204356', '204564', '204565', '204567', '210534', '210586', '210593', '210647', '210661', '210665', '210666', '210797']
    nosy_count = 4.0
    nosy_names = ['barry', 'r.david.murray', 'python-dev', 'vajrasky']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19772'
    versions = ['Python 3.3', 'Python 3.4']

    @bitdancer
    Copy link
    Member Author

    Currently the string generator will downcast 8bit body parts to 7bit by encoding them using a 7bit CTE. This is good. However, the way it does it is to modify the Message object accordingly. This is not good. Except for filling in required missing bits of data (such as MIME borders), serializing a Message should not mutate it. (And, for the curious, I am the one who made this mistake when I wrote the code :)

    @bitdancer bitdancer added stdlib Python modules in the Lib dir topic-email easy type-bug An unexpected behavior, bug, or error labels Nov 25, 2013
    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Nov 27, 2013

    I come out with two patches.

    The first patch using bytes generator to copy the message before mutating it in string generator, then restore it after printing the mutated message.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Nov 27, 2013

    The second patch copies the private variables of the message before mutating it in string generator, then restore the private variables after printing the mutated message.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Nov 27, 2013

    I got another inspiration.

    The third patch uses copy.copy to copy the Message before mutating it in string generator, then restore it after printing the mutated message.

    @bitdancer
    Copy link
    Member Author

    While the copy solution seems reasonable, unfortunately it looks like the solution isn't that simple.

    Your test method didn't start with 'test_', so it didn't get run. Your fix doesn't fix the problem, since other parts of the code are holding on to a pointer to the original message object. But even if I change your fix to only make a local copy and use it to get the transformed payload, that fix breaks other tests, since the *headers* aren't changed in the output, even if the payload is. This is because of the odd way the generator accumulates its output, and I unfortunately don't have time to remember how it all works in order to craft a fix before the 3.4 alpha :(.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Feb 8, 2014

    Your test method didn't start with 'test_', so it didn't get run.

    Ah sorry about that. This is the updated patch. It fixed the problem and passed all test.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Feb 8, 2014

    Actually, I am thinking of this approach (cloning the message just after entering the flatten method):

    diff -r b541ecd32115 Lib/email/generator.py
    --- a/Lib/email/generator.py	Fri Feb 07 16:11:17 2014 -0800
    +++ b/Lib/email/generator.py	Sat Feb 08 15:55:01 2014 +0800
    @@ -67,7 +67,7 @@
             # Just delegate to the file object
             self._fp.write(s)
     
    -    def flatten(self, msg, unixfrom=False, linesep=None):
    +    def flatten(self, msg_obj, unixfrom=False, linesep=None):
             r"""Print the message object tree rooted at msg to the output file
             specified when the Generator instance was created.
     
    @@ -86,6 +86,8 @@
             # from the msg, and _encoded_XXX constants for operating on data that
             # has already been converted (to bytes in the BytesGenerator) and
             # inserted into a temporary buffer.
    +        import copy
    +        msg = copy.copy(msg_obj)
             policy = msg.policy if self.policy is None else self.policy
             if linesep is not None:
                 policy = policy.clone(linesep=linesep)

    It works for this bug but apparently it fails one test.

    ======================================================================
    FAIL: test_make_boundary (main.TestMessageAPI)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "Lib/test/test_email/test_email.py", line 199, in test_make_boundary
        'multipart/form-data; boundary="==')
    AssertionError: 'multipart/form-data' != 'multipart/form-data; boundary="=='
    - multipart/form-data
    + multipart/form-data; boundary="==

    So somehow, according to the ultimate design of email library, msg can not leave flatten method unscratched. :)

    @bitdancer
    Copy link
    Member Author

    I'm guessing the problem is that copy is a shallow copy, and message objects can be deeply nested. Doing a deepcopy might work, but as things are currently, that could cost a lot of memory, so I don't want to do that.

    @bitdancer
    Copy link
    Member Author

    Ah, it is because the boundary computation intentionally modifies the message object. Which is a questionable design decision overall, but is now set in stone :(.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 8, 2014

    New changeset 34fb36972f8d by R David Murray in branch '3.3':
    bpo-19772: Do not mutate message when downcoding to 7bit.
    http://hg.python.org/cpython/rev/34fb36972f8d

    New changeset 2e97d3500970 by R David Murray in branch 'default':
    Merge bpo-19772: Do not mutate message when downcoding to 7bit.
    http://hg.python.org/cpython/rev/2e97d3500970

    @bitdancer
    Copy link
    Member Author

    Vajrasky: thanks for your work on this, it helped me find a reasonable (if ugly) solution. However, the 'if' statement in your patch that checks specifically for the combination of old and new cte of 8bit and base64 puzzles me. The same problem occurs here if the CTE we end up with is quoted-printable, which will be the case, for example, for latin-1. The logical way to think abut the problem here (we don't want to modify the message) would be "if the CTE changed, we need to restore it". So why check specifically for 8bit and base64? The straightforward translation of "if the CTE changed, we need to restore it" would have been: 'if self.orig_cte != cte'.

    In my solution I preferred to leave the original object unchanged, and re-copy it to re-change the headers when needed. This is ugly, but it is more future proof against the down-coding code someday changing other headers or other characteristics of the message. And I needed to use deepcopy, because copy would just copy a pointer to the _headers list, and therefore the original headers list would still get modified. The deepcopy isn't that costly here, though, because we know that it is only called on a message that is not a multipart, since the copies are triggered only from _handle_text. It's still not pretty, but I think it is the best we can do without completely restructuring the way the generator does its work.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Feb 10, 2014

    So why check specifically for 8bit and base64?

    I was in hurry when creating this patch. I think that was part of the debug code. This patch was created with the purpose to illuminate the culprit of this bug. And it served its purpose.

    After uploading this patch, I was about to refine the patch. Then I started to think about.... shouldn't there be a more elegant way to solve this problem? Cloning the msg after entering the flatten message? It is supposed to be a clean and simple solution. Then I hit the part where the boundary computation intentionally modifies the message object. Then I remembered the email API had too much magics on it. >.<

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant