-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC 6531 (SMTPUTF8) support in smtpd #65924
Comments
I thought there was already an issue for this, but I can't find it. This is part of this summer's GSOC work, and involves adding support for the SMTPUTF8 extension to smtpd. |
Milan: your patch looks good for the most part. Now that I committed the decode_data patch you should modify it so that SMTPUTF8 implies decode_data=False (otherwise we *don't* have an 8-bit-clean channel). Please either attach the modified patch here or link the repository such that the patch can be generated (for rietveld review). I'll need to play with it for a bit to make sure there's nothing more needed. |
I merged it into https://bitbucket.org/zvyn/cpython and made a diff from that. (I used bash for this so hopefully its compatible to the review system here.) |
Hmm. There's a conflict marker in that patch. I *think* that if you do an 'hg pull --rebase' and then give your repository URL to the tracker, the 'create patch' button will do the right thing. (I tried it with the URL you sent me and it generated a diff that contained unrelated changes, which is why I suggest the hg pull --rebase). |
For some reason the diff shown by the review link is very different from the one show by the patch file itself. I'm not sure what is causing that, since the diff appears to be in the correct hg format. I don't even know where reitveld is getting the stuff on the left hand side. So, comments: Instead of issuing a warning for decode_data=True when smtputf8 is specified, having decode_data explicitly set to True when smtputf8 is specified should be a ValueError. Your modifications to the handling of max_command_size are technically buggy (and clearly there's no test for it): when we aren't in extended smtp mode, the limit should be the hardcoded value, as in the original code. My reading of the RFC (specifically 4.1.1.5) indicates that it is legal to send a new HELO or EHLO command without issuing a QUIT, meaning a single SMTPChannel instance might have to handle both. This isn't a likely scenario except in testing...but testing is probably one of the main uses of smtpd, so I think we ought to support it. Also, the code you use to update the max length value for MAIL takes a bit of thought to understand. I think it would be clearer to use an unconditional set of the value based on the command_size_limit constant, which you'll have to restore to address my previous comment, instead of the pop (though the pop is a nice trick :). The answer to your 'XXX should we check this?' is no. Remember that just because the client *says* SMTPUTF8, that doesn't mean the message actually has to be SMTPUTF8. So whatever bytes it sends need to be passed through. When SMTPUTF8 is not turned on, the server should act like it doesn't know anything about it, which means that if SMTPUTF8 is specified on the BODY command it should be treated just like an unknown parameter would be. We shouldn't leak the fact that we "know about" SMTPUTF8 if the server hasn't been enabled for SMTPUTF8. There is one piece missing here: providing a way for process_message to know that the client claimed the message was using SMTPUTF8 (we know it might lie, but we need to transmit what the client said). Since process_message is an 'override in subclass' API, we can't just extend it's call signature. So I think we need to provide a new process_smtputf8_message method that will be called for such messages. Note that you also need to reset the 'this message is SMTPUTF8 flag' on RSET, HELO/EHLO, and at the completion of the call to the new process_message API. This notes should also give you some clues about what additional test need to be written. |
Correction on the XXX should we check this: I was thinking about the wrong section of the code. But it is still 'no': by postel's law we should accept dirty data. Currently the consumer of the library can then decide whether or not to reject the dirty data by building a subclass. It might be nice to provide an option to control rejection of invalid characters in commands, but that should be a separate issue. (And, we are ending up with so many options that we might want to think about whether or not there is a better API for controlling them). |
Thanks a lot for your feedback!
I agree that Section 4.1.1 should be interpreted that way and that My intention was to leave the command_size_limits untouched on HELO so
I already implemented the things that are clear but am not sure how to
|
The new patch implements what you suggested, with the following differences:
|
Review comments added. |
I replied to some review comments, and made a new patch implementing your suggestions. |
I have updated your patch, changing wording of some of the documentation, and applying style tweaks to the code. I have also done the refactoring of the _set_xxx_state method that I suggested. Seemed easier to show you what I meant in code rather than try to explain it. You should use Reitveld's 'patch diff' facility to look at the changes that I made to your patch. Things that remain to be done: I'm getting warnings when I run the tests. These should be either suppressed or checked for (assertWarns). (Or fixed, in the case of assertEquals :) In the process_message docs it says that it should use RFC 821 format for the return, but in the new process method it says RFC 6531. This makes sense for the new method, but is there a more recent RFC the old method should be referring to? Or is there an open issue about more modern return codes for smtpd? I seem to remember something, but don't have time to look now. There is an issue with the reset of the maximum command length, but that can be dealt with in the 'duplicate HELO/EHLO' issue. process_smtputf8_message in DebuggingServer should be receiving bytes, and so should error when it tries to split via '\n'. Presumably this means there is a missing test as well, and also the same issue if decode_data is False when using DebuggingServer. (Unless I"m missing something; I didn't try to test it.) Different issue, but have you given any thought to what it would take to make PureProxy support SMTPUTF8? |
Thanks for the review and improvements! I fixed the warnings (didn't see them with running tests as described in the dev guide; I may improve that later), updated the docs to RFC 5321 (bpo-19679 is referring to RFC 3463 which is not required/updated/obsoleted by RFC 5321), and fixed the _print_message method of the DebuggingServer. Testing DebuggingServer seems difficult because it prints to stdout so I'm just checking that no exceptions are raised. I'm aware of the maximum command length thing but would fix it in bpo-21783 (because it becomes a bug when dealing with that). |
You can use test.support.captured_stdout or whatever its name is. |
I tweaked your additions. Instead of trying to strip out the 'b' to make things "look pretty", I think it is better to print them and thus make it explicit when we are dealing with binary data and when we are dealing with strings. It also clues the user in to the fact that the escaping inside the string is that used for bytes display, as opposed to string display. I also expanded the tests to include headers so that the 'X-Peer' header addition would get tested. This reveals a bug; whether it is in the tests or the code I don't know: the value of self.peer in the process_ methods is 'peer', not the expected ('b', 0). If you can look in to this issue it would be great. That looks like the last issue that needs to be addressed before commit. (Oh, yeah, and there was a bug in the original process_message code...it was printing an extra blank line at the end of the message text if there was, as there should be, a newline at the end of the message...I fixed that by switching to splitlines instead of split.) |
I think that the peer arg is supposed to be set to the address of the peer connecting to our server. |
OK, it's a bug in mock_socket, then. getpeername should be returning a tuple (address, port). I went ahead and fixed it, and committed the patch. Thanks Milan! Sorry the reviews were so delayed. |
New changeset ca696ca204e0 by R David Murray in branch 'default': |
Adding new parameter in the middle of positional argument list can break existing code. If you want change arguments order, make their keyword-only. |
Thanks for your review comments, serhiy.storchaka! |
Sorry, I was blind (switching between languages to much)! Anyway there should be no existing code using decode_data as it was introduced in this development circle. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: