email.utils.formataddr() should be rfc2047 aware #44784
Comments
formataddr() should RFC 2047-encode its name argument if necessary.
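For reference, here is roughly how email.utils.formataddr() behaves in Python 3.3 and later, after it gained an optional charset argument that defaults to 'utf-8'. This is a minimal sketch that assumes a modern Python 3. The display names and addresses are made-up sample values.

```python
from email.utils import formataddr

# ASCII display names pass through unchanged; they are quoted only when
# they contain RFC 2822 special characters such as a comma.
print(formataddr(("John Doe", "jdoe@example.com")))
print(formataddr(("Doe, John", "jdoe@example.com")))

# A non-ASCII display name becomes an RFC 2047 encoded word,
# using the default 'utf-8' charset.
print(formataddr(("Müller", "mueller@example.com")))

# With an empty display name, only the bare address is returned.
print(formataddr(("", "jdoe@example.com")))
```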
I am just responding so this will not show up on the 'unanswered issues' list.
I implemented a basic test for the issue and attempted a fix. I am not entirely sure about my implementation; specifically, I would like comments on the following points:
I am submitting this patch as part of my preparation for the Google Summer of Code, to familiarize myself with the contribution process. Any feedback on what I should do differently is very welcome.
The general approach of the patch looks good to me. Since formataddr is designed to be called from user code that is constructing a message, having it raise for non-ASCII characters in the address is probably OK. However, there should be a test for that, and I'm curious to know what happens if you use such an address in an address field with the unmodified email package. Instead of calling bencode directly, you should use the email.charset module and the header_encode method of its Charset class. Note that you need to turn the charset into a Charset instance first. The advantage of doing this is that it will choose the "best" encoding based on the charset and the contents of the string. Your choice of location for the new tests is fine; TestMiscelaneous really should be split up a bit, but that will wait until I do a general refactoring of the tests. Thanks for working on this.
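The short sketch below shows why going through Charset is preferable to calling a base64 helper directly. Charset.header_encode() applies the header encoding registered for the charset. For 'utf-8', that means whichever of quoted-printable or base64 produces the shorter result for the given string. The example assumes Python 3, and the sample names are arbitrary.

```python
from email.charset import Charset

utf8 = Charset("utf-8")

# Mostly-ASCII text: quoted-printable is shorter, so a 'q' encoded word results.
print(utf8.header_encode("Müller"))

# Entirely non-ASCII text: base64 is shorter, so a 'b' encoded word results.
print(utf8.header_encode("Иван"))

# ISO-8859-1 is registered to always use quoted-printable for headers.
print(Charset("iso-8859-1").header_encode("Müller"))
```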
I added a test to check that the exceptions get raised when an address is invalid. I also added a small test to check what a resulting message should look like; it looks good to me, but I am not an email specialist. Do you have any other ideas for checking that the change has no negative impact on other parts of the module?
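Tests along these lines might look like the sketch below. It is written against the formataddr() that eventually shipped in Python 3.3+. The class name FormatAddrSketchTests, the method names, and the sample addresses are hypothetical and are not taken from the actual patch.

```python
import unittest
from email.message import Message
from email.utils import formataddr


class FormatAddrSketchTests(unittest.TestCase):
    """Hypothetical tests; not the ones from the actual patch."""

    def test_non_ascii_address_raises(self):
        # formataddr() encodes the address part as ASCII, so a non-ASCII
        # address raises UnicodeEncodeError (a subclass of UnicodeError).
        with self.assertRaises(UnicodeError):
            formataddr(("Name", "müller@example.com"))

    def test_encoded_name_in_message(self):
        # The encoded word produced by formataddr() is stored in the
        # header unchanged and read back as the same string.
        msg = Message()
        msg["To"] = formataddr(("Müller", "mueller@example.com"))
        self.assertEqual(msg["To"], "=?utf-8?q?M=C3=BCller?= <mueller@example.com>")
```

Saved as a module, these tests can be run with `python -m unittest`.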
The code also uses email.charset.Charset now.
You should check whether 'charset' is a string and call Charset on it only if it is (a Charset instance may be passed directly to other email package interfaces, so it should be supported here as well). The test doesn't need to cater for the fact that either b or B (or q or Q) is legitimate: we know which one the package generates, so just test for that. As for the Message['To'] test, I wasn't clear. What I would like is a test that includes non-ASCII characters in the address part, *without* passing it through formataddr, to see what the package currently does with it. This may in fact reveal an additional bug. But it is really out of scope for this issue, so you can just remove that test (sorry). There should also be an update to the docs (Doc/library/email.utils.rst) documenting the API change.
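In the formataddr() that shipped (Python 3.3+), the charset can be given either by name or as a ready-made Charset instance, and both produce the same encoded word. The sketch below asserts the exact lowercase 'q' form the package generates, rather than accepting either case. The sample name and address are arbitrary.

```python
from email.charset import Charset
from email.utils import formataddr

pair = ("Müller", "mueller@example.com")

# The charset may be given by name; formataddr() wraps it in Charset() itself.
print(formataddr(pair, "iso-8859-1"))

# A Charset instance is used as-is, without being converted again.
print(formataddr(pair, Charset("iso-8859-1")))
```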
I incorporated the changes you suggested and added the text to the docs. Just out of curiosity, why is the documentation repeated in email.util.rst when it is already in the docstrings?
Thanks. Looks good, except that it should check isinstance(charset, str) rather than isinstance(charset, Charset); that way someone can pass a custom class that implements the Charset API if they want. (Alternatively, the check could be for a header_encode method... actually that might be better, although it isn't what the other email modules do.) The docstrings are an abbreviated version of what is in the docs, and the text is often not quite equivalent even when it is not a strict subset of the docs. We believe that maintaining them separately and tuning each one for its intended use case produces higher-quality documentation (though this does mean that they occasionally get out of sync due to oversights).
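Checking for str rather than for Charset means that any object with a suitable header_encode() method can be passed in. The sketch below demonstrates this against the shipped formataddr() (Python 3.3+). AlwaysBase64Utf8 is a hypothetical stand-in class invented for illustration; it always emits a base64 encoded word.

```python
import base64
from email.header import decode_header
from email.utils import formataddr


class AlwaysBase64Utf8:
    """Hypothetical Charset stand-in: it only needs a header_encode() method."""

    def header_encode(self, string):
        # Always emit a base64 ('b') encoded word, regardless of length.
        payload = base64.b64encode(string.encode("utf-8")).decode("ascii")
        return "=?utf-8?b?%s?=" % payload


addr = formataddr(("Müller", "mueller@example.com"), AlwaysBase64Utf8())
print(addr)

# Decoding the encoded word recovers the original display name.
encoded_word = addr.split(" <")[0]
[(raw, name_charset)] = decode_header(encoded_word)
print(raw.decode(name_charset))
```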
I incorporated that change as well. My rationale behind the previous version was to be consistent with how Lib/email/header.py handles this; unfortunately, I did not look around in the other classes and didn't think about that kind of compatibility. When formataddr() is called with an object that is not a string and does not have a header_encode method, it now raises the following exception:
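The snippet below illustrates this behavior in current Python 3 releases. The exact error message may differ between versions. The charset argument is consulted only when the display name is not ASCII, so an unsuitable object fails only in that case. The integer 42 is just an arbitrary example of an object that lacks header_encode().

```python
from email.utils import formataddr

# An ASCII display name never touches the charset argument...
print(formataddr(("Plain Name", "plain@example.com"), 42))

# ...but a non-ASCII name makes formataddr() call charset.header_encode(),
# which fails for an object that lacks that method.
try:
    formataddr(("Müller", "mueller@example.com"), 42)
except AttributeError as exc:
    print("AttributeError:", exc)
```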
Thank you for your patience, and sorry that taking four iterations on this patch probably cost you more time than implementing it yourself would have.
Ah, yes. Header is probably wrong there; I should fix that at some point. Sorry for the typos in my last message (it was late at night for me when I wrote it :) As for time, it probably didn't take any more time than writing it myself would have, and the end product is almost certainly better for having had two sets of eyes on it. This kind of back and forth often happens even when an experienced developer writes the patch. But even if neither of those were true, it would be worthwhile in order to support you in learning to contribute. Thanks again for working on this; I'll probably commit it some time today.
New changeset 184ddd9acd5a by R David Murray in branch 'default':
Finally got around to committing this; thanks, Torsten. As a reward, I'm going to make you nosy on a new, related issue I'm about to create. It is, of course, your option whether you want to work on it :) By the way, have you submitted a contributor agreement? This patch isn't really big enough to require one, but having one on file is always a good idea, especially if you are going to keep contributing (and I hope you do).
Hi David, thank you for polishing up the patch and committing it. :)