imaplib should support international mailbox names #49555

Open
jamesh mannequin opened this issue Feb 18, 2009 · 20 comments
Labels
stdlib (Python modules in the Lib dir), topic-email, type-feature (A feature request or enhancement)

Comments

@jamesh
Mannequin

jamesh mannequin commented Feb 18, 2009

BPO 5305
Nosy @loewis, @jcea, @vstinner, @mcepl, @bearbin
Dependencies
  • bpo-22598: Add mUTF-7 codec (UTF-7 modified for IMAP)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-02-18.05:36:08.878>
    labels = ['type-feature', 'library']
    title = 'imaplib should support international mailbox names'
    updated_at = <Date 2018-04-21.09:01:04.610>
    user = 'https://bugs.python.org/jamesh'

    bugs.python.org fields:

    activity = <Date 2018-04-21.09:01:04.610>
    actor = 'mcepl'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2009-02-18.05:36:08.878>
    creator = 'jamesh'
    dependencies = ['22598']
    files = []
    hgrepos = []
    issue_num = 5305
    keywords = []
    message_count = 20.0
    messages = ['82408', '82411', '82510', '82529', '82539', '82795', '82797', '127176', '132013', '148224', '151859', '215115', '215116', '215117', '228939', '228949', '228980', '229056', '248173', '315014']
    nosy_count = 11.0
    nosy_names = ['loewis', 'jcea', 'jamesh', 'vstinner', 'mcepl', 'Hiroaki.Kawai', 'astsmtl', 'BabakM', 'cfraire', 'dveeden', 'bearbin']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5305'
    versions = ['Python 3.5']

    @jamesh
    Mannequin Author

    jamesh mannequin commented Feb 18, 2009

    The IMAP4rev1 specification allows for non-ASCII mailbox names using a
    modified UTF-7 encoding (section 5.1.3 of RFC 2060 or 3501). However,
    the imaplib routines taking a mailbox name just pass the string straight
    through without any encoding.

    It would be useful if Python provided an encoder/decoder for the
    modified UTF-7 encoding, and optionally if imaplib would perform the
    encoding and decoding at the appropriate points.
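
For illustration, the encoding direction of the modified UTF-7 scheme can be sketched in a few lines. This is a minimal sketch based on section 5.1.3 of RFC 3501, not an existing imaplib API: the names `encode_mailbox_name` and `_b64_run` are hypothetical, and the result is returned as an ASCII-only `str` on the assumption that the caller passes it on to imaplib unchanged.

```python
import base64


def _b64_run(chars: str) -> str:
    """Encode a run of characters as one '&...-' shift sequence."""
    b64 = base64.b64encode(chars.encode("utf-16-be")).decode("ascii")
    # Modified BASE64: no "=" padding, and "," replaces "/".
    return "&" + b64.rstrip("=").replace("/", ",") + "-"


def encode_mailbox_name(name: str) -> str:
    """Hypothetical helper: str -> modified UTF-7 (as an ASCII-only str)."""
    out = []
    pending = []  # characters that must go into the next BASE64 run

    def flush() -> None:
        if pending:
            out.append(_b64_run("".join(pending)))
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")      # a literal ampersand is written as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)        # printable US-ASCII represents itself
        else:
            pending.append(ch)    # everything else goes into a BASE64 run
    flush()
    return "".join(out)


print(encode_mailbox_name("Hello, \N{SNOWMAN}"))        # Hello, &JgM-
print(encode_mailbox_name("~peter/mail/台北/日本語"))    # ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```

Adjacent non-ASCII characters are collected into a single run so that the encoder does not emit several back-to-back shift sequences for the same stretch of text.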

    @jamesh jamesh mannequin added the stdlib Python modules in the Lib dir label Feb 18, 2009
    @loewis
    Mannequin

    loewis mannequin commented Feb 18, 2009

    Can you provide a patch?

    @jamesh
    Mannequin Author

    jamesh mannequin commented Feb 20, 2009

    I'll have a go at implementing the algorithm. It looks like the
    modifications to UTF-7 are large enough that you can't do a search and
    replace on the output of the existing UTF-7 codec, so it'll probably
    require new code.

    Would String2Mailbox and Mailbox2String utility functions be appropriate
    here?

    @exarkun
    Mannequin

    exarkun mannequin commented Feb 20, 2009

    IMAP4 UTF-7 is implemented in Twisted -
    <http://twistedmatrix.com/trac/browser/trunk/twisted/mail/imap4.py#L5385>,
    <http://twistedmatrix.com/trac/browser/trunk/twisted/mail/test/test_imap.py#L58>.
    Feel free to re-use any of that code that would be helpful.

    @loewis
    Mannequin

    loewis mannequin commented Feb 20, 2009

    I don't have a good understanding of imaplib; if you think it's
    appropriate to provide the conversion through two functions, I trust you.

    @vstinner
    Member

    > The IMAP4rev1 specification allows for non-ASCII mailbox
    > names using a modified UTF-7 encoding

    UTF-7 already sounds horrible to me, and a *modified* UTF-7 encoding
    sounds even stranger. Why not reuse UTF-7 directly?

    (sorry, it's an off topic dummy question)

    @exarkun
    Mannequin

    exarkun mannequin commented Feb 27, 2009

    > UTF-7 already sounds horrible to me, and a *modified* UTF-7 encoding
    > sounds even stranger. Why not reuse UTF-7 directly?

    UTF-7 wasn't horrible for its time, but its time has very likely passed.
    Alas, changing a standard like IMAP4 is so difficult, this mistake will
    be with us for a long time to come.

    As for why IMAP4 uses a modified form of UTF-7, the RFC addresses this:

    The purpose of these modifications is to correct the following
    problems with UTF-7:

      1) UTF-7 uses the "+" character for shifting; this conflicts with
         the common use of "+" in mailbox names, in particular USENET
         newsgroup names.
    
      2) UTF-7's encoding is BASE64 which uses the "/" character; this
         conflicts with the use of "/" as a popular hierarchy delimiter.
    
      3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
         the use of "\" as a popular hierarchy delimiter.
    
      4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
         the use of "~" in some servers as a home directory indicator.
    
      5) UTF-7 permits multiple alternate forms to represent the same
         string; in particular, printable US-ASCII characters can be
         represented in encoded form.
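
Problems 1, 2 and 5 from the RFC text above can be observed with Python's built-in `utf-7` codec, which implements standard (unmodified) UTF-7. This is only an illustration of the rationale, using standard-library calls; the sample strings are made up for the example.

```python
# Problem 1: "+" is the shift character in standard UTF-7, so every
# literal "+" in a name such as a newsgroup has to be escaped as "+-".
print("comp.lang.c++".encode("utf-7"))        # b'comp.lang.c+-+-'

# Problem 2: the BASE64 run for these two characters contains "/",
# which many servers use as the hierarchy delimiter.
print("\u53f0\u5317".encode("utf-7"))          # b'+U/BTFw-'

# Problem 5: one string can have several valid encodings, because
# printable ASCII may also appear inside a BASE64 run.
print(b"A".decode("utf-7") == b"+AEE-".decode("utf-7"))   # True
```

The modified encoding uses "&" as the shift character, replaces "/" with "," inside BASE64 runs, and forbids encoding printable US-ASCII, so that each mailbox name has a single spelling.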
    

    Whether you are convinced by these arguments or not is, of course,
    entirely up to you. Note also, however, that the modified UTF-7 is not
    mandated by the RFC:

    By convention, international mailbox names in IMAP4rev1 are specified
    using a modified version of the UTF-7 encoding described in [UTF-7].
    Modified UTF-7 may also be usable in servers that implement an
    earlier version of this protocol.

    However, it seems stupid to say that the choice of encoding is only a
    convention since there is no other way to communicate the choice of
    encoding between client and server.

    @HiroakiKawai
    Mannequin

    HiroakiKawai mannequin commented Jan 27, 2011

    twisted's code does not work well for "\t", "\r" and "\n"; according to RFC 3501, those characters must be encoded in modified BASE64 form.

    @astsmtl
    Mannequin

    astsmtl mannequin commented Mar 24, 2011

    So no one is working on this issue ATM?

    @BabakM
    Mannequin

    BabakM mannequin commented Nov 24, 2011

    There's a working implementation of this in PloneMailList.
    http://svn.plone.org/svn/collective/mxmImapClient/trunk/imapUTF7.py

    @cfraire
    Mannequin

    cfraire mannequin commented Jan 23, 2012

    I've used the PloneMailList implementation in another project. It works well to add 'imap4-utf-7' as codec.

    The twisted imap implementation seems to have been updated to properly support non-printable ASCII, but the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.

    So can we be specific about what kind of API change would satisfy this issue:

    1. a number of API methods take one or more mailbox arguments. Of course, imaplib currently expects these to be ASCII, but what kind of argument should the methods take? UTF? Unicode? So would the library need a class property to describe an optional specified input encoding? Would it be expected to take Python unicode?

    2. some methods, such as list and lsub, return mailbox names UTF-7 encoded and embedded in larger ASCII strings. Would imaplib be expected to alter the contents of these larger strings and transform them into another encoding (when a switch as described in 1 is active)?
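
To make point 2 concrete, here is a rough sketch of what pulling the mailbox name out of one LIST response line could look like. It assumes lines shaped like `(flags) "delimiter" name` as `bytes`, which is how Python 3's `imaplib.IMAP4.list()` returns them; names sent as IMAP literals are not handled. `parse_list_line` and its `decode_name` parameter are hypothetical, and the mUTF-7 decoding itself is left to a caller-supplied function.

```python
import re

# (flags) "delimiter"|NIL mailbox-name
LIST_LINE = re.compile(
    rb'\((?P<flags>[^)]*)\) (?P<delim>"(?:[^"\\]|\\.)*"|NIL) (?P<name>.+)'
)


def _unquote(token: bytes) -> str:
    """Strip IMAP double quotes and backslash escapes, if present."""
    text = token.decode("ascii")
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        text = re.sub(r"\\(.)", r"\1", text[1:-1])
    return text


def parse_list_line(line: bytes, decode_name=None):
    """Hypothetical helper: split one LIST/LSUB line into its parts.

    decode_name, if given, is applied to the mailbox name only, so the
    flags and delimiter stay plain ASCII.
    """
    match = LIST_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"unrecognised LIST line: {line!r}")
    flags = match["flags"].decode("ascii").split()
    delimiter = None if match["delim"] == b"NIL" else _unquote(match["delim"])
    name = _unquote(match["name"])
    if decode_name is not None:
        name = decode_name(name)
    return flags, delimiter, name


line = b'(\\HasNoChildren) "/" "Entw&APw-rfe"'
print(parse_list_line(line))
```

Returning a tuple instead of rewriting the original string sidesteps the question of re-encoding the surrounding ASCII, at the cost of a different return type from the current `list` and `lsub`.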

    @jcea
    Member

    jcea commented Mar 29, 2014

    Being bitten by this today.

    @jcea
    Member

    jcea commented Mar 29, 2014

    Point 2 of cfraire message is a big issue.

    What about leaving this problem to the library user and simply providing two helper functions in the module to encode/decode mUTF-7?

    @jcea
    Member

    jcea commented Mar 29, 2014

    Or a new encoder/decoder in "codecs" module.
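
As a rough idea of what the codec route could look like, the sketch below registers a modified UTF-7 codec through `codecs.register`. Everything named here is an assumption made for the example: the codec name `x_imap4_mutf7` is made up (it is not a name the standard library provides), the helpers are hypothetical, and the `errors` argument is accepted but ignored. The decoder is lenient and leaves an unterminated "&" untouched.

```python
import base64
import codecs
import re

CODEC_NAME = "x_imap4_mutf7"  # hypothetical name, not a stdlib codec


def _encode_text(text: str) -> str:
    def repl(match):
        run = match.group(0)
        if run == "&":
            return "&-"
        b64 = base64.b64encode(run.encode("utf-16-be")).decode("ascii")
        return "&" + b64.rstrip("=").replace("/", ",") + "-"

    # Either a literal "&" or a run of characters outside printable ASCII.
    return re.sub(r"&|[^\x20-\x7e]+", repl, text)


def _decode_text(text: str) -> str:
    def repl(match):
        chunk = match.group(1)
        if not chunk:
            return "&"  # "&-" stands for a literal ampersand
        padded = chunk.replace(",", "/") + "=" * (-len(chunk) % 4)
        return base64.b64decode(padded).decode("utf-16-be")

    return re.sub(r"&([A-Za-z0-9+,]*)-", repl, text)


def _encode(text, errors="strict"):
    return _encode_text(text).encode("ascii"), len(text)


def _decode(data, errors="strict"):
    raw = bytes(data)  # the codec machinery may pass a memoryview
    return _decode_text(raw.decode("ascii")), len(raw)


def _search(name):
    # Lookup names are normalised differently across Python versions,
    # so compare in a hyphen-insensitive way.
    if name.lower().replace("-", "_") == CODEC_NAME:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=CODEC_NAME)
    return None


codecs.register(_search)

print("Hello, \N{SNOWMAN}".encode(CODEC_NAME))   # b'Hello, &JgM-'
print(b"Hello, &JgM-".decode(CODEC_NAME))        # Hello, ☃
```

A codec keeps the conversion usable outside imaplib through the ordinary `str.encode` and `bytes.decode` calls, while leaving open the separate question of where imaplib itself should apply it.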

    @exarkun
    Mannequin

    exarkun mannequin commented Oct 10, 2014

    > the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.

    Could you elaborate on this? As far as I can tell, it works fine:

        >>> import twisted.mail.imap4
        >>> print u"Hello, \N{SNOWMAN}".encode('imap4-utf-7')
        Hello, &JgM-
        >>> print b'Hello, &JgM-'.decode('imap4-utf-7')
        Hello, ☃
        >>> 

    What would you expect to work differently?

    @HiroakiKawai
    Mannequin

    HiroakiKawai mannequin commented Oct 10, 2014

    > the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.
    > Could you elaborate on this? As far as I can tell, it works fine:

    twisted's imap4-utf-7 seems to have improved over the last 2 years. :-)

    @jcea
    Member

    jcea commented Oct 10, 2014

    The first step is to provide mUTF-7 in Python 3.5. Then we can try to update imaplib. I am especially worried about the points cfraire raises in http://bugs.python.org/issue5305#msg151859. Let's see.

    @cfraire
    Mannequin

    cfraire mannequin commented Oct 11, 2014

    > the twisted imap API is problematic for imaplib because twisted seems to expect its arguments to already be Python unicode.

    > Could you elaborate on this? As far as I can tell, it works fine:

    I wasn't addressing encode/decode specifically. Both twisted and PloneMailList offer implementations with same encoding name, "imap4-utf-7".

    I meant that it's difficult for the twisted API to inform what might be done for imaplib, since twisted takes full unicode but imaplib expects only the ASCII subset of unicode.

    The first part of jamesh's original issue is just encoder/decoder, so either twisted or PloneMailList would seem to suffice. I was addressing jamesh's second part whether "optionally if imaplib would perform the encoding and decoding at the appropriate points."

    Point 2 of my response seems the more difficult. imaplib list and lsub return str instances with ASCII + utf-7 stuffed together. (twisted avoids this by returning tuples of unicode, if I understand correctly).

    @astsmtl astsmtl mannequin added the type-feature A feature request or enhancement label Jul 22, 2015
    @jcea
    Member

    jcea commented Aug 7, 2015

    Ping.

    @bearbin
    Mannequin

    bearbin mannequin commented Apr 6, 2018

    ssu

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022