Investigate claims that UTF-7 encoding/decoding doesn't always work #187

Closed
mjs opened this Issue Dec 4, 2015 · 4 comments

@mjs
Owner

mjs commented Dec 4, 2015

Originally reported by: Menno Smits (Bitbucket: mjs0)


See: https://github.com/MarechJ/py3_imap_utf7

If/when any problems are fixed, ask the author to update the page. Also add comments to SO where this is mentioned.


@mjs mjs added major bug labels Aug 29, 2016

@mjs mjs added this to the soon milestone Aug 29, 2016

@mjs mjs modified the milestones: 2.0.0, soon Aug 6, 2017

@mlorant

mlorant Oct 5, 2017

Contributor

So should we take a look at MailPile's module then? Or has anything built-in appeared since then (not likely, I guess)?

It should have the 1.1.0 milestone too I think (instead of soon) since it's more a bug than a new feature...

@NicolasLM

NicolasLM Oct 6, 2017

Collaborator

I don't know much about UTF-7 and how it is used in IMAP but:

>>> s = 'foo\r\n\nbar\n'
>>> s == s.encode('utf-7').decode('utf-7')
True

Isn't this enough?

@mlorant

mlorant Oct 6, 2017

Contributor

When reading RFC 2152, I thought IMAP servers needed something specific, but in fact the utf-7 codec of CPython seems to be compliant with this same RFC.

I haven't dug into the imap_utf7 code before, but I'm wondering why there is a difference between the imapclient result and the built-in one:

In [15]: 'A≠α'.encode('utf-7')
Out[15]: b'A+ImADsQ-'

In [16]: imap_utf7.encode('A≠α')
Out[16]: b'A&ImADsQ-'

The shift character is defined as "+" in the RFC, but the Twisted code (which we forked in imapclient) uses "&"... Anyway, it is not as simple as that, I guess.

# With imap_utf7.encode = s.encode('utf-7') and imap_utf7.decode = s.decode('utf-7')
In [4]: conn.list_folders()
Out[4]: 
[((b'\\HasChildren',), b'.', 'INBOX'),
 ((b'\\HasNoChildren',), b'.', 'INBOX.INBOX.Envoy&AOk-s'),
 ((b'\\HasNoChildren',), b'.', 'INBOX.A&ImADsQ-'),

# Set autoreload and revert to original code 
In [7]: %load_ext autoreload
In [8]: %autoreload 2
In [9]: conn = IMAPClient('...')

In [11]: conn.list_folders()
Out[11]: 
[((b'\\HasChildren',), b'.', 'INBOX'),
 ((b'\\HasNoChildren',), b'.', 'INBOX.INBOX.Envoyés'),
 ((b'\\HasNoChildren',), b'.', 'INBOX.A≠α'),
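
The behaviour in the session above can be reproduced without an IMAP connection. The following is a minimal sketch using only the built-in `utf-7` codec; the variable names are illustrative and the byte strings are copied from the output above. It suggests that the built-in codec treats "&" as an ordinary character, which would explain why the folder names came back undecoded.

```python
# Folder name as it appeared on the wire in the session above.
wire_name = b"INBOX.A&ImADsQ-"

# The built-in codec only recognises "+" as the shift character,
# so "&ImADsQ-" passes through unchanged.
as_stdlib_sees_it = wire_name.decode("utf-7")

# The same base64 payload decodes once it is introduced by "+".
with_plus_shift = b"INBOX.A+ImADsQ-".decode("utf-7")

# Encoding has the mirror-image problem: the built-in codec emits "+".
stdlib_encoded = "INBOX.A≠α".encode("utf-7")

print(as_stdlib_sees_it)
print(with_plus_shift)
print(stdlib_encoded)
```

Swapping "+" for "&" by hand is not a general fix, which matches the suspicion that it is not as simple as that.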
@NicolasLM

NicolasLM Oct 6, 2017

Collaborator

You're right, IMAP uses a modified version of the utf-7 encoding: https://tools.ietf.org/html/rfc3501#section-5.1.3
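
For reference, the rules in that section are roughly: printable US-ASCII characters represent themselves, a literal "&" is written as "&-", and everything else is sent as base64 of the UTF-16BE code units, using "," instead of "/", without "=" padding, introduced by "&" and terminated by "-". Below is a minimal sketch of those rules. It is not the imapclient or Twisted implementation, and the function names `encode_mutf7` and `decode_mutf7` are hypothetical.

```python
import base64


def encode_mutf7(text: str) -> bytes:
    """Encode text as IMAP modified UTF-7 (sketch of RFC 3501, section 5.1.3)."""
    out = bytearray()
    pending = []  # run of characters that must go through base64

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            # Modified base64: no "=" padding, "," instead of "/".
            b64 = base64.b64encode(raw).rstrip(b"=").replace(b"/", b",")
            out.extend(b"&" + b64 + b"-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:  # printable US-ASCII
            flush()
            out.extend(b"&-" if ch == "&" else ch.encode("ascii"))
        else:
            pending.append(ch)
    flush()
    return bytes(out)


def decode_mutf7(data: bytes) -> str:
    """Decode IMAP modified UTF-7; raises ValueError on an unterminated shift."""
    parts = []
    i = 0
    while i < len(data):
        if data[i:i + 1] != b"&":
            parts.append(data[i:i + 1].decode("ascii"))
            i += 1
            continue
        end = data.index(b"-", i)  # "-" closes the shifted run
        chunk = data[i + 1:end]
        if not chunk:
            parts.append("&")  # "&-" is a literal ampersand
        else:
            b64 = chunk.replace(b",", b"/")
            b64 += b"=" * (-len(b64) % 4)  # restore the stripped padding
            parts.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(parts)


print(encode_mutf7("A≠α"))
print(decode_mutf7(b"INBOX.INBOX.Envoy&AOk-s"))
```

The two printed values can be compared against the `imap_utf7.encode` output and the `list_folders()` results quoted earlier in this thread. The sketch does no validation beyond what `base64` and the `utf-16-be` codec already do, so malformed input is not handled gracefully.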
