Honour "quoted-printable" encoding #39

Closed
stromvirvel opened this issue Oct 17, 2017 · 6 comments · Fixed by #41

Comments

@stromvirvel
Contributor

stromvirvel commented Oct 17, 2017

Problem overview

Some mail clients encode the message body as "quoted-printable". This ensures that non-ASCII characters such as ä, ö, é, ... (8-bit data) survive transmission over a 7-bit data path [1].

The sending client therefore replaces an "ä", for example, with "=E4". The receiving client should read the following header and, if it is set to quoted-printable, decode such sequences back into 8-bit data:
Content-Transfer-Encoding: quoted-printable
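
To make the round trip concrete, here is a minimal sketch using Python's standard quopri module. It assumes the body is ISO-8859-1 (Latin-1), where "ä" is the single byte 0xE4; with UTF-8 the same character would become two encoded bytes instead.

```python
import quopri

# Assumption: the mail body uses ISO-8859-1, where "ä" is the byte 0xE4.
raw = "ä".encode("latin-1")

# What the sending client puts on the wire (7-bit safe).
encoded = quopri.encodestring(raw)

# What the receiving client should do when the header says quoted-printable.
decoded = quopri.decodestring(encoded)

print(encoded)                    # the quoted-printable form
print(decoded.decode("latin-1"))  # the original character again
```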

Zeyple encrypts the still-encoded data "=E4" and then removes this important header:

del out_message["Content-Transfer-Encoding"]

If you receive such an email, you only see encoded data (like "=E4" instead of "ä").
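
The effect of dropping the header can be sketched with the standard email package alone. This is only an illustration of the information loss, not Zeyple's actual code path; the sample message is made up for the example.

```python
from email import message_from_string

# Hypothetical single-part mail, similar to what the "mail" tool produces.
raw_mail = (
    "Subject: test\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=E4\n"
)
msg = message_from_string(raw_mail)

# While the header is present, the payload can still be decoded to 8-bit data.
with_header = msg.get_payload(decode=True)

# Once the header is gone, "=E4" is indistinguishable from literal text.
del msg["Content-Transfer-Encoding"]
without_header = msg.get_payload(decode=True)

print(with_header)
print(without_header)
```

Whatever gets encrypted after the header is removed is the literal text "=E4", so the recipient's client has no way to know it should be decoded.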

Reproduce

You can reproduce this behaviour with the "mail" tool (installed by running yum install -y mailx on CentOS 7).

Without encryption

Make sure you have removed the receiver's PGP public key and run:
echo "ä" | mail -S encoding=quoted-printable -s test -r sender@address.invalid receiver@address.invalid

Result

The mail client shows the char "ä" correctly. The raw mail looks like:

[... omitted ...]
Content-Transfer-Encoding: quoted-printable
[... omitted ...]

=E4

Expected result

This is the expected result.

With encryption

Make sure you have installed the receiver's PGP public key and run:
echo "ä" | mail -S encoding=quoted-printable -s test -r sender@address.invalid receiver@address.invalid

Result

The mail client shows the char "ä" incorrectly as "=E4".

The Content-Transfer-Encoding header isn't set in the resulting mail. It doesn't actually have to be set, though, because the payload is encrypted anyway, and I think setting this header is useless for encrypted data.

Expected result

The mail client should show "ä" instead of the encoded "=E4".

Suggested solution

Zeyple should act like a mail client and interpret incoming mail the way a mail client would. Therefore, zeyple should honour the Content-Transfer-Encoding header. If it is set to quoted-printable, zeyple should decode the payload before it encrypts the payload. This could be done using the quopri module [2]. Afterwards, the header can be removed.
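
A rough sketch of that idea follows. The helper name payload_for_encryption is hypothetical and not part of Zeyple; the sketch assumes a single-part message whose quoted-printable body is pure ASCII on the wire, and it leaves the encryption step itself out.

```python
import quopri
from email import message_from_string


def payload_for_encryption(message):
    """Return the body to encrypt, undoing quoted-printable if declared.

    Hypothetical helper: handles only single-part messages.
    """
    payload = message.get_payload()
    cte = message.get("Content-Transfer-Encoding", "").lower()
    if cte != "quoted-printable":
        return payload  # nothing to undo, hand the body back unchanged
    # A quoted-printable body is ASCII by definition, so this is safe.
    decoded = quopri.decodestring(payload.encode("ascii"))
    # The header no longer describes the data, so it can be removed now.
    del message["Content-Transfer-Encoding"]
    return decoded


msg = message_from_string(
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=E4\n"
)
data = payload_for_encryption(msg)
print(data)
```

The important ordering is decode first, delete the header second, encrypt last, so the plaintext inside the encrypted part is already 8-bit data.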

Maybe I'll file a pull request with a code suggestion later.

[1] https://en.wikipedia.org/wiki/Quoted-printable
[2] https://docs.python.org/2/library/quopri.html

@infertux
Owner

infertux commented Oct 17, 2017

Thanks a lot for the thorough writeup. Looking at the comment above that line, it seems we're deleting the header to work around a bug in Thunderbird. IIRC this used to crash Thunderbird a while ago. I would suggest simply removing this line, thereby keeping the header intact, and then checking whether Thunderbird has been fixed. Do you use Thunderbird by any chance?

@stromvirvel
Contributor Author

stromvirvel commented Oct 17, 2017

Simply leaving the header in place doesn't help either, as it would then apply to the encrypted text in the mail payload. Encrypted data never has umlauts in it. We really need to decode the quoted-printable text, re-encode it as UTF-8, and then encrypt.

I'm currently using Apple Mail :-)

@infertux
Owner

Hmm okay I don't see why we need to decode the quoted-printable text though. Is this how Thunderbird and Apple Mail handle it? Sorry I can't investigate that right now.

On a side note, I'd like to avoid adding dependencies as much as possible, especially ones that aren't compatible with Python 3.

@stromvirvel
Contributor Author

I may be wrong, but if you simply encrypt a string =E4, how could gpg know that =E4 is encoded? The header only applies to the mail client, but not to gpg (AFAIK).

But I have to investigate further. I'll test whether simply leaving the header in place would actually help, and come back to you later.

@stromvirvel
Contributor Author

I can confirm that leaving the header Content-Transfer-Encoding: quoted-printable in place doesn't fix this problem. It doesn't even help when I add the same header to the attached inline file "encrypted.asc".

I'll write a proof of concept (hopefully without adding dependencies ;-)) and come back to you later.

stromvirvel added a commit to stromvirvel/zeyple that referenced this issue Oct 17, 2017
This fixes infertux#39. Payload is decoded according to the Content-Transfer-Encoding
header. The issue did not occur when a multipart message was sent.
@stromvirvel
Contributor Author

Okay, this was easier than I thought 😃

See:
https://docs.python.org/2/library/email.message.html?highlight=quoted%20printable#email.message.Message.get_payload

Optional decode is a flag indicating whether the payload should be decoded or not, according to the Content-Transfer-Encoding header.

I only had to set decode=True in get_payload(). Additionally, I cannot reproduce the same problem with a multipart message. I guess this is because of the different way you are getting the payload.

However, I could not find a corresponding flag in the Python 3 documentation:
https://docs.python.org/3/library/email.message.html

Can you test this code with py3 (I don't understand your python test file for now)?
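
For reference, a small sketch of the decode flag using only the standard library. As far as I can tell, get_payload(decode=True) still exists in Python 3; the legacy Message class is documented on the separate email.compat32-message page rather than the page linked above. The helper decoded_parts is hypothetical and only illustrates the call; it is not the code from the referenced commit.

```python
from email import message_from_string


def decoded_parts(message):
    """Yield (content_type, bytes) for every non-container part.

    Hypothetical helper for illustration only.
    """
    for part in message.walk():
        if part.is_multipart():
            # get_payload(decode=True) returns None for multipart containers.
            continue
        yield part.get_content_type(), part.get_payload(decode=True)


single = message_from_string(
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "=E4\n"
)
parts = list(decoded_parts(single))
print(parts)
```

Note that decode=True returns bytes, so the charset from Content-Type is still needed to turn the result back into text.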
