◆? character appears in the subject when it has multiple MIME lines #5760

Closed
kmuto opened this issue Oct 30, 2021 · 6 comments · Fixed by #5765
Labels
type: bug Something is causing incorrect behavior or errors

Comments

@kmuto

kmuto commented Oct 30, 2021

Describe the bug

A graphical character (◆ + ?) appears in the subject of some mails written in Japanese.

To Reproduce

Post a mail with subject like:

Subject: =?ISO-2022-JP?B?GyRCRnxLXDhsJEhGfEtcOGwkSEZ8S1w4bCROJUElJyVDGyhC?=
 =?ISO-2022-JP?B?GyRCJS8bKEI=?=

The current K-9 shows 日本語と日本語と日本語のチェッ◆ク in both tray list mode and mail content mode.

Expected behavior
Although the example subject contains a newline, it should be parsed as the single line
'日本語と日本語と日本語のチェック'
according to the MIME decoding rules for subjects. It looks like the current K-9 simply tries to display the (unexpected) newline character.

Screenshots
Tray mode
Screenshot_20211030-123812

Content mode has the same problem
Screenshot_20211030-123822

Environment (please complete the following information):

  • K-9 Mail version: 5.904
  • Android version: 12
  • Device: Google Pixel 6
  • Account type: IMAP

Additional context
This seems to be the same as #3622.
I'm not sure what was done by 'Fixed in master'. (Because I hadn't used K-9 mail since 2019, I didn't know it was fixed at that time.)

@kmuto kmuto added the type: bug Something is causing incorrect behavior or errors label Oct 30, 2021
@cketti
Member

cketti commented Oct 30, 2021

Works for me when copying the text from your post. It's possible the actual email contains additional invisible characters. Can you attach the unmodified source of such a message?

@kmuto
Author

kmuto commented Oct 30, 2021

Sure, attached.
issue5760-mail.zip

@cketti
Member

cketti commented Oct 31, 2021

Apologies. Decoding the subject worked fine when running the code on my computer. However, when running on an Android device I can reproduce the issue you're seeing.

The problem seems to be the change we made to support improperly encoded subjects (PR #2725). With the change we strip the Q- or B-encoding from the segments and then perform the character set decoding on the concatenated bytes.
ISO-2022-JP uses escape sequences to switch character sets. At the end of an encoded word (segment) the encoder has to insert an escape sequence to switch back to ASCII. So, when concatenated, the data reads like this:

[switch to JIS X 0208:1983] [some characters] [switch to ASCII] [switch to JIS X 0208:1983] [some characters] [switch to ASCII]

The switch to ASCII and then back to JIS X 0208:1983 is unnecessary. The charset decoder on the JVM doesn't care and decodes the data as expected. However, the decoder on Android does mind and inserts a replacement character �.
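To make that byte layout concrete, here is a minimal Python sketch (Python is used only for illustration; this is not K-9 Mail's code). It Base64-decodes the two encoded words from the report, concatenates the raw bytes, and labels the escape sequences. The helper name `describe` is hypothetical.

```python
import base64
import re

# The two Base64 payloads from the subject in the original report.
SEGMENTS = [
    "GyRCRnxLXDhsJEhGfEtcOGwkSEZ8S1w4bCROJUElJyVDGyhC",
    "GyRCJS8bKEI=",
]

# The ISO-2022-JP escape sequences that occur in this example.
ESCAPES = {
    b"\x1b$B": "[switch to JIS X 0208:1983]",
    b"\x1b(B": "[switch to ASCII]",
}


def describe(data: bytes) -> str:
    """Label each escape sequence and summarize the bytes between them."""
    parts = re.split(rb"(\x1b\$B|\x1b\(B)", data)
    labels = []
    for part in parts:
        if part in ESCAPES:
            labels.append(ESCAPES[part])
        elif part:  # skip the empty strings re.split leaves between separators
            labels.append(f"[{len(part)} bytes]")
    return " ".join(labels)


# "Combine segments" step: strip the B-encoding, then join the raw bytes.
combined = base64.b64decode(SEGMENTS[0]) + base64.b64decode(SEGMENTS[1])
print(describe(combined))
```

The output contains `[switch to ASCII] [switch to JIS X 0208:1983]` back to back in the middle, which is the redundant pair described above.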

The proper way of decoding the data is to completely decode the individual segments and then concatenate the decoded text. However, many email clients (including old versions of K-9 Mail) mess up the encoding and require the decoding we added with PR #2725.
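As a rough illustration of the "decode each segment completely, then concatenate the text" approach, here is a small Python sketch. It is an assumption-laden example, not K-9 Mail's implementation: the names `decode_word` and `decode_header_per_word` are hypothetical, and the regular expression only covers simple encoded words of the form `=?charset?B-or-Q?payload?=`.

```python
import base64
import binascii
import re

# One encoded word: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_word(charset: str, encoding: str, payload: str) -> str:
    """Fully decode a single encoded word: transfer encoding first, then charset."""
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # header=True makes "_" decode to a space, as Q-encoding requires.
        raw = binascii.a2b_qp(payload.encode("ascii"), header=True)
    return raw.decode(charset, errors="replace")


def decode_header_per_word(value: str) -> str:
    """Decode every encoded word on its own and join the resulting text."""
    # Whitespace (including a folded newline) between two adjacent
    # encoded words is not part of the text, so drop it first.
    value = re.sub(r"(\?=)\s+(=\?)", r"\1\2", value)
    return ENCODED_WORD.sub(lambda m: decode_word(*m.groups()), value)


SUBJECT = (
    "=?ISO-2022-JP?B?GyRCRnxLXDhsJEhGfEtcOGwkSEZ8S1w4bCROJUElJyVDGyhC?=\r\n"
    " =?ISO-2022-JP?B?GyRCJS8bKEI=?="
)
print(decode_header_per_word(SUBJECT))
```

Because each segment starts and ends in a well-defined state, the charset decoder never sees the back-to-back escape sequences that the concatenated form contains.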

We'll have to figure out a way to choose which decoding method to use. Maybe assume the text is properly encoded and use the "one segment at a time" approach. And when that leads to a replacement character being present in the output, try the "combine segments, then decode" method.
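The fallback idea floated here could look roughly like the following Python sketch. This only illustrates the heuristic as described in this comment; it is not the fix that was eventually merged. The function name `decode_b_words` is hypothetical, and the sketch assumes the header consists solely of B-encoded words that all use the same charset.

```python
import base64
import re

# Only B-encoded words are handled in this sketch.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[Bb]\?([^?\s]*)\?=")
REPLACEMENT = "\ufffd"


def decode_b_words(value: str) -> str:
    """Try per-segment decoding first; fall back to combine-then-decode."""
    words = [
        (m.group(1), base64.b64decode(m.group(2)))
        for m in ENCODED_WORD.finditer(value)
    ]

    # Attempt 1: assume proper encoding and decode one segment at a time.
    per_segment = "".join(raw.decode(cs, errors="replace") for cs, raw in words)
    if REPLACEMENT not in per_segment:
        return per_segment

    # Attempt 2: a character was probably split across segments, so
    # combine the raw bytes and run the charset decoder once.
    charset = words[0][0]  # assumption: every word uses the same charset
    combined = b"".join(raw for _, raw in words)
    return combined.decode(charset, errors="replace")


# Improperly encoded: the two UTF-8 bytes of "é" are split across two words.
print(decode_b_words("=?UTF-8?B?ww==?= =?UTF-8?B?qQ==?="))
```

With this ordering, properly encoded ISO-2022-JP subjects like the one in this report never reach the combined path, while subjects that split a multi-byte character across segments still decode.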

Side note: If you have any control over the creation of such a message, please use UTF-8 instead of ISO-2022-JP.

@kmuto
Author

kmuto commented Oct 31, 2021

Thank you for your speedy investigation!
I hope it will be fixed in the next release.

> Side note: If you have any control over the creation of such a message, please use UTF-8 instead of ISO-2022-JP.

Well, ISO-2022-JP is still a common encoding for Japanese e-mail.
I can change the subject encoding to UTF-8 by modifying my own MUA, but I can't change the sender's MUA or system. 😢

@cketti
Member

cketti commented Nov 3, 2021

Thunderbird ran into the same problem: https://bugzilla.mozilla.org/show_bug.cgi?id=1374149

I'm adopting their fix for K-9 Mail.

@kmuto
Author

kmuto commented Nov 29, 2021

I confirmed this issue was fixed in v5.905. Thank you! 😄
