`◆?` character appears in the subject when it has multiple MIME lines #5760

kmuto · 2021-10-30T04:10:37Z

Describe the bug

A graphical character (◆ + ?) appears in the subject of some emails written in Japanese.

To Reproduce

Send an email with a subject like:

Subject: =?ISO-2022-JP?B?GyRCRnxLXDhsJEhGfEtcOGwkSEZ8S1w4bCROJUElJyVDGyhC?=
 =?ISO-2022-JP?B?GyRCJS8bKEI=?=

The current K-9 Mail shows 日本語と日本語と日本語のチェッ◆ク in both tray list mode and mail content mode.

Expected behavior
Although the example subject contains a line break, it should be decoded as the single line
'日本語と日本語と日本語のチェック'
according to the MIME decoding rules for subjects. It looks like the current K-9 simply tries to display the (unexpected) newline character.

Screenshots
Tray mode

Content mode has the same problem.

Environment

K-9 Mail version: 5.904
Android version: 12
Device: Google Pixel 6
Account type: IMAP

Additional context
It seems to be the same as #3622.
I'm not sure what the 'Fixed in master' comment there refers to. (I hadn't used K-9 Mail since 2019, so I didn't know it had been fixed at that time.)


cketti · 2021-10-30T15:30:35Z

Works for me when copying the text from your post. It's possible the actual email contains additional invisible characters. Can you attach the unmodified source of such a message?

kmuto · 2021-10-30T23:59:45Z

Sure, attached.
issue5760-mail.zip

cketti · 2021-10-31T03:52:32Z

Apologies. Decoding the subject worked fine when running the code on my computer. However, when running on an Android device I can reproduce the issue you're seeing.

The problem seems to be the change we made to support improperly encoded subjects (PR #2725). With the change we strip the Q- or B-encoding from the segments and then perform the character set decoding on the concatenated bytes.
ISO-2022-JP uses escape sequences to switch character sets. At the end of an encoded word (segment) the encoder has to insert an escape sequence to switch back to ASCII. So, when concatenated, the data reads like this:

[switch to JIS X 0208:1983] [some characters] [switch to ASCII] [switch to JIS X 0208:1983] [some characters] [switch to ASCII]

The switch to ASCII and then back to JIS X 0208:1983 is unnecessary. The charset decoder on the JVM doesn't care and decodes the data as expected. However, the decoder on Android does mind and inserts a replacement character �.
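To make the seam visible, the two B-encoded payloads from the reporter's subject can be decoded by hand. The following Python sketch is illustrative only (K-9 Mail is not written in Python, and the constant names are hypothetical): it strips the Base64 layer and checks that each segment opens with ESC $ B (switch to JIS X 0208:1983) and closes with ESC ( B (switch to ASCII). It uses CPython's own ISO-2022-JP codec, so it shows the bytes involved, not Android's decoder behavior.

```python
import base64

# Payloads of the two encoded words in the Subject header from this issue
# (the text between "?B?" and "?=").
first = base64.b64decode("GyRCRnxLXDhsJEhGfEtcOGwkSEZ8S1w4bCROJUElJyVDGyhC")
second = base64.b64decode("GyRCJS8bKEI=")

TO_JIS_X_0208_1983 = b"\x1b$B"  # ESC $ B
TO_ASCII = b"\x1b(B"            # ESC ( B

# Each segment is complete on its own: it switches to JIS X 0208:1983,
# carries some characters, and switches back to ASCII before it ends.
for segment in (first, second):
    assert segment.startswith(TO_JIS_X_0208_1983) and segment.endswith(TO_ASCII)
    print(segment.decode("iso-2022-jp"))  # 日本語と日本語と日本語のチェッ, then ク

# Concatenating the raw bytes leaves a redundant "switch to ASCII, then
# immediately back to JIS X 0208:1983" pair at the seam between the segments.
combined = first + second
seam = combined.find(TO_ASCII + TO_JIS_X_0208_1983)
print(seam == len(first) - len(TO_ASCII))  # True
```

Note that the last character of the subject, ク, lives entirely in the second segment, which is why the replacement character shows up directly before it.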

The proper way of decoding the data is to completely decode the individual segments and then concatenate the decoded text. However, many email clients (including old versions of K-9 Mail) mess up the encoding and require the decoding we added with PR #2725.

We'll have to figure out a way to pick which decoding method to use. Maybe assume the text is properly encoded and use the "one segment at a time" approach, and when that leads to a replacement character in the output, try the "combine segments, then decode" method.
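A minimal Python sketch of that proposal, assuming: only B and Q encodings, Python codec names for charsets, and U+FFFD in the output treated as the signal that per-word decoding failed. All function names are hypothetical; this is not K-9 Mail's code. Because CPython's ISO-2022-JP codec does not reproduce the Android failure, the fallback path is exercised with a different malformation: a UTF-8 character split across two encoded words, which RFC 2047 forbids. Whether that matches what PR #2725 handled is an assumption.

```python
import base64
import binascii
import re

# RFC 2047 encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")
REPLACEMENT_CHAR = "\ufffd"


def _payload(encoding, text):
    """Strip the B- or Q-encoding from one encoded word, returning raw bytes."""
    if encoding.upper() == "B":
        return base64.b64decode(text)
    return binascii.a2b_qp(text, header=True)  # header=True maps "_" to space


def tokenize(header):
    """Split a header value into plain-text strings and (charset, bytes) tuples."""
    header = re.sub(r"\r?\n(?=[ \t])", "", header)  # unfold folded lines
    tokens, pos = [], 0
    for m in ENCODED_WORD.finditer(header):
        gap = header[pos:m.start()]
        # Whitespace between two adjacent encoded words is not part of the text.
        if gap and not (gap.isspace() and tokens and isinstance(tokens[-1], tuple)):
            tokens.append(gap)
        tokens.append((m.group(1).lower(), _payload(m.group(2), m.group(3))))
        pos = m.end()
    if pos < len(header):
        tokens.append(header[pos:])
    return tokens


def decode_per_word(tokens):
    """Proper RFC 2047 decoding: decode every encoded word on its own."""
    return "".join(
        t if isinstance(t, str) else t[1].decode(t[0], errors="replace")
        for t in tokens
    )


def decode_combined(tokens):
    """Lenient decoding: join the bytes of adjacent same-charset words first."""
    out, charset, buf = [], None, b""
    for t in tokens + [""]:  # the trailing "" flushes the last run
        if isinstance(t, tuple) and t[0] == charset:
            buf += t[1]
            continue
        if charset is not None:
            out.append(buf.decode(charset, errors="replace"))
        if isinstance(t, tuple):
            charset, buf = t
        else:
            charset, buf = None, b""
            out.append(t)
    return "".join(out)


def decode_subject(header):
    """Per-word decoding first; combined decoding only if that yields U+FFFD."""
    tokens = tokenize(header)
    text = decode_per_word(tokens)
    if REPLACEMENT_CHAR in text:
        text = decode_combined(tokens)
    return text


subject = (
    "=?ISO-2022-JP?B?GyRCRnxLXDhsJEhGfEtcOGwkSEZ8S1w4bCROJUElJyVDGyhC?=\r\n"
    " =?ISO-2022-JP?B?GyRCJS8bKEI=?="
)
print(decode_subject(subject))  # 日本語と日本語と日本語のチェック
# A UTF-8 "ü" (C3 BC) split across two encoded words:
print(decode_subject("=?UTF-8?B?ww==?= =?UTF-8?B?vA==?="))  # ü
```

Checking for U+FFFD is a heuristic: a subject that legitimately contains that character would trigger the fallback needlessly. As the later comments note, K-9 Mail ultimately adopted Thunderbird's fix instead, which this sketch does not reproduce.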

Side note: If you have any control over the creation of such a message, please use UTF-8 instead of ISO-2022-JP.

kmuto · 2021-10-31T04:21:01Z

Thank you for your speedy investigation!
I hope it will be fixed in the next release.

> Side note: If you have any control over the creation of such a message, please use UTF-8 instead of ISO-2022-JP.

Well, ISO-2022-JP is still a common encoding for Japanese email.
I can change the subject encoding to UTF-8 by modifying my own MUA, but I can't change the sender's MUA or system. 😢

cketti · 2021-11-03T14:14:16Z

Thunderbird ran into the same problem: https://bugzilla.mozilla.org/show_bug.cgi?id=1374149

I'm adopting their fix for K-9 Mail.

kmuto · 2021-11-29T08:06:42Z

I confirmed this issue was fixed in v5.905. Thank you! 😄

kmuto added the type: bug label Oct 30, 2021

cketti mentioned this issue Nov 3, 2021

Properly decode multiple encoded-words using ISO-2022-JP #5765 (Merged)

cketti closed this as completed in #5765 Nov 15, 2021

cketti mentioned this issue Dec 3, 2021

Subject MIME-encoded into multiple lines contains garbage string #5804 (Closed)
