Support for malformed unstructured fields containing encoded words #29

BryanLeong · 2022-08-29T06:56:46Z

I've come across a number of emails where the subject, which contains encoded words, was modified by the recipients' mail server such that the final subject became something like:

[SUSPECTED SPAM]=?utf-8?B?VGhpcyBpcyB0aGUgb3JpZ2luYWwgc3ViamVjdA==?=

I understand this does not get decoded because it is missing the space before the encoded word that the spec (RFC 2047, section 5) requires:

Ordinary ASCII text and 'encoded-word's may appear together in the
same header field. However, an 'encoded-word' that appears in a
header field defined as '*text' MUST be separated from any adjacent
'encoded-word' or 'text' by 'linear-white-space'.

Would it be possible to add support for parsing these types of malformed fields, seeing as mail servers which do this are relatively common?
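To make the request concrete, here is a minimal sketch of a lenient decoder that recognises a `=?charset?B?payload?=` token wherever it appears, without requiring whitespace around it. It is not the library's implementation: `decode_unstructured`, `parse_encoded_word` and `base64_decode` are hypothetical names, only UTF-8 with the B encoding is handled, and the RFC 2047 rule about dropping whitespace between adjacent encoded words is left out.

```rust
/// Decodes standard base64, skipping padding and any byte outside the alphabet.
fn base64_decode(input: &str) -> Vec<u8> {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for byte in input.bytes() {
        if let Some(value) = ALPHABET.iter().position(|&c| c == byte) {
            acc = (acc << 6) | value as u32;
            bits += 6;
            if bits >= 8 {
                bits -= 8;
                out.push((acc >> bits) as u8);
                acc &= (1u32 << bits) - 1;
            }
        }
    }
    out
}

/// Parses one B-encoded UTF-8 word at the start of `s`.
/// Returns the decoded text and the length of the token in bytes.
fn parse_encoded_word(s: &str) -> Option<(String, usize)> {
    let body = s.strip_prefix("=?")?;
    let (charset, after) = body.split_once('?')?;
    let (encoding, after) = after.split_once('?')?;
    let end = after.find("?=")?;
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("b") {
        return None;
    }
    let bytes = base64_decode(&after[..end]);
    // "=?" + charset + "?" + encoding + "?" + payload + "?="
    let len = 2 + charset.len() + 1 + encoding.len() + 1 + end + 2;
    Some((String::from_utf8_lossy(&bytes).into_owned(), len))
}

/// Replaces every encoded word in `field` with its decoded text,
/// whether or not it is separated from the adjacent text by whitespace.
fn decode_unstructured(field: &str) -> String {
    let mut out = String::new();
    let mut rest = field;
    while let Some(start) = rest.find("=?") {
        match parse_encoded_word(&rest[start..]) {
            Some((text, len)) => {
                out.push_str(&rest[..start]);
                out.push_str(&text);
                rest = &rest[start + len..];
            }
            None => {
                // Not a decodable encoded word: keep the "=?" literally and move on.
                out.push_str(&rest[..start + 2]);
                rest = &rest[start + 2..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

With the subject above, `decode_unstructured("[SUSPECTED SPAM]=?utf-8?B?VGhpcyBpcyB0aGUgb3JpZ2luYWwgc3ViamVjdA==?=")` returns `[SUSPECTED SPAM]This is the original subject`.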

mdecimus · 2022-08-29T10:03:47Z

Sure, the library already handles different types of malformed messages so it's a good idea to support broken encoded headers. Do you have other samples of broken encoded headers? Or is it mostly missing spaces between the prefix and the beginning of the encoded text?

BryanLeong · 2022-08-29T11:06:07Z

missing spaces between the prefix and the beginning of the encoded text

Mostly ☝🏼, but also headers with empty encoded words, e.g.: Some text here =?utf-8?Q??=
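As an illustration of the empty case, the sketch below decodes a single Q-encoded word and treats an empty payload as an empty string rather than as an error. The names `decode_q`, `decode_q_word` and `hex_value` are hypothetical, only UTF-8 is handled, and this is an assumption about reasonable behaviour, not a description of what the library does.

```rust
/// Converts one ASCII hex digit to its value.
fn hex_value(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes a Q-encoded payload: `_` stands for a space and `=XX` is a hex-escaped byte.
/// A malformed escape is kept literally instead of being rejected.
fn decode_q(payload: &str) -> Vec<u8> {
    let bytes = payload.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let escaped = match (bytes[i], bytes.get(i + 1), bytes.get(i + 2)) {
            (b'=', Some(&hi), Some(&lo)) => hex_value(hi).zip(hex_value(lo)),
            _ => None,
        };
        match escaped {
            Some((hi, lo)) => {
                out.push(hi << 4 | lo);
                i += 3;
            }
            None => {
                out.push(if bytes[i] == b'_' { b' ' } else { bytes[i] });
                i += 1;
            }
        }
    }
    out
}

/// Decodes one complete `=?utf-8?Q?payload?=` token.
/// An empty payload yields `Some("")`; anything that is not a UTF-8 Q word yields `None`.
fn decode_q_word(word: &str) -> Option<String> {
    let inner = word.strip_prefix("=?")?.strip_suffix("?=")?;
    let mut parts = inner.splitn(3, '?');
    let (charset, encoding, payload) = (parts.next()?, parts.next()?, parts.next()?);
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("q") {
        return None;
    }
    Some(String::from_utf8_lossy(&decode_q(payload)).into_owned())
}
```

Here `decode_q_word("=?utf-8?Q??=")` returns `Some("")`, so under this sketch the sample header would keep its leading text and gain nothing from the empty word.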

mdecimus · 2022-08-29T11:08:47Z

Thanks, I'll look into it shortly.

mdecimus · 2022-09-08T16:00:43Z

Hi, the version now on master has support for malformed unstructured fields. I have tested it with the samples you provided, but could you confirm that it works well before I publish it to crates.io? Thanks.

BryanLeong · 2022-09-09T07:40:49Z

Thanks! Confirmed that it fixes the issues mentioned above.

I found a couple more broken ones, though. This time they seem to be caused by newlines in the middle of the encoded words 😅

Subject: =?utf-8?Q?Hello
 _there!?=

Expected: Hello there!
Actual: =?utf-8?Q?Hello _there!?=

mdecimus · 2022-09-09T08:09:04Z

Done! I just pushed the fix, which also supports multi-line Base64 encoded words.

BryanLeong · 2022-09-09T08:30:36Z

Small issue: the folding whitespace gets included in the output instead of being removed together with the newline.
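To spell out the behaviour being asked for, here is a small sketch that drops each line break inside an encoded word together with the whitespace that follows it. `unfold_encoded_word` is a hypothetical helper, not the library's code, and it is meant to be applied only to the span between `=?` and `?=`, since ordinary header unfolding keeps the whitespace after the line break.

```rust
/// Removes every line break inside a raw encoded word, along with the
/// folding whitespace (spaces and tabs) that follows it.
fn unfold_encoded_word(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' || c == '\n' {
            // Skip the rest of the line break and the folding whitespace after it.
            while matches!(chars.peek().copied(), Some('\r' | '\n' | ' ' | '\t')) {
                chars.next();
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

For the sample above, `unfold_encoded_word("=?utf-8?Q?Hello\n _there!?=")` returns `=?utf-8?Q?Hello_there!?=`, which then Q-decodes to the expected `Hello there!` with a single space.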

mdecimus · 2022-09-09T08:55:16Z

Just fixed it.

BryanLeong · 2022-09-09T09:09:26Z

All good now. Thanks! 🙏🏼

mdecimus · 2022-09-09T09:11:05Z

Perfect, just published version 0.6.1 to crates.io.

mdecimus added a commit that referenced this issue Sep 8, 2022

Support for malformed unstructured fields containing encoded words (#29)

c5ce06c

mdecimus closed this as completed Sep 9, 2022
