Support for malformed unstructured fields containing encoded words #29

Closed
BryanLeong opened this issue Aug 29, 2022 · 10 comments

Comments

@BryanLeong
Contributor

I've come across a number of emails where the subject, which contains encoded words, was modified by the recipients' mail server such that the final subject became something like:

[SUSPECTED SPAM]=?utf-8?B?VGhpcyBpcyB0aGUgb3JpZ2luYWwgc3ViamVjdA==?=

I understand this does not get decoded because there is no whitespace before the encoded word, as required by the spec (RFC 2047, section 5):

Ordinary ASCII text and 'encoded-word's may appear together in the
same header field. However, an 'encoded-word' that appears in a
header field defined as '*text' MUST be separated from any adjacent
'encoded-word' or 'text' by 'linear-white-space'.

Would it be possible to add support for parsing these types of malformed fields, seeing as mail servers which do this are relatively common?
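
To make the request concrete, here is a minimal sketch of what lenient handling could look like. It is not the library's implementation: `base64_decode`, `parse_encoded_word` and `decode_lenient` are hypothetical helpers written for this illustration, they only handle the UTF-8 charset with "B" encoding, and they use nothing outside the Rust standard library. The idea is simply to look for an encoded word at every position instead of only after whitespace.

```rust
/// Decode standard base64; returns None if a character is outside the alphabet.
/// Hypothetical helper, written by hand to keep the sketch dependency-free.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for b in input.bytes() {
        let v = match b {
            b'A'..=b'Z' => b - b'A',
            b'a'..=b'z' => b - b'a' + 26,
            b'0'..=b'9' => b - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding marks the end of the data
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1u32 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Try to read one encoded-word at the very start of `s`.
/// Returns the decoded text and the number of bytes consumed.
/// Assumption: only charset "utf-8" and encoding "B" are supported here.
fn parse_encoded_word(s: &str) -> Option<(String, usize)> {
    let rest = s.strip_prefix("=?")?;
    let (charset, rest) = rest.split_once('?')?;
    let (encoding, rest) = rest.split_once('?')?;
    let end = rest.find("?=")?;
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("b") {
        return None;
    }
    let text = String::from_utf8(base64_decode(&rest[..end])?).ok()?;
    // "=?" + charset + "?" + encoding + "?" + payload + "?="
    let consumed = 2 + charset.len() + 1 + encoding.len() + 1 + end + 2;
    Some((text, consumed))
}

/// Decode encoded-words wherever they start, even when they are glued to the
/// preceding text without the whitespace that RFC 2047 requires.
fn decode_lenient(header: &str) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < header.len() {
        let rest = &header[i..];
        if let Some((text, used)) = parse_encoded_word(rest) {
            out.push_str(&text);
            i += used;
        } else if let Some(ch) = rest.chars().next() {
            // Not an encoded-word here: copy one character through unchanged.
            out.push(ch);
            i += ch.len_utf8();
        }
    }
    out
}
```

With these definitions, passing the subject above to `decode_lenient` should give `[SUSPECTED SPAM]This is the original subject`. The sketch leaves out things a real parser needs, such as other charsets, "Q" encoding, and dropping the whitespace between two adjacent encoded words.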

@mdecimus
Member

Sure, the library already handles different types of malformed messages so it's a good idea to support broken encoded headers. Do you have other samples of broken encoded headers? Or is it mostly missing spaces between the prefix and the beginning of the encoded text?

@BryanLeong
Contributor Author

missing spaces between the prefix and the beginning of the encoded text

Mostly ☝🏼, but also headers with empty encoded words, e.g. Some text here =?utf-8?Q??=

@mdecimus
Member

Thanks, I'll look into it shortly.

@mdecimus
Member

mdecimus commented Sep 8, 2022

Hi, the version now on master has support for malformed unstructured fields. I have tested it with the samples you provided but could you confirm it works well before I publish it to crates.io? Thanks.

@BryanLeong
Contributor Author

Thanks! Confirmed that it fixes the issues mentioned above.

I found a couple more broken ones though. This time they seem to be caused by newlines in the middle of the encoded words 😅

Subject: =?utf-8?Q?Hello
 _there!?=

Expected: Hello there!
Actual: =?utf-8?Q?Hello _there!?=

@mdecimus
Member

mdecimus commented Sep 9, 2022

Done! I just pushed the fix which also supports multi-line base64 encoded words.

@BryanLeong
Contributor Author

Small issue: the folding whitespace gets included in the output instead of being removed together with the newline.
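
To spell out the expected behaviour, here is a rough sketch of how a folded "Q" encoded word could be decoded so that both the newline and the folding whitespace disappear. It is only an illustration, not the code that was pushed: `strip_folds`, `q_decode` and `decode_folded_q_word` are hypothetical names, only the UTF-8 charset is handled, and the input is assumed to be exactly one encoded word.

```rust
/// Remove each line break together with the folding whitespace that follows it.
fn strip_folds(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' || c == '\n' {
            // Skip the rest of the line break and any spaces or tabs after it.
            while matches!(chars.peek().copied(), Some('\r' | '\n' | ' ' | '\t')) {
                chars.next();
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Decode the RFC 2047 "Q" encoding: '_' is a space, "=XX" is a hex byte.
fn q_decode(payload: &str) -> Option<Vec<u8>> {
    let bytes = payload.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'_' => out.push(b' '),
            b'=' => {
                // Expect exactly two hex digits after '='.
                let hex = payload.get(i + 1..i + 3)?;
                if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 2;
            }
            other => out.push(other),
        }
        i += 1;
    }
    Some(out)
}

/// Decode a single UTF-8 "Q" encoded-word that may be split across lines.
/// Assumption: `word` holds one encoded-word and nothing else.
fn decode_folded_q_word(word: &str) -> Option<String> {
    let unfolded = strip_folds(word);
    let rest = unfolded.strip_prefix("=?")?;
    let (charset, rest) = rest.split_once('?')?;
    let (encoding, rest) = rest.split_once('?')?;
    let payload = rest.strip_suffix("?=")?;
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("q") {
        return None;
    }
    String::from_utf8(q_decode(payload)?).ok()
}
```

For the folded subject reported earlier in this thread, `decode_folded_q_word("=?utf-8?Q?Hello\n _there!?=")` should return `Hello there!`, and the empty word `=?utf-8?Q??=` should decode to an empty string. A real parser would strip folds only inside an encoded word, since folds in ordinary header text normally unfold to a space.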

@mdecimus
Member

mdecimus commented Sep 9, 2022

Just fixed it.

@BryanLeong
Contributor Author

All good now. Thanks! 🙏🏼

@mdecimus
Member

mdecimus commented Sep 9, 2022

Perfect, just published version 0.6.1 to crates.io.

@mdecimus mdecimus closed this as completed Sep 9, 2022