MailHeader::get_value is adding spaces where it should not #19
Comments
Hm, it looks like in your example there is a space before the newline and a tab after the newline. So we keep the space before the newline, and collapse the newline/tab into another space. This results in two spaces. So the behaviour matches what the [...]. That being said, it's possible that this behaviour is wrong and that it should collapse all the whitespace surrounding the newline instead of just the whitespace following it. I'll take another look at the RFC.
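To make the two-space result concrete, here is a minimal sketch of the behaviour described above. It is not mailparse's actual code; `unfold_naive` is a hypothetical helper that replaces each line break plus the whitespace after it with one space, and leaves whitespace before the line break alone.

```rust
/// Hypothetical helper, not mailparse's implementation: collapse each line
/// break and the whitespace that follows it into a single space, leaving any
/// whitespace before the line break untouched.
fn unfold_naive(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' || c == '\n' {
            // Swallow the rest of the line break and the folding whitespace.
            while let Some(&next) = chars.peek() {
                if matches!(next, '\r' | '\n' | ' ' | '\t') {
                    chars.next();
                } else {
                    break;
                }
            }
            // Stand-in for the swallowed sequence.
            out.push(' ');
        } else {
            // Everything else, including a space before the break, is kept.
            out.push(c);
        }
    }
    out
}
```

With the input `"foo \r\n\tbar"`, the space after `foo` is copied through and the `\r\n\t` sequence becomes a second space, so the result contains two spaces between `foo` and `bar`.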
https://tools.ietf.org/html/rfc822#section-3.1.1 says:
OK, so I think we should add a state to the state machine so that no space is added if the character followed by the (CR)LF is the start of an LWSP-char. This would fix the issue, and the change will be small (plus a test for it). I can try a PR.
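A rough sketch of what such an extra state could look like, assuming the intent is to skip the inserted space when whitespace already precedes the line break. The `State` enum and `unfold_collapsing` are hypothetical names for illustration; this is neither the library's state machine nor the change that was eventually merged.

```rust
/// Hypothetical states for an unfolding pass over a raw header value.
#[derive(Clone, Copy)]
enum State {
    Text,           // last emitted char was not whitespace
    AfterSpace,     // last emitted char was a space or tab
    Fold,           // inside a fold; one space is pending
    FoldAfterSpace, // inside a fold that follows whitespace; nothing pending
}

fn unfold_collapsing(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut state = State::Text;
    for c in raw.chars() {
        state = match (state, c) {
            // A line break starts a fold; remember whether whitespace preceded it.
            (State::Text, '\r' | '\n') => State::Fold,
            (State::AfterSpace, '\r' | '\n') => State::FoldAfterSpace,
            // Inside a fold, swallow the rest of the break and folding whitespace.
            (State::Fold | State::FoldAfterSpace, '\r' | '\n' | ' ' | '\t') => state,
            // First real character after a fold: emit the pending space, if any.
            (State::Fold, _) => {
                out.push(' ');
                out.push(c);
                State::Text
            }
            (State::FoldAfterSpace, _) => {
                out.push(c);
                State::Text
            }
            // Outside a fold, copy characters through and track trailing whitespace.
            (_, ' ' | '\t') => {
                out.push(c);
                State::AfterSpace
            }
            (_, _) => {
                out.push(c);
                State::Text
            }
        };
    }
    out
}
```

Here `"foo \r\n\tbar"` unfolds to `foo bar` with a single space, while `"foo\r\n bar"` still gets the one space it needs. A side effect of this sketch is that a fold at the very end of the input leaves no trailing space.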
@MicroJoe Any progress with the PR?
Hey, I was waiting to know if the proposed fix would be fine. I will reserve some time in order to make a PR draft then.
Fixed by #21.
Hello,
The comment above the get_value function says that the parser should get rid of the extra whitespace introduced by MIME for multiline headers. However, I think there is a problem in the implementation. Parsing around 2000 mails with this nice library led me to this problem with a multiline UTF-8 subject:
And the output is:
An extra space is added where it should not be.
I can propose a patch if you tell me what to change. I am not very familiar with the RFC, but I think this is a bug because a lot of the mails I parse have this extra-whitespace problem in their subjects.