Fallback encoding for TextPart.Text #88

Closed
princeoffoods opened this issue Dec 11, 2014 · 12 comments
Labels
bug Something isn't working

Comments

@princeoffoods
Contributor

Hello Jeff,

When no charset is specified for a TextPart, MimeKit falls back to UTF-8 and, only if that throws, to iso-8859-1.

We frequently receive emails with a TextPart having 8bit Content-Transfer-Encoding and no charset parameter.
In our case we must always directly decode using iso-8859-1 (UTF8 encoding garbles the text).

I'm currently using a custom build falling back to iso-8859-1 only, which is working well for us.

Maybe this case is too special, but if other MimeKit users have the same problem, it might be helpful to be able to specify a default fallback encoding.

Just wanted to give some feedback.
Alex

@jstedfast
Owner

You shouldn't need to use a custom build, you can just use:

TextPart.GetText (Encoding charset)

The Text property is really just a convenience thing for the well-defined cases. In general, if text is not in the UTF-8 encoding, Encoding.UTF8.GetString() will fail - it won't garble the text. You must be getting (un)lucky with the latin1 text you are receiving in that it must just happen to have byte sequences that are valid UTF-8 and make sense in latin1.

@jstedfast
Owner

I've just added TextPart.GetText (string charset) as well, now, so it's easier to use.
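A minimal sketch of the suggestion above, assuming the two `GetText` overloads behave as described in this thread. The class and method names are hypothetical and exist only for illustration; 28591 is the code page for iso-8859-1.

```csharp
using System.Text;
using MimeKit;

// Hypothetical helper, not part of MimeKit.
static class TextPartDecoding
{
    // Decode the part as iso-8859-1 by passing an Encoding explicitly,
    // bypassing the UTF-8-first logic of the Text property.
    public static string DecodeAsLatin1 (TextPart part)
    {
        Encoding latin1 = Encoding.GetEncoding (28591); // iso-8859-1
        return part.GetText (latin1);
    }

    // The same, using the overload that takes a charset name.
    public static string DecodeAsLatin1ByName (TextPart part)
    {
        return part.GetText ("iso-8859-1");
    }
}
```

Either call keeps the choice of fallback encoding in the application, so no custom MimeKit build is needed.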

@princeoffoods
Contributor Author

Thanks for your comments. This means something else went wrong.

I'm storing received messages in a database to be processed later.

  1. Get MimeMessage using MailKit PopClient
  2. Get message bytes from stream using .WriteTo() to store them in a database.
    These bytes do not contain a UTF-8 marker.

Using MimeMessage.Load with the bytes from step 2 I get the garbled text problem.

The problem disappears after converting the bytes to UTF8 using Encoding.UTF8.GetBytes...
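For reference, the store-and-reload steps described above could look roughly like the following sketch. It assumes the message is kept as raw bytes and never passed through a string on the way to or from the database. `MessageStorage` and its method names are hypothetical; the POP3 download and the database access are left out.

```csharp
using System.IO;
using MimeKit;

// Hypothetical helper, not part of MimeKit or MailKit.
static class MessageStorage
{
    // Step 2: serialize the message to raw bytes for storage.
    public static byte[] ToBytes (MimeMessage message)
    {
        using (var stream = new MemoryStream ()) {
            message.WriteTo (stream);
            return stream.ToArray ();
        }
    }

    // Later: parse the stored bytes back into a MimeMessage.
    public static MimeMessage FromBytes (byte[] raw)
    {
        using (var stream = new MemoryStream (raw, false))
            return MimeMessage.Load (stream);
    }
}
```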

@jstedfast
Owner

When I added the new GetText() overloads, I simplified the Text property logic. It was previously converting the bytes from their native encoding into UTF-8 and then doing Encoding.UTF8.GetString () on the resulting buffer, so it was effectively forcing the text into UTF-8, which might have been why.

Could you test the latest version of the .Text property to see if that still happens?

@princeoffoods
Contributor Author

The behavior is still unchanged.

In TextPart.Text:

    if (encoding == null) {
        try {
            // NOTE: this does not throw and causes garbled text:
            return Encoding.UTF8.GetString (content, 0, (int) memory.Length);
        } catch {
            // NOTE: this is the code that works but does not get executed
            // fall back to iso-8859-1
            encoding = Encoding.GetEncoding (28591); // iso-8859-1
        }
    }

    return encoding.GetString (content, 0, (int) memory.Length);

@jstedfast
Owner

Can you give me an example of what you mean by "garbled"?

@princeoffoods
Contributor Author

All characters like äöüÖÜÄß are displayed as the replacement character (a question mark).

@jstedfast
Owner

Ah... ok, so I think the problem here is that Encoding.UTF8's default Decoder is configured to replace invalid byte sequences with that question mark character instead of throwing.
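The difference can be reproduced with nothing but `System.Text`. The sketch below is not the patch referenced in this issue (its contents are not shown in the thread); it only illustrates the behavior, assuming a `UTF8Encoding` constructed with `throwOnInvalidBytes: true` as the strict decoder. The class name `Utf8FallbackDemo` is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not MimeKit code.
static class Utf8FallbackDemo
{
    // Strict UTF-8: no BOM, throws on invalid byte sequences.
    static readonly Encoding StrictUtf8 = new UTF8Encoding (false, true);
    static readonly Encoding Latin1 = Encoding.GetEncoding (28591); // iso-8859-1

    // Try strict UTF-8 first; fall back to iso-8859-1 if the bytes are not valid UTF-8.
    public static string Decode (byte[] content)
    {
        try {
            return StrictUtf8.GetString (content, 0, content.Length);
        } catch (DecoderFallbackException) {
            return Latin1.GetString (content, 0, content.Length);
        }
    }

    public static void Main ()
    {
        // "äöüÖÜÄß" written with escapes so the source file encoding does not matter.
        string text = "\u00E4\u00F6\u00FC\u00D6\u00DC\u00C4\u00DF";
        byte[] latin1Bytes = Latin1.GetBytes (text);

        // Encoding.UTF8 does not throw here; it substitutes U+FFFD for each invalid sequence.
        string lenient = Encoding.UTF8.GetString (latin1Bytes);
        Console.WriteLine (lenient.IndexOf ('\uFFFD') >= 0); // True

        // The strict decoder throws, so the iso-8859-1 fallback is reached.
        Console.WriteLine (Decode (latin1Bytes) == text); // True
    }
}
```

With the lenient default, the `catch` branch in the `Text` property quoted earlier can never run, which matches the behavior reported above.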

jstedfast added a commit that referenced this issue Dec 12, 2014
@jstedfast
Owner

I think this patch will fix things up for you.

@princeoffoods
Contributor Author

Fixed. Thank you!

I would like to send you a holiday present from Germany. Are you into whisky? (not German whisky :)

@jstedfast
Owner

I don't drink, but thanks for the offer ;-)

The "thank you" is plenty to satisfy me.

@princeoffoods
Contributor Author

OK, but I'll add some extra flattery next time :)

@jstedfast jstedfast added the bug Something isn't working label Mar 10, 2015