Incoming message non-English characters are shown as ** #128
I am unable to reproduce this, either manually by sending emails directly, or via functional tests.
Considering this request: http://informatazyrtare.org/sq/request/special_characters
I note that the raw_email is correctly stored in encoded form: http://informatazyrtare.org/sq/admin/request/show_raw_email/28
If I download it (using the download link on that page), and then run:
The message is displayed correctly on my system (in the Holding Pen, since I don't have the original request).
Do you still get the asterisks if you do the same?
It's interesting to note that it's two asterisks for each single unicode character, which suggests an encoding issue to do with UTF8 -- like something is trying to convert it to ASCII.
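To illustrate why two asterisks per character points at UTF-8 being read as a single-byte charset, here is a minimal sketch. It is not elinks's actual code: `asciify_bytes` is a hypothetical helper that mimics a converter replacing every non-ASCII byte, and it assumes Ruby 1.9 or later (the thread itself concerns Ruby 1.8.7).

```ruby
# Hypothetical helper: treat the input as a stream of single bytes and
# replace every byte outside 7-bit ASCII with an asterisk.
def asciify_bytes(text)
  text.bytes.map { |byte| byte < 128 ? byte.chr : "*" }.join
end

sample = "Shqip\u00EBri"   # contains one non-ASCII character (U+00EB)

# U+00EB is encoded as two bytes in UTF-8, so it becomes two asterisks.
puts asciify_bytes(sample)  # prints "Shqip**ri"
```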
I am running Ruby 1.8.7,
I have downloaded the message and run cat .. ./script/mail, and it is still the same.
So, the problem is that elinks is used to generate the plain text email based on the HTML version. Elinks is receiving UTF8 but treating it as some single-byte character set, perhaps ASCII. The invocation
https://github.com/sebbacon/alaveteli/blob/master/app/models/incoming_message.rb#L832 calls through to the plain-textify method without passing a charset.
In this particular example, however, the email supplied wrongly declares its charset to be
The reason this doesn't work on the IZ server and works everywhere else is also unclear. The
Possibly the correct thing to do these days is to assume UTF8 in the first instance (given that here we have a broken charset in any case).
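A rough sketch of the "assume UTF8 first" idea, under stated assumptions: `decode_body` is a hypothetical helper, not a method in Alaveteli, and it relies on the String encoding API from Ruby 1.9 or later. It tries UTF-8 first, falls back to the declared charset, and finally replaces anything it cannot decode.

```ruby
# Hypothetical helper: decode raw message bytes to UTF-8.
# 1. If the bytes are already valid UTF-8, trust that over the header.
# 2. Otherwise try the charset the email declares.
# 3. If that charset is unknown or the conversion fails, replace
#    undecodable bytes with "?".
def decode_body(raw_bytes, declared_charset)
  utf8 = raw_bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  begin
    raw_bytes.dup.force_encoding(declared_charset).encode(Encoding::UTF_8)
  rescue ArgumentError, EncodingError
    # ArgumentError: unknown charset name; EncodingError: bad conversion.
    raw_bytes.dup.force_encoding(Encoding::BINARY)
             .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
  end
end
```

With this ordering, a message that wrongly declares its charset but actually contains valid UTF-8 is still decoded correctly; the trade-off is that some genuinely single-byte text that happens to be valid UTF-8 would be misread.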
When looking into the performance issues on
Not sure how it ended up in this state, though.
Reopening this, though I think the issue might be slightly different.
Incoming emails such as this one are still displaying double-byte characters as asterisks. It is part of the elinks conversion.
The really puzzling thing is that if I get the incoming message from the command line, and cause it to regenerate the text version, it works:
Viewing the request from the web browser now shows the correct characters.
However, if I reset the cache again as above, but omit the final step (
This is very confusing! I can only assume the difference is somehow between the environment the web server's running in, and the environment that my rails console is running in (e.g. apache is running as
Also, if I run the
I just double checked this. The
So the input to the command isn't the problem.
The command is always invoked the same, viz.
If I run that command from a bash shell against the utf-8 input, then I get utf-8 output.
If I cause that command to be run from Rails, via the console using
If I cause that command to be run from Rails, via a web browser, I get the double byte characters replaced by asterisks.
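One way to narrow down a difference like this is to record the locale-related environment in each context (shell, console, web request) and compare the results. The sketch below is only a diagnostic aid; `locale_snapshot` and `snapshot_diff` are hypothetical helpers, and the list of variables is an assumption about what might matter rather than a known cause.

```ruby
# Variables that commonly influence how command-line tools pick a charset
# or find their per-user configuration. This list is an assumption.
LOCALE_KEYS = %w[LANG LC_ALL LC_CTYPE HOME USER].freeze

# Capture the relevant variables from an environment (defaults to ENV).
def locale_snapshot(env = ENV)
  LOCALE_KEYS.each_with_object({}) { |key, snapshot| snapshot[key] = env[key] }
end

# Return the names of the variables whose values differ between snapshots.
def snapshot_diff(first, second)
  LOCALE_KEYS.select { |key| first[key] != second[key] }
end
```

Logging `locale_snapshot` from the console and from a web request, then passing both to `snapshot_diff`, would show which of these variables differ between the two.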
The asterisks can be reproduced by forcing elinks to assume ASCII for the input (
Passenger is running Rails as
However, it is clearly some kind of elinks-related setting that is overriding or ignoring the codepage we're trying to set from the command line, because I have fixed this by adding