Example file: https://github.com/SheetJS/test_files/blob/master/biff5/number_format_greek.xls.0.csv
The encoding is Windows codepage 1253. The original XLS file is https://github.com/SheetJS/test_files/blob/master/biff5/number_format_greek.xls. To reproduce, set the language for non-Unicode characters to "Greek".
What is the correct encoding setting to parse the file? Using "iso-8859-7" appears to have no effect on the result.
(Mac user here)
Hm, it's whatever encoding would be passed into readAsText on the HTML5 FileReader. Does "CP1253" work, maybe? Meanwhile, I'm taking a look at the file to reproduce on my machine if I can.
@mholt unfortunately CP1253 didn't help either (you don't need Windows to test this).
FYI: you can always set up Windows via Boot Camp or a virtual machine (like VirtualBox or Fusion). Excel performance is somewhat slower in a VM, but it is convenient for quick tests like this.
Thanks for helping me through this -- I'm a bit inexperienced with encodings.
I think I'm having trouble downloading the file in its proper format. GitHub shows me this:
And the "Raw" version appears as:
I get the same result when I "Save Link As..." -- is that what it's supposed to look like? I want to make sure I have the data in the same format as you before continuing...
Incidentally, parsing the file as I have it now seems to work OK -- what exactly is the problem?
pbcopy and other utilities may end up transparently converting to UTF-8. The safest way to do this is to pass a base64-encoded string, as follows:
$ curl https://raw.githubusercontent.com/SheetJS/test_files/master/biff5/number_format_greek.xls.0.csv | base64 | pbcopy
(pbcopy is a cool little program that copies data to the pasteboard)
document.getElementById('input').value = atob("
To see what Excel shows in the correct codepage, I threw up a quick demo: http://sheetjs.com/demos/codepage.html. Select codepage 1253, paste the base64 text in the textarea, check the base64 box, and click Convert. You should see:
The demo uses the codepage library, but that is definitely inappropriate in this context. I was wondering if there was a way to process CSVs from Excel using your encoding parameter (or maybe you can clarify what the encoding is supposed to do -- do you have a sample file with a different encoding?).
Okay, thanks. That helps me feel confident that I'm working with the same data now!
But, unfortunately, I'm still lost as to where things aren't working. You say it does not "handle" the file, but everything I'm trying seems to work fine. What's happening that's different from what's expected?
If I run the data through the console, I don't see the same characters that Excel shows:
The second column according to the parser is "Ïêô-33". When I open the CSV in Excel, I see that the second column is "Οκτ-33".
Okay. That helps a lot. Now I see the specific characters that you're talking about. Using your tool I was able to get the file saved with the correct encoding so that the data looks like "Οκτ-33", whereas before it was "Ïêô-33" or question marks. I used Sublime Text to save the file with the Windows 1253 encoding and re-opened it to confirm that everything was preserved. I also copied and pasted it and again confirmed that the characters/encoding were preserved.
I then used the demo page to choose the file from my file system and parsed it using the default (UTF-8) encoding. On row 14 indeed, I saw what you see: "Ïêô-33"
Then I changed the encoding in the encoding textbox to "cp1253" and re-parsed. The same field appeared as "Οκτ-33". I also tried it using "iso-8859-7" and had the same result. Both seemed to work.
As far as I can tell, this is working properly. Is the CSV file saved on your system as UTF-8 or Windows 1253? (From your instructions, it sounds like it is...) Does knowing the steps I took help?
I see my confusion: the encoding parameter only applies to files, not to data pushed via the textarea.
This probably should be mentioned on the demo page.
Ahhhhh. I understand now. Sorry for the confusion. I'll clarify that on the demo page, indeed.
Clarifying encoding only applies to reading local files (issue #64)
As a sort of follow-up, this looks pretty promising for dealing with encodings when parsing strings: https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/iWDqDWQ8mhs
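For the string case discussed in this thread, here is a minimal sketch of why the textarea data shows "Ïêô-33" and how it could be reinterpreted. It assumes the runtime provides `TextDecoder` with support for the `windows-1253` label, which this thread does not confirm for any particular browser. `decodeBinaryString` is a hypothetical helper name, not part of any library mentioned here.

```javascript
// Hypothetical helper (not from any library in this thread): reinterpret a
// "binary string" (one character per byte, which is what atob returns)
// using a named legacy encoding.
function decodeBinaryString(binary, encoding) {
  // Each character's code unit is the original byte value (0-255).
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // Assumption: TextDecoder exists and knows the requested encoding.
  return new TextDecoder(encoding).decode(bytes);
}

// Bytes 0xCF 0xEA 0xF4 followed by ASCII "-33", in the form atob would
// hand them back after decoding the base64 text.
const raw = String.fromCharCode(0xcf, 0xea, 0xf4) + "-33";

console.log(raw); // Ïêô-33 (bytes read as Latin-1 code points)
console.log(decodeBinaryString(raw, "windows-1253")); // Οκτ-33
```

The first line of output matches what the parser reported, because a binary string maps each byte straight to the code point of the same value. The second matches what Excel shows once the same bytes are read as codepage 1253.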
Added encoding/Excel FAQ, due to issues #64 and #169.