unicode characters on downloaded CSV (utf-8 not supported) #78

Closed
elbaza1 opened this issue Jul 12, 2019 · 8 comments

@elbaza1
commented Jul 12, 2019

Link to the test case:
https://www.csvjson.com/json2csv/df61580582fea1929d2c1ba50f5cfb8e

French characters such as 'é' are converted to 'Ã©', for example.
I suggest allowing 'utf-8' on download, or writing the CSV files as 'utf-8' in the API before downloading.
Thanks

@DrorHarari

commented Jul 12, 2019

The download is in UTF-8. You see 'Ã©' because the program you use (likely Excel or WordPad) does not automatically recognize that the data is UTF-8, so the bytes 0xC3, 0xA9 are interpreted as two characters rather than the single character 'é'. Some other programs, such as Notepad, do recognize the file as UTF-8 and display the data correctly.
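To make the byte-level explanation concrete, here is a minimal Python sketch. It assumes the misreading program falls back to the Windows-1252 code page, which is a common default but is not confirmed in this thread.

```python
# 'é' is U+00E9; UTF-8 encodes it as the two bytes 0xC3 0xA9.
utf8_bytes = "é".encode("utf-8")

# Assumption: the viewer decodes with Windows-1252 (cp1252) instead of UTF-8,
# so each byte becomes its own character: 0xC3 -> 'Ã', 0xA9 -> '©'.
misread = utf8_bytes.decode("cp1252")

# Decoding the same bytes as UTF-8 recovers the single original character.
correct = utf8_bytes.decode("utf-8")

assert utf8_bytes == b"\xc3\xa9"
assert misread == "Ã©"
assert correct == "é"
```

The file itself is unchanged in both cases; only the decoding step differs.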

You may ask: why not automatically insert the BOM (byte order mark, the bytes 0xEF, 0xBB, 0xBF) at the top of the file, which would make Excel, WordPad, and many other programs recognize the file as UTF-8?

The answer is that many CSV processors balk at the BOM sequence at the beginning of the file and include it in the data, where it looks like a garbage character. That is not universally the case, which is why it may be useful to add a checkbox, "Include BOM", that lets the user ask for a BOM to be added.
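An opt-in BOM along these lines could be sketched as follows. The function name `encode_csv` and the `include_bom` flag are hypothetical and only illustrate the idea; this is not the site's actual implementation.

```python
import codecs


def encode_csv(text: str, include_bom: bool = False) -> bytes:
    """Encode CSV text as UTF-8, optionally prefixed with the UTF-8 BOM.

    Hypothetical helper: `include_bom` stands in for the proposed checkbox.
    """
    data = text.encode("utf-8")
    # codecs.BOM_UTF8 is the three-byte sequence 0xEF 0xBB 0xBF.
    return codecs.BOM_UTF8 + data if include_bom else data


csv_text = "name,city\r\nRené,Montréal\r\n"
with_bom = encode_csv(csv_text, include_bom=True)
without_bom = encode_csv(csv_text)

assert with_bom[:3] == b"\xef\xbb\xbf"
assert with_bom[3:] == without_bom
```

Apart from the three leading bytes, the two outputs are identical, so the option only affects how consumers detect the encoding.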

martindrapeau added a commit that referenced this issue Jul 16, 2019

@martindrapeau

Owner

commented Jul 16, 2019

I've gone ahead and added the BOM character at the beginning as I believe most people use this to work in Excel afterwards. If this becomes an issue, I can add the option.

@elbaza1

Author

commented Jul 16, 2019

@jmappala

commented Jul 17, 2019

Hi, I am getting an error now. It was working more than a week ago.

I saved an iNav SQL data extract as CSV, then used CSVJSON to convert it to JSON.

I ran k6 to do a parallel run, and got this: level=error msg="SyntaxError: invalid character 'ï' looking for beginning of value at parse (native)"
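The 'ï' in that message is consistent with a parser meeting the first BOM byte, 0xEF, and showing it as a single-byte character. The thread does not confirm that this was the cause, so treat it as a likely reading only. Below is a minimal Python sketch of the effect and of one way a consumer could tolerate a BOM; `loads_ignoring_bom` is a hypothetical helper name.

```python
import codecs
import json


def loads_ignoring_bom(raw: bytes):
    """Parse UTF-8 JSON bytes, dropping a leading BOM if one is present."""
    # The "utf-8-sig" codec strips a leading BOM and otherwise acts like "utf-8".
    return json.loads(raw.decode("utf-8-sig"))


# Assumed input: a JSON file that was saved with a UTF-8 BOM in front.
raw = codecs.BOM_UTF8 + b'[{"id": 1}]'

# The first byte, 0xEF, read as a single-byte (Latin-1) character is 'ï'.
first_char_as_latin1 = raw[:1].decode("latin-1")

parsed = loads_ignoring_bom(raw)

assert first_char_as_latin1 == "ï"
assert parsed == [{"id": 1}]
```

The same helper also accepts input without a BOM, so it is safe to use either way.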

@jmappala

commented Jul 17, 2019

FYI, I used http://www.convertcsv.com/csv-to-json.htm, and it worked.

@martindrapeau

Owner

commented Jul 17, 2019

Could you try again @jmappala? Should be fixed now.

@DrorHarari

commented Jul 17, 2019

Adding the BOM automatically can break apps that do not expect it: the BOM would be seen as the first character of the first column name. If the app expects a specific column name, it would not find it (unless it expects the BOM). Hence, while adding the BOM by default suits the expected Excel audience, it may be safer to let the user request a CSV without it (via that checkbox). @martindrapeau - I understand you are waiting for that hypothetical person to stand up 😉, ok.
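The column-name problem can be reproduced with Python's standard `csv` module. This is an illustrative sketch: the sample data and the `header` helper are made up for the example.

```python
import codecs
import csv
import io

# Assumed sample: a small CSV file saved as UTF-8 with a leading BOM.
raw = codecs.BOM_UTF8 + "name,city\r\nRené,Montréal\r\n".encode("utf-8")


def header(raw_bytes: bytes, encoding: str) -> list:
    """Return the first CSV row after decoding with the given codec."""
    reader = csv.reader(io.StringIO(raw_bytes.decode(encoding), newline=""))
    return next(reader)


# Plain "utf-8" keeps the BOM as U+FEFF, glued to the first column name.
plain = header(raw, "utf-8")

# "utf-8-sig" strips the BOM, so the column name matches what the app expects.
bom_aware = header(raw, "utf-8-sig")

assert plain == ["\ufeffname", "city"]
assert bom_aware == ["name", "city"]
```

A lookup for the column `name` fails in the first case and succeeds in the second, which is the breakage described above.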

@jmappala

commented Jul 18, 2019

@martindrapeau, and it worked. Thanks.
