Expected encoding of libxlsxwriter #31

Open
jeroen opened this Issue Jan 9, 2019 · 5 comments

jeroen commented Jan 9, 2019

Look up what encoding libxlsxwriter expects for strings.

jeroen commented Jan 9, 2019

See https://libxlsxwriter.github.io/worksheet_8h.html:

Unicode strings are supported in UTF-8 encoding. This generally requires that your source file is UTF-8 encoded or that the data has been read from a UTF-8 source:
worksheet_write_string(worksheet, 0, 0, "Это фраза на русском!", NULL);
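
On the R side, one way to check whether a string already satisfies this UTF-8 requirement is base R's validUTF8(). The following is only a minimal sketch: the byte values are spelled out by hand so that the result does not depend on how this file itself is encoded, and the variable names are illustrative.

```r
# "cm³" encoded two ways, built from explicit bytes.
utf8_bytes   <- as.raw(c(0x63, 0x6d, 0xc2, 0xb3))  # "cm³" in UTF-8
latin1_bytes <- as.raw(c(0x63, 0x6d, 0xb3))        # "cm³" in ISO-8859-1

utf8_str   <- rawToChar(utf8_bytes)
latin1_str <- rawToChar(latin1_bytes)

validUTF8(utf8_str)    # TRUE
validUTF8(latin1_str)  # FALSE: a lone 0xb3 byte is not valid UTF-8
```

A string that fails this check would presumably not be written correctly by a library that expects UTF-8 input.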

ctbrown commented Jan 9, 2019

See #30 (which was closed, perhaps in favor of this issue, which addresses the same problem).

This file produces an unopenable xlsx file:

bad.txt

library(readr)     # provides read_csv()
library(magrittr)  # provides %>%
dn <- read_csv("bad.txt")
dn %>% writexl::write_xlsx("bad.xlsx")

This is presumably due to the encoding of a value that should appear as "43 cm³". A call to iconv fixes the problem. It is probably advisable to call iconv( ... , to="UTF-8") on all character columns before creating the xlsx file.

System Info

packageVersion('writexl')
[1] ‘1.1’

R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          4.1
year           2017
month          06
day            30
svn rev        72865
language       R
version.string R version 3.4.1 (2017-06-30)
nickname       Single Candle

jeroen commented Jan 9, 2019

First, you are using a really old version of R. Please update.

I think your problem arises not in writexl but earlier, when you read the text file. Please print() the data before writing, so that we can see whether the encoding was correct before exporting to xlsx.

Try the following code to read your text with a Windows-style encoding (ISO-8859-1):

library(readr)
library(writexl)
dn <- read_csv("bad.txt", locale = locale(encoding = 'ISO-8859-1'))
print(dn)
writexl::write_xlsx(dn, "bad.xlsx")

We cannot call iconv unconditionally as you suggest, because this will actually corrupt correctly encoded text. You just need to make sure you read in your data properly.
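
To illustrate the corruption risk, here is a small sketch using base R only. It assumes the conversion is done with an explicit from = "ISO-8859-1" (chosen here so the result is the same on every platform); the variable names are illustrative and not part of writexl.

```r
# "cm³" as ISO-8859-1 bytes and as UTF-8 bytes.
latin1_str <- rawToChar(as.raw(c(0x63, 0x6d, 0xb3)))
utf8_str   <- rawToChar(as.raw(c(0x63, 0x6d, 0xc2, 0xb3)))
Encoding(utf8_str) <- "UTF-8"

# Converting text that really is ISO-8859-1 gives the expected UTF-8 bytes.
fixed <- iconv(latin1_str, from = "ISO-8859-1", to = "UTF-8")
charToRaw(fixed)    # 63 6d c2 b3

# Applying the same conversion to text that is already UTF-8 re-encodes
# each of its bytes, so "³" turns into the two characters "Â³".
mangled <- iconv(utf8_str, from = "ISO-8859-1", to = "UTF-8")
charToRaw(mangled)  # 63 6d c3 82 c2 b3
```

The second result is the classic double-encoding pattern, which is why the conversion should only be applied when the source encoding is actually known.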

ctbrown commented Jan 9, 2019

jeroen commented Jan 9, 2019

There really isn't any limitation on the writexl side. The strings get written to the spreadsheet exactly as they appear in your data frame. If print(dn) showed something different from what you see in Excel, that would be a bug, but this is not the case here.

The problem with plain text files (including csv) is that you don't know which encoding they are in. It looks like Adobe exports the file as 'ISO-8859-1' on Windows, but readr::read_csv() defaults to UTF-8. Hence special characters get converted incorrectly if you don't pass a locale to read_csv().
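
When the encoding of a file is unknown, one possible heuristic is to accept the bytes as UTF-8 if they validate, and otherwise fall back to an assumed encoding. The sketch below uses base R only; read_lines_utf8() and its fallback argument are hypothetical names introduced for illustration, not part of readr or writexl, and the fallback is a guess that can be wrong for other legacy encodings.

```r
# Hypothetical helper: return the lines of a text file as UTF-8.
# If every line is valid UTF-8 the bytes are kept as they are; otherwise
# the whole file is assumed to be in `fallback` and converted.
read_lines_utf8 <- function(path, fallback = "ISO-8859-1") {
  lines <- readLines(path, warn = FALSE)
  if (!all(validUTF8(lines))) {
    lines <- iconv(lines, from = fallback, to = "UTF-8")
  }
  Encoding(lines) <- "UTF-8"  # mark non-ASCII strings as UTF-8
  lines
}

# Usage: a file holding "43 cm³" in ISO-8859-1 (byte 0xb3 for "³").
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x34, 0x33, 0x20, 0x63, 0x6d, 0xb3, 0x0a)), tmp)
x <- read_lines_utf8(tmp)
charToRaw(x)  # 34 33 20 63 6d c2 b3
```

Passing the known encoding explicitly, as in the locale() call earlier in this thread, remains more reliable than guessing.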
