Reveal encoding #11

Open
jennybc opened this issue May 30, 2016 · 2 comments

@jennybc
Member

jennybc commented May 30, 2016

This Twitter conversation reminded me of past pain I've had importing from Excel with unknown encoding. Maybe we could offer a little function that just exposes encoding info, even if someone goes on to import with a more conventional package. This reinforces the idea that one productive role for rexcel is Excel diagnostics and troubleshooting xlsx import.

I note that when I migrated Gapminder data extraction from gdata to readxl, I was able to drop the explicit encoding specification. So some packages, presumably readxl among them, do figure this out for themselves, quietly.
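To make the "little function that just exposes encoding info" idea concrete, here is a rough sketch of what it might look like for .xlsx files, which are zip archives of XML parts. It is written in Python only to illustrate the approach; `reveal_encoding` is a hypothetical name, not an existing rexcel or readxl function. It reports only the encoding each XML part declares for itself, and it does not cover legacy .xls files, which record their code page differently.

```python
import io
import re
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']')


def reveal_encoding(xlsx):
    """Return {part name: declared encoding} for each XML part of an .xlsx.

    `xlsx` is a path or a binary file-like object. A value of None means
    the part carries no encoding declaration (hypothetical helper, for
    diagnostics only; it does not validate the actual bytes).
    """
    declared = {}
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            # Only XML parts and relationship files carry a declaration.
            if not name.endswith((".xml", ".rels")):
                continue
            with zf.open(name) as part:
                head = part.read(200)  # the declaration sits at the very start
            if head.startswith((b"\xff\xfe", b"\xfe\xff")):
                # A UTF-16 byte order mark; the ASCII regex below cannot
                # match UTF-16 text, so report the BOM instead.
                declared[name] = "UTF-16 (BOM)"
                continue
            m = _DECL.search(head)
            declared[name] = m.group(1).decode("ascii") if m else None
    return declared
```

Calling `reveal_encoding("some-file.xlsx")` (the path is a placeholder) would return a dictionary keyed by part name, such as `xl/workbook.xml` or `xl/sharedStrings.xml`, which is probably enough to spot a part that declares something other than UTF-8 before handing the file to a regular importer.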

@richfitz
Member

Good idea. I guess the Enron corpus will be pretty basic, but getting hold of some more complex test cases would be good.

@jennybc
Member Author

jennybc commented May 30, 2016

The readxl and googlesheets issue trackers seem to get a steady trickle of spreadsheets from Russia 😬. Gapminder also has some non-UTF-8 sheets. Maybe we should systematically download those and make the "Gapminder corpus"?

http://www.gapminder.org/data/
