tweaks to unark for more robust parsing #19

Merged
merged 6 commits into from
Sep 26, 2018

cboettig
Member

- `unark()` will strip out non-compliant characters by default.

- `unark()` is also more flexible, allowing the user to specify the corresponding table names manually rather than requiring that they match the incoming csv names. #18

- Technical tweak: the `readLines` call inside the `unark()` method will use the encoding directly from `getOption("encoding")`, e.g. allowing the encoding to be set to UTF-8.

This can resolve parsing errors when using the readr parser on certain files. See the `FAO.R` example in `examples` for an illustration.

cc @noamross thanks for reporting these issues and maybe for testing this out too.
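
For readers skimming the description, here is a minimal base R sketch of the two ideas above. It is not the code in this PR: `sanitize_table_name()` and `read_chunk()` are hypothetical helpers, and the sketch assumes that "non-compliant characters" means characters in file-derived table names that are awkward in SQL identifiers.

```r
# Hypothetical helper (not arkdb's implementation): derive a table name
# from a file path and replace characters that are awkward in SQL names.
sanitize_table_name <- function(path) {
  name <- basename(path)
  # Drop everything from the first dot on (e.g. ".csv", ".tsv.bz2")
  name <- sub("\\..*$", "", name)
  # Replace anything that is not an ASCII letter, digit, or underscore
  gsub("[^A-Za-z0-9_]", "_", name)
}

# Hypothetical helper: read up to n lines, decoding the file with
# whatever encoding the session option currently specifies.
read_chunk <- function(path, n = 50000L) {
  con <- file(path, open = "r", encoding = getOption("encoding"))
  on.exit(close(con))
  readLines(con, n = n)
}
```

Under these assumptions, calling `options(encoding = "UTF-8")` before `read_chunk()` changes how the file is decoded without touching the function itself, which is the kind of flexibility the third bullet describes.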

Though based on `stringi::stri_enc_detect`, the encoding may actually be ISO-8859-2 instead of ISO-8859-1 (latin1)? Though that causes other parsing errors...
stringi's guess was correct; we just needed to use R's short name instead of the official encoding name in `options`.
Also adds the ability for `unark()` to guess csv vs tsv.
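
One way such csv-vs-tsv guessing could work is by file extension. The sketch below is an assumption for illustration, not the logic merged here; `guess_delim()` is a hypothetical name, and the set of compression suffixes it ignores is likewise assumed.

```r
# Hypothetical helper: guess the field delimiter from the file extension,
# ignoring a trailing compression suffix such as .bz2, .gz, or .xz.
guess_delim <- function(path) {
  base <- sub("\\.(bz2|gz|xz)$", "", basename(path))
  ext <- tolower(tools::file_ext(base))
  switch(ext,
         csv = ",",
         tsv = "\t",
         stop("Cannot guess delimiter for: ", path))
}
```

A guess based on the extension is cheap and predictable, but it fails for files with other extensions, which is why the sketch raises an error instead of silently defaulting.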
@cboettig cboettig merged commit 04f353c into master Sep 26, 2018
@cboettig cboettig deleted the patch-tablename branch September 26, 2018 23:08