API change: Support different file encodings #376

Closed
lorenzwalthert opened this issue Mar 14, 2018 · 20 comments
Comments

@lorenzwalthert
Collaborator

lorenzwalthert commented Mar 14, 2018

Migrated from #374 (comment).

@krlmlr Do you think we should support more file encodings? Is the enc package limited to UTF-8 and latin1? If not, we could add a file_encoding argument to the exported style_*() functions. As mentioned in #374, formatR seems to handle the problems outlined there well.

@krlmlr
Member

krlmlr commented Mar 21, 2018

I'd rather ask users to convert their files/packages to UTF-8.

@lorenzwalthert
Collaborator Author

I agree that one should use UTF-8 in principle, but I am not sure whether styler is the right place to tell people what encoding they should use 😊
Let's see what a quick poll yields.

@krlmlr
Member

krlmlr commented Mar 23, 2018

See checks in r-pkgs, second bullet point ;-)

This is about DESCRIPTION, which also affects the encoding used for source files AFAIK.

@lorenzwalthert
Collaborator Author

lorenzwalthert commented Mar 23, 2018

Ok, thanks.
I can't check right now because I am on a Unix machine: is RStudio's default file encoding UTF-8 or the native encoding?

@lorenzwalthert
Collaborator Author

lorenzwalthert commented Mar 25, 2018

According to this poll, which is probably not entirely free of bias, only 60% of R users definitely use UTF-8. Around 35% use whatever the default is.

@lorenzwalthert
Collaborator Author

If the default for Windows users is the native encoding (and hence most likely latin1), I think we can't ignore this.

@krlmlr
Member

krlmlr commented Mar 25, 2018

Yeah, or change that default ;-) Maybe we could supply another package that reencodes as UTF-8 first, and have styler detect non-UTF-8 encoding and direct users there? Could be a part of usethis, too -- can you please check if there is something related to this and perhaps open an issue if not?
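
A rough sketch of what such a re-encoding helper could look like, using only base R. The function name `reencode_to_utf8()` and its `from` argument are made up for illustration and are not part of styler, enc, or usethis. It assumes the caller knows the file's current encoding (latin1 here) and that overwriting the file in place is acceptable.

```r
# Hypothetical helper (not an existing styler/usethis function):
# rewrite the file at `path` as UTF-8, assuming it is currently in `from`.
reencode_to_utf8 <- function(path, from = "latin1") {
  # Read without declaring an encoding so the bytes are left untouched.
  lines <- readLines(path, warn = FALSE)
  # iconv() returns NA for lines it cannot convert from `from`.
  converted <- iconv(lines, from = from, to = "UTF-8")
  if (anyNA(converted)) {
    stop("Could not convert '", path, "' from ", from, " to UTF-8.",
         call. = FALSE)
  }
  # useBytes = TRUE writes the UTF-8 bytes as-is, without re-encoding
  # them to the native encoding of the current locale.
  writeLines(converted, path, useBytes = TRUE)
  invisible(path)
}
```

Running this on a file that is already UTF-8 would garble its non-ASCII characters, so a real tool would need to check the encoding before converting.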

@lorenzwalthert
Collaborator Author

You mean changing the RStudio default encoding for new R scripts? For packages, I think it might be UTF-8 already, but we also have users who are not package developers. Is UTF-8 encoding a standard in the tidyverse, or planned to become one?

@krlmlr
Member

krlmlr commented Mar 25, 2018

We should teach users/package authors/... to use the One True Character Encoding. I forgot about user scripts, but these too should be UTF-8 in my opinion. That standard has been around long enough, and it doesn't seem anything else is going to replace it anytime soon.

Windows users still won't be able to use characters from non-native locales (e.g., can't use a Chinese character on a US-English Windows) due to limitations of source(), but reading characters representable in one's own locale works, I just checked.

@lorenzwalthert
Collaborator Author

lorenzwalthert commented Mar 25, 2018

We should teach users/package authors/...

That sounds like we should consider doing a PR to tidyverse/style, adding a rule to use utf8 throughout.

@krlmlr
Member

krlmlr commented Mar 25, 2018

Sounds good. (And maybe help them convert their existing code.)

@lorenzwalthert
Collaborator Author

I filed an issue in tidyverse/style. If we get approval there, I think I can open one in usethis for conversion.

@lorenzwalthert
Collaborator Author

In light of tidyverse/style#71 (comment), I think we should reconsider offering a file_encoding option in the top-level stylers and passing it down to enc::transform_lines_enc(). 😊

@krlmlr
Member

krlmlr commented Apr 4, 2018

I'm not convinced. Why?

@lorenzwalthert
Collaborator Author

Because I don't see how we can end the encoding mess of computer science with styler -.- As far as R goes, the issue will most likely be around for a very long time unless some key players in the R world change their minds and enforce ASCII, UTF-8, or some other standard. In that case, I think it would be reasonable to support only that encoding. And since RStudio's default encoding is not UTF-8 on Windows if you are not inside a project that is a package, a large group of R users is excluded from using styler if we insist on UTF-8.
In any case, I think we need to add a note to the help file that tells users that we (currently) only support UTF-8.

@krlmlr
Member

krlmlr commented Apr 15, 2018

ASCII is already somewhat enforced by R CMD check for packages: You get a note if you try to use UTF-8, even if it's declared. Comments may be a different story.

RStudio's default file encoding is UTF-8, also on Windows. Just checked.

@lorenzwalthert
Collaborator Author

If we decide to stick to UTF-8, we should at least update the help files accordingly and maybe even give a warning if we detect (how?) that a file is not UTF-8 encoded.
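
On the "how?": one possible approach, sketched with base R only, is to read the file as-is and test the bytes with validUTF8(). The helper names `is_utf8_file()` and `warn_if_not_utf8()` are hypothetical and not part of styler. The check can only say that a file is definitely not UTF-8; a pure-ASCII file passes trivially, and passing does not prove the author intended UTF-8.

```r
# Hypothetical helper (not part of styler): TRUE if every line of the
# file at `path` is a valid UTF-8 byte sequence.
is_utf8_file <- function(path) {
  # Read without declaring an encoding so the bytes are left untouched.
  lines <- readLines(path, warn = FALSE)
  # validUTF8() checks the bytes of each element.
  all(validUTF8(lines))
}

# Hypothetical wrapper: warn before styling a file that is not valid UTF-8.
warn_if_not_utf8 <- function(path) {
  ok <- is_utf8_file(path)
  if (!ok) {
    warning("'", path, "' does not look like UTF-8; please re-encode it first.",
            call. = FALSE)
  }
  invisible(ok)
}
```

A file containing glück <- 3 saved as latin1 would trigger the warning, because the byte for ü (0xFC) is not valid UTF-8 on its own.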

@lorenzwalthert
Collaborator Author

RStudio's default file encoding is UTF-8, also on Windows. Just checked.

When I type glück <- 3 and save the R file in RStudio (v1.1.383), I get:
[Screenshot: unbenannt]

So at least for this version, it's not UTF-8.

@lorenzwalthert
Collaborator Author

Anyway, let's add a note in the help file that we currently only support UTF-8.

@lorenzwalthert
Collaborator Author

Final decision: UTF-8 only, as per https://yihui.name/en/2018/11/biggest-regret-knitr/.
