Error when using cpphs in some locale environments #6
Comments
I can't seem to reproduce the issue with the given steps. cpphs uses the standard Haskell/ghc System.IO.openFile, which I think trusts the underlying filesystem's metadata about the file's encoding? Certainly, setting LC_CTYPE does not seem to change its behaviour.
$ LC_CTYPE=C ./cpphs Test.hs
#line 1 "Test.hs"
module Main where
main = putStrLn "∀"
Using the
What do you get?
The same.
It seems you do not have the
?
$ locale -a
I don't know whether the version of ghc might be relevant, but in case it is, I'm compiling cpphs with ghc-7.6.1
You have the
I think recent versions of GHC by default use the locale (or code page) to decide what encoding to use.
A simple (system-dependent) test:
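As a sketch of such a check (not necessarily the test from the comment above), the base library exposes the encoding that GHC derives from the locale or code page through `getLocaleEncoding`. The helper name `localeEncodingName` is hypothetical and only used for illustration.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)

-- | Name of the text encoding GHC picked from the current locale
-- (or code page on Windows). `localeEncodingName` is a hypothetical
-- helper name, not part of cpphs or base.
localeEncodingName :: IO String
localeEncodingName = fmap show getLocaleEncoding
```

Printing the result with `localeEncodingName >>= putStrLn`, once with `LC_ALL=C` and once with a UTF-8 locale, should show whether the environment changes the default encoding on a given system.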
Perhaps you've set LC_ALL, which overrides LC_CTYPE.
$ ghc --version
I think I can close this issue, since it appears that neither cpphs nor ghc is at fault.
Which operating system and shell are you using?
Could you reproduce the issue running
?
ghc-7.6.1 on MacOSX 10.7.5, with bash.
Cannot reproduce the issue, even with LC_ALL=C.
Did you mean
What is the output of
?
$ locale # MacOSX
The result is similar on Windows 7, except that the default is en_US.UTF-8 rather than en_GB.UTF-8.
I just discussed this issue with a Mac user, and it seems as if the System.IO functions by default always use UTF-8 under MacOS, while the locale is ignored. Under Windows I guess that one can use chcp to trigger the problem.
Perhaps GHC has used UTF-8 as the character encoding for source files since version 6.6 (which was released in 2006), so perhaps cpphs could also use this as the default. Note, however, that the GHC documentation states that "invalid UTF-8 sequences [are] ignored in comments, so it is possible to use other encodings such as Latin-1, as long as the non-comment source code is ASCII only".
I've attached a patch that switches to UTF-8 everywhere (?) in cpphs, with two caveats:
I've used the base library's support for roundtripping to handle illegal characters. Feel free to base any changes on this patch.
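For readers unfamiliar with the roundtripping support mentioned here, a minimal sketch follows. It is not the code from the patch or from cpphs; the helper names `readFileUtf8` and `writeFileUtf8` are hypothetical. It assumes the `//ROUNDTRIP` suffix accepted by `mkTextEncoding` in System.IO, which lets bytes that are not valid UTF-8 survive a decode and re-encode cycle instead of raising an error.

```haskell
import System.IO

-- | Read a file as UTF-8 regardless of the locale.
-- Hypothetical helper, not taken from the cpphs patch.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = do
  h   <- openFile path ReadMode
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  hSetEncoding h enc
  s   <- hGetContents h
  -- Force the lazy contents before closing the handle.
  length s `seq` hClose h
  return s

-- | Write a file as UTF-8 regardless of the locale.
-- Hypothetical helper, not taken from the cpphs patch.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path s = withFile path WriteMode $ \h -> do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  hSetEncoding h enc
  hPutStr h s
```

Setting the encoding on each handle keeps the behaviour independent of LC_ALL, LC_CTYPE, or the Windows code page, which is the property the thread is after.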
FYI, I reported here the different behaviour in Linux and Mac OS.
Thanks for the patch Nils. I rolled something slightly different, to ensure that e.g. #included files also get the UTF8 encoding. I was not previously aware of the roundtripping style of TextEncoding, so that was a useful addition for me.
Thanks for fixing the issue (tested on Agda). Could you release a new version, please.
cpphs-1.20.2 released.
The issue related to some locale environments (see malcolmwallace/cpphs#6) was fixed in cpphs 1.20.2.
Some Agda users have reported an error when installing Agda in their locale environments.
An MWE (adapted from this example) is the following:
@nad wrote here:
Blocking agda/agda#2112.