Fix corrupted config file header for non-ASCII package names #5804

Merged (2 commits) on Mar 4, 2019


hvr
Member

@hvr hvr commented Dec 17, 2018

The config-state header is a human readable line prepended to the
binary serialisation which looks like

Saved package config for pkgname-1.2.3 written by Cabal-2.5.0.0 using ghc-8.6

However, the functions generating and parsing this header didn't take into
account that package names are not limited to the ASCII subset, and blindly used
the ByteString pack function, which truncates away the high bits of the Char
code point, resulting in a corrupted header with a nonsensical package name.

The fix is simply to serialise the package-name with the UTF-8 encoding which
works nicely with the rest of the UTF-8 unaware string handling functions.
Hence the fix is a lot shorter than this commit message.
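
To make the truncation concrete, here is a minimal sketch, not the code from this PR. It assumes the `bytestring` and `text` packages; the helper names `packLossy`, `packUtf8` and `unpackUtf8` are hypothetical and stand in for whatever encoding helpers Cabal uses internally.

```haskell
import qualified Data.ByteString       as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text             as T
import qualified Data.Text.Encoding    as TE

-- | Lossy: Char8.pack keeps only the low 8 bits of every code point,
-- so U+4E66 becomes the single byte 0x66, which is ASCII 'f'.
packLossy :: String -> BS.ByteString
packLossy = BC.pack

-- | UTF-8: every code point is encoded in full; ASCII characters still
-- occupy exactly one byte, so byte-oriented ASCII parsing keeps working.
packUtf8 :: String -> BS.ByteString
packUtf8 = TE.encodeUtf8 . T.pack

-- | Inverse of 'packUtf8'. Note that decodeUtf8 throws on invalid input.
unpackUtf8 :: BS.ByteString -> String
unpackUtf8 = T.unpack . TE.decodeUtf8
```

Loaded in GHCi, `packLossy "Query\x4E66"` yields the same bytes as `packLossy "Queryf"`, whereas `unpackUtf8 (packUtf8 "Query\x4E66")` returns the original name.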

Fixes #2557

@hvr
Member Author

hvr commented Dec 17, 2018

A test-case would be nice to have but I ran out of time over the weekend

@edsko can you verify whether this also fixes your issue?

@edsko
Contributor

edsko commented Dec 17, 2018

@hvr I just checked. It does not. The first time I build the package it works, but if I then make a change and recompile again I get

 [hvr-pr-issue2557 8.6.3 ~/personal/doc/calligraphy/QuerySFZD]
# cabal new-build all
Build profile: -w ghc-8.6.3 -O1
In order, the following will be built (use -v for more details):
 - Query书法字典-0.1.0.0 (exe:QuerySFZD) (file src/QuerySFZD/API/Ours/Results.hs changed)
/home/edsko/personal/doc/calligraphy/QuerySFZD/dist-newstyle/build/x86_64-linux/ghc-8.6.3/Query书法字典-0.1.0.0/x/QuerySFZD/build/QuerySFZD/autogen/Paths_Query书法字典.hs: hGetContents: invalid argument (invalid byte sequence)

@hvr
Member Author

hvr commented Dec 17, 2018

@edsko would it be possible for you to create a small (or not so small -- whatever's easier) repro-case?

@edsko
Contributor

edsko commented Dec 17, 2018

Sure, it's not hard. Create a new project that has some unicode in the name, compile, then recompile. The attached will do the trick. Compile, make a modification to Main.hs (insert blank line), recompile.

testcase.zip

@hvr
Member Author

hvr commented Dec 17, 2018

@edsko I see the problem now... yet another place where we're using the wrong char encoding... fix on the way...

@hvr
Member Author

hvr commented Dec 17, 2018

@edsko I added a fix for your observed problem; you should be good as long as you don't enable the standard CPP in modules of components having a unicode package as direct dep :-)

(if you need CPP, you're gonna require a custom cpp that can deal with unicode code-points in cpp tokens for now...)
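
A rough way to see which package names are affected, as a sketch only: it assumes that `cabal_macros.h` derives macro names such as `MIN_VERSION_<pkg>` from the package name with hyphens turned into underscores, and that a standard preprocessor accepts only ASCII letters, digits and underscores in a macro name. Both function names are hypothetical.

```haskell
import Data.Char (isAlphaNum, isAscii, isDigit)

-- | Macro name in the style of cabal_macros.h (assumed naming scheme):
-- hyphens in the package name become underscores.
minVersionMacro :: String -> String
minVersionMacro pkg = "MIN_VERSION_" ++ map fixChar pkg
  where
    fixChar '-' = '_'
    fixChar c   = c

-- | Assumption: a standard CPP only accepts identifiers made of ASCII
-- letters, digits and '_', and not starting with a digit.
isAsciiCppIdent :: String -> Bool
isAsciiCppIdent []       = False
isAsciiCppIdent (c : cs) = startOk c && all restOk cs
  where
    restOk x  = x == '_' || (isAscii x && isAlphaNum x)
    startOk x = restOk x && not (isDigit x)
```

Under these assumptions, `isAsciiCppIdent (minVersionMacro "my-pkg")` holds, while the macro derived from a name containing `书` is rejected.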

@edsko
Contributor

edsko commented Dec 18, 2018

I tried to confirm, but couldn't. I probably did something wrong. No matter how much I removed, and rebuilt, problem still persists. So I am at commit 02fc719a27d28899d492be7b83dc868335fbb963, then eventually cleaned my checkout by running git clean -dfX from the repo root, removing all generated files, removed ~/.ghc for good measure, rebuilt using the bootstrap script, but no go. Problem still there..

@hvr
Member Author

hvr commented Dec 18, 2018

@edsko here's how it behaves for me with a cabal exe built from 02fc719

/tmp/X$ ls -l
total 12
-rw-r--r-- 1 hvr hvr  68 Dec 17 15:59 Main.hs
-rw-r--r-- 1 hvr hvr 726 Dec 17 15:57 Query书法字典.cabal
-rw-rw-r-- 1 hvr hvr 824 Dec 17 15:59 testcase.zip

/tmp/X$ ../cabal new-build -w ghc-8.6.3
Resolving dependencies...
Build profile: -w ghc-8.6.3 -O1
In order, the following will be built (use -v for more details):
 - Query书法字典-0.1.0.0 (exe:Query书法字典) (first run)
Configuring executable 'Query书法字典' for Query书法字典-0.1.0.0..
Warning: The 'license-file' field refers to the file 'LICENSE' which does not
exist.
Preprocessing executable 'Query书法字典' for Query书法字典-0.1.0.0..
Building executable 'Query书法字典' for Query书法字典-0.1.0.0..
[1 of 1] Compiling Main             ( Main.hs, /tmp/X/dist-newstyle/build/x86_64-linux/ghc-8.6.3/Query书法字典-0.1.0.0/x/Query书法字典/build/Query书法字典/Query书法字典-tmp/Main.o )
Linking /tmp/X/dist-newstyle/build/x86_64-linux/ghc-8.6.3/Query书法字典-0.1.0.0/x/Query书法字典/build/Query书法字典/Query书法字典 ...

/tmp/X$ ../cabal new-build -w ghc-8.6.3
Up to date

/tmp/X$ echo "-- ..." >> Main.hs 

/tmp/X$ ../cabal new-build -w ghc-8.6.3
Build profile: -w ghc-8.6.3 -O1
In order, the following will be built (use -v for more details):
 - Query书法字典-0.1.0.0 (exe:Query书法字典) (file Main.hs changed)
Preprocessing executable 'Query书法字典' for Query书法字典-0.1.0.0..
Building executable 'Query书法字典' for Query书法字典-0.1.0.0..
[1 of 1] Compiling Main             ( Main.hs, /tmp/X/dist-newstyle/build/x86_64-linux/ghc-8.6.3/Query书法字典-0.1.0.0/x/Query书法字典/build/Query书法字典/Query书法字典-tmp/Main.o )
Linking /tmp/X/dist-newstyle/build/x86_64-linux/ghc-8.6.3/Query书法字典-0.1.0.0/x/Query书法字典/build/Query书法字典/Query书法字典 ...

@edsko
Contributor

edsko commented Dec 18, 2018

Any suggestions for what I should try?

@hvr
Member Author

hvr commented Dec 18, 2018

@edsko what OS are you on, and what's your current locale (e.g. I used LANG=en_US.UTF-8 )? And what's the exact error you're getting?

@edsko
Contributor

edsko commented Dec 18, 2018

[hvr-pr-issue2557 8.6.3 ~/t]
# uname -a
Linux edsko-passive 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[hvr-pr-issue2557 8.6.3 ~/t]
# locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=nl_NL.UTF-8
LC_TIME=nl_NL.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=nl_NL.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=nl_NL.UTF-8
LC_NAME=nl_NL.UTF-8
LC_ADDRESS=nl_NL.UTF-8
LC_TELEPHONE=nl_NL.UTF-8
LC_MEASUREMENT=nl_NL.UTF-8
LC_IDENTIFICATION=nl_NL.UTF-8
LC_ALL=

[hvr-pr-issue2557 8.6.3 ~/t]
# cabal new-build all
Build profile: -w ghc-8.6.3 -O1
In order, the following will be built (use -v for more details):
 - Query书法字典-0.1.0.0 (exe:Query书法字典) (file Main.hs changed)
/home/edsko/t/dist-newstyle/build/x86_64-linux/ghc-8.6.3/Query书法字典-0.1.0.0/x/Query书法字典/build/Query书法字典/autogen/Paths_Query书法字典.hs: hGetContents: invalid argument (invalid byte sequence)

@hvr
Member Author

hvr commented Dec 18, 2018

@edsko look into the Paths_Query书法字典.hs file; is it a valid UTF-8 file? or is it maybe some left-over written by a different cabal exe? I.e. did you clean dist-newstyle ?
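
One way to answer the "is it a valid UTF-8 file?" question without depending on the locale is to read the raw bytes and try to decode them. This is a small sketch assuming the `bytestring` and `text` packages; `isValidUtf8Bytes` and `isUtf8File` are hypothetical names, not part of Cabal.

```haskell
import qualified Data.ByteString    as BS
import qualified Data.Text.Encoding as TE

-- | True when the bytes decode as UTF-8 without error.
-- decodeUtf8' returns Left on the first invalid byte sequence.
isValidUtf8Bytes :: BS.ByteString -> Bool
isValidUtf8Bytes = either (const False) (const True) . TE.decodeUtf8'

-- | Read a file as raw bytes (no locale decoding) and validate it.
isUtf8File :: FilePath -> IO Bool
isUtf8File path = isValidUtf8Bytes <$> BS.readFile path
```

In GHCi, `isUtf8File` applied to the generated `Paths_*.hs` file from the error message would report whether the file itself is malformed or whether the problem lies elsewhere.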

@edsko
Contributor

edsko commented Dec 18, 2018

Sorry, I'm being stupid. I had copied the recompiled binary to the wrong place, rather than overriding the existing one 🤦‍♂️ I can confirm it works :)

@hvr
Member Author

hvr commented Dec 18, 2018

@edsko in any case, I think we need a section in the user's guide to document all gotchas re Unicode support (issues with standard CPP processors, requirements for filesystem and locale, etc). But otherwise I think this PR makes unicode support in package names usable -- unless you run into yet other issues :-).

@edsko
Contributor

edsko commented Dec 18, 2018

Yup, agreed on both counts. Nice work :)

Next up: fix syntax highlighting in my editor, which gets equally confused by unicode :D

@hvr hvr added this to the 3.0 milestone Jan 16, 2019
hvr added 2 commits March 3, 2019 23:55
The config-state header is a human readable line prepended to the
binary serialisation which looks like

    Saved package config for pkgname-1.2.3 written by Cabal-2.5.0.0 using ghc-8.6

However, the functions generating and parsing this header didn't take into
account that package names are not limited to the ASCII subset, and blindly used
the ByteString `pack` function, which truncates away the high bits of the `Char`
code point, resulting in a corrupted header with a nonsensical package name.

The fix is simply to serialise the package-name with the UTF-8 encoding which
works nicely with the rest of the UTF-8 unaware string handling functions.
Hence the fix is a lot shorter than this commit message.

Fixes haskell#2557
…encoding

This takes care of knock-on effects of haskell#2557

Specifically, the `Paths_*.hs` and `cabal_macros.h` files would end up being written
incorrectly by a `rewriteFileEx` which isn't UTF-8 capable.

Now the `cabal_macros.h` file is written out exactly like the `.h` file generated
internally by `ghc`; note however that standard CPP doesn't support
non-ASCII characters in CPP symbols, so these macros will not work with a standard
CPP preprocessor.
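
As an illustration of a UTF-8 capable "rewrite only if changed" helper, here is a minimal sketch. It is not Cabal's `rewriteFileEx`; `rewriteFileUtf8` is a hypothetical name, and the sketch assumes the `bytestring` and `text` packages. It compares and writes raw bytes, so the result does not depend on the locale encoding.

```haskell
import           Control.Exception  (IOException, try)
import           Control.Monad      (when)
import qualified Data.ByteString    as BS
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical helper: encode the contents as UTF-8 and write them,
-- but only when the bytes on disk would change. Returns True if the
-- file was (re)written.
rewriteFileUtf8 :: FilePath -> String -> IO Bool
rewriteFileUtf8 path contents = do
  let new = TE.encodeUtf8 (T.pack contents)
  -- A missing or unreadable file counts as "changed".
  old <- try (BS.readFile path) :: IO (Either IOException BS.ByteString)
  let changed = either (const True) (/= new) old
  when changed (BS.writeFile path new)
  return changed
```

Because the existing file is read as bytes rather than through a text-mode handle, a stale file with an invalid byte sequence is simply overwritten instead of triggering a decoding error.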
@hvr hvr merged commit 1907a08 into haskell:master Mar 4, 2019
@hvr hvr deleted the pr/issue-2557 branch March 4, 2019 10:07
@hvr
Member Author

hvr commented Mar 4, 2019

I tried writing a test-suite but was blocked by #5921 -- a test case shall be provided as part of #5921

Development

Successfully merging this pull request may close these issues.

Cabal config corrupted when using Unicode