Corrupt file header when dealing with Unicode normalization issues #4920

Open
snoyberg opened this issue Nov 30, 2017 · 4 comments

@snoyberg
Collaborator

First, the repro, then the background. This likely only repros on OS X (explained below). Place the following files in a directory:

Setup.hs:

import Distribution.Simple
main = defaultMain

package.cabal:

name:                ば日本-4本
version:             0.1.0.0
build-type:          Simple
cabal-version:       >=1.10

library
  exposed-modules:     Lib
  build-depends:       base >= 4.7 && < 5
  default-language:    Haskell2010

Then run the following series of commands:

bash-4.4$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.2.2
bash-4.4$ ghc-pkg list Cabal
/Users/michael/.stack/programs/x86_64-osx/ghc-8.2.2/lib/ghc-8.2.2/package.conf.d
    Cabal-2.0.1.0
bash-4.4$ runghc Setup.hs configure --user
Configuring ば日本-4本-0.1.0.0...
bash-4.4$ runghc Setup.hs build
Saved package config file header is corrupt. Re-run the 'configure' command.

Expected: Builds the package correctly

Actual: complains repeatedly about corrupt file header. Re-running 'configure' does not help.

Background

This popped up when debugging a failing integration test in Stack. It turns out that this specific name for a package has a long history on the Stack side, since (on OS X) it appears that some Unicode normalization is applied to filenames, therefore making the sequence of code points stored in the cabal file mismatch the sequence returned by the OS from the generated file name. For a lot more information, see these issues:

I'm guessing that a similar file name codepoint modification is occurring inside the dist directory.

@hvr
Member

hvr commented Dec 1, 2017

Sounds like a duplicate of #2557 to me

@Blaisorblade
Collaborator

It looks like a dup indeed, apparently not limited to OS X. Please do test on OS X though, since NFD filename normalization might cause problems similar to those we’ve seen in Stack, and commercialhaskell/stack#1337 shows that at least some users would like Unicode package names.

@hvr
Member

hvr commented Dec 1, 2017

@Blaisorblade Well, I'm very keen on getting Cabal Unicode-proper, to the extent that the underlying operating systems allow this... but once I fix #2557 I'll clearly have to check whether less transparent OS filesystem APIs such as Win32 or OS X need some OS-specific quirks... :-)

@Nolrai

Nolrai commented May 23, 2019

@hvr Did you look at this after fixing #2557?
