The files in the test directory with Unicode names use precomposed characters (base character and accent combined in a single codepoint) when fetched from git, which matches what the test scripts expect.
When the tarball from the npm registry is used, however, the filenames have separate combining characters for the accents, which means the tests fail because the names do not match.
As an example, consider tests/data/Clément which should be encoded as:
but in the npm tarball is encoded as:
Any idea how to re-encode them correctly? This has baffled me. I might just remove them...
Well, I don't really know anything about how the tarballs in the npm registry are created. I'm guessing that there is an npm command that creates and uploads them? Presumably that uses a node implementation of tar rather than the normal tar command, and that implementation is broken in its handling of filename encodings...
A quick fix might be to change the names of the files and then have a test script that renames them before running the tests?
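A rough sketch of that workaround, written in Python for brevity: the package would ship the files under ASCII-only names, and a setup step would rename them to the Unicode names the tests expect. The `RENAMES` mapping, the placeholder name `Clement`, and the function `restore_unicode_names` are all hypothetical, not part of the project.

```python
import os
import unicodedata

# Hypothetical mapping from ASCII-only names shipped in the package
# to the Unicode names the test scripts expect.
RENAMES = {
    "Clement": "Cl\u00e9ment",
}


def restore_unicode_names(directory, renames=RENAMES):
    """Rename ASCII placeholder files to their NFC Unicode names.

    Returns the list of names that were restored. Placeholders that
    are not present in the directory are skipped.
    """
    restored = []
    for ascii_name, unicode_name in renames.items():
        src = os.path.join(directory, ascii_name)
        if not os.path.exists(src):
            continue
        # Normalize explicitly so the result does not depend on how
        # the mapping above happened to be typed.
        target = unicodedata.normalize("NFC", unicode_name)
        os.rename(src, os.path.join(directory, target))
        restored.append(target)
    return restored
```

A test runner would call something like `restore_unicode_names("tests/data")` once before the tests start. Note that some filesystems normalize names on their own, so the name read back from disk may still need normalizing before comparison.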
Not seen this problem for a while; I'm assuming something was fixed upstream in npm.