non-ascii file gives (Invalid or incomplete multibyte or wide character) even with locale set #34

Closed
simonmichael opened this Issue Apr 8, 2013 · 2 comments

@simonmichael
Owner

Original author: simon@joyful.com (December 18, 2010 20:56:50)

Current hledger built with GHC >= 6.12 is supposed to use the configured locale to decode non-ASCII data in journal files. We have found and documented some quirks with this (the locale must be installed, and the locale name must be exact on some platforms), but T. Daucourt is still seeing problems:

$ locale -a
C
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
fr_BE.utf8
fr_CA.utf8
fr_CH.utf8
fr_FR.utf8
fr_LU.utf8
POSIX
$ LANG=fr_FR.utf8 ~/hledger-0.13-linux-x86_64 --verbose --debug bal -f hledger_test_09
hledger-0.13-linux-x86_64: hledger_test_09: hGetContents: invalid argument (Invalid or incomplete multibyte or wide character)

I can't reproduce the problem. Using what appears to be the same 0.13 64-bit Linux binary from http://hledger.org/DOWNLOADS.html , the same Ubuntu version, the same locale and the same journal file, it works for me:

$ locale -a
C
en_US.utf8
fr_BE.utf8
fr_CA.utf8
fr_CH.utf8
fr_FR.utf8
fr_LU.utf8
POSIX

$ LANG=fr_FR.utf8 bin/hledger-0.13-linux-x86_64 --verbose --debug bal -f tdaucourt-test-09.journal
(works)

Original issue: http://code.google.com/p/hledger/issues/detail?id=34

@simonmichael
Owner

From simon@joyful.com on December 18, 2010 21:15:17
More: in the above example, the test file is a simple utf8-encoded journal. A file containing a single utf8-encoded character should be enough to test this. I'd really like to understand why this fails for him, before I go fixing stuff.
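
A minimal sketch of such a one-character test file, for anyone who wants to try reproducing this. The file name is hypothetical, and the content is not a meaningful journal; the point is only to see whether decoding fails under a given locale.

```shell
# Hypothetical file name; any path works.
f=utf8-one-char.journal

# Write a single UTF-8 encoded character, é (bytes 0xC3 0xA9), plus a newline.
printf '\303\251\n' > "$f"

# Dump the raw bytes to confirm what is actually on disk.
od -An -tx1 "$f"

# Then read it under the locale in question (assumes hledger is on PATH):
# LANG=fr_FR.utf8 hledger bal -f "$f"
```

The `od` output should show the bytes c3 a9 0a; if it does and the read still fails, the problem is in decoding rather than in the file.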

However, it does seem that going back to requiring that input be utf8-encoded (like pandoc) is the right thing to do. I think I moved away from that only because it seemed "the GHC 6.12 way", but it seems locale-sensitivity basically creates fragility for users. If someone wants to add a --use-locale option later, I'd be happy with that.
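
If input is required to be UTF-8, users can check a journal before feeding it to hledger. A rough sketch using `iconv`; the `is_utf8` helper and the file names are made up for illustration, and this is not something hledger itself does.

```shell
# Succeeds (exit 0) if the file decodes cleanly as UTF-8, fails otherwise.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

printf 'caf\303\251\n' > good.journal   # é encoded as UTF-8 (C3 A9)
printf 'caf\351\n'     > bad.journal    # é encoded as Latin-1 (E9), invalid as UTF-8

is_utf8 good.journal && echo "good.journal: valid UTF-8"
is_utf8 bad.journal  || echo "bad.journal: not valid UTF-8"
```

A file that fails this check can usually be converted with `iconv -f <its-encoding> -t UTF-8`, provided its original encoding is known.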

@simonmichael
Owner

From simon@joyful.com on January 21, 2011 01:51:59
Fixed for 0.14: we now always read and write UTF-8, regardless of locale.
