encoding.asciidoc

The character encoding of GITenberg books should always be UTF-8, an encoding of Unicode. Many Project Gutenberg source files were created before Unicode existed and need to be converted.

If you get an error like the following, the file is not valid UTF-8 and needs to be re-encoded:

pandoc: Cannot decode byte '\xb0': Data.Text.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

To re-encode a file as UTF-8, use the iconv command-line tool.

iconv -c -t "UTF-8" < input.html > output.asciidoc
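Note that the command above does not name a source encoding, so iconv assumes the input uses the current locale's encoding, and the -c option silently drops any characters it cannot convert. If the source encoding is known, it may be safer to state it with -f. The following is a minimal sketch, not a definitive recipe: it assumes the source is ISO-8859-1 (a guess consistent with the '\xb0' byte in the error above, which is the degree sign in that encoding), and the file names sample-latin1.html and sample-utf8.html are hypothetical.

```shell
# Create a small sample file containing the ISO-8859-1 byte 0xB0 (octal 260, degree sign).
# Hypothetical file name; substitute the real source file.
printf '98.6\260F\n' > sample-latin1.html

# -f names the source encoding explicitly (ISO-8859-1 is an assumption here);
# -t names the target encoding. Without -c, iconv stops with an error
# instead of silently dropping characters it cannot convert.
iconv -f ISO-8859-1 -t UTF-8 < sample-latin1.html > sample-utf8.html

# Converting UTF-8 to UTF-8 succeeds only if the input is valid UTF-8,
# so this doubles as a quick validity check on the result.
iconv -f UTF-8 -t UTF-8 < sample-utf8.html > /dev/null && echo "valid UTF-8"
```

If the second iconv call reports an error, the chosen source encoding was probably wrong, or the file was already partly UTF-8.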

For instance, if 164-h/164-h.htm