Support for other inputenc than latin1 in ocamldoc -latex #7048
Original bug ID: 7048
I'm trying to use ocamldoc on files with utf-8 encoded characters. It seems, however, that ocamldoc -latex is hard-wired to latin1. Even when compiling with -noheader so that we can use our own inputenc, there is still an additional automatic expansion of latin1-encoded accented letters into LaTeX expressions of the form \'e. This virtually breaks any encoding other than latin1 that uses codes >= 128, such as utf-8 (e.g. ? in utf-8 starts with the byte for â in latin1, octal code 0342, and is hence translated into the 5 bytes \^a\0210\0205 instead of the 3 bytes \0342\0210\0205, where I use \0XXX to denote a byte in octal notation).
By contrast, "ocamldoc -html -charset utf-8" works fine with utf-8.
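To make the byte-level effect concrete, here is a minimal OCaml sketch of the kind of expansion described above. It is not ocamldoc's actual code: the function escape_latin1 and its single-entry translation are hypothetical, and only the byte with octal code 0342 (â in latin1) is handled. It is applied to the utf-8 encoding of U+2205.

```ocaml
(* Hypothetical sketch, NOT ocamldoc's implementation: expand bytes that
   look like latin1 accented letters into LaTeX accent commands.
   Only the single byte relevant to this report is handled. *)
let escape_latin1 (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '\226' -> Buffer.add_string buf "\\^a" (* octal 0342, â in latin1 *)
      | c -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* U+2205 encoded in utf-8: octal 0342 0210 0205 (decimal 226 136 133) *)
let empty_set = "\226\136\133"

let () =
  let out = escape_latin1 empty_set in
  (* prints "3 bytes in, 5 bytes out" *)
  Printf.printf "%d bytes in, %d bytes out\n"
    (String.length empty_set) (String.length out)
```

The first byte is replaced by the three ASCII characters \^a while the two continuation bytes pass through unchanged, so the result is no longer valid utf-8.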
Setting it as blocking is of course subjective. It blocks the use of ocamldoc -latex in a non-pure-ASCII environment, which I believe is the norm nowadays, but we will probably decide to live with only ocamldoc -html instead, since we try anyway to support compiling Coq with versions of OCaml that are not the most recent. We could also try to apply the translation backwards, though it is subtle to identify which \^a or similar sequences come from ocamldoc and which come from the original source.
Steps to reproduce
Build foo.mli with utf-8 contents
(** ? ? ? : ? *)
ocamldoc -noheader -notrailer -latex foo.mli -o foo.tex
gives a file foo.tex where ? is faithfully translated but ? and ? are not.
Comment author: herbelin
Apparently, Mantis does not support utf-8 either.
The sentence "e.g. ? in utf-8" should be read as "e.g. [unicode U+2205] in utf-8".
I uploaded the file foo.mli so that its contents are visible. Using LaTeX to express the non-ASCII symbols, the sentence "gives a file foo.tex where ? is faithfully translated but ? and ? are not." should be read as "gives a file foo.tex where \Gamma is faithfully translated but \vdash and \emptyset are not."