Support for other inputenc than latin1 in ocamldoc -latex #7048
Original bug ID: 7048
I'm trying to use ocamldoc on files with utf-8 encoded characters. However, ocamldoc -latex seems to be hard-wired to latin1. Even when compiling with -noheader so that we can use our own inputenc, latin1-encoded accented letters are still automatically expanded into LaTeX expressions of the form \'e. This virtually breaks any encoding other than latin1 that uses codes >= 128, such as utf-8 (e.g. ? in utf-8 starts with the byte that is â in latin1 - octal code 0342 - and is hence translated into the 5 bytes \^a\0210\0205 instead of the 3 bytes \0342\0210\0205, where \0XXX denotes a byte in octal notation).
By contrast, "ocamldoc -html -charset utf-8" works fine with utf-8.
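To make the byte-level effect concrete, here is a minimal OCaml sketch. The function escape_latin1 is a hypothetical, simplified stand-in for the accent expansion (it is not the actual ocamldoc code and only knows two letters); it merely illustrates how a per-byte latin1 rewrite turns the 3-byte utf-8 sequence for U+2205 into 5 bytes.

```ocaml
(* Hypothetical, simplified stand-in for the latin1 accent expansion.
   This is NOT the actual ocamldoc code; only two table entries are shown. *)
let escape_latin1_byte (c : char) : string =
  match c with
  | '\xE2' -> "\\^a"   (* latin1 a-circumflex, octal 0342 *)
  | '\xE9' -> "\\'e"   (* latin1 e-acute *)
  | c -> String.make 1 c

(* Apply the per-byte expansion to a whole string. *)
let escape_latin1 (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> Buffer.add_string buf (escape_latin1_byte c)) s;
  Buffer.contents buf

(* utf-8 encoding of U+2205: octal bytes 0342 0210 0205 *)
let empty_set_utf8 = "\xE2\x88\x85"

let mangled = escape_latin1 empty_set_utf8

let () =
  Printf.printf "input:  %d bytes\n" (String.length empty_set_utf8);
  Printf.printf "output: %d bytes\n" (String.length mangled)
```

Running it should report 3 input bytes and 5 output bytes: the lead byte 0342 is replaced by the three characters \^a, while the two continuation bytes 0210 and 0205 are left dangling, so the result is no longer valid utf-8.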
Setting it as blocking is of course subjective. It blocks the use of ocamldoc -latex in a non-pure-ASCII environment, which I believe is the norm nowadays, but we will probably decide to live with only ocamldoc -html instead, since we try anyway to support compiling Coq with versions of OCaml that are not the most recent. We could also try to apply a backward translation, though it is subtle to identify which \^a and the like come from ocamldoc and which come from the original source.
Steps to reproduce
Build foo.mli with utf-8 contents
(** ? ? ? : ? *)
ocamldoc -noheader -notrailer -latex foo.mli -o foo.tex
gives a file foo.tex where ? is faithfully translated but ? and ? are not.
Comment author: herbelin
Apparently, Mantis does not support utf-8 either.
The sentence "e.g. ? in utf-8" should be read as "e.g. [unicode U+2205] in utf-8".
I uploaded the file foo.mli so that its contents are visible. Using LaTeX to express the non-ascii symbols, the sentence "gives a file foo.tex where ? is faithfully translated but ? and ? are not." should be read as "gives a file foo.tex where \Gamma is faithfully translated but \vdash and \emptyset are not."