The non-ASCII character in the utf-8 example of the bug description is Unicode U+2205 (∅).
I uploaded the file foo.mli so that its contents are visible. Writing the non-ASCII symbols as LaTeX macros, the last step of the reproduction reads: "gives a file foo.tex where \Gamma is faithfully translated but \vdash and \emptyset are not."
Original bug ID: 7048
Reporter: herbelin
Status: resolved (set by @xavierleroy on 2017-10-09T17:22:50Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.02.3
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: ocamldoc
Monitored by: @gasche
Bug description
Hi,
I'm trying to use ocamldoc on files containing utf-8 encoded characters. However, ocamldoc -latex seems to be hard-wired to latin1. Even when running it with -noheader so that we can use our own inputenc, ocamldoc still automatically expands latin1-encoded accented letters into LaTeX expressions of the form \'e. This effectively breaks any encoding other than latin1 that uses byte values >= 128, such as utf-8. For example, ∅ (U+2205) in utf-8 starts with the byte that encodes â in latin1 (octal 0342), and is hence translated into the 5 bytes \^a\0210\0205 instead of the 3 bytes \0342\0210\0205, where I use \0XXX to denote a byte in octal notation.
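To make the byte arithmetic above concrete, here is a minimal OCaml sketch of a byte-wise latin1-to-LaTeX substitution applied to utf-8 input. The functions `latin1_to_latex` and `escape` are hypothetical: they only model the behaviour described in this report, are not taken from ocamldoc's source, and the substitution table holds just two entries.

```ocaml
(* Hypothetical sketch of a byte-wise latin1 -> LaTeX escaper, modelled on the
   behaviour described in this report. It is NOT ocamldoc's actual code, and
   only two table entries are shown. *)
let latin1_to_latex (c : char) : string option =
  match c with
  | '\xe2' -> Some "\\^a" (* latin1 0xE2 (octal 0342) is a-circumflex *)
  | '\xe9' -> Some "\\'e" (* latin1 0xE9 is e-acute *)
  | _ -> None

(* Escape a string one byte at a time, ignoring any multi-byte encoding. *)
let escape (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match latin1_to_latex c with
      | Some repl -> Buffer.add_string buf repl
      | None -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* U+2205 (empty set) in utf-8: octal 0342 0210 0205, i.e. hex E2 88 85. *)
let empty_set_utf8 = "\xe2\x88\x85"

let () =
  let out = escape empty_set_utf8 in
  (* The lead byte becomes the 3 characters \^a; the two continuation bytes
     are copied as-is, giving 5 bytes that are no longer valid utf-8. *)
  Printf.printf "input: %d bytes, output: %d bytes\n"
    (String.length empty_set_utf8) (String.length out)
```

Run as a script, it prints `input: 3 bytes, output: 5 bytes`: the lead byte 0xE2 is rewritten while the continuation bytes 0x88 and 0x85 are left dangling, which is the 3-to-5-byte expansion reported above.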
By contrast, "ocamldoc -html -charset utf-8" works fine with utf-8.
Marking it as blocking is of course subjective. It blocks the use of ocamldoc -latex in any non-pure-ASCII environment, which I believe is the norm nowadays. Still, we will probably settle for ocamldoc -html alone, since we try anyway to support compiling Coq with versions of OCaml that are not the most recent. We could also try to apply a backward translation, though it is tricky to tell which \^a (and similar sequences) come from ocamldoc and which may come from the original source.
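Why a backward translation is fragile can be illustrated with a hypothetical `unescape` function that maps each `\^a` back to the byte 0xE2. The sketch assumes, as the paragraph above suggests, that a `\^a` in the output may come either from ocamldoc's latin1 expansion or from the original source; a purely textual reverse pass cannot tell the two apart.

```ocaml
(* Hypothetical reverse pass: rewrite every occurrence of the 3-character
   sequence \^a back into the single byte 0xE2 (the utf-8 lead byte). *)
let unescape (s : string) : string =
  let pat = "\\^a" in
  let n = String.length s and m = String.length pat in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if i + m <= n && String.sub s i m = pat then begin
      Buffer.add_char buf '\xe2';
      go (i + m)
    end else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

let () =
  (* Case 1: \^a produced from the utf-8 lead byte of U+2205. *)
  let from_ocamldoc = "\\^a\x88\x85" in
  (* Case 2 (assumed): \^a written in the source as LaTeX for a-circumflex. *)
  let from_source = "\\^a" in
  assert (unescape from_ocamldoc = "\xe2\x88\x85"); (* repaired *)
  assert (unescape from_source = "\xe2")            (* corrupted: lone lead byte *)
```

The same rule repairs the first string and corrupts the second, so a reliable reverse translation would need to know where each `\^a` originated.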
Best,
Hugo
Steps to reproduce
Create foo.mli with the following utf-8 contents:
(** ? ? ? : ? *)
Then run:
ocamldoc -noheader -notrailer -latex foo.mli -o foo.tex
This gives a file foo.tex where Γ is faithfully translated but ⊢ and ∅ are not.
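As a sketch of how the result of these steps could be checked, the hypothetical OCaml program below looks for the raw utf-8 byte sequences of Γ (U+0393), ⊢ (U+22A2) and ∅ (U+2205) in foo.tex. It is not part of ocamldoc, and it assumes that "faithfully translated" means the symbol's bytes are left untouched in the output.

```ocaml
(* Hypothetical diagnostic (not part of ocamldoc): report whether the raw
   utf-8 bytes of the symbols from foo.mli survive in the generated foo.tex. *)

(* Naive substring search: true if [sub] occurs in [s]. *)
let contains (s : string) (sub : string) : bool =
  let n = String.length s and m = String.length sub in
  let rec at i = i + m <= n && (String.sub s i m = sub || at (i + 1)) in
  at 0

(* Read a whole file as raw bytes. *)
let read_file (path : string) : string =
  let ic = open_in_bin path in
  let len = in_channel_length ic in
  let s = really_input_string ic len in
  close_in ic;
  s

(* utf-8 encodings of the symbols mentioned in the report. *)
let symbols =
  [ ("Gamma (U+0393)", "\xce\x93");
    ("vdash (U+22A2)", "\xe2\x8a\xa2");
    ("emptyset (U+2205)", "\xe2\x88\x85") ]

let () =
  let path = "foo.tex" in
  if Sys.file_exists path then begin
    let tex = read_file path in
    List.iter
      (fun (name, bytes) ->
        Printf.printf "%s: %s\n" name
          (if contains tex bytes then "utf-8 bytes preserved"
           else "utf-8 bytes not found"))
      symbols
  end
```

With the behaviour reported here, only the Γ line would be expected to report its bytes as preserved.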
File attachments