Rosetta - universal decoder of an encoded flow to Unicode
If you want to handle a new encoding (like, hmmhmm, APL-ISO-IR-68...), you can
open a new issue; the process will then be to write a new small library and
integrate it into
How to use it?
rosetta follows the same design as the underlying libraries it uses. More
precisely, it follows the same API as the uutf decoder. Here is a small example
that transforms a latin1 flow into UTF-8:
    let trans ic oc =
      let decoder =
        Rosetta.decoder (Rosetta.encoding_of_string "latin1") (`Channel ic) in
      let encoder = Uutf.encoder `UTF_8 (`Channel oc) in
      let rec go () =
        match Rosetta.decode decoder with
        | `Await -> assert false
          (* XXX(dinosaure): impossible when you use `String or `Channel as source. *)
        | `Uchar _ as uchar -> ignore @@ Uutf.encode encoder uchar ; go ()
        | `End -> ignore @@ Uutf.encode encoder `End (* flush the encoder *)
        | `Malformed err -> failwith err in
      go ()

    let () = trans stdin stdout
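To make the shape of that loop concrete without depending on rosetta or uutf, here is a stdlib-only sketch (it needs OCaml 4.06 or later for Buffer.add_utf_8_uchar). The names decoder, latin1_decoder, decode and latin1_to_utf8 are hypothetical and this is not how rosetta is implemented; it only mimics the same `Uchar / `End protocol. latin1 maps each byte 0x00-0xFF to the code point with the same value, so no `Malformed case is needed, and `Await cannot occur with a complete string source.

```ocaml
(* Hypothetical, stdlib-only sketch: not rosetta's implementation. *)
type decoder = { src : string; mutable pos : int }

let latin1_decoder src = { src; pos = 0 }

(* Same result shape as the loop above, minus `Await and `Malformed:
   a string source is always complete and every latin1 byte is valid. *)
let decode d =
  if d.pos >= String.length d.src then `End
  else begin
    let byte = Char.code d.src.[d.pos] in
    d.pos <- d.pos + 1;
    `Uchar (Uchar.of_int byte)
  end

(* Decode the whole input and re-encode every code point as UTF-8. *)
let latin1_to_utf8 s =
  let d = latin1_decoder s in
  let buf = Buffer.create (String.length s) in
  let rec go () =
    match decode d with
    | `Uchar u -> Buffer.add_utf_8_uchar buf u; go ()
    | `End -> Buffer.contents buf
  in
  go ()
```

For example, latin1_to_utf8 "caf\xe9" returns "caf\xc3\xa9": the latin1 byte 0xE9 is U+00E9, which UTF-8 encodes as the two bytes C3 A9.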
rosetta recognizes the aliases listed in the IANA character sets database:
Any other alias raises an exception. Alias matching is case-insensitive (for
example, "LATIN1" and "latin1" name the same encoding).
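A case-insensitive alias lookup of this kind can be sketched as below. This is a hypothetical illustration: the encoding type, the Unknown_encoding exception and the encoding_of_alias function are invented for the example, and only a handful of IANA-registered aliases (for ISO-8859-1, KOI8-R and UTF-7) are included. Aliases are stored lowercased, so lowercasing the input once is enough to make the lookup case-insensitive.

```ocaml
(* Hypothetical sketch: rosetta's own encoding type, alias table and
   exception are not reproduced here. *)
type encoding = ISO_8859_1 | KOI8_R | UTF_7

exception Unknown_encoding of string

(* A subset of the aliases registered by IANA, stored lowercased. *)
let aliases =
  [ ("iso_8859-1:1987", ISO_8859_1); ("iso-ir-100", ISO_8859_1);
    ("iso_8859-1", ISO_8859_1); ("iso-8859-1", ISO_8859_1);
    ("latin1", ISO_8859_1); ("l1", ISO_8859_1); ("ibm819", ISO_8859_1);
    ("cp819", ISO_8859_1); ("csisolatin1", ISO_8859_1);
    ("koi8-r", KOI8_R); ("cskoi8r", KOI8_R);
    ("utf-7", UTF_7); ("csutf7", UTF_7) ]

(* Lowercase the input, then look it up; unknown aliases raise. *)
let encoding_of_alias name =
  match List.assoc_opt (String.lowercase_ascii name) aliases with
  | Some encoding -> encoding
  | None -> raise (Unknown_encoding name)
```

Raising on an unknown alias, rather than returning an option, matches the behaviour described above; a caller that prefers an option can wrap the call in a `match ... with exception Unknown_encoding _ -> None`.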
About translation tables
rosetta relies on underlying libraries such as
integrate translation tables provided by the Unicode consortium. These tables are
not expected to change, so we save them statically into an
rosetta only supports decoding to Unicode code points. Encoding support is not
planned: people should only use Unicode now. Dealing with many encodings is a
pain, and we should only produce output according to Unicode rather than in old
encodings like latin1.
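The idea of a static translation table can be pictured with a stdlib-only sketch of a table-driven single-byte decoder. Everything here is hypothetical and not taken from rosetta or its underlying libraries: the table is an int array indexed by byte, built once at module initialization from the ISO-8859-1 identity mapping plus the eight positions where ISO-8859-15 differs. A real library would more likely embed a table generated from the Unicode consortium's mapping files. The output is UTF-8, in line with producing only Unicode.

```ocaml
(* Hypothetical sketch, not rosetta's tables: byte -> code point. *)
let iso_8859_15 : int array =
  let t = Array.init 256 (fun i -> i) in
  (* ISO-8859-15 differs from ISO-8859-1 at exactly eight positions. *)
  List.iter (fun (byte, cp) -> t.(byte) <- cp)
    [ (0xA4, 0x20AC); (0xA6, 0x0160); (0xA8, 0x0161); (0xB4, 0x017D);
      (0xB8, 0x017E); (0xBC, 0x0152); (0xBD, 0x0153); (0xBE, 0x0178) ];
  t

(* Translate each byte through the table and emit it as UTF-8. *)
let decode_with table s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c -> Buffer.add_utf_8_uchar buf (Uchar.of_int table.(Char.code c)))
    s;
  Buffer.contents buf
```

For an encoding with unassigned bytes, the table would also need a sentinel value (for instance -1) and the decoder a `Malformed result for those bytes.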