Currently, the BatSubstring module does not expose an API consistent with that of the other string modules or with idiomatic OCaml. It uses size for length (although the Git head currently exposes length as an alias for size), and its get function takes its parameters in a different order from standard OCaml modules. The calling conventions of functions such as extract and slice may also not be considered idiomatic OCaml; I think labeled arguments are probably preferable.
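As a rough illustration of the conventions described above, here is a minimal sketch of a substring view with a stdlib-style interface. The module name Substring, the optional ?len argument and the labeled ~pos argument are assumptions made for this example; this is not the current BatSubstring signature, nor a settled 2.0 design.

```ocaml
(* Hypothetical substring API following stdlib conventions: [length]
   rather than [size], the substring first in [get], and labeled
   arguments for [extract] and [slice]. Not the actual BatSubstring API. *)
module Substring : sig
  type t
  val of_string : string -> t
  val to_string : t -> string
  val length : t -> int
  (* Same argument order as String.get: container first, then index. *)
  val get : t -> int -> char
  (* [extract ?len s ~pos]: view of [s] starting at [pos]; runs to the
     end of [s] when [len] is omitted. *)
  val extract : ?len:int -> string -> pos:int -> t
  (* [slice ?len t ~pos]: like [extract], relative to an existing view. *)
  val slice : ?len:int -> t -> pos:int -> t
end = struct
  (* A view into [base] starting at [off] with [len] characters; no copy. *)
  type t = { base : string; off : int; len : int }

  let of_string s = { base = s; off = 0; len = String.length s }
  let to_string t = String.sub t.base t.off t.len
  let length t = t.len

  let get t i =
    if i < 0 || i >= t.len then invalid_arg "Substring.get"
    else t.base.[t.off + i]

  let slice ?len t ~pos =
    if pos < 0 || pos > t.len then invalid_arg "Substring.slice";
    let len = match len with Some l -> l | None -> t.len - pos in
    if len < 0 || pos + len > t.len then invalid_arg "Substring.slice";
    { base = t.base; off = t.off + pos; len }

  let extract ?len s ~pos = slice ?len (of_string s) ~pos
end
```

Placing the optional ?len before a positional argument lets OCaml erase it on full application, so Substring.extract s ~pos:6 works without a trailing unit argument.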
Therefore, I propose the following changes for 2.0:
If desired, the String signature could itself be an extension of an Array signature, implemented by BatArray and BatBigarray.Array1.
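A minimal sketch of that layering, assuming hypothetical signature names ARRAY and STRING and a small illustrative set of operations: STRING includes ARRAY with the element type fixed to char, and both raw strings and one-dimensional char bigarrays satisfy it. BatArray itself is polymorphic in its element type, so it would need a parameterized variant of ARRAY that is not shown here.

```ocaml
(* Hypothetical signatures; names and operations are illustrative only. *)
module type ARRAY = sig
  type t
  type elt
  val length : t -> int
  val get : t -> int -> elt
  val sub : t -> pos:int -> len:int -> t
end

(* STRING extends ARRAY by fixing the element type to char and adding
   string-specific operations. *)
module type STRING = sig
  include ARRAY with type elt = char
  val index : t -> char -> int option
end

(* Hypothetical shared helper: linear search using only ARRAY operations. *)
let index_gen length get t c =
  let n = length t in
  let rec go i =
    if i >= n then None else if get t i = c then Some i else go (i + 1)
  in
  go 0

(* Raw OCaml strings satisfy STRING directly. *)
module Raw_string : STRING with type t = string = struct
  type t = string
  type elt = char
  let length = String.length
  let get = String.get
  let sub s ~pos ~len = String.sub s pos len
  let index s c = index_gen length get s c
end

(* So does a one-dimensional char Bigarray. *)
module Char_array1 : STRING
  with type t =
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t =
struct
  type t =
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
  type elt = char
  let length = Bigarray.Array1.dim
  let get = Bigarray.Array1.get
  (* Array1.sub returns a view that shares storage with the original. *)
  let sub a ~pos ~len = Bigarray.Array1.sub a pos len
  let index a c = index_gen length get a c
end
```

Note that Bigarray.Array1.sub returns a view sharing storage with the original array, while String.sub copies, so the two implementations agree on results but not on cost.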
Agreed. The Rope module differs from the Vect module primarily in the ARRAY module it uses internally. It could be a functor, but that has runtime costs, which I guess is the reason it isn't. Maybe its requirements could serve as a base.
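To make the functor option concrete, here is a hypothetical sketch (not the actual Rope or Vect code) in which generic operations are written once against a minimal ARRAY signature and instantiated for two backing stores. The functor name Make and the chosen operations are illustrative assumptions.

```ocaml
(* Hypothetical minimal signature that a backing store must provide. *)
module type ARRAY = sig
  type 'a t
  val length : 'a t -> int
  val get : 'a t -> int -> 'a
end

(* Generic operations written once against ARRAY. *)
module Make (A : ARRAY) = struct
  let fold_left f acc t =
    let n = A.length t in
    let rec go i acc =
      if i >= n then acc else go (i + 1) (f acc (A.get t i))
    in
    go 0 acc

  let to_list t = List.rev (fold_left (fun l x -> x :: l) [] t)
end

(* Backed by ordinary arrays. *)
module Over_array = Make (struct
  type 'a t = 'a array
  let length = Array.length
  let get = Array.get
end)

(* Backed by lists: slow random access, but the generic code is unchanged. *)
module Over_list = Make (struct
  type 'a t = 'a list
  let length = List.length
  let get = List.nth
end)
```

The runtime cost mentioned above shows up here as indirection: inside Make, A.length and A.get are unknown functions, so without an optimizer that specializes functor applications (such as flambda) each access is typically an indirect call.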
The String interface could also support converting between the defined type and standard strings containing UTF-8, Latin-1, or current-locale characters. How to handle this for the BatString module is unclear, however, as its encoding is unspecified.
Raw OCaml strings are just sequences of bytes; let's treat them as such. This makes converting from them to a string type a no-op, and makes it impossible to automatically convert them to UTF-8 or other encodings.
That sounds reasonable. I think it means that the standard String interface can't support to_utf8 and friends, unless we want the raw string implementation to raise a conversion-error exception whenever those functions are called. Such an option feels like a violation (or at least an abuse) of the Liskov substitution principle, but it may well be the best option if we want these kinds of functions to be available everywhere.
I'd rather not include to_utf8 in the proposed String interface. It's not useful to say a module implements some interface, but then have to add that it actually doesn't implement functions f, g, etc.
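One way to follow this, sketched here as an assumption rather than something settled in this thread, is to keep encoding conversions out of the base signature and put them in an extended one that only encoding-aware modules implement. Module and function names (STRING, ENCODED_STRING, Raw, Latin1, to_utf8) are hypothetical; the Latin-1 to UTF-8 conversion relies on Latin-1 byte values coinciding with the first 256 Unicode code points.

```ocaml
(* Hypothetical base interface: no encoding-specific functions. *)
module type STRING = sig
  type t
  val of_string : string -> t
  val to_string : t -> string
  val length : t -> int
end

(* Hypothetical extension implemented only by modules that know
   their encoding. *)
module type ENCODED_STRING = sig
  include STRING
  val to_utf8 : t -> string
end

(* Raw strings: bytes with no known encoding. Conversion from a raw
   OCaml string is a no-op, and there is deliberately no to_utf8. *)
module Raw : STRING with type t = string = struct
  type t = string
  let of_string s = s
  let to_string s = s
  let length = String.length
end

(* A Latin-1 string knows its encoding, so it can offer to_utf8.
   Each Latin-1 byte is the Unicode code point of the same value. *)
module Latin1 : ENCODED_STRING = struct
  type t = string
  let of_string s = s
  let to_string s = s
  let length = String.length
  let to_utf8 s =
    let b = Buffer.create (String.length s) in
    String.iter
      (fun c -> Buffer.add_utf_8_uchar b (Uchar.of_int (Char.code c)))
      s;
    Buffer.contents b
end

(* Signature subtyping: Latin1 can be used wherever STRING is expected. *)
module Latin1_as_string : STRING = Latin1
```

Because ENCODED_STRING includes STRING, a module such as Latin1 can still be passed wherever a STRING is expected, while Raw never has to pretend to support to_utf8.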