Make string (particularly BatSubstring) modules consistent #62

Open
mdekstrand opened this Issue Jun 27, 2010 · 5 comments

@mdekstrand

Currently, the BatSubstring module does not expose an API consistent with that of the other string modules or with idiomatic OCaml. It uses size for length (although the Git head currently exposes length as an alias for size), and its get function takes its parameters in a different order from the standard OCaml modules. The calling conventions of functions such as extract and slice are arguably not idiomatic OCaml either; I think labeled arguments are probably preferable.

Therefore, I propose the following changes for 2.0:

  • Introduce a String signature somewhere, likely in BatInterfaces, defining standard string operations. This should be as compatible as possible with the standard library's String module, and probably include many of the enhancements inherited from ExtLib. Including slicing/substring functions that use labeled, optional arguments for convenience would likely be worthwhile.
  • Make BatString, BatRope, BatSubstring, and probably BatUTF8 all conform to this interface for their common functionality.
  • Evaluate the other BatSubstring functions to make them consistent with the style and flavor of the standard library and the rest of Batteries (at this point, this becomes largely a separate issue).
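To make the proposal above concrete, here is a rough sketch of what such a shared signature might look like. Everything in it is an assumption for illustration: the names STRING, PlainString and SubstringView, the choice of operations, and the clamping behaviour of slice are hypothetical and are not taken from Batteries or from the standard library.

```ocaml
(* Hypothetical common signature; not the actual Batteries API. *)
module type STRING = sig
  type t
  val length : t -> int
  val get : t -> int -> char          (* same argument order as String.get *)
  val sub : t -> int -> int -> t      (* same argument order as String.sub *)
  val slice : ?first:int -> ?last:int -> t -> t
  val of_string : string -> t
  val to_string : t -> string
end

(* Shared helper: resolve optional bounds against a length, clamping to
   [0, len]. Clamping is an assumption of this sketch. *)
let bounds ?(first = 0) ?last len =
  let clamp i = max 0 (min len i) in
  let last = match last with Some l -> l | None -> len in
  let first = clamp first and last = clamp last in
  (first, max 0 (last - first))

(* Plain OCaml strings conform directly. *)
module PlainString : STRING with type t = string = struct
  type t = string
  let length = String.length
  let get = String.get
  let sub = String.sub
  let slice ?first ?last s =
    let pos, len = bounds ?first ?last (String.length s) in
    String.sub s pos len
  let of_string s = s
  let to_string s = s
end

(* A substring view (base string, offset, length) conforms to the same
   signature without copying on sub or slice. *)
module SubstringView : STRING = struct
  type t = { base : string; off : int; len : int }
  let length v = v.len
  let get v i =
    if i < 0 || i >= v.len then invalid_arg "SubstringView.get"
    else String.get v.base (v.off + i)
  let sub v pos len =
    if pos < 0 || len < 0 || pos > v.len - len then
      invalid_arg "SubstringView.sub"
    else { v with off = v.off + pos; len }
  let slice ?first ?last v =
    let pos, len = bounds ?first ?last v.len in
    sub v pos len
  let of_string s = { base = s; off = 0; len = String.length s }
  let to_string v = String.sub v.base v.off v.len
end

(* Code written once against STRING works for every conforming module. *)
module Count (S : STRING) = struct
  let char c s =
    let n = ref 0 in
    for i = 0 to S.length s - 1 do
      if S.get s i = c then incr n
    done;
    !n
end
```

With labeled optional arguments, callers only name the bound they care about, for example `PlainString.slice ~last:2 s`, and a functor such as Count can be instantiated for either module.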

If desired, the String signature could itself be an extension of an Array signature, implemented by BatArray and BatBigarray.Array1.

@thelema
ocaml-batteries-team member

Agreed. The Rope module differs from the Vect module primarily in the ARRAY module it uses internally. It could be a functor, but that has runtime costs, which I guess is the reason it isn't. Maybe its requirements could serve as a base.

@mdekstrand

Additional features the String interface could support: converting between the defined type and standard string objects containing UTF-8, Latin-1, and current locale characters. How to handle this for the BatString module is unclear, however, as its coding is unstated.

@thelema
ocaml-batteries-team member

Raw OCaml strings are just sequences of bytes - let's treat them as that. This makes converting from them to a string type a no-op, and makes it impossible to automatically convert to UTF-8 and the like.

@mdekstrand

That sounds reasonable. I think it means that the standard String interface can't support to_utf8 and friends, unless we want to have the raw string interface raise a conversion error exception in all cases with those functions. Such an option feels like a violation (or at least abuse) of Liskov substitutability, but it may well be the best option if we want these kinds of functions to be pervasively available.

@agarwal
ocaml-batteries-team member

I'd rather not include to_utf8 in the proposed String interface. It's not useful to say a module implements some interface, but then have to add that it actually doesn't implement functions f, g, etc.
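One way to keep encoding conversions out of the common interface without losing them entirely is to layer a second signature on top of it, so that only modules with a known encoding claim the extra functions. The sketch below is only an illustration of that idea: STRING, ENCODED_STRING, Bytes_string and Latin1 are hypothetical names, not existing Batteries modules.

```ocaml
(* Base interface: every conforming module can implement all of it. *)
module type STRING = sig
  type t
  val length : t -> int
  val of_string : string -> t   (* raw bytes in, no decoding *)
  val to_string : t -> string
end

(* Hypothetical extension for modules whose encoding is known. *)
module type ENCODED_STRING = sig
  include STRING
  val to_utf8 : t -> string
end

(* Raw byte strings: conversion is the identity, and there is no to_utf8
   to call, so no function has to raise "unsupported". *)
module Bytes_string : STRING with type t = string = struct
  type t = string
  let length = String.length
  let of_string s = s
  let to_string s = s
end

(* Latin-1 strings know their encoding, so they can offer to_utf8. *)
module Latin1 : ENCODED_STRING = struct
  type t = string
  let length = String.length
  let of_string s = s
  let to_string s = s
  let to_utf8 s =
    let b = Buffer.create (String.length s) in
    String.iter
      (fun c ->
        let n = Char.code c in
        if n < 0x80 then Buffer.add_char b c
        else begin
          (* Code points 0x80..0xFF become two UTF-8 bytes. *)
          Buffer.add_char b (Char.chr (0xC0 lor (n lsr 6)));
          Buffer.add_char b (Char.chr (0x80 lor (n land 0x3F)))
        end)
      s;
    Buffer.contents b
end
```

Under this arrangement, code that needs to_utf8 asks for an ENCODED_STRING argument, and passing the raw byte module is rejected at compile time instead of failing with an exception at run time.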
