Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String: change order of arguments in starts_with and ends_with #10105

Closed
wants to merge 1 commit into from

Conversation

ygrek
Copy link
Contributor

@ygrek ygrek commented Dec 30, 2020

to match extlib.
ref #9533 (comment)
This change will make life for extlib maintainer (aka me) much easier and will help prevent bugs for people used to extlib String module

@xavierleroy
Copy link
Contributor

I'm not enthusiastic about Extlib's choice of order of arguments, but I agree we have a bit of a problem here.

Should we revert the addition of starts_with and ends_with from 4.12? To leave time for discussion.

@dra27
Copy link
Member

dra27 commented Dec 30, 2020

Just for the proverbial record, this was discussed in the original PR - see #9533 (review)

@dbuenzli
Copy link
Contributor

dbuenzli commented Dec 30, 2020

@ygrek honestly, unlabeled, you are only going to confuse readers that are not familiar with extlib's idea of reading this function.

Almost every startsWith function you'll find out there in other languages syntactically take the prefix as the first argument after startsWith for the simple reason that this is the natural way to read it.

From a cost/benefit perspective I don't think it's a good idea to have to explain for the next 20 years that an unlabeled String.starts_with is in the wrong reading order because there exists a library called extlib that got that wrong.

If you want to provide a bugless migration path towards stdlib's signature I suggest either 1) you break the signature in extlib by any means to make users aware of the problem or 2) you deprecate it with a suitable alert that explains the situation.

@ygrek
Copy link
Contributor Author

ygrek commented Dec 31, 2020

honestly, unlabeled, you are only going to confuse readers that are not familiar with extlib's idea of reading this function.

extlib as a library is older than my programming career, let alone my ocaml experience, so I am trying to maintain it without breaking backward compatibility.

I was thinking that maybe stdlib starts_with being implemented with a labeled argument would make it less sensitive to the order of arguments.

@Octachron
Copy link
Member

I tend to agree with @dbuenzli : committing to backward compatibility with all standard library extensions feels hardly sustainable in the long term.

And if I am not mistaken, this compatibility issue would only be a concern for users that switch to extlib after the 4.12 release, and then only if they are not using the omitted labels warning (which is is enabled by default in dune). Overall, it seems that this unfortunate compatibility issue will not sneak silently upon a large part of users.

@gasche
Copy link
Member

gasche commented Jan 4, 2021

Wasn't there another suggestion for those functions to change the label? I would be in favor of playing it safe and not exposing them in 4.12 so that we can reach a better consensus by 4.13.

@dbuenzli
Copy link
Contributor

dbuenzli commented Jan 4, 2021

Wasn't there another suggestion for those functions to change the label? I would be in favor of playing it safe and not exposing them in 4.12 so that we can reach a better consensus by 4.13.

There was here. Do you want to reopen that discussion ? As far as I'm concerned I would be glad if we went for what I proposed there but I'm too tired to argue for it, it seems difficult to make people step back and look a bit further.

At a certain point we need to move on. I already changed here and there some APIs to align to the stdlib. I would be glad if we could at least settle on something now (regardless of whether the functions make it to 4.12) so that I can stop the back and forth.

@dra27
Copy link
Member

dra27 commented Jan 4, 2021

I agree that we should decide now - 4.12 hasn't yet flipped from alpha to beta (even if it will be soon) and I tend to @Octachron's opinion, too. I don't the extra time gained delaying the decision to 4.13 will add anything!

@gasche
Copy link
Member

gasche commented Jan 4, 2021

I propose to do the following:

  • we try to reach a consensus in the next few days
  • if no consensus emerges, we schedule a meeting with interested people to discuss it this week
  • if that also fails, I will campaign to remove the functions from 4.12

I created an open, shared document at https://codimd.math.cnrs.fr/os79Xe-dRU-J5laJf6qoiw to try to condense the arguments in various directions in a single place.

@gasche
Copy link
Member

gasche commented Jan 4, 2021

On argument order:

  • Labels are supposed to remove annoying questions of argument order. Indeed, if you do use labels for the function, then String.starts_with str ~sub and String.starts_with ~sub str are both valid in the current status. (They would also both work if the function was declared as string -> sub:string -> bool.)

  • We are discouraging unlabelled applications, so I am not sure that it would be a big deal to change the declaration order: (labeling) users can use any order they want, and the labeling greatly reduces code-readability concerns. In particular, we could decide to make an argument-order choice motivated by backwards compatibility, given how little it matters in practice.

  • Personally when I see <relation> x y where <relation> x y does not read naturally, but x <predicate> y does, I assume that the latter is what is meant. For example, I read less_than x y as x less than y. There is another opposite tradition which is to read (less_than x) as the predicate (? is less than x), and therefore less_than x y as y is less than x. Both make sense so we cannot make everyone happy here. (But then, it should not matter thanks to labeling.)

Some extra data that would be useful:

  • what are precisely the prototypes for those functions in Base, Batteries, Containers and Extlib ?
  • in terms of reading order for <relation> x y, is the stdlib currently making a consistent choice on this question?

All this being said, my preference would be to make sure we like the label name, consistently encourage its use, and as a result consider that this unsolvable tension for parameter-declaration order does not matter much, and (as a gift of kindness on this new year) make the choice that is consistent with those standard-library-extensions that use the same function names -- if a unique such choice exists.

@yallop
Copy link
Member

yallop commented Jan 4, 2021

All this being said, my preference would be to make sure we like the label name, consistently encourage its use,

I agree with this, and I'd like to suggest a slightly stronger proposal for resolving this question (and similar questions that might arise in the future): we should add support for attaching the "labels-omitted" warning to a declaration, so that any unlabeled use of the annotated function triggers a warning:

val starts_with : prefix:string -> string -> bool
   [@@ocaml.warning "+6"]

With this change there's no longer a need to make the signature of the standard library function the same as the extlib version, since if unlabeled applications are detected then there's no possibility of silent bugs.

@gasche
Copy link
Member

gasche commented Jan 4, 2021

@ygrek do you have a concrete example of Extlib-using code that gets in trouble because of the new stdlib export, and how a different declaration-side argument order would make things different?

(If I remember correctly extlib shadows stdlib module with additional definitions, so I suppose that you now have to change the signature in ExtString to match the stdlib's, to avoid being cursed by users. Do you have a concrete example where this change in problematic?)

@Octachron Octachron added this to the 4.12 milestone Jan 4, 2021
@ygrek
Copy link
Contributor Author

ygrek commented Jan 4, 2021

Current signature in extlib:

  val ends_with : string → string → bool
  (** [ends_with s x] returns true if the string [s] is ending with [x]. *)

  val starts_with : string -> string -> bool
  (** [starts_with s x] return true if [s] is starting with [x]. *)

Example of code using it : https://github.com/search?q=extlib+starts_with+language%3Aocaml&type=Code

I agree with @gasche reasoning about order and labels (that's what was my thinking behind this PR).

committing to backward compatibility with all standard library extensions feels hardly sustainable in the long term.

It is about committing to backward compatibility with existing code.
Of course everything works well for people who start using ocaml with 4.12 with dune with fresh code and no tech debt.
For N years the (understandable given the circumstances) consensus was that standard library extensions are expected to exist out there and provide all the functionality missing from stdlib. Since then there is a heap of code (some hidden, some open) written with that in mind. Now the direction changes to extend standard library. I am saying that people looking at existing code (see link above) and deciding to, say, throw away extlib - will remove open ExtLib from top of module and get hidden bugs. The question is - whether it is a big concern or not.

@gasche
Copy link
Member

gasche commented Jan 4, 2021

Yeah so the problem is with code like:

open Extlib

let is_prompt line = String.starts_with line "#"

if you remove open Extlib, the code remains type-correct but its meaning changes (it gets a bug).
There could be a Warning 6 [labels-omitted], but this warning is not enabled by default, except for Dune users.

My take-away would be that:

  • We should probably enable warning 6 [labels-omitted] by default for 4.13.
  • The feature that Jeremy mentioned, to selectively enable warnings on certain declarations, would be nice.
  • In the meantime, I think that either "use extlib argument order" or "hide the functions from Stdlib until 4.13" would be improvements over the current state.

@Octachron
Copy link
Member

Removing an open Extlib means that the code is being updated.
This is not a backward compatibility concern anymore at this point.
An alternative would be for extlib to document the migration process from extlib to the standard library. For instance, extlib could advise to turn on the omitted labels warning at the start of the migration process.

For those specific functions using them without labels is probably error-prone enough that it may make sense to stick with the extlib convention for kindness's sake. However, it seems quite possible than such migration issue rears his head against in the future without such a cop-out solution and we would be stuck to the local extrema of "kind of following extlib convention but only up to the point where this was no longer possible".

@bschommer
Copy link
Contributor

I'm not sure what the way to go here is. Given that there have already been some discussions about the argument order and name I think we won't find I conclusion that satisfies everyone. I personally would go with the existing order and wait with the inclusion of the functions until either #10118 is merged or the proposal of @yallop would be implemented, which I personally would prefer more.

@gasche
Copy link
Member

gasche commented Jan 6, 2021

The proposal of @yallop requires some non-obvious design and implementation, so it's very clear that it would not be available until at least 4.13. I'm less sure about #10118 (my proposal to enable the warning by default), I don't know if we could sneak it into the 4.12 release at this point -- this would probably require opam-wide testing. Either of those changes could fix the Extlib issue (if we use the current order) or the other-people issue (if we use Extlib order), and it also makes sense to try to have both.

I believe that once one of those changes are applied, it probably does not matter which order we use, so we can as well stick to the current order which seems to be slightly more popular than the Extlib order.

I believe that this leaves us with the following choice:

  1. change the order to Extlib order in 4.12
  2. keep the functions hidden in 4.12, and wait until labelled-application is forced to include them

I think that (2) is the safer default choice (it does not assume extra worry for the release-management team). @dbuenzli pointed out (understandably) that he would be unhappy about it, but then he would also be unhappy about (1) so maybe that's life.

I will submit a PR to implement (2), and also a PR to change the labels to sub because I think it's better.

@xavierleroy
Copy link
Contributor

My own reading of the discussion is that the incompatibility with Extlib is minor and no changes are required in the 4.12 release.

At any rate, the only two reasonable choices are 1- go ahead, and 2- remove the new functions from 4.12. It is way too late in the 4.12 release cycle to slip in new features like #10118.

@dra27
Copy link
Member

dra27 commented Jan 6, 2021

It is way too late in the 4.12 release cycle to slip in new features like #10118.

Agreed. I’d add though that we should never be afraid to add or enable warnings - packages in opam which break because we add or turn on a warning are themselves already broken.

@gasche
Copy link
Member

gasche commented Jan 6, 2021

Regarding #10118 and 4.12: noted, there will be no attempt to have it in 4.12. Could you also consider giving your opinion there as to whether the change is actually a good idea? I would be happy to have it in trunk (regardless of the outcome of the String helpers discussion).

@gasche
Copy link
Member

gasche commented Jan 6, 2021

Regarding extlib: I don't think that the compatibility issue is as minor as you are saying. Extlib is used by a lot of packages out there, and I can very well see bugs happening because of the argument-order mismatch. (There is a correlation between "using extlib" and "larger codebase" and "not having migrated to Dune yet", an effective cocktail to make this bug likely.) I think this is a compatibility issue (even if the changes does not silently break existing code) that should be taken seriously.

@damiendoligez
Copy link
Member

I agree with @gasche: we should definitely enable warning 6 for 4.13 and delay the introduction of these functions until then.

IMO the problem is not so much backward compatibility but the possibility of introducing hidden bugs in existing code.

@Octachron
Copy link
Member

It is true that it is possible to trigger a hidden bug with code that amounts to

open Extlib.String
open Stdlib.String
let bug = starts_with "prefix_rest" "prefix"

After some pondering, I kind of like the idea of enabling the warning for omitted labels in order to free us from the ghosts of alternative argument orderings.

@dbuenzli
Copy link
Contributor

dbuenzli commented Jan 6, 2021

I think that (2) is the safer default choice (it does not assume extra worry for the release-management team). @dbuenzli pointed out (understandably) that he would be unhappy about it, but then he would also be unhappy about (1) so maybe that's life.

Just for the record I'm not unhappy about 2 and would be more unhappy about 1. The only thing I was asking is to have a decision on the actual signatures now as I have some APIs that match them.

@gasche
Copy link
Member

gasche commented Jan 7, 2021

Random fun fact (as I'm preparing PRs to implement my proposals): git grep starts_with in the compiler repository ends with the following hits:

tools/ocamlmklib.ml:let starts_with s pref =
tools/ocamlmklib.ml:    else if starts_with s "-l" then
tools/ocamlmklib.ml:    else if starts_with s "-L" then
tools/ocamlmklib.ml:    else if starts_with s "-R" then
tools/ocamlmklib.ml:      if starts_with a "-Wl,"
tools/ocamlmklib.ml:    else if starts_with s "-Wl,-rpath," then
tools/ocamlmklib.ml:    else if starts_with s "-Wl,-R" then
tools/ocamlmklib.ml:    else if starts_with s "-F" then
tools/ocamlmklib.ml:    else if starts_with s "-" then

looks like Extlib is not the only place where the prefix is passed as a second parameter...

@gasche
Copy link
Member

gasche commented Jan 7, 2021

(Obviously the prefix of starts_with should be the first parameter, because it's on the left of the string in code-reading order, and the suffix of ends_with should be the second parameter, it's on the right of the string.)

@ygrek
Copy link
Contributor Author

ygrek commented Jan 7, 2021

ftr my way of reading it (ingrained over years) is "starts_with a b" = "string a starts_with b" :)
(I do agree with (starts_with prefix) string currying argument from api design pov and I think label is the best way to distinguish similar arguments (with stricter label control from compiler))

@xavierleroy
Copy link
Contributor

I'm closing this PR because we are not going to merge it and because an alternate course of action will be followed. (Namely: remove the new functions from 4.12, and turn warning on by default in 4.13.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants