Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve stdlib support for int #2011

Merged
merged 2 commits into from
Oct 23, 2018
Merged

Improve stdlib support for int #2011

merged 2 commits into from
Oct 23, 2018

Conversation

dbuenzli
Copy link
Contributor

This PR improves stdlib support for int values. It adds an Int module to the standard library and deprecates the int_of_string_opt function.

Deprecating int_of_string and string_of_int is a bit more involved on the code base and will be done in separate PRs.

A few comments:

  1. Following the Bool module proposal this PR includes only the non-raising Int.of_string function which corresponds to int_of_string_opt. Here again I have nothing against renaming this function to of_string_opt if this is desired.
  2. Unless I made a mistake the rest of the module should be uncontroversial, it follows the common parts of the Nativeint, Int32 and Int64 modules. In particular it does not define any operators.
  3. I think that all non operators functions acting on ints of the Stdlib module could/should be deprecated aswell. This is not done by this PR but can be added. These functions are succ, pred and abs.
  4. Leaving aside exn and unit I think this is the last builtin datatype that was missing its own module. I'm sure you'll be glad to eventually be able to write {Set,Map}.Make (Int) your stdlib-based programs !

@pmetzger
Copy link
Member

I recognize I may be in the minority, but I really prefer when the default functions return opt and those that don't are labeled _exn or the like, so I favor the suggested name for of_string.

@dbuenzli
Copy link
Contributor Author

Appveyor timedout.

stdlib/int.mli Outdated Show resolved Hide resolved
@dbuenzli
Copy link
Contributor Author

I would like to move on this PR.

It seems no one has a strong objection about the module's functionality and existence. A contentious point though might be 1. here again the choice is between:

  1. Keep it as it is now and accept that non regularity in the stdlib (an irregularity becoming more regular as it might share it with the Bool module)
  2. Rename the Int.of_string of this PR into Int.of_string_opt (sic).
  3. Add a function Int.of_string that raises Failure.

While I favour 1. full Stdlib regularity would entail 2+3.

This PR deprecates the int_of_string_opt function in this message @gasche expressed concerns about the current rash of Pervasives deprecations. If the deprecations of this PR are blocking merge I'm willing to remove them (I personally think it's a good time to do this, if I get a clear sign that it is desirable I'm willing to put up a PR that deprecates succ, pred and abs).

@nojb
Copy link
Contributor

nojb commented Sep 11, 2018

Keep it as it is now and accept that non regularity in the stdlib (an irregularity becoming more regular as it might share it with the Bool module)

One reason I can think of for not doing this is that it will keep the modules Int, Int32, Int64, Nativeint from sharing a common signature with this function. Not sure about the relative importance of this point, though.

@dbuenzli
Copy link
Contributor Author

I have rebased this PR to resolve the conflicts. At the same time:

  1. Given this decision by @alainfrisch I have removed the deprecation warning for int_of_string_opt.
  2. Still kept the commit that replaces uses of int_of_string_opt by Int.of_string (which only appear in the test suite). So that less work is needed once/if 1. becomes a reality.
  3. Stayed behind the Int.of_string returns an option design despite @nojb's remark.

stdlib/int.mli Outdated Show resolved Hide resolved
@alainfrisch
Copy link
Contributor

alainfrisch commented Sep 12, 2018

If being able to share a common interface with Int32/Int64/... (e.g. to support algorithms abstracted over those modules) is not a goal, the rationale for exposing e.g. Int.one is mostly an aesthetic one, no? (i.e. nobody will use it in practice). Not saying I'm against it, but the case for it does not very strong.

If being able the share a common (sub-)interface is a goal, then it's indeed unfortunate that the X.of_string has a different signature between Int and Int32 (i.e. it's more problematic than having an inconsistency between Bool and Int). That said, it's not a huge problem, particularly because many algorithms abstracted over a concrete representation of integers won't be responsible for parsing numbers from text themselves (meaning that the common interface won't need to include of_string).

One possibility, which I don't necessarily fully believe in myself, but might be worth discussing would be to add X.parse: string->X.t option/X.print: X.t -> string. At the cost of breaking a tradition, and making even recent addition already morally deprecated, this would have the advantage of providing an eventually consistent story with a good migration path.

[EDIT: forgot the option above]

@dbuenzli
Copy link
Contributor Author

That said, it's not a huge problem, particularly because many algorithms abstracted over a concrete representation of integers won't be responsible for parsing numbers from text themselves (meaning that the common interface won't need to include of_string).

That's my impression aswell and the reason why I would still like to otherwise maximize similarity with the other integer modules and hence include Int.one.

@dbuenzli
Copy link
Contributor Author

One possibility, which I don't necessarily fully believe in myself, but might be worth discussing would be to add X.parse: string->X.t/X.print: X.t -> string

I'm unconvinced by X.print especially since that one is usually not problematic: AFAIR all to_string functions of the stdlib are total.

I'm intrigued by the potential of an eventual total regularization of of_string functions via an M.parse : string -> t option convention but I think this discussion can be left out of this PR.

@alainfrisch
Copy link
Contributor

Indeed, X.print was just be to have some symmetry with X.parse.

Ok, let's leave the discussion around X.parse for later.

@mshinwell
Copy link
Contributor

I think we should strive for the largest reasonable common subset of operations between all of the integer modules (like Base does with Int_intf).

As an example, the next version of Flambda makes extensive use of parameterisation over modules representing various kinds of numbers. As it happens, Int is not one of the modules involved, since it represents host-width integers rather than target-width integers. However I would not be surprised if others have similar applications where Int might be included in the parameterisation.

@Drup
Copy link
Contributor

Drup commented Sep 17, 2018

I agree that I would highly prefer all the of_string functions to be regular among IntX/Bool/Float/Others. Both because of functors, but also simply because I prefer having more regular APIs. Having both of_string of type string -> t and an option version for all these modules seems like the best choice.

@dbuenzli
Copy link
Contributor Author

Having both of_string of type string -> t and an option version for all these modules seems like the best choice

I think it's better if we entice people to not use raising variants and the best way of doing is to not provide them. I could do 1. (see above) to make the thing more regular but I personally don't find that to be a great name (it should have been Int.option_of_string in my opinion) and as far as this is concerned I think going in the X.parse direction suggested by @alainfrisch above would provide a good eventual solution.

@Drup
Copy link
Contributor

Drup commented Sep 17, 2018

I didn't advise a name on purpose (I also think X.parse is pretty good), my only point was to ensure the regularity. I understand your point about encouraging people not to use the raising versions, but if we deprecate them, let's do that on all the parsing functions at the same time, instead of having some irregular transitional APIs.

@dbuenzli
Copy link
Contributor Author

but if we deprecate them, let's do that on all the parsing functions at the same time, instead of having some irregular transitional APIs.

The raising ones are already irregular, see for example bool_of_string. So what should I do now introduce Bool.of_string that raises Failure which never existed and this even though we agree we should move away from raising variants ?

I'm personally rather against this both because we have an opportunity to give the good signal here and because I can see people switching from bool_of_string to Bool.of_string failing to realize they now need to catch another exception.

@Drup
Copy link
Contributor

Drup commented Sep 17, 2018

I'm not sure I understand. bool_of_string has been around for a while, and has always been raising. If you are talking about the fact that Bool.of_string is non-raising .... well, I have the same oppositions there, for exactly the same reasons.

One of the reason some alternative standard libraries are used is because of their very high degree of internal consistency. You are making it worse for the stdlib here. If you want to go forward with removing all the raising variants, and I'm sure many people agree, I propose you make a patch for that, instead of doing that change in various semi-related PRs.

Personally, my opinion would be to add parse everywhere, put it front and center in the documentation, and keep to_string raising, for efficiency purpose. But I would rather debate that somewhere else where we will do it for the whole stdlib.

@dbuenzli
Copy link
Contributor Author

I'm not sure I understand. bool_of_string has been around for a while, and has always been raising.

It raises Invalid_argument, not Failure on error.

You are making it worse for the stdlib here.

Frankly, is it that bad ?

If you want to go forward with removing all the raising variants, and I'm sure many people agree, I propose you make a patch for that, instead of doing that change in various semi-related PRs.

I think this misses the point somehow. This PR introduces new functionality, I'm just avoiding in this new functionality a mode of operation that we consider we should move away from. I acknowledge it introduces irregularity, I personally don't see it as a big deal. If others also think strongly about this then I think the current of_string of this PR should be changed to of_string_opt and that's all, I do not want to introduce new names that use the mode of operation we would like to move away from.

Regarding removing the raising variants I was actually on that track (but I don't think it's necessarily something to be done in a single PR), but somehow the discussion about deprecation and its outcome stopped me on the way.

@gasche
Copy link
Member

gasche commented Sep 17, 2018

I wouldn't support removing existing stdlib functions or changing their types, but personally I'm ok with @dbuenzli's choice of having a more modern type for the more modern versions of of_int. I would also be fine with a module only exporting of_int_opt, and I agree that introducing more exception-raising of_int should not be done solely for the sake of consistency.

@Drup
Copy link
Contributor

Drup commented Sep 17, 2018

I acknowledge it introduces irregularity, I personally don't see it as a big deal. If others also think strongly about this then I think the current of_string of this PR should be changed to of_string_opt and that's all, I do not want to introduce new names that use the mode of operation we would like to move away from.

That's fair. I don't know about others, but I do think having regular APIs is a Very Big Deal. So yes, please do the name change, either to parse or to _opt.

That being said, I also disagree that raising variants should never be used. int_of_string is used very commonly in contexts where we know that the element to parse will never fail, such as lexers. In such cases, the little bit of efficiency gained by not having to constantly allocate option wrappers can be important. For instance, Tyre's converter used to be of type 'a -> 'b option. I eventually changed them to 'a -> 'b with exceptions and the perf gain was occasionally significant. And no, flambda doesn't bridge that gap (yet ?).

@gasche
Copy link
Member

gasche commented Sep 17, 2018

(Also, to play the devil's advocate, match .. with exception makes it much easier to write correct code handling exceptions. We only need exception typing...)

@alainfrisch
Copy link
Contributor

The current implementation of int_of_string_opt is:

let int_of_string_opt s =
  (* TODO: provide this directly as a non-raising primitive. *)
  try Some (int_of_string s)
  with Failure _ -> None

Perhaps it's actually best to leave it like that. A clever inliner could then optimize:

match int_of_string_opt s with
| Some x -> ...
| None -> ...

into:

match int_of_string s with
| x -> ...
| Failure _-> ...

(Can flambda do that?)

@dbuenzli
Copy link
Contributor Author

dbuenzli commented Sep 18, 2018

The current implementation of int_of_string_opt is:

let int_of_string_opt s =
  (* TODO: provide this directly as a non-raising primitive. *)

I actually had the plan to work on this after these things are in. So if you really think it's not worth it please say.

@dbuenzli
Copy link
Contributor Author

dbuenzli commented Sep 18, 2018

So it may be time to sum up a little bit the larger picture on these of_string interface regarding the new Int and Bool module which may have been lost in the discussions. Here are a few possibilites, feel free to add your own. Basically it seemed to me that we had agreed with @alainfrisch was to follow 1 which is what the current state of PRs implement.

  1. The new modules Float,Bool, Int all have an of_string : string -> t option and no raising variant (Float introduced in the last major OCaml version is to be corrected along this line after a decision has been made, I offered to do it once these Bool and Int PRs are merged). This contradicts the interface of {Int32,Int64,Nativeint}.of_string and note: only these ones. A potential regularization of the interfaces for functor arguments regarding the string -> t option interface could be done later via {Bool,Float,Int,Int64,Int32,Nativeint}.parse. Note that this would mean there would be no regularization for raising variants.
  2. Only {Bool,Int}.of_string_opt : string -> t option are introduced. This means that we immediately get regularity for the string -> t option interface and thus likely eschew the need for X.parse regularization and we forget about regularizing raising variants.
  3. Only {Bool,Int}.parse : string -> t option is introduced and a regularization to be done later via {Float,Int64,Int32,Nativeint}.parse and we forget about regularizing raising variants.
  4. Only {Bool,Int}.option_of_string : string -> t option is introduced and a regularization in the other modules to be done later via {Float,Int,Int32,Int64,Nativeint}.option_of_string and we forget about regularizing raising variants.
  5. Both a {Bool,Int}.of_string : string -> t that raises Failure (please note that the current bool_of_string raises Invalid_argument, not Failure) and {Bool,Int}.of_string_opt : string -> t option are introduced. This entails immediate full regularity for both raising and non-raising variants w.r.t. the current Float, Int32, Int64 and Nativeint.

A few random comments of mine.

  • The reason I'm favouring 1. is on the ground that the Stdlib is also the place where conventions can get established and I think the signal that should be given is for users to have that signature for their of_string functions as I don't find of_string_opt to be a great name, it really reads from a string option to me. I also have the impression that the of_string function is the one users will naturally fish for.
  • In 5. I'm a little bit concerned about the potential for nasty bugs regarding the change in exception names between Bool.of_string and bool_of_string but repeating the design error in Bool.of_string would be totally silly in my opinion.
  • Only 5. would regularize raising variants. I'm not too concerned about regularizing raising variants. Especially I don't find @Drup's comment about still having raising variants for efficiency reasons compelling: I mostly see these of_string functions as toys for quick hacks. This because the syntax they parse for integers is foreign to most standards (e.g. the _ to separate digits). If you need performance or whenever you implement standards you likely reimplement and specialize integer parsing yourself at which point you can use an exception for control flow yourself if you wish so.

@gasche
Copy link
Member

gasche commented Oct 9, 2018

Come on, @dbuenzli, the handling of stdlib PRs is frustrating (for everyone), but your last message is rude to @alainfrisch, which is neither pleasant nor productive. It's a shitty job to be handling stdlib changes for reasons exemplified in this PR, and most people (myself included!) decided to not apply for it. @alainfrisch is currently the most active in processing stdlib proposals (with @nojb), and it's a bit sad that you would alienate him.

Hacking on the stdlib sucks, and we all know it, because there are many competing choices and often no clear "better choice" to agree on (in absence of a better choice, doing nothing is often the only reasonable-looking option). (Syntax discussions are often like that, except for really good proposals like match ... with exception). Sometimes one wins an argument, sometimes everyone loses, and I think we should try not to be too harsh about it.

@dbuenzli
Copy link
Contributor Author

dbuenzli commented Oct 9, 2018

Come on, @dbuenzli, the handling of stdlib PRs is frustrating (for everyone), but your last message is rude to @alainfrisch, which is neither pleasant nor productive.

I apologize to @alainfrisch for it. Int.of_string has been removed.

@mshinwell
Copy link
Contributor

I can't quite believe that it is now proposed to add these modules without the of_string functions. I agree entirely with the second paragraph of this comment.

Bear in mind that unless you're writing a library from scratch, having already written such a library before with all the pitfalls and potential inconsistencies already in your head, then there is always going to be imperfection. Yes, we should have policies about breaking backwards compatibility; yes, we should have good refactoring tools; and yes, older function signatures should be updated so that everything is consistent. However blocking the addition of new, useful functions that have standard, discoverable names and just slightly different type signatures because we haven't yet solved those problems seems the wrong decision to me. Many users probably won't even notice these minor inconsistencies on a large scale---it will be a minor irritation now and then.

The only real question to my mind that needed to be answered here is what the new standard interface for the functions in question should be (it would seem wrong to add new functions with old signatures, especially if such signatures are problematic in some way as the ones in question potentially are). That seems like it could have been worked out rather than kicking it into the long grass.

@alainfrisch
Copy link
Contributor

(Ok, unsubscribe is not enough, since explicit @-mentions re-subscribe automatically.)

However blocking the addition of new, useful functions that have standard

It's really not like this is blocking the addition of new useful stuff: all suggest features are already part of the Sdtlib and Stdlib.int_of_string_opt follows the current naming convention in the Stdlib. The main practical value of the Int module, in my opinion, is to allow e.g. Set.Make(Int) (btw, why not add Int.hash?), and to provide a namespace for future, more specialized, additions (for instance a popcount function). Another value of the PR is to create uniformity, but since we cannot fully easily achieve it now, at least let's not make it more difficult to get it later.

The decision of switching to option-returning variants pervasively in the Stdlib, for all conversion and lookup functions, is a topic on its own, rather independent of the addition of Int and much less consensual.

@Drup
Copy link
Contributor

Drup commented Oct 9, 2018

I find it a bit sad that the stubborn position of one individual basically brings us a more inconsistent and less useful stdlib. My feeling after all this is that there's no real interest to have a good stdlib within the design constraints that are given to us. Frankly individuals looking for the perfect one and are not able to think within the design constraints of the actual one should rather go work on their own rather than actively help in creating a broken stdlib.

I suppose you mean yourself, by camping on your position and refusing to add the basic of_string : string -> _ function to stdlib, in favor of the fancier and more modern string -> _ option variant, or even the slightly less elegant parse name.

Let's not play silly games @dbuenzli . Don't paint me as the bad guy here, we all have our pet-peeve but at least I recognized the value in yours and I was ready to compromise. I still think parse is a perfectly good solution.

@mshinwell
Copy link
Contributor

@Drup It has been pointed out that there are good reasons for preferring a standard name, also, though.

Did you mean "of_string" in your example?
Assuming that is what you meant, then I do sense some aversion to the option-returning versions. I would point out the following here: these kinds of conversion functions are a source of errors. A library should be guarding against such errors as much as it reasonably can. As an example, I've been working recently with a lot of conversions between different integer sizes (some related to cross compilation issues), and it's really painful to have to deal with conversion functions that cannot return results as option or equivalent. This isn't just a matter of being "modern", it's also a matter of having the right functions to write robust code. If you're going to be ideologically set on a particular position, as some of our contributors may or may not be, it doesn't seem so bad to come down on the side of an option-returning conversion being the default.

@Drup
Copy link
Contributor

Drup commented Oct 9, 2018

@mshinwell As my previous messages on the matter should make clear, I'm not against the option returning at all. But either you do it properly (by changing everything and breaking compat) or you do it differently (by adding an extra function with that api, like parse, and setting a new standard which I'll happily adopt in my libraries). I'm against the solution which consist of doing something inconsistent that doesn't fix the problem properly.

@dbuenzli
Copy link
Contributor Author

dbuenzli commented Oct 9, 2018

I suppose you mean yourself, by camping on your position and refusing to add the basic to_string : _ -> string function to stdlib, in favor of the fancier and more modern _ -> string option variant, or even the slightly less elegant print name.

Please I was not alone. Moreover you said you were willing to compromise but you never responded with any good argument to the concern of introducing a Bool.of_string function raising Failure _ while bool_of_string has been raising Invalid_argument for ages.

Anyways that discussion is over for me. I only commented the function since I did spend sometime on improving the doc string on int parsing. If people want to revert that decision at some point it will be easy to do.

@bluddy
Copy link
Contributor

bluddy commented Oct 9, 2018

I do think people tend to associate a name - especially a simple one - with a given type signature template. Remembering that Int.of_string has an option return type while other of_string functions raise exceptions is going to be really hard once we forget that Int was even added later, which will only take a couple of versions. It'll be something that only the old-timers remember, and even then we won't be sure how it happened and need to consult the resident historian.

So since we all basically agree that we like option-returning functions and dislike exception-based ones, why don't we incorporate of_string here with exceptions to be consistent, and then work on a global PR that renames common exception-raising functions and deprecates the old names? We can do a PR per function or per a few functions. Once we have the new names, any new modules can immediately use the new names and types.

@Chris00
Copy link
Member

Chris00 commented Oct 9, 2018

A library should be guarding against such errors as much as it reasonably can. […] write robust code.

I agree with that too, so these should be the default. Occasionally, functions raising exceptions are convenient and a suffix _exn reminds the reader of their behavior. Moreover, I would argue that we need to use standard names. So the only way forward it to break backward compatibility. As already mentioned, this is going to be a bit painful, so we have to propose a carefully thought migration path (I tentatively proposed one above but nobody commented on it) and preferably do it only once (which is why I think some design principles have to be explicitly be laid out). Many alternatives to the standard library have been proposed over the years and no one has “won” — and despite all the energy put into these, I'm not convinced that it will happen as it did in the package and the build departments. Also, from the perspective of a newcomer, the message that the standard library coming with the compiler is better replaced with another one may be worrisome. So I believe that stdlib should use all the experience gathered from alternatives and showcase the highest quality, consistency, and best practices.

@dbuenzli
Copy link
Contributor Author

dbuenzli commented Oct 9, 2018

a carefully thought migration path (I tentatively proposed one above but nobody commented on it)

The problem with your proposal is that it doesn't entice people to upgrade and it's also quite heavy both in terms of maintenance and build obfuscation (OCAMLPARAM).

I think the only sane approach for the stdlib is to try to gradually improve it, with occasional hard and well thought out break points. Even if this involves, meanwhile, transient, non-harmful (because they establish better standards and do not break code), inconsistencies of certain kind.

Maybe what this discussion failed at to point out is that that there are 1) different kind of inconsistencies and 2) these have different properties, from harmless to harmful and 3) that the same holds for consistencies. The inconsistency that I tried to introduce maybe felt embarrassing to some but it was not harmful from a programming perspective (amusingly the end-result is as embarrassing now).

Software is never perfect, I'm routinely dissatisfied with designs I started from scratch and thought to be the ultimate one when I created them. Errors are and will still be made even in new interfaces that are being introduced now. Asking for perfection, especially in a project like the OCaml stdlib is just aiming to eventually do nothing and perpetuate our dissatisfaction with the stdlib.

I think that all or nothing attitudes like that (I'm pointing the attitude not the person) will make it difficult to make significant progress on the stdlib in the future. Gradual improvement with a vision and end-user onboarding is the only solution in my opinion.

@Chris00
Copy link
Member

Chris00 commented Oct 10, 2018

The problem with your proposal is that it doesn't entice people to upgrade and it's also quite heavy both in terms of maintenance and build obfuscation (OCAMLPARAM).

As I see it, there is no maintenance — this solution is for existing packages in the repo to compile with the newer compilers. I did not mean to do this for new packages submitted to the repo (which can adjust their build systems).

@dbuenzli
Copy link
Contributor Author

As I see it, there is no maintenance —

I meant maintenance around providing the compatibility layer. Also it may not always be possible to provide one (e.g. when new language features/primitives are introduced).

this solution is for existing packages in the repo to compile with the newer compilers.

That's precisely what I think should be avoided. When you move the compilers and break things you should entice the eco-system to update and move with you rather than allow it to function with older assumptions.

@Chris00
Copy link
Member

Chris00 commented Oct 10, 2018

move with you rather than allow it to function with older assumptions.

The only way I see then is that stdlib be released as separate packages (with a version number in the name since there will be a need to install several of them concurrently). One just increases the version number if there is backward compatibility and release a new package if not? ocaml packages could just be meta-packages depending on ocaml-compiler and stdlib (in order to keep the fact that they are blessed because they come with the compiler).

@dbuenzli
Copy link
Contributor Author

dbuenzli commented Oct 23, 2018

I think the single remaining objection to this PR has been solved by 6a94394 and this can be merged.

@nojb nojb merged commit 90d9b22 into ocaml:trunk Oct 23, 2018
@nojb
Copy link
Contributor

nojb commented Oct 23, 2018

That being the only point of disagreement, I went ahead and merged.

Thanks for your efforts and perseverance!

@dbuenzli
Copy link
Contributor Author

Thanks @nojb !

@dbuenzli dbuenzli deleted the int-support branch October 23, 2018 09:33
@Drup
Copy link
Contributor

Drup commented Oct 23, 2018

@dbuenzli Despite my oppositions, I appreciate you volunteered to do those PRs. Thanks!

@dbuenzli
Copy link
Contributor Author

Thanks @Drup and sorry again to have lost my temper, especially to @alainfrisch.

@alainfrisch
Copy link
Contributor

Sure, no problem @dbuenzli.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.