Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow [] as a user-defined constructor #234

Merged
merged 6 commits into from Jan 29, 2016

Conversation

objmagic
Copy link
Contributor

This patch allows empty list as a user-defined constructor name. Together with pull request #144 , one can write the following heterogeneous list.

module H = struct
  type 'a hlist =
   [] : unit hlist
   | :: : 'a * 'b hlist -> ('a * 'b) hlist
end

let h = H.[1; "two"; false]
module H :
  sig
    type 'a hlist = [] : unit hlist | :: : 'a * 'b hlist -> ('a * 'b) hlist
  end
val h : (int * (string * (bool * unit))) H.hlist =
  H.:: (1, H.:: ("two", H.:: (false, H.[])))

@Drup
Copy link
Contributor

Drup commented Aug 29, 2015

I'm all in favor of that. It's basically a little piece of #76 but It should be enough for interesting uses.

I personally think the syntax, for uniformity, should be:

module H = struct
  type 'a hlist =
   [] : unit hlist
   | (::) : 'a * 'b hlist -> ('a * 'b) hlist
end

@objmagic
Copy link
Contributor Author

I'm OK with both of the syntaxes.

@damiendoligez
Copy link
Member

For consistency, definitely demand (::) instead of :: whenever the constructor is not surrounded by its arguments.

@objmagic
Copy link
Contributor Author

I uncommented another line to allow (::) to be used as constructor name. I still leave :: as valid constructor name

@chambart
Copy link
Contributor

chambart commented Oct 7, 2015

Notice a nice use case for that:

type (_,_) t =
  | [] : ('a, unit) t
  | (::) : 'a * ('a, 'b) t -> ('a, unit -> 'b) t

let rec map: type a b l. (a -> b) -> (a, l) t -> (b, l) t = fun f -> function
  | [] -> []
  | h :: t -> f h :: map f t

let [a; b; c] = map succ [1;2;3]

It would allow to safely write that kind of mapping on a set of values.

@lefessan
Copy link
Contributor

lefessan commented Oct 7, 2015

I don't see the point of allowing overriding of all the of basic syntax. For every OCaml user, [] is expected to be the empty list, idem for :: and [a;b;c]; breaking this assumption makes the code unreadable, unmaintenable by other developers, incompatible with other libraries, etc. So, the only consequence that I can see of accepting such patch will be a strong recommendation in the user manual to never use that possibility...

So, why add features just to recomment later not to use them ???

@Drup
Copy link
Contributor

Drup commented Oct 7, 2015

It's not possible to have confusion for the unaware user without getting a warning about disambiguation, at the very least, and it makes some pattern much easier to express. I don't see how it's different from various other recent additions to the language.

@lpw25
Copy link
Contributor

lpw25 commented Oct 7, 2015

I don't see the point of allowing overriding of all the of basic syntax.

For the same reason OCaml allows overriding of operators.

@lefessan
Copy link
Contributor

lefessan commented Oct 7, 2015

@Drup: the warning about disambiguation is going to disappear at some point in the future. At that point, there will be nothing to save the "unaware user".
@lpw25: yes, we made that mistake. Why make it worse now ? Those few people who use it do it for some time, and then just stop because they discover it is not good. It makes nice 5 lines programs, but rarely maintainable 10 klines programs.

I think we have a choice: either we want to make the language simple and easy, to encourage new users without a PhD in type theory to join us, or we want to make the most complex language in the world, with features that nobody use but so that we can say: oh, it's cool, I can write type 'a option = () of 'a | false and it works ! and it works... :-(

@lpw25
Copy link
Contributor

lpw25 commented Oct 7, 2015

yes, we made that mistake. Why make it worse now ?

It's not a mistake its a useful feature. + is "normally" addition on integers but in some contexts it makes much more sense for it to mean something else. You might as well argue that every value in every program must be completely unique because obviously all identifiers only have one sensible meaning.

@Drup
Copy link
Contributor

Drup commented Oct 7, 2015

Those few people who use it do it for some time, and then just stop because they discover it is not good.

Because Int32.add (Int32.mul x y) z is obviously so much better and easier to understand to the beginner (and the non-beginner) than Int32.( x * y + z )

Please.

@lefessan
Copy link
Contributor

lefessan commented Oct 7, 2015

@lpw25: indeed, one feature I really enjoy in OCaml is having the possibility to make "every value in every program" "completely unique": the recommended coding style should be to never open modules except for constructors and labels, and to always prefix identifiers by the module providing them (List.map,etc.), thus making them "completely unique" in the program.

@Drup: Yes, I am wondering why the beginner does not understand simple error messages:

# let sum t x = Int32.(t.(x) + t.(x+1) );;
Error: This expression has type int but an expression was expected of type
         int32

I like compositionnality...

@lpw25
Copy link
Contributor

lpw25 commented Oct 7, 2015

"every value in every program" "completely unique"

I wasn't suggesting your argument is equivalent to saying that there should only be one thing called map (or +) in scope, I was suggesting your argument is equivalent to saying there should only be one thing called map (or +) in the whole program. No Array.map since there is already a List.map. No Int32.(+) because there is already a Pervasives.(+).

@lefessan
Copy link
Contributor

lefessan commented Oct 7, 2015

My argument is that things that cannot be uniquely identified should not exist in a program: so, map can be both in List and Map, but the user should never open both of them (and actually, they should never be opened). For (+) and other operators, there is no way to elegantly uniquely qualify them : x Int32.(+) y is not better than Int32.add x y.

Anyway, as the example with arrays was showing, you reach the limits of overriding (+) so fast that it's really not worth allowing it in the first place.

@lpw25
Copy link
Contributor

lpw25 commented Oct 7, 2015

Anyway, as the example with arrays was showing, you reach the limits of overriding (+) so fast that it's really not worth allowing it in the first place.

That example is completely fabricated, I can't really see how that constitutes "so fast". In practice it is fine to use such overloading and it can lead to much clearer code.

@objmagic
Copy link
Contributor Author

objmagic commented Oct 7, 2015

so, map can be both in List and Map, but the user should never open both of them

I agree.

module H = struct
  type 'a hlist =
   [] : unit hlist
   | :: : 'a * 'b hlist -> ('a * 'b) hlist
end

Say user writes this module. We can assume that user understands what bad thing is going to happen if he opens his module later. It is analogous to the case in which user opens module Plus = struct let ( + ) = ( * ) end. No one wants to open this module to pollute outside scope.

On the other hand, if user wants to open such module which is written by someone else, he should somehow understand the possibility of polluting outer environment. For example, open Core.Std will make all arguments of List.map become labelled arguments. However, I do agree overriding semantics of basic syntax could leave user confused.

@ghost
Copy link

ghost commented Oct 7, 2015

I think this feature is nice as it removes a bit of magic around the list type. Now you can say "this is how the list type is defined", instead of saying that it is a normal sum type except that you can't redefine it yourself.

And it's impossible to prevent people from writing unreadable code, so I don't think we should use this as an argument against this specific feature.

@lefessan
Copy link
Contributor

lefessan commented Oct 7, 2015

@lpw25: "In practice, it is fine..." I have written alone projects of nearly 100klines of OCaml code, so I think I have quite an experience in the kind of code that can be written in such projects. It is not "fabricated", it is exactly what appears when you move from short examples to real life code.

@lpw25
Copy link
Contributor

lpw25 commented Oct 7, 2015

It is not "fabricated"

Are you saying that the example was taken from a real piece of code, or are you objecting to my use of the word "fabricated"? Perhaps it is too strong. My point is drawing a very general conclusion "using operator overloading is always a mistake" from a small sample of code is not particularly persuasive.

@Octachron
Copy link
Member

@lefessan:

Anyway, as the example with arrays was showing, you reach the limits of overriding (+) so fast that it's really not worth allowing it in the first place.

I don't think your example is particularly illustrative on this point. It could be easily rewritten as

let sum t x = Int32.(t.(x) + t.(succ x) )
 (* or *) 
let sum t x = Int32.( t.(x) + t.( Int.add 1 x ) )

At the same time, for numerical and mathematics oriented applications, the
ability to bind (+) to the right associative commutative binary operator is
a true blessing and can increase readability a lot.

I think the same thing is true for the list literal syntax; there is quite a few
constructions that are essentially a list plus some supplementary type
information that could benefit from having access to the list literal syntax.

@lefessan
Copy link
Contributor

lefessan commented Oct 7, 2015

Anyway, whatever, I gave my opinion. PR are nice to improve the distribution tools and libraries. I think they are a bad way to improve the language itself, by small inconsistent increments and no long term vision.

@bluddy
Copy link
Contributor

bluddy commented Oct 7, 2015

I, for one, am supportive of this PR. Less magic is a good thing.

I also acknowledge the danger inherent in local opens. I think the real problem is the inability to restrict local opens easily. For example,

(* Ideal syntax *)
module Int32 = Int32((+), (-), (/), (*))
(* current syntax: 

module Int32 : sig val (+):int32 -> int32 -> int32 val (-):int32 -> int32 -> int32 val (/): int32 -> int32 -> int32 val (*): int32 -> int32 -> int32 end = Int32 *)

...
Int32.(x + (2 * 3))

Would be much safer than using Int32 directly.

Currently, to do this, we'd have to write a whole module type, which involves the type of every function we want (as is shown above), and is tedious. A quicker selective syntax as suggested in the first alternative would really help, but I don't think this particular syntax works because it clashes with functor application. Anyone have any ideas?

@xandkar
Copy link

xandkar commented Oct 7, 2015

@bluddy

module Int32 = struct
  let (+) = Int32.add
  ...
end

@bluddy
Copy link
Contributor

bluddy commented Oct 7, 2015

@ibnfirnas: Good point, you could either restrict the type view of a module or build another module altogether -- I didn't think of that. I agree that building a new module is cleaner syntactically, but it's still not concise enough that people would use it in every file. I guess stating the point of this is important: you want to make it such that the included functions/types/etc can be concisely specified in every file. This prevents the problem you open yourself up to by opening a file either locally or globally, whereby a change in one file causes a possibly invisible semantic change in another file.

In other words, suppose module Foo has no (+), and I locally open it in another file, bar.ml. Many years later, I add (+) to Foo, not realizing that it's opened locally in bar.ml. Calls to (+) in Bar now go to Foo's version of (+) rather than the default library's version. Hilarity and painful debugging ensues.

If Foo was first constrained to only the used functions, any addition to Foo would not have been reflected in bar.ml. However, to be feasible as a common practice, this must be as painless a process as possible, which means it should really be a one line thing with minimal syntactic footprint IMO.

@objmagic
Copy link
Contributor Author

objmagic commented Oct 7, 2015

@damiendoligez Right now, :: can be used as constructor. I agree that we should keep uniformity and demand (::). Should I disallow :: in this patch? Will this change cause any backward incompatibility?

@objmagic
Copy link
Contributor Author

objmagic commented Oct 8, 2015

@chambart I guess the meaning of this map is to guarantee returned list has the same length? It is cool but I would like to know in which case it will be useful?

@lefessan
Copy link
Contributor

lefessan commented Oct 8, 2015

@lpw25: Yesterday, I was not completely satisfied by my reply, but I couldn't put the finger on the problem. And then, I realized why: in fact, my example is completely real, it is indeed code that I have written in the past, and now, I remember where! In try.ocamlpro.com, from the beginning, we added an extension to locally rewrite constants. For example, you can type:

#   I.(4494999 * 48484848);;
- : Big_int.big_int = 217939343275152I

As you can see, integers and operators are overriden in the I module into big integers. And indeed, as soon as I tried to make a simple program using that extension, I fell on the example I gave you.

In the OCamlPro compiler (the one used by Memprof), we have tested such features for a long time, and our conclusion was often: once you have made the patch, you usually play with it on small examples for a few days; but then, in bigger projects, the new feature is never used. Finally, we usually decide to drop them after one or two years.

In conclusion, I would argue that people here should refrain from saying "it's worderful, I want it !", instead, people should take such PRs, merge them in their own local version of OCaml, and use them for one year in their projects. Then, after one year, they should tell us whether it was the wonderful feature that solved all their problems, or just a nice idea, but with little interest.

@damiendoligez
Copy link
Member

@marklrh

Right now, :: can be used as constructor. I agree that we should keep uniformity and demand (::). Should I disallow :: in this patch? Will this change cause any backward incompatibility?

:: is on my list of things to remove anyway (see PR#5936). As far as I know, the incompatibility is entirely theoretical because no known program uses this misfeature.

@bobzhang
Copy link
Member

bobzhang commented Oct 8, 2015

I agree with @lefessan , local open (overloading) makes code hard to reason about
Think twice, I think it might be interesting to providing a qualified syntax which only allow the module functions in that scope, for example

M.(f a b)  (* all functions inside M can only be imported from M*)

Edit: typo

@bluddy
Copy link
Contributor

bluddy commented Oct 8, 2015

@yallop, thanks for the clarification and cross-reference. LGTM now, but I'd also like to see #76 merged.

@damiendoligez
Copy link
Member

@marklrh

Also, notice that false and true also can be used as type constructor and it seems no one complains about this.

That's not exactly right: as evidenced by Mantis #5936, most people expect true and false to be keywords whose meaning is fixed. If we wanted them to be ordinary constructors, we'd write them capitalized.

@objmagic
Copy link
Contributor Author

@gasche any opinion on this issue during developer meeting?

@gasche
Copy link
Member

gasche commented Nov 19, 2015

No, we didn't have time to cover all pull requests and open bugs, and this one was not discussed, sorry.

@damiendoligez
Copy link
Member

I think this boils down to the mental representation of [] and ::.
Are they keywords or constructors? @lefessan says keywords, everyone else says constructors.

At any rate, it's really strange to have :: redefinable but not []. I move that we merge this PR for 4.03.

@alainfrisch
Copy link
Contributor

Generally speaking, overloading built-in operators/syntax/constructors/... seems a rather bad idea in terms of readability, but I don't really object to the PR.

FWIW, I consider [] as some piece of syntactic sugar, the degenerate form of the n-ary construction [e1;...en] (and it allows an arbitrary number of whitespaces between [ and ]). Without this PR, we could change the internal constructor name to which the empty list is desugared, this is purely an implementation detail.

@objmagic
Copy link
Contributor Author

objmagic commented Dec 9, 2015

ping again. any consensus on getting this merged? more discussion?

@objmagic
Copy link
Contributor Author

@damiendoligez @gasche any plan on merging this in 4.03.0? Is there any concern? I would love to fix any problem.

@@ -23,6 +23,9 @@ Language features:
- GPR#88: allow field punning in object copying expressions:
{< x; y; >} is sugar for {< x = x; y = y; >}
(Jeremy Yallop)
* GPR#234: allow "[]" as a user-defined type constructor. Demand parenthesis
around "::" when using "::" as type constructor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"type constructor" is the wrong phrase here. You should just say "user-defined constructor" (in both instances).

@damiendoligez damiendoligez added this to the 4.03.0 milestone Dec 23, 2015
@objmagic objmagic force-pushed the empty_list_user_defined_constructor branch from 36eff90 to 4f2ad78 Compare January 28, 2016 22:41
@objmagic
Copy link
Contributor Author

@damiendoligez Rebased against trunk, and made Change entry better

@mshinwell
Copy link
Contributor

I am going to merge this, since @damiendoligez has approved it, and there have been no further objections for more than one month.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet