Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better explanation of multi-argument vs. tuple constructors in sum types #7783

Open
vicuna opened this Issue Apr 23, 2018 · 18 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link
Collaborator

vicuna commented Apr 23, 2018

Original bug ID: 7783
Reporter: pmetzger
Assigned to: @Octachron
Status: assigned (set by @Octachron on 2018-04-24T14:14:22Z)
Resolution: open
Priority: normal
Severity: minor
Platform: N/A
OS: N/A
OS Version: N/A
Version: 4.06.1
Category: documentation
Related to: #5883
Monitored by: @nojb @hhugo @gasche

Bug description

The manual does not sufficiently explain a subtle point. Consider the following declaration:

type foo =
| A of intint
| B of (int
int)

Although these look similar, A and B are quite different. A is a constructor taking two arguments, even though it looks like it takes a tuple, while B of course takes a tuple.

This means that the two cannot be used interchangeably and in fact have different enough meanings to matter in a lot of contexts.

This is confusing to a beginner because it would seem like the parentheses are just meaningless precedence grouping, but they aren’t, they’re syntactically quite significant, and it would seem like the * in the A case means “tuple” but it does not.

(Note that I myself didn't understand this distinction. The manual does not make it particularly clear. That is why I am filing this bug report. A discussion on the discord chat channel seemed to reveal that many beginners were confused about this.)

It would be good to make this very explicit in the manual.

Steps to reproduce

  1. Try learning OCaml
  2. Become confused about multi-argument constructors vs. tuple constructors in sum types.
@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 23, 2018

Comment author: pmetzger

BTW, during discussion, what cemented the distinction in my mind was someone noting that, if constructors had function types (as in Haskell), then:

A: int -> int -> foo
while
B: (int * int) -> foo

This made the difference much clearer to me. (The use of * in the A case is unfortunate I think, but that decision was made decades ago.)

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: sampo

OCaml provides pairs, triples, quadruples and so forth. For n >= 2, the collection of n values is called an n-tuple, or just a tuple. The tuple whose components are x1,x2,...,xn is written (x1,x2,...,xn). Such a value is created by an expression of the form (E1,E2,...,En). With functions, tuples give the effect of multiple arguments and results.
#type foo =
| A of int * int
| B of (int * int);;
type foo = A of int * int | B of (int * int)

A(1,2);;

  • : foo = A (1, 2)

B(1,2);;

  • : foo = B (1, 2)

let average(x,y) = (x+y) / 2;;

val average : int * int -> int =

This writting is also the same:
val average : (int * int) -> int =

I hope this will help.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Sampo, I'm not sure what you were suggesting? The point is that there's a difference between a constructor that takes multiple values and a constructor that takes a single tuple, but the syntax is so similar that beginners have problems with this, and it is important to document the difference carefully. I am not talking about a misunderstanding that I myself am experiencing, I'm suggesting that documentation on this subtlety be improved. If you were proposing documentation, I'm not sure that really explains it well enough.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Here is some possible documentation that could be added in section 1.4 of the manual ("Records and Variants"):

Note that although the syntax is almost the same, there is a significant difference between a variant constructor that takes multiple arguments and one that takes a single tuple as an argument. Consider the following type:

type foo =
| A of intint
| B of (int
int)

Although the syntax for A and B seems nearly the same, the parentheses are quite significant. A is a constructor that takes two arguments, both of which are integer typed values, while B is a constructor that takes a single argument, a tuple of two integers. These behave quite differently in many circumstances, such as when pattern matching. Be careful not to confuse them.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: sampo

@pmetzger.
A is a constructor taking a tuple, not arguments. This is a "Cartesian product". The values themselves can be grouped in pairs or even in tuples.
int * int is a pair or a tuple likewise (int * int). There is no difference.
However, external parentheses are not obligatory. The "Cartesian product" is denoted by the symbol *.
type such as these:
t1 * t2 * t3
(t1 * t2) * t3
t1 * (t2 * t3)
The first indicates a set of triples.
The second indicates a set of pairs where the first component is also a pair.
The third indicates a set of pairs where the second component is also a pair.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: @Octachron

@sampo, I am afraid that you are mistaken. Constructor arguments are not packed in a tuple by default: there are differences in both behaviors in pattern matching and memory representations between type t = A of int * int and type s = B of (int * int). I recommand playing with caml-inspect and reading the memory representation chapter of RWO to learn about the difference in memory representation.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: sampo

@Octachron,a pair or a 2-tuple will always remain a pair or a 2-tuple, with or without parentheses and this has existed since the birth of CAML and continues with Ocaml. This is a fact.
The representation in memory is another matter.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: @trefis

No.

type t = A of int * string;;

type t = A of int * string

let x = (3, "haha");;

val x : int * string = (3, "haha")

A x;;

Characters 0-3:
A x;;
^^^
Error: The constructor A expects 2 argument(s),
but is applied here to 1 argument(s)

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Sampo: apparently, the documentation update I'm requesting is even more necessary than I had naively expected.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: sampo

@trefis, yes, this is an example that shows the power of Ocaml typing. But the two expected arguments have a name and are called a pair or a 2-tuple. And that makes the difference that apparently you do not discern. (respectfully speaking) As I said above, a type int * int or (int * int) is the same and is a component value written in parenthesis like this (1,2), a comma serves as the separator between elements. In this case (your example) the argument is not a value and another value but it is a pair or a 2-tuple that is expected. Reading the error that the compiler gives can cause confusion in the understanding, and apparently that's
what he does.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

Sampo:

Abusing notation (since unlike Haskell, constructors don't have types):

The type of "A" in "type t = A of int * int" would be "A: int -> int -> int"
The type of "B" in "type s = B of (int * int)" would be "B: (int * int) -> int"

These are not the same. The "*" in the first case is rather unfortunate notation, but we're stuck with it.

This is why I think this really needs to be documented. It confuses people, even quite smart people, and nothing in the documentation really explains it.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 24, 2018

Comment author: pmetzger

(And again note that if you feed a tuple to A, it won't work it is expecting two arguments, but if you feed a tuple to B it will. They are not the same.)

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 25, 2018

Comment author: sampo

@pmetzger
This is a matter of writing. Let's go back to the past, at the time of CAML.
Write these examples in the Caml top level.
#type t = A of int * int;;
Le type t est défini.
#type s = B of (int * int);;
Le type s est défini.
As you can see there is no signature representing the types with or without parenthesis. Only an acknowledgment of receipt. Then write:

A = B;;
^
Cette expression est de type int * int -> s,
mais est utilisée avec le type int * int -> t.
#B = A;;
Entrée interactive:
B = A;;
^
Cette expression est de type int * int -> t,
mais est utilisée avec le type int * int -> s.
The corresponding signature is int * int, no parenthesis.

Come back to today, with Ocaml write those same sentences in the top level:

type t = A of int * int;;

type t = A of int * int

type s = B of (int * int);;

type s = B of (int * int)
The writing (the answer) of the interpreter is different. Then write:

A = B;;

Characters 0-1:
A = B;;
^
Error: The constructor A expects 2 argument(s),
but is applied here to 0 argument(s)

B = A;;

Characters 0-1:
B = A;;
^
Error: The constructor B expects 1 argument(s),
but is applied here to 0 argument(s)

As you can see, the writing (the answer) of the interpreter is different.
However, it's still the same thing in all these past years.

Perhaps, to summarize this situation, it is necessary to simply improve the error message written in the top level, for a better comprehension?

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 25, 2018

Comment author: @Octachron

Sampo, you are getting confused by CAML promotion of constructor to function when they are partially applied. I assure you that constructors do have an arity in OCaml, as stated by the compiler error messages and the reference manual(e.g. http://caml.inria.fr/pub/docs/manual-ocaml/expr.html#sec146).

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 26, 2018

Comment author: @hhugo

Would it help to make the parallel with inline record ?

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 26, 2018

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 27, 2018

Comment author: sampo

I read the debate with particular attention. Many agree and apparently I will be the only one who does not agree.
In fact, I understand your reasoning at all, but I think that in this particular situation you have to take a solid foundation, a base, that is the Cartesian product. And a Cartesian product always indicates an n-tuple, composed of argument(s), singleton, 2-tuples, 3-tuples,..., n-tuples with or without parenthesis.
So to resume the example of bobbypriambodo I will say that:
type account =
| Facebook(string, int) /* not 2 arguments but a 2-tuple or pair*/;
type account2 =
| Instagram((string, int)) /* not 1 argument -not happens to be a 2-tuple but also a 2-tuple or pair*/;
This is my point of view and I do not get confused.

@vicuna

This comment has been minimized.

Copy link
Collaborator Author

vicuna commented Apr 29, 2018

Comment author: sampo

@Octachron. I would like to say one more thing before you change the documentation. I resumed the example of trefis with a small change.

type bar = int * string;;

type bar = int * string

type t = A of bar;;

type t = A of bar

let x = (3, "haha");;

val x : int * string = (3, "haha")

A x;;

  • : t = A (3, "haha")
    Here, below, another example that shows some difficulty in calculating the number of arguments.

let addpoint ((x1,y1), (x2,y2)) = (x1+.x2, y1+.y2);;

val addpoint : (float * float) * (float * float) -> float * float =
This function take a pair of points. Its arguments pattern is a pair of pairs.
Look again at the argument pattern of addpoint. We may equivalently view this function as taking:

  • one argument: a pair of pairs of float numbers
  • two arguments: each a pair of float numbers
  • four arguments: all float numbers, oddly grouped
    There is a probability that the compiler is mistaken in calculating the arity of the expressions given to it in a constructor with or without parentheses, between an argument and two arguments like the example above.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.