User-defined indexing operator without array indexing #622

Closed: wants to merge 12 commits into base: trunk

9 participants
@Octachron
Contributor

Octachron commented Jun 19, 2016

This pull request cherry-picks the features of the user-defined indexing operators branch that are orthogonal to the current array data types proposal #616.

More precisely, it adds two families of operators:

  • .[] for simple indexing
  • .{} for multidimensional indexing

that can be redefined by users like other operators. For instance,

type matrix = { dim:int; array: float array }
let (.[]) m (i,j) = m.array.( i + j * m.dim )
let (.[]<-) m (i,j) v = m.array.( i + j * m.dim ) <- v 

(see #69 and the included manual documentation for more information)
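The `.[]` definitions above use the syntax proposed in this PR, which released versions of OCaml do not accept. For comparison only, the sketch below writes the same matrix example with the extended indexing operators available in OCaml 4.06 and later, which must contain an extra operator character. The `.%[]` name and the `make` helper are assumptions of this sketch; the column-major layout `i + j * dim` is taken from the example above.

```ocaml
(* Column-major square matrix stored in a flat float array. *)
type matrix = { dim : int; array : float array }

(* m.%[(i, j)] is sugar for ( .%[] ) m (i, j). *)
let ( .%[] ) m (i, j) = m.array.(i + j * m.dim)

(* m.%[(i, j)] <- v is sugar for ( .%[]<- ) m (i, j) v. *)
let ( .%[]<- ) m (i, j) v = m.array.(i + j * m.dim) <- v

(* Hypothetical helper: a dim x dim matrix filled with zeros. *)
let make dim = { dim; array = Array.make (dim * dim) 0.0 }

let () =
  let m = make 2 in
  m.%[(1, 0)] <- 3.0;
  (* (1, 0) maps to flat index 1 + 0 * 2 = 1 *)
  assert (m.%[(1, 0)] = 3.0 && m.array.(1) = 3.0)
```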

The array-like indexing operators (.( )) are left unmodified and free to be improved.

While the array data types proposal focuses on array-like data types, this proposal is more concerned with other data types that implement a map between two finite sets. The standard library's maps and hash tables, as well as multidimensional arrays, are three examples of data types that could benefit from a short indexing syntax. It seems difficult to describe within the compiler all the kinds of maps between finite sets; therefore user-defined indexing operators might be a better fit here than the specialized field projection proposed for array data kinds.
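To illustrate the dictionary case, here is a hedged sketch of indexing operators over the standard library's `Hashtbl`. It uses the released `.%{}` spelling rather than this PR's `.{}`, and returning an `'a option` from the getter is a design choice of this example, not something the PR specifies.

```ocaml
(* Hypothetical dictionary operators over the standard Hashtbl. *)

(* tbl.%{k}: lookup that returns None instead of raising Not_found. *)
let ( .%{} ) tbl key = Hashtbl.find_opt tbl key

(* tbl.%{k} <- v: insert or overwrite the binding for k. *)
let ( .%{}<- ) tbl key v = Hashtbl.replace tbl key v

let () =
  let ages : (string, int) Hashtbl.t = Hashtbl.create 8 in
  ages.%{"alice"} <- 31;
  assert (ages.%{"alice"} = Some 31);
  assert (ages.%{"bob"} = None)
```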

Moreover, to compensate for the loss of one user-definable family of index operators, the current indexing syntax is extended to support a module path prefix, as in the array data types proposal:

  • a.M.[i] ≡ M.(.[]) a i
  • a.M.[i]<-v ≡ M.(.[]<-) a i v
  • a.M.{i} ≡ M.(.{}) a i
  • a.M.{i}<-v ≡ M.(.{}<-) a i v
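The module-path equivalences can be checked against the operator-call form that released OCaml already accepts. In this sketch `Assoc` is a hypothetical module and `.%[]` stands in for the PR's `.[]`; the local open and the explicit `M.( op ) a i` call should evaluate to the same value.

```ocaml
(* Hypothetical module providing its own indexing operator. *)
module Assoc = struct
  (* l.%[k] looks up key k in an association list (raises Not_found). *)
  let ( .%[] ) l k = List.assoc k l
end

let pets = [ ("cat", 4); ("bird", 2) ]

(* Operator brought into scope by a local open. *)
let via_open = Assoc.(pets.%["cat"])

(* Desugared form M.( op ) a i, the shape the PR gives for a.M.[i]. *)
let via_path = Assoc.( .%[] ) pets "cat"

let () = assert (via_open = 4 && via_path = 4)
```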

Note that this pull request integrates all the compatibility patches for the Bigarray modules already present in the user-defined indexing operator branch and should not break any code except for code relying on the old and undocumented parser-level implementation of indexing operators (i.e. the mapping between a.(i) and Array.get a i).

Octachron and others added some commits May 24, 2014

Add a special syntax for index operators
This commit introduces a new syntax for index operators.
Five core bracket operators are added:
.[], .{}, .{,}, .{,,}, .{,..,}.
The .{,}/.{,,}/.{,,,} operators are defined for compatibility with the
Bigarray syntax extension.
Each core index operator is available in an access version and an
assignment version. For instance, .[] comes in two forms:
* .[] : index operator
* .[]<- : indexed assignment operator
The general syntax for these index operators as implemented in the
parser is index_operator ::= index_operator_core [<-].
User-defined indexing operator (.{})
This commit modifies the Bigarray syntax extension in order to facilitate
the use of custom (.{}) operators. The compatibility with the existing
Bigarray syntax has been preserved as much as possible. However, this
commit will break code that uses the Bigarray (.{}) syntax without opening
the Bigarray module first!

Like the previous commit, this commit modifies the parser to desugar
bigarray1.{index} to ( .{} ) bigarray1 index. Following the bigarray
syntax, the index operator used in the desugaring changes if the index
is an n-tuple:

1-tuple ⟹ `.{}`
2-tuple ⟹ `.{,}`
3-tuple ⟹ `.{,,}`
4-tuples and larger ⟹ `.{,..,}`
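For reference, released OCaml still dispatches the bigarray syntax on the number of comma-separated indices, mapping `a.{i}` to `Bigarray.Array1.get` and `a.{i, j}` to `Bigarray.Array2.get` (and the corresponding `set` functions for assignments). A minimal sketch, assuming OCaml 4.07 or later, where `Bigarray` is part of the standard library; `demo` is a hypothetical name:

```ocaml
open Bigarray

(* Writes through the .{} syntax, then reads each element back both
   through the syntax and through the explicit accessor. *)
let demo () =
  let v = Array1.create float64 c_layout 3 in
  Array1.fill v 0.0;
  v.{1} <- 2.5;                  (* one index:   Array1.set v 1 2.5 *)
  let m = Array2.create float64 c_layout 2 3 in
  Array2.fill m 0.0;
  m.{1, 2} <- 4.0;               (* two indices: Array2.set m 1 2 4.0 *)
  (v.{1}, Array1.get v 1, m.{1, 2}, Array2.get m 1 2)

let () =
  let (a, b, c, d) = demo () in
  assert (a = 2.5 && b = 2.5 && c = 4.0 && d = 4.0)
```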

The bigarray modules have been modified to use these new index operators.
Note that this means that these index operators are no longer
accessible without opening the bigarray module.
PR#6885: documentation for index operators
This commit documents the new syntax for index operators ( .[], .{})
and updates the documentation of the bigarray-specific syntax:

  * A new section "Customizable index operators" (7.28) describes the
  index operator syntax. Within this section, two subsections detail
  respectively the particularities of the multidimensional index
  operator (.{}) and some potential source compatibility problems with
  the previous bigarray-specific syntax.

  * The "Syntax for bigarray access" section (7.21) has been partially
  removed and now only mentions that this extension has been superseded by
  the new extension and deprecated, with forward references to the new
  section 7.28 and its compatibility subsection.

  TODO:
  * The documentation would have to be updated again when/if the
  mantis issue #6765 is integrated in trunk: for now, the
  documentation only mentions that using the ( .{} ) syntax without
  opening the Bigarray module is "deprecated".
User-defined indexing operator (.[])
This commit modifies the parser to use the newly defined (.[]) and
(.[]<-) operators. It also moves the definition of the .[] operators for
String/Bytes to the pervasives module.

Before this commit, expressions of the form `string.[index]` were
desugared to String.get[_unsafe] string index. The safe or unsafe
version was chosen depending on the presence of the "-unsafe" compiler
option. Such expressions are now desugared to `( .[] ) string index`.
The same desugaring is applied to `string.[index] <- value`,
which is translated to `( .[]<- ) string index value`.
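The string desugaring described above can be observed in released OCaml, where `s.[i]` is parser sugar for `String.get s i` (or `String.unsafe_get` under `-unsafe`). A small sketch, assuming the default bounds-checked compilation mode; the value names are hypothetical:

```ocaml
let s = "OCaml"

(* s.[1] is parsed as String.get s 1. *)
let same = s.[1] = String.get s 1 && s.[1] = 'C'

(* With the default bounds checking, an out-of-range index raises
   Invalid_argument instead of reading past the end of the string. *)
let out_of_bounds_raises =
  match s.[10] with
  | _ -> false
  | exception Invalid_argument _ -> true

let () = assert (same && out_of_bounds_raises)
```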

In order to keep the standard semantics for string index operations,
these new index operators are defined in the pervasives module using
new compiler primitives, e.g. `let .[] = "%string_opt_get"`.

These new primitives are then mapped to the safe or unsafe version
depending on the "-unsafe" compiler option. Consequently, these
modifications should have no impact on existing code.

With these modifications, defining custom `.[]` operators should be
easier, at the cost of losing access to the standard index operator
for string.
Indexing submodule for bigarray
The objective of this commit is to introduce a short notation for
bringing in scope the bigarray index operators and only them. For
that purpose, the bigarray index operators are grouped into a single
submodule. This submodule is also included inside the global bigarray
module to preserve compatibility and ease of use of the bigarray
module.
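The idea of grouping only the indexing operators in a dedicated submodule, so that a local open brings them and nothing else into scope, can be sketched as follows. `Vec`, `Vec.Operators` and the `.%{}` spelling are hypothetical names used for illustration; the PR applies the pattern to `Bigarray` itself.

```ocaml
module Vec = struct
  type t = float array

  (* Hypothetical constructor: a zero-filled vector of length n. *)
  let make n : t = Array.make n 0.0

  (* Only the indexing operators, so they can be opened on their own. *)
  module Operators = struct
    let ( .%{} ) (v : t) i = v.(i)
    let ( .%{}<- ) (v : t) i x = v.(i) <- x
  end

  (* Re-exported so that opening Vec also provides the operators. *)
  include Operators
end

let total =
  let v = Vec.make 3 in
  (* The local open brings .%{} into scope without Vec.make or Vec.t. *)
  Vec.Operators.(
    v.%{0} <- 1.0;
    v.%{2} <- 2.0;
    v.%{0} +. v.%{2})

let () = assert (total = 3.0)
```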
Bigarray index operators deprecated warning
With the simplification of index operators, the expressions a.{..} are
no longer automatically resolved to Bigarray.Array[n].[g|s]et. To use
these operators, it is now necessary to bring them in scope, for
instance by opening either the Bigarray or Bigarray.Operators module.
To ease the transition period, this patch adds a hack in
`typing/typetexp.ml` to catch the cases where the index operators
`.{}/.{,}..` are used without being bound in the current scope
and translates them to Bigarray.(..) with a deprecation warning.
Manual: Document bigarray compatibility warning
This commit updates the documentation on the compatibility problems
between the deprecated bigarray-specific syntax extension and the new
user-defined index operators extension. In particular, this commit
describes the new deprecation warning for implicit use of the
`Bigarray(.{...})` operators and states that this warning might be
turned into an error at some undetermined point in the future. This
commit also amends the documentation to mention the new
`Bigarray.Operators` submodule when useful.
Manual: Fix user-defined index operators
Change the name from customizable to user-defined index operators and
fix the alignment of the latex tables for better readability.
Extended syntax for indexing operator
Enable module path prefix for indexing operator:
* `a.M.[i]` ≡ `M.(.[]) a i`
* `a.M.[i]<-v` ≡ `M.(.[]<-) a i v`
* `a.M.{i}` ≡ `M.(.{}) a i`
* `a.M.{i}<-v` ≡ `M.(.{}<-) a i v`
@gasche

Member

gasche commented Jun 25, 2016

We discussed this solution during the development meeting where @lpw25 first proposed type-directed array resolution: having .( ) type-directed and .[ ], .{ } scope-directed. (Maybe @lpw25 himself proposed that as a way to avoid reverting your change.)

I was opposed to it at the time and I still think it's a hack. Choosing the semantics of a language feature based on whether one uses parentheses or accolades feel completely arbitrary to me, and I see no justification other than "well they were two proposals at the time...". What if people later ask for .[ ] to be type-directed (bytes and string, say?), or for .( ) to be scope-directed?

Is there not a more satisfying solution to resolve this tension?

(I wonder if it would be possible to use the type-directed discipline when the typing information allows it, and the scope-directed discipline otherwise. It seems tricky and possibly wrong from the point of view of principality -- adding more type information changes the lookup strategy -- but maybe it just extends what is done for records?)

(Another solution would be to let users define type-directed access iterators for arbitrary types -- instead of introducing new operators in scope as in this proposal. That is, make @lpw25's proposal user-extensible. Can we get a coherent design this way? type t = foo with (.()) = get and (.()<-) = set)

@lpw25

Contributor

lpw25 commented Jul 5, 2016

Choosing the semantics of a language feature based on whether one uses parentheses or accolades feel completely arbitrary to me, and I see no justification other than "well they were two proposals at the time...".

I don't really agree with this. Distinguishing two semantically different operations by which symbols they use seems fairly natural. Method call and record projection are distinguished by whether we use . or #, so why not distinguish projection primitives (.()) from arbitrary indexing functions (.[]) by the choice of parentheses? It is slightly unfortunate that this proposal uses . for something other than a projection primitive, whereas currently . always indicates a projection, but I think the cost is probably worth it to have nice syntax for indexing functions.

What if people later ask for .[ ] to be type-directed (bytes and string, say?), or for .( ) to be scope-directed?

I don't think of the choice as about type-directed or scope-directed, but as about whether these operations are primitives. The choice was already made to make primitive operations support type-based disambiguation whilst leaving regular functions completely scope-directed. This choice is not currently observable with array primitives as there is only a single array type, but by allowing multiple array types (including redefining some existing types as arrays -- string, etc) we naturally get type-based disambiguation of the array primitives.

We are already heading towards a situation where primitive operations use type-based disambiguation, whilst regular functions use modular implicits to get similar behaviour. I would expect the same thing to happen here. .(), as a primitive, would use type-based disambiguation whilst .[], as an ordinary function, would use modular implicits.

I think there are tangible benefits from syntactically distinguishing primitive operations -- which have no computational content -- from function application. At the very least this is good for the value restriction.
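The value-restriction point can be made concrete: OCaml's type checker treats a record field projection of a non-expansive expression as non-expansive, whereas a function application is expansive. The sketch below illustrates that general point and is not code from this discussion; `box`, `get`, `id1` and `id2` are hypothetical names.

```ocaml
type 'a box = { v : 'a }

(* Primitive projection of a non-expansive expression: id1 is
   generalized to 'a -> 'a. *)
let id1 = ({ v = (fun a -> a) }).v

(* The same access through an ordinary function is an application,
   so id2 only receives a weak type '_weak1 -> '_weak1. *)
let get b = b.v
let id2 = get { v = (fun a -> a) }

let () =
  assert (id1 1 = 1);
  assert (id1 "one" = "one");  (* id1 is usable at several types *)
  assert (id2 2 = 2)           (* this use fixes id2's weak type to int *)
```

An indexing operator defined as an ordinary function behaves like `get` here, while a projection primitive behaves like `.v`.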

Note that with the array data types proposal .() will work with string and bytes, so there will already be a type-directed primitive for these types. (Not actually implemented yet in the PR due to some issues with -safe-string, but I would implement that before suggesting merging).

So I would be in favour of merging this proposal. (To be clear, I have not reviewed the code itself, I just mean that I am in favour in principle).

@garrigue

Contributor

garrigue commented Jul 7, 2016

We are already heading towards a situation where primitive operations use type-based disambiguation, whilst regular functions use modular implicits to get similar behaviour.

What are you pointing at? If this is about record field access and datatype constructor, I repeat my view that the semantics do not use the type at all (the disambiguation is purely a compilation artefact). How can you guarantee that if the user has to define his own accessor functions?

@lpw25

This comment has been minimized.

Contributor

lpw25 commented Jul 7, 2016

How can you guarantee that if the user has to define his own accessor functions?

For clarity, I'm precisely saying that user functions do not use type-based disambiguation, whereas primitive operations (record fields and variant constructors) on types do. My only proposal (in a different PR) is to use type-based disambiguation for array primitives (as part of allowing user-defined array types).

Gabriel is suggesting that having both my proposal and the one in this PR is unsightly because the array operations will use type-based disambiguation whilst the user-defined indexing operators won't. My point was that this difference is already in the language, and it is about whether or not something is a primitive operation or a user-defined function.

@garrigue

Contributor

garrigue commented Jul 8, 2016

OK, looks like I got confused by the two PRs.

I would not really describe this as primitive vs. user-defined, but rather (guaranteed) uniform semantics vs. ad hoc semantics; otherwise I agree with you that it is wise to distinguish the two semantics.

@xavierleroy

Contributor

xavierleroy commented Dec 4, 2016

Six months later, any progress on this one?

@Octachron

Contributor

Octachron commented Jan 13, 2017

Seven months later, I still do not see a clear-cut way to resolve the tension between the potentially primitive .() operators and the non-primitive .[]/.{} operators, which share the same basic objective (accessing an element of an indexed family) but have a different status, and thus different scope rules.

A possible solution might be to increase the syntactic distance between the two families of indexing operators. However, choices are limited (with ASCII characters): if we do not want to add yet more brace variations, we can only replace the . separator. If I am not mistaken, the only possible separators here would be ?, ` or ~.
The syntax a?[x] may work well (in particular for dictionaries or other data types where the natural return type is an 'a option), but a?[x]<-y is already more startling, and ? might be too visually invasive.
I find a`[x] mildly acceptable but foreign, and I don't think a~[x] conveys the right meaning.
Nevertheless, all these syntaxes would suffer from their exoticism.

All in all, I think that the a.{x}/a.[x] syntax is still the optimal one for user-defined indexing operations, even if it is not perfect when combined with array data types.

Another point to consider, maybe, is that user-defined indexing operators are not the only advantage of the (.[]) syntax: it can also be useful to distinguish a.[x] from String.(unsafe_)get in the parsetree. In particular, this would make it easier to deprecate the .[] syntax.

In brief:

Positive:
  • Nicer syntax for user-defined data types
  • More information in the parsetree AST
  • Deprecatable (?)
Negative:
  • More design space for operator hell
  • Tension with projection primitive
  • Bigarray compatibility hack

(I need to check if the move of bigarray towards stdlib is enough to remove the bigarray compatibility hack).

@Drup

Contributor

Drup commented Jan 13, 2017

There is also a#[3].

The good thing about # is that there is a concrete precedent for using # instead of . for ad-hoc-y things.

@Drup

Contributor

Drup commented Jan 13, 2017

(This is half a joke by the way, I still think the best solution is to arbitrarily decide that .( ) is for type based access and open all the others for redefinition, and live with that)

@hcarty

Contributor

hcarty commented Jan 13, 2017

One possible benefit to ( #[] ) is that it opens up ( #() ) too. It removes some of the visual ambiguity between the behavior of .() and .[].

It may make more sense as a method syntax though, if someone wanted to use method ( #[] ) in a class for some reason.

@hcarty

Contributor

hcarty commented Feb 8, 2017

Is there any chance this will make it into 4.05.0? If the answer is unknown is there anything users can do to help?

@Octachron

Contributor

Octachron commented Mar 7, 2017

During the last developer meeting, it was briefly discussed that a syntax clearly distinct from the primitive projection operator was desirable. Since such a syntax is implemented in #1064, I am closing this specific PR for now.

Octachron closed this Mar 7, 2017
