Array data types #616

Open: wants to merge 14 commits into base: trunk


@lpw25
Contributor

lpw25 commented Jun 14, 2016

This PR adds support for defining fresh array types:

type int_array = [| mutable int |]

Motivation

The benefit of fresh array types over instances of the standard 'a array type is that they can use optimised representations without resorting to a non-uniform representation. OCaml already has a couple of such types built in, string and bytes, and it also uses a non-uniform representation to optimise float array.
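The non-uniform representation of float array can be observed in today's OCaml. This sketch is an illustration only, not part of the PR, and it assumes a compiler built with the default flat float arrays (not configured with --disable-flat-float-array):

```ocaml
(* Illustration only: assumes the default configuration with flat float
   arrays (the compiler was not built with --disable-flat-float-array). *)

(* A float array is stored as a flat block of unboxed doubles with a special tag... *)
let float_array_is_flat = Obj.tag (Obj.repr [| 1.0; 2.0 |]) = Obj.double_array_tag

(* ...while other arrays are ordinary blocks (tag 0) of immediate or boxed values. *)
let int_array_is_ordinary = Obj.tag (Obj.repr [| 1; 2 |]) = 0

(* Code that is polymorphic in the element type cannot know which layout it
   will receive, so generic accesses such as this one check the tag at run time. *)
let first (a : 'a array) = a.(0)

let () =
  Printf.printf "float array flat: %b\n" float_array_is_flat;
  Printf.printf "int array tag 0: %b\n" int_array_is_ordinary;
  Printf.printf "first: %g\n" (first [| 1.5; 2.5 |])
```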

The intention is that in the future one will be able to write definitions such as:

type point = { x: int; y : int }

type point_array = [| mutable point [@unboxed] |]

to create an array type using an unboxed representation of record and tuple types (e.g. point).

This patch only adds support for unboxing float contents (which is done without an annotation, just like in records). Support for an [@unboxed] attribute would be better done as part of a larger patch supporting it for all data types (i.e. record fields, variant constructors, and array contents).

FloatArray

The addition of fresh array types allows an implementation of the FloatArray.t type from #163. This provides an unboxed float array type, without the dynamic boxing of float array. This means it is more efficient than float array (due to fewer dynamic checks) and would serve as a good first step to eliminating the dynamic boxing of float array.
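For comparison, current OCaml (4.08 and later) ships a separate unboxed float array type, Float.Array.t, in the standard library. A short sketch of using it; note that it is driven by its own functions rather than the a.(i) syntax, which is part of what the primitives in this PR would change:

```ocaml
(* Float.Array.t (OCaml >= 4.08) is always a flat, unboxed block of floats,
   independently of the float array optimisation applied to 'a array. *)
let squares n = Float.Array.init n (fun i -> float_of_int (i * i))

let sum (a : Float.Array.t) =
  let acc = ref 0.0 in
  for i = 0 to Float.Array.length a - 1 do
    acc := !acc +. Float.Array.get a i
  done;
  !acc

let () =
  let a = squares 4 in               (* 0., 1., 4., 9. *)
  Float.Array.set a 0 10.0;          (* 10., 1., 4., 9. *)
  Printf.printf "sum = %g\n" (sum a) (* prints "sum = 24" *)
```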

Array primitives

Since these fresh array types are not equal to 'a array, they require their own primitives for constructing, mutating and destructing them. As with the other data types (variants and records), these primitives are scoped and support type-based disambiguation. The primitives are as follows:

  1. Field access: x.(5).
  2. Field mutation: x.(5) <- "hello"
  3. Array literals (both expression and pattern): [|1; 2; 3|]
  4. Array length: x.length
  5. Array comprehension: [| x * 2 for x = 0 to 10 |]

The first three are syntactically identical to OCaml's existing array primitives. The last two fill the roles traditionally taken by Array.length and Array.init. They are needed because we cannot have a function which is polymorphic across all array data types, and they are sufficient for implementing the remaining functions in the Array module.
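To make the sufficiency claim concrete, here is a sketch in today's OCaml rather than the PR's syntax. ARRAY_PRIMS, Derive and CharPrims are hypothetical names: ARRAY_PRIMS stands for the primitives a fresh array type would get (x.(i), x.(i) <- v, x.length, and the comprehension modelled as init), Derive builds a few Array-style functions from them alone, and Bytes serves as an example instance.

```ocaml
(* Hypothetical: the primitives a fresh array type would provide. *)
module type ARRAY_PRIMS = sig
  type t
  type elt
  val length : t -> int                 (* x.length *)
  val get : t -> int -> elt             (* x.(i) *)
  val set : t -> int -> elt -> unit     (* x.(i) <- v *)
  val init : int -> (int -> elt) -> t   (* [| f i for i = 0 to n - 1 |] *)
end

(* Further Array-style functions derived only from those primitives. *)
module Derive (P : ARRAY_PRIMS) = struct
  include P

  let make n x = P.init n (fun _ -> x)
  let copy a = P.init (P.length a) (P.get a)
  let map f a = P.init (P.length a) (fun i -> f (P.get a i))

  let append a b =
    let la = P.length a in
    P.init (la + P.length b) (fun i ->
        if i < la then P.get a i else P.get b (i - la))

  let fold_left f acc a =
    let r = ref acc in
    for i = 0 to P.length a - 1 do r := f !r (P.get a i) done;
    !r

  let fill a pos len x =
    for i = pos to pos + len - 1 do P.set a i x done

  let to_list a = List.init (P.length a) (P.get a)
end

(* Example instance: bytes viewed as a specialised char array. *)
module CharPrims = struct
  type t = Bytes.t
  type elt = char
  let length = Bytes.length
  let get = Bytes.get
  let set = Bytes.set
  let init = Bytes.init
end

module CharArray = Derive (CharPrims)
```

Every derived function goes through the element-wise primitives, which is also why bulk operations such as blit are a concern, as discussed further down the thread.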

I imagine some people will be against the array comprehensions on various aesthetic grounds. I would argue that they provide exactly the required primitive, and that they are already well known (and liked) by programmers familiar with other languages. Both for loops and arrays belong to the imperative half of OCaml, so it seems natural to me to use a for-based syntax in connection with arrays.

Notes

  • Whilst part of the intention of this PR is to make the builtin optimised array types less magic, we do not actually use it for bytes and string due to the difficulties around handling -safe-string.
  • The default representation for array data types does not perform the float array optimisation. A [@dynamic_boxing] attribute is used to get the traditional array behaviour. This is not really intended for public use: it is mostly just there so that the Array module can re-export the definition of array.
  • This patch allows the creation of immutable array types. However, they are currently translated identically to mutable array types. I'm going to add another commit to change them to use the new support for immutable arrays added as part of the flambda work.
@Drup

Contributor

Drup commented Jun 14, 2016

Random question: are immutable polymorphic arrays (type 'a array = [| 'a |]) covariant?

@lpw25

Contributor Author

lpw25 commented Jun 14, 2016

Good question. I can't remember if I remembered to fix that.

@lpw25

Contributor Author

lpw25 commented Jun 14, 2016

Looks like I was on the ball this time: they are indeed covariant.

@alainfrisch

Contributor

alainfrisch commented Jun 14, 2016

Even if the proposed set of primitives is complete in theory, it does not allow for an efficient implementation of e.g. the blit operation, unless we rely on the compiler to remove bounds checks on such a function implemented naively (with explicit assertions on the lengths) -- but this is another story.
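To illustrate the point, here is a sketch of blit written only with the safe get/set primitives, on today's int array; naive_blit is a hypothetical name. The up-front length checks already guarantee that every access is in range, yet each element access still performs its own bounds check unless the compiler proves it redundant, and the copy proceeds element by element, whereas the standard Array.blit hands the copy to the runtime (caml_array_blit).

```ocaml
(* Hypothetical naive_blit, using only length, a.(i) and a.(i) <- v. *)
let naive_blit (src : 'a array) src_pos (dst : 'a array) dst_pos len =
  if len < 0 || src_pos < 0 || dst_pos < 0
     || src_pos > Array.length src - len
     || dst_pos > Array.length dst - len
  then invalid_arg "naive_blit";
  (* The checks above make every access below in range, but each a.(i)
     is still bounds-checked individually. *)
  if src == dst && src_pos < dst_pos then
    (* Overlapping move to the right: copy from the end. *)
    for i = len - 1 downto 0 do
      dst.(dst_pos + i) <- src.(src_pos + i)
    done
  else
    for i = 0 to len - 1 do
      dst.(dst_pos + i) <- src.(src_pos + i)
    done
```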

What about polymorphic algorithm such as sorting, or even map? Would the recommended approach be to pass a dictionary of functions (explicitly or with modular implicits)?

An alternative design could be to have a new construction at the module expression level, which would produce not only a fresh array type, but also a (larger) set of operations to work on it as regular functions (which would hopefully be inlined when used directly). E.g.

  module IntArray = [|mutable int|]

This would produce a module matching module type MUTABLE_ARRAY with type elt = int (containing functions such as init, but also possibly blit, iter, etc). This seems less intrusive in the core language than your proposal, which introduces a new kind of core type that can only be used through type-based selection.
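A sketch of what such a module-level construct might produce, written by hand in today's OCaml. The exact contents of MUTABLE_ARRAY are left open in the discussion, so the signature below is an assumption; the instance wraps the standard Float.Array (OCaml 4.08 and later) to stand in for a generated module FloatArray = [| mutable float |].

```ocaml
(* Assumed contents for MUTABLE_ARRAY; the discussion leaves the exact set open. *)
module type MUTABLE_ARRAY = sig
  type t
  type elt
  val init : int -> (int -> elt) -> t
  val length : t -> int
  val get : t -> int -> elt
  val set : t -> int -> elt -> unit
  val blit : t -> int -> t -> int -> int -> unit
  val iter : (elt -> unit) -> t -> unit
end

(* Hand-written stand-in for what "module FloatArray = [| mutable float |]"
   might generate, built on Float.Array (OCaml >= 4.08). *)
module FloatArray : MUTABLE_ARRAY with type elt = float = struct
  type t = Float.Array.t
  type elt = float
  let init = Float.Array.init
  let length = Float.Array.length
  let get = Float.Array.get
  let set = Float.Array.set
  let blit = Float.Array.blit
  let iter = Float.Array.iter
end
```

Because the signature keeps t abstract, client code goes through FloatArray.get and friends rather than the a.(i) syntax, which is the trade-off debated in the rest of this thread.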

In your proposal, you reuse the syntax of field projection for the length. It seems you could similarly reuse the syntax of constructors for the init operation, with the same type-based lookup:

   (Array (11, fun i -> i * 2) : int_array)
@lpw25

Contributor Author

lpw25 commented Jun 15, 2016

Even if the proposed set of primitives is complete in theory, it does not allow for an efficient implementation of e.g. the blit operation

Indeed. To be complete whilst keeping efficiency in terms of bounds checks and use of memcpy, we would require additional primitives for handling subarrays: x.(2 .. 5) and x.(2 .. 5) <- y, including optimisation of the blit case: x.(2 .. 5) <- y.(2..5). These would be reasonable primitives and are in line with what other languages provide for arrays, but I didn't include them in this PR as they are not strictly necessary.

What about polymorphic algorithm such as sorting, or even map? Would the recommended approach be to pass a dictionary of functions (explicitly or with modular implicits)?

Yes. Operations that are polymorphic over arrays with different representations are essentially ad-hoc polymorphism and so they would be best handled with modular implicits.
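Without modular implicits, the dictionary can be passed explicitly. A minimal sketch in today's OCaml, where array_dict, exists and the two dictionaries are hypothetical names: the dictionary is a record of the per-representation operations, and the generic function is written once against it.

```ocaml
(* Hypothetical dictionary of operations for an array-like type 't
   with elements of type 'elt. *)
type ('t, 'elt) array_dict = {
  length : 't -> int;
  get : 't -> int -> 'elt;
}

(* Written once; works for any representation that supplies a dictionary. *)
let exists (d : ('t, 'elt) array_dict) (p : 'elt -> bool) (a : 't) : bool =
  let n = d.length a in
  let rec loop i = i < n && (p (d.get a i) || loop (i + 1)) in
  loop 0

(* Dictionaries for two different representations. *)
let int_array_dict : (int array, int) array_dict =
  { length = Array.length; get = Array.get }

let bytes_dict : (Bytes.t, char) array_dict =
  { length = Bytes.length; get = Bytes.get }
```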

introduces a new kind of core type that can only be used through type-based selection.

Actually, the PR allows you to specify the appropriate constructors and destructors using paths, just like with records and variants: x.FloatArray.(5), FloatArray.[|1.2|], etc.

In your proposal, you reuse the syntax of field projection for the length. It seems you could similarly reuse the syntax of constructors for the init operation, with the same type-based lookup:

Labels are a reasonable syntax for length because it is a projection, the field just happens to be stored in the block header. Whereas Array (11, fun i -> i * 2) implies that we are constructing a value which contains the function, whilst we are actually going to run the function multiple times and then discard it.
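The claim that length is a projection from the block header can be seen with the Obj module (for illustration only; Obj is unsafe and not meant for ordinary code). For an ordinary, non-float array, the size recorded in the header is exactly the array length:

```ocaml
(* Illustration only: Obj exposes the low-level block representation. *)
let header_size (a : int array) = Obj.size (Obj.repr a)

let () =
  let a = [| 10; 20; 30 |] in
  (* For an int array, the length is the size field of the block header. *)
  Printf.printf "%d %d\n" (Array.length a) (header_size a)   (* prints "3 3" *)
```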

@alainfrisch

Contributor

alainfrisch commented Jun 15, 2016

Operations that are polymorphic over arrays with different representations are essentially ad-hoc polymorphism and so they would be best handled with modular implicits.

Would you need to create the instances manually for each fresh array type? What about my proposal of extending the module layer to return directly the instance?

PR allows you to specify the appropriate constructors and destructors using paths

Ok, but this is already at the level of the module system. Two array types in the same structure cannot work without type-based selection. This is already the case with two record types sharing the same labels, but there the user is free to choose the labels; for array types, the overloading would be mandated by the design.

@lpw25

Contributor Author

lpw25 commented Jun 15, 2016

Would you need to create the instances manually for each fresh array type?

This is identical to generic operations over records and variants and the same solutions should be used. At the moment, that probably means ppx_deriving and similar.

What about my proposal of extending the module layer to return directly the instance?

Personally, that seems like a pretty ugly solution. It is completely inconsistent with how other data types are handled. I also don't see how it would work for patterns or with type-based disambiguation. What module type are you expecting to give these modules? It wouldn't really help you avoid ppx_deriving: it is not like the set of functions in the standard library's Array module is the last word on arrays, if anything it would make dealing with them more awkward.

Two array types in the same structure cannot work without type-based selection.

True, but I would be surprised if that was a common use-case.

@alainfrisch

Contributor

alainfrisch commented Jun 15, 2016

I also don't see how it would work for patterns or with type-based disambiguation.

I don't think this needs to be supported; arrays are so rare (strings excluded, but they would keep some ad hoc treatment anyway). Same for array literals.

type-based disambiguation

We don't need that, we just use IntArray.get, etc.

What module type are you expecting to give these modules?

MUTABLE_ARRAY with type elt = ... or IMMUTABLE_ARRAY with type elt = ... with built in definitions for these two named signatures, like:

module type IMMUTABLE_ARRAY = sig
  type t
  type elt
  val init: int -> (int -> elt) -> t
  val get: t -> int -> elt
  (* and perhaps some more fields, unsafe_get, sub, blit, etc *)
end
@lpw25

Contributor Author

lpw25 commented Jun 15, 2016

We don't need that, we just use IntArray.get, etc.

This would mean anyone hoping to switch from float array to FloatArray.t would need to rewrite their code into a completely different style.

To be honest, I see no real benefit in the approach you are proposing, whilst there is an obvious cost in the usability of the array types.

@yallop

Member

yallop commented Jun 15, 2016

It looks like you're not storing anything in the object tag. So could this approach support "inline" array types, like this?

type 'a tree =
   Leaf of 'a
 | Branch of [| 'a |]
@lpw25

Contributor Author

lpw25 commented Jun 15, 2016

It looks like you're not storing anything in the object tag. So could this approach support "inline" array types, like this?

I don't immediately see a reason why not, and that certainly seems like a useful feature.

@alainfrisch

Contributor

alainfrisch commented Jun 15, 2016

would need to rewrite their code into a completely different style

It's local syntactic changes, not a different style. Getting rid of the syntax for literals and array access seems very minor to me (and a new indexing syntax could be introduced, and combined with modular implicits, if really desired). Switching a few a.(i) to FloatArray.get a i is not really a big deal; we already need to rewrite Array.length a to FloatArray.length a or a.length anyway.

With your proposal, one would already need to add type annotations and replace Array by FloatArray. And this works because FloatArray comes with a lot of hand-written code. For another array type, one would need to use a new syntax for init, and we lose unsafe array access and efficient blit (except by piling up some more syntax).

Relying on the module system seems much more natural to me. This is the standard way to create ad hoc monomorphic data structures (Set.Make, Hashtbl.Make). The only difference here is that the Make functor for arrays behaves differently according to the input type and cannot be implemented in the language, so it needs to be built-in.

@lpw25

Contributor Author

lpw25 commented Jun 15, 2016

For another array type, one would need to use a new syntax for init

Or just use ppx_deriving to build the functions from the Array module. If you want generic functions you should use a mechanism for implementing generic functions.

The only difference here is that the Make functor for arrays behaves differently according to the input type and cannot be implemented in the language, so it needs to be built-in.

That's not some minor difference: it is fundamental. Set and Hashtbl are parametric in their parameter, whilst these array types aren't -- that is the whole point. Using the module system for something like this is completely alien to the rest of the language.

Adding expression syntax is cheap, the only cost is that people need to know what the new syntax means. That is why I have been emphasising that the new syntax is common in existing languages: people already know what it means. This makes the cost of the new syntax very low, whilst there are obvious benefits in terms of usability to having array syntax (otherwise why have array syntax in the first place).

@alainfrisch

Contributor

alainfrisch commented Jun 15, 2016

That's not some minor difference: it is fundamental.

I don't think it's fundamental. For instance one can write a functor which chooses a custom representation of arrays based on a type witness:

type _ ty =
  | Char: char ty
  | Pair: 'a ty * 'b ty -> ('a * 'b) ty
  | Other: 'a ty

module type TY = sig
  type t
  val ty: t ty
end

module type ARRAY = sig
  type t
  type elt
  val init: int -> (int -> elt) -> t
  val get: t -> int -> elt
end

let rec make_array: type t. t ty -> (module ARRAY with type elt = t) =
  function
  | Char ->
    (module struct
      type t = string
      type elt = char
      let init = String.init
      let get = String.get
    end)

  | Other ->
    (module struct
      type elt = t
      type t = elt array
      let init = Array.init
      let get = Array.get
    end)

  | Pair (ty1, ty2) ->
    let module A1 = (val make_array ty1) in
    let module A2 = (val make_array ty2) in
    (module struct
      type elt = A1.elt * A2.elt
      type t = A1.t * A2.t
      let init n f =
        let res = Array.init n f in
        A1.init n (fun i -> fst res.(i)),
        A2.init n (fun i -> snd res.(i))
      let get (a1, a2) i =
        A1.get a1 i,
        A2.get a2 i
    end)


module MyArray(X : TY)() : ARRAY with type elt = X.t = (val make_array X.ty)
@lpw25

Contributor Author

lpw25 commented Jun 15, 2016

Fair enough. Generative functors do allow the creation of fresh non-parametric types.
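For readers unfamiliar with generative functors (OCaml 4.02 and later), a minimal sketch of how each application yields a fresh abstract type; the functor Fresh and modules A and B are hypothetical names:

```ocaml
(* The () parameter makes Fresh generative: each application creates new types. *)
module Fresh () : sig
  type t
  val make : int -> t
  val get : t -> int
end = struct
  type t = int
  let make x = x
  let get x = x
end

module A = Fresh ()
module B = Fresh ()

let ok = A.get (A.make 1)
(* let bad = B.get (A.make 1)   rejected: A.t and B.t are distinct types *)
```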

This discussion seems to come down to whether the syntaxes [| x for x= 0 to 10 |] and a.(1..10) are easier for people to understand than module FooArray = [| mutable foo |].

Personally, I think the expression syntaxes are much easier for people to learn since they already exist in other languages and are very similar to existing syntaxes. Whereas, I would expect most people to respond to the module syntax with some variation on "WTF is that".

Combined with the additional benefit of supporting standard array syntax on fresh array types, I do not see the advantage of the module-based approach.

@alainfrisch

Contributor

alainfrisch commented Jun 15, 2016

Can you give a concrete example of how we would create "instances" (for modular implicits, or explicit passing of dictionaries) for generic array-based algorithms in your proposal? My current understanding is that you would need to write something like (in the explicit case):

  type int_array = [| mutable int |]
  module IntArray = struct
    type t = int_array
    type elt = int
    let init n f = [| f x for x = 0 to n - 1 |]
    let length (a : t) = a.length
    let get (a : t) i = a.(i)
    let set (a : t) i x = a.(i) <- x
  end
  ....
  let sort_int_array = generic_sort (module IntArray)

and you would need to duplicate the code above for each instance.
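A runnable approximation of this example in today's OCaml, with int array standing in for the proposed int_array, and a hypothetical generic_sort (an in-place insertion sort) parameterised by a first-class module. It sorts in place and returns the same array, so that it can be used as in let a1 = generic_sort (module IntArray) a0.

```ocaml
module type ARRAY = sig
  type t
  type elt
  val length : t -> int
  val get : t -> int -> elt
  val set : t -> int -> elt -> unit
end

(* Hypothetical generic_sort: insertion sort using only the operations of A.
   Sorts in place with polymorphic compare and returns the same array. *)
let generic_sort (type a e) (module A : ARRAY with type t = a and type elt = e)
    (arr : a) : a =
  for i = 1 to A.length arr - 1 do
    let x = A.get arr i in
    let j = ref (i - 1) in
    while !j >= 0 && compare (A.get arr !j) x > 0 do
      A.set arr (!j + 1) (A.get arr !j);
      decr j
    done;
    A.set arr (!j + 1) x
  done;
  arr

(* The per-type instance that has to be written out for each array type. *)
module IntArray = struct
  type t = int array
  type elt = int
  let length = Array.length
  let get = Array.get
  let set = Array.set
end

let a0 = Array.init 11 (fun x -> 10 - x)
let a1 = generic_sort (module IntArray) a0
```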

This discussion seems to come down to whether the syntaxes [| x for x= 0 to 10 |] and a.(1..10) are easier for people to understand than module FooArray = [| mutable foo |].

Rather between

  type int_array = [| mutable int |]
  let a0 = ([| x for x = 0 to 10 |] : int_array)
  module IntArray = struct 
    (* see above *)
  end
  let a1 = generic_sort (module IntArray) a0

and

  module IntArray = [| mutable int |]
  let a0 = IntArray.init 11 (fun x -> x)
  let a1 = generic_sort (module IntArray) a0
@xavierleroy

Contributor

xavierleroy commented Dec 4, 2016

Six months after this hot debate, what do we do now?

mshinwell added the suspended label on Dec 28, 2016
@bluddy


bluddy commented May 24, 2017

@xavierleroy let's merge it!

@gasche

Member

gasche commented May 24, 2017

I like the underlying design idea behind this proposal (to make it easy to avoid the cost associated with the float array optimization, let a thousand array types bloom!), but I am still as unconvinced by the execution as I was when the feature was first presented to me. It is a coincidence that @lpw25 was able to find a cute syntax for each of the array operations he wanted to support; I don't see how I would explain to our users why we have a nice array comprehension syntax but not, for example, a list comprehension syntax (and maybe a monadic comprehension syntax, etc.).

In that way Alain's syntax-less proposal is nicer (and the point that you inherit built-in implementations of more functions is good), but the specifics of writing [| 'a |] as a module are still syntactically unpalatable -- that's one case where I would rather have an extension syntax [%array-module: 'a].

To me the whole thing feels like something that may not work very well with other aspects of the language that we don't fully understand yet, like modular implicits. Rather than import a bunch of new syntax now that we are likely to regret, I would rather go with a syntactically minimal proposal (we can always add more syntax later, but removing it is hard), or even wait for the other pieces to fall into place first: I would expect wider usage of modular implicits to give us a better perspective on the proposed feature.

@bluddy


bluddy commented May 25, 2017

Is it fair to refer to features coming down the pipeline, such as modular implicits? Given the glacial rate of progress in this community - and I know it's been picking up speed, but still - I'd be surprised if modular implicits landed earlier than 3 years from now. Should that oncoming revolution of 2020-2025 weigh on this feature, as well as countless others?

@gasche

Member

gasche commented May 25, 2017

Language design is not about being "fair", it is about getting things right. The compatibility constraints that we work with mean that things pretty much have to be right the first time.

match ... with exception ... waited 13 years (between Benton and Kennedy's 2001 "exceptional syntax" proposal and the merge of the actually right syntax), but now the design is right, and effect handlers are building on top of it.
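For reference, the construct that eventually landed (OCaml 4.02) lets a single match handle both a normal result and an exception raised while evaluating the scrutinee. A small sketch; find_or_default is a hypothetical name:

```ocaml
(* match ... with exception (OCaml >= 4.02): the exception branch only covers
   the evaluation of the scrutinee, not the code in the other branches. *)
let find_or_default tbl key default =
  match Hashtbl.find tbl key with
  | v -> v
  | exception Not_found -> default
```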

@lpw25

Contributor Author

lpw25 commented May 25, 2017

I genuinely don't understand people's distaste for syntax. It seems to be a common problem with functional programmers: they always want everything to be a function. Yet only new syntax is ever held to this standard: record projections could easily have been functions but they're not, and it is more consistent for array projections to also not be functions.

As for wanting something more general, again this applies just as much to existing syntax: are c-style for loops the most general thing? No, but they're useful and they are common across a number of languages. Similarly, array comprehensions are useful and common across a number of languages. I'm actually more inclined to add a bit more syntax to this proposal: for example, it is easy to have a nice syntax for subarrays and array concatenation that is familiar from other languages.

With regards to forward compatibility with modular implicits, I'm a firm believer that efficient code is easier to write when the syntax distinguishes the operations guaranteed to be "single operations" from arbitrary computations. Whilst it would be nice to have something like x.[y] use implicits to perform some arbitrary projection, there should also be a simple array projection x.(y) which is always just reading a value -- otherwise you or your IDE needs to perform implicit resolution at every array projection in order to check that it is just an array projection. I also think that it is more consistent with the design of the rest of OCaml: OCaml is a language that uses distinguished syntax for the primitive operations on all of its builtin types -- it could have used functions in a number of places but it doesn't.

It is also worth again considering the case of the c-style for loop. Implicits enable a more general for x in y do ... done loop to be supported (using implicits to look up the appropriate iteration for the type of y), a more general style of loop supported in a number of languages. The existing for loop does not prevent this addition, if anything it makes the addition more palatable. Similarly, I doubt that the existence of array comprehensions would prevent the later addition of more general implicit-based comprehensions. Of course, in both these cases, you could say "but now we have two things when we could have had one", but the aim of language design is not to have the least amount of syntax -- I'm looking at you Lisp -- it is for the syntax to be consistent and intuitive with an easy narrative to explain the semantics.

Fundamentally, this proposal is about replacing a builtin magic type in the initial environment with a proper type in the language that could have been defined by the user and has proper consistent -- and well-established -- syntax for the primitive operations on this type.

@lpw25

Contributor Author

lpw25 commented May 25, 2017

Whilst we're back on this proposal again, I'll describe my upcoming plans for this pull request:

  • Add support for:
type t = I of int | A of [| t |]

as suggested by Jeremy. I think this will be a very useful feature -- I certainly have places in my code where this would make things cleaner and more efficient.

  • Add support for a sub-array syntax, probably x.(i .. j) and x.(i .. j) <- a

  • Allow [@unsafe] annotations on array projections to avoid the bounds check

  • Possibly add support for array concatenation syntax, other languages have things like [| x; ..y; ..z |] using .. to indicate taking the elements from an array. Still undecided on this one.

I'll also rebase it up to trunk, and I may write a quick implementation of [@deriving array] in the same vein as [@deriving fields] and [@deriving variants].
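For comparison with the first item in the list above, the closest encoding available today stores the array in a separate block, so A of t array costs an extra indirection that an inline [| t |] argument could presumably avoid. A sketch using Obj only to show the block layout; sum is a hypothetical helper:

```ocaml
(* Today's encoding: the A block holds a single pointer to a separate array block. *)
type t = I of int | A of t array

let rec sum = function
  | I n -> n
  | A children -> Array.fold_left (fun acc c -> acc + sum c) 0 children

let () =
  let v = A [| I 1; A [| I 2; I 3 |] |] in
  Printf.printf "sum = %d\n" (sum v);  (* prints "sum = 6" *)
  (* Illustration only: the A block has one field, the pointer to the array. *)
  Printf.printf "A block size = %d\n" (Obj.size (Obj.repr v))  (* prints "A block size = 1" *)
```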

@bluddy


bluddy commented May 25, 2017

In that way Alain's syntax-less proposal is nicer (and the point that you inherit built-in implementations of more functions is good), but the specifics of writing [| 'a |] as a module are still syntactically unpalatable -- that's one case where I would rather have an extension syntax [%array-module: 'a].

@gasche, I'm not sure what you mean here exactly, but this sounds quite appealing as a stepping stone that doesn't commit us to syntax we'll need to support forever more. Something like [%array-gen: float] could be intercepted by the compiler and generate all needed functions (create, get, set, blit, etc.) in place for a type of [|float|]. It would be up to the user to wrap it up in a module if she so chooses. Is that what you meant? I think this is a great solution, and the fact that it involves an extension (and requires generation) communicates that it's a low-level construct. It would also allow us to gradually figure out whether we need to expand this basic set of automatically generated primitive functions, without experiencing the sudden and quite unfortunate need to add more syntax (as evidenced by @lpw25's last comment).

@gasche

Member

gasche commented May 25, 2017

What I had strictly in mind was to have this generate a module expression along the lines of Alain's proposal, but what you suggest also works, with the tweak that it needs to be attached to the actual type definition to be appropriately generative (not all [| 'a |] types are equal in Leo's design, if I understand correctly). So the more natural syntax for the not-a-module presentation would actually be

type (...) t = [| ... |]
[@@deriving array]
@gasche

Member

gasche commented May 25, 2017

I genuinely don't understand people's distaste for syntax.

It's not a dislike of syntax -- @alainfrisch probably proposed more syntax changes than the rest of us combined. You have a proposal that is about giving users control over the representation of their arrays, for efficiency reasons. I think the underlying idea behind the proposal is good and I would like it to be accepted. But it comes with a bunch of syntax that, in the current form of the feature, we have to accept as well. It is this coupling between a design proposal for efficiency and a bunch of syntax proposals that I find distasteful, rather than the syntax proposal itself.

I like your proposal of having the indexing notation work for those new array types -- in that respect, I find the proposal better than Alain's suggestions to just use Foo.{get,set}. But you are also asking for new syntax for comprehensions, and soon a syntax for slices.

If you insist that the particulars of the syntax must be considered together with the core design idea, we can do it. I think that indexing and slicing notations that you propose are fine, but that the comprehension syntax you propose clearly violates a notion of syntactic completeness.

When users see that something works, they will often generalize and expect other things to work as well. For example, when we see that (=) works to test both integers and strings for equality, it is natural to also expect it to work for floats, integer lists, etc. (In contrast, (+) obviously fails to generalize to the next most obvious thing, namely floats, so one does not expect much generality here.) A new user seeing your array comprehension syntax in use could expect two axes of generalization that your proposal does not support:

  • More comprehension features: if I can do "init", I would also expect map, but also filter (this expectation comes from python), and various forms of bind / cartesian product. The standard comprehension syntax [| e | x <- e; guard; ... |] easily scales along those dimensions, but your proposal (with just one for) may not.

  • Comprehensions on more types, on user-defined types. Haskell and F# desugar comprehensions into operations on the underlying datastructure, that make comprehension usable on many types, and enable user to use comprehensions on their user-defined types. Your proposal only supports comprehension on arrays (which is hardly the first type where one would have a use of them), no other type, and it is highly unclear how that would generalize. (In fact, we would expect to have the same conflict between type-directed and user-defined syntaxes at this level as there is between your proposal and @Octachron's user-defined operators.)
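The first axis can be made concrete in today's OCaml. The PR's [| x * 2 for x = 0 to 10 |] maps directly onto Array.init because its length is known in advance; a filtering comprehension in the [| e | x <- a; guard |] style does not know its length up front, so a straightforward encoding (sketched here with a hypothetical filtered_doubles) goes through an intermediate list:

```ocaml
(* [| x * 2 for x = 0 to 10 |]: the length (11) is known before any element is computed. *)
let doubled = Array.init 11 (fun x -> x * 2)

(* [| x * 2 | x <- a; x mod 2 = 0 |]: the result length depends on the guard,
   so this sketch collects into a list first (List.filter_map needs OCaml >= 4.08). *)
let filtered_doubles (a : int array) : int array =
  Array.to_list a
  |> List.filter_map (fun x -> if x mod 2 = 0 then Some (x * 2) else None)
  |> Array.of_list
```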

(I have written Camlp4 extensions for comprehensions that supported extensibility along these two axes because users were asking for it.)

(In comparison, your proposal for slicing is fine with respect to generalization. Users will expect to be able to slice everything that they can index, which suggests slicing strings and bytes as well, but you precisely propose (as a later step, if I understand correctly) to have those types defined by those new array types, which would enable the syntax for them.)

One possibility to decouple the syntactic and efficiency aspects of the proposal would be to combine Alain's approach with Octachron's user-defined infix operators. This is not strictly compatible, as the infix operators have to be opened in scope to be usable, but it also provides the property, which Leo mentions and I agree is important, that current array-using code doesn't need to be changed globally to use a different array type. On the other hand, this proposal also interacts in unpredictable ways with the preferred overloading style in a modular-implicits world, and it may also turn out to be unsatisfying in the end.
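One way to picture that combination uses the user-defined indexing operators available since OCaml 4.06 (presumably the kind of operators meant here). FloatArrayOps and scale_in_place are hypothetical names; the operators only work where FloatArrayOps is opened, which is the scoping limitation mentioned above.

```ocaml
(* Hypothetical indexing operators over Float.Array (Float.Array needs OCaml >= 4.08;
   user-defined indexing operators themselves need OCaml >= 4.06). *)
module FloatArrayOps = struct
  let ( .%() ) a i = Float.Array.get a i
  let ( .%()<- ) a i x = Float.Array.set a i x
end

(* Array-style indexing code on a non-'a array type, once the operators are in scope. *)
let scale_in_place k a =
  let open FloatArrayOps in
  for i = 0 to Float.Array.length a - 1 do
    a.%(i) <- k *. a.%(i)
  done
```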

I am not convinced by the idea that built-in syntax should in general correspond to efficient built-in operations. I think that syntax should first and foremost aim at readability. The work on modular implicits precisely stems from the desire to have a concise/readable syntax for ad-hoc polymorphism, in a user-definable way that in no way follows efficiency boundaries.

Modular implicits and fields/constructor disambiguation are two different features that bring type-directed behavior to the language (although for fields/constructor disambiguation some argue that we have a type-erasure semantics). Your proposal extends the range of syntactic forms that benefit from built-in type disambiguation in a way that I find difficult to justify, in the light of (1) current reserves on the usability of type propagation behavior for fields/constructors and (2) our lack of experience with implicit-aware library design, which may very well subsume this proposal by proposing better (and possibly incompatible) ways to do overloading.

Your proposal to have two different syntaxes that are overloaded in different ways, one being type-directed and efficient and the other being user-definable (either through scoping or implicit resolution) and flexible, has merits. I'm not sure I like it, I'm not sure this coupling is robust, but I can see advantages to it -- it is a form of "tooling" to reason about performance that is very simple. But it does not suffice to avoid debate on the particulars of the syntactic forms you propose, in that case the comprehension syntax.

@bluddy


bluddy commented May 25, 2017

@lpw25, what is your opinion on an extension-based solution as a way to integrate the feature into the language? We can reconsider expanding the syntax down the line, once we have more experience.

@lpw25

Contributor Author

lpw25 commented May 26, 2017

You have a proposal that is about giving users control over the representation of their arrays, for efficiency reasons.

It's not just efficiency -- I'm also quite keen on avoiding builtin types that couldn't have been defined by the user. I don't really like it when the OCaml runtime knows how to handle a representation that OCaml as a language has no idea about.

It is this coupling between a design proposal for efficiency and a bunch of syntax proposals that I find distasteful, rather than the syntax proposal itself.

To be fair, I am not bundling two proposals together here. The syntax is necessary and I can see no reasonable way to avoid it -- of course the form of the syntax is completely open for debate and I welcome suggestions on it.

The proposal here is to make array types ordinary OCaml data types. Ordinary OCaml data types:

  • are defined using the type construct
  • have constructors/destructors that are syntactic forms with type-based disambiguation

I think that keeping these properties is very important for consistency -- more important than any of the other concerns I've seen raised in this discussion -- so I still don't see how to avoid having some syntax.
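
For comparison, this is how type-based disambiguation already behaves for ordinary variant constructors. It is a minimal sketch in today's OCaml; the type names `shape` and `button` are illustrative and not from the proposal. The PR description states that the new array primitives are scoped and disambiguated in the same way as variant and record syntax.

```ocaml
(* Two variant types that reuse the same constructor names. *)
type shape = Circle | Square
type button = Circle | Square | Toggle

(* No type information: [Circle] resolves to the most recently
   defined type that has this constructor, here [button]. *)
let default_button = Circle

(* The annotation supplies an expected type, so the same
   constructor name is resolved as [shape]'s [Circle]. *)
let first_shape : shape = Circle

(* Matching on a value of known type disambiguates the patterns too. *)
let describe (s : shape) : string =
  match s with
  | Circle -> "circle"
  | Square -> "square"
```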

As for the specific choice of syntax for these constructors there is more that can be done.

  • Alain has pointed out that the current implementation has no support for the unsafe versions of the operations. I think we can use an [@unsafe] attribute for that since the unsafe versions are essentially optimisations of safe versions and we already use attributes to control optimisations.

  • Alain also points out that whilst my syntax is technically sufficient to implement everything in Array, it cannot implement some operations as efficiently as the current implementation, at least not without some very clever optimisations. Rather than relying on optimisations, I think the best approach here is to extend the array syntax to allow operating on groups of indices at a time, since it is this contiguous grouping that allows things to be done more efficiently; hence my suggestion of including a .(n .. m) operation.

  • You've pointed out that using a comprehension syntax for the dynamic array constructor creates an expectation that other comprehension syntax might be available. Personally I tend not to worry about user expectations in this direction: I think it is important that when users read the syntax it does what they expect, and that the syntax works the same way everywhere, but I don't tend to worry about them expecting other similar syntaxes to work, since the cost there is low -- they'll try it, get a syntax error, maybe look online to see if such a syntax exists, and then move on; they'll be disappointed, but they won't be confused. Still, it is a reasonable concern, and I can see two possible approaches: we could use a syntax other than comprehensions for dynamic array construction, or we could add more support for comprehensions to the language. I'm open to both, although I doubt we can come up with another syntax that is still intuitive.
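
The proposed `.(n .. m)` and `[@unsafe]` forms are not valid OCaml today, so the sketch below uses the existing `Array` module as an analogy for why range operations and unchecked accesses matter. The helper names (`copy_range_loop`, `copy_range_blit`, `sum_range`) are hypothetical; `Array.blit` and `Array.unsafe_get` are standard library functions.

```ocaml
(* Copying a range one element at a time: every read and every
   write goes through its own bounds check. *)
let copy_range_loop (src : 'a array) src_pos (dst : 'a array) dst_pos len =
  for i = 0 to len - 1 do
    dst.(dst_pos + i) <- src.(src_pos + i)
  done

(* Copying the same range as one operation: [Array.blit] validates
   the whole range once and then performs the copy in the runtime. *)
let copy_range_blit (src : 'a array) src_pos (dst : 'a array) dst_pos len =
  Array.blit src src_pos dst dst_pos len

(* The unsafe accessor skips the per-element check; this is only
   sound because the whole range is validated up front. *)
let sum_range (a : int array) pos len =
  if pos < 0 || len < 0 || pos > Array.length a - len then
    invalid_arg "sum_range";
  let acc = ref 0 in
  for i = pos to pos + len - 1 do
    acc := !acc + Array.unsafe_get a i
  done;
  !acc
```

In this analogy, a range syntax would let the compiler see the contiguous grouping directly, as `Array.blit` does, instead of having to rediscover it from a per-element loop.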

what is your opinion on an extension-based solution

Really not a fan. Using extension points for fundamental extensions to the language meets no criteria for consistency or intuitiveness. The only advantage being that you can say "It doesn't count because it's an extension point" whenever someone points out that the feature has a terrible syntax that baffles users.

@bluddy

bluddy commented May 26, 2017

I guess I was seeing this as a way to make the language more efficient without really expanding it for the average user, something like unboxed Haskell types, which are there for the experts to tune performance with, whereas @lpw25 is seeing it as an opportunity to 'retcon' OCaml as if it had these types all along, including all necessary supporting syntax. I'm coming around to seeing it @lpw25's way, though -- this makes sense.

  • @lpw25, is it your intention to redefine 'a array as type 'a array = [| mutable 'a |]?
  • I'd love to get away from comprehensions to something less controversial, but I can't think of anything better either. There's nothing in the existing syntax that automatically suggests the size of a data type rather than its value. So long as these comprehensions, as well as all other new syntax, are inherited by regular array types as well (which they are in the current patch), I think it's fine.
  • Is concatenation necessary? We should really minimize the amount of new syntax IMO.
  • Since type definitions are generative, is there any way to reset the disambiguation bias introduced by defining a type (I get it, it's the same bias we get from defining a record type) other than opening a module with a pre-existing type definition?
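
Today's record labels show the bias being asked about. In this minimal sketch (type and function names are illustrative), an unannotated field access resolves to the most recently defined type with that label, while a type annotation or a module-qualified label selects a different type without opening any module.

```ocaml
type first = { x : int }
type second = { x : int; y : int }

(* Unannotated: [r.x] is resolved against the last-defined type
   with a field [x], so this function gets type [second -> int]. *)
let get_x r = r.x

(* An annotation overrides that default and selects [first]. *)
let get_first_x (r : first) = r.x

module M = struct
  type t = { x : int }
end

(* A module-qualified label selects [M.t] without opening [M]. *)
let get_m_x r = r.M.x
```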

@rleonid

rleonid commented May 30, 2017

As a pretty heavy array user, I'd encourage further progress on this issue. I find @lpw25's arguments convincing.

@mrvn

mrvn commented Jun 27, 2017

Maybe I missed something but what's the status of passing (mutable) array elements to other functions?

Last I checked, it is not possible to pass unboxed records or tuples to other functions the way values are passed. Instead, a pointer + index would be needed. Or copying. And copying doesn't work with mutable fields and destroys physical equality.

Overall I see this only working when no functions are called on elements or those functions can be inlined. Otherwise there is a performance penalty and breakage.

@alainfrisch

alainfrisch commented Jun 27, 2017

Maybe I missed something but what's the status of passing (mutable) array elements to other functions?

We don't pass a "mutable array element" to functions any more than we do today. We can get an element out of an array and pass it to a function. If we want to pass a reference to an element so that the function can mutate it, we pass the array + the index, as we do today.
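
A minimal sketch of the array-plus-index convention in today's OCaml; `incr_at` and `incr_value` are illustrative names.

```ocaml
(* To let a callee mutate one cell, pass the array together with
   the index of that cell. *)
let incr_at (a : int array) (i : int) : unit =
  a.(i) <- a.(i) + 1

(* Passing the extracted element gives the callee a value; the
   array is untouched unless the result is written back. *)
let incr_value (x : int) : int = x + 1
```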

@mrvn

mrvn commented Jul 18, 2017

Maybe I missed something but what's the status of passing (mutable) array elements to other functions?

We don't pass a "mutable array element" to functions any more than we do today. We can get an element out of an array and pass it to a function. If we want to pass a reference to an element so that the function can mutate it, we pass the array + the index, as we do today.

Except that is not true. Till now an array contains immutable primitive values or pointers and that pointer is simply passed to other functions. If the pointer is to a record or object with mutable fields it can simply be mutated in place. As you say, for array data types, you have to pass array + index to functions to get that behaviour and that needs something new. A function 'a -> unit doesn't handle 'a = 'b array * 'b index on its own.

@alainfrisch

alainfrisch commented Jul 18, 2017

Till now an array contains immutable primitive values or pointers and that pointer is simply passed to other functions.

I'm not sure what you're talking about, but no, one cannot pass a pointer to the middle of an array to a function. One extracts the value stored in some array cell and passes this value to a function. And the current proposal does not change anything in this respect.

@mrvn

mrvn commented Jul 18, 2017

It's not a pointer to the middle of an array that is passed, it is the pointer stored in the array cell that is passed. Problem is that with the unboxing suggested there is no more pointer. And "extracting" the array cell then means copying the record or object stored there. And that breaks mutability. Or is the array data type now restricted to immutables?
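
The aliasing at stake can be seen with today's boxed arrays. In this sketch (illustrative names), extracting `arr.(0)` yields the same record that the array holds, so mutating it through a function is visible in the array. Under a hypothetical unboxed representation where extraction copies the record, `bump c` would update only the copy, and `arr.(0).v` would stay `0`.

```ocaml
type cell = { mutable v : int }

(* Mutates the record it is given, in place. *)
let bump (c : cell) : unit = c.v <- c.v + 1

let () =
  let arr = [| { v = 0 }; { v = 0 } |] in
  (* With today's boxed representation, [arr.(0)] returns the pointer
     stored in the cell, so [c] and [arr.(0)] are the same record. *)
  let c = arr.(0) in
  bump c;
  assert (c == arr.(0));
  assert (arr.(0).v = 1)
```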

@alainfrisch

alainfrisch commented Jul 18, 2017

Ok, the problem is not the mutability of the arrays, but the mutability within its elements. This PR (as far as I remember) does not implement the unboxing representation yet. When/if this is added, it is likely that this unboxed representation cannot be used for a record type with mutable fields (or alternatively that we will just document the copying semantics in that case).

@alainfrisch

alainfrisch commented Jul 18, 2017

Note that you started the discussion with this example:

type point = { x: int; y : int }
type point_array = [| mutable point [@unboxed] |]

here there is no problem. The array is mutable but the elements are not. So there is no observable change in semantics compared to today's regular arrays.
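
A sketch of why immutable elements avoid the problem: since `point` is immutable, the only way to change an element is to build a new record and write it back into the cell. So `move_x` (an illustrative helper) has the same effect whether `a.(i)` returns a shared pointer or a copy. It uses an ordinary `point array`, since the proposed `[| mutable point [@unboxed] |]` form is not valid OCaml today.

```ocaml
type point = { x : int; y : int }

(* With immutable elements, "updating" an element means building a
   new record and storing it back into the array cell. *)
let move_x (a : point array) (i : int) (dx : int) : unit =
  let p = a.(i) in
  a.(i) <- { p with x = p.x + dx }
```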

@bluddy

bluddy commented May 9, 2018

In the interest of getting the actual feature merged, I think the notion of adding all this syntax to the core language needs to be set aside and debated independently. I'd much rather see the

type (...) t = [| ... |]
[@@deriving array]

solution implemented, whereby magic functions are created for t. This is the only solution that will be agreed upon, simply because it doesn't intrude into the rest of the language. Once we have this solution in place, we can discuss whether introducing syntax elements makes sense on a case-by-case basis.
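
A rough sketch of the kind of interface such generated functions might expose. The `ARRAY` signature and `Float_array` module are hypothetical and not part of the PR or of any existing deriver; the implementation borrows the standard library's `Float.Array` (available in recent OCaml versions) as a stand-in for an unboxed float representation.

```ocaml
(* Hypothetical shape of the functions a [@@deriving array]
   might generate for an array type. *)
module type ARRAY = sig
  type elt
  type t
  val make : int -> elt -> t
  val length : t -> int
  val get : t -> int -> elt
  val set : t -> int -> elt -> unit
end

(* Stand-in implementation over the stdlib's flat float arrays;
   [t] stays abstract, so it is a distinct type from [float array]. *)
module Float_array : ARRAY with type elt = float = struct
  type elt = float
  type t = Float.Array.t
  let make = Float.Array.make
  let length = Float.Array.length
  let get = Float.Array.get
  let set = Float.Array.set
end
```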

@ubsan

ubsan commented Dec 23, 2018

What's the status of this? I would really like immutable arrays in my compiler.

@lpw25 lpw25 mentioned this pull request Sep 9, 2019