New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add Array, Dict, Set Literal Syntax #12

Closed
TheSeamau5 opened this Issue Jul 21, 2015 · 26 comments

Comments

Projects
None yet
@TheSeamau5

TheSeamau5 commented Jul 21, 2015

Inspired by Clojure's collection literals, I propose to add support for Array, Dict, and Set literals (analogous to List syntax).

Proposed syntax

Array

array = 
  #[1, 2, 3]

Rationale: Looks like a list but with a hash sign.

Dict

dict = 
  #{ "key1" : "value1"
   , "key2" : "value2"
   , "key3" : "value3"
   }

Rationale: Looks like Json. In spirit, Dicts and Json serve the same purpose--to store key-value data.

Set

set = 
  #{1, 1, 2, 3, 2}

Rationale: While we're at it, why not? I mean... Clojure does it.

In all cases, the main goal of the syntax is to encourage the usage of these datastructures, which have advantages that linked lists don't. Arrays have fast access, setting, and efficient appending to the end. Dicts are perfect for storing key-value data. Sets guarantee that there are no duplicates (therefore, no need to filter out all duplicates from a list).

Currently, if you want to write a DSL in Elm (like elm-html), lists and records are your only tools. The downside of this is that lists may not be the optimal data structure for your DSL.

For example, here is how elm-html would look with arrays and dicts

div 
  #[ style #{ "position" : "absolute" , "color" : "red" } 
   , id "containerDiv" 
   ]
  #[ ul #[]
      #[ li #[] #[text "Hello"]
       , li #[] #[text "World"]
       ]  
   ] 
@texastoland

This comment has been minimized.

texastoland commented Jul 21, 2015

I dislike this. More syntax is bad. Specifically, I almost never use sets on the job. Moreover, since our target user is accustomed to using dictionaries for everything, convenient record syntax pushes them to think typed-ly.

@toastal

This comment has been minimized.

toastal commented Jul 21, 2015

Why not both?

It's just sugar--and a good, terse sugar that other languages are already using. I don't see why you couldn't have still have Set., Dict., Array. inserts/froms for backwards compatibility or whatever your stylistic conviction may be. I feel like I'd be more inclined to use other (read: better) data structures if the API was this way. Maybe you can rope in some Clojure/Script users in as well...

@mgold

This comment has been minimized.

mgold commented Jul 21, 2015

Regarding the dict example: four quotes on a line is pretty ugly. Without endorsing the proposal, I think it should be = not :.

I think the more fundamental concern here is, what problem are we trying to solve? If it's "we have all these great data structures but no one uses them," then literals are only part of the problem. You'll also need

  • the Array etc module to be imported by default (sounds like an uphill battle)
  • examples that use these data structures on the website, perhaps with comparisons between data structure choices
  • more reliability in these libraries. Dict and Set will don't treat == in a useful fashion. The array library had some bugs when it came out and I'm not sure if these are all flushed out.
@TheSeamau5

This comment has been minimized.

TheSeamau5 commented Jul 21, 2015

@mgold Yes, that is exactly the problem I'm trying to solve. We have amazing data structures and no one uses them. The reason I looked at Clojure as an example is because, as a Lisp, it would sound a-priori unwise to extend the Lisp syntax to include stuff like curly braces and square brackets. But, that bet paid off as vectors (equivalent to Elm Arrays) are preferred to lists as a means to store sequential data.

As regards the actual Array, Dict, Set implementations, I think it would be worth considering either strengthening these implementations by thorough testing, performance tweaks, etc... Or, we could consider switching to something like Immutable.js or Mori. But that's a discussion for another day I fear.

@dnalot About the target user. If I understand correctly, JS users are a big part of Elm's target. JS people are accustomed to having efficient indexing in collections and never really deal with linked lists. (I don't remember ever using a linked list in JS to be honest) As such, we could consider a stronger form of the proposal specifically to target JS people.

The stronger form

Redefine the square bracket syntax to mean array instead of list (and find some other syntax for lists). Yes, this would break with the ML family. But like @evancz pointed out in his talk, the goal is not to make Haskell or Standard ML more usable and more maintainable. The goal is to make Javascript more usable and more maintainable. Javascript has Arrays. JS people know arrays, not linked lists. We can cater to JS people by having square brackets mean Arrays.

array : Array Int
array = 
  [1..10]

We can then figure something out for lists.

@Apanatshka

This comment has been minimized.

Apanatshka commented Jul 22, 2015

I like the premise, we should use more data structures than just lists. I'm not sure this literal syntax is the way to go though. If we expand the literals syntax, I think we should have a general way to do so, not specialise for some stuff that seems nice right now.

@evancz

This comment has been minimized.

evancz commented Jul 22, 2015

I think this is a nice proposal, great work @TheSeamau5! I think it's kind of sad that # is an ugly thing to be putting in so many places, but I think the HTML example is pretty compelling.

My first intuition was the same as @Apanatshka's: "is there something general happening here?" Perhaps not. Maybe that way is a quagmire of things we don't actually care about. Maybe if our core data structures are easy to use, then we are all set.

I also agree with @mgold that = is better. If we ever allow type annotations in expressions, the dict vs set one will become sooooo ambiguous. It's already pretty rough as is though, not sure how plausible it is to parse that efficiently.

The OCaml syntax for arrays is [| 1, 2, 3 |] which I didn't add because it always seemed kind of meh as syntax. I like #[ 1, 2, 3 ] more now that I see them both, but I wonder if we can do a bit better.

@evancz

This comment has been minimized.

evancz commented Jul 22, 2015

So maybe the way forward is to get a small set of example programs that use dicts and sets and lists and arrays and try out variations of this idea til we are satisfied. Perhaps like with the tasks stuff, someone shares a gist, others make variations, we list them all up as an addendum to the initial issue.

@evancz

This comment has been minimized.

evancz commented Jul 22, 2015

What if it is somehow related to modules, kind of like #5 but not:

import Array as A
import Dict as D

div 
  A.[ style D.{ "position" = "absolute" , "color" = "red" } 
    , id "containerDiv" 
    ]
  A.[ ul A.[]
        A.[ li A.[] A.[text "Hello"]
          , li A.[] A.[text "World"]
          ]
    ]

Haha, def looks worse to me! Okay, I'm gonna get back to components stuff now, but again, this seems quite interesting!

@TheSeamau5

This comment has been minimized.

TheSeamau5 commented Jul 22, 2015

Here's one example, from elm-todomvc

Current version with linked lists:

view : Address Action -> Model -> Html
view address model =
    div
      [ class "todomvc-wrapper"
      , style [ ("visibility", "hidden") ]
      ]
      [ section
          [ id "todoapp" ]
          [ lazy2 taskEntry address model.field
          , lazy3 taskList address model.visibility model.tasks
          , lazy3 controls address model.visibility model.tasks
          ]
      , infoFooter
      ]

Current proposal: (colon replaced with equals)

view : Address Action -> Model -> Html
view address model =
    div
     #[ class "todomvc-wrapper"
      , style #{ "visibility" = "hidden" } 
      ]
     #[ section
         #[ id "todoapp" ]
         #[ lazy2 taskEntry address model.field
          , lazy3 taskList address model.visibility model.tasks
          , lazy3 controls address model.visibility model.tasks
          ]
      , infoFooter
      ]
@rtfeldman

This comment has been minimized.

rtfeldman commented Jul 22, 2015

Great idea @TheSeamau5!

I think a good way to go for this would be to split it up into separate proposals for "Separate Syntax for Array Literals and List Literals" vs. others. I think the motivation for Array vs. List is pretty strong, because it's so common to want to use literals for those in DSLs.

The use case that comes immediately to mind is elm-html's element functions; I think it's safe to say the primary reason they accept a List of attributes over an Array of them is not that List is the best data structure to hold them (quite the opposite!) but rather because [attr1, attr2, attr3] is the nicest DSL to work with.

Every time I've wanted to construct a Dict or Set so far, it's been by starting with some other variable-length data structure and transforming it, so I'm actually not sure I'd even have used literals for them if they existed! That's not to say my experience generalizes, but rather that I'm not sure there's as strong a case to be made there.

@z5h

This comment has been minimized.

z5h commented Jul 22, 2015

It seems unfortunate to me that we are binding specific implementations of a listy things to syntax. As soon as we want a different 'listy' thing (a skip list, a doubly linked list, etc) we'll run into this problem again.

Might there be a way to create a listy thing and (optionally) assign it's implementation after?

@mgold

This comment has been minimized.

mgold commented Jul 22, 2015

@z5h That's the problem type classes solve, and we're not going there right now. Or, if we decide we need to, this conversation balloons.

@ everyone: Correct me if I'm wrong, but lists are much more performant than arrays. Arrays of less than 32 elements are basically single chunks of memory that must be copied entirely on write, and this is the use case for the vast majority of DSL uses. FP typically uses mapping instead of indexed traversal, because it works so well with immutability.

Or, when you're writing out a literal, you don't really need to index into it. And the HTML library doesn't care what data structure you're using since it just maps over it to get the complete set of attributes, children, or what have you. Sure, you might use indices to construct something dynamically, but then you're no longer writing literals. So I don't agree with Richard (at the moment) that lists are "far from" the best data structure for the HTML library.

Point is, I agree with Evan on code examples. I'd like to see something really ugly with lists, ugly with lists converted to arrays, and beautiful with a made-up array literal syntax.

@Apanatshka

This comment has been minimized.

Apanatshka commented Jul 22, 2015

The module prefix doesn't look so bad to me @evancz. That's the direction I was thinking in too, reminiscent of one of the tasklet macro syntaxes with a record for, in this case, cons and nil or something like that.
You could also try a variation of the Haskell quasi-quoting again like we had for markdown and still do for glsl:

view : Address Action -> Model -> Html
view address model =
    div
     [A| class "todomvc-wrapper"
     , style {D| "visibility" = "hidden" |} 
     |]
     [A| section
         [A| id "todoapp" |]
         [A| lazy2 taskEntry address model.field
         , lazy3 taskList address model.visibility model.tasks
         , lazy3 controls address model.visibility model.tasks
         |]
     , infoFooter
     |]
@TheSeamau5

This comment has been minimized.

TheSeamau5 commented Jul 22, 2015

@mgold I didn't give the example of elm-html to claim that arrays were superior to lists. What I mean, is that there a number of cases where they are the better structure.

For example: Infinite scrolling lists. You need a collection that can store all that information, send actions uniquely to the concerned element, and slice the collection appropriately to only display what is visible. Arrays beat lists on all these fronts.

While you can implement the described behaviors with Arrays today, you end up paying a needless conversion to lists. Why? It's completely unnecessary.

Again, in the particular context of elm-html, I am not advocating the change from lists and tuples to arrays and dicts. What I am pointing out is what @rtfeldman pointed out. The reason elm-html took the routes of lists and tuples is not due to a calm, thoughtful analysis of performance tradeoffs. It is uniquely because lists and tuples have associated literals. Arrays and Dicts do not. Currently, if you want to make a DSL, using Arrays or Dicts or Sets are out of the question, even if they would be ideal data structures for these DSLs.

@TheSeamau5

This comment has been minimized.

TheSeamau5 commented Jul 22, 2015

@Apanatshka Sounds like an interesting idea to find a more general pattern, but also sounds super complex in the short term.

That said, I would be interested in reading a proposal that could explain how these generalized data structures would work and how one would be able to define their own custom data structures.

@texastoland

This comment has been minimized.

texastoland commented Jul 22, 2015

@evancz's module syntax looks intriguing to me to 👍 Swift features literal convertible protocols for transforming standard types into custom ones.

@evancz

This comment has been minimized.

evancz commented Jul 22, 2015

Making things implicit is tricky in my experience. This exists in Haskell for strings as OverloadedStrings, and it's always been super confusing when I use it. Apparently they have a mechanism to do this with lists as well (OverloadedLists). It's possible, there's a formula, but I don't think this is a nice place to be in many ways.

I am generally in agreement with @TheSeamau5 that this is going to get out of control if we try to solve every case, when in fact we have a very specific issue: whether it is elm-webgl or elm-html or Graphics.Element, arrays are faster to use and impose no real costs on people in an ideal world where the syntax is cheap.

Meta Note: I think we outlined pretty much all the general approaches, so to move this forward, share further ideas as a gist with a bunch of examples, similar to how we did things here. We can collect all the ideas and add them in an addendum to the initial issue. I'll try to do better too :) I think @TheSeamau5 already made a good case, but for the others, I think we should see them fully layed out.

@evancz

This comment has been minimized.

evancz commented Jul 22, 2015

Are there languages besides Clojure that do this? If so, what syntax do they use?

@evancz

This comment has been minimized.

evancz commented Jul 22, 2015

Sorry to keep cluttering this issue, but I just remembered VLists which @rtfeldman told me about recently. It might be worthwhile to open another issue exploring this, or better yet, for someone to just make it as a community library.

@rtfeldman

This comment has been minimized.

rtfeldman commented Jul 22, 2015

Correct me if I'm wrong, but lists are much more performant than arrays. Arrays of less than 32 elements are basically single chunks of memory that must be copied entirely on write, and this is the use case for the vast majority of DSL uses. FP typically uses mapping instead of indexed traversal, because it works so well with immutability.

Unfortunately the coefficient for linked lists in JS is huge because each Cons cell is a fully-fledged JS Object instead of just a couple of pointers. 😿

@rtfeldman rtfeldman added the proposal label Jul 22, 2015

@mgold

This comment has been minimized.

mgold commented Jul 22, 2015

Time to benchmark, then.

@BartAdv

This comment has been minimized.

BartAdv commented Aug 18, 2015

Just wanted to add I'm with @z5h on this one. Binding to one specific implementation seems too limiting. In a healthy ecosystem it might be the case you'd like to use other structures than ones provided by core.

@evancz

This comment has been minimized.

evancz commented Aug 25, 2015

I'm in the process of closing down this repo. This idea is cool. I have it in my personal set of "things to consider" so I will revisit it before 1.0

@evancz evancz closed this Aug 25, 2015

@evancz

This comment has been minimized.

evancz commented Aug 25, 2015

If you want to preserve this, move it to a gist. I will probably delete this repo in a couple days and salvage certain pieces (like the CONTRIBUTING page) for managing other repos and online venues.

@masklinn

This comment has been minimized.

masklinn commented Nov 21, 2015

Are there languages besides Clojure that do this? If so, what syntax do they use?

It's a bit late but since I saw this issue and nobody replied

  • Python doesn't have linked list literals (its list type is an array), its collection literals are "array" [1, 2, 3], set {1, 2, 3} and dict {1: 2, 3: 4, 5: 6}
  • Ruby has array literals ([1, 2, 3] and the perl-ish %w{a b c}), and two syntaxes for hash literals: {"a" => 1, "b" => 2} and {a: 1, b: 2}. In the second syntax keys are always symbols (so it's a shorthand to {:a => 1, :b => 2}
  • Erlang has list literals [1, 2, 3] and map literals #{a => 2, "a" => 1}, I don't think it has array or set literals
  • Rust doesn't have any collection literal, though it has a vec![1, 2, 3] macro

The "module" option sounds similar to initializer lists (C/C++/C#), maybe desugared at compile-time so dictionaries can be source encoded as pair sequences or as proplists without needing their own custom syntax?

An other similar idea is Python and C#'s string literal modifiers (in Python3 "foo" is a regular string, b"foo" is a bytes object, and r"foo" is a rawstring where most escape sequences are ignored, C# has @"foo" for the latter which it calls verbatimg string literals and $"foo" for interpolated string literals). ES6 added tagged template literals tagfoo`` where a template literal is preprocessed by an arbitrary tag function.

@reitzig

This comment has been minimized.

reitzig commented Sep 18, 2017

Just to leave one use case on this zombie: representing JSON-style data.

While representing JSON with strong type systems is generally ... interesting, let's say we manage to define an Elm type for the JSON at hand. If it's not completely trivial, it will probably contain some instances of Dict (e.g. for capturing patternProperties keys, to speak in JSON schema terms). Now if we want to include such data statically in an Elm app (I guess that applies to non-JSON data as well, but that's my use case), what code do we use?

Going via Dict.fromList feels wrong, both w.r.t aesthetics and (potentially) efficiency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment