-
Notifications
You must be signed in to change notification settings - Fork 1
Proposal: Default to Arrays over Lists #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I disagree with point 2.
I don't think that's important. In Elm/FP, lists are usually filtered/mapped/folded. I've yet to look up an element by index! Most JS people know map and filter from underscore and the like. Also, there is no reason for someone coming to Elm to know or care how a list is implemented. Just call it a list. Let them do an inefficient update or lookup by index (depending on the underlying data-structure). They won't notice a difference or care. Later, when they are building massive apps and are better versed in Elm/FP, they can dig into the docs and learn about a linked list or Relaxed Radix Balanced Tree. Finally, I see a lot of assumptions made about JS (or Ruby) programmers. Let's once and for all state and agree for the record what we are assuming about them. I'm not against the other points. I want to make sure we are giving JS devs adequate credit. It will hurt everyone if we don't. |
Although I've seen my opinion sway with each comment, I think @z5h articulates something I was trying to get at earlier:
Indexing is a relic of C for-loops because that was (is) how assembly works. Most of the JS I write uses D3 and therefore has no explicit loops. I actually had to write one the other day, and had to remember how to do it! I agree that many JS programmers (and Ruby programmers) are familiar with the basic higher order functions on lists/arrays. I also think most of them know what a linked list is - despite the rise of programmer bootcamps, most people to have a CS degree. |
I generally assume:
|
I agree that this is a minor point, but it's relevant to note that when you're translating code from JS to Elm, it's more likely that the original code used indexed lookups because there was less of a downside then. For example, I write JS in a functional style, but when I had some code that needed to transform a list by considering adjacent elements (e.g. "if the next or previous elements fit this description, we get a different result...") I used indexed lookups because that was cheap and had no downside. When I ported that code to Elm it ended up making more sense to implement Again, agree that this is not a huge deal, but the status quo still seems more likely to cause minor headaches than the proposal would. |
Assuming this premise, why call it a list? Why not just call it an |
@rtfeldman Thanks for the point form assumptions. Good points to consider.
Right, so we can choose the List implementation that makes most sense to seasoned Elm/FP users. (Let's assume Linked list for argument's sake.) If we decide Array implementation in better for seasoned Elmers, the similar arguments will hold. Seasoned Elmers will know to import/use LinkedList if they are doing lots of consing. In either case, I think the right approach is choose what is best for the seasoned Elmers and make sure new users don't have to think about it until they choose to. As for naming, I think Array implies a more specific implementation and performance characteristics whereas List does not. And everyone (even non programmers) know what a list is. |
I think this is an important point to consider in this decision. Elm is also meant to be beginner-friendly and one must consider the learning path in Elm. What are the first concepts that one wishes to stress in learning? What is the first data structure you wish a beginner to learn? Given the convenience of the square bracket syntax, the data structure that will inherit this syntax will necessarily be the first one taught to kids and beginners. Let us consider a specific case from the Elm examples, and an algorithm that is always taught in the first data structures class: quicksort Here is the code for quicksort on lists from the Elm examples quicksort : List comparable -> List comparable
quicksort list =
case list of
[] ->
[]
pivot :: rest ->
let
lower = filter (\n -> n <= pivot) rest
higher = filter (\n -> n > pivot) rest
in
quicksort lower ++ [pivot] ++ quicksort higher And now, here is the code for quicksort on arrays with the square bracket syntax (and assuming that quicksort : Array comparable -> Array comparable
quicksort array =
case split array of
Nothing ->
[]
Just (pivot, rest) ->
let
lower = filter (\n -> n <= pivot) rest
higher = filter (\n -> n > pivot) rest
in
quicksort lower ++ [pivot] ++ quicksort higher where split : Array a -> Maybe (a, Array a)
split array =
case get 0 array of
Nothing ->
Nothing
Just x ->
Just (x, slice 1 (length array) array) The question, which I'd like to put to an actual CS teacher, is : which version is easier to teach to someone completely new to programming? Is there a significant pedagogical difference in having the pattern matching on the data structure itself versus on the result of invoking a function? How would teaching one over another influence the sort of material that is taught and in the patterns that are stressed? For example, if arrays are taught first, consing may be given almost no importance at first. I have absolutely zero experience in teaching, so please take anything in this respect with a truckload of salt. |
Meta Comment: I feel like we have been doing a lot of talking and theorizing on elm-plans. That's fun and easy, but I worry that we can collectively be more directed in how we use this. In this case, I feel like having the discussion is kind of crazy if we don't know what the benchmarks say. The goal is "make rendering stuff faster". Well, we should know how much faster before we do any more work. Maybe it's not a big deal, maybe it is huge. Maybe it can be done with a different API on virtual-dom? Maybe this is not actually important after all? Measure it. Try things out. Once we know this is actually worthwhile, we can look into if it makes sense as a language feature in the first place. It is actually possible to weigh "how does this go for learners" against "100x speed up" or "2x speed up". I don't see how we can make any solid evaluations without the other side of this. Also, I don't want this to become a magic wishlist where "Evan will someday do what I desire." This stuff takes work. Defining new work for me maybe is a nice thing, but if we want this stuff to happen faster, we must systematically find ways to break these things down into pieces that can be worked on by many people. In this case, it's pretty clear. Run benchmarks, build examples. If it's a compelling case, I should hear about the result. So can I ask that we focus more on examples, packages, prototyping, benchmarking, testing, etc. in these threads. That stuff is all raw data that we can build on and none of it blocks on anyone else. (Hopefully I don't sound like a big jerk here. I just think we can be more efficient in achieving our goals.) |
I recently heard this phrase: "We can talk about making a difference, we can make a difference, or we can do both." I feel like elm-plans needs more balance. Maybe we can reformulate this for Elm plans so we can require people opening issues to do more background work. Or have a clear process to independently get that background work done. Point is, maybe lets put a hold on all these discussions until we have clear realistic goals about what elm-plans and for and a process that people can follow to reach those goals. |
@evancz How about we define the process to make enhancements to Elm? We can copy the way the Dart folks do things, for example. They seem to have an open and rigourous process in place. See: I think it's really important to the define how a proposal goes from being an idea in someone's head to actually shipping. What work is required from the person making the proposal? What work is required from Evan (and/or anyone working on implementing language features)? What should happen to proposals that don't meet the standards set? What kind of testing/benchmarking should be required? etc... |
My friend at Google remarked if members had to implement a working draft with proposals then there'd be fewer proposals and more contributors able to fix bugs. Sounded snarky but scalable. |
@dnalot I don't think it's snarky. I think it's honest. I guess it'd depend on how @evancz sees things. A language like Dart, for example, is made mostly with seasoned programmers in mind. Their target audience are large companies with 100K+ LOC and/or game companies who want to ship a big game using web technologies. A language like that has to require super high filters for enhancements or else they would go nowhere. On the other hand, I imagine Elm being more flexible that this as it appeals also to beginners. I think it would make sense for Elm to seek the advice of educators and feedback from beginners to see where are the pain points (be they in the syntax, the libraries, etc...) in learning Elm. Also, from what I can tell, Elm doesn't have Google's resources. As such, most people helping out with Elm (like me), do so from our own free time. This free time varies wildly from week to week and from person to person. On the other hand, the Dart folks have a dedicated team, paid to work on features of the Dart language, extending it as necessary, fixing bugs, writing new virtual machines, etc... all with the full backing of Google. So, I think it's fair for the Google folks to expect working drafts for each and every proposal. But all this is just my opinion. Anyways, let's move this discussion to another thread. After all, this thread is about a specific proposal. |
We can revisit this if someone does benchmarking. |
A quick note to guide discussion: a clear prerequisite for a change of this magnitude would be benchmarking before and after to make sure it actually makes sense from a performance perspective. There's no way to tell what the results of those benchmarks will be without running them, so let's discuss assuming that this won't happen if the benchmarks say it shouldn't—and instead discuss what the other pros and cons would be assuming the benchmarks check out.
Proposed Changes
1. Square brackets are now literals for
Array
values, notList
values. In other words:This would be a breaking change.
2. The default data structure for libraries (in Core, etc.) becomes
Array
instead ofList
.For example, this would mean the following changes for
Dict
:Dict.keys
returns anArray
Dict.values
returns anArray
Dict.toList
becomesDict.toArray
Dict.fromList
becomesDict.fromArray
This would also be a breaking change.
Motivation
A comment in another thread by @TheSeamau5 got me thinking about this and discussing some things with @evancz.
There are several reasons to do this:
1. Performance
We have already seen, in WebGL cases, specific examples of Elm programs that run noticeably faster when using
Array
instead ofList
on otherwise equivalent code. We have yet to seen the reverse, although granted that is harder to reproduce because literals produceList
values and libraries tend to preferList
.Because Elm's Arrays are implemented using Relaxed Radix Balanced Trees, the performance characteristics associated with (for example) recursively pattern matching should be substantially better here than they are with large vanilla JS arrays. For pattern matching on small lists, the extra overhead of instantiating N extra arrays may be comparable (and perhaps even less) to the extra overhead of instantiating N
Cons
cell objects, which is required to instantiate even aList
that does not use pattern matching.Benchmarks should be able to shed better light on this.
2. Familiarity to JS Programmers
As @TheSeamau5 said:
I have encountered a significant number of capable JS programmers who did not study Computer Science in school, and as such have never even had occasion to use a linked list. Linked lists are not terribly hard to explain, of course, but their use by default does add fuel to the "Elm seems alien" fire.
3. Future Compilation Targets
Although this is not happening any time in the near future, it is worth thinking about how this proposal would affect Elm's long-term compilation target prospects.
One of the compilation targets with the most potential is machine code (or WebAssembly). There's no way to benchmark that yet, since Elm does not currently compile to these targets, but we do know that mandatory JS Object overhead will disappear, and that arrays enjoy significant performance benefits from locality over linked lists. Of course, we would need to run (future) benchmarks to ascertain the real impact.
The text was updated successfully, but these errors were encountered: