Skip to content
This repository

Nested comprehensions, bug or a feature? #1191

Open
ocanbascil opened this Issue March 09, 2011 · 26 comments
Can Bascil

I'm having problem with nested array comprehensions.

In python: [ [i,j] for i in range(3) for j in range(3) ]
gives you a 2D array:

[ [0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2] ]

When I try the same in coffescript : ( [i,j] for i in [0..2] for j in [0..2] )

I end up with a 3D array:
[ [ [ 0, 0 ], [ 1, 0 ], [ 2, 0 ] ],
[ [ 0, 1 ], [ 1, 1 ], [ 2, 1 ] ],
[ [ 0, 2 ], [ 1, 2 ], [ 2, 2 ] ] ]

When I increase the nest depth, python keeps giving flattened results whereas CS increases the array dimensions.

So is this a bug or a feature?

Michael Ficarra
Collaborator

The coffeescript behaviour is what I would expect. It is easy enough to flatten and emulate Python's behaviour from CS, but it seems difficult to emulate the CS behaviour from Python.

edit: after seeing TrevorBurnham's comment below, I guess that you could put square brackets around each comprehension (I don't know python very well). Still, I like the consistency of CS. I'd be open to hearing ways to make the CS comprehensions more versatile, though.

edit again: since this post, my opinion has been changed. I am now strongly in favour of switching to the python style comprehensions.

Trevor Burnham
Collaborator

Feature. This is actually totally consistent, though believe me, I understand your surprise.

In Python, you have to explicitly declare array comprehensions by placing square brackets around each for. But in CoffeeScript, every for returns a list (for that matter, so does every while and until). It's only an optimization that the list isn't generated if it's unused.

So

arr = ( [i,j] for i in [0..2] for j in [0..2] )

assigns the list generated by for j in [0..2] to arr, and each entry in that list is generated by the expression [i,j] for i in [0..2]. Makes sense?

The easiest way to get the flat array you want is to generate the list without the use of comprehensions:

arr = []
arr.push [i, j] for i in [0..2] for j in [0..2]
Satoshi Murakami
Collaborator
satyr commented March 10, 2011

In other words, Coffee doesn't support nested comprenhensions in generic sense.

([i,j] for i in [0..2] for j in [0..2])

is equivalent to

(([i,j] for i in [0..2]) for j in [0..2])

and thus to Python's

[[[i,j] for i in range(3)] for j in range(3)]
Can Bascil
Jonas Elfström

I found the same thing trying to convert this Python by Peter Norvig

def cross(A, B):`  
    "Cross product of elements in A and elements in B."`  
    return [a+b for a in A for b in B]`

to CoffeScript

cross = (A, B) -> (a+b for a in A for b in B)

Squeegy posted a working implementation at http://stackoverflow.com/questions/5685449/nested-array-comprehensions-in-coffeescript/5685926#5685926

Trevor Burnham
Collaborator

Can this issue be closed? It seems like the original question has been answered, and I don't see any proposal for a change in the syntax.

Can Bascil ocanbascil closed this April 20, 2011
Jeremy Ashkenas
Owner

Reopening this for conversation, after: http://brehaut.net/blog/2011/coffeescript_comprehensions

... but I think it should be driven by use cases + consistency. Both patterns can accomplish both goals, but which is more expected and convenient?

Jeremy Ashkenas jashkenas reopened this September 13, 2011
Patrick Mueller

Don't have strong feelings either way, but Brehaut's post also seems to indicate a Harmony incompatibility: "The Harmony project will introduce both generators and comprehensions to Javascript, and provide different semantics for both." That's the big issue to me.

Trevor Burnham
Collaborator

What we need is a separate syntax that passes through to Harmony's list comprehension syntax. Maybe it wouldn't be so bad for

[x for x of obj]

to pass through directly to [x for (x in obj)]. The current behavior of that syntax is nicely consistent, but pretty uncommon in practice (and can be achieved with [].concat (x for x of obj)).

Right now, this is pretty academic—it's not like developers are rushing to writing apps that can only run in Firefox. But when Node can handle Harmony-style list comprehensions, people will expect to be able to write them in CoffeeScript. The same goes for let and certain other Harmony-isms.

I'd suggest that we add such Harmony pass-throughs in CoffeeScript 2.0, along with a compiler flag to emit warnings about compatibility issues. That's the only way I can see for people to happily use CoffeeScript for development in both cutting-edge environments and older browsers.

Jeremy Ashkenas
Owner

That's a whole 'nother kettle of fish -- and doesn't need to be discussed in this issue. Personally, I'd rather have CoffeeScript never support Harmony features until they're available across all common JS runtimes.

Despite a decade of promises, JavaScript is still pegged to what's possible to accomplish in IE6 and IE7, and it's not realistic to expect that to change any time soon. IE9 will never run on XP. (All of this has been said before.)

One of the core principles of CoffeeScript is that you're not writing code that breaks on older engines, and we shouldn't give that up under any circumstances.

Reg Braithwaite

I have no experience with the Harmony situation, so I’ll limit my remarks to what I think about Coffeescript as it is today. I prefer flat comprehensions:

  1. I think it's easier and cleaner to make nested arrays with flat comprehensions than to make flat arrays with nested comprehensions.
  2. Compatibility with Haskell and Python’s mental model.

I think the latter reason is important. Being similar to the two most popular (or notorious) implementations of comprehensions but differing in a significant way throws some extra mental friction into Coffeescript. Most people will use the comprehensions for simple cases and be happy, but then run into an issue when they try to ape a Python or Haskell solution they find elsewhere.

I personally think Coffeescript will ”delight” programmers when they discover that implementing a Python comprehension “just works.”

JM2C.

Michael Ficarra
Collaborator

Yes, I also don't feel a pass-through is appropriate for CoffeeScript. But I'm very happy that this issue was re-opened, as the comprehension syntax/behaviour is one of only two coffeescript features with which I have gripes. I've listed both changes in the "manifesto" of my still-vaporware rewrite (it's coming, I promise!).

I think it's appropriate to accept the harmony-style comprehensions and compile them to their ES3 equivalents. People should be allowed to use these JS features in CS. When we were adding something that didn't exist, it was okay, but now that the same feature exists in JS, choosing a different syntax and different semantics just serves to confuse. I think Andrew Brehaut put it really well himself.

I refer to the CoffeeScript behaviour as ‘map’ oriented comprehensions, and the Python behaviour as ‘mapcat’ oriented comprehensions. The mapcat model is more compositional than the map model. This means that the result of a function that is defined as mapcat oriented comprehension can be the input to itself recursively.

def flatten(l):
  try:
    return [y for x in l for y in flatten(x)]
  except TypeError, e: # horrible test for iterability
    return [l]

But even more importantly for me, it will allow us to explicitly specify when you want to use the "value" of a loop and finally shut everyone up about #899 (no offense, @TrevorBurnham and others). The downside is that you can't claim that "everything has a value" any longer, but I'm not sure that's even the case today.

Jeremy Ashkenas
Owner

If we change this behavior, everything will still have a value -- the question is just if the value of a nested loop is a flat array, or an array of arrays for each inner loop. I think that the current behavior is, almost unarguably, more consistent ... but the proposed behavior may well be more useful.

It's not out of the realm of possibility for us to do both:

results = x + y for x in list for y in otherList

results is [ flat array ]

results = for y in otherList
  for x in list
    x + y

results is [[ array of arrays ]]

... but that would probably be an unacceptable distinction to draw. At the same time, the rationale for doing a flat map gets weaker and weaker the more in-depth and complex your inner loops get. If there's lots of arbitrary code between the loop bodies, I think producing a flat map back out at the top level would feel truly strange, especially because saving the inner loop to a local variable would dramatically change the behavior.

results = for y in otherList
  tempResult = for x in list
    x + y

... always must produce an array of arrays.

Alessandro Vernet

My preference, inspired from Scala, would be to keep the current behavior when using multiple for, each for behaving like a map:

s1 = [1, 2]; s2 = [3, 4]
r = (i for i in s for s in [s1, s2])
# r contains [[1, 2], [3, 4]]

This is similar to what happens in Scala when using multiple for:

val s1 = List(1, 2); val s2 = List(3, 4)
val r = for (s <- Seq(s1, s2)) yield for (i <- s) yield i
# r contains List(List(1, 2), List(3, 4))

Now however in Scala, you get a flatMap when using just one for:

val r = for (s <- List(s1, s2); i <- s) yield i
# r contains List(1, 2, 3, 4)

By analogy, I'd like to get a flatMap in CoffeeScript when using one for, with a syntax TBD, for instance:

r = (i for s in [s1, s2], i in s)
# r would contain [1, 2, 3, 4]
lju

Hi, beginner here. The wikipedia for list comprehensions implies the goal of comprehensions is to mimic set-builder notation. As a mathematician I can say what I'd expect using set-builder notation:

If A = ['a','b'], and B = ['1','2'] then
a+b for a in A for b in B := {a+b : a in A, b in B} = [a1, a2, b1, b2]
[a+b for a in A] for b in B := {{a+b: a in A}: b in B} = [[a1, b1], [a2,b2]]

I'm not sure if this is possible to implement with a parser, but this makes intuitive sense to me.

showell

FWIW, I find CoffeeScript's list comprehensions basically unusable for multiple loops, but I'm biased by my Python experience. It's not a huge issue for me, since the workaround is obvious (just use the imperative style when you have multiple loops).

It's going to be a gotcha for newbies coming from Python. If nothing else, I think the documentation should clearly call out the nested behavior.

showell

Here's an interesting litmus test for list comprehensions:

http://rosettacode.org/wiki/List_comprehensions

I tried it in CS, and I basically gave up.

Here's the Python solution (not mine, but Rosetta is GFDL, see their license):

[(x,y,z) for x in xrange(1,n+1) for y in xrange(x,n+1) for z in xrange(y,n+1) if x**2 + y**2 == z**2]
showell

I hope that the docs eventually call out that CS differs from Python in two important ways. First, there's flattening. Second, there's the order in which loops are nested. Python documents its strategy in the PEP: "The form [... for x... for y...] nests, with the last index varying fastest, just like nested for loops." (http://www.python.org/dev/peps/pep-0202/)

CS arranges the loops in the opposite direction, with the first index varying fastest, the opposite of how nested for loops work:

(x for x in a for y in b)
var x, y, _i, _j, _len, _len2;

for (_i = 0, _len = b.length; _i < _len; _i++) {
  y = b[_i];
  for (_j = 0, _len2 = a.length; _j < _len2; _j++) {
    x = a[_j];
    x;
  }
}
Michael Ficarra
Collaborator

@showell: I agree, that would be a good way to document our comprehensions. Unlike python,

  • Nested comprehensions are not automatically flattened (map instead of concatMap/flatMap semantics)
  • The index of the comprehension nested deepest varies fastest

That seems like a concise way to describe both the nested comprehension semantics and the way it differs from Python.

showell

@michaelficarra I think those two lines would go a long way to avoiding confusion. Does it make sense to open a new issue that specifically addresses documenting the feature as currently implemented?

Michael Ficarra
Collaborator

It would make even more sense to just open a pull request (mentioning and summarising this issue for those that weren't paying attention to our discussion) where we can talk about changes if what was submitted is insufficient/erroneous.

Satoshi Murakami
Collaborator

Calling it comprehension was a mistake. Just saying "our loop returns an array of each result" or something
would've been much less confusing.

showell

See issue 2039. @michaelficarra @satyr I am holding off on a pull request, because I think the main problem here is agreeing on some terminology. Once the terminology gets nailed down, I think the doc patches will be pretty trivial (and very helpful).

George Zahariev
gkz commented January 14, 2012

I really like this idea:
[x, y] for x in [1..3], y in [7..9] when x + y != 10
producing
[[1, 7], [1, 8], [2, 7], [2, 9], [3, 8], [3, 9]]

This way it would not break or change any current code - you could still use multiple fors and get the current result. Eg you could do
[x, y, z] for x in [1..3], y in [7..9] when x + y != 10 for z in ['a', 'b', 'c']
if you wanted to.

This way we could have the current list producing fors, and also real list comprehensions that act like people expect.

I tried taking a stab at getting something like this working, but when I got to the compileNode method of For in nodes.coffee, I got in over my head.

Lucas Beyer

I know this is old, but it's still open. I just want to point out that Julia also uses for a in [1,2], b in [3,4], as suggested by @avernet and @gkz. This has the advantages of 1. not breaking any existing code and 2. the amount of for words corresponds to the depth of the result. The disadvantage is that it's unknown to Pythonistas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.