Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Shortened version of unzip that doesn't create sparse arrays #1047

Merged
merged 4 commits into from Apr 4, 2013

Conversation

Projects
None yet
4 participants
Contributor

almost commented Apr 2, 2013

@jdalton informed me that some browsers have problems with sparse arrays1 so I re-wrote _.unzip to not create them. In the process I managed to reduce the amount of code and I'm quite pleased with the result :)

almost referenced this pull request Apr 2, 2013

Merge pull request #1038 from almost/master
_.unzip as the inverse of _.zip

_.bind(_.pluck, null, tuples) can be _.partial(_.pluck, tuples).

Owner

almost replied Apr 2, 2013

Thanks, have changed it. I didn't know _.partial was in underscore, thought it was just in low-dash

No problem, my "code reads" are free ;)
The _.partial method was added in Underscore v1.4.4, ~9 months after it was added to Lo-Dash.
However, it's still not as robust as Lo-Dash's, which also mimics the behavior of _.bind when called as a constructor.

jdalton commented on 1007164 Apr 2, 2013

I spend a lot of my time unrolling code, but I've got to hand it to you.
You've used the methods available to really pack this down.

slow clap

@almost almost and 2 others commented on an outdated diff Apr 2, 2013

underscore.js
@@ -506,16 +506,8 @@
// `[['a',1],['b',2],['c',3]]` returns the array
// [['a','b','c'],[1,2,3]].
_.unzip = function(tuples) {
- var results = [];
- _.each(tuples, function (tuple, tupleIndex) {
- _.each(tuple, function (value, itemIndex) {
- if (results.length <= itemIndex) {
- results[itemIndex] = [];
- }
- results[itemIndex][tupleIndex] = value;
- });
- });
- return results;
+ var maxLen = _.max(_.pluck(tuples, "length"))
+ return _.map(_.range(maxLen), _.partial(_.pluck, tuples));
@almost

almost Apr 2, 2013

Contributor

return _.times(maxLen, _.partial(_.pluck, tuples)); almost works except in the case of an empty list input in which case maxLen becomes -Infinity which _.times doesn't like. So close :)

@jdalton

jdalton Apr 2, 2013

Contributor

return _.times(maxLen, _.partial(_.pluck, tuples)); almost works except in the case of an empty list input in which case maxLen becomes -Infinity which _.times doesn't like. So close :)

It would work in Lo-Dash ;)

@almost

almost Apr 3, 2013

Contributor

Do you think it should work? I think it probably should, being able to pass in a negative number and have _.times not through an error (just return an empty array) seems like a useful thing and matches up what would happen if you were using a regular for loop. It also matches the behaviour of _.range.

I've created a pull request for the change here #1048

@jashkenas

jashkenas Apr 4, 2013

Owner

Now that it does work, want to revise this PR?

@jdalton

jdalton Apr 4, 2013

Contributor

Here is the perf and size break downs:
http://jsperf.com/unzip-vs

Contributor

almost commented Apr 4, 2013

I've updated the code to use my patch just merged for _.times

jashkenas added a commit that referenced this pull request Apr 4, 2013

Merge pull request #1047 from almost/unzip-no-sparse
Shortened version of unzip that doesn't create sparse arrays

@jashkenas jashkenas merged commit 556be78 into jashkenas:master Apr 4, 2013

1 check passed

default The Travis build passed
Details
Collaborator

michaelficarra commented Apr 4, 2013

@jashkenas: You sure about this? It's even slower than the already slow implementation underscore already had.

Owner

jashkenas commented Apr 4, 2013

I could go either way, but unzip doesn't seem like a super hot path kind of function -- at least for most folks. Feel free to revert if you feel strongly otherwise.

Contributor

almost commented Apr 4, 2013

Happy to provide an iterative version that doesn't suffer from the sparse array problem if you want. But I kind of figured the same as you that most of the time the speed of unzip isn't going to be the limiting thing in a system. But then I always tend to lean more towards clear and less towards fast, and maybe that's not the correct thing to do in a widely used library :p

Collaborator

michaelficarra commented Apr 4, 2013

@almost: I agree completely. I would write my code like this if it was my own. But this is a very popular library that some people may expect to be performant. And with such huge differences in performance, it's probably better to opt for the unrolled/inlined/iterative approach. But we shouldn't just apply that technique to unzip. All underscore functions that depend on other underscore functions would probably benefit greatly.

Contributor

almost commented Apr 4, 2013

I guess the question is how far to take it, the original used _.each but could have been written with just for loops. The newer version makes an improvement to the older version in that it doesn't produce sparse arrays. To write it completely iteratively would be quite a bit larger than it currently is. May well be worth it though, will have a go.

Contributor

almost commented Apr 4, 2013

Ok, just pushed up an iterative version, and it's actually quite short :)

Here are performance comparisons (iterative is fastest by quite a way of course): http://jsperf.com/various-versions-of-unzip

Contributor

almost commented Apr 4, 2013

See #1049

Contributor

jdalton commented Apr 4, 2013

@michaelficarra

I agree completely. I would write my code like this if it was my own. But this is a very popular library that some people may expect to be performant. And with such huge differences in performance, it's probably better to opt for the unrolled/inlined/iterative approach. But we shouldn't just apply that technique to unzip. All underscore functions that depend on other underscore functions would probably benefit greatly.

Nice, you've been picking up what I've been laying down :3

Collaborator

michaelficarra commented Apr 4, 2013

@jdalton: Your JSConf talk a few years ago convinced me of that fact. Also, pulling feature tests out of functions that provide conditional functionality and instead using separate functions. And the convenient JS technique that allows us to have functions that are a combo of both: they have the conditional in them for the first call, but redefine themselves for more efficient subsequent calls. It was a good talk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment