Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds R.stableSort and tests #1456

Closed
wants to merge 3 commits into from
Closed

Conversation

ConorLinehan
Copy link

Addresses #1135.

if (comparator(a, b) === 0) {
return -(copyList.indexOf(a) - copyList.indexOf(b));
} else {
return comparator(a, b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why call comparator twice?

@davidchambers
Copy link
Member

Should we update R.sort rather than define a new function?


return copyList.sort(function(a, b) {
if (comparator(a, b) === 0) {
return -(copyList.indexOf(a) - copyList.indexOf(b));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to scanning the list twice with indexOf here would be to zip up the input list with the index of each field and then use the index to perform the secondary comparison.

e.g. something like:

var stableSort = R.curry(function (comparator, list) {
  var withIndex = R.zip(list, R.range(0, list.length));
  var sortedWithIndex = withIndex.sort(function(a, b) {
    var order = comparator(a[0], b[0]);
    return order === 0 ? a[1] - b[1] : order;
  });
  return R.map(R.head, sortedWithIndex);
});

I'd be interested to see how these two compare for different sized lists.

@CrossEye
Copy link
Member

Should we update R.sort rather than define a new function?

👍

@CrossEye
Copy link
Member

An alternative to scanning the list twice with indexOf here would be to zip up the input list with the index of each field and then use the index to perform the secondary comparison. [ ... ]

I'd be interested to see how these two compare for different sized lists.

Agreed. I've used that before for stable sorts.

@ConorLinehan
Copy link
Author

@scott-christopher Heres a benchmark of the different methods http://jsperf.com/stable-sort-ramda

@oskarkv
Copy link

oskarkv commented Mar 22, 2016

Hey guys, where are you on this?

@CrossEye
Copy link
Member

@oskarkook:

Nowhere at the moment. It's fallen between the cracks.

@ConorLinehan: Any interest in addressing the concerns brought up in comments here? I would like to include this.

@ConorLinehan
Copy link
Author

@CrossEye Yup will do :)

@ConorLinehan
Copy link
Author

Should we update R.sort rather than define a new function?

After doing the benchmarks (http://jsperf.com/stable-sort-ramda) with both discussed methods do we still want to update R.sort or create the new R.stableSort function with the following implementation

   var stableSort1 = R.curry(function(comparator, list) {
      var withIndex = R.zip(list, R.range(0, list.length));
      var sortedWithIndex = withIndex.sort(function(a, b) {
        var order = comparator(a[0], b[0]);
        return order === 0 ? a[1] - b[1] : order;
      });
      return R.map(R.head, sortedWithIndex);
    });

@CrossEye
Copy link
Member

I would look for ways to optimize the solution, but at a guess, we may need to have a separate stableSort version.

@scott-christopher
Copy link
Member

I've added an extra benchmark case here which uses the same method, but hand-rolls the zip-with-index, etc. It shows a slight boost of performance in Chrome, but negligible in FF. 🤷

    var stableSort3 = R.curry(function(comparator, list) {
      var sorted = [];
      var idx;

      idx = 0;
      while (idx < list.length) {
        sorted.push([list[idx], idx]);
        idx += 1;
      }

      sorted.sort(function(a, b) {
        var order = comparator(a[0], b[0]);
        return order === 0 ? a[1] - b[1] : order;
      });

      idx = 0;
      while (idx < list.length) {
        sorted[idx] = sorted[idx][0];
        idx += 1;
      }
      return sorted;
    });

* @memberOf R
* @category List
* @sig (a,a -> Number) -> [a] -> [a]
* @param {Function} comparator A sorting function :: a -> b -> Int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove a -> b -> Int here. It's confusing that it doesn't match (a,a -> Number) above, and at any rate it's unnecessary to duplicate the information.

@davidchambers
Copy link
Member

I would look for ways to optimize the solution, but at a guess, we may need to have a separate stableSort version.

I don't like the sound of this. Users would then need to choose between the "fast" sorting function which doesn't always do the right thing and the "slow" sorting function which does. This is incidental complexity. I'm imagining the entries in @svozza's I want to… table. I want to sort a list quickly and I want to stably sort a list (and I'm not in a hurry). ;)

@CrossEye
Copy link
Member

I don't like the sound of this. Users would then need to choose between the "fast" sorting function which doesn't always do the right thing and the "slow" sorting function which does.

Which could also be written as, I want to sort a list and I want to sort a list so that equivalent items retain their original order, without mentioning speed or with just trailing parenthetical (faster) and (slower) notations.

We can certainly look for more performant sorters. There is no reason to rely on Array.prototype.sort. If we want to take the complexity hit, we can use Timsort, or we could try a mergesort. There are plenty of options.

Or we can live with the performance hit. That might well be the best, but it certainly would seem worth pursuing the options before making that decision.

@davidchambers
Copy link
Member

My view is that we should make a decision on behalf of our users. If we consider stable sorting to be important enough to justify a significant performance hit, we should update R.sort. If not, we should include a recipe for stable sorting in the cookbook and reference it in the description of R.sort.

@CrossEye
Copy link
Member

I can probably agree to that. It's probably not worth having a second function. But I think there is more investigation necessary to find if we can have our cake and still eat at least some of it.

@ConorLinehan
Copy link
Author

Paraphrasing from this, http://stackoverflow.com/questions/3026281/array-sort-sorting-stability-in-different-browsers, it seems chrome uses a none stable sort and are sticking to it(https://bugs.chromium.org/p/v8/issues/detail?id=90)
Last post

Project Member Comment 58 by jkummerow@chromium.org, Nov 10, 2015
Using O(n) stack space is out of the question, as the stack is way too limited (less than 1 MB). Observe how the existing implementation limits itself to O(log n) stack space even in the worst case (i.e. pathologically unlucky pivot elements) by avoiding the recursive descent into the larger half of the array (lines 1036-1042).

All the other browsers seem to use a stable sort. So from the above benchmarks here we would only be able to get near to firefox. Currently we're roughly 50% slower than firefox 47, So I suppose it's picking an acceptable tolerance?

@CrossEye
Copy link
Member

But see also a comparison with various merge-sort implementation:

http://jsperf.com/stable-sort-comparison/5

Timsort is quite a bit faster, but it also is substantially more complex to implement. But most merge-sort implementation are stable, I believe. That might be enough.

If we want stability, we obviously cannot use the native sort. Even if all the current crop of browsers did use a stable version, there is simply no guarantee. We could build atop the native version as above, or we could use our own, mergesort, timsort, or one of many other choices.

@CrossEye
Copy link
Member

So, should we consider an implementation like this instead?:

module.exports = _curry2(function sort(comparator, list) {

  return msort(_slice(list), 0, list.length);

  function msort(list, begin, end) {
    var size = end - begin;
    if (size < 2) return;
    var middle = begin + Math.floor(size / 2);

    msort(list, begin, middle);
    msort(list, middle, end);
    merge(list, begin, middle, end);
    return list;
  }

  function merge(list, begin, middle, end) {
    while (begin < middle) {
      if (comparator(list[begin], list[middle]) > 0) {
        var val = list[begin];
        list[begin] = list[middle];
        insert(list, middle, end, val);
      }
      begin += 1;
    }
  }

  function insert(list, begin, end, val) {
    while (begin + 1 < end && comparator(list[begin + 1], val) < 0) {
      swap(list, begin, begin + 1);
      begin += 1;
    }
    list[begin] = val;
  }

  function swap(list, a, b) {
    var tmp = list[a];
    list[a] = list[b];
    list[b] = tmp;
  }
});

(Perhaps this would need to be pulled into an _internal version to be reused by sortBy.)

Of course this depends hugely on internal mutation, but that doesn't escape the function, so I don't see any issues.

This is a really basic merge sort and maybe one of the ones at http://jsperf.com/stable-sort-comparison/5 would be faster, but anything based on the same ideas might not only add stability but also actually be more performant (for now) than Array.prototype.sort.

@ajhyndman
Copy link

I'd be keen to see this move ahead. I think a not-guaranteed-to-be-stable sort implementation is a bit of a foot-gun in JavaScript, and I think Ramda would do well to guard users from that trap.

@davidchambers
Copy link
Member

It's worth mentioning that Z.sort performs a stable sort. Ramda would get stable sorting as part of the package if #2347 is merged one day.

@wojpawlik
Copy link
Contributor

#2926 (comment)

@wojpawlik wojpawlik closed this Dec 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants