Suboptimal sort algorithm used in /src/pkg/sort/sort.go #467

Closed
gopherbot opened this Issue Dec 26, 2009 · 15 comments

Projects

None yet

5 participants

@gopherbot

by mimenarrator:

The Go sort library implements quicksort, which has a worst-case runtime of
O(n^2), and not introsort, with a worst case runtime of O(n*log(n)), as,
for example, GNU's libstdc++ does. Introsort almost always uses the same
algorithm as quicksort, but when pathological cases are detected that lead
to O(n^2) runtime of quicksort (by keeping track of the recursion depth),
heapsort is used instead, since it is guaranteed O(n*log(n)), though on
average slower than quicksort. This is obviously not of the highest
priority, especially given the rarity of pathological cases for the
quicksort implementation used in Go, but they do exist, and can, for
example, but exploited in denial of service attacks.
@rsc
Contributor
rsc commented Dec 28, 2009

Comment 1:

Owner changed to r...@golang.org.

Status changed to Accepted.

@gopherbot

Comment 2 by noelgrandin:

If you're going to replace the sort algorith, Timsort appears to be gaining traction in 
Python and Java.
http://en.wikipedia.org/wiki/Timsort
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
@gopherbot

Comment 3 by mimenarrator:

Hmm, looks like same average and guaranteed runtimes as introsort. I wonder if anyone
has done any empirical analysis of which is superior and under what conditions.
@krasin
krasin commented Jan 15, 2010

Comment 4:

Timsort is stable and Introsort is not (because introsort relies on quicksort). 
So, I would prefer Timsort.
@gopherbot

Comment 5 by mimenarrator:

Ideally both would be implemented if Introsort is faster, one for use when stability
is necessary and one for use when it is not, while only Timsort would need to be
implemented if it was faster.
@gopherbot

Comment 6 by gshimano:

I will be curious to see implementation in Go of any of the parallel sorting algorithms.
Any reason that's not done yet?
@gopherbot

Comment 7 by jemfinch:

Because it's derived from merge sort, timsort requires at least O(n/2) extra memory. 
That may be cheap in Python (where the lists being sorted are essentially arrays of
pointers to much larger objects), but would not be as universally acceptable in Go.
@gopherbot

Comment 8 by l4zycod3r:

> Because it's derived from merge sort, timsort requires at least O(n/2) extra memory.  
This sort uses pointers to values instead of real storage memory:
Look at http://svn.python.org/projects/python/trunk/Objects/listsort.txt : 
"+ timsort can require a temp array containing as many as N//2 pointers,
  which means as many as 2*N extra bytes on 32-bit boxes.  It can be
  expected to require a temp array this large when sorting random data; on
  data with significant structure, it may get away without using any extra
  heap memory.  This appears to be the strongest argument against it, but
  compared to the size of an object, 2 temp bytes worst-case (also expected-
  case for random data) doesn't scare me much."
@rsc
Contributor
rsc commented Jan 20, 2011

Comment 9:

the extra memory is a big deal.  
criteria for something to replace sort.Sort
1. in place
2. no memory allocation
3. guaranteed n log n time worst case
4. not significantly more complicated than the existing implementation
i don't know of any such algorithms so i'm happy with quicksort for now.
but if someone does, fine.

Labels changed: added priority-low.

Status changed to HelpWanted.

@gopherbot

Comment 10 by pgmmpk:

Timsort is now available at package dashboard.
@gopherbot

Comment 11 by jckanis:

AFAIK what distinguishes Timsort from other sorting algorithms is that it can take a lot
more advantage of already sorted (or reverse-sorted) runs within the collection, to
minimise the number of comparisons it has to do. Timsort does a lot fewer comparisons
when sorting data that already has some structure. There is (I guess) some time overhead
to do this, so it really pays off when the comparison is expensive, for example because
it involves calling a user-defined function or calling interpreted code. This is why it
is a good choice for Python and why (I think) Java uses Timsort when sorting objects,
but quicksort/introsort when sorting primitive types.
@gopherbot

Comment 12 by zzyz666:

We can Fisher–Yates shuffle the elements in linear time and statistically avoid worst
case complexity.
@rsc
Contributor
rsc commented Sep 7, 2011

Comment 13:

This issue was closed by revision 21e49cb.

Status changed to Fixed.

@rsc rsc was assigned by gopherbot Sep 7, 2011
@LopezCatherine

@hatahet, @rsc, regarding 21e49cb, wasn't it simpler to just do shuffling instead? just like zzyz666 pointed out.

@bradfitz
Member
bradfitz commented Sep 2, 2015

@LopezCatherine, this bug is 6 years old. Much has changed in 6 years. Open a new bug instead if something is still suboptimal, but search for open duplicates first.

@emirpasic emirpasic referenced this issue in emirpasic/gods Jun 24, 2016
Closed

Replace Timsort #16

@gopherbot gopherbot locked and limited conversation to collaborators Sep 4, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.