New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal sort algorithm used in /src/pkg/sort/sort.go #467

Closed
gopherbot opened this Issue Dec 26, 2009 · 15 comments

Comments

Projects
None yet
5 participants
@gopherbot

gopherbot commented Dec 26, 2009

by mimenarrator:

The Go sort library implements quicksort, which has a worst-case runtime of
O(n^2), and not introsort, with a worst case runtime of O(n*log(n)), as,
for example, GNU's libstdc++ does. Introsort almost always uses the same
algorithm as quicksort, but when pathological cases are detected that lead
to O(n^2) runtime of quicksort (by keeping track of the recursion depth),
heapsort is used instead, since it is guaranteed O(n*log(n)), though on
average slower than quicksort. This is obviously not of the highest
priority, especially given the rarity of pathological cases for the
quicksort implementation used in Go, but they do exist, and can, for
example, but exploited in denial of service attacks.
@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Dec 28, 2009

Contributor

Comment 1:

Owner changed to r...@golang.org.

Status changed to Accepted.

Contributor

rsc commented Dec 28, 2009

Comment 1:

Owner changed to r...@golang.org.

Status changed to Accepted.

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jan 15, 2010

Comment 2 by noelgrandin:

If you're going to replace the sort algorith, Timsort appears to be gaining traction in 
Python and Java.
http://en.wikipedia.org/wiki/Timsort
http://svn.python.org/projects/python/trunk/Objects/listsort.txt

gopherbot commented Jan 15, 2010

Comment 2 by noelgrandin:

If you're going to replace the sort algorith, Timsort appears to be gaining traction in 
Python and Java.
http://en.wikipedia.org/wiki/Timsort
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jan 15, 2010

Comment 3 by mimenarrator:

Hmm, looks like same average and guaranteed runtimes as introsort. I wonder if anyone
has done any empirical analysis of which is superior and under what conditions.

gopherbot commented Jan 15, 2010

Comment 3 by mimenarrator:

Hmm, looks like same average and guaranteed runtimes as introsort. I wonder if anyone
has done any empirical analysis of which is superior and under what conditions.
@krasin

This comment has been minimized.

Show comment
Hide comment
@krasin

krasin Jan 15, 2010

Comment 4:

Timsort is stable and Introsort is not (because introsort relies on quicksort). 
So, I would prefer Timsort.

krasin commented Jan 15, 2010

Comment 4:

Timsort is stable and Introsort is not (because introsort relies on quicksort). 
So, I would prefer Timsort.
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jan 15, 2010

Comment 5 by mimenarrator:

Ideally both would be implemented if Introsort is faster, one for use when stability
is necessary and one for use when it is not, while only Timsort would need to be
implemented if it was faster.

gopherbot commented Jan 15, 2010

Comment 5 by mimenarrator:

Ideally both would be implemented if Introsort is faster, one for use when stability
is necessary and one for use when it is not, while only Timsort would need to be
implemented if it was faster.
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Nov 17, 2010

Comment 6 by gshimano:

I will be curious to see implementation in Go of any of the parallel sorting algorithms.
Any reason that's not done yet?

gopherbot commented Nov 17, 2010

Comment 6 by gshimano:

I will be curious to see implementation in Go of any of the parallel sorting algorithms.
Any reason that's not done yet?
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jan 11, 2011

Comment 7 by jemfinch:

Because it's derived from merge sort, timsort requires at least O(n/2) extra memory. 
That may be cheap in Python (where the lists being sorted are essentially arrays of
pointers to much larger objects), but would not be as universally acceptable in Go.

gopherbot commented Jan 11, 2011

Comment 7 by jemfinch:

Because it's derived from merge sort, timsort requires at least O(n/2) extra memory. 
That may be cheap in Python (where the lists being sorted are essentially arrays of
pointers to much larger objects), but would not be as universally acceptable in Go.
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jan 11, 2011

Comment 8 by l4zycod3r:

> Because it's derived from merge sort, timsort requires at least O(n/2) extra memory.  
This sort uses pointers to values instead of real storage memory:
Look at http://svn.python.org/projects/python/trunk/Objects/listsort.txt : 
"+ timsort can require a temp array containing as many as N//2 pointers,
  which means as many as 2*N extra bytes on 32-bit boxes.  It can be
  expected to require a temp array this large when sorting random data; on
  data with significant structure, it may get away without using any extra
  heap memory.  This appears to be the strongest argument against it, but
  compared to the size of an object, 2 temp bytes worst-case (also expected-
  case for random data) doesn't scare me much."

gopherbot commented Jan 11, 2011

Comment 8 by l4zycod3r:

> Because it's derived from merge sort, timsort requires at least O(n/2) extra memory.  
This sort uses pointers to values instead of real storage memory:
Look at http://svn.python.org/projects/python/trunk/Objects/listsort.txt : 
"+ timsort can require a temp array containing as many as N//2 pointers,
  which means as many as 2*N extra bytes on 32-bit boxes.  It can be
  expected to require a temp array this large when sorting random data; on
  data with significant structure, it may get away without using any extra
  heap memory.  This appears to be the strongest argument against it, but
  compared to the size of an object, 2 temp bytes worst-case (also expected-
  case for random data) doesn't scare me much."
@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Jan 20, 2011

Contributor

Comment 9:

the extra memory is a big deal.  
criteria for something to replace sort.Sort
1. in place
2. no memory allocation
3. guaranteed n log n time worst case
4. not significantly more complicated than the existing implementation
i don't know of any such algorithms so i'm happy with quicksort for now.
but if someone does, fine.

Labels changed: added priority-low.

Status changed to HelpWanted.

Contributor

rsc commented Jan 20, 2011

Comment 9:

the extra memory is a big deal.  
criteria for something to replace sort.Sort
1. in place
2. no memory allocation
3. guaranteed n log n time worst case
4. not significantly more complicated than the existing implementation
i don't know of any such algorithms so i'm happy with quicksort for now.
but if someone does, fine.

Labels changed: added priority-low.

Status changed to HelpWanted.

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jan 21, 2011

Comment 10 by pgmmpk:

Timsort is now available at package dashboard.

gopherbot commented Jan 21, 2011

Comment 10 by pgmmpk:

Timsort is now available at package dashboard.
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jan 25, 2011

Comment 11 by jckanis:

AFAIK what distinguishes Timsort from other sorting algorithms is that it can take a lot
more advantage of already sorted (or reverse-sorted) runs within the collection, to
minimise the number of comparisons it has to do. Timsort does a lot fewer comparisons
when sorting data that already has some structure. There is (I guess) some time overhead
to do this, so it really pays off when the comparison is expensive, for example because
it involves calling a user-defined function or calling interpreted code. This is why it
is a good choice for Python and why (I think) Java uses Timsort when sorting objects,
but quicksort/introsort when sorting primitive types.

gopherbot commented Jan 25, 2011

Comment 11 by jckanis:

AFAIK what distinguishes Timsort from other sorting algorithms is that it can take a lot
more advantage of already sorted (or reverse-sorted) runs within the collection, to
minimise the number of comparisons it has to do. Timsort does a lot fewer comparisons
when sorting data that already has some structure. There is (I guess) some time overhead
to do this, so it really pays off when the comparison is expensive, for example because
it involves calling a user-defined function or calling interpreted code. This is why it
is a good choice for Python and why (I think) Java uses Timsort when sorting objects,
but quicksort/introsort when sorting primitive types.
@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot May 27, 2011

Comment 12 by zzyz666:

We can Fisher–Yates shuffle the elements in linear time and statistically avoid worst
case complexity.

gopherbot commented May 27, 2011

Comment 12 by zzyz666:

We can Fisher–Yates shuffle the elements in linear time and statistically avoid worst
case complexity.
@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Sep 7, 2011

Contributor

Comment 13:

This issue was closed by revision 21e49cb.

Status changed to Fixed.

Contributor

rsc commented Sep 7, 2011

Comment 13:

This issue was closed by revision 21e49cb.

Status changed to Fixed.

@LopezCatherine

This comment has been minimized.

Show comment
Hide comment
@LopezCatherine

LopezCatherine Sep 2, 2015

@hatahet, @rsc, regarding 21e49cb, wasn't it simpler to just do shuffling instead? just like zzyz666 pointed out.

LopezCatherine commented Sep 2, 2015

@hatahet, @rsc, regarding 21e49cb, wasn't it simpler to just do shuffling instead? just like zzyz666 pointed out.

@bradfitz

This comment has been minimized.

Show comment
Hide comment
@bradfitz

bradfitz Sep 2, 2015

Member

@LopezCatherine, this bug is 6 years old. Much has changed in 6 years. Open a new bug instead if something is still suboptimal, but search for open duplicates first.

Member

bradfitz commented Sep 2, 2015

@LopezCatherine, this bug is 6 years old. Much has changed in 6 years. Open a new bug instead if something is still suboptimal, but search for open duplicates first.

@golang golang locked and limited conversation to collaborators Sep 4, 2016

This issue was closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.