{{ message }}

# proposal: sort: add the Dedup function #41517

Closed
opened this issue Sep 21, 2020 · 5 comments
Closed

# proposal: sort: add the Dedup function #41517

opened this issue Sep 21, 2020 · 5 comments
Labels
Projects
Milestone

### rokkerruslan commented Sep 21, 2020 • edited

 There is a small suggestion for extending the sort package. There are tasks where you need to remove duplicates from a slice. Yes, on the one hand we can use the `map`. But the inconvenience is that you have to do this for each separate type in the program. We have a function for sorting a slice, searching by a slice. Why don't we have a slice deduplication feature? One could suggest doing by analogy with `sort.Slice`: `func Slice(slice interface{}, less func(i, j int) bool) {` or with `sort.Search`: `func Search(n int, f func(int) bool) int {` Conceptually, the solution might look like this: ```func Dedup(slice interface{}, eq func(int, int) bool) { // Dedup in sort package. ptr := reflect.TypeOf(slice) if ptr.Kind() != reflect.Ptr && ptr.Elem().Kind() != reflect.Slice { panic("obj is not a slice") } v := reflect.ValueOf(slice) if v.Kind() == reflect.Ptr { v = v.Elem() } l := v.Len() // Fast path for slices of size 0 and 1. Nothing to deduplicate. if l <= 1 { return } next := 0 for i := 0; i < l; i++ { if i < l-1 && eq(i, i+1) { continue } v.Index(next).Set(v.Index(i)) next++ } v.Set(v.Slice(0, next)) } func main() { // With simple type input := []int{1, 1, 1, 3, 5, 5, 7, 42} sort.Dedup(&input, func(i int, i2 int) bool { return input[i] == input[i2] }) // With complex type (we check only A field) input2 := []ComplexType{{A: 1, B: 1}, {A: 1, B: 2}, {A: 2, B: 2}} sort.Dedup(&input2, func(i int, i2 int) bool { return input2[i].A == input2[i].A }) }``` Solution inspired https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup The change may not affect the runtime of the language, be compatible with version 1. Can we consider this option? If so, I will prepare a real CL.
added this to the Proposal milestone Sep 21, 2020
added the label Sep 21, 2020

### beoran commented Sep 21, 2020

 This should probably be named 'Uniq' for the posix uniq command, and we had best wait until we get generics in Go. Then this will be trivial to implement.
changed the title proposal: add the sort.Dedup function proposal: sort: add the Dedup function Sep 21, 2020
added this to Incoming in Proposals Sep 21, 2020

### ianlancetaylor commented Sep 21, 2020

 Thanks for the proposal. It's not obvious to me that `sort.Interface` is always the right approach here--you don't use it in your example code. So it's not clear to me that this is a great fit for the existing sort package. Also, https://golang.org/doc/faq#x_in_std. This can be written outside of the standard library, so is it a common enough operation that it ought to be in the standard library? Can you point to some examples of code that would use this if it were available? I also tend to agree with @beoran that this ought to wait for generics. Thanks again.

### rsc commented Sep 23, 2020

 I'm not convinced this should be tied up with sorting at all. Yes, if you sort first then you can remove more duplicates. But the removing duplicates operation can make sense even on unsorted data (without a sort step). I expected that this would be slices.Uniq once generics happen.

### mpx commented Sep 24, 2020 • edited

 For my use cases, I've found it's far more common to start with unsorted data. Hence it may be faster to deduplicate via a map since sorting is O(n.log(n)) - provided the extra temporary memory use is acceptable. Providing this functionality via the `sort` package would also encourage using a less efficient method, and/or risks people being surprised when unsorted data is not correctly de-duplicated. Waiting for generics seems preferable.

### rokkerruslan commented Sep 24, 2020

 I think that finding duplicates is a part that should be presented in the language library, but implementation using generics is good. So I close it. Thanks for the comments!
moved this from Incoming to Declined in Proposals Oct 1, 2020