Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: sort: add the Dedup function #41517

Closed
rokkerruslan opened this issue Sep 21, 2020 · 5 comments
Closed

proposal: sort: add the Dedup function #41517

rokkerruslan opened this issue Sep 21, 2020 · 5 comments
Labels
Projects
Milestone

Comments

@rokkerruslan
Copy link

@rokkerruslan rokkerruslan commented Sep 21, 2020

There is a small suggestion for extending the sort package. There are tasks where you need to remove duplicates from a slice. Yes, on the one hand we can use the map. But the inconvenience is that you have to do this for each separate type in the program. We have a function for sorting a slice, searching by a slice. Why don't we have a slice deduplication feature?

One could suggest doing by analogy with sort.Slice:

func Slice(slice interface{}, less func(i, j int) bool) {

or with sort.Search:

func Search(n int, f func(int) bool) int {

Conceptually, the solution might look like this:

func Dedup(slice interface{}, eq func(int, int) bool) { // Dedup in sort package.
    ptr := reflect.TypeOf(slice)
    if ptr.Kind() != reflect.Ptr && ptr.Elem().Kind() != reflect.Slice {
        panic("obj is not a slice")
    }

    v := reflect.ValueOf(slice)
    if v.Kind() == reflect.Ptr {
        v = v.Elem()
    }

    l := v.Len()

    // Fast path for slices of size 0 and 1. Nothing to deduplicate.
    if l <= 1 {
        return
    }

    next := 0
    for i := 0; i < l; i++ {
        if i < l-1 && eq(i, i+1) {
            continue
        }

        v.Index(next).Set(v.Index(i))
        next++
    }

    v.Set(v.Slice(0, next))
}

func main() {
    // With simple type
    input := []int{1, 1, 1, 3, 5, 5, 7, 42}
    sort.Dedup(&input, func(i int, i2 int) bool {
        return input[i] == input[i2]
    })

    // With complex type (we check only A field)
    input2 := []ComplexType{{A: 1, B: 1}, {A: 1, B: 2}, {A: 2, B: 2}}
    sort.Dedup(&input2, func(i int, i2 int) bool {
        return input2[i].A == input2[i].A
    })
}

Solution inspired https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup

The change may not affect the runtime of the language, be compatible with version 1. Can we consider this option?

If so, I will prepare a real CL.

cc @davecheney

@gopherbot gopherbot added this to the Proposal milestone Sep 21, 2020
@gopherbot gopherbot added the Proposal label Sep 21, 2020
@beoran
Copy link

@beoran beoran commented Sep 21, 2020

This should probably be named 'Uniq' for the posix uniq command, and we had best wait until we get generics in Go. Then this will be trivial to implement.

@ianlancetaylor ianlancetaylor changed the title proposal: add the sort.Dedup function proposal: sort: add the Dedup function Sep 21, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Sep 21, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 21, 2020

Thanks for the proposal.

It's not obvious to me that sort.Interface is always the right approach here--you don't use it in your example code. So it's not clear to me that this is a great fit for the existing sort package.

Also, https://golang.org/doc/faq#x_in_std. This can be written outside of the standard library, so is it a common enough operation that it ought to be in the standard library? Can you point to some examples of code that would use this if it were available?

I also tend to agree with @beoran that this ought to wait for generics.

Thanks again.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 23, 2020

I'm not convinced this should be tied up with sorting at all.
Yes, if you sort first then you can remove more duplicates.
But the removing duplicates operation can make sense even on unsorted data (without a sort step).

I expected that this would be slices.Uniq once generics happen.

@mpx
Copy link
Contributor

@mpx mpx commented Sep 24, 2020

For my use cases, I've found it's far more common to start with unsorted data. Hence it may be faster to deduplicate via a map since sorting is O(n.log(n)) - provided the extra temporary memory use is acceptable.

Providing this functionality via the sort package would also encourage using a less efficient method, and/or risks people being surprised when unsorted data is not correctly de-duplicated.

Waiting for generics seems preferable.

@rokkerruslan
Copy link
Author

@rokkerruslan rokkerruslan commented Sep 24, 2020

I think that finding duplicates is a part that should be presented in the language library, but implementation using generics is good. So I close it. Thanks for the comments!

@rsc rsc moved this from Incoming to Declined in Proposals Oct 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Declined
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.