Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: add functions MinNFunc and MaxNFunc #67376

Open
jech opened this issue May 15, 2024 · 5 comments
Open

proposal: slices: add functions MinNFunc and MaxNFunc #67376

jech opened this issue May 15, 2024 · 5 comments
Labels
Milestone

Comments

@jech
Copy link

jech commented May 15, 2024

Proposal Details

I propose the addition of a function

func MinNFunc[S ~[]E, E any](n int, x S, cmp func(a, b E) int) S

which returns the n smallest elements of x in increasing order. If len(x) < n, then MinNFunc returns x sorted.

MinNFunc(n, x, cmp) is equivalent to

    y := SortFunc(x, cmp)
    if len(y) < n {
        return y
    }
    return y[:n]

but it avoids an allocation of size O(len(x)).

For symmetry, there could be a function MaxNFunc which returns the n largest elements of x in decreasing order. MaxNFunc(n, x, cmp) is equivalent to

MinNFunc(n, x, func(a, b E) int {
    return -cmp(a, b)
})
@jech jech added the Proposal label May 15, 2024
@gopherbot gopherbot added this to the Proposal milestone May 15, 2024
@seankhliao
Copy link
Member

this sounds very specialized
does it have wide usage elsewhere?
https://go.dev/doc/faq#x_in_std

@earthboundkid
Copy link
Contributor

earthboundkid commented May 15, 2024

I think this is the wrong proposal. Instead container/heap should be made generic, and you can just pop the first N items off a heap. See #47632.

@jech
Copy link
Author

jech commented May 16, 2024

this sounds very specialized
does it have wide usage elsewhere?

I am not aware of it being included in the stdlib of any popular language, granted. However, it is a simple, well-defined operation that I often find useful and that is not completely trivial to implement efficiently, but that can reuse a lot of the machinery in pdqsort.

As an additional data point, if you perform a web search for "k largest elements", it appears to be a fairly popular operation, but perhaps not for the right reasons (it would seem it's a standard interview question, for some reason).

Instead container/heap should be made generic, and you can just pop the first N items off a heap. See #47632.

If you mean that MinNFunc can be implemented using a priority queue (and therefore using a binary heap) in O(n log k) time, then you're absolutely right. However, doing it efficiently is not completely trivial (the naive implementation copies every element, even those that end up being discarded.)

If you mean that I should be using a priority queue instead of computing the prioritary elements on demand, then I cannot do that: the priority changes dynamically (it depends on the network peers, don't ask), and there's no obvious way to determine when the priorities have changed sufficiently to justify rebuilding the priority queue.

I agree that it'd be nice to have a generic priority queue, but that's orthogonal to this proposal.

@tsenart
Copy link

tsenart commented Sep 5, 2024

I was just now going to open a new proposal, and ran into this issue. Let me start by chiming in here. What you're describing here sounds like a selection algorithm which has average case O(n), like quick select. I've recently implemented pdqselect and published it open source: https://github.com/tsenart/pdqselect

It's based on Go's pdqsort implementation, and in fact all the zsort*.go files are copied from the Go tree. The algorithm is an adaptation of pdqsort to be a selection algorithm like quick select. It exists similarly in the Rust and C++ ecosystems:

For inclusion in the stdlib, we could have something like this (follow my module's current design, more or less):

  • sort.Select(sort.IntSlice([]int{42, 12, 1}, 2))
  • slices.Select([]int{42, 12, 1}, 2)
  • slices.SelectFunc([]int{42, 12, 1}, 2, cmp.Compare)

With each of these ensuring only that after they're called the elements at slice[0:k] (where k=2 in the examples) are the first k elements if the slice was sorted (but they aren't necessarily sorted amongst themselves).

Not sure if the Go team is open to consider including something like this in the stdlib, but worth a discussion, given there's precedent in other languages, general utility and most of the code is already written — there's not a lot of new code in my module, mostly just pdqselect.go with all three variations for sort.Interface, cmp.Ordered and func(a, b E) int, as well as fuzz tests and benchmarks.

@tsenart
Copy link

tsenart commented Sep 6, 2024

Also, cc @zhangyunhao116 and @eliben, would like your input on the above, given your pdqsort work recently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

5 participants