
proposal: slices: new package to provide generic slice functions #45955

Open

ianlancetaylor opened this issue May 5, 2021 · 182 comments

@ianlancetaylor ianlancetaylor (Contributor) commented May 5, 2021

This proposal is for use with #43651. We propose defining a new package, slices, that will provide functions that may be used with slices of any type. If this proposal is accepted, the new package will be included with the first release of Go that implements #43651 (we currently expect that to be Go 1.18).

The description below focuses on the API, not the implementation. In general the implementation will be straightforward.

// Package slices defines various functions useful with slices of any type.
// Unless otherwise specified, these functions all apply to the elements
// of a slice at index 0 <= i < len(s).
package slices

import "constraint" // See #45458 

// Equal reports whether two slices are equal: the same length and all
// elements equal. Floating point NaNs are not considered equal.
// The elements are compared in index order, and the comparison
// stops at the first unequal pair.
func Equal[T comparable](s1, s2 []T) bool

// EqualFunc reports whether two slices are equal using a comparison
// function on each pair of elements. The elements are compared in
// index order, and the comparison stops at the first index for which
// eq returns false.
func EqualFunc[T any](s1, s2 []T, eq func(T, T) bool) bool

// Compare does a lexicographic comparison of s1 and s2.
// The elements are compared sequentially starting at index 0,
// until one element is not equal to the other. The result of comparing
// the first non-matching elements is the result of the comparison.
// If both slices are equal until one of them ends, the shorter slice is
// lexicographically less than the longer one.
// The result will be 0 if s1==s2, -1 if s1 < s2, and +1 if s1 > s2.
func Compare[T constraint.Ordered](s1, s2 []T) int

// CompareFunc is like Compare, but uses a comparison function
// on each pair of elements. The elements are compared in index order,
// and the comparisons stop after the first time cmp returns non-zero.
// The result will be the first non-zero result of cmp; if cmp always
// returns 0 the result is 0 if len(s1) == len(s2), -1 if len(s1) < len(s2),
// and +1 if len(s1) > len(s2).
func CompareFunc[T any](s1, s2 []T, cmp func(T, T) int) int

// Index returns the index of the first occurrence of v in s, or -1 if not present.
func Index[T comparable](s []T, v T) int

// IndexFunc is like Index but uses a comparison function.
func IndexFunc[T any](s []T, v T, cmp func(T, T) bool) int

// Contains reports whether v is present in s.
func Contains[T comparable](s []T, v T) bool

// ContainsFunc is like Contains, but it uses a comparison function.
func ContainsFunc[T any](s []T, v T, cmp func(T, T) bool) bool

// LastIndex? LastIndexFunc?

// Map calls the function f on each element of s,
// appending the results to d. It returns the modified d.
// If T1 and T2 are the same type, d and s may be the same slice;
// if the slices overlap but are not the same, the results are unspecified.
// To create a new slice with the mapping results, pass nil for d.
func Map[T1, T2 any](d []T2, s []T1, f func(T1) T2) []T2

// Filter appends to d each element e of s for which keep(e) returns true.
// It returns the modified d. d and s may be the same slice;
// if the slices overlap but are not the same, the results are unspecified.
// To create a new slice with the filtered results, pass nil for d.
func Filter[T any](d, s []T, keep func(T) bool) []T

// Reduce reduces a []T1 to a single value of type T2 using
// a reduction function. This applies f cumulatively to the elements of s.
// For example, if s is a slice of int, the sum of the elements of s is
// Reduce(s, 0, func(a, b int) int { return a + b }).
func Reduce[T1, T2 any](initializer T2, s []T1, f func(T2, T1) T2) T2

// Insert inserts the values v... into s at index i, returning the modified slice.
// In the returned slice r, r[i] == the first v.  This panics if !(i >= 0 && i <= len(s)).
// This function is O(len(s) + len(v)).
func Insert[T any](s []T, i int, v ...T) []T

// Remove removes the element at index i from s, returning the modified slice.
// This panics if !(i >= 0 && i < len(s)).  This function is O(len(s)).
// This modifies the contents of the slice s; it does not create a new slice.
func Remove[T any](s []T, i int) []T

// RemoveSlice removes j-i elements from s starting at index i, returning the modified slice.
// This can be thought of as a reverse slice expression: it removes a subslice.
// This panics if i < 0 || j < i || j > len(s).
// This modifies the contents of the slice s; it does not create a new slice.
func RemoveSlice[T any](s []T, i, j int) []T

// Should RemoveSlice just replace Remove, since RemoveSlice(s, x, x+1)
// is the same as Remove(s, x)?

// Uniq replaces consecutive runs of equal elements with a single copy.
// This is like the uniq command found on Unix.
// This modifies the contents of the slice s; it does not create a new slice.
func Uniq[T comparable](s []T) []T

// UniqFunc is like Uniq, but uses a comparison function.
func UniqFunc[T any](s []T, cmp func(T, T) bool) []T

// Resize returns s with length c. If the length increases, the new trailing
// elements will be set to the zero value of T.  If the length decreases,
// the result is s[:c].  The capacity of the returned slice is not specified.
func Resize[T any](s []T, c int) []T
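For concreteness, a minimal usage sketch assuming the proposed signatures above (hypothetical code; none of this exists yet):

func example() {
	s := []int{1, 2, 4, 5}
	s = slices.Insert(s, 2, 3)                                    // s == [1 2 3 4 5]
	_ = slices.Contains(s, 4)                                     // true
	_ = slices.Index(s, 6)                                        // -1
	_ = slices.Map(nil, s, func(x int) int { return 2 * x })     // [2 4 6 8 10]
	_ = slices.Reduce(0, s, func(a, b int) int { return a + b }) // 15
}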
@gopherbot gopherbot added this to the Proposal milestone May 5, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals May 5, 2021

@cespare cespare (Contributor) commented May 5, 2021

In general I think a slices package will be necessary. Functions like Equal, Insert, and Remove will be especially welcome improvements over the current boilerplate.

I have two concerns about the list of functions here:

  1. Too much bloat to start with. I think we should err on the side of being too minimal and add new functions in future releases as they prove worthwhile. For instance, I think that CompareFunc will be rarely used and doesn't obviously pull its weight here.

  2. Too much emphasis on a functional programming style using higher-order functions. My experience with languages like JS and Java (and quite a few others which were more C-like in their initial incarnations and added a lot of higher-order function stuff later) is that there is an unfortunate dichotomy between writing for loops and using higher-order functions.

    Code which heavily uses the latter looks very different on the page from the former, impeding readability.

    The higher-order function style also tends to be much slower, so when writing a loop there's a decision burden every time between saving a line or two and writing fast code. (TIMTOWTDI is not really the Go ethos.) And programmers usually prefer concision, so the net result is often slow, HOF-heavy code.

    The HOF style of code is also a bit clunky with Go's current function literals; adding many more HOFs will create pressure to add new syntax (#21498).

    In summary, I feel like this nudges the language in a non-Go-like direction. A thing that I really like about Go is that, for many tasks, I just write a plain, clear for loop which gets compiled to the obvious machine code. I don't look forward to a future where the prevailing style is to replace each for loop with a sufficiently clever slices.Reduce.

With these ideas in mind, I'd divide this list of functions into three groups.

Clearly worthwhile in a v1 of slices:

  • Equal
  • Compare
  • Index
  • Contains
  • LastIndex
  • Insert
  • InsertSlice
  • Remove
  • RemoveSlice
  • Resize

Borderline:

  • EqualFunc
  • ContainsFunc
  • Map
  • Filter
  • FilterInPlace

Not worth it:

  • CompareFunc
  • IndexFunc
  • LastIndexFunc
  • Reduce
  • MapInPlace

@IceWreck IceWreck commented May 5, 2021

Why not container/slices and later, if needed, container/maps?

Anyways, this package would be a welcome addition because it's a pain (boilerplate-wise) to implement common data structures in Go compared to C++, and this would slightly reduce that boilerplate.

@sfllaw sfllaw commented May 5, 2021

Should this new package mention https://github.com/golang/go/wiki/SliceTricks in its documentation? Should SliceTricks document the new package? How does this wiki page fit into the revised ecosystem?

@Merovius Merovius commented May 5, 2021

One thing to point out is that Map and MapInPlace could be unified under MapAppend[D, S any](dst []D, src []S, f func(S) D), which appends to dst - Map would be equivalent to passing nil for dst, and MapInPlace would be passing dst[:0]. Similarly for Filter. Not saying we should, just that we could. And I find InPlace a slightly weird bikeshed color for Map, where D and S can be different, thus the mapping is never "in place".
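A minimal sketch of that unification (assuming the function returns the destination, so the appends are visible to the caller):

func MapAppend[D, S any](dst []D, src []S, f func(S) D) []D {
	for _, v := range src {
		dst = append(dst, f(v))
	}
	return dst
}

Map is then the nil-destination case, and the "in place" variant passes dst[:0] to reuse the backing array.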

Personally, I'm a bit concerned about Index and especially Contains. IMO they suggest that unordered slices are a suitable data structure for sets. Anecdotally, using x in y for a Python list was the single most common mistake I had to deduct points for [edit] in an algorithms and data structures class. Really need to hire a copy-editor [/edit]

I'd be less concerned if there was the equivalent thing for sorted slices as well, so that people would at least be made to think about it. Though that would probably live in sort.

@extrasalt extrasalt commented May 5, 2021

Would flatten and flatMap also be included?

I can think of cases where a map would produce a slice of slices, and that might need flattening.

@DylanMeeus DylanMeeus commented May 5, 2021

I think this is a good idea, and it opens the door to including more functions by default in the future. Functions like these are either going to be provided by the standard library, or we'll end up with multiple "third party" libraries which do these things all in their own way, which will be used by a ton of projects anyway. Especially when people migrate from other languages (Java, C#, ...), they will look for such functions.

Providing this out of the box would be in line with how Go provides other frequently used constructs natively (like handling JSON data).

@sluongng sluongng commented May 5, 2021

// Resize returns s with length c. If the length increases, the new trailing
// elements will be set to the zero value of T.  The capacity of the returned
// slice is not specified.
func Resize[T any](s []T, c int) []T

It's not obvious what the behavior would be in the case where len(s) > c. Would the returned slice be a truncated s, taking the first c elements, or would s be returned as-is?

// Map turns a []T1 to a []T2 using a mapping function.
func Map[T1, T2 any](s []T1, f func(T1) T2) []T2

// Filter returns a new slice containing all elements e in s for which keep(e) is true.
func Filter[T any](s []T, keep func(T) bool) []T

// MapInPlace copies the elements in src to dst, using a mapping function
// to convert their values. This panics if len(dst) < len(src).
// (Or we could return min(len(dst), len(src)) if that seems better.)
func MapInPlace[D, S any](dst []D, src []S, f func(S) D)

// FilterInPlace modifies s to contain only elements for which keep(e) is true.
// It returns the modified slice.
func FilterInPlace[T any](s []T, keep func(T) bool) []T

I wonder whether it might be worth providing some interface that returns a lazily evaluated value here, as the chaining of Map and Filter is very common in the Java Stream API.
So I wonder if we're going to need a separate CL for something like Promise[T any].

// Insert inserts v into s at index i, returning the modified slice.
// In the returned slice r, r[i] == v.  This panics if !(i >= 0 && i <= len(s)).
// This function is O(len(s)).
func Insert[T any](s []T, i int, v T) []T

// InsertSlice inserts si into s at index i, returning the modified slice.
// In the returned slice r, r[i] == si[0] (unless si is empty).
// This panics if !(i >= 0 && i <= len(s)).
func InsertSlice[T any](s, si []T, i int) []T

// Remove removes the element at index i from s, returning the modified slice.
// This panics if !(i >= 0 && i < len(s)).  This function is O(len(s)).
// This modifies the contents of the slice s; it does not create a new slice.
func Remove[T any](s []T, i int) []T

// RemoveSlice removes j-i elements from s starting at index i, returning the modified slice.
// This can be thought of as a reverse slice expression: it removes a subslice.
// This panics if i < 0 || j < i || j > len(s).
// This modifies the contents of the slice s; it does not create a new slice.
func RemoveSlice[T any](s []T, i, j int) []T

I don't agree with the use of panic here.
Perhaps these should return an error, or there should be an Optional type/struct that can be either a value or an error.
It might be worth implementing these in a separate CL instead of the initial CL.

@sanggonlee sanggonlee commented May 5, 2021

These seem like they would make our lives slightly easier, but I thought one of the philosophies of Go was that operations not in O(1) runtime shouldn't hide their complexity? I thought that was the main reason there were no simple, commonly used abstractions like map, filter, etc. in Go.
Although I suppose the runtime of these functions is quite universally understood by most developers...

@bcmills bcmills (Member) commented May 5, 2021

One slice operation I've found myself writing frequently is “return the same slice with its cap reduced to length”.

#38081 was declined partially on the grounds that it could be written more clearly as a generic function (#38081 (comment)). I think such a function belongs in the slices package.
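For reference, the generic function in question is tiny; a sketch using a full slice expression (the name Clip here is hypothetical):

// Clip returns s with its capacity reduced to its length.
func Clip[T any](s []T) []T {
	return s[:len(s):len(s)]
}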

@bcmills bcmills (Member) commented May 5, 2021

I agree with @cespare that some of the functional APIs here seem premature, especially given the lack of a concise lambda.

I would add that functional languages tend to rely heavily on covariance, which Go lacks. That makes functions like Reduce and even Map much less useful than the corresponding functions in functional programming languages, especially when compared to the for loops we can already write. (I looked at Map in particular, at https://github.com/bcmills/go2go/blob/master/map.go2, but found it quite awkward compared to the typical functional map.)

On the other hand, Map and Reduce would be much more clearly useful if we had some way to express “assignable to” as a constraint, which I believe would be a coherent addition to the existing design. It's not obvious to me that changing those functions to use such a constraint would be backward-compatible, so I think they should be omitted until we have more hands-on experience with generic functional programming in Go.

@empath-75 empath-75 commented May 5, 2021

If you're going to do map/filter/reduce, doesn't it make more sense to design a more generic interface first, and then implement it for slices? There are a lot more data structures than slices that could benefit from such a thing.

@bcmills bcmills (Member) commented May 5, 2021

I agree with @sluongng that the behavior of Resize seems unclear. I also don't really understand its purpose — can't we already resize a slice using the built-in slice operator?

I think a Clone method that returns a copy of a slice (up to its length) without accepting a length parameter — analogous to the bytes.Clone proposed in #45038 — would be much clearer for decreasing a length.
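Such a Clone is a one-liner; a sketch assuming the same semantics as the proposed bytes.Clone:

// Clone returns a copy of s up to its length (a nil slice stays nil).
func Clone[T any](s []T) []T {
	return append([]T(nil), s...)
}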

For increasing a length, I wonder if it would be clearer to provide a function that increases the capacity instead, analogous to (*bytes.Buffer).Grow:

// Grow returns s with capacity at least c.
// If cap(s) >= c already, Grow returns s as-is.
func Grow[T any](s []T, c int) []T

Then, increasing the length of the slice is trivial:

	s := slices.Grow(s, n)[:n]

That might also address @mdempsky's use case in proposal #24204 (see #24204 (comment)), since “allocate a new []T with capacity at least c” could be written as:

	s := slices.Grow([]T{}, c)
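A sketch of Grow matching the doc comment above (one possible implementation, not a settled one):

func Grow[T any](s []T, c int) []T {
	if cap(s) >= c {
		return s
	}
	grown := make([]T, len(s), c) // allocate with the requested capacity
	copy(grown, s)
	return grown
}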

@carlmjohnson carlmjohnson (Contributor) commented May 5, 2021

The code I was writing yesterday would have benefited from the existence of slices.Grow.

I am also strongly in favor of the Append idiom rather than InPlace. Append is already used throughout the standard library, whereas InPlace has no current uses.

@arroo arroo commented May 5, 2021

one thing I have found useful on more than one occasion from JS's Reduce implementation is including the index as an argument to the provided function.
so it would be:

func Reduce[T1, T2 any](s []T1, initializer T2, f func(T2, T1, int) T2) T2

@bcmills bcmills (Member) commented May 5, 2021

The Slice variant of Insert seems like too much duplication to me. We have only one append in the language, not separate append variants for one element and multiple elements — why should Insert be any different?

If we make Insert analogous to append, that gives the signature:

func Insert[T any](s []T, i int, v ...T) []T

which seems like it handily covers the use-cases for both the proposed Insert and InsertSlice.
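For illustration, one way the variadic form could be implemented (a sketch; an out-of-range i panics via the slice expressions):

func Insert[T any](s []T, i int, v ...T) []T {
	s = append(s, v...)       // grow s by len(v); may reallocate
	copy(s[i+len(v):], s[i:]) // shift the original tail to the right
	copy(s[i:], v)            // place the new elements at index i
	return s
}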

@bcmills bcmills (Member) commented May 5, 2021

I think the signature for RemoveSlice may be surprising. I would intuitively expect it to accept a length instead of a second index.

That being the case, I think a function named RemoveN that explicitly accepts a length would be clearer — it's a bit more awkward for the “two indices” case, but it would be much more natural for many cases and also a lot less likely to be misread:

// RemoveN removes n elements from s starting at index i, returning the modified slice.
// This panics if i < 0 || n < 0 || i+n > len(s).
// This modifies the contents of the slice s; it does not create a new slice.
func RemoveN[T any](s []T, i, n int) []T

(The N suffix already has a precedent in the standard library in strings.SplitN.)
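A matching sketch of RemoveN, reusing the backing array the way the proposed Remove does:

func RemoveN[T any](s []T, i, n int) []T {
	return append(s[:i], s[i+n:]...) // shift the tail left over the removed elements
}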

@ianlancetaylor ianlancetaylor (Contributor, Author) commented Jun 8, 2021

I've made some updates to the proposal.

  • In Reduce, moved the initializer before the slice argument to match the order of the function arguments. (The order of the type parameters seems less important, as they will normally be inferred.)
  • Added Uniq and UniqFunc.
  • Changed Map and Filter to append to a passed-in slice, subsuming MapInPlace and FilterInPlace which are now removed.
  • Added a question: should we replace Remove with RemoveSlice?

@cespare cespare (Contributor) commented Jun 8, 2021

I think that Uniq and UniqFunc are a mistake.

People who are familiar with the Unix uniq command (or, my googling suggests, C++'s std::unique) may well expect the proposed behavior (deduplicating runs of equal elements).

I think that a whole lot of other people will be surprised by the behavior, because what "Uniq" sounds like is that it's going to return only the unique elements. That is what Ruby's uniq method does, too:

irb(main):001:0> [1, 3, 1].uniq
=> [1, 3]

I'm not suggesting that Uniq should do this; nothing in this package should use O(n) memory. But this is what people will expect. I anticipate that this function will be misused much like strings.Trim is today: lots of programmers will expect that it does something different from what it actually does; it may appear to "work" as they expect, at first; and then we get lots of confused questions.

I also think that "finding all the unique elements" is more often what people need. I rarely need the proposed behavior; the only times I can remember using functions like that are as an optimization for finding the unique elements of already-sorted data without building a map. However, I frequently need to get the unique elements of an unsorted slice; I imagine that if we add a generic sets package the way to do that will be something along the lines of

sets.FromSlice(s).Slice()

If we keep Uniq and UniqFunc, which I don't think we should for a v1 of a slices package, I would propose renaming them to be more specific; perhaps UniqRuns or UniqConsecutive or something along those lines.

@colin-sitehost colin-sitehost commented Jun 8, 2021
personally, I am fine with Uniq{,Func}; if they are performant, well documented, and useful, that should be fine. plus, there are already too many sharp edges in the stdlib to expect everything to "just work". there is an expectation that you read the docs: e.g. time.Time.Equal.

I also think that slices.Uniq(sort.Slice(s)) is not a bad idiom, esp. since it reads like ... | sort | uniq in bash.

@carlmjohnson carlmjohnson (Contributor) commented Jun 9, 2021

FWIW, I thought a lot about whether Uniq is a good idea or not when I was writing my stringutil package. I ended up deciding on having both Unique(dst, src []string) []string and UniqueVia(dst, src []string, m map[string]bool) []string. I am still not sure if that was the correct judgment or not for my needs, but I think it would be a mistake for the standard library to add a function with clear potential for beginner misuse.

@Merovius Merovius commented Jun 9, 2021

I agree with @cespare that there is potential for confusion and would lean against its inclusion in this form. I did need Uniq a couple of times though, so I think it would be useful to have in some form. Perhaps the potential for confusion can be eliminated or reduced by naming or by the context it's placed in.

Is there ever a case where you want the constraint of Uniq to be comparable instead of constraint.Ordered? ISTM that to be useful, you need a guarantee that equal elements are consecutive and the way to get that guarantee is by sorting.

The reason I'm asking is because the one context where I had to write the code of Uniq is in the context of representing a set as an ordered slice. I hope we'll get a type/package for that at some point - and perhaps the place of Uniq is there, not the slices package, because it doesn't really operate on any slice, but only meaningfully on ordered slices?

@nightlyone nightlyone (Contributor) commented Jun 9, 2021

What I don't understand: Why are Equal and Compare for slices suddenly decidable? The current Go comparison operator doesn't support slices. What changed here, and why not make the operator work for slices with comparable elements (and exclude slices of interfaces) instead?

@ianlancetaylor could you give some details in the proposal about what changed here and why this trade-off (requiring a package vs. making == work for slices) was chosen?

@Merovius Merovius commented Jun 9, 2021

@HALtheWise It's possible to implement such a generalized Uniq in an order-preserving way, in O(n) time and memory:

func Uniq[T comparable](s []T) []T {
	out := s[:0]
	found := make(map[T]bool)
	for _, v := range s {
		if found[v] {
			continue
		}
		found[v] = true
		out = append(out, v)
	}
	return out
}

@HALtheWise HALtheWise commented Jun 9, 2021

Yes, I am well aware of that solution and have coded it several times (this is why I expected it in the standard library). I was simply pointing out that the previously mentioned one-liner workarounds brought up in this thread (sets.FromSlice(input).Slice() or slices.Uniq(sort.Slice(input))) don't satisfy my typical use case.

@Merovius Merovius commented Jun 9, 2021

I guess there is the possibility to have all three:

  • Add a separate hash set package, which can be used if you just want to treat them as an orderless bag-of-things
  • Add the proposed version of Uniq to the sort package (or the like), together with other algorithms to operate on sorted data (like binary search)
  • Add a generalized deduplication function here, similar to the one above.

Personally, I'd be a bit sad if we added the last one, but not the second one - so far, I've ~never needed to remove duplicates from an unordered slice, but I use ordered, deduplicated slices quite frequently as sets.

@carlmjohnson carlmjohnson (Contributor) commented Jun 9, 2021

Personally, I'd be a bit sad if we added the last one, but not the second one - so far, I've ~never needed to remove duplicates from an unordered slice, but I use ordered, deduplicated slices quite frequently as sets.

Do you find that you ever need this for types besides string? I find myself needing to deduplicate strings quite often, but other types essentially never.

@Merovius Merovius commented Jun 9, 2021

Do you find that you ever need this for types besides string? I find myself needing to deduplicate strings quite often, but other types essentially never.

I think it depends on what kind of code you write. None of the concrete cases I'm thinking of used strings.

@rogpeppe rogpeppe (Contributor) commented Jun 10, 2021

@Merovius

@HALtheWise It's possible to implement such a generalized Uniq in an order-preserving way, in O(n) time and memory:

For the record, I think you'll find the time complexity of that is O(n * log(n)) because map access is O(log(n)).

@rsc rsc (Contributor) commented Jun 10, 2021

Let's not get hung up on the asymptotic complexity of map-based Uniq.
It remains a fact that non-map-based Uniq is going to be far faster regardless.

@carlmjohnson carlmjohnson (Contributor) commented Jun 10, 2021

🚲 : How about the name Dedupe? To me, it implies removing successive duplicates better than Uniq does. To me, uniqueness is a global property, so my expectation is that it will create a set, like Ruby's uniq does. There's still a chance of misunderstanding Dedupe but less so.

@thomasf thomasf commented Jun 10, 2021

@carlmjohnson to me Dedupe sounds like deduplicate, which literally means to remove duplicates. I don't see how it's better or worse than Uniq, except that it's worse because it's longer and composed of two syllables. Without reading any documentation I would assume that both of them would remove all duplicate entries independent of order.

@carlmjohnson carlmjohnson (Contributor) commented Jun 10, 2021

@carlmjohnson to me Dedupe sounds like deduplicate, which literally means to remove duplicates.

It removes successive duplicates, as opposed to all duplicates. This is different from "unique" in that there's no such phrase as "successive uniques" in my idiolect of American English. "Uniqueness" is a global property. It's the reason the internet grammar people put forth for disliking the phrase "very unique". As I wrote, "There's still a chance of misunderstanding Dedupe but less so."

@Merovius Merovius commented Jun 10, 2021

@carlmjohnson Dedupe doesn't mention successiveness though. That is, while you are right that "successive unique" as a combo makes little sense, that doesn't mean Dedupe implies (or even suggests) successive. I agree with @thomasf that Dedupe is no improvement.

FWIW, it can be argued that Uniq is a better name, because it's not Unique. That is, it intentionally deviates from the correct spelling in favor of the same spelling as the uniq command. While Unique is ambiguous, arguably, Uniq isn't.

But really, I agree that the name is too ambiguous and will be misunderstood and I don't think there really is a better one, aside from a monstrosity like RemoveSuccessiveDuplicates or something.

@AndrewLivingston AndrewLivingston commented Jun 10, 2021

I wonder if the debate over dedupe vs. uniq, and over what the behavior should be, demonstrates that there is no general, all-purpose function here that deserves a short name with pride of place.

TBH, I don't see the problem with something like RemoveSuccessiveDuplicates or UniqSuccessive. I'd much prefer that. Just because Unix tradition exists and the docs will exist doesn't mean we should force people to always go to the docs to remember what Uniq means in Go. Especially because other languages like Ruby have muddied the waters by using uniq to mean something else.

I do a lot of PHP, and plenty of docs exist. Even experienced developers spend a lot of time with them. It's not great.

@thomasf thomasf commented Jun 10, 2021

I think the risk of people forgetting that Uniq only works on consecutive items in Go is reasonably high. When I am implementing something like JSON APIs, it's not likely that I would explicitly test slices for deduplication correctness if there is some Uniq going on somewhere in the code.

I have certainly forgotten that the coreutils uniq behaves like this a few times and created shell scripts with unintended consequences. This is very likely because a lot of libraries/languages have different opinions about what a uniq/unique function should do, and sometimes I mix them up in my head.

Maybe it's ok if Uniq is slower and removes all duplicates, or maybe we just have to bite the bullet and name it something with an extra syllable, like AdjacentUniq or whatever is needed to communicate the specifics of the operation a little more closely.

I am not totally against the idea of having both functions either. When I do deduplication, the most common scenario is pretty small slices where the performance difference would not be important. If you need more performance and you know more about the state of your list, you can always use AdjacentUniq, and if you need even more performance you are probably going to use for loops and do more than one thing in each loop iteration anyway.

@AndrewLivingston AndrewLivingston commented Jun 10, 2021

I don't necessarily see why we can't have both functions. I often do the full deduplication variant on pretty small slices, where the performance cost of a single method call from the stdlib would be negligible. If you need more performance you can use the other one, and if you need even more you're probably going to use for loops anyway to do as much as possible in one loop.

I agree. I'd prefer some naming scheme that fully describes both and puts them next to each other, like UniqAdjacent and...well, the other one is hard. UniqFull ...is probably not it. But entirely avoiding having a function with just the name Uniq would be great.

@thomasf thomasf commented Jun 10, 2021

If all is not considered to be the default, there are a lot of functions you could suffix with All for no good reason. As long as the default short-named function's behavior is the one least likely to cause bugs based on how it's named, I am fine with it not being spelled out.

@AndrewLivingston AndrewLivingston commented Jun 10, 2021

Yeah... it's just that a default name rarely carries this much baggage and ambiguity. Unique and UniqueAdjacent wouldn't bring baggage.

@rsc rsc (Contributor) commented Jun 10, 2021

Given how much bikeshedding there is over Uniq/Unique/UniqAdjacent/etc, it does seem like we should avoid that word.
I'm inclined to just make up a good short name like we did for strings.Cut, and then it won't have any baggage. For example, Collapse or Compact. Rather than bikeshed that too, I'm happy to wait to see what Ian's next revision looks like.
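Whatever name wins, the run-collapsing behavior itself is only a few lines; a sketch using Compact, one of the names floated above:

func Compact[T comparable](s []T) []T {
	if len(s) == 0 {
		return s
	}
	out := s[:1] // the first element always survives
	for _, v := range s[1:] {
		if v != out[len(out)-1] {
			out = append(out, v) // writes stay behind the read position
		}
	}
	return out
}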

@rogpeppe rogpeppe (Contributor) commented Jun 10, 2021

Squash?

@AndrewLivingston AndrewLivingston commented Jun 10, 2021

There are only two hard things in Computer Science: cache invalidation and naming things.

-- possibly Phil Karlton

@ianlancetaylor ianlancetaylor (Contributor, Author) commented Jun 10, 2021

@nightlyone

What I don't understand: Why are Equal and Compare for slices suddenly decidable? The current Go comparison operator doesn't support slices. What changed here, and why not make the operator work for slices with comparable elements (and exclude slices of interfaces) instead?

@ianlancetaylor could you give some details in the proposal about what changed here and why this trade-off (requiring a package vs. making == work for slices) was chosen?

There are two possible definitions for slice equality: do the two slices refer to the same underlying array, or do the two slices contain equal elements? Because of this ambiguity, it's not appropriate for the language to define == on slice values. Whichever choice we pick, some people will prefer the other choice.

The language does not provide any operations on slices that take time corresponding to the length of the slice (unless you count the builtin functions copy and append, but they look like function calls, not operations). That suggests that if we are forced to pick a definition for == for slice values, we should pick the one that tests whether the two slices refer to the same backing array.

Conversely, it's easy to check whether two slices refer to the same backing array by writing code like len(a) == len(b) && &a[0] == &b[0]. That suggests that if we are forced to pick a definition for == for slice values, we should pick the one that tests whether the slices contain equal elements.

When there is no obvious answer to a question, and there is no need to answer the question, better to not answer it. That avoids confusion.

None of these arguments apply to a function like slices.Equal.
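For reference, the element-wise comparison such a function would encapsulate is the usual loop; a sketch matching the proposed doc comment (note that NaN != NaN, so slices containing NaNs compare unequal):

func Equal[T comparable](s1, s2 []T) bool {
	if len(s1) != len(s2) {
		return false
	}
	for i := range s1 {
		if s1[i] != s2[i] {
			return false
		}
	}
	return true
}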

@tmthrgd tmthrgd (Contributor) commented Jun 12, 2021

I'd find some sort of generic Join method very useful, like bytes.Join:

// Join concatenates the elements of s to create a new slice. The separator
// sep is placed between elements in the resulting slice.
func Join[T any](s [][]T, sep []T) []T

Though I don't know how useful a separator would be with non-byte slices, so perhaps just:

// Join concatenates the elements of s to create a new slice.
func Join[T any](s [][]T) []T

I can also imagine situations where using variadic arguments would be preferable, so again perhaps:

// Join concatenates the elements of s to create a new slice.
func Join[T any](s ...[]T) []T
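For what it's worth, the separator variant above generalizes directly from bytes.Join; a sketch:

func Join[T any](s [][]T, sep []T) []T {
	if len(s) == 0 {
		return nil
	}
	n := len(sep) * (len(s) - 1) // total separator length
	for _, v := range s {
		n += len(v)
	}
	out := make([]T, 0, n) // single allocation
	out = append(out, s[0]...)
	for _, v := range s[1:] {
		out = append(out, sep...)
		out = append(out, v...)
	}
	return out
}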