Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: function for count specific value on slice #62301

Open
dozheiny opened this issue Aug 26, 2023 · 22 comments
Open

proposal: slices: function for count specific value on slice #62301

dozheiny opened this issue Aug 26, 2023 · 22 comments
Labels
Milestone

Comments

@dozheiny
Copy link

The slices package has no functions for counting a specific value for now.
I think it's good the definition of function is like this:

func Counts[S []E, E comparable](s S, e E) int

An example of the Counts function is like this:

names := []string{"Alex", "Gopher", "Bob", "Alice", "Gopher"}
slices.Counts(names, "Gopher") // 2
slices.Counts(names, "Alex") // 1
slices.Counts(names, "James") // 0
@seankhliao
Copy link
Member

seankhliao commented Aug 26, 2023

how common is this operation that it needs to be in the standard library?
https://go.dev/doc/faq#x_in_std

this would just be reduce in #61898

@seankhliao seankhliao changed the title slices: function for count specific value on slice proposal: slices: function for count specific value on slice Aug 26, 2023
@gopherbot gopherbot added this to the Proposal milestone Aug 26, 2023
@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 26, 2023
@jimmyfrasche
Copy link
Member

It's possible to define this in term of slices.Index: https://go.dev/play/p/H9HZqXlyI2e

@dozheiny
Copy link
Author

how common is this operation that it needs to be in the standard library? https://go.dev/doc/faq#x_in_std

this would just be reduce in #61898

I think it's widespread.
In multiple base codes, I want to count a specific value in slices and write a function that counts for me.

@dozheiny
Copy link
Author

It's possible to define this in term of slices.Index: https://go.dev/play/p/H9HZqXlyI2e

Also, It's possible to implement the CountsFunc function. https://go.dev/play/p/NunxDFQh8pZ?v=gotip

@jimmyfrasche
Copy link
Member

it's simpler to just define it directly, too: https://go.dev/play/p/6kCIzk-yTgl

how does this come up in your code bases? I've certainly needed to know the count of things before but I usually use a multiset as a histogram because when I need the count of one thing I usually need the count of several others too.

@dozheiny
Copy link
Author

it's simpler to just define it directly, too: https://go.dev/play/p/6kCIzk-yTgl

how does this come up in your code bases? I've certainly needed to know the count of things before but I usually use a multiset as a histogram because when I need the count of one thing I usually need the count of several others too.

Typically, I require the CountsFunc in codebases. Within my codebases, I often deal with large slices containing numerous objects and fields that require counting for specific objects within that slice. This allows me to make comparisons among these objects.

@jimmyfrasche
Copy link
Member

Here's what I usually do: https://go.dev/play/p/PZoRXcl8qc3 that lets me build the counts up for the whole thing and then I query the map. This is better than doing repeated O(n) queries on the slice if you need to run multiple queries. If you just want to run one query the search is faster and it needs to get rebuilt if the slice changes. I'm usually just doing this for one off analysis or a one time ETL so I usually have all the slices "done" before I need to check the counts and I tend to need to check the counts of multiple items.

@dozheiny
Copy link
Author

dozheiny commented Aug 26, 2023

Here's what I usually do: https://go.dev/play/p/PZoRXcl8qc3 that lets me build the counts up for the whole thing and then I query the map. This is better than doing repeated O(n) queries on the slice if you need to run multiple queries. If you just want to run one query the search is faster and it needs to get rebuilt if the slice changes. I'm usually just doing this for one off analysis or a one time ETL so I usually have all the slices "done" before I need to check the counts and I tend to need to check the counts of multiple items.

This implementation is better than mine,
It's better than returns counts of one specific value.

But what about struct slices? Typically, you would require the CountsFunc function for struct slices. However, CountsFunc only returns counts of a specific value.

@seankhliao seankhliao removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 26, 2023
@jimmyfrasche
Copy link
Member

Have a func from your value to something comparable. When you build the map use

h[key(v)] += 1

and check the map using h[key(v)]

Or use a different data structure.

@AndrewHarrisSPU
Copy link

But what about struct slices? Typically, you would require the CountsFunc function for struct slices. However, CountsFunc only returns counts of a specific value.

As noted earlier, the Reduce in xiter is enough, maybe a bit dense:

h := make(map[string]int)
Reduce(h, func(h map[string]int, s string) map[string]int { h[s]++; return h }, seq)

Less generally:

// could easily use `vs Seq[V]` instead

func Counts[V comparable](vs []V) map[V]int {
	return CountsFunc(vs, func(v V) V { return v })
}

func CountsFunc[V any, K comparable](vs []V, key func(V) K) map[K]int {
	h := make(map[K]int)
	for _, v := range vs {
		h[key(v)]++
	}
	return h
}

Counting like this is a no-brainer in e.g. a data science library or language, but strategies for optimal counting diverge ... no opinion on whether or where this fits culturally in Go.

@ianlancetaylor
Copy link
Contributor

C++ has std::count and std::count_if that do this operation on iterators. Java has java.utils.Collections.frequency that does this operation on a collection. As far as I know C++ std::vector and Java Vector do not support this operation directly. I think for Go we should hold off on adding this to the slices package, and instead consider whether to add it to the iters package if we adopt #61898.

@earthboundkid
Copy link
Contributor

Python has collections.Counter which consumes an iterator to make a histogram map: https://docs.python.org/3/library/collections.html#collections.Counter

@dozheiny
Copy link
Author

@ianlancetaylor So, will this function be added to the slices package or x/exp/xiter?

@ianlancetaylor
Copy link
Contributor

@dozheiny This is a proposal for adding new API. No decision has been made. I'm saying that I think it would be preferable to add this to x/exp/iter rather than slices. If anybody wants to make an argument that slices, or perhaps both, is better, this is the place to do it.

@DeedleFake
Copy link

I think this definitely makes way more sense as a function for iterators, not slices. It's a function that inherently needs to be implemented via iteration over a slice anyways, so by implementing it for iterators it'll be exactly the same but not limited to just slices.

@earthboundkid
Copy link
Contributor

earthboundkid commented Aug 27, 2023

For what it’s worth, I feel like the simple case of just counting how many times X appears in a slice or iterator is better of just being done in a loop. I have used languages that have a lot of fancy functions to do this stuff for you, and the end result is you spend more time looking up the documentation for the iterator library than it would take to just write the loop. It creates a second way to do things that only saves a trivial amount of typing but takes up a certain amount of cognitive load.

Obviously these things are a spectrum. I’m happy to have slices.Clone even though it’s trivial because it’s shorter and more clear for a very common operation. As the operations become less common though, the value of having them in a library drops off pretty rapidly.

@AndrewHarrisSPU
Copy link

A quirky argument for Count in xiter, precisely using map[K]int: the iteration order of built-in maps is guaranteed to shuffle. I believe Count would be in many cases naive for performance, and using a map would often be part of the reason. But there is utility in testing something fancier against a simply and correctly written solution, and the shuffling is a great property here.

@PierreTurnbull
Copy link

I don't get why there's a strings.Count but no slices.Count.

@earthboundkid
Copy link
Contributor

earthboundkid commented Jun 27, 2024

@PierreTurnbull That's a fair question. My 2¢ is that strings.Count is doing something different than slices.Count would do.

From the strings.Count docs:

Count counts the number of non-overlapping instances of substr in s.

Counting non-overlapping instances is slightly tricky. Implementing count of cond(item) in items is not tricky. strings.Count also uses an assembly language version of count for one byte needles, which is also tricky.

I think it would be better to give up on this proposal instead propose that a container/histogram package be with a new type similar to Python's collection.Counter class because efficiently doing a count of everything in a sequence is a little tricky.

@ianlancetaylor
Copy link
Contributor

We can implement slices.Count the same way as bytes.Count. But that should be a separate proposal, not this one.

@PierreTurnbull
Copy link

PierreTurnbull commented Jun 27, 2024

@earthboundkid I think trivial features like counting elements should be implemented in the simplest way. The developer that uses/learns the language should not be required to know the technical details of how it works under the hood in order to understand the language. It should be possible to count elements in any kind of iterable data structure in the simplest way, following a standard convention (eg : strings.Count, slices.Count, map.Count, anyiteratoryoucanimagine.Count). Things should be simple, especially when one the main motive of the language is simplicity.

As a beginner in Go, I might be missing some intellectual keys to understanding the way things are done so please excuse me if I'm missing an obvious point. This thing and few others leave me quite confused and skeptic concerning the quality of the language and its standard libraries, especially in terms of DX. This first impression is in complete opposition with the positive opinions I heard about this language, which doubles my confusion.

@ianlancetaylor
Copy link
Contributor

@PierreTurnbull As you point out, there is an existing strings.Count function. However, it does not do what this proposal suggests. So while I agree that we should have slices.Count that acts like strings.Count and bytes.Count, this proposal is asking for something different: the ability to count the number of times a specific element occurs in a slice. We should not have a slices.Count that is different from the existing strings.Count and bytes.Count.

Further, the desire to count the number of times an element occurs in a slice would, I think, be better satisfied by adding a function to the iter package or, more likely to start, the x/exp/iter package. That would correct to the C++ std::count function. It would support, as you suggest, counting not just elements in a slice but also in a map, another container, and so on.

Iterators, and for that matter the slices package, are new to Go, and we are working through the details of how to best handle them. We move carefully because we are very strict about backward compatibility, so once we have implemented something we can never change it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

9 participants