Add collections.counts() #44815
Comments
As suggested in http://mail.python.org/pipermail/python-list/2007-April/433986.html, this is a patch to add a counts() function to the collections module. Usage looks like:
>>> items = 'acabbacba'
>>> item_counts = counts(items)
>>> for item in 'abcd':
...     print item, item_counts[item]
...
a 4
b 3
c 2
d 0
Yes, it's only a 4-line function, but it's a frequently re-written 4-line function. |
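For readers without the patch at hand, here is a minimal sketch of what such a 4-line function might look like. It is not the code from the attached patch; it assumes a defaultdict(int) implementation, as the following comments suggest, and it is written for Python 3, so print is a function.

```python
from collections import defaultdict


def counts(iterable):
    """Map each item in iterable to the number of times it occurs."""
    result = defaultdict(int)  # assumption: unseen keys default to 0
    for item in iterable:
        result[item] += 1
    return result


item_counts = counts('acabbacba')
for item in 'abcd':
    # Looking up 'd' returns 0, but on a defaultdict it also inserts the key.
    print(item, item_counts[item])
```

Note that reading a missing key from a defaultdict stores that key as a side effect, which is one practical difference from a plain dict used with d.get().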
Does it have to be a defaultdict? I.e. is it important that item_counts['d'] not raise KeyError? |
I think it's okay if it's not a defaultdict. That was the simplest implementation, but I certainly have no problem calling d.get() when necessary. Should I change the implementation to use a dict()? |
I guess it's simplicity of implementation vs. simplicity of use. And I'm not even sure which is easier to use. It's just that defaultdicts are a very new thing and still feel "weird" -- even though I pushed for the implementation based on popular demand I'm not a user myself. Perhaps ask around on python-dev? |
A summary of the python-dev thread (http://mail.python.org/pipermail/python-dev/2007-April/072502.html) Since the number of times an unseen item was seen is 0, most people felt returning 0 was more natural behavior than raising KeyError. There was some discussion of alternate names, but most people were fine with counts(). Raymond suggested making it a classmethod of dict, but people were a little concerned about adding to dict's already complex API, and since the result of counts() needed to return 0s instead of raising KeyErrors, it wouldn't really have the same behavior as a plain dict() anyway. |
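The difference discussed above can be sketched as follows. This uses a plain dict built with get() on one side and, as a stand-in for the proposed counts() result, the collections.Counter class that appears later in this thread. The variable names are illustrative only.

```python
from collections import Counter

# Plain dict: counting needs get(), and a missing key raises KeyError.
plain = {}
for ch in 'acabbacba':
    plain[ch] = plain.get(ch, 0) + 1

try:
    plain['d']
    plain_lookup = 'no error'
except KeyError:
    plain_lookup = 'KeyError'

# Counter: a missing key reads as 0 and is not inserted by the lookup.
tally = Counter('acabbacba')
missing_count = tally['d']
stored_after_lookup = 'd' in tally

print(plain_lookup)         # KeyError
print(plain.get('d', 0))    # 0
print(missing_count)        # 0
print(stored_after_lookup)  # False
```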
Attaching an updated patch for Py2.7.
Working on docs and unittests. Nice example (most common words in a text file):
>>> import re
>>> words = re.findall('\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)] |
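A self-contained variant of the same idea, for trying it without hamlet.txt. The sample sentence is made up for illustration, and the pattern is written as a raw string so that it also runs cleanly on current Python 3.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of a file.
text = "The cat and the dog and the bird"

words = re.findall(r'\w+', text.lower())
top = Counter(words).most_common(2)
print(top)  # [('the', 3), ('and', 2)]
```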
Added a few more in-module tests. |
Some comments:
|
The counts/counter moniker emerged from the python-dev discussion and
To me, MultiSet or CountingSet is too offputtingly computer-sciency and
As noted previously, standalone unittests are forthcoming (and a doc
Thanks for looking at the initial patch. |
Georg, could you give this a once over before I commit? Thanks. |
Yes, I'll have a look this evening. |
Attaching an update with improved docs. Thanks for looking at this. |
Isn't collections.defaultdict(lambda:0) enough for this purpose? |
The whole point was to have a function (or class) that accumulates a
>>> d = collections.defaultdict(lambda: 0)
>>> d.update('aaabbac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #0 has length 1; 2 is required
The feature request here was mainly a request to provide an abbreviation |
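To make the contrast concrete, here is a small sketch: defaultdict.update() expects key/value pairs, so counting with a defaultdict needs an explicit loop, while Counter.update() (as in the Counter shipped with the collections module) adds counts from an iterable of items.

```python
from collections import Counter, defaultdict

d = defaultdict(lambda: 0)
try:
    d.update('aaabbac')  # dict.update wants (key, value) pairs
    update_failed = False
except ValueError:
    update_failed = True

# The loop that has to be written by hand with a defaultdict.
for ch in 'aaabbac':
    d[ch] += 1

# Counter.update accumulates counts directly from the iterable.
c = Counter()
c.update('aaabbac')

print(update_failed)       # True
print(dict(d) == dict(c))  # True
print(c['a'], c['b'], c['c'])  # 4 2 1
```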
In counter6.diff line 56 "Assigning a count of zero or reducing the count to the zero leaves the" suggest s/the zero/zero/ |
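For context on the sentence being corrected, the sketch below shows how zero counts behave in the Counter class as it eventually shipped in collections. It assumes that shipped behaviour and is not taken from counter6.diff itself.

```python
from collections import Counter

c = Counter('aab')

c['a'] = 0                  # assigning zero keeps the key
kept_after_zero = 'a' in c

c['b'] -= 1                 # reducing the count to zero also keeps the key
kept_after_decrement = 'b' in c

del c['a']                  # del is what actually removes an entry
kept_after_del = 'a' in c

print(kept_after_zero)       # True
print(kept_after_decrement)  # True
print(kept_after_del)        # False
```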
Attaching new patch with small changes:
Questions:
|
Thanks for the review comments. Incorporated all suggested changes and Decided to leave __repr__() with a sort. Though it's not strictly |