I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:
Performance was horrible. After some digging, I found out that Counter does not have __iadd__, so += falls back to __add__, which copies the entire left-hand side. I've attached a patch that fixes the issue (for += only; I have not patched the test suite).
If this is not implemented because it would be backward incompatible, then it might be useful to add a note to update's docstring explaining that it is much more efficient than +=. I was very surprised that it took *minutes* to add a few thousand moderate-sized Counters.
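A minimal sketch of the two accumulation patterns being compared, assuming a hypothetical `docs` list and whitespace tokenization (neither is from the scikit-learn code). Going through `__add__` builds a new Counter on every iteration, copying the running total each time, while `update()` mutates the existing Counter in place.

```python
from collections import Counter

# Hypothetical stand-in documents; the real term-counting code is not shown here.
docs = ["a b a", "b c", "a c c"]

# Pattern 1: accumulate through __add__. Each iteration allocates a new
# Counter and copies the running total into it, so the total work grows
# roughly with (number of documents) x (size of the running total).
total_add = Counter()
first_object = total_add
for doc in docs:
    total_add = total_add + Counter(doc.split())

# Pattern 2: accumulate with update(). The same Counter object is mutated
# in place, so each step only touches the terms of the current document.
total_update = Counter()
same_object = total_update
for doc in docs:
    total_update.update(doc.split())

# Both patterns produce the same counts here (all counts are positive).
assert total_add == total_update
# __add__ returned a fresh object; update() kept the original one.
assert total_add is not first_object
assert total_update is same_object
```

With only positive counts the two patterns give identical results, so switching to `update()` changes the cost but not the answer.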
I'll add the in-place methods including __iadd__, __isub__, __iand__, and __ior__.
If speed is your issue, you should continue to use the update() method, which will always be faster because it doesn't have a step to strip zeros and negative values from the existing Counter. Also, update() has a fast-path written in C.
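The stripping step also means the two operations are not interchangeable when counts can be zero or negative. A small sketch with made-up keys and counts: += drops every entry whose count is not positive, while update() keeps them.

```python
from collections import Counter

# Made-up counts, chosen so that one entry is negative and one sums to zero.
plus = Counter(a=1, b=-2)
plus += Counter(a=-1)        # += removes entries with zero or negative counts

upd = Counter(a=1, b=-2)
upd.update(Counter(a=-1))    # update() just adds counts and keeps every key

# Compare as plain dicts so that zero and negative entries stay visible.
plus_items = dict(plus)
upd_items = dict(upd)
```

Here `plus_items` ends up empty, while `upd_items` still holds `a` with count 0 and `b` with count -2.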