## Collections Module

The collections module is a built-in module that implements specialized container data types, providing alternatives to Python's general-purpose built-in containers.

## Counter

Counter is a dict subclass that helps count hashable objects. Elements are stored as dictionary keys, and their counts are stored as the corresponding values.

In [2]:
from collections import Counter
# Counter is technically a dictionary subclass

In [3]:
# Counter() with a list
mylist = [1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,5,5,5,5,5]
Counter(mylist)

Counter counts how many times each value occurs and builds a dictionary of the form {value: number of occurrences}. For the list above, the result is Counter({1: 7, 2: 6, 3: 5, 5: 5}).
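
Because Counter is a dict subclass, the usual dictionary operations apply, with one notable difference: looking up a key that was never counted returns 0 instead of raising a KeyError. A minimal sketch of that behaviour (the variable names here are illustrative, not part of the original notebook):

```python
from collections import Counter

counts = Counter([1, 1, 1, 2, 2, 3])

# Ordinary dict-style lookup works for keys that were counted.
ones = counts[1]                  # 3

# A missing key returns 0 and is NOT added to the Counter.
missing = counts[99]              # 0
still_absent = 99 not in counts   # True

# Counts can be updated in place from another iterable.
counts.update([3, 3])
threes = counts[3]                # 3 (one from the original list, two added)
```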


In [26]:
mylist = ['a','a','a',1,1,1,1,1,2,2,2]
Counter(mylist)

Counter({'a': 3, 1: 5, 2: 3})

In [27]:
# Counter with words in a sentence

sentence = "How many times does each word show up in this sentence with a word"

a = sentence.split()

Counter(a)

Counter({'How': 1,
         'many': 1,
         'times': 1,
         'does': 1,
         'each': 1,
         'word': 2,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1,
         'with': 1,
         'a': 1})

In [14]:
Counter(sentence.lower().split())

Counter({'how': 1,
         'many': 1,
         'times': 1,
         'does': 1,
         'each': 1,
         'word': 2,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1,
         'with': 1,
         'a': 1})

In [15]:
# Counter with strings
letters = 'aaaaaabbbbccccccccccddddddeeeeeeee'

In [18]:
c = Counter(letters)
c

Counter({'a': 6, 'b': 4, 'c': 10, 'd': 6, 'e': 8})

In [20]:
# gives the n most common elements, ordered from highest count to lowest

c.most_common(2)

[('c', 10), ('e', 8)]
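
As a small sketch of how most_common orders its results: called with no argument it returns every element from the highest count down, and slicing that list from the end is one way to get the least common elements. The variable names are illustrative only.

```python
from collections import Counter

c = Counter('aaaaaabbbbccccccccccddddddeeeeeeee')

# The two elements with the highest counts, largest first.
top_two = c.most_common(2)            # [('c', 10), ('e', 8)]

# With no argument, every element is returned, highest count first.
everything = c.most_common()

# Slicing from the end gives the least common elements, smallest first.
least_two = c.most_common()[:-3:-1]   # [('b', 4), ('d', 6)]
```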

In [24]:
#list unique elements

list(c)

['a', 'b', 'c', 'd', 'e']

In [25]:
list(c.values())

[6, 4, 10, 6, 8]

In [28]:
#total of all counts

sum(c.values())

34

In [36]:
# total of all counts

sum(Counter(letters).values())

34

In [37]:
# view of (elem, count) pairs; wrap in list() to get an actual list

c.items()

dict_items([('a', 6), ('b', 4), ('c', 10), ('d', 6), ('e', 8)])

In [39]:
#Convert to regular dictionary 

dict(c)

{'a': 6, 'b': 4, 'c': 10, 'd': 6, 'e': 8}

## defaultdict

defaultdict is a dict subclass that provides all the methods of a regular dictionary and takes a callable (default_factory) as its first argument; that callable is invoked with no arguments to supply the value for a missing key. Using defaultdict is faster than doing the same with the dict.setdefault method.

### With a default_factory set, a defaultdict does not raise a KeyError on d[key] lookups. Any key that does not exist gets the value returned by the default factory.

In [40]:
from collections import defaultdict

In [41]:
d = {}

In [42]:
d['one']

KeyError: 'one'

In [43]:
d = defaultdict(object)

In [44]:
d['one']

<object at 0x22c153eb2c0>

In [45]:
d

defaultdict(object, {'one': <object at 0x22c153eb2c0>})

In [50]:
for i in d:
    print(i)

one
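
The cells above show that indexing a missing key stores it in the defaultdict. The following sketch, using int as an assumed factory, illustrates which operations trigger the factory and which do not, and what happens when no factory is given at all:

```python
from collections import defaultdict

d = defaultdict(int)

has_key = 'one' in d        # False: membership tests do not create keys
got = d.get('one')          # None: .get() does not call the factory
size_before = len(d)        # 0

value = d['one']            # 0: indexing calls int() and stores the result
size_after = len(d)         # 1

# Without a factory, a defaultdict behaves like a plain dict on missing keys.
no_factory = defaultdict()
try:
    no_factory['one']
    raised = False
except KeyError:
    raised = True
```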


In [51]:
# Can also set the default value with any zero-argument callable, such as a lambda
d = defaultdict(lambda: 0)

In [52]:
d['one']

0
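
A common use of defaultdict is grouping or tallying items without first checking whether a key exists. The sketch below compares it with the dict.setdefault approach on some made-up sample data (the pairs list is purely illustrative):

```python
from collections import defaultdict

pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

# With a plain dict, setdefault supplies the empty list the first time a key is seen.
plain = {}
for kind, name in pairs:
    plain.setdefault(kind, []).append(name)

# With defaultdict(list), the empty list is created automatically on first access.
grouped = defaultdict(list)
for kind, name in pairs:
    grouped[kind].append(name)

# int() returns 0, so defaultdict(int) behaves like defaultdict(lambda: 0).
tally = defaultdict(int)
for kind, _ in pairs:
    tally[kind] += 1
```

Both approaches produce the same groups; the defaultdict version simply moves the "create if missing" step into the container.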

## namedtuple

The standard tuple uses numerical indexes to access its members, for example:

In [53]:
t = (12,13,14)

In [54]:
t[1]

13

For simple use cases, this is usually enough. On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. A namedtuple assigns a name, as well as a numerical index, to each member.

Each kind of namedtuple is represented by its own class, created by using the namedtuple() factory function. The arguments are the name of the new class and the field names, given either as a sequence of strings or as a single string with the names separated by spaces or commas.

You can basically think of namedtuples as a very quick way of creating a new object/class type with some attribute fields. For example:

In [55]:
from collections import namedtuple

In [56]:
Dog = namedtuple('Dog', ['age','breed','name'])

sam = Dog(age=2,name='Sammy',breed='Lab')

frank = Dog(age=3,breed='Shepard',name='Frankie')

We construct the namedtuple by first passing the object type name (Dog) and then passing the field names, here as a list of strings (a single string with spaces between the field names also works). We can then call on the various attributes:

In [57]:
sam

Dog(age=2, breed='Lab', name='Sammy')

In [58]:
sam.age

2

In [59]:
sam[0]

2
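
As a further sketch, the same Dog type can be declared with a space-separated string of field names. The example also shows tuple unpacking and the _replace and _asdict helper methods; the older_sam and as_dict names are illustrative only.

```python
from collections import namedtuple

# Field names given as one space-separated string instead of a list.
Dog = namedtuple('Dog', 'age breed name')
sam = Dog(age=2, breed='Lab', name='Sammy')

# A namedtuple unpacks like a regular tuple.
age, breed, name = sam

# Instances are immutable; _replace returns a new instance with the change applied.
older_sam = sam._replace(age=3)

# _asdict converts the instance to a dictionary keyed by field name.
as_dict = sam._asdict()

# Assigning to a field raises AttributeError.
try:
    sam.age = 5
    mutable = True
except AttributeError:
    mutable = False
```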