# Collections module in Python

## Counter

Counter is a dictionary subclass for counting hashable objects. The elements are stored as dictionary keys and their counts are stored as the corresponding values.

So let's import it first.

In [1]:
from collections import Counter

Now we will create a list of numbers and count how often each value occurs. We will then do the same with the characters of a string.

In [2]:
list1 = [11,11,22,33,44,55,44,44,33,33,77,66]

In [3]:
Counter(list1)

Counter({11: 2, 22: 1, 33: 3, 44: 3, 55: 1, 77: 1, 66: 1})

In [4]:
str1 = "aaabbbeeedddihhh"

In [6]:
Counter(str1)

Counter({'a': 3, 'b': 3, 'e': 3, 'd': 3, 'i': 1, 'h': 3})

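Now we will use Counter to count the words in a sentence. Note that split() only splits on whitespace, so 'fun,' (with its trailing comma) is counted separately from 'fun'.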

In [11]:
str2 = 'Python is fun fun, Python is cool, it is really fun'

In [12]:
words = str2.split()
Counter(words)

Counter({'Python': 2,
         'is': 3,
         'fun': 2,
         'fun,': 1,
         'cool,': 1,
         'it': 1,
         'really': 1})
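
If the punctuation should not affect the counts, the words can be normalized before counting. The following is only a minimal sketch of one way to do that: the names `sentence`, `cleaned_words` and `word_counts` are illustrative, and lowercasing the words is an added assumption, not something the example above does.

```python
import string
from collections import Counter

sentence = 'Python is fun fun, Python is cool, it is really fun'

# Strip leading/trailing punctuation from each token and lowercase it,
# so that 'fun' and 'fun,' are counted as the same word.
cleaned_words = [word.strip(string.punctuation).lower() for word in sentence.split()]
word_counts = Counter(cleaned_words)

print(word_counts['fun'])     # 3
print(word_counts['python'])  # 2
```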

Now let us look at some methods of Counter and some common patterns for working with it.

In [13]:
c = Counter(words)

To get the most common words, call most_common(). The argument is the number of entries to return, ordered from the highest count to the lowest.

In [14]:
c.most_common(3)

[('is', 3), ('Python', 2), ('fun', 2)]

To get the total of all counts, sum the values:

In [15]:
sum(c.values())

11
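
As a small illustration of what this total means, the sketch below reuses the string from the earlier example and compares the summed counts with the output of elements(), which yields each element as many times as its count. The variable names are illustrative only.

```python
from collections import Counter

letters = Counter("aaabbbeeedddihhh")

# Sum of all counts: the number of characters that were counted.
total = sum(letters.values())

# elements() repeats each key according to its count; sorting makes
# the result easy to read.
expanded = "".join(sorted(letters.elements()))

print(total)     # 16
print(expanded)  # aaabbbdddeeehhhi
```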

To list unique elements.

In [16]:
list(c)

['Python', 'is', 'fun', 'fun,', 'cool,', 'it', 'really']

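To convert to a set, a dictionary, or (element, count) pairs. Note that items() returns a dict_items view rather than a list; wrap it in list() if an actual list is needed.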

In [17]:
set(c)

{'Python', 'cool,', 'fun', 'fun,', 'is', 'it', 'really'}

In [18]:
dict(c)

{'Python': 2, 'is': 3, 'fun': 2, 'fun,': 1, 'cool,': 1, 'it': 1, 'really': 1}

In [20]:
list_of_pairs = c.items()

In [21]:
list_of_pairs

dict_items([('Python', 2), ('is', 3), ('fun', 2), ('fun,', 1), ('cool,', 1), ('it', 1), ('really', 1)])

To build a Counter back from (element, count) pairs:

In [22]:
Counter(dict(list_of_pairs))

Counter({'Python': 2,
         'is': 3,
         'fun': 2,
         'fun,': 1,
         'cool,': 1,
         'it': 1,
         'really': 1})

To get the n least common elements, slice the result of most_common() from the end. Here n = 4:

In [24]:
c.most_common()[:-4-1:-1]

[('really', 1), ('it', 1), ('cool,', 1), ('fun,', 1)]
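
The slice can be hard to read at a glance, so it may help to wrap it in a small function. The helper `least_common` below is hypothetical (it is not part of the collections module) and is only a sketch of the same idiom.

```python
from collections import Counter

def least_common(counter, n):
    """Return the n least common (element, count) pairs, rarest first."""
    # most_common() sorts by count, highest first. The slice [:-n-1:-1]
    # walks backwards from the last entry and stops after n items.
    return counter.most_common()[:-n - 1:-1]

words = 'Python is fun fun, Python is cool, it is really fun'.split()
rarest = least_common(Counter(words), 4)
print(rarest)  # [('really', 1), ('it', 1), ('cool,', 1), ('fun,', 1)]
```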

To remove entries with zero or negative counts, add an empty Counter in place:

In [25]:
c += Counter()
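
The counter `c` above has no zero or negative counts, so the effect is not visible there. The sketch below uses made-up data (the `stock` counter and its keys are illustrative) to show the idiom on a counter that does contain such entries.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)

# subtract() keeps zero and negative results instead of dropping them.
stock.subtract(Counter(apples=1, pears=2))
stock['plums'] = 0
before = dict(stock)
print(before)  # {'apples': 2, 'pears': -1, 'plums': 0}

# In-place addition of an empty Counter keeps only positive counts.
stock += Counter()
print(stock)   # Counter({'apples': 2})
```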

To reset the counter by removing all elements and their counts:

In [27]:
c.clear()

In [28]:
c

Counter()

## defaultdict

defaultdict is a dict subclass that supports all the usual dictionary methods and takes a callable, default_factory, as its first argument. When a missing key is looked up with square brackets, default_factory is called with no arguments, and its return value is stored under that key and returned, so the lookup does not raise a KeyError. This is usually simpler and faster than the equivalent code using dict.setdefault. Note that a KeyError is still raised if default_factory is None, and that methods such as get() do not call the factory.

In [29]:
from collections import defaultdict

Let's first create an ordinary dictionary and ask for a key that is not in it. You will see that this raises a KeyError.

In [36]:
dict1 = {'key1':11}

In [37]:
dict1

{'key1': 11}

In [38]:
dict1['key1']

11

In [39]:
dict1['key2']

KeyError: 'key2'

Now let's create a defaultdict and do the same.

In [40]:
dict2 = defaultdict(object)

In [41]:
dict2['key1']

<object at 0x18ec393dff0>

As you can see, the dictionary had no keys, but looking one up did not raise an error. Instead, it returned a new object instance (shown with its memory address), because object is the factory we passed in. A lambda is often used as the factory to specify the default value that a missing key should receive.

In [42]:
dict3 = defaultdict(lambda:0)

In [43]:
dict3['key1']

0

In [44]:
dict3['key2'] = 1

In [46]:
dict3

defaultdict(<function __main__.<lambda>()>, {'key1': 0, 'key2': 1})
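
A common use of defaultdict is grouping items, with list as the factory. The sketch below uses an illustrative word list and variable names; it also shows that only square-bracket lookups call the factory, while get() and the in operator leave the dictionary unchanged.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Each missing key starts out as an empty list, so we can append directly.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# get() and `in` do not call the factory and do not add the key.
print(by_letter.get('z'))  # None
print('z' in by_letter)    # False
```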

## OrderedDict

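An OrderedDict is a dictionary subclass that remembers the order in which its contents are added. Since Python 3.7, ordinary dictionaries also preserve insertion order, but OrderedDict still differs in some ways, for example in how two instances are compared for equality.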

In [74]:
from collections import OrderedDict

In [76]:
dict4 = OrderedDict()

dict4['apple'] = 1
dict4['mango'] = 5
dict4['grapes'] = 3
dict4['guava'] = 2
dict4['pear'] = 5

In [77]:
dict4

OrderedDict([('apple', 1),
             ('mango', 5),
             ('grapes', 3),
             ('guava', 2),
             ('pear', 5)])

In [78]:
for key, value in dict4.items():
    print(key, value)

apple 1
mango 5
grapes 3
guava 2
pear 5


As you can see, the pairs are kept in the order in which they were added. Now consider two dictionaries.

In [80]:
d1 = OrderedDict()

d1['key1'] = 1
d1['key2'] = 2

In [81]:
d2 = OrderedDict()

d2['key2'] = 2
d2['key1'] = 1

In [83]:
print(d1 == d2)

False


Even though both dictionaries contain the same key-value pairs, the comparison returns False because the pairs were added in a different order. With ordinary dictionaries, the same comparison returns True.

In [84]:
d1 = {}

d1['key1'] = 1
d1['key2'] = 2

In [85]:
d2 = {}

d2['key2'] = 2
d2['key1'] = 1

In [86]:
print(d1 == d2)

True
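
Besides order-sensitive equality, OrderedDict offers methods for rearranging entries. The sketch below shows move_to_end() and popitem(last=False) on an illustrative `basket` dictionary. It also shows that comparing an OrderedDict with an ordinary dict ignores order.

```python
from collections import OrderedDict

basket = OrderedDict([('apple', 1), ('mango', 5), ('grapes', 3)])

# Move an existing key to the end of the ordering.
basket.move_to_end('apple')
order_after_end = list(basket)
print(order_after_end)    # ['mango', 'grapes', 'apple']

# With last=False the key is moved to the front instead.
basket.move_to_end('grapes', last=False)
order_after_front = list(basket)
print(order_after_front)  # ['grapes', 'mango', 'apple']

# popitem(last=False) removes and returns the first pair.
first = basket.popitem(last=False)
print(first)              # ('grapes', 3)

# Comparing an OrderedDict with a plain dict is not order-sensitive.
mixed_equal = OrderedDict([('a', 1), ('b', 2)]) == {'b': 2, 'a': 1}
print(mixed_equal)        # True
```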


## namedtuple

The standard tuple uses numerical indexes to grab any element, as shown below.

In [87]:
t1 = (11,22,33)

In [88]:
t1[1]

22

However, remembering which index holds which value becomes hard when a tuple has many fields. A namedtuple gives each member a name in addition to its numerical index.

Each kind of namedtuple is represented by its own class, created with the namedtuple() factory function.

A namedtuple can be thought of as a very quick way of creating a new class with a fixed set of attribute fields.

In [89]:
from collections import namedtuple

The arguments are the name of the new class and a string containing the field names, separated by spaces.

In [91]:
Fruit = namedtuple('Fruit', 'color seed')

In [92]:
apple = Fruit(color='red', seed='little')

In [93]:
apple

Fruit(color='red', seed='little')

The class is now created and can be used like any other. Members can be accessed by attribute name as well as by index.

In [94]:
apple.color

'red'

In [96]:
apple[1]

'little'
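
A namedtuple also comes with a few helper attributes and methods. The sketch below recreates the Fruit class from above and shows _fields, tuple unpacking, _asdict() and _replace(). The name `green_apple` is illustrative. Like an ordinary tuple, a namedtuple is immutable, so _replace() returns a new instance.

```python
from collections import namedtuple

Fruit = namedtuple('Fruit', 'color seed')
apple = Fruit(color='red', seed='little')

# The field names, in order.
print(Fruit._fields)          # ('color', 'seed')

# Unpacking works exactly as with an ordinary tuple.
color, seed = apple
print(color, seed)            # red little

# Convert to a mapping of field name to value.
print(dict(apple._asdict()))  # {'color': 'red', 'seed': 'little'}

# _replace() returns a new instance; the original is unchanged.
green_apple = apple._replace(color='green')
print(green_apple)            # Fruit(color='green', seed='little')
print(apple)                  # Fruit(color='red', seed='little')
```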