___

<a href='https://www.udemy.com/user/joseportilla/'><img src='../Pierian_Data_Logo.png'/></a>
___
<center><em>Content Copyright by Pierian Data</em></center>

# Collections Module

The collections module is a built-in module that implements specialized container data types, providing alternatives to Python's general-purpose built-in containers. We've already gone over the basic containers: dict, list, set, and tuple.

Now we'll learn about the alternatives that the collections module provides.

## Counter

*Counter* is a *dict* subclass for counting hashable objects. Elements are stored as dictionary keys, and their counts are stored as the corresponding values.
Without the Counter class, we would have to build a dictionary of counts ourselves: loop through the entire iterable and increment the count for each element as we encounter it. The Counter class does that for us automatically.
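
To make the comparison concrete, here is a minimal sketch of the manual approach next to Counter. The sample list and the names `letters`, `manual_counts`, and `auto_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

letters = ['a', 'b', 'a', 'c', 'b', 'a']  # illustrative sample data

# Manual approach: loop over the iterable and maintain the counts ourselves
manual_counts = {}
for letter in letters:
    manual_counts[letter] = manual_counts.get(letter, 0) + 1

# Counter does the same bookkeeping in a single call
auto_counts = Counter(letters)

print(manual_counts)                       # {'a': 3, 'b': 2, 'c': 1}
print(dict(auto_counts) == manual_counts)  # True
```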

Let's see how it can be used:

In [1]:
from collections import Counter  # Note that the module name is 'collections' with a lowercase 'c'

**Counter() with lists**

In [5]:
lst = [1,2,2,2,2,3,3,3,1,2,1,12,3,2,32,1,21,1,223,1]
print(type(Counter(lst)))  # Note that the type is collections.Counter
Counter(lst)  # We just pass the iterable to the Counter constructor.
# It gives us a Counter object with the count of each unique element in the iterable.
# Note that the output looks very similar to a dictionary. In fact, Counter is a subclass of dict.

<class 'collections.Counter'>


Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

**Counter with strings**

In [3]:
Counter('aabsbsbsbhshhbbsbs')

Counter({'a': 2, 'b': 7, 's': 6, 'h': 3})

**Counter with words in a sentence**

In [6]:
s = 'How many times does each word show up in this sentence word times each each word'

words = s.split()

Counter(words)

Counter({'How': 1,
         'many': 1,
         'times': 2,
         'does': 1,
         'each': 3,
         'word': 3,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1})

In [11]:
# Methods with Counter()
c = Counter(words)
#help(c)
c.most_common(3)  # A Counter object has its own methods that we can invoke.
# most_common takes the number of most common elements to return.
# Since Counter is a subclass of dict, it also has the attributes and methods of a dictionary.

[('each', 3), ('word', 3), ('times', 2)]

## Common patterns when using the Counter() object

    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts
    list(c)                         # list unique elements
    set(c)                          # convert to a set
    dict(c)                         # convert to a regular dictionary
    c.items()                       # view of (elem, cnt) pairs; wrap in list() to get a list
    Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
    c.most_common()[:-n-1:-1]       # n least common elements
    c += Counter()                  # remove zero and negative counts
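
A few of the patterns above, run on a small Counter, might look like the following sketch. The counts are made up for illustration and deliberately include a zero and a negative entry so that the last pattern has something to remove.

```python
from collections import Counter

c = Counter({'a': 4, 'b': 2, 'c': 1, 'd': 0, 'e': -3})  # illustrative counts

total = sum(c.values())                    # 4 + 2 + 1 + 0 - 3
print(total)                               # 4

n = 2
least_common = c.most_common()[:-n-1:-1]   # the n least common elements
print(least_common)                        # [('e', -3), ('d', 0)]

pairs = list(c.items())                    # list of (elem, cnt) pairs
print(Counter(dict(pairs)) == c)           # True: round trip from pairs

c += Counter()                             # remove zero and negative counts
print(sorted(c))                           # ['a', 'b', 'c']
```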

## defaultdict

defaultdict is a dictionary-like object that provides all the methods of a regular dictionary but takes a first argument (default_factory): a callable that produces the default value for missing keys. Using defaultdict is simpler, and generally faster, than doing the same thing with the dict.setdefault method.

With defaultdict, if we try to fetch the value for a key that does not exist, it calls the default factory, stores the result under that key, and returns it.

**A defaultdict that has a default factory will not raise a KeyError when you index a missing key. Instead, the key gets the value returned by the default factory.**

In [13]:
from collections import defaultdict

In [14]:
d = {}

In [15]:
d['one'] 

KeyError: 'one'

In [18]:
d = defaultdict(object)  # Here we specify 'object' as the default factory, so a missing key gets a new object() instance

In [10]:
d['one']  # Even though the key 'one' did not exist in our dictionary, the lookup didn't fail. Instead it returned a default value

<object at 0x216de27bcf0>

In [11]:
for item in d:  # Also note that it not only prevents a KeyError, it also adds the key to the dictionary with the default value
    print(item)

one
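
The key insertion shown above only happens on indexing. The following sketch illustrates which operations invoke the default factory and which do not; the key names are illustrative, and `int` is used as the factory because `int()` returns 0.

```python
from collections import defaultdict

d = defaultdict(int)          # int() returns 0, so missing keys default to 0

print('x' in d)               # False: membership tests do not create keys
print(d.get('x'))             # None: .get() bypasses the default factory
print(len(d))                 # 0: nothing has been inserted so far

print(d['x'])                 # 0: indexing calls the factory and stores the result
print(len(d))                 # 1: 'x' is now a real key

# With no default factory, a defaultdict behaves like a plain dict
plain = defaultdict()
try:
    plain['missing']
except KeyError as err:
    print('KeyError:', err)   # KeyError: 'missing'
```

If you only want to read a value without adding the key, using `.get()` or an `in` check avoids growing the dictionary by accident.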


We can also supply a factory that returns a specific default value:

In [19]:
d = defaultdict(lambda: 0)  # Instead of 'object', we can provide any zero-argument callable, such as a lambda, as the default factory

In [20]:
d['one']

0
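
A common use of the default factory is grouping items by key. The sketch below, using made-up data, compares defaultdict with the equivalent plain-dict code based on setdefault; the names `pairs`, `grouped`, and `grouped_plain` are illustrative.

```python
from collections import defaultdict

pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]  # illustrative data

# defaultdict(list): each missing key starts out as an empty list
grouped = defaultdict(list)
for category, item in pairs:
    grouped[category].append(item)

# The same grouping with a plain dict and setdefault
grouped_plain = {}
for category, item in pairs:
    grouped_plain.setdefault(category, []).append(item)

print(dict(grouped))                    # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
print(dict(grouped) == grouped_plain)   # True
```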

## namedtuple
The standard tuple uses numerical indexes to access its members, for example:

In [14]:
t = (12,13,14)

In [15]:
t[0]

12

For simple use cases, this is usually enough. On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. **A namedtuple assigns names, as well as the numerical index, to each member.**

**Each kind of namedtuple is represented by its own class, created by using the namedtuple() factory function.** The arguments are the name of the new class and the names of the fields, given either as a sequence of strings or as a single string with the names separated by spaces.

You can basically think of namedtuples as a very quick way of creating a new object/class type with some attribute fields.
For example:

In [22]:
from collections import namedtuple

In [23]:
Dog = namedtuple('Dog',['age','breed','name'])  # The type name is 'Dog', and this namedtuple has the fields age, breed, and name

sam = Dog(age=2,breed='Lab',name='Sammy') #The type 'Dog' acts like its own class.

frank = Dog(age=2,breed='Shepard',name="Frankie")

We construct the namedtuple type by first passing the type name (Dog) and then the field names, here as a list of strings (a single string with the field names separated by spaces also works). We can then create instances and access their attributes:

In [24]:
sam

Dog(age=2, breed='Lab', name='Sammy')

In [25]:
sam.age

2

In [26]:
sam.breed

'Lab'

In [27]:
sam[0]  # Note that we can still access fields by index position.

2
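
Beyond attribute and index access, a namedtuple still behaves like a regular tuple and comes with a few helpers of its own. The sketch below recreates the Dog type with a space-separated field string and tries out `_fields`, `_replace`, and `_asdict`; the `older_sam` name is illustrative.

```python
from collections import namedtuple

# Field names given as one space-separated string instead of a list
Dog = namedtuple('Dog', 'age breed name')
sam = Dog(age=2, breed='Lab', name='Sammy')

print(Dog._fields)                # ('age', 'breed', 'name')

age, breed, name = sam            # unpacks like a regular tuple
print(name)                       # Sammy

older_sam = sam._replace(age=3)   # returns a new instance; sam is unchanged
print(older_sam)                  # Dog(age=3, breed='Lab', name='Sammy')
print(sam._asdict())              # field names mapped to their values

try:
    sam.age = 5                   # namedtuples are immutable, like tuples
except AttributeError:
    print('cannot assign to a field')
```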

## Conclusion

Hopefully you now see how useful the collections module is in Python. It should be your go-to module for a variety of common tasks!