# Collections adds functionality to the basic Python ecosystem
Python's built-in data types are quite powerful, but there are times when you need a little more than they offer. The collections module adds that functionality. Technically, the collections data types are different types, but you will often use them the same way as the built-ins they extend. Mostly, only the initialization changes.

First, notice that we import these names from the "collections" module. Most of them are classes; namedtuple is a factory function that creates a class. We need the import to gain access to them, because they are not part of the default namespace.

In [1]:
from collections import OrderedDict, defaultdict, deque, Counter, namedtuple
from datetime import datetime, timedelta

## Ordered Dictionary
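An ordered dictionary keeps the order of insertion when iterated in a loop. Since Python 3.7, a regular dictionary also preserves insertion order as part of the language; in older versions it may or may not. The results below therefore match on a current Python. An OrderedDict still makes the intent explicit and adds order-aware behavior, such as equality comparisons that take order into account and a move_to_end() method.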

In [2]:
norm_dict = dict()
norm_dict["a"] = 1
norm_dict["b"] = 2
norm_dict["c"] = 3
norm_dict["d"] = 4

ord_dict = OrderedDict()
ord_dict["one"] = 1
ord_dict["two"] = 2
ord_dict["three"] = 3
ord_dict["four"] = 4

for key, value in norm_dict.items():
    print(key, value)

for key, value in ord_dict.items():
    print(key, value)

a 1
b 2
c 3
d 4
one 1
two 2
three 3
four 4
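
Where the two types still differ is easier to see in code. The following is a minimal sketch, assuming Python 3.7 or later; the variable names `first` and `second` are made up for illustration.

```python
from collections import OrderedDict

# Regular dicts compare equal regardless of insertion order.
print({"a": 1, "b": 2} == {"b": 2, "a": 1})  # True

# Two OrderedDicts are equal only if their order also matches.
first = OrderedDict([("a", 1), ("b", 2)])
second = OrderedDict([("b", 2), ("a", 1)])
print(first == second)  # False

# move_to_end() moves an existing key to the end, in place.
first.move_to_end("a")
print(list(first))      # ['b', 'a']
print(first == second)  # True
```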


## Default Dictionary
A default dictionary supplies a default value for any key that does not exist yet. When a missing key is looked up, the dictionary calls its default factory, stores the result under that key, and returns it.

Create the default dictionary and use a lambda as the factory that produces the default value for missing keys.

In [3]:
ice_cream = defaultdict(lambda: "Vanilla")
ice_cream["Sarah"] = "Chunky Monkey"
ice_cream["Abdul"] = "Butter Pecan"

print("ice_cream.keys():", ice_cream.keys())
print('ice_cream["Sarah"]:', ice_cream["Sarah"])
print('ice_cream["Joe"]:', ice_cream["Joe"])

ice_cream.keys(): dict_keys(['Sarah', 'Abdul'])
ice_cream["Sarah"]: Chunky Monkey
ice_cream["Joe"]: Vanilla
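
One side effect is worth seeing directly: looking up a missing key also stores it. A small sketch that reuses the names from the cell above:

```python
from collections import defaultdict

ice_cream = defaultdict(lambda: "Vanilla")
ice_cream["Sarah"] = "Chunky Monkey"

print("Joe" in ice_cream)  # False: nothing stored for Joe yet
flavor = ice_cream["Joe"]  # the lookup calls the factory and stores the result
print(flavor)              # Vanilla
print("Joe" in ice_cream)  # True: the key now exists
print(list(ice_cream))     # ['Sarah', 'Joe']
```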


A regular dictionary can also return a default value, but you have to pass the default on every call to .get(). Unlike a default dictionary, .get() does not add the missing key to the dictionary.

In [4]:
default_value = "Vanilla"
ice_cream = dict()
ice_cream["Sarah"] = "Chunky Monkey"
ice_cream["Abdul"] = "Butter Pecan"

print("ice_cream.keys():", ice_cream.keys())
print('ice_cream["Sarah"]:', ice_cream.get("Sarah", default_value))
print('ice_cream["Joe"]:', ice_cream.get("Joe", default_value))

ice_cream.keys(): dict_keys(['Sarah', 'Abdul'])
ice_cream["Sarah"]: Chunky Monkey
ice_cream["Joe"]: Vanilla
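
If you want a regular dictionary to store the default as well, dict.setdefault() is one option. The sketch below contrasts it with .get(), using the same example values as above:

```python
ice_cream = {"Sarah": "Chunky Monkey"}

# get() returns the fallback but leaves the dictionary unchanged.
print(ice_cream.get("Joe", "Vanilla"))  # Vanilla
print("Joe" in ice_cream)               # False

# setdefault() returns the fallback and also stores it under the key.
print(ice_cream.setdefault("Joe", "Vanilla"))  # Vanilla
print("Joe" in ice_cream)                      # True
```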


## Dictionary as a counter
Using a default dictionary with int as the factory makes every new key start at 0, because int() returns 0. The += operator then adds to that default of 0, or to the current value if one is set. This allows using a dictionary as a counter of items.

In [5]:
food_list = "spam spam spam spam spam spam eggs spam".split()
food_count = defaultdict(int)  # default value of int is 0
print('food_count:', food_count)
for food in food_list:
    food_count[food] += 1  # increment element's value by 1

print(food_count)
print('spam:', food_count['spam'])
print('eggs:', food_count['eggs'])

food_count: defaultdict(<class 'int'>, {})
defaultdict(<class 'int'>, {'spam': 7, 'eggs': 1})
spam: 7
eggs: 1


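## Speed gains over lists with deque
A deque (double-ended queue) is much faster than a list at prepending, and it also removes the first element faster. A list has to shift every remaining element each time something is inserted or removed at the front, while a deque does both in constant time. Deques are helpful when you add and remove items at both the front and the end of a sequence.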

In [6]:
num = 100000
list_a = list()  # Initialize an empty list
deque_b = deque()  # Initialize an empty deque

# Create a function to print the elapsed time for an operation
def print_time(container, operation, num, start_datetime):
    diff = datetime.now() - start_datetime
    print(f"{container} elapsed time for {operation} {num} values: {diff}\n")

start_datetime = datetime.now()
for ii in range(num):
    list_a.insert(0, ii)
# Pass the recorded start time, not the current time
print_time('list', 'prepending', num, start_datetime)

start_datetime = datetime.now()
for ii in range(num):
    deque_b.appendleft(ii)
print_time('deque', 'prepending', num, start_datetime)

# Remove every element from the front of each container
start_datetime = datetime.now()
while len(list_a) > 0:
    list_a.pop(0)
print_time('list', 'popping', num, start_datetime)

start_datetime = datetime.now()
while len(deque_b) > 0:
    deque_b.popleft()
print_time('deque', 'popping', num, start_datetime)

list elapsed time for prepending 100000 values: [not captured in this run]

deque elapsed time for prepending 100000 values: 0:00:00.004772

list elapsed time for popping 100000 values: 0:00:00.774992

deque elapsed time for popping 100000 values: 0:00:00.006743
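
For more repeatable measurements, the standard library's timeit module is an alternative to subtracting timestamps by hand. The following is a rough sketch, not a rigorous benchmark: the helper names `prepend_to_list` and `prepend_to_deque` and the size `n` are made up for illustration, and the numbers will differ from machine to machine.

```python
from collections import deque
from timeit import timeit

def prepend_to_list(n):
    items = []
    for i in range(n):
        items.insert(0, i)  # shifts every existing element one slot right
    return items

def prepend_to_deque(n):
    items = deque()
    for i in range(n):
        items.appendleft(i)  # constant time at either end
    return items

n = 10_000  # hypothetical size, kept small so the sketch runs quickly

# timeit() returns the total seconds for `number` calls.
list_seconds = timeit(lambda: prepend_to_list(n), number=3)
deque_seconds = timeit(lambda: prepend_to_deque(n), number=3)

print(f"list:  {list_seconds:.4f} s")
print(f"deque: {deque_seconds:.4f} s")
```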



## Named Tuple
A tuple stores information by index number. A named tuple also gives each position a name, so a field can be read by name as well as by index. The main reason for using a named tuple over a dictionary is that a tuple is immutable.

In [7]:
# Create the new named tuple with the named keys.
Person = namedtuple("Person", "name age gender")

bob = Person(name="Bob", age=30, gender="male")
print("\nRepresentation:", bob)

jane = Person(name="Jane", age=29, gender="female")
print("\nField by name:", jane.name)

print("\nFields by name:")
for p in [bob, jane]:
    print("{} is a {} year old {}".format(p.name, p.age, p.gender))


Representation: Person(name='Bob', age=30, gender='male')

Field by name: Jane

Fields by name:
Bob is a 30 year old male
Jane is a 29 year old female
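
Immutability is easiest to see by trying to change a field. The sketch below reuses the Person definition from above; `older_bob` is a name introduced only for this illustration.

```python
from collections import namedtuple

Person = namedtuple("Person", "name age gender")
bob = Person(name="Bob", age=30, gender="male")

# Assigning to a field fails because the tuple is immutable.
try:
    bob.age = 31
except AttributeError as error:
    print("Cannot modify:", error)

# _replace() returns a new tuple with the changed field instead.
older_bob = bob._replace(age=31)
print(bob.age, older_bob.age)  # 30 31

# Fields are still reachable by index, like a plain tuple.
print(bob[0], bob.name)  # Bob Bob

# _asdict() converts to a dictionary when a mutable copy is needed.
print(older_bob._asdict())
```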


## Counting
A counter is a container that keeps track of how many times each value has been added. The standard example is with letters.

There are three ways to initialize a counter object: from an iterable, from a dictionary of counts, or from keyword arguments.

In [8]:
list_counter = Counter(['a', 'b', 'c', 'a', 'b', 'b'])
dict_counter = Counter({'a': 2, 'b': 3, 'c': 1})
keyword_counter = Counter(a=2, b=3, c=1)

We can create an empty counter and then fill it with a call to .update(). Passing a single string may look a little strange, but .update() accepts any iterable (or a mapping of counts). A string is an iterable of its characters, so each character is counted as an individual element. Go ahead and try print(list('abcabb')) to see how the string breaks apart.

In [9]:
my_first_counter = Counter()
my_first_counter.update('abcabb')

for counter_element in [list_counter, dict_counter, keyword_counter, my_first_counter]:
    print(counter_element)

Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})


As new data is added, the counter is updated.

In [10]:
my_first_counter.update(['a', 'd'])
print('\nUpdated my_first_counter:', my_first_counter)


Updated my_first_counter: Counter({'a': 3, 'b': 3, 'c': 1, 'd': 1})
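
Two details of a counter are worth a quick check: a missing element reads as 0 without being stored, and .update() with a mapping adds to the existing counts rather than replacing them. A minimal sketch, with `letters` as a made-up name:

```python
from collections import Counter

letters = Counter("abcabb")  # a: 2, b: 3, c: 1

# A missing element reads as 0 and is not added to the counter.
print(letters["z"])    # 0
print("z" in letters)  # False

# update() with a mapping adds its counts to the existing ones.
letters.update({"a": 2})
print(letters["a"])    # 4
```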


## Counters with words
We can also use the counter on full words.

Start off with a sentence.

In [11]:
my_str = "This this this is really really good."

Next, split the sentence into a list of words. The .split() method splits on whitespace if no delimiter is provided.

In [12]:
my_list = my_str.split()

Now use the list of words as input to the Counter. It returns a Counter, which is a dictionary subclass with the words as keys and the number of times each word appears as values. Notice how the count is case sensitive.

In [13]:
my_dict = Counter(my_list)
print('my_dict:', my_dict)

my_dict: Counter({'this': 2, 'really': 2, 'This': 1, 'is': 1, 'good.': 1})


To make the count case insensitive, lowercase all the words before passing them to Counter().

In [14]:
lower_list = [ii.lower() for ii in my_list]  # lowercase every word first
my_dict = Counter(lower_list)
print('my_dict:', my_dict)

my_dict: Counter({'this': 3, 'really': 2, 'is': 1, 'good.': 1})
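
The last key above is still 'good.' with its trailing period. One way to handle that, sketched here with the standard string.punctuation constant, is to strip punctuation from each word before counting; the names `words` and `word_counts` are made up for this example.

```python
import string
from collections import Counter

my_str = "This this this is really really good."

# Strip leading/trailing punctuation from each word, then lowercase it.
words = [word.strip(string.punctuation).lower() for word in my_str.split()]
word_counts = Counter(words)

print(word_counts)  # Counter({'this': 3, 'really': 2, 'is': 1, 'good': 1})
```

Note that strip() only removes punctuation at the ends of a word, so an apostrophe inside a word such as "don't" is left alone.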


## Counter with other objects
A counter will also work with numbers and other hashable objects.

In [15]:
my_numbers = [0, 1, 2, 3, 4, 4, 4, 55, 67, 67, 10242425242552]
number_dict = Counter(my_numbers)
print('number_dict:', number_dict)

number_dict: Counter({4: 3, 67: 2, 0: 1, 1: 1, 2: 1, 3: 1, 55: 1, 10242425242552: 1})


To see just the n most common elements, use the .most_common() method. Notice how the return type is different: it is no longer a dictionary but a list of (element, count) tuples. This matters when you want to extract the values and use them, because a list of tuples is indexed by position rather than by key.

In [16]:
print('number_dict.most_common(2):', number_dict.most_common(2))

number_dict.most_common(2): [(4, 3), (67, 2)]
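
The difference in syntax looks like this in practice. A short sketch that rebuilds the same counter; `top_two` is a name introduced for illustration.

```python
from collections import Counter

number_dict = Counter([0, 1, 2, 3, 4, 4, 4, 55, 67, 67, 10242425242552])
top_two = number_dict.most_common(2)

# The counter itself is indexed by element, like a dictionary.
print(number_dict[4])  # 3

# The most_common() result is a list, indexed by position.
value, count = top_two[0]
print(value, count)    # 4 3

# Tuple unpacking in a loop reads each (element, count) pair.
for value, count in top_two:
    print(f"{value} appears {count} times")
```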


Convert the most common results to a regular dictionary.

In [17]:
number_dict_most_common_as_dict = dict(number_dict.most_common(2))
print('number_dict_most_common_as_dict:', number_dict_most_common_as_dict)

number_dict_most_common_as_dict: {4: 3, 67: 2}


Here are some of the more common patterns used with Counter objects.

In [18]:
total = sum(number_dict.values())  # total of all counts
print('total:', total)

total: 11


Get the unique elements, or "keys" if that is easier to understand.

In [19]:
number_dict_list = list(number_dict)
print('number_dict_list:', number_dict_list)

number_dict_list: [0, 1, 2, 3, 4, 55, 67, 10242425242552]


Reset all counts with the .clear() method. This empties number_dict so that it holds no elements; it remains a Counter object rather than becoming None.
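
A minimal sketch of resetting a counter with .clear(), rebuilding number_dict here so the example runs on its own:

```python
from collections import Counter

number_dict = Counter([0, 1, 2, 3, 4, 4, 4, 55, 67, 67, 10242425242552])

number_dict.clear()          # remove every element and its count
print(number_dict)           # Counter()
print(number_dict[4])        # 0: missing elements read as zero
print(number_dict is None)   # False: it is still a Counter object
```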

Counters have the ability to perform mathematical operations from within the Counter object.

In [20]:
c = Counter(Tom=3, Jerry=1)
d = Counter(Tom=1, Jerry=2, Scooby=4)

c_plus_d = c + d  # add two counters together:  c[x] + d[x]
c_minus_d = c - d  # subtract (keeping only positive counts)
c_intersect_d = c & d  # intersection:  min(c[x], d[x])
c_union_d = c | d  # union:  max(c[x], d[x])

print('c_plus_d:', c_plus_d)
print('c_minus_d:', c_minus_d)
print('c_intersect_d:', c_intersect_d)
print('c_union_d:', c_union_d)

c_plus_d: Counter({'Tom': 4, 'Scooby': 4, 'Jerry': 3})
c_minus_d: Counter({'Tom': 2})
c_intersect_d: Counter({'Tom': 1, 'Jerry': 1})
c_union_d: Counter({'Scooby': 4, 'Tom': 3, 'Jerry': 2})
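
The - operator drops zero and negative results. When the negative counts matter, the .subtract() method keeps them. The sketch below contrasts the two and also shows .elements(), which expands a counter back into its items; `signed` is a name made up for this example.

```python
from collections import Counter

c = Counter(Tom=3, Jerry=1)
d = Counter(Tom=1, Jerry=2, Scooby=4)

# The - operator keeps only positive counts.
print(c - d)  # Counter({'Tom': 2})

# subtract() works in place and keeps zero and negative counts.
signed = Counter(c)  # copy so that c itself is left unchanged
signed.subtract(d)
print(signed["Tom"], signed["Jerry"], signed["Scooby"])  # 2 -1 -4

# elements() repeats each element as many times as its count.
print(sorted(c.elements()))  # ['Jerry', 'Tom', 'Tom', 'Tom']
```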
