# Built-in Data Structures and Collections
<a href="https://colab.research.google.com/github/rambasnet/Python-Fundamentals/blob/master/notebooks/Ch09-2-Built-in-DataStructures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- all built-in functions are listed, with examples, at https://docs.python.org/3/library/functions.html


## zip( )
- the built-in zip class pairs up elements from several iterables, which makes it quick to create a list of tuples and, from that, a dictionary

In [1]:
help(zip)

Help on class zip in module builtins:

class zip(object)
 |  zip(*iterables) --> A zip object yielding tuples until an input is exhausted.
 |  
 |     >>> list(zip('abcdefg', range(3), range(4)))
 |     [('a', 0, 0), ('b', 1, 1), ('c', 2, 2)]
 |  
 |  The zip object yields n-length tuples, where n is the number of iterables
 |  passed as positional arguments to zip().  The i-th element in every tuple
 |  comes from the i-th iterable argument to zip().  This continues until the
 |  shortest argument is exhausted.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and ret

In [7]:
zdata = zip([1, 2, 3], ('a', 'b', 'c'))

In [8]:
zdata

<zip at 0x7fc53addf6c0>

In [9]:
alist = list(zdata)

In [10]:
alist

[(1, 'a'), (2, 'b'), (3, 'c')]

In [11]:
# create dict
adict = dict(alist)
print(adict)

{1: 'a', 2: 'b', 3: 'c'}
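
Two details from the help text above are easy to miss: zip stops at the shortest input, and a zip object is an iterator that can be consumed only once. The following is a minimal sketch; the variable names (`pairs`, `zobj`, `nums`, `chars`) are illustrative only.

```python
# zip stops as soon as the shortest input is exhausted
pairs = list(zip([1, 2, 3], 'ab'))
print(pairs)  # [(1, 'a'), (2, 'b')]

# a zip object is an iterator: once consumed, it yields nothing more
zobj = zip([1, 2, 3], ('a', 'b', 'c'))
first_pass = list(zobj)
second_pass = list(zobj)
print(first_pass)   # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second_pass)  # []

# zip(*pairs) reverses the pairing ("unzip")
nums, chars = zip(*first_pass)
print(nums, chars)  # (1, 2, 3) ('a', 'b', 'c')
```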


## exercise
Create a dict that maps lowercase letters to integers, e.g., a maps to 1, b maps to 2, ..., z maps to 26, and print it

In [12]:
import string
lettersToDigits = dict(zip(string.ascii_lowercase, range(1, 27)))

In [13]:
print(lettersToDigits)

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}


## exercise
Create a dict that maps lowercase letters to their corresponding ASCII values, e.g., a maps to 97, b maps to 98, ..., z maps to 122, and print the dictionary in alphabetical order

In [14]:
import string
lettersToDigits = dict(zip(string.ascii_lowercase, range(ord('a'), ord('z')+1)))

In [15]:
print(lettersToDigits)

{'a': 97, 'b': 98, 'c': 99, 'd': 100, 'e': 101, 'f': 102, 'g': 103, 'h': 104, 'i': 105, 'j': 106, 'k': 107, 'l': 108, 'm': 109, 'n': 110, 'o': 111, 'p': 112, 'q': 113, 'r': 114, 's': 115, 't': 116, 'u': 117, 'v': 118, 'w': 119, 'x': 120, 'y': 121, 'z': 122}

## enumerate( )
- built-in enumerate class yields (index, value) pairs from an iterable


In [4]:
# generate an enumerate object: pairs of index and corresponding value from the iterable
letters = enumerate(string.ascii_lowercase)

In [5]:
letters

<enumerate at 0x7ff00f6cc740>

In [6]:
list(letters)

[(0, 'a'),
 (1, 'b'),
 (2, 'c'),
 (3, 'd'),
 (4, 'e'),
 (5, 'f'),
 (6, 'g'),
 (7, 'h'),
 (8, 'i'),
 (9, 'j'),
 (10, 'k'),
 (11, 'l'),
 (12, 'm'),
 (13, 'n'),
 (14, 'o'),
 (15, 'p'),
 (16, 'q'),
 (17, 'r'),
 (18, 's'),
 (19, 't'),
 (20, 'u'),
 (21, 'v'),
 (22, 'w'),
 (23, 'x'),
 (24, 'y'),
 (25, 'z')]

In [2]:
# create a dict that maps 1..26 to A..Z
# use enumerate built-in function
import string
numToLetter = dict(enumerate(string.ascii_uppercase, start=1))

In [3]:
numToLetter

{1: 'A',
 2: 'B',
 3: 'C',
 4: 'D',
 5: 'E',
 6: 'F',
 7: 'G',
 8: 'H',
 9: 'I',
 10: 'J',
 11: 'K',
 12: 'L',
 13: 'M',
 14: 'N',
 15: 'O',
 16: 'P',
 17: 'Q',
 18: 'R',
 19: 'S',
 20: 'T',
 21: 'U',
 22: 'V',
 23: 'W',
 24: 'X',
 25: 'Y',
 26: 'Z'}
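
enumerate is most often used directly in a `for` loop, unpacking each (index, value) pair. A small sketch, with illustrative names `position`, `letter`, and `first_three`:

```python
import string

# enumerate yields (index, value) pairs; start=1 shifts the first index to 1
first_three = []
for position, letter in enumerate(string.ascii_uppercase[:3], start=1):
    first_three.append(f'{position}:{letter}')
print(first_three)  # ['1:A', '2:B', '3:C']
```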

## Set Types - set, frozenset
- https://docs.python.org/3/library/stdtypes.html#set 
- a set object is an unordered collection of distinct hashable objects
- set is mutable
- frozenset is immutable

In [16]:
help(set)

Help on class set in module builtins:

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |  
 |  Build an unordered collection of unique elements.
 |  
 |  Methods defined here:
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iand__(self, value, /)
 |      Return self&=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __ior__(self, value, /)
 |      Return self|=value.
 |  
 |  __isub__(self, value, /)
 |      Return self-=value.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __ixor__(self, value, /)
 |      Re

In [17]:
# create aset from a list
aset = set([1, 2, 1, 3, 'hello', 'hi', 3])

In [18]:
# check the length of aset
len(aset)

5

In [19]:
print(aset)

{1, 2, 3, 'hello', 'hi'}


In [20]:
# membership test
'hi' in aset

True

In [21]:
'Hi' in aset

False

In [None]:
# see all the methods in set
help(set)

In [22]:
aset

{1, 2, 3, 'hello', 'hi'}

In [26]:
# add 100 to aset; adding a value that is already a member has no effect
aset.add(100)

In [27]:
aset

{1, 100, 2, 3, 'hello', 'hi'}

In [28]:
bset = frozenset(aset)

In [29]:
bset

frozenset({1, 100, 2, 3, 'hello', 'hi'})

In [30]:
help(frozenset)

Help on class frozenset in module builtins:

class frozenset(object)
 |  frozenset() -> empty frozenset object
 |  frozenset(iterable) -> frozenset object
 |  
 |  Build an immutable unordered collection of unique elements.
 |  
 |  Methods defined here:
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __or__(self,

In [31]:
intersection = bset.intersection(aset)

In [32]:
intersection

frozenset({1, 100, 2, 3, 'hello', 'hi'})

In [33]:
cset = aset.copy()

In [34]:
cset.add(500)

In [35]:
print(cset.intersection(aset))

{1, 2, 3, 100, 'hello', 'hi'}


In [36]:
cset.union(aset)

{1, 100, 2, 3, 500, 'hello', 'hi'}
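
The `intersection()` and `union()` methods used above also have operator forms, and the practical consequence of frozenset being immutable is that it is hashable. The sketch below uses small illustrative sets `x` and `y`; `groups` is a made-up dictionary for demonstration.

```python
x = {1, 2, 3}
y = {3, 4}

print(x & y)  # intersection: {3}
print(x | y)  # union: {1, 2, 3, 4}
print(x - y)  # difference: {1, 2}
print(x ^ y)  # symmetric difference: {1, 2, 4}

# frozenset is hashable, so it can be a dict key or a member of another set
groups = {frozenset(x): 'first', frozenset(y): 'second'}
print(groups[frozenset({3, 2, 1})])  # first

# a mutable set is not hashable, so using it as a key raises TypeError
try:
    {x: 'not allowed'}
except TypeError as err:
    print('TypeError:', err)
```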

## Collections
https://docs.python.org/3/library/collections.html#module-collections

## deque
- list-like container with fast appends and pops on either end

In [37]:
from collections import deque

In [38]:
a = deque([10, 20, 30])

In [39]:
# add 1 to the right side of the queue
a.append(1)

In [40]:
a

deque([10, 20, 30, 1])

In [41]:
# add -1 to the left side of the queue
a.appendleft(-1)

In [42]:
a

deque([-1, 10, 20, 30, 1])
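
Removing from either end works the same way as appending. The sketch below also shows the `maxlen` argument from the deque signature and the `rotate()` method; the names `q` and `recent` are illustrative.

```python
from collections import deque

q = deque([-1, 10, 20, 30, 1])
right = q.pop()        # removes and returns the rightmost item: 1
left = q.popleft()     # removes and returns the leftmost item: -1
print(left, right, q)  # -1 1 deque([10, 20, 30])

# a bounded deque discards items from the opposite end when full
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(recent)  # deque([2, 3, 4], maxlen=3)

# rotate(1) moves the rightmost item to the left end
q.rotate(1)
print(q)  # deque([30, 10, 20])
```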

In [43]:
help(deque)

Help on class deque in module collections:

class deque(builtins.object)
 |  deque([iterable[, maxlen]]) --> deque object
 |  
 |  A list-like sequence optimized for data accesses near its endpoints.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __copy__(...)
 |      Return a shallow copy of a deque.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, 

## defaultdict
- dict subclass that calls a factory function to supply missing values

In [44]:
from collections import defaultdict

In [45]:
dd = defaultdict(int) # uses 0 value to supply for missing key

In [46]:
dd

defaultdict(int, {})

In [51]:
# increment value of key 'b' by 1; a missing key starts at int() == 0
# ('a' was already incremented in earlier cells)
dd['b'] += 1

In [52]:
dd

defaultdict(int, {'a': 2, 'b': 1})
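
The factory can be any callable that takes no arguments, so `list` and `set` work as well as `int`. A minimal sketch of grouping with `defaultdict(list)`, using made-up sample data in `words`:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'ball', 'cat', 'cherry']  # illustrative data

# group words by first letter; a missing key starts as an empty list
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['ball'], 'c': ['cat', 'cherry']}

# note: merely reading a missing key inserts it with the default value
counts = defaultdict(int)
print(counts['x'])    # 0
print('x' in counts)  # True
```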

## OrderedDict
- https://docs.python.org/3/library/collections.html#collections.OrderedDict
- dict subclass that remembers the order entries were added
- since Python 3.7 the built-in dict is guaranteed to preserve insertion order (CPython 3.6 already did so as an implementation detail)
- if a new entry overwrites an existing entry, the original insertion position is left unchanged; call `move_to_end(key)` to move an entry to either end
    - application in generating Most Recently Used (MRU) and LRU caches

- important method:

```python
popitem(last=True)
```
- returns and removes a (key, value) pair
- the pairs are returned in LIFO order if last is true or FIFO order if false.
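
A short sketch of how `popitem()` and `move_to_end()` behave. Note that in a plain OrderedDict, overwriting a key keeps its original position. The name `od` and its contents are illustrative.

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

od['a'] = 100        # overwriting keeps 'a' in its original position
print(list(od))      # ['a', 'b', 'c']

od.move_to_end('a')  # explicitly mark 'a' as most recently used
print(list(od))      # ['b', 'c', 'a']

print(od.popitem(last=False))  # FIFO: ('b', 2)
print(od.popitem())            # LIFO (last=True): ('a', 100)
print(list(od))                # ['c']
```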

## Counter
- one of the applications of dict is to keep count of certain keys (e.g., word histogram)
- can use Counter -- dict subclass for counting hashable objects
- when counts tie, `most_common()` lists the tied elements in the order they were first encountered

In [53]:
from collections import Counter

In [54]:
c = Counter('apple') # a new counter from an iterable

In [55]:
c

Counter({'a': 1, 'p': 2, 'l': 1, 'e': 1})

In [56]:
# counter from iterable
d = Counter(['apple', 'apple', 'ball'])

In [57]:
d

Counter({'apple': 2, 'ball': 1})

In [58]:
e = Counter({'apple': 10, 'ball': 20}) # counter from mapping

In [59]:
e

Counter({'apple': 10, 'ball': 20})

In [60]:
f = c+e

In [61]:
f

Counter({'a': 1, 'p': 2, 'l': 1, 'e': 1, 'apple': 10, 'ball': 20})

In [64]:
# add d's counts to f; the totals below reflect this cell having been run twice
f = f+d

In [65]:
f

Counter({'a': 1, 'p': 2, 'l': 1, 'e': 1, 'apple': 14, 'ball': 22})

In [69]:
f.most_common(3)

[('ball', 22), ('apple', 14), ('p', 2)]
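
Counters also support subtraction, which is handy when comparing two frequency tables. The sketch below uses made-up strings; `typed`, `expected`, and `extra` are illustrative names.

```python
from collections import Counter

typed = Counter('helloo')    # illustrative strings
expected = Counter('hello')

# subtraction keeps only the positive counts
extra = typed - expected
print(extra)  # Counter({'o': 1})

# elements with equal counts come out in first-encountered order
print(Counter('bacba').most_common())  # [('b', 2), ('a', 2), ('c', 1)]

# missing keys report 0 instead of raising KeyError
print(typed['z'])  # 0
```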

In [67]:
help(Counter)

Help on class Counter in module collections:

class Counter(builtins.dict)
 |  Counter(iterable=None, /, **kwds)
 |  
 |  Dict subclass for counting hashable items.  Sometimes called a bag
 |  or multiset.  Elements are stored as dictionary keys and their counts
 |  are stored as dictionary values.
 |  
 |  >>> c = Counter('abcdeabcdabcaba')  # count elements from a string
 |  
 |  >>> c.most_common(3)                # three most common elements
 |  [('a', 5), ('b', 4), ('c', 3)]
 |  >>> sorted(c)                       # list all unique elements
 |  ['a', 'b', 'c', 'd', 'e']
 |  >>> ''.join(sorted(c.elements()))   # list elements with repetitions
 |  'aaaaabbbbcccdde'
 |  >>> sum(c.values())                 # total of all counts
 |  15
 |  
 |  >>> c['a']                          # count of letter 'a'
 |  5
 |  >>> for elem in 'shazam':           # update counts from an iterable
 |  ...     c[elem] += 1                # by adding 1 to each element's count
 |  >>> c['a']                

## heapq
- priority queue implemented as a binary min-heap stored in a list
- https://docs.python.org/3/library/heapq.html
- heaps are binary trees for which every parent node has a value less than or equal to any of its children
    - min priority queue
- for a max priority queue, negate the values of the keys in the priority queue
- start with an empty list `[]` and build the heap one element at a time with `heappush()`, or use the `heapify()` function to transform an existing list into a heap in place

In [70]:
import heapq

In [71]:
# build heap one element at a time
heap = []
for i in range(10, 0, -1):
    heapq.heappush(heap, i)

In [72]:
heap

[1, 2, 5, 4, 3, 9, 6, 10, 7, 8]

In [73]:
# pop the elements from the queue
while heap:
    print('priority:', heapq.heappop(heap))
# this is essentially heapsort: O(n log n) overall

priority: 1
priority: 2
priority: 3
priority: 4
priority: 5
priority: 6
priority: 7
priority: 8
priority: 9
priority: 10


In [74]:
import random
# sample 10 random integers between 1 and 49 (range(1, 50) excludes 50)
alist = random.sample(range(1, 50), 10)

In [75]:
alist

[27, 47, 6, 11, 25, 22, 3, 29, 49, 20]

In [76]:
heapq.heapify(alist)

In [77]:
alist

[3, 11, 6, 29, 20, 22, 27, 47, 49, 25]

In [78]:
heapq.heappop(alist)

3

In [79]:
alist

[6, 11, 22, 29, 20, 25, 27, 47, 49]

In [80]:
# pop the elements from the queue
while alist:
    print('priority:', heapq.heappop(alist))
# this is essentially heapsort: O(n log n) overall

priority: 6
priority: 11
priority: 20
priority: 22
priority: 25
priority: 27
priority: 29
priority: 47
priority: 49


In [82]:
somelist = [(4, 'read'), (1, 'write'), (3, 'delete')]

In [83]:
heapq.heapify(somelist)

In [84]:
somelist

[(1, 'write'), (4, 'read'), (3, 'delete')]

In [85]:
while somelist:
    print(heapq.heappop(somelist)[1])

write
delete
read


In [86]:
# maxheap example
jobs = [(-4, 'read'), (-1, 'write'), (-3, 'delete')]

In [87]:
heapq.heapify(jobs)
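
Popping from the negated heap returns the job with the largest original priority first. A minimal sketch that repeats the `jobs` list from the cell above so it runs on its own; `order` is an illustrative name.

```python
import heapq

# priorities are negated so the largest original priority pops first
jobs = [(-4, 'read'), (-1, 'write'), (-3, 'delete')]
heapq.heapify(jobs)

order = []
while jobs:
    neg_priority, task = heapq.heappop(jobs)
    order.append((-neg_priority, task))  # negate again to recover the priority
print(order)  # [(4, 'read'), (3, 'delete'), (1, 'write')]
```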

## Exercises

### Kattis problems
- Some Kattis problems that can be solved using Python built-in data structures

1. sort - https://open.kattis.com/problems/sort
2. Trending Topic - https://open.kattis.com/problems/trendingtopic
3. FizzBuzz2 - https://open.kattis.com/problems/fizzbuzz2
4. CD - https://open.kattis.com/problems/cd 
    - Hint: implement the intersection of two sorted lists yourself; the built-in set can be too slow in Python for this problem
5. Keyboardd - https://open.kattis.com/problems/keyboardd
    - Hint: two Counters; print the difference
6. Course Scheduling - https://open.kattis.com/problems/coursescheduling
    - Hint: Counter of courses, defaultdict(set) of courseToStudents
7. Train Boarding - https://open.kattis.com/problems/trainboarding
    - Hint: Counter or List 
8. Shopping List - https://open.kattis.com/problems/shoppinglist
    - Hint: Use set to keep track of intersection and sort the final list   
9. Knigs of the Forest - https://open.kattis.com/problems/knigsoftheforest
    - Hint: sort contestants by year and use a priority queue, keeping K contestants per year, to find the winner
10. Seven Wonders - https://open.kattis.com/problems/sevenwonders
    - Hint: Counter
11. Select Group - https://open.kattis.com/problems/selectgroup
    - Stack for RPN parsing and Set
12. Zipf's Law - https://open.kattis.com/problems/zipfslaw
    - Use Counter to store frequency of each word
    - parse character by character and ignore words with length 1
    - words contain only alphabets; ignore case; multiple test cases in input
13. Jane Eyre - https://open.kattis.com/problems/janeeyre
    - Simulate using Priority Queue and a sorted list of gifts or two sorted lists of books less than Jane Eyre
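
The hint for CD (intersecting two sorted lists without a set) can be sketched with the two-pointer technique. This is only an illustration of the idea, not a full solution: the function name `count_common_sorted` is hypothetical, input parsing is omitted, and both lists are assumed to be sorted in ascending order with no duplicates.

```python
def count_common_sorted(a, b):
    """Count values present in both lists (each sorted ascending, no duplicates)."""
    i = j = common = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to match anything left in b
        else:
            j += 1  # b[j] is too small to match anything left in a
    return common

print(count_common_sorted([1, 2, 3], [1, 3, 5]))  # 2
```

Each element is visited at most once, so the walk takes time linear in the combined length of the two lists.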