# Python 3: Built-in Data Structures

## List Comprehensions

Filtering with list comprehensions:

In [1]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even = [number for number in numbers if number % 2 == 0]  # Keep only the even numbers
even_filter = list(filter(lambda n: n % 2 == 0, numbers))  # Same result using filter()
print(even)
print(even_filter)

[2, 4, 6, 8, 10]
[2, 4, 6, 8, 10]


Creating tuples with list comprehensions:

In [2]:
even = [2, 4, 6, 8, 10]
odd = [1, 3, 5, 7, 9]

# Pairing every number from even with every number from odd
pair = [(number1, number2) for number1 in even for number2 in odd]

print(pair)

[(2, 1), (2, 3), (2, 5), (2, 7), (2, 9), (4, 1), (4, 3), (4, 5), (4, 7), (4, 9), (6, 1), (6, 3), (6, 5), (6, 7), (6, 9), (8, 1), (8, 3), (8, 5), (8, 7), (8, 9), (10, 1), (10, 3), (10, 5), (10, 7), (10, 9)]
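
A comprehension with two `for` clauses behaves like nested loops, with the first clause as the outer loop. The following sketch spells that out; the names `pairs_loop` and `pairs_comprehension` are only illustrative.

```python
even = [2, 4, 6, 8, 10]
odd = [1, 3, 5, 7, 9]

# Explicit nested loops: the outer loop corresponds to the first `for` clause
pairs_loop = []
for number1 in even:
    for number2 in odd:
        pairs_loop.append((number1, number2))

# The equivalent comprehension
pairs_comprehension = [(number1, number2) for number1 in even for number2 in odd]

print(pairs_loop == pairs_comprehension)  # True
print(len(pairs_loop))                    # 25
```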


## Tuples
Tuples are immutable sequences: once a tuple is created, its items cannot be added, removed, or reassigned.
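
As a small illustration of immutability, the sketch below tries to reassign an item and then builds a new tuple instead. The variable names are illustrative.

```python
point = (1, 2)

try:
    point[0] = 10  # Item assignment is not supported on tuples
except TypeError as error:
    print("TypeError:", error)

# Build a new tuple instead of modifying the existing one
moved = (10,) + point[1:]
print(point)  # (1, 2)
print(moved)  # (10, 2)
```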

Unpacking tuples:

In [3]:
fruits = [('banana', 'yellow', 10), ('apple', 'red', 19)]

# Unpacking the tuple with a for loop
for name, _, _ in fruits:
    print(name)

banana
apple


Unpacking using the `*` operator:

In [4]:
# Function returning the sum of its arguments
def add(a, b):
    return a + b

numbers = (1, 2)
result = add(*numbers)  # Unpacking numbers with *: add(1, 2)
print(result)

3
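
Two related forms may be worth knowing, shown here as a brief sketch: a starred name on the left side of an assignment collects the remaining items, and `**` unpacks a dictionary into keyword arguments. The `add` function is repeated so the block runs on its own; `kwargs` is an illustrative name.

```python
def add(a, b):
    return a + b

# A starred target collects the remaining items into a list
first, *rest = (1, 2, 3, 4)
print(first)  # 1
print(rest)   # [2, 3, 4]

# Double star unpacks a dict into keyword arguments: add(a=1, b=2)
kwargs = {'a': 1, 'b': 2}
print(add(**kwargs))  # 3
```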


## Challenge: Read the Line Segments

Given a list of line segments stored as nested tuples:
`[..., (ID, (x1, y1), (x2, y2)), ...]`

find all line segments that lie entirely within the $2^{nd}$ quadrant and return their `ID`s in a list.

Note: The second quadrant contains the points with $x \le 0$ and $y \ge 0$. Because a quadrant is a convex region, a segment lies entirely within it exactly when both of its endpoints do.

Assumption: No line segment has `(0, 0)` as an endpoint.

In [5]:
def filter_line_segments(lines):
    return [ID for ID, (x1, y1), (x2, y2) in lines if 
            x1 <= 0 and y1 >= 0 and 
            x2 <= 0 and y2 >= 0]

# Given the following lines, the expected output is: [2]
lines = [(1, (2, 4), (1, 9)), 
         (2, (-2, 4), (-1, 9)), 
         (3, (-2, -4), (-1, -9))]

filter_line_segments(lines)

[2]

In [6]:
assert filter_line_segments([(1, (-2, 10), (4, -3)),
                             (2, (-2, 1), (-9, 2)),
                             (3, (1, 1), (4, -5)),
                             (4, (-3, 3), (2, -4))]) == [2]
assert filter_line_segments([(1, (-2, 1), (-6, 8)),
                             (2, (-9, 2), (-7, 1)),
                             (3, (-2, 1), (-5, 9)),
                             (4, (-7, 8), (-1, 9))]) == [1,2,3,4]
assert filter_line_segments([(1, (1, 2), (3, 1))]) == []

## Namedtuple: An Extension of Tuple

`namedtuple` is a factory function in `collections` that creates tuple subclasses with named fields. Like regular tuples, their instances are immutable.

Let's explore why we need a `namedtuple` data structure in addition to the regular `tuple` data structure.

### Limitations of tuples
Regular tuples have two drawbacks:
* We can only access data in a tuple by index.
* Tuples don't guarantee that the data they hold are of the same type, which can make debugging difficult.

`namedtuple` allows us to use human-readable identifiers to access fields. Define it like this:

```python
MyNamedTuple = namedtuple(typename, field_names)  # Returns a new tuple subclass
```

In [7]:
from collections import namedtuple

Fruit = namedtuple('Fruit', 'name color price')

# Create objects of type Fruit
fruit_1 = Fruit('orange', 'orange', 1.50)
fruit_2 = Fruit('pineapple', 'brown', 4)

print(fruit_1)
print(fruit_2)

print("\n-- Access field names directly --")
print(fruit_1.name, fruit_1.color, fruit_1.price)
print(fruit_2.name, fruit_2.color, fruit_2.price)

print("\n-- Access fields using indices --")
print(fruit_1[0], fruit_1[1], fruit_1[2])
print(fruit_2[0], fruit_2[1], fruit_2[2])

Fruit(name='orange', color='orange', price=1.5)
Fruit(name='pineapple', color='brown', price=4)

-- Access field names directly --
orange orange 1.5
pineapple brown 4

-- Access fields using indices --
orange orange 1.5
pineapple brown 4


Since we can still access the values within a `namedtuple` by index, tuple unpacking still works.
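
A minimal sketch of both unpacking styles applied to `namedtuple` instances; the `describe` helper is hypothetical and only there to show `*` unpacking.

```python
from collections import namedtuple

Fruit = namedtuple('Fruit', 'name color price')
fruits = [Fruit('banana', 'yellow', 10), Fruit('apple', 'red', 19)]

# Positional unpacking works exactly as with plain tuples
names = [name for name, _, _ in fruits]
print(names)  # ['banana', 'apple']

# Hypothetical helper to demonstrate unpacking with *
def describe(name, color, price):
    return f'{name} ({color}): {price}'

print(describe(*fruits[0]))  # banana (yellow): 10
```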

### `namedtuple` methods and properties
* `_fields` holds a tuple of the field names
* `_make(iterable)` creates an instance of our `namedtuple` type from an iterable
  * Alternatively, we can use the `*` operator, e.g. `Fruit(*iterable)`
* `_asdict()` returns a dictionary that maps field names to their values (an `OrderedDict` before Python 3.8, as in the output below; a regular `dict` since)
* `_replace(**kwargs)` returns a new instance with the given fields replaced; the original is left unchanged

In [8]:
# Print the field names
print(Fruit._fields)

# Create a Fruit instance from an iterable
iterable = ['banana', 'yellow', 1.0]
banana = Fruit._make(iterable)
print(banana)

# Alternatively, use the * operator
iterable = ['cherry', 'red', 5.99]
cherry = Fruit(*iterable)
print(cherry)

# Create a dictionary that maps the field names to their values
cherry_dict = cherry._asdict()
print(cherry_dict)

# Update the price of the cherry
cheaper_cherry = cherry._replace(price=3.05)
print(cheaper_cherry)

('name', 'color', 'price')
Fruit(name='banana', color='yellow', price=1.0)
Fruit(name='cherry', color='red', price=5.99)
OrderedDict([('name', 'cherry'), ('color', 'red'), ('price', 5.99)])
Fruit(name='cherry', color='red', price=3.05)


A `namedtuple` is a shortcut for defining a simple immutable class without writing it by hand.
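
To make the immutability concrete, the sketch below shows that assigning to a field fails, while `_replace` hands back a separate instance. It reuses the `Fruit` type from the cells above.

```python
from collections import namedtuple

Fruit = namedtuple('Fruit', 'name color price')
cherry = Fruit('cherry', 'red', 5.99)

try:
    cherry.price = 3.05  # Fields cannot be reassigned
except AttributeError as error:
    print("AttributeError:", error)

# _replace leaves the original untouched and returns a new instance
cheaper_cherry = cherry._replace(price=3.05)
print(cherry.price)          # 5.99
print(cheaper_cherry.price)  # 3.05
```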

## Stacks & Queues

* Stack: last in, first out (LIFO); its operations are push and pop
* Queue: first in, first out (FIFO); its operations are enqueue and dequeue

### Using List
Use `append()` and `pop()` to implement a stack. Both run in amortized $O(1)$ time.

A list makes a poor queue, though. Enqueueing with `append()` is cheap, but dequeueing with `pop(0)` takes $O(n)$ time because every remaining element has to shift one position to the left.

_Note:_ Popping from an empty list raises an `IndexError`.
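
The list-based versions described above could look like this minimal sketch; the variable names are illustrative.

```python
# Stack: push with append(), pop from the same end
stack = []
stack.append('a')
stack.append('b')
top = stack.pop()
print(top, stack)    # b ['a']

# Queue (works, but pop(0) shifts every remaining element)
queue = ['a', 'b', 'c']
front = queue.pop(0)
print(front, queue)  # a ['b', 'c']

# Popping from an empty list raises IndexError
try:
    [].pop()
except IndexError as error:
    print("IndexError:", error)
```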

### Using `collections.deque`
`deque` is a double-ended queue that lets you add or remove elements at either end. Both operations take $O(1)$ time, whether the deque is used as a stack or as a queue.

In CPython, `deque` is implemented as a doubly-linked list of fixed-size blocks, which is why indexed access near the middle is $O(n)$, whereas a list offers $O(1)$ indexing.

* Stack:
  * use `append()` and `pop()`
* Queue:
  * use `append()` and `popleft()`

In [9]:
from collections import deque

stack = deque()
queue = deque()

print("Stack:")
# Push elements in stack
stack.append('a')
stack.append('b')
stack.append('c')
print(stack)

# Pop element from stack
stack.pop()
print(stack)

print("\nQueue:")
# Enqueue element in queue
queue.append('a')
queue.append('b')
queue.append('c')
print(queue)

# Dequeue element from queue
queue.popleft()
print(queue)

Stack:
deque(['a', 'b', 'c'])
deque(['a', 'b'])

Queue:
deque(['a', 'b', 'c'])
deque(['b', 'c'])
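
Because a `deque` is open at both ends, the left side has its own counterparts to `append()` and `pop()`. A short sketch, with illustrative variable names:

```python
from collections import deque

d = deque(['b', 'c'])
d.appendleft('a')   # Add at the left end
d.append('d')       # Add at the right end
print(d)            # deque(['a', 'b', 'c', 'd'])

left = d.popleft()  # Remove from the left end
right = d.pop()     # Remove from the right end
print(left, right)  # a d
print(d)            # deque(['b', 'c'])

# Indexing is supported, but gets slower toward the middle of a long deque
print(d[0])         # b
```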


## Challenge: Prefix to Postfix Conversion

Write a function that takes a prefix expression and converts it to a postfix expression using a stack.

For example:
* Expression: $A+B$
* Prefix expression: $+AB$
* Postfix expression: $AB+$

More examples:
* $A + B$
  * $+AB$
  * $AB+$
* $(A+B)*(C-D)$
  * $*+AB-CD$
  * $AB+CD-*$
* $(A-(B/C))*((A/K)-L)$
  * $*-A/BC-/AKL$
  * $ABC/-AK/L-*$

In [10]:
from collections import deque

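def postfix(prefix):
    stack = deque()
    operators = {'+', '-', '*', '/'}
    # Scan right to left so both operands are on the stack before their operator
    for ch in prefix[::-1]:
        if ch in operators:
            # The first pop is the left operand, the second pop the right operand
            stack.append(''.join([stack.pop(), stack.pop(), ch]))
        else:
            stack.append(ch)

    return stack.pop()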


prefixes = ['+AB', '*+AB-CD', '*-A/BC-/AKL']
postfixes = ['AB+', 'AB+CD-*', 'ABC/-AK/L-*']

for pre, post in zip(prefixes, postfixes):
    print('--')
    pf = postfix(pre)
    print(pre, ' ==> ', post, ' | ', pf)
    assert post == pf


--
+AB  ==>  AB+  |  AB+
--
*+AB-CD  ==>  AB+CD-*  |  AB+CD-*
--
*-A/BC-/AKL  ==>  ABC/-AK/L-*  |  ABC/-AK/L-*


## Sets
A set is a mutable and unordered collection of items that doesn't allow duplicates.

Useful operations on sets:
* intersection
* union

In [11]:
set1 = {1, 2, 3, 4, 5, 6, 7}
set2 = {5, 6, 7, 8, 9, 10}

set1.add(11)

print('set1 = ', set1)
print('set2 = ', set2)
print('Union = ', set1.union(set2))
print('Intersection = ', set1.intersection(set2))

set1 =  {1, 2, 3, 4, 5, 6, 7, 11}
set2 =  {5, 6, 7, 8, 9, 10}
Union =  {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
Intersection =  {5, 6, 7}
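
The same operations are also available as operators. The sketch below compares them with the method calls and, as an addition not covered above, shows difference and symmetric difference. Results are sorted before printing because sets have no guaranteed display order.

```python
set1 = {1, 2, 3, 4, 5, 6, 7}
set2 = {5, 6, 7, 8, 9, 10}

# | and & are operator forms of union() and intersection()
print((set1 | set2) == set1.union(set2))         # True
print((set1 & set2) == set1.intersection(set2))  # True

# Difference and symmetric difference
only_in_set1 = set1 - set2
in_exactly_one = set1 ^ set2
print(sorted(only_in_set1))    # [1, 2, 3, 4]
print(sorted(in_exactly_one))  # [1, 2, 3, 4, 8, 9, 10]

# Duplicates collapse when a set is built from a list
unique = set([3, 1, 3, 2, 1])
print(sorted(unique))          # [1, 2, 3]
```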


### `frozenset`: an immutable set
`frozenset` is the immutable counterpart of `set`.

Because it is hashable, a `frozenset`, unlike a `set`, can be used as:
* dictionary keys
* elements within other sets
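
A minimal sketch of both uses. The `distances` data is made up for illustration.

```python
# A frozenset is hashable, so it can serve as a dictionary key
distances = {frozenset({'A', 'B'}): 2}       # Hypothetical data
print(distances[frozenset({'B', 'A'})])      # 2 (element order does not matter)

# ... and as an element of another set
groups = {frozenset({1, 2}), frozenset({3, 4})}
print(frozenset({2, 1}) in groups)           # True

# A regular set is unhashable and is rejected as a key
try:
    {set([1, 2]): 'value'}
except TypeError as error:
    print("TypeError:", error)
```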

## `Counter`: A High-Performance Container
`Counter` is a `dict` subclass for counting hashable objects. It implements the concept of a multiset, in which an element may occur more than once. As a `dict` subclass, it remembers insertion order from Python 3.7 on; earlier versions make no ordering guarantee.

The `Counter()` constructor accepts an iterable (e.g. a string or list) or a mapping (e.g. a `dict`).

In [12]:
from collections import Counter

c1 = Counter('Hello, my name is Kat! I love building software.')
c2 = Counter({'bananas': 10, 'apples': 5, 'yoghurt': 2})

print('Counts the characters in a string: \n', c1)
print('\nCounter created from a dict: \n', c2)

print('Access elements directly:')
print('c1[i] = ', c1['i'])
print('If an element does not exist, Counter returns 0:')
print('c2[pineapples] = ', c2['pineapples'])

Counts the characters in a string: 
 Counter({' ': 8, 'l': 4, 'e': 4, 'o': 3, 'i': 3, 'a': 3, 'm': 2, 'n': 2, 't': 2, 's': 2, 'b': 1, 'K': 1, 'f': 1, '!': 1, 'u': 1, 'g': 1, 'w': 1, 'y': 1, 'H': 1, 'v': 1, ',': 1, 'I': 1, 'r': 1, 'd': 1, '.': 1})

Counter created from a dict: 
 Counter({'bananas': 10, 'apples': 5, 'yoghurt': 2})
Access elements directly:
c1[i] =  3
If an element does not exist, Counter returns 0:
c2[pineapples] =  0


### Counter methods
* `update([iterable-or-mapping])`
  * adds counts from an iterable, or from another mapping or counter, to the existing counts
* `elements()`
  * returns an iterator that repeats each element as many times as its count
* `most_common([n])`
  * returns a list of the `n` most common elements and their counts
  * `n` is optional; if omitted, `most_common()` returns all elements
  * `most_common()[:-n-1:-1]` returns the `n` least common elements
* `subtract([iterable-or-mapping])`
  * subtracts the counts of an iterable or mapping from the existing counts; the results may be zero or negative

In [13]:
c = Counter({'bananas': 10, 'apples': 5, 'yoghurt': 2})

print('\n--- update([iterable-or-mapping]) ---')
print('before: ', c)
c.update(['bananas'])
c.update(['pineapples'])
print('after: ', c)

print('\n--- elements() ---')
fruits = []
for i in c.elements():
    fruits.append(i)
print(fruits)

print('\n--- most_common([n]) ---')
print(c.most_common(2))
print(c.most_common()[:-3:-1])

print('\n--- subtract([iterable-or-mapping]) ---')
x = Counter(a=4, b=2, c=0, d=-2)
y = Counter(a=1, b=2, c=3, d=4)
print('x = ', x)
print('y = ', y)
x.subtract(y)
print('x.subtract(y): ', x)


--- update([iterable-or-mapping]) ---
before:  Counter({'bananas': 10, 'apples': 5, 'yoghurt': 2})
after:  Counter({'bananas': 11, 'apples': 5, 'yoghurt': 2, 'pineapples': 1})

--- elements() ---
['apples', 'apples', 'apples', 'apples', 'apples', 'yoghurt', 'yoghurt', 'pineapples', 'bananas', 'bananas', 'bananas', 'bananas', 'bananas', 'bananas', 'bananas', 'bananas', 'bananas', 'bananas', 'bananas']

--- most_common([n]) ---
[('bananas', 11), ('apples', 5)]
[('pineapples', 1), ('yoghurt', 2)]

--- subtract([iterable-or-mapping]) ---
x =  Counter({'a': 4, 'b': 2, 'c': 0, 'd': -2})
y =  Counter({'d': 4, 'c': 3, 'b': 2, 'a': 1})
x.subtract(y):  Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})


### Summary `Counter`
`Counter` extends the idea of a `set`: it keeps track not only of the distinct values, but also of how many times each one occurs. Technically it is a `dict` subclass, not a `set` subclass.
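
The difference can be seen side by side in this small sketch; `letters` is an illustrative input.

```python
from collections import Counter

letters = ['a', 'b', 'a', 'c', 'a', 'b']

# A set only records which values are present
distinct = set(letters)
print(sorted(distinct))  # ['a', 'b', 'c']

# A Counter also records how often each value occurs
counts = Counter(letters)
print(counts['a'], counts['b'], counts['c'])  # 3 2 1

# Counter is a dict subclass
print(isinstance(counts, dict))  # True
```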

## Dictionaries
Dictionaries hold key:value pairs. Since Python 3.7 they preserve insertion order. Keys must be hashable, which in practice means immutable built-in types such as `str`, `int`, or tuples of hashable items.
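
A short sketch of the restriction on keys: a tuple works, while a list is rejected because it is mutable and therefore unhashable. The `grid` dictionary is illustrative.

```python
# Tuples of hashable items can be used as keys
grid = {(0, 0): 'origin', (1, 2): 'point'}
print(grid[(1, 2)])  # point

# Lists cannot be used as keys
try:
    grid[[3, 4]] = 'invalid'
except TypeError as error:
    print("TypeError:", error)

print(len(grid))  # 2
```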

### `defaultdict`
`defaultdict` is a subclass of `dict`, created with the `defaultdict([default_factory[, ...]])` constructor. The `default_factory` argument is a callable that supplies the value for a missing key the first time that key is looked up. By default it is `None`, in which case a missing key raises a `KeyError`, just as in a regular `dict`. Typical choices are `int`, `list`, and `set`.

In [14]:
from collections import defaultdict

s = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('b', 1), ('a', 4)]

d = defaultdict()   # No default_factory: behaves like a regular dict
for key, value in s:
    d[key] = value   # Later values overwrite earlier ones

print(d.items())

d = defaultdict(list)
for key, value in s:
    d[key].append(value)    # A missing key starts out as an empty list

print(d.items())

d = defaultdict(set)
for key, value in s:
    d[key].add(value)    # A missing key starts out as an empty set

print(d.items())

d = defaultdict(int)
for key, value in s:
    d[key] = value   # Plain assignment: the int default is never used

print(d.items())

d = defaultdict(dict)
for key, value in s:
    d[key] = {key: value}  # Plain assignment: the dict default is never used

print(d.items())

dict_items([('a', 4), ('b', 1), ('d', 4), ('c', 3)])
dict_items([('a', [1, 4]), ('b', [2, 1]), ('d', [4]), ('c', [3])])
dict_items([('a', {1, 4}), ('b', {1, 2}), ('d', {4}), ('c', {3})])
dict_items([('a', 4), ('b', 1), ('d', 4), ('c', 3)])
dict_items([('a', {'a': 4}), ('b', {'b': 1}), ('d', {'d': 4}), ('c', {'c': 3})])
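
The default only comes into play when a missing key is looked up. The sketch below isolates that behavior, including the case without a `default_factory`; the variable names are illustrative.

```python
from collections import defaultdict

counts = defaultdict(int)
counts['a'] += 1            # Missing key: int() supplies 0, then 1 is added
print(counts['a'])          # 1
print(counts['missing'])    # 0
print('missing' in counts)  # True (the lookup stored the default)

groups = defaultdict(list)
groups['x'].append(1)       # Missing key: list() supplies []
print(groups['x'])          # [1]

no_factory = defaultdict()  # default_factory is None
try:
    no_factory['a']
except KeyError as error:
    print("KeyError:", error)
```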


### `OrderedDict`
`OrderedDict` is a subclass of `dict` that returns items in the order in which their keys were inserted. Changing the value of an existing key does not change its position.

In [15]:
from collections import OrderedDict

d = OrderedDict()   

# Inserting elements
d['a'] = 1
d['b'] = 2
d['c'] = 3
print(d)

# Changing a value
d['a'] = 5
print(d)

OrderedDict([('a', 1), ('b', 2), ('c', 3)])
OrderedDict([('a', 5), ('b', 2), ('c', 3)])
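
A key only changes position when it is removed and inserted again, or when it is moved explicitly with `move_to_end()`. A brief sketch:

```python
from collections import OrderedDict

d = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

# Updating a value keeps the key in place
d['a'] = 5
print(list(d))  # ['a', 'b', 'c']

# Deleting and re-inserting moves the key to the end
del d['a']
d['a'] = 5
print(list(d))  # ['b', 'c', 'a']

# move_to_end() repositions a key without deleting it
d.move_to_end('b')
print(list(d))  # ['c', 'a', 'b']
```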


#### `dict` + `sorted`
`sorted` can put a dictionary's items into a new order, for example by key or by value. Wrapping the result in an `OrderedDict` keeps that order.

In [16]:
d['z'] = 4
d['g'] = 5
print(d)

# Sort in ascending order using keys
print(OrderedDict(sorted(d.items(), key=lambda t: t[0])))

# Sort in ascending order using values
print(OrderedDict(sorted(d.items(), key=lambda t: t[1])))


OrderedDict([('a', 5), ('b', 2), ('c', 3), ('z', 4), ('g', 5)])
OrderedDict([('a', 5), ('b', 2), ('c', 3), ('g', 5), ('z', 4)])
OrderedDict([('b', 2), ('c', 3), ('z', 4), ('a', 5), ('g', 5)])
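
Two further orderings, sketched under the same setup: descending by value with `reverse=True`, and ascending by value with the key as a tie-breaker. Using `operator.itemgetter` instead of a `lambda` is a stylistic choice here, not something the cells above require.

```python
from collections import OrderedDict
from operator import itemgetter

d = OrderedDict([('a', 5), ('b', 2), ('c', 3), ('z', 4), ('g', 5)])

# Descending by value; sorted() is stable, so ties keep their original order
by_value_desc = OrderedDict(sorted(d.items(), key=itemgetter(1), reverse=True))
print(by_value_desc)

# Ascending by value, then by key to break ties
by_value_then_key = OrderedDict(sorted(d.items(), key=itemgetter(1, 0)))
print(by_value_then_key)
```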


## Challenge: Count with Dictionary

Write a function that counts the occurrences of each character in a message using `defaultdict`. Then, sort the items by value using `OrderedDict`.

Input:
* String

Output:
* `OrderedDict`

Example:
* Input
  * `'Welcome to Educative'`
* Output
  * `OrderedDict([('d', 1), ('a', 1), ('i', 1), ('l', 1), ('v', 1), ('W', 1), ('m', 1), ('u', 1), ('E', 1), ('c', 2), ('t', 2), (' ', 2), ('o', 2), ('e', 3)])`

In [17]:
from collections import defaultdict, OrderedDict
s = "New York, New York. I want to wake up, in a city that doesn't sleep. And find I'm queen of the hill, top of the heap"

def count_letters(message):
    d = defaultdict(int)
    for ch in message:
        d[ch] += 1
    return OrderedDict(sorted(d.items(), key=lambda item: item[1]))
    
print(count_letters(s))
print(count_letters('Welcome to Educative'))

OrderedDict([('A', 1), ('m', 1), ('c', 1), ('y', 1), ('q', 1), ('Y', 2), ("'", 2), ('u', 2), ('s', 2), ('I', 2), ('r', 2), ('N', 2), ('.', 2), ('f', 3), ('l', 3), (',', 3), ('k', 3), ('d', 3), ('i', 4), ('p', 4), ('w', 4), ('h', 5), ('a', 5), ('n', 6), ('o', 7), ('t', 9), ('e', 11), (' ', 25)])
OrderedDict([('i', 1), ('v', 1), ('E', 1), ('a', 1), ('m', 1), ('u', 1), ('W', 1), ('l', 1), ('d', 1), (' ', 2), ('o', 2), ('t', 2), ('c', 2), ('e', 3)])


By using `defaultdict(int)`, when a letter is first encountered, `default_factory` calls `int()` to supply a default count of zero.
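
For comparison, the counting step could also be delegated to `Counter`, which was introduced earlier. This is an alternative sketch, not the approach the challenge asks for; the function name `count_letters_with_counter` is hypothetical.

```python
from collections import Counter, OrderedDict

def count_letters_with_counter(message):
    # Counter does the counting that defaultdict(int) did by hand
    counts = Counter(message)
    return OrderedDict(sorted(counts.items(), key=lambda item: item[1]))

result = count_letters_with_counter('Welcome to Educative')
print(result['e'])       # 3
print(list(result)[-1])  # e (the most frequent character comes last)
```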