# Dictionary

A dictionary is a data structure whose items are `key: value` pairs; the keys are unique, that is, they form a set.

In [1]:
d0 = {}   #  an empty dictionary
d1 = {'one':1, '2':'two', 3:'3', 2:'two'}  # a dictionary with 4 items
print(d1)

d1[4] = '****'  # add a new item
d1[3] = '4-1' # modifies the value of 3
d1[2] = ('2', 'two') # modifies the value of 2

print( d1['one'], d1[2], d1['2'] )

# print(d1['five'])  # would raise KeyError: 'five' is not a key

{'one': 1, '2': 'two', 3: '3', 2: 'two'}
1 ('2', 'two') two
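
Looking up a key that is not in the dictionary raises `KeyError`. A minimal sketch of two common ways to guard a lookup is shown below: testing with `in` first, or calling the standard `dict.get` method with a default value. The dictionary `prices` and its contents are made up for this illustration.

```python
# Hypothetical data, used only for this illustration.
prices = {'apple': 3, 'pear': 5}

# Guard the lookup with `in` to avoid a KeyError.
if 'kiwi' in prices:
    kiwi_price = prices['kiwi']
else:
    kiwi_price = None

# dict.get returns the default (here 0) when the key is missing.
pear_price = prices.get('pear', 0)
plum_price = prices.get('plum', 0)

# A direct lookup of a missing key raises KeyError.
try:
    prices['plum']
    raised = False
except KeyError:
    raised = True

print(kiwi_price, pear_price, plum_price, raised)  # None 5 0 True
```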


### Operations

The `in` operator checks whether an object is a key of the dictionary. Its cost can be assumed constant on average.

In [2]:
print( 'one' in d1)

True


Indexing returns the value associated with a given key. Its cost can also be assumed constant on average.

In [3]:
print(d1[2])

('2', 'two')


The function `len` can be used to get the number of items (pairs) in the dictionary. The methods `keys` and `values` return views of the keys and of the values of the dictionary (they can be converted to lists with `list`).

In [4]:
print(len(d1))
print(d1.keys())
print(d1.values())

5
dict_keys(['one', '2', 3, 2, 4])
dict_values([1, 'two', '4-1', ('2', 'two'), '****'])
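
In Python 3, `keys` and `values` return view objects rather than independent lists, so they reflect later changes to the dictionary. The sketch below illustrates this, together with the standard `items` method, which yields the `(key, value)` pairs. The dictionary `ages` is a made-up example.

```python
# Hypothetical dictionary, used only for this illustration.
ages = {'ann': 31, 'bob': 27}

keys_view = ages.keys()   # a view on the keys, not a copy
ages['cid'] = 45          # add an item after creating the view

print(list(keys_view))    # the view already contains 'cid'

# items() yields (key, value) pairs in insertion order.
pairs = [(k, v) for k, v in ages.items()]
print(pairs)
```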


### Improving efficiency with dictionaries: the intersection problem

If we put the items of the two lists in two dictionaries, each `in` test costs constant time (on average) instead of linear time.
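
For contrast, a baseline that uses only lists might look like the sketch below. The name `intersection_lists` is hypothetical and is not part of the original notes; it is meant only to show where the linear-cost `in` tests come from. With `n = len(a)` and `m = len(b)`, each iteration scans `b` (and the partial result `c`), so the total time is `O(n*m)`.

```python
def intersection_lists(a, b):
    # Hypothetical list-only baseline, for comparison purposes.
    c = []
    for x in a:                      # n iterations
        # `in` on a list is a linear scan: O(m) for b, O(len(c)) for c
        if x in b and x not in c:
            c.append(x)
    return c

a = [4, 2, 9, 0, 2, 8, 3, 4]
b = [3, 1, 9, 10, 7, 8, 3]
print(intersection_lists(a, b))      # prints [9, 8, 3]
```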

In [5]:
def intersection( a, b ):
    '''
    Parameters
    ----------
    a, b : lists of items

    Returns
    -------
    a list c that contains the intersection of a and b (no repetitions!)

    '''
    
    n, m = len(a), len(b) # O(1) time
    c = [] #  the output, the intersection between a and b
    
    d_b = {}
    for x in b: # O(m)
        d_b[x] = None
    d_a = {}
    for x in a: # O(n)
        d_a[x] = None
    
    for x in d_a: # for n times
        if (x in d_b): # O(1)
            c.append(x) # O(1)
            
    return c

a = [4, 2, 9, 0, 2, 8, 3, 4]
b = [3, 1, 9, 10, 7, 8, 3]

c = intersection(a, b)
print(c)

[9, 8, 3]


The time complexity is `O(n+m)`. The space complexity is also `O(n+m)`, because of the size of `d_a` and `d_b`.

The next version of the function takes as input two dictionaries that encode sets. In this case, it is worthwhile to optimize the main loop by iterating over the smaller dictionary.

In [6]:
def intersection(s0, s1):
    '''
    Input: s0, s1 are dictionaries that implement sets.
        The items of the set implemented by s0 are the keys
        of s0
        
    Returns: the dictionary that describes the intersection set
    '''
    # n = len(s0), m = len(s1)
    
    s = {}   # O(1) in a.c. 
    
    # d0 is an alias of the smaller dictionary, d1 of the larger one
    d0, d1 = (s0, s1) if len(s0) <= len(s1) else (s1, s0)
    
    for x in d0: # for min(n, m)
        if x in d1: #  O(1) in a.c.
            s[x] = None # O(1) in a.c.
              
    return s

    # Time complexity O( min(n, m) )

Similarly, we can implement a `union` function.

In [7]:
def union(s0, s1):
    '''
    Input: s0, s1 are dictionaries that implement sets.
        The items of the set implemented by s0 are the keys
        of s0
        
    Returns: the dictionary that describes the union set
    '''
    u = {}
    
    # n, m = len(s0), len(s1)
    
    for x in s0:
        u[x] = None
    for x in s1:
        u[x] = None
        
    # Time complexity is O(n+m) in a.c.
    return u

In [8]:
a = { 'one': None, 'two': None, 'zero': None, 'three':None }
b = { 'five': None, 'two': None, 'zero': None, 'nine':None }

print(intersection(a, b))
print(union(a, b))

{'two': None, 'zero': None}
{'one': None, 'two': None, 'zero': None, 'three': None, 'five': None, 'nine': None}
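
For comparison, Python also provides a built-in `set` type whose `&` and `|` operators compute intersection and union directly. The sketch below repeats the example above with sets instead of dictionaries with `None` values; it is offered as an alternative formulation, not as the approach used in these notes.

```python
a = {'one', 'two', 'zero', 'three'}
b = {'five', 'two', 'zero', 'nine'}

common = a & b      # intersection, same as a.intersection(b)
either = a | b      # union, same as a.union(b)

# Sets are unordered, so sort the items to get a stable printout.
print(sorted(common))   # ['two', 'zero']
print(sorted(either))   # ['five', 'nine', 'one', 'three', 'two', 'zero']
```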


# Problem

Given two lists sorted in increasing order, merge them into a new sorted list.

In [9]:
a = [2, 5, 7, 10, 13, 13, 32]
b = [0, 2, 4, 5, 10, 12, 21, 34, 50, 51, 90]

c = sorted(a+b)

Let `n` be the sum of the sizes of `a` and `b`. Sorting the concatenation with a general comparison sort requires `O(n log n)` time in the worst case, and this solution does not exploit the hypothesis that `a` and `b` are already sorted.

A more efficient solution scans `a` and `b` starting from their smallest items and appends to the output list the minimum between the current item of `a` and the current item of `b`. Every time the algorithm takes the minimum from a list, the current item of that list is advanced to the next one.

This part of the algorithm ends when one of the two lists has been completely consumed. The algorithm then finishes by appending the remaining elements of the other list to the end of the output list.

At each iteration one item is consumed from `a` or from `b`, so the first part of the algorithm ends after `O(n)` steps. The cost of the second part is also `O(n)`, because at most `n` items are appended.

In [10]:
a = [3, 4, 6, 8, 9, 10,100,200]
b = [1, 2, 4, 4, 7, 9, 10, 20, 22, 22, 25]

c = []

i, j = 0, 0
na, nb = len(a), len(b)
while i < na and j < nb:
    if a[i] <= b[j]: # time compl O(1) for the block
        c.append(a[i])
        i += 1
    else:
        c.append(b[j])
        j += 1

if j < nb: # and so i == na
    c.extend(b[j:])
else: # i < na
    c.extend(a[i:])

The `extend` method of lists appends to the list all the items of another list.

The auxiliary space is constant if we do not count the input lists, the output list, and the temporary slice passed to `extend`: no other data structure grows as a function of `len(a)+len(b)`.
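
The same procedure can be packaged as a reusable function. The sketch below is one possible way to do it; the name `merge_sorted` is hypothetical, and the final check against `sorted(a + b)` is added only to confirm that the two approaches agree on this input.

```python
def merge_sorted(a, b):
    # Merge two lists sorted in increasing order into a new sorted list.
    c = []
    i, j = 0, 0
    na, nb = len(a), len(b)
    while i < na and j < nb:
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # At most one of the two slices below is non-empty.
    c.extend(a[i:])
    c.extend(b[j:])
    return c

a = [3, 4, 6, 8, 9, 10, 100, 200]
b = [1, 2, 4, 4, 7, 9, 10, 20, 22, 22, 25]
merged = merge_sorted(a, b)
print(merged == sorted(a + b))   # True
```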

# The merge sort algorithm

Two sorted lists can be merged into a new sorted list with the following *merge* algorithm.

```python
a = [3, 4, 6, 8, 9, 10,100,200]
b = [1, 2, 4, 4, 7, 9, 10, 20, 22, 22, 25]

c = []

i, j = 0, 0
na, nb = len(a), len(b)
while i < na and j < nb:
    if a[i] <= b[j]: # time compl O(1) for the block
        c.append(a[i])
        i += 1
    else:
        c.append(b[j])
        j += 1

if j < nb: # and so i == na
    c.extend(b[j:])
else: # i < na
    c.extend(a[i:])
```

The time complexity is `O( len(a)+len(b) )`. This algorithm is used to design an efficient recursive sorting algorithm called **merge sort**.

## The sorting algorithm

Let `a` be a sequence of `n` items (numbers); for simplicity, assume that `n` is a power of 2. Each single item is a sorted sequence of size one, so we can run the `merge` algorithm on each of the `n/2` pairs of consecutive items. After this first step, named step 0, the sequence consists of `n/2` sorted sublists of size 2. At step 1, consecutive sublists of size 2 are merged: the result is `n/4` sorted sublists of size 4. In a general step `h` we start from `n/(2**h)` sorted sublists of size `2**h`; consecutive sublists are merged in pairs, obtaining `n/(2**(h+1))` sorted sublists of size `2**(h+1)`. Proceeding in this way, at the last step `t` we start from

    2 = n/(2**t)

sublists of size `n/2`, which are merged into the final sorted list of size `n`. Observe that the equation above gives

    2**(t+1) = n

that is, `t = log2(n) - 1`, so the number of steps is `O(log n)`.
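
The step-by-step procedure described above can be sketched iteratively. The code below is an illustrative sketch, not the implementation used later in these notes: the names `merge_runs` and `bottom_up_merge_sort` are hypothetical, and carrying an unpaired sublist over to the next step (when the number of sublists is odd) is an assumption added so that the sketch also works when `n` is not a power of 2.

```python
def merge_runs(x, y):
    # Merge two sorted lists into a new sorted list.
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    out.extend(x[i:])
    out.extend(y[j:])
    return out

def bottom_up_merge_sort(a):
    # Start from n sorted sublists of size one.
    runs = [[x] for x in a]
    steps = 0
    while len(runs) > 1:
        # Merge consecutive sublists in pairs.
        merged = [merge_runs(runs[k], runs[k + 1])
                  for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2 == 1:
            merged.append(runs[-1])   # unpaired sublist, carried over
        runs = merged
        steps += 1
    return (runs[0] if runs else []), steps

data = [7, 3, 8, 1, 6, 2, 5, 4]
result, steps = bottom_up_merge_sort(data)
print(result, steps)   # [1, 2, 3, 4, 5, 6, 7, 8] 3
```

With `n = 8` the sketch performs 3 steps (steps 0, 1 and 2), in agreement with `2**(t+1) = n` for `t = 2`.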

### Computational cost

At step `h`, `n/(2**(h+1))` merge operations are performed, each on a pair of segments of total size `2**(h+1)`. The cost of a single merge is `O(2**(h+1))`, so the cost of all the merges at step `h` is `O(n)`. This holds for each of the `O(log n)` steps, which implies that the time complexity of the algorithm is `O(n log n)`.
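
One way to check this count empirically is to instrument a merge sort so that it reports how many items all the merges write in total. The sketch below is a hypothetical helper (`merge_sort_count` is not part of the original notes) that returns a sorted copy instead of sorting in place. For `n = 2**k` every level of merging writes `n` items and there are `k` levels, so the counter is expected to equal `n * k`, that is, `n * log2(n)`.

```python
def merge_sort_count(a):
    # Returns (sorted copy of a, number of items written by all merges).
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, moved_left = merge_sort_count(a[:mid])
    right, moved_right = merge_sort_count(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    # This merge wrote len(out) items.
    return out, moved_left + moved_right + len(out)

for k in range(1, 6):
    n = 2 ** k
    data = list(range(n, 0, -1))      # n items in decreasing order
    _, moved = merge_sort_count(data)
    print(n, moved, n * k)            # the last two columns coincide
```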

### Implementation

We need a modified version of the `merge` function that works on two consecutive segments of the same list. It modifies the input list: at the end, the contents of the working list `c` are copied back into `a`.

In [11]:
def merge(a, lx, cx, rx):
    '''
    Parameters
    ----------
    a : is a list of items that can be compared with <
    lx, cx, rx : indexes of a, with lx < cx < rx
    
    a[lx:cx] is sorted
    a[cx:rx] is sorted
    
    Returns: None
    Modifies a: a[lx:rx] will be sorted
    '''
    c = [] 
    
    i, j = lx, cx

    while i < cx and j < rx:
        if a[i] <= a[j]: 
            c.append(a[i])
            i += 1
        else:
            c.append(a[j])
            j += 1
    
    if j < rx: 
        c.extend(a[j:rx])
    else: 
        c.extend(a[i:cx])
        
    for i in range(len(c)):
        a[lx+i] = c[i]
        
    # time complexity O(rx-lx)
    # space complexity O(rx-lx) because of c
    
def merge_sort(a, lx=0, rx=None):
    '''
    Parameters
    ----------
    a : is a list of items that can be compared with <
    lx, rx : int

    Returns
    -------
    None.

    a[lx:rx] will be sorted
    '''
    if rx is None:
        rx = len(a)
    
    if lx < rx-1:
        cx = (lx+rx)//2
        merge_sort(a, lx, cx)
        merge_sort(a, cx, rx)
        merge(a, lx, cx, rx)
            

a = [2, 4, 3, 2, 6, 7, 5, 6, 7, 8, 9,1, 2, 3]
merge_sort(a)
print(a)

[1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 7, 7, 8, 9]


The space complexity is given by the memory used by the frames of the recursive calls on the stack plus the additional memory used by the function `merge`. The number of frames on the stack is `O(log n)`, one for each level of recursion. The `merge` function uses at most `O(n)` additional memory, so the overall space complexity is `O(n)`.