## 17_05 Find the majority element

You are reading a sequence of strings. You know a priori that more than half of the strings are repetitions of a single string (the "majority element"), but the positions where the majority element occurs are unknown. Write a program that makes a single pass over the sequence and identifies the majority element. For example, if the input is (b, a, c, a, a, b, a, a, c, a), then a is the majority element (it appears in 6 of the 10 positions).

### Hint
Take advantage of the existence of the majority element to perform elimination.

### Initial Remarks
The chapter is Greedy Algorithms and Invariants, so I think I can try something greedy. If I know the length of the sequence, then I know how many occurrences it takes to constitute a majority. As I read the sequence in order, I can tell whether or not a given string could still be the majority element. For instance, in a sequence of ten the majority element occurs at least six times, so at most four strings are anything else: by the fifth string I am guaranteed to have seen the majority element at least once, and by the sixth at least twice. If I keep a count of how many times I see each element, I should be able to start eliminating candidates around the n/2 mark, where n is the length of the sequence.
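
The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `min_majority_seen` is hypothetical, and it assumes the majority element occupies strictly more than half of the `n` positions, so that at most `(n - 1) // 2` positions hold anything else.

```python
def min_majority_seen(n, i):
    """Lower bound on how many of the first i strings are the majority element.

    Assumes the majority element fills more than half of the n positions,
    so at most (n - 1) // 2 positions can hold other strings.
    """
    max_others = (n - 1) // 2
    return max(0, i - max_others)


# For a sequence of ten strings, print the guaranteed count after each prefix.
for i in range(1, 11):
    print(i, min_majority_seen(10, i))
```

For `n = 10` the bound first becomes positive at the fifth string and reaches 6 at the tenth.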



In [4]:
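import math
from collections import defaultdict
def solution_1(seq):
    print("Sequence is : {}".format(seq))
    # More than half of the strings are the majority element, so it occurs at
    # least ceil(n / 2) times, and no other string can occur that often.
    majority = math.ceil(len(seq) / 2)
    print("Majority element occurs at least {} times".format(majority))
    # Every new key starts at the threshold and counts down towards zero.
    d = defaultdict(lambda: majority)
    for i, key in enumerate(seq):
        d[key] -= 1
        print(d)
        if d[key] == 0:
            # Only the majority element can be seen `majority` times.
            print("Found majority element {} at index {}".format(key, i))
            return key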

sequences = [
    ["b", "a", "c", "a", "a", "b", "a", "a", "c", "a"],
    ["a", "a", "a", "a", "b", "b", "b"],
    ["a"],
    ["b", "b", "b", "a", "a", "a", "a"],
    ["b", "a", "a"]
    
]
for sequence in sequences:
    %time solution_1(sequence)

Sequence is : ['b', 'a', 'c', 'a', 'a', 'b', 'a', 'a', 'c', 'a']
Majority element occurs at least 5 times
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 4})
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 4, 'a': 4})
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 4, 'a': 4, 'c': 4})
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 4, 'a': 3, 'c': 4})
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 4, 'a': 2, 'c': 4})
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 3, 'a': 2, 'c': 4})
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 3, 'a': 1, 'c': 4})
defaultdict(<function solution_1.<locals>.<lambda> at 0x7f576c08d1e0>, {'b': 3, 'a': 0, 'c': 4})
Found majority element a at index 7
CPU times: user 848 µs, sys: 97 µs, total: 945 µs
Wall time: 3.14 ms
Sequence is : ['a', 'a', 'a', 'a', 'b

### Concluding Remarks

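Since I know that the majority element occurs at least ceil(n/2) times, and no other string can occur that often, I use a default dictionary in which every key starts at that value. Each time I encounter a key, I subtract 1 from its value. The first key whose value reaches zero is the majority element. Note that this relies on knowing the length n of the sequence before the pass begins.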

Time complexity: $$ O(n) $$

Additional space complexity: $$ O(k) $$, where $$ k $$ is the number of distinct strings in the sequence.
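
The hint's elimination idea can also be carried out with a single candidate and a counter, which needs neither the sequence length nor a dictionary. The sketch below follows the well-known Boyer-Moore majority vote algorithm rather than the dictionary approach above. The function name `majority_by_elimination` is hypothetical, and the result is only meaningful under the problem's guarantee that a majority element exists.

```python
def majority_by_elimination(seq):
    """Return the majority element of seq, assuming one exists."""
    candidate = None
    count = 0
    for item in seq:
        if count == 0:
            # Everything seen so far has cancelled out; start over with this item.
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            # Pair one occurrence of the candidate with a different string
            # and discard both.
            count -= 1
    return candidate


print(majority_by_elimination(["b", "a", "c", "a", "a", "b", "a", "a", "c", "a"]))
```

Discarding a pair of different strings removes at most one occurrence of the majority element, so it is still the majority of whatever remains. This variant takes $$ O(n) $$ time and $$ O(1) $$ additional space, and it works on any iterable because it never asks for the length.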