# Finding Duplicates

You are given an array of length n + 1 whose elements belong to the set {1, 2, ..., n}. By the pigeonhole principle, there must be a duplicate. Find it in linear time and space.

In [1]:
t_set = [1,2,3,4,5,6,6,7,8,9] # 6 repeats

In [2]:
def find_dup(array): 
    array = sorted(array)
    for index, value in enumerate(array): 
        print(index, value)

In [15]:
# find_dup(t_set) # just reviewing the format of enumerate

In [16]:
def find_dup(array): 
    array = sorted(array) 
    for index, value in enumerate(array): 
        if array[index] == array[index+1]: 
            return value

In [17]:
find_dup(t_set)

6

In [18]:
# what if the duplicate sits at the end of the list?
# nothing goes wrong: the function returns at the second-to-last index,
# before index+1 can run past the end of the list
t_set1 = [1,2,3,4,5,5] # 5 repeats

In [19]:
find_dup(t_set1)

5

In [20]:
# what if there is no duplicate? 
t_set2 = [1,2,3,4,5,6,7,8] 

In [21]:
# find_dup(t_set2)
# this raises "IndexError: list index out of range"
# with no duplicate, the loop reaches the last element and array[index+1] reads past the end
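
One way to avoid the out-of-range lookup is to compare adjacent pairs with `zip`, which stops at the shorter sequence. This is only a sketch: the name `find_dup_safe` and the choice to return `None` when nothing repeats are assumptions, not part of the original cells. Because it still sorts, it runs in O(n log n) time rather than linear time.

```python
def find_dup_safe(array):
    """Return a repeated value from array, or None if every value is unique."""
    ordered = sorted(array)  # sorting puts equal values next to each other
    # zip pairs each element with its successor and stops before the end,
    # so there is no index+1 lookup that could go out of range
    for left, right in zip(ordered, ordered[1:]):
        if left == right:
            return left
    return None  # assumption: signal "no duplicate" with None instead of raising


print(find_dup_safe([1, 2, 3, 4, 5, 6, 6, 7, 8, 9]))  # 6
print(find_dup_safe([1, 2, 3, 4, 5, 6, 7, 8]))        # None
```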

## Smarter solution

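- One close to what we were doing: swap each value into the slot that matches its index until two equal values collide
- One that is much smarter: sum the array and subtract 1 + 2 + ... + n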

In [22]:
def duplicate(lst): 
    i = 0 
    while i < len(lst): 
        if lst[i] != i: 
            j = lst[i] 
            if lst[j] == lst[i]: 
                return j
            lst[i], lst[j] = lst[j], lst[i] 
        else: 
            i += 1
    raise IndexError('Malformed input.') 
    
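# ok this is similar to what we had going for us: it lines values up with positions,
# but it swaps in place instead of sorting, so it is linear time (and it mutates lst)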
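
The swap version reorders the list it is given. If the caller still needs the original order, the same loop can run on a copy. The sketch below assumes that is acceptable at the cost of linear extra space; the name `duplicate_on_copy` and the `ValueError` are illustrative choices, not from the original cell.

```python
def duplicate_on_copy(lst):
    """Swap-based search for a duplicate, run on a copy of lst."""
    work = list(lst)  # copy, so the caller's list is left untouched
    i = 0
    while i < len(work):
        if work[i] != i:
            j = work[i]  # the slot where this value belongs
            if work[j] == work[i]:
                return j  # slot j already holds the same value: duplicate
            work[i], work[j] = work[j], work[i]  # move the value home
        else:
            i += 1
    raise ValueError('No duplicate found.')


data = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
print(duplicate_on_copy(data))  # 6
print(data)                     # [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]
```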

In [23]:
def duplicate(lst): 
    n = len(lst) - 1
    return sum(lst) - (n*(n+1)//2)

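# the idea: sum the array, then subtract the sum of 1 to n, which is n*(n+1)/2.
# what is left over is the duplicate!
# caveat: this only works when exactly one value appears twice and every other value in 1..n appears once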
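
The sum trick breaks down when a value repeats more than twice or some values in 1..n are missing. A set of already-seen values does not depend on that assumption and still runs in linear time and linear space, which matches the bound in the problem statement. This is a minimal sketch; `find_dup_seen` is a hypothetical name and returning `None` for "no duplicate" is an assumption.

```python
def find_dup_seen(array):
    """Return the first value that appears a second time, or None."""
    seen = set()
    for value in array:
        if value in seen:  # average O(1) membership test
            return value
        seen.add(value)
    return None  # assumption: None means no value repeated


print(find_dup_seen([1, 2, 3, 4, 5, 6, 6, 7, 8, 9]))  # 6
print(find_dup_seen([2, 2, 2, 1]))                    # 2
```

For `[2, 2, 2, 1]` the sum version computes 7 - 6 = 1, which is not the repeated value, so the two approaches only agree on inputs that satisfy the one-value-twice assumption.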