# Longest Distinct Subarray

Given an array of elements, return the length of the longest contiguous subarray whose elements are all distinct.

For example, given the array [5, 1, 3, 5, 2, 3, 4, 1], return 5, since the longest subarray of distinct elements is [5, 2, 3, 4, 1].


In [1]:
# naive solution

def is_distinct(arr): 
    d = {} 
    for e in arr: 
        if e in d: 
            return False
        d[e]= True
        
    return True

def distinct_subarray(arr): 
    max_distinct_subarray = [] 
    for i in range(len(arr)): 
        for j in range(i + 1, len(arr) + 1): 
            subarray = arr[i:j] 
            if is_distinct(subarray) and len(subarray) > len(max_distinct_subarray): 
                max_distinct_subarray = subarray
    return len(max_distinct_subarray) 

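## $O(n^3)$ time and $O(n)$ space... 

- We need to generate $O(n^2)$ subarrays, one for each pair of start and end indices
- Checking each subarray for distinctness means iterating over it, and a subarray can be up to $O(n)$ in length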
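
To make the $O(n^2)$ count concrete, here is a minimal sketch that counts the `(i, j)` pairs the naive double loop visits and compares the total with the closed form $n(n+1)/2$. The helper name `count_subarrays` is hypothetical and only used for illustration.

```python
def count_subarrays(n):
    """Count the (i, j) pairs the naive double loop visits for an array of length n."""
    count = 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            count += 1
    return count

n = 8  # length of the example array [5, 1, 3, 5, 2, 3, 4, 1]
print(count_subarrays(n), n * (n + 1) // 2)  # prints: 36 36
```

Each of those subarrays is then sliced and scanned, which is where the extra factor of $n$ comes from.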

## Obviously this is bad... how can we be faster? 

We can keep track of the index at which each element most recently occurred, along with the start of the current run of distinct elements. **This is an important principle of keeping track of two different figures while moving through an array or piece of data.** 

When we look at the element `e` at the next index `i`, there are two cases for the distinct subarray ending at `i`. First case: `e` is not in the dictionary, or its last occurrence `d[e]` lies before the start of the current run; then the run simply extends to include `e`. Second case: `d[e]` falls inside the current run; then the run ending at index `i - 1` is a candidate for the longest, and the new run starts at `d[e] + 1`. 

In [3]:
def distinct_subarray(arr): 
    d = {} # most recent occurrences of each element
    
    result = 0 
    longest_distinct_subarray_start_index = 0 
    for i, e in enumerate(arr): 
        if e in d: 
            # if d[e] falls inside the current run of distinct elements, record the
            # run's length and restart the run just after the previous occurrence of e
            if d[e] >= longest_distinct_subarray_start_index: 
                result = max(result, i - longest_distinct_subarray_start_index)
                longest_distinct_subarray_start_index = d[e] + 1
        d[e] = i 
        
    return max(result, len(arr) - longest_distinct_subarray_start_index) 
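
As a hedged variation on the same idea, the sketch below records the bounds of the best window instead of only its length, so the subarray itself can be recovered. The names `longest_distinct_window`, `last_seen`, and `window_start` are hypothetical and not part of the code above; the best window is updated on every step, so no final `max` is needed.

```python
def longest_distinct_window(arr):
    """Return (start, end) such that arr[start:end] is a longest distinct subarray."""
    last_seen = {}      # most recent index of each element
    window_start = 0    # start of the current run of distinct elements
    best_start, best_end = 0, 0
    for i, e in enumerate(arr):
        # A repeat only matters if the earlier copy lies inside the current run.
        if e in last_seen and last_seen[e] >= window_start:
            window_start = last_seen[e] + 1
        last_seen[e] = i
        # The current run is arr[window_start:i + 1]; keep it if it is strictly longer.
        if i + 1 - window_start > best_end - best_start:
            best_start, best_end = window_start, i + 1
    return best_start, best_end

arr = [5, 1, 3, 5, 2, 3, 4, 1]
start, end = longest_distinct_window(arr)
print(arr[start:end], end - start)  # prints: [5, 2, 3, 4, 1] 5
```

Because the comparison is strict, ties resolve to the earliest longest window.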

## Much more efficient...

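This runs in $O(n)$ time, since each element is visited once and dictionary lookups take constant time on average. Space is $O(n)$ in the worst case, because the dictionary holds one entry per distinct element. It reflects an important principle of **keeping track of values while moving through the array, instead of always starting back at the beginning and repeating wasteful calculations.**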
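
One way to gain some confidence in the single-pass approach is to compare it against a simpler reference on random inputs. The sketch below is self-contained and uses hypothetical names: `brute_force_length` is a set-based $O(n^2)$ reference (not the naive cell above), and `linear_length` restates the single-pass idea.

```python
import random

def brute_force_length(arr):
    """Reference: extend a run from every start index until a repeat appears."""
    best = 0
    for i in range(len(arr)):
        seen = set()
        for e in arr[i:]:
            if e in seen:
                break
            seen.add(e)
        best = max(best, len(seen))
    return best

def linear_length(arr):
    """Single pass: remember last positions and the start of the current run."""
    last_seen = {}
    start = 0
    best = 0
    for i, e in enumerate(arr):
        if e in last_seen and last_seen[e] >= start:
            start = last_seen[e] + 1
        last_seen[e] = i
        best = max(best, i - start + 1)
    return best

random.seed(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    # Small value range and short arrays so that repeats are frequent.
    arr = [random.randint(0, 5) for _ in range(random.randint(0, 12))]
    assert brute_force_length(arr) == linear_length(arr)
```

A passing run is evidence rather than proof, but it exercises the edge cases (empty arrays, all-equal arrays, repeats outside the current run) cheaply.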