Given a string, find the length of the longest substring that has no repeating characters.

Brute force: the "unique" keyword is a hint to think of `set()`.<br>
Generate all possible substrings and take the maximum length among those whose characters are all unique.

In [1]:
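# O(n^3) time | O(n) space for each slice (its set holds at most k distinct chars)
def longestUniqueSubstring(string):
    maxlength = 0
    for i in range(len(string)):
        # start j at i so that empty slices (j < i) are never examined
        for j in range(i, len(string)):
            substring = string[i:j+1]
            # all chars are unique when converting to a set drops nothing
            if len(substring) == len(set(substring)):
                maxlength = max(maxlength, len(substring))
    return maxlength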

- The brute force re-checks every substring for duplicates from scratch, which is unnecessary. If a substring $s_{ij}$, from index `i` to `j-1`, is already known to have no duplicates, we only need to check whether `s[j]` is already in that substring.

- Checking whether a char is already in the current substring by scanning the whole substring costs O(n) per check, leading to O(n^2) time overall.

- We can do better by using a hashmap as a sliding window, so that checking whether a char is already in the current substring takes O(1) time.
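The scanning idea from the second bullet can be sketched as follows. This is a minimal illustration, not one of the notebook's cells: the name `longest_unique_by_scanning` is hypothetical, and it assumes the window only ever moves forward, so each new char triggers one linear scan of the current window (here via `str.find` with start and end bounds).

```python
def longest_unique_by_scanning(s):
    """Length of the longest substring of s without repeated chars.

    Hypothetical helper: keeps a duplicate-free window s[start:end] and
    scans it linearly for each new char, so O(n^2) time in the worst case.
    """
    max_length = 0
    start = 0
    for end, char in enumerate(s):
        # str.find searches only s[start:end]; it returns -1 when char is absent
        duplicate_at = s.find(char, start, end)
        if duplicate_at != -1:
            # move the window start just past the earlier occurrence
            start = duplicate_at + 1
        max_length = max(max_length, end - start + 1)
    return max_length


print(longest_unique_by_scanning("tmmzuxt"))  # 5
```

No extra data structure is needed here; the cost is the linear scan per char, which is what the hashmap version removes.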

Dynamic sliding window using a hashmap to store the chars in the current window<br>
1. while the chars stay unique, keep adding them to the hashmap and recording the maxlength so far
2. if the char is already present in the hashmap, shrink the window by incrementing the start pointer until it is past the earlier occurrence of that char. While shrinking, remove the skipped characters from the hashmap.
3. update the maxlength after the removal. Updating it unconditionally after the if/else is simplest; updating it only in the `if` branch also works, because a window that has just been shrunk is never longer than one already recorded.

In [2]:
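# O(n) time: each char enters and leaves the window at most once (about 2n steps)
# O(k) space, where k is the number of distinct chars in the window
def longestUniqueSubstring(string):
    maxlength = 0
    start = 0
    d = {}  # chars currently inside the window string[start:i+1]
    for i, char in enumerate(string):
        if char not in d:
            d[char] = 1
        else:
            # drop chars from the left until start reaches the earlier occurrence of char
            while string[start] != char:
                del d[string[start]]
                start += 1
            # step past the earlier occurrence; char stays in d for its new position i
            start += 1
        maxlength = max(maxlength, i-start+1)
    return maxlength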

Or a dynamic sliding window using a hashset to store the chars in the current window: the set tells us whether a character is present or not<br>
1. if s[end] is not in the set, add it, slide the end pointer to the right, and update the maxlength
2. if s[end] is already in the set, stop extending the window
3. move the start pointer to the right, removing s[start] from the set at each step, until s[end] is no longer in the set

In [3]:
# O(n) time | O(k) space size of set
def longestUniqueSubstring(string):
    maxlength = 0
    start = 0
    end = 0
    d = set()
    while start < len(string) and end < len(string):
        if string[end] not in d:
            d.add(string[end])
            end += 1
            maxlength = max(maxlength, end-start)
        else:
            d.remove(string[start])
            start += 1
    return maxlength

In [4]:
string = "tmmzuxt"
longestUniqueSubstring(string)

5

Optimal sliding window<br>
1. use a hashmap to store the last index of each character processed.
2. whenever a repeated character is encountered, shrink the window by skipping chars so that only distinct characters remain inside the window.<br>
Shrinking skips every char up to and including the earlier occurrence of the repeated character via `start=max(start, 1+d[char])`. The `max` keeps `start` from moving backwards when the stored index lies before the current window.

The reasoning is that if s[j] has a duplicate in the range [i,j) at index j', we don't need to increase i one step at a time. We can skip all the elements in the range [i,j'] and set i to j'+1 directly.
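To see why the `max` in `start=max(start, 1+d[char])` matters, here is a small sketch that can switch the guard off. The function name `longest_unique` and its `guard` parameter are hypothetical and exist only for this comparison. On `"abba"`, the last `a` still has index 0 stored in the hashmap, which lies before the current window, so without the guard `start` would jump backwards.

```python
def longest_unique(s, guard=True):
    """Last-index sliding window; guard=False drops the max() to show the failure.

    Hypothetical helper for illustration only.
    """
    max_length = 0
    start = 0
    last_index = {}  # char -> index of its most recent occurrence
    for i, char in enumerate(s):
        if char in last_index:
            candidate = last_index[char] + 1
            # without the guard, a stale index can move start backwards
            start = max(start, candidate) if guard else candidate
        last_index[char] = i
        max_length = max(max_length, i - start + 1)
    return max_length


print(longest_unique("abba"))               # 2
print(longest_unique("abba", guard=False))  # 3 (wrong: the window "bba" has a repeat)
```

The hashmap is never pruned in this approach, so stale entries are expected; the `max` is what makes them harmless.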

In [5]:
# O(n) time | O(k) space
def longestUniqueSubstring(string):
    maxlength = 0
    start = 0
    d = {}
    for i, char in enumerate(string):
        if char not in d:
            d[char] = i
        else:
            start = max(start, d[char]+1)
            d[char] = i
        maxlength = max(maxlength, i-start+1)
    return maxlength

In [6]:
# variant: the index update is shared by both branches, so no else is needed
def longestUniqueSubstring(string):
    maxlength = 0
    start = 0
    d = {}
    for i, char in enumerate(string):
        if char in d:
            start = max(start, d[char]+1)
        d[char] = i
        maxlength = max(maxlength, i-start+1)
    return maxlength

In [7]:
string = "tmmzuxt"
longestUniqueSubstring(string)

5