Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.

Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".

 

Example 1:

Input: s = "banana"
Output: "ana"
Example 2:

Input: s = "abcd"
Output: ""
 

Constraints:

2 <= s.length <= 3 * 10^4
s consists of lowercase English letters.

In [None]:
class Solution:
    def longestDupSubstring(self, s: str) -> str:
        n = len(s)
        seen = set()
        max_len = 0
        res = ""

        for i in range(n):
            for j in range(i + 1, n + 1):
                substr = s[i:j]
                if substr in seen:
                    if len(substr) > max_len:
                        max_len = len(substr)
                        res = substr
                else:
                    seen.add(substr)
        return res
    
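# tc - O(n^3):
# - O(n^2) (i, j) pairs from the nested loops
# - O(n) per pair to slice the substring and hash it for the set lookup
# sc - O(n^3):
# - in the worst case the set holds O(n^2) distinct substrings, each up to O(n) characters long
# far too slow for s.length up to 3 * 10^4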


In [8]:
Solution().longestDupSubstring(s = "banana")

'ana'

# Understanding the Rolling Hash



### 🔍 What Is a Rolling Hash?

A **rolling hash** is a hashing technique where the hash of a substring can be **quickly updated** as you slide a window over a string — instead of recalculating the hash from scratch.

It is especially useful in:

* **Rabin-Karp algorithm** for substring search
* **Longest Duplicate Substring** (LeetCode 1044)
* Efficient string pattern matching

---

### 📦 How It Works

Let’s say you have a string `s = "banana"` and you want to compute the hash of every substring of length `3`.

#### Step 1: Base Idea (Polynomial Hash)

We treat the string as a number in a **base B**, where each character is a digit:

Let’s pick:

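* Base = 26 (one digit per lowercase letter, mapping `'a'` → 0, ..., `'z'` → 25)
* Mod = a large prime (keeps hash values bounded, which avoids overflow in fixed-width integer types, and reduces collisions)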

For a string `"abc"`:

```
hash("abc") = (ord('a') - ord('a')) * 26² +
              (ord('b') - ord('a')) * 26¹ +
              (ord('c') - ord('a')) * 26⁰
            = 0*676 + 1*26 + 2 = 28
```
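As a minimal sketch of the formula above, the hypothetical helper `poly_hash` (not part of the original notebook) computes a string's hash directly from the powers of the base. It assumes lowercase input and takes `10**9 + 7` as an example modulus; for short strings like `"abc"` the modulus has no effect, so it reproduces the value 28.

```python
def poly_hash(sub, base=26, mod=10**9 + 7):
    """Polynomial hash: the first character is the most significant digit."""
    L = len(sub)
    total = 0
    for k, c in enumerate(sub):
        digit = ord(c) - ord('a')  # 'a' -> 0, ..., 'z' -> 25
        # character k has weight base^(L-1-k)
        total = (total + digit * pow(base, L - 1 - k, mod)) % mod
    return total

print(poly_hash("abc"))  # 28
print(poly_hash("ana"))  # 338
```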

#### Step 2: Rolling the Hash

Now suppose you have already computed `hash("abc")` and want `hash("bcd")`, the next window of `"abcd"`.

Instead of recomputing it from scratch, you update the old hash:

```
new_hash = ((old_hash - (ord(s[i]) - ord('a')) * 26^(L-1)) * 26 + (ord(s[i+L]) - ord('a'))) % Mod
```

Where:

* `s[i]` is the old first character being removed
* `s[i+L]` is the new character being added at the end
* `L` is the window size

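Building the first window costs O(L); every later shift costs **O(1)**. For example, rolling from `"abc"` (hash 28) to `"bcd"` gives (28 - 0 * 676) * 26 + 3 = 731, which matches 1 * 676 + 2 * 26 + 3 computed directly.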
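A minimal sketch of the rolling update, assuming lowercase input, `1 <= L <= len(s)`, and `10**9 + 7` as an example modulus. The hypothetical helper `rolling_hashes` builds the first window from scratch and then applies the update formula once per shift. On `"banana"` with `L = 3`, the two `"ana"` windows get the same hash, which is exactly the signal a duplicate-substring search looks for.

```python
def rolling_hashes(s, L, base=26, mod=10**9 + 7):
    """Hash of every length-L window of s, updated in O(1) per shift."""
    digits = [ord(c) - ord('a') for c in s]  # 'a' -> 0, ..., 'z' -> 25
    high = pow(base, L - 1, mod)             # weight of the leading digit, base^(L-1)
    h = 0
    for d in digits[:L]:                     # first window, built from scratch in O(L)
        h = (h * base + d) % mod
    hashes = [h]
    for i in range(len(s) - L):
        # drop s[i], shift left by one digit, append s[i + L]
        h = ((h - digits[i] * high) * base + digits[i + L]) % mod
        hashes.append(h)
    return hashes

print(rolling_hashes("banana", 3))  # [689, 338, 8801, 338]
```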




In [None]:
def rabin_karp(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return "Pattern not found"

    base = 26  # Size of the input alphabet (lowercase English letters)
    prime = 10**9 + 7  # A large prime number for modulo

    # Precompute base^(m-1) % prime: the weight of the leading character in a window
    base_power_pattern_len = 1
    for _ in range(m - 1):
        base_power_pattern_len = (base_power_pattern_len * base) % prime

    # Compute initial hash values
    p_hash = 0  # hash value for pattern
    t_hash = 0  # hash value for the first text window

    for i in range(m):
        p_hash = (base * p_hash + (ord(pattern[i]) - ord('a'))) % prime
        t_hash = (base * t_hash + (ord(text[i]) - ord('a'))) % prime
        print(f"Initial hashes: p_hash={p_hash}, t_hash={t_hash}")

    for i in range(n - m + 1):
        # Equal hashes only signal a candidate; confirm with an actual string comparison
        if p_hash == t_hash and text[i:i + m] == pattern:
            return f"Pattern found at index {i}"

        # Compute hash for next window
        if i < n - m:
            # Remove the leading character's contribution.
            t_hash = t_hash - (ord(text[i]) - ord('a')) * base_power_pattern_len

            # Shift left by one digit and add the next character. Python's %
            # always returns a non-negative result for a positive modulus, so
            # no separate fix-up for negative values is needed.
            t_hash = (t_hash * base + (ord(text[i + m]) - ord('a'))) % prime

    return "Pattern not found"


In [17]:
rabin_karp("banana", "ana")

Initial hashes: p_hash=0, t_hash=1
Initial hashes: p_hash=13, t_hash=26
Initial hashes: p_hash=338, t_hash=689


'Pattern found at index 1'
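The `rabin_karp` function above stops at the first match. Because the problem counts overlapping occurrences, it can help to see every match; the hypothetical variant `rabin_karp_all` below (same base 26 and `10**9 + 7` modulus, lowercase input assumed) keeps scanning and collects every verified start index. For `"banana"` and `"ana"` it finds the two overlapping occurrences at indices 1 and 3.

```python
def rabin_karp_all(text, pattern, base=26, mod=10**9 + 7):
    """Start indices of all (possibly overlapping) occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:  # design choice for this sketch: empty pattern -> no matches
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i]) - ord('a')) % mod
        t_hash = (t_hash * base + ord(text[i]) - ord('a')) % mod
    matches = []
    for i in range(n - m + 1):
        # equal hashes are only a candidate; confirm with a direct comparison
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # drop text[i], shift left one digit, append text[i + m]
            t_hash = ((t_hash - (ord(text[i]) - ord('a')) * high) * base
                      + ord(text[i + m]) - ord('a')) % mod
    return matches

print(rabin_karp_all("banana", "ana"))  # [1, 3]
```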

In [18]:
h = 0
base = 26
for c in "abc":
    h = h * base + (ord(c) - ord('a'))
    print(h)
print(h)


# Horner's method: this is the same as 0*26² + 1*26¹ + 2*26⁰ = 0 + 26 + 2 = 28

0
1
28
28


In [19]:
26*26


676

In [None]:
class Solution:
    def longestDupSubstring(self, s: str) -> str:
        n = len(s)
        base = 26
        mod = (1 << 61) - 1  # 2**61 - 1 is a Mersenne prime (2**63 - 1 is not prime)
        nums = [ord(c) - ord('a') for c in s]

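        def rabin_karp_check(L):
            # Hash of the first window s[0:L]
            h = 0
            for i in range(L):
                h = (h * base + nums[i]) % mod
            seen = {h}
            baseL = pow(base, L, mod)  # after h * base, the leading digit has weight base^L
            for start in range(1, n - L + 1):
                # Shift left one digit, drop nums[start - 1], append the new last character
                h = (h * base - nums[start - 1] * baseL + nums[start + L - 1]) % mod
                # Caveat: a hash match is trusted without comparing the substrings,
                # so a collision could in principle report a false duplicate.
                if h in seen:
                    return start
                seen.add(h)
            return -1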

        left, right = 1, n - 1
        start = -1
        max_len = 0

        while left <= right:
            mid = (left + right) // 2
            pos = rabin_karp_check(mid)
            if pos != -1:
                left = mid + 1
                start = pos
                max_len = mid
            else:
                right = mid - 1

        return s[start:start + max_len] if start != -1 else ""
# tc - O(n log n) expected:
# - O(log n) iterations of the binary search over the length.
# - O(n) per rabin_karp_check call (one rolling pass over s).

# sc - O(n) for the nums array and the set in rabin_karp_check.

In [48]:
Solution().longestDupSubstring(s = "banana")

'ana'
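The binary search in the cell above relies on a monotonicity property: if some substring of length L occurs twice, its first L - 1 characters also occur twice, so "a duplicate of length L exists" holds for every L up to the answer and fails beyond it. The sketch below checks that property on `"banana"` using a deliberately naive set-of-slices test in place of the rolling hash, so the search logic can be seen in isolation. Both helpers, `has_duplicate_of_length` and `longest_dup_length`, are hypothetical names for illustration.

```python
def has_duplicate_of_length(s, L):
    """Naive check: does any length-L substring occur at least twice?"""
    seen = set()
    for i in range(len(s) - L + 1):
        window = s[i:i + L]
        if window in seen:
            return True
        seen.add(window)
    return False

def longest_dup_length(s):
    """Binary search for the largest L with a duplicate, relying on monotonicity."""
    lo, hi, best = 1, len(s) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if has_duplicate_of_length(s, mid):
            best, lo = mid, mid + 1  # a duplicate of length mid exists; try longer
        else:
            hi = mid - 1             # none of length mid, so none longer either
    return best

print([has_duplicate_of_length("banana", L) for L in range(1, 6)])
# [True, True, True, False, False]
print(longest_dup_length("banana"))  # 3
```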

In [36]:
from collections import defaultdict


class Solution:
    def longestDupSubstring(self, s: str) -> str:
        n = len(s)
        base = 26
        mod = (1 << 61) - 1  # a Mersenne prime; collisions are also verified below

        # convert s into array of int (a=0, b=1,...)
        nums = [ord(c) - ord('a') for c in s]

        def rabin_karp_check(length):
            # Return the start index of a second occurrence of some substring of
            # this length, or -1 if every window of that length is unique.
            h = 0
            for i in range(length):
                h = (h * base + nums[i]) % mod

            seen = defaultdict(list)  # hash -> start indices of windows with that hash
            seen[h].append(0)
            lead_weight = pow(base, length - 1, mod)  # weight of a window's first character

            for start in range(1, n - length + 1):
                # remove the leading character, then shift and add the next character
                h = (h - nums[start - 1] * lead_weight) % mod
                h = (h * base + nums[start + length - 1]) % mod

                if h in seen:
                    # Check for actual substring match to handle hash collisions
                    current_substr = s[start:start + length]
                    for prev_start in seen[h]:
                        if s[prev_start:prev_start + length] == current_substr:
                            return start

                seen[h].append(start)

            return -1

        # Binary search on the length, starting at 1. If a duplicate of length L
        # exists, so does one of length L - 1, which makes the search valid.
        low, high = 1, n - 1
        result_start = -1
        max_len = 0

        while low <= high:
            mid = (low + high) // 2
            pos = rabin_karp_check(mid)
            if pos != -1:
                low = mid + 1
                result_start = pos
                max_len = mid
            else:
                high = mid - 1

        return s[result_start:result_start + max_len] if result_start != -1 else ""

# Test the solution
solution = Solution()
test_cases = [
    "banana",
    "abcd",
    "aa",
    "abcdef",
    "aaaa"
]

for test in test_cases:
    result = solution.longestDupSubstring(test)
    print(f"Input: '{test}' -> Output: '{result}'")

Input: 'banana' -> Output: 'ana'
Input: 'abcd' -> Output: ''
Input: 'aa' -> Output: 'a'
Input: 'abcdef' -> Output: ''
Input: 'aaaa' -> Output: 'aaa'
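The `seen[h]` lists and the direct substring comparison in the cell above exist because equal hashes do not imply equal strings. The toy example below uses a deliberately tiny modulus (`mod = 7`, chosen only for illustration; real code uses a large prime) so that a collision is easy to find: `"ab"` hashes to 1 and `"ai"` hashes to 8, which is also 1 modulo 7. A check that trusted hashes alone would report these two different strings as a duplicate. The helper `window_hash` is a hypothetical name for this sketch.

```python
def window_hash(sub, base=26, mod=7):
    """Polynomial hash with a deliberately tiny modulus to force collisions."""
    h = 0
    for c in sub:
        h = (h * base + ord(c) - ord('a')) % mod
    return h

a, b = "ab", "ai"
print(window_hash(a), window_hash(b))  # 1 1  (same hash)
print(a == b)                          # False: only a direct comparison tells them apart
```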
