# **Problem Statement**  
## **9. Implement the Rabin–Karp Algorithm for Substring Search.**

Implement the **Rabin–Karp algorithm** to find all occurrences of a given pattern `pat` in a text `txt`.

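The Rabin–Karp algorithm uses a **rolling hash** to compare each window of the text with the pattern. Characters are compared one by one only when the hashes agree, which gives an expected running time of O(n + m) instead of the naive O(n * m), where n = len(txt) and m = len(pat).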

Required Task:
- Input two strings, `txt` (the main text) and `pat` (the pattern to search).
- Output all starting indices where the pattern is found in the text.


### Constraints & Example Inputs/Outputs

- 1 ≤ len(txt), len(pat) ≤ 10⁵
- The text and pattern contain only lowercase English letters (`a-z`).
- Return indices in **0-based indexing**.

Example:
| Text (`txt`) | Pattern (`pat`) | Output | Explanation |
|---------------|----------------|---------|-------------|
| `"abracadabra"` | `"abra"` | `[0, 7]` | `"abra"` occurs at indices 0 and 7 |
| `"aaaaa"` | `"aa"` | `[0, 1, 2, 3]` | Overlapping matches are included |
| `"abcdef"` | `"gh"` | `[]` | No matches |
| `"hello"` | `"ll"` | `[2]` | Match at index 2 |
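The expected outputs in the examples can be cross-checked independently of any Rabin–Karp code. The following is a minimal sketch that assumes a helper named `find_all_reference` (a hypothetical name, not part of the original notebook) built on Python's built-in `str.find`:

```python
def find_all_reference(txt, pat):
    """Return all 0-based start indices of pat in txt, overlaps included."""
    indices = []
    start = txt.find(pat)
    while start != -1:
        indices.append(start)
        # Advance by one (not by len(pat)) so overlapping matches are found.
        start = txt.find(pat, start + 1)
    return indices


# (text, pattern, expected output) taken from the examples above
examples = [
    ("abracadabra", "abra", [0, 7]),
    ("aaaaa", "aa", [0, 1, 2, 3]),
    ("abcdef", "gh", []),
    ("hello", "ll", [2]),
]
for txt, pat, expected in examples:
    assert find_all_reference(txt, pat) == expected
```

A reference like this is handy as an oracle when testing a hand-written search routine.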

### Solution Approach

The naive baseline is reviewed first, followed by the steps of the Rabin–Karp approach:
1. **Naive Approach Review:**
   - Compare each substring of `txt` (of length equal to `pat`) with `pat`.
   - Time Complexity: O((n - m + 1) * m), where n = len(txt), m = len(pat).

2. **Idea of Rabin–Karp:**
   - Convert each substring and the pattern into a **hash value**.
   - Instead of comparing substrings character by character, compare hash values.
   - If the hashes match, verify the substring to avoid false positives (due to hash collisions).

3. **Rolling Hash Technique:**
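   - Use a base (like 256 for ASCII) and a prime number `q` as the modulus. The small prime 101 keeps the arithmetic easy to follow; a larger prime (for example 1,000,000,007) makes collisions much rarer.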
   - Compute the hash of the pattern and the first substring of text.
   - Then **slide the window** one character at a time and update the hash in O(1).

4. **Hash Update Formula:**

   - `hash_new = (d * (hash_old - ord(txt[i]) * h) + ord(txt[i + m])) % q`

     where:
     - `i` = start index of the current window
     - `d` = base (number of possible characters)
     - `q` = a prime modulus (to reduce collisions)
     - `h = pow(d, m - 1) % q`, the weight of the window's leading character

5. **Compare hashes and confirm actual substring matches.**
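The update formula can be checked against a hash computed from scratch. The following is a minimal sketch, assuming two illustrative helpers, `window_hash` and `roll` (hypothetical names, not part of the original notebook), with the same `d = 256` and `q = 101` as in the steps above:

```python
def window_hash(s, d=256, q=101):
    """Polynomial hash of s computed from scratch (Horner's rule), O(len(s))."""
    value = 0
    for ch in s:
        value = (d * value + ord(ch)) % q
    return value


def roll(hash_old, outgoing, incoming, h, d=256, q=101):
    """Slide the window by one: drop `outgoing`, append `incoming`, in O(1)."""
    return (d * (hash_old - ord(outgoing) * h) + ord(incoming)) % q


txt, m = "abracadabra", 4
h = pow(256, m - 1, 101)  # weight of the window's leading character

current = window_hash(txt[:m])
for i in range(len(txt) - m):
    current = roll(current, txt[i], txt[i + m], h)
    # The O(1) update must agree with rehashing the new window in O(m).
    assert current == window_hash(txt[i + 1:i + 1 + m])
```

In Python, `%` with a positive modulus always returns a non-negative result, so the subtraction inside `roll` needs no extra sign correction.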


### Solution Code

In [1]:
# Approach1: Brute Force Approach
def naive_substring_search(txt, pat):
    n, m = len(txt), len(pat)
    result = []
    for i in range(n - m + 1):
        if txt[i:i+m] == pat:
            result.append(i)
    return result


### Alternative Solution

In [2]:
# Approach2: Optimized (Rabin-Karp Algorithm)
def rabin_karp(txt, pat, q=101):
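    n, m = len(txt), len(pat)
    if m > n:
        # The constraints allow len(pat) > len(txt); no window fits,
        # and hashing txt[:m] below would raise an IndexError.
        return []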
    d = 256  # number of characters in the input alphabet
    p = 0  # hash value for pattern
    t = 0  # hash value for text
    h = 1
    result = []

    # The value of h would be "pow(d, m-1) % q"
    for _ in range(m - 1):
        h = (h * d) % q

    # Calculate hash value for pattern and first window of text
    for i in range(m):
        p = (d * p + ord(pat[i])) % q
        t = (d * t + ord(txt[i])) % q

    # Slide the pattern over text
    for i in range(n - m + 1):
        # Check if hash values match
        if p == t:
            # Confirm by actual substring check to avoid collision error
            if txt[i:i + m] == pat:
                result.append(i)

        # Calculate hash for next window
        if i < n - m:
            # Python's % returns a non-negative result for positive q,
            # so no sign correction is needed after the subtraction.
            t = (d * (t - ord(txt[i]) * h) + ord(txt[i + m])) % q

    return result


### Alternative Approaches

1. **Knuth–Morris–Pratt (KMP) Algorithm**
   - Uses preprocessing to build a longest prefix-suffix (LPS) array.
   - Time Complexity: O(n + m)
   - Space Complexity: O(m)

2. **Boyer–Moore Algorithm**
   - Skips sections of the text using bad-character and good-suffix heuristics.
   - Very efficient in practice for large alphabets; in the best case it examines only about n / m characters of the text.

3. **Naive Search**
   - Simple but slow for long texts (O(n * m)).
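For comparison, the KMP idea from item 1 can be sketched as follows. This is an illustrative version rather than a tuned implementation; the names `build_lps` and `kmp_search` are assumptions, not taken from the original notebook:

```python
def build_lps(pat):
    """lps[i] = length of the longest proper prefix of pat[:i + 1]
    that is also a suffix of it."""
    lps = [0] * len(pat)
    length = 0  # length of the currently matched prefix
    for i in range(1, len(pat)):
        while length > 0 and pat[i] != pat[length]:
            length = lps[length - 1]  # fall back to a shorter border
        if pat[i] == pat[length]:
            length += 1
        lps[i] = length
    return lps


def kmp_search(txt, pat):
    """Return all 0-based start indices of pat in txt in O(n + m) time."""
    if not pat or len(pat) > len(txt):
        return []
    lps = build_lps(pat)
    result = []
    matched = 0  # number of pattern characters matched so far
    for i, ch in enumerate(txt):
        while matched > 0 and ch != pat[matched]:
            matched = lps[matched - 1]
        if ch == pat[matched]:
            matched += 1
        if matched == len(pat):
            result.append(i - len(pat) + 1)
            matched = lps[matched - 1]  # keep going to allow overlaps
    return result


assert kmp_search("abracadabra", "abra") == [0, 7]
assert kmp_search("aaaaa", "aa") == [0, 1, 2, 3]
```

Unlike Rabin–Karp, this sketch never re-examines a text character, so its O(n + m) bound holds in the worst case as well.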

### Test Cases

In [3]:
# Test Cases
test_cases = [
    ("abracadabra", "abra"),
    ("aaaaa", "aa"),
    ("abcdef", "gh"),
    ("hello", "ll"),
    ("abcdabcabcd", "abc")
]

for txt, pat in test_cases:
    print(f"Text: '{txt}', Pattern: '{pat}' -> Matches at indices: {rabin_karp(txt, pat)}")


Text: 'abracadabra', Pattern: 'abra' -> Matches at indices: [0, 7]
Text: 'aaaaa', Pattern: 'aa' -> Matches at indices: [0, 1, 2, 3]
Text: 'abcdef', Pattern: 'gh' -> Matches at indices: []
Text: 'hello', Pattern: 'll' -> Matches at indices: [2]
Text: 'abcdabcabcd', Pattern: 'abc' -> Matches at indices: [0, 4, 7]


## Complexity Analysis

### Time Complexity
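- **Average Case:** O(n + m), assuming few spurious hash hits
- **Worst Case:** O(n * m), reached when most windows pass the hash check and must be verified character by character, either because of hash collisions or because the pattern really occurs at most positions (e.g. `txt = "aaaaa"`, `pat = "aa"`)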
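The worst case does not require collisions: when the pattern genuinely occurs at most positions, every window passes the hash check and is then verified in O(m). The following is a minimal sketch that counts these verifications, assuming a hypothetical helper `count_hash_hits` that mirrors the Rabin–Karp loop with `d = 256` and `q = 101`:

```python
def count_hash_hits(txt, pat, d=256, q=101):
    """Return (hash_hits, true_matches) for a Rabin-Karp scan of txt."""
    n, m = len(txt), len(pat)
    if m == 0 or m > n:
        return 0, 0
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pat[i])) % q
        t = (d * t + ord(txt[i])) % q
    hash_hits = true_matches = 0
    for i in range(n - m + 1):
        if p == t:
            hash_hits += 1  # each hit costs an O(m) verification
            if txt[i:i + m] == pat:
                true_matches += 1
        if i < n - m:
            t = (d * (t - ord(txt[i]) * h) + ord(txt[i + m])) % q
    return hash_hits, true_matches


# All 991 windows of length 10 match, so all 991 are verified.
hits, matches = count_hash_hits("a" * 1000, "a" * 10)
assert hits == matches == 991
```

The difference `hash_hits - true_matches` is the number of spurious hits, which a larger prime `q` would be expected to reduce.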

### Space Complexity
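- O(1) auxiliary space for the hash state (only a few integer variables), excluding the output list. The slice `txt[i:i + m]` used for verification creates a temporary string of length m.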

