# Strings - Manacher's Algorithm (Longest Palindromic Substring)

## Problem Statement
**Input**: String $s$ of length $n \ge 0$ (assumed small enough to fit comfortably in memory)

**Output**: The longest contiguous substring $s'$ of $s$ that is a palindrome. That is, if $s'$ has length $m$, then $s'[i] = s'[m-1-i]$ for $0 \le i < m$ ($s'[0] = s'[m-1]$, $s'[1] = s'[m-2]$, and so on). If several substrings tie for longest, any of them is valid.

**Examples**:
- If `s = "zabad"`, `s' = "aba"`
- If `s = "aabaa"`, `s' = "aabaa"`
- If `s = "abaxcbc"`, `s' = "aba"` or `"cbc"`
- If `s = "cab"`, `s' = "c"` or `"a"` or `"b"`
- If `s = ""`, `s' = ""`
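The index condition in the definition translates directly into code. Here is a minimal sketch that checks it and confirms each example answer is a palindromic substring of its input. The helper name `satisfies_definition` is hypothetical, not part of the original notebook.

```python
def satisfies_definition(t):
    """True if t[i] == t[m-1-i] for every 0 <= i < m, where m = len(t)."""
    m = len(t)
    return all(t[i] == t[m - 1 - i] for i in range(m))

# Each listed answer is a substring of its input and satisfies the definition.
examples = {
    "zabad": ["aba"],
    "aabaa": ["aabaa"],
    "abaxcbc": ["aba", "cbc"],
    "cab": ["c", "a", "b"],
    "": [""],
}
for s, answers in examples.items():
    for answer in answers:
        assert answer in s and satisfies_definition(answer)
```

The empty string satisfies the condition vacuously (there is no $i$ to check), which is why `""` is the answer for empty input.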

```python
import time

def test(fn):
    """Run fn on each case, check its answer, and print per-case timings."""
    cases = [
        ("aabaa", ["aabaa"]),
        ("zabad", ["aba"]),
        ("abaxcbc", ["aba", "cbc"]),
        ("cab", ["c", "a", "b"]),
        ("", [""]),
        # 1,400 characters; the answer is the run of 1,000 z's in the middle
        (("ab" * 100) + "zzzzzzzzzz" * 100 + ("gm" * 100), ["zzzzzzzzzz" * 100]),
    ]
    case_times = {}
    # perf_counter_ns is a high-resolution clock meant for measuring short durations
    test_start = time.perf_counter_ns()
    for i, (given, allowed) in enumerate(cases):
        case_start = time.perf_counter_ns()
        actual = fn(given)
        case_times[i] = time.perf_counter_ns() - case_start
        assert actual in allowed, f"Allowed: {allowed}\nActual: {actual}"
    test_end = time.perf_counter_ns()
    for case, case_time in case_times.items():
        print(f"Case {case}: {case_time} nsec")
    print(f"Total test time: {test_end - test_start} nsec")
```

## Details
There are virtually no good explanations of this problem; the few resources I've looked at are either poorly written or hard to follow. Here are three different approaches (and I'm sure there are more).

## Brute force - every possible substring

The simplest way to find the longest palindromic substring is to test every possible substring of $s$ to see whether it's a palindrome, and then return the longest one we find. There are $n(n+1)/2 = O(n^2)$ non-empty substrings of $s$, each of which takes $O(n)$ time to verify, so the overall algorithm has a runtime complexity of $O(n^3)$. If we only keep track of the starting and ending indices of the longest palindrome found so far, we need only $O(1)$ extra space.
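Spelling out the counting behind that estimate: a substring is fixed by a start $l$ and an end $r$ with $0 \le l \le r < n$, and checking a substring of length $k$ costs at most $k$ character comparisons.

$$
\#\{(l, r) : 0 \le l \le r < n\} = \sum_{l=0}^{n-1} (n - l) = \frac{n(n+1)}{2} = O(n^2)
$$

$$
\sum_{k=1}^{n} \underbrace{(n - k + 1)}_{\text{substrings of length } k} \cdot \underbrace{k}_{\text{max cost to check}} = (n+1)\frac{n(n+1)}{2} - \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(n+2)}{6} = O(n^3)
$$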

```python
def is_palindrome(s, l, r):
    """Return True if s[l..r] (inclusive bounds) is a palindrome."""
    while l < r:
        if s[l] != s[r]:
            return False
        l += 1
        r -= 1
    return True

def brute_force(s):
    """O(n^3): test every substring s[l..r] and keep the longest palindrome."""
    best_l, best_r = 0, 0
    for l in range(len(s)):
        for r in range(l, len(s)):
            if is_palindrome(s, l, r) and (r - l > best_r - best_l):
                best_l, best_r = l, r
    return s[best_l:best_r + 1]

test(brute_force)
```

```
Case 0: 26522 nsec
Case 1: 14966 nsec
Case 2: 21666 nsec
Case 3: 6627 nsec
Case 4: 1398 nsec
Case 5: 9737761630 nsec
Total test time: 9737845494 nsec
```


## Brute force v2 - every possible center
The brute-force strategy does unnecessary work. Suppose we run it on `s = "aaxxxxbb"`. The algorithm tests the substrings `"x"` (four of them), `"xx"` (three), and `"xxx"` (two) before eventually finding the actual answer `"xxxx"`, so we wind up going over the same four characters many times. When I check by eye whether a string is a palindrome, I start at the middle (a single character for odd-length strings, the gap between two characters for even-length ones) and work outwards. We can do the same in code to eliminate unnecessary work. To make the idea of a "center" concrete, we insert a mock character between every pair of adjacent characters and at both ends, so that every palindrome, odd or even, is centered on a single position. Then we expand outwards from each center.

This tests $2n+1$ possible centers, each of which takes $O(n)$ time to expand, bringing us down to $O(n^2)$ runtime. Adding the mock "center" characters before the search and removing them afterwards each take $O(n)$ time, so neither step changes the overall $O(n^2)$ runtime.
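To make the "center" idea concrete, here is the padding applied to the `"aaxxxxbb"` example, using the same `"|"` mock character that `better_brute_force` uses in the next cell. The even-length answer `"xxxx"` ends up centered on a separator, and the padded string has exactly $2n+1$ positions to try as centers. This is a small illustration, not part of the original notebook.

```python
s = "aaxxxxbb"
padded = f"|{'|'.join(s)}|"
print(padded)  # |a|a|x|x|x|x|b|b|

# Character s[j] lands at padded index 2*j + 1; separators sit at even indices.
assert all(padded[2 * j + 1] == c for j, c in enumerate(s))
assert len(padded) == 2 * len(s) + 1  # 2n+1 candidate centers

# "xxxx" is s[2:6]; in the padded string it spans indices 5..11,
# and its center, index 8, is a separator (the gap between s[3] and s[4]).
lo, hi = 2 * 2 + 1, 2 * 5 + 1
center = (lo + hi) // 2
print(padded[lo:hi + 1], center, padded[center])  # x|x|x|x 8 |
```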

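```python
def best_palindrome_lr(split_string, center):
    """Expand outward from `center` while the characters on both sides match.

    Returns the inclusive bounds (l, r) of the widest palindrome centered there.
    """
    l = r = center
    # Stop at either end of the string or at the first mismatch.
    while (0 < l
           and r < len(split_string) - 1
           and split_string[l - 1] == split_string[r + 1]):
        l -= 1
        r += 1
    return l, r

def better_brute_force(string):
    """O(n^2): try each of the 2n+1 centers of the "|"-padded string."""
    # "aba" -> "|a|b|a|": every palindrome now has a single-position center.
    # Assumes "|" never appears in the input itself.
    split_string = f"|{'|'.join(string)}|"
    best_l = best_r = 0
    for center in range(len(split_string)):
        l, r = best_palindrome_lr(split_string, center)
        if r - l > best_r - best_l:
            best_l, best_r = l, r
    # Strip the mock characters back out.
    return split_string[best_l:best_r + 1].replace("|", "")

test(better_brute_force)
```

```
Case 0: 11477 nsec
Case 1: 6815 nsec
Case 2: 8103 nsec
Case 3: 3718 nsec
Case 4: 1711 nsec
Case 5: 243329234 nsec
Total test time: 243366882 nsec
```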


## Use the Force - Manacher's Algorithm

So can we do better than $O(n^2)$? If so, brute force v2 must also be doing unnecessary work. Manacher's algorithm runs in $O(n)$ time, which suggests that we only need to look at each character (or mock "center" character) a constant number of times. Where might the unnecessary work be happening?

Consider `s = "aaxxxxbb"` again, this time using `"."` as the mock center character (`".a.a.x.x.x.x.b.b."`):

| . | a | . | a | . | x | . | x | . | x | . | x | . | b | . | b | . |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|

Suppose we used our middle-outward strategy and checked only index 2 as the center. The expansion would halt at `l = 0, r = 4`, and we'd then know about the palindrome-ness of that part of the string:

| . | a | . | a | . | x | . | x | . | x | . | x | . | b | . | b | . |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
| x | x | c | x | x |   |   |   |   |   |   |   |   |   |   |   |   |

We can use this when we look for palindromes centered at index 3. Think of `.a.a.` as a palindrome `p` centered at index 2 with radius 2 (total length 5). A palindrome `q` centered at index 3 then has its center inside `p`, although `q` itself may extend past `p`'s edges:

| . | a | . | a | . | x | . | x | . | x | . | x | . | b | . | b | . |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
| p | p | p | p | p |   |   |   |   |   |   |   |   |   |   |   |   |
| ? | ? | ? | q | ? | ? | ? |...|   |   |   |   |   |   |   |   |   |

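so we don't have to start `q`'s outward scan from scratch. Because `p` is a palindrome, its left half is the mirror image of its right half. The reflection of index 3 across `p`'s center (index 2) is index 1, and we have already found the palindrome centered there (radius 1: `.a.`). Whatever part of that palindrome lies inside `p` is mirrored around index 3, so `q` has radius at least $\min(1, 4 - 3) = 1$ before we compare a single character. Only positions beyond `p`'s right edge (index 4) still need explicit comparisons.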
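The mirror argument can be checked mechanically. The sketch below computes every center's radius on `".a.a.x.x.x.x.b.b."` by plain outward expansion (the helper `radii` is hypothetical, written for this illustration), then asserts that each position inside a palindrome has radius at least $\min(\text{mirror's radius}, \text{distance to the palindrome's right edge})$. That lower bound is what lets a linear-time algorithm skip re-scanning.

```python
def radii(t):
    """radius[i] = how far the palindrome centered at t[i] extends on each side."""
    out = []
    for i in range(len(t)):
        k = 0
        while i - k - 1 >= 0 and i + k + 1 < len(t) and t[i - k - 1] == t[i + k + 1]:
            k += 1
        out.append(k)
    return out

t = ".a.a.x.x.x.x.b.b."
radius = radii(t)
print(radius[:5])  # [0, 1, 2, 1, 0]

# Mirror property: for a palindrome centered at c, every i inside it
# (c < i <= c + radius[c]) has radius at least
# min(radius[2c - i], (c + radius[c]) - i).
for c in range(len(t)):
    right = c + radius[c]
    for i in range(c + 1, right + 1):
        assert radius[i] >= min(radius[2 * c - i], right - i)

# The worked example: p is centered at 2 with radius 2; q at 3 mirrors index 1.
assert radius[2] == 2 and min(radius[1], 2 + radius[2] - 3) == 1 and radius[3] == 1
```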


```python
def manacher(string):
    pass  # not implemented yet
```
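The `manacher` function above is still a stub. Here is a minimal sketch of how it could be filled in, assuming the same `"|"` padding as `better_brute_force` (so `"|"` must not occur in the input). The name `manacher_sketch` and the `radius`/`center`/`right` bookkeeping are our own choices, not taken from the original notebook or from Manacher's paper.

```python
def manacher_sketch(string):
    """Longest palindromic substring in O(n) time, following Manacher's idea.

    Pads the input with "|" exactly like better_brute_force, so every
    palindrome has a single center. Assumes "|" does not occur in the input.
    """
    t = f"|{'|'.join(string)}|"
    # radius[i]: how far the palindrome centered at t[i] extends on each side.
    radius = [0] * len(t)
    # center/right: center and right edge of the palindrome found so far
    # whose right edge lies furthest to the right.
    center = right = 0
    for i in range(len(t)):
        if i < right:
            # i is inside the palindrome around `center`, so its mirror
            # 2*center - i was already expanded. Reuse that radius, capped so
            # we never claim anything beyond `right` that we haven't checked.
            radius[i] = min(radius[2 * center - i], right - i)
        # Compare characters only past what is already known.
        while (i - radius[i] - 1 >= 0
               and i + radius[i] + 1 < len(t)
               and t[i - radius[i] - 1] == t[i + radius[i] + 1]):
            radius[i] += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    # Widest radius in t = longest palindrome in string; strip the padding.
    best = max(range(len(t)), key=radius.__getitem__)
    return t[best - radius[best]:best + radius[best] + 1].replace("|", "")
```

It can be run through the same harness with `test(manacher_sketch)`. The running time is linear because every successful comparison in the `while` loop pushes `right` further to the right, and `right` never moves left, so there are at most $2n+1$ successful comparisons in total. Every other step is constant work per position, plus at most one failed comparison.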

## Resources
- [Manacher's original paper](https://dl.acm.org/doi/pdf/10.1145/321892.321896)
- [The Wikipedia page]() (warning: shitty)
- [Fred Akalin's explanation](https://www.akalin.com/longest-palindrome-linear-time) of Manacher's algorithm 
- [Jewels of Stringology](https://www.amazon.com/Jewels-Stringology-Maxime-Crochemore/dp/9810247826), p. 114

