The Z algorithm is a linear-time string matching algorithm: it runs in $O(n)$ time on an input of length $n$. It is used to find all occurrences of a pattern $P$ in a text $T$, which is a common string searching problem.

__Algorithm__

Given a string $S$ of length $n$, the Z Algorithm produces an array $Z$ where __$Z[i]$ is the length of the longest substring starting from $S[i]$ which is also a prefix of $S$__,  
i.e. the maximum $k$ such that $S[j]=S[i+j]$ for all $0 \leq j < k$. Note that $Z[i] = 0$ means that $S[0] \neq S[i]$. For easier terminology, let's refer to substrings which are also a prefix as prefix-substrings.
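
As a reference point, the definition can be applied directly, one index at a time. The sketch below is a quadratic-time baseline rather than the Z algorithm itself; the name `naive_z` is only an illustrative choice, and it leaves `z[0]` at 0, following the convention used by the code later in this notebook.

```python
def naive_z(s):
    """Quadratic-time reference: apply the definition of Z[i] directly."""
    n = len(s)
    z = [0] * n  # z[0] is left at 0 by convention
    for i in range(1, n):
        k = 0
        # Extend k while S[k] == S[i + k], i.e. while the prefix still matches.
        while i + k < n and s[k] == s[i + k]:
            k += 1
        z[i] = k
    return z


print(naive_z("abracadabra"))  # [0, 0, 0, 1, 0, 1, 0, 4, 0, 0, 1]
```

A baseline like this is handy for cross-checking a linear-time implementation on small inputs.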


The algorithm relies on a single, crucial invariant. __Iterate over the letters in the string (index $i$ from $1$ to $n-1$)__ and maintain an interval $[L, R]$ which is the interval with __maximum $R$ such that $1 \leq L \leq i \leq R$ and $S[L\dots R]$ is a prefix-substring__ (if no such interval exists, just let $L=R=-1$).  


For $i=1$, simply compute $L$ and $R$ by comparing $S[0\dots]$ to $S[1\dots]$. $Z[1]$ is obtained during this process.


Now, suppose the correct interval $[L, R]$ for $i-1$ and all of the $Z$ values up to $i-1$ are known. Compute $Z[i]$ and the new $[L,R]$ by the following steps:

- If $i > R$, then there does not exist a prefix-substring of $S$ that starts before $i$ and ends at or after $i$. If such a substring existed, $[L, R]$ would have been the interval for that substring rather than its current value. Thus "reset" and compute a new $[L, R]$ by comparing $S[0\dots]$ to $S[i\dots]$ and get $Z[i]$ at the same time $(Z[i] = R- L + 1)$.

- Otherwise, $i \leq R$, so the current $[L, R]$ extends at least to $i$. Let $k = i - L$. Then $Z[i] \geq \min(Z[k], R - i + 1)$, because $S[i\dots]$ matches $S[k\dots]$ for at least $R-i+1$ characters (they lie in the $[L, R]$ interval, which is known to be a prefix-substring). There are two sub-cases:

    - If $Z[k] < R-i+1$, then there is no longer prefix-substring starting at $S[i]$ (or else $Z[k]$ would be larger), meaning $Z[i]=Z[k]$ and $[L, R]$ stays the same. The latter is true because $[L, R]$ only changes if there is a prefix-substring starting at $S[i]$ that extends beyond $R$, which is not the case here.

    - If $Z[k] \ge R-i+1$, then it is possible for $S[i\dots]$ to match $S[0\dots]$ for more than $R-i+1$ characters (i.e. past position $R$). Thus, $[L, R]$ must be updated by setting $L = i$ and matching from $S[R+1]$ forward to obtain the new $R$. Again, $Z[i]$ is obtained during this process.
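
To see which of the cases above fires at each index, the steps can be instrumented. This is a minimal sketch: `z_trace` and the labels `"reset"`, `"copy"`, and `"extend"` are names invented here for illustration, and the window starts at `l = r = 0` rather than `-1`, which behaves the same because the loop begins at `i = 1`.

```python
def z_trace(s):
    """Return (z, log); log holds (i, case, z[i]) for every index i >= 1."""
    n = len(s)
    z = [0] * n
    l = r = 0
    log = []
    for i in range(1, n):
        if i > r:
            case = "reset"    # i is outside the window: compare from scratch
            l = r = i
            while r < n and s[r - l] == s[r]:
                r += 1
            z[i] = r - l
            r -= 1
        else:
            k = i - l
            if z[k] < r - i + 1:
                case = "copy"     # the match ends strictly inside the window
                z[i] = z[k]
            else:
                case = "extend"   # the match reaches R: keep comparing past R
                l = i
                while r < n and s[r - l] == s[r]:
                    r += 1
                z[i] = r - l
                r -= 1
        log.append((i, case, z[i]))
    return z, log


z, log = z_trace("abacabab")
print(z)  # [0, 0, 1, 0, 3, 0, 2, 0]
for step in log:
    print(step)
```

On `"abacabab"`, index 6 is an "extend" step: $Z[2] = 1$ equals $R - i + 1 = 1$, so the value cannot simply be copied, and matching past $R$ yields $Z[6] = 2$.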

In [10]:
# Note that the optimization L = R = i is used when S[0] != S[i]
# (it doesn't affect the algorithm, since at the next iteration i > R regardless).
"""
int L = 0, R = 0;
for (int i = 1; i < n; i++) 
{
    if (i > R) 
    {
        L = R = i;
        while (R < n && s[R-L] == s[R]) 
        {
            R++;
        }
        z[i] = R-L; 
        R--;
    } 
    else 
    {
        int k = i-L;
        if (z[k] < R-i+1) 
        {
            z[i] = z[k];
        } 
        else 
        {
            L = i;
            while (R < n && s[R-L] == s[R]) 
            {
                R++;
            }
            z[i] = R-L; 
            R--;
        }
    }
}
"""

def z_algo(text):
    n = len(text)
    z_arr = [0 for i in range(n)]
    l, r = 0, 0
    
    for i in range(1, n):
        if i > r:
            l = r = i
            while r < n and text[r-l] == text[r]:
                r += 1
            z_arr[i] = r-l
            r-=1
        else:
            k=i-l
            if z_arr[k] < r-i+1:
                z_arr[i] = z_arr[k]
            else:
                l=i
                while r < n and text[r-l] == text[r]:
                    r+=1
                z_arr[i] = r-l
                r-=1
    print(z_arr)
    return z_arr

text = 'abracadabra'
# check that your code works correctly on provided example
assert z_algo(text) == [0, 0, 0, 1, 0, 1, 0, 4, 0, 0, 1], 'Wrong answer'

[0, 0, 0, 1, 0, 1, 0, 4, 0, 0, 1]


In [3]:
def zFunction(text):
    n = len(text)
    z_func = [0 for i in range(n)]

    # YOUR CODE GOES HERE
    l, r= 0, 0

    for i in range(1, n):
        if i <= r:
            # reuse the Z value of the mirrored position i-l inside the window [l, r]
            z_func[i] = min(z_func[i-l], r-i+1)

        while i + z_func[i] < n and text[z_func[i] +i] == text[z_func[i]]:
            z_func[i] += 1


        new_r = i + z_func[i] - 1
        if new_r > r:
            l, r = i, new_r
    print(z_func)
    return z_func

text = 'abracadabra'
# check that your code works correctly on provided example
assert zFunction(text) == [0, 0, 0, 1, 0, 1, 0, 4, 0, 0, 1], 'Wrong answer'

[0, 0, 0, 1, 0, 1, 0, 4, 0, 0, 1]

In [37]:
# Computes the Z array for the given string
def getZarr(string):
    n = len(string) 
    z = [0 for i in range(n)]
  
    # [L,R] make a window which matches 
    # with prefix of s 
    l, r, k = 0, 0, 0
    for i in range(1, n): 
  
        # If i > R, no known match covers i, so compute
        # Z[i] the naive way.
        if i > r: 
            l, r = i, i 
  
            # R - L is 0 at the start, so the comparison
            # begins at index 0 of the prefix. For example,
            # for "ababab" and i = 1, nothing matches, so
            # Z[i] becomes 0 and R ends at 0. For string
            # "aaaaaa" and i = 1, Z[i] and R become 5
            while r < n and string[r - l] == string[r]: 
                r += 1
            z[i] = r - l 
            r -= 1
        else: 
  
            # k = i - L is the position in the prefix that
            # corresponds to i inside the [L, R] window.
            k = i - l 
  
            # if Z[k] is less than remaining interval 
            # then Z[i] will be equal to Z[k]. 
            # For example, str = "ababab", i = 3, R = 5 
            # and L = 2 
            if z[k] < r - i + 1: 
                z[i] = z[k] 
  
            # For example, str = "aaaaaa" and i = 2:
            # R is 5, L is 1
            else: 
  
                # else start from R and check manually 
                l = i 
                while r < n and string[r - l] == string[r]: 
                    r += 1
                z[i] = r - l 
                r -= 1
#     print(z)
    return z
    
text = 'abracadabra'
# check that your code works correctly on provided example
assert getZarr(text) == [0, 0, 0, 1, 0, 1, 0, 4, 0, 0, 1], 'Wrong answer'

# prints all occurrences of pattern  
# in text using Z algo 
def search(text, pattern):
  
    # Create concatenated string "P$T" 
    concat = pattern + "$" + text 
    l = len(concat) 
  
    # Construct Z array 
    z = getZarr(concat) 
  
    # now looping through Z array for matching condition 
    for i in range(l): 
  
        # if Z[i] (matched region) is equal to pattern 
        # length we got the pattern 
        if z[i] == len(pattern): 
            print("Pattern found at index",  
                      i - len(pattern) - 1) 

            
pattern = 'bra'
search(text, pattern)

def zAlgorithm(text, pattern):
    n, m  = len(text), len(pattern)
    special_symbol = "#"
    indices = []

    # YOUR CODE GOES
    concat = pattern + special_symbol + text 
    cl = len(concat) 

    z = [0 for i in range(cl)]

    l, r, k = 0, 0, 0
    for i in range(1, cl): 

        if i > r: 
            l, r = i, i

            while r < cl and concat[r - l] == concat[r]: 
                r += 1
            z[i] = r - l 
            r -= 1
        else: 

            k = i - l 

            if z[k] < r - i + 1: 
                z[i] = z[k] 
            else: 
                l = i 
                while r < cl and concat[r - l] == concat[r]: 
                    r += 1
                z[i] = r - l 
                r -= 1

    for i in range(cl): 
        if z[i] == m: 
            indices.append(i - m - 1)

    return indices

# inde = zAlgorithm(text, pattern)
# print(inde)
text = 'abracadabra'
pattern = 'ab'
# check that your code works correctly on provided example
assert zAlgorithm(text, pattern) == [0, 7], 'Wrong answer'

text = 'abracadabracabbsdfsacadf'
pattern = 'aca'

print(zAlgorithm(text, pattern))

Pattern found at index 1
Pattern found at index 8
[3, 10, 19]
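
The concatenation trick used above relies on the separator occurring in neither the pattern nor the text; otherwise a prefix match could run across the separator. The sketch below makes that assumption explicit and cross-checks the result against a brute-force scan. The helper names `z_array`, `find_all`, and `brute_force` are illustrative only, and a non-empty pattern is assumed.

```python
def z_array(s):
    """Linear-time Z array, written in the compact min-then-extend form."""
    n = len(s)
    z = [0] * n
    l = r = 0
    for i in range(1, n):
        if i <= r:
            # Start from what the window [l, r] already guarantees.
            z[i] = min(z[i - l], r - i + 1)
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


def find_all(text, pattern, sep="#"):
    """Start indices of pattern in text, via the Z array of pattern+sep+text."""
    if not pattern or sep in pattern or sep in text:
        raise ValueError("need a non-empty pattern and a separator absent from both strings")
    m = len(pattern)
    z = z_array(pattern + sep + text)
    # Positions m+1 onward belong to the text; a full match has z[i] == m.
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] == m]


def brute_force(text, pattern):
    """Quadratic reference search used only for checking."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


text = "abracadabracabbsdfsacadf"
print(find_all(text, "aca"))  # [3, 10, 19]
assert find_all(text, "aca") == brute_force(text, "aca")
```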
