# Longest Common Substring (LCSS)

The problem is to find the longest string (or strings) that is a substring of both input strings.


Note: LCSS differs from the Longest Common Subsequence problem because a substring must occupy consecutive positions within the original string, whereas a subsequence need not.
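To make the distinction in the note concrete, here is a minimal sketch. The helper names `is_substring` and `is_subsequence` are hypothetical and used only for illustration.

```python
def is_substring(s: str, t: str) -> bool:
    """Return True if s occurs in t as a block of consecutive characters."""
    return s in t


def is_subsequence(s: str, t: str) -> bool:
    """Return True if the characters of s appear in t in order, gaps allowed."""
    it = iter(t)
    # `ch in it` advances the iterator until it finds ch (or runs out)
    return all(ch in it for ch in s)


print(is_substring("ABD", "ABCD"))    # False
print(is_subsequence("ABD", "ABCD"))  # True
print(is_substring("BC", "ABCD"))     # True
```

"ABD" is a subsequence of "ABCD" but not a substring, because skipping "C" breaks the consecutive-positions requirement.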

ref. https://en.wikipedia.org/wiki/Longest_common_substring_problem

## Brute force

The naive (brute-force) solution considers every substring of the second string and keeps the longest one that is also a substring of the first string. With $m$ and $n$ the lengths of the first and second strings, this has a time complexity of $O((m+n) \cdot n^2)$: the second string has $O(n^2)$ substrings, and each substring search takes $O(m+n)$ time.
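The brute-force idea can be sketched as follows. The function name `lcss_brute_force` is hypothetical, and Python's `in` operator stands in for the substring search, so its running time may not match the linear-time search assumed in the text.

```python
def lcss_brute_force(X: str, Y: str):
    """Return (length, substring) for one longest common substring of X and Y."""
    best = ""
    n = len(Y)
    # enumerate every substring Y[start:end] of the second string
    for start in range(n):
        for end in range(start + 1, n + 1):
            candidate = Y[start:end]
            # keep it only if it is strictly longer and also occurs in X
            if len(candidate) > len(best) and candidate in X:
                best = candidate
    return len(best), best


print(lcss_brute_force("ABC", "BABA"))  # (2, 'AB')
```

Because only strictly longer candidates replace `best`, ties are resolved in favour of the substring that starts earliest in `Y`.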

## Dynamic Programming

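Let $LCSS[i,j]$ be the length of the longest common suffix of the prefixes $X[0..i]$ and $Y[0..j]$:

$LCSS[i,j] = \begin{cases}
LCSS[i-1,j-1] + 1 & \text{if } X[i] = Y[j] \\
0 & \text{otherwise}
\end{cases}$

where $0 \le i < m$, with $m$ the length of the string X, and $0 \le j < n$, with $n$ the length of the string Y. Entries with $i = -1$ or $j = -1$ are taken to be 0. The length of the longest common substring is the maximum of $LCSS[i,j]$ over all $i$ and $j$, and filling the table takes $O(mn)$ time.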

In [1]:
def LCSS(X: str, Y: str, verbose=False):
    m = len(X)
    n = len(Y)

    # stores the max length of the LCSS found so far
    maxlen = 0
    endingIndex = m  # stores the ending index (exclusive) of the LCSS in X

    # lookup[i][j] stores the length of the longest common suffix
    # of X[0..i-1] and Y[0..j-1]
    # initialize all cells of lookup table to 0
    lookup = [[0] * (n + 1) for _ in range(m + 1)]

    # fill the lookup table in bottom-up manner
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # if current character of X and Y matches
            if X[i - 1] == Y[j - 1]:
                lookup[i][j] = lookup[i - 1][j - 1] + 1

                # update the maximum length and ending index
                if lookup[i][j] > maxlen:
                    maxlen = lookup[i][j]
                    endingIndex = i
    if verbose:
        # print the lookup matrix
        print(" " * 5, " ".join(["{:>2s}".format(v) for v in Y]))
        for i in range(m + 1):
            print("{:>2s}".format(X[i-1] if i > 0 else ""), " ".join(["{:2d}".format(v) for v in lookup[i]]))
    
    # return LCSS having length maxlen
    return maxlen, X[endingIndex - maxlen:endingIndex]

In [3]:
LCSS("ABC", "BABA", verbose=True)

       B  A  B  A
    0  0  0  0  0
 A  0  0  1  0  1
 B  0  1  0  2  0
 C  0  0  0  0  0


(2, 'AB')

In [4]:
LCSS("ABABC", "BABCAD", verbose=True)

       B  A  B  C  A  D
    0  0  0  0  0  0  0
 A  0  0  1  0  0  1  0
 B  0  1  0  2  0  0  0
 A  0  0  2  0  0  1  0
 B  0  1  0  3  0  0  0
 C  0  0  0  0  4  0  0


(4, 'BABC')
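The problem statement allows for several longest common substrings, while `LCSS` above returns only the first one it finds. The following is a minimal sketch of one way to collect all of them with the same table. The function name `all_lcss` and the choice to return a sorted list of distinct strings are assumptions made for illustration.

```python
def all_lcss(X: str, Y: str):
    """Return (length, sorted list of every distinct longest common substring)."""
    m, n = len(X), len(Y)
    maxlen = 0
    found = set()
    # lookup[i][j]: length of the longest common suffix of X[0..i-1] and Y[0..j-1]
    lookup = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                lookup[i][j] = lookup[i - 1][j - 1] + 1
                if lookup[i][j] > maxlen:
                    # a strictly longer match invalidates everything found so far
                    maxlen = lookup[i][j]
                    found = {X[i - maxlen:i]}
                elif lookup[i][j] == maxlen:
                    # another match of the current best length
                    found.add(X[i - maxlen:i])
    return maxlen, sorted(found)


print(all_lcss("ABAB", "BABA"))  # (3, ['ABA', 'BAB'])
```

Using a set removes duplicates when the same substring is matched at several positions.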