# Longest Common Substring (DPV 6.8) #

This problem is from *Algorithms* by Dasgupta, Papadimitriou, and Vazirani (DPV).

### The Problem ###

Given two strings $S$ and $R$, we wish to find the length of their longest common substring, that is, the longest contiguous run of characters that appears in both strings.

***Example:***

$S$ = "ABCDEFG"

$R$ = "ZBCDXYE"

The longest common substring of $S$ and $R$ is "BCD", so the answer is 3.


### Define the subproblem in words ###

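Each table entry $T(i,j)$ is the length of the longest common suffix (LCSuf) of the prefixes $S_0 \ldots S_i$ and $R_0 \ldots R_j$. Equivalently, it is the length of the longest common substring that ends exactly at $S_i$ and $R_j$.

For example, "MMA" and "MMAM" have an LCSuf of 0 because their last characters differ, but "MMA" and "GAMMA" have an LCSuf of 3.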
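To make the definition concrete, here is a minimal sketch that computes the LCSuf of two strings directly, without any table. The function name `longest_common_suffix_length` is hypothetical and serves only to illustrate the definition; it is not part of the dynamic programming solution.

```python
def longest_common_suffix_length(a, b):
    """Return the length of the longest common suffix of a and b.

    Hypothetical helper used only to illustrate the subproblem definition.
    """
    length = 0
    # Walk backwards from the end of both strings while the characters agree.
    while length < len(a) and length < len(b) and a[-1 - length] == b[-1 - length]:
        length += 1
    return length


print(longest_common_suffix_length("MMA", "MMAM"))   # 0
print(longest_common_suffix_length("MMA", "GAMMA"))  # 3
```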

### Find the recurrence relation ###


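$$
T(i, j) = \left\{\begin{aligned}
&T(i-1, j-1) + 1 && \text{if } S_i = R_j\\
&0 && \text{otherwise}
\end{aligned}
\right.
$$

Here $T(i-1, j-1)$ is taken to be 0 when $i = 0$ or $j = 0$, since there is no earlier character to extend. The final answer is the largest entry in the table, $\max_{i,j} T(i,j)$.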
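As a worked instance, consider the diagonal entries for the earlier example $S$ = "ABCDEFG" and $R$ = "ZBCDXYE", using 0-based indices and taking $T(i-1, j-1)$ to be 0 whenever $i = 0$ or $j = 0$:

$$
\begin{aligned}
T(0,0) &= 0 && (S_0 = \text{A} \neq \text{Z} = R_0)\\
T(1,1) &= T(0,0) + 1 = 1 && (S_1 = R_1 = \text{B})\\
T(2,2) &= T(1,1) + 1 = 2 && (S_2 = R_2 = \text{C})\\
T(3,3) &= T(2,2) + 1 = 3 && (S_3 = R_3 = \text{D})\\
T(4,4) &= 0 && (S_4 = \text{E} \neq \text{X} = R_4)
\end{aligned}
$$

The largest entry in the whole table is $T(3,3) = 3$, which corresponds to "BCD".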


In [4]:
import numpy as np


def longest_common_substring(string1, string2):
    """Return the length of the longest common substring of two strings."""
    n = len(string1)
    m = len(string2)

    # We need an n x m table. (Also, our algorithm will run in O(nm) time.)
    table = np.zeros((n, m))

    # Each entry table[i][j] is the length of the longest common suffix (LCSuf) of the
    # prefixes string1[:i+1] and string2[:j+1]. For example, "MMA" and "MMAM" have an
    # LCSuf of 0, but "MMA" and "GAMMA" have an LCSuf of 3.

    for i in range(n):
        for j in range(m):
            if string1[i] == string2[j]:
                # Substrings of length 1
                if i == 0 or j == 0:
                    table[i][j] = 1
                # Bigger substrings
                else:
                    table[i][j] = table[i-1][j-1] + 1
            else:
                table[i][j] = 0

    print("Table T: \n" + str(table))
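    # The answer is the largest entry; an empty table (empty input string) gives 0.
    return int(table.max()) if table.size else 0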


# Our two strings to compare
S = "GAMMAMUOMICRON"
R = "BETAGAMMAPHI"

print("First string:\t%s" % S)
print("Second string:\t%s" % R)
print("Length of longest common substring: %d" % longest_common_substring(S, R))

First string:	GAMMAMUOMICRON
Second string:	BETAGAMMAPHI
Table T: 
[[ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  2.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  3.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  4.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  1.  0.  0.  5.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  2.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
Length of longest common substring: 5
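The table also locates the substring itself: if the maximum entry has value $k$ and sits in row $i$, the common substring is the $k$ characters of the first string ending at position $i$. The following is a minimal sketch of that idea, assuming a plain list-of-lists table padded with a row and column of zeros instead of the NumPy array above. The function name `longest_common_substring_text` is hypothetical.

```python
def longest_common_substring_text(string1, string2):
    """Return one longest common substring of the two strings (hypothetical helper)."""
    n, m = len(string1), len(string2)
    # The extra row and column of zeros stand in for the i == 0 / j == 0 base case,
    # so table[i][j] here refers to string1[i-1] and string2[j-1].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if string1[i - 1] == string2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                # Remember where the longest match seen so far ends in string1.
                if table[i][j] > best_len:
                    best_len, best_end = table[i][j], i
    return string1[best_end - best_len:best_end]


print(longest_common_substring_text("GAMMAMUOMICRON", "BETAGAMMAPHI"))  # GAMMA
```

When several substrings tie for the maximum length, this sketch returns the one that ends earliest in the first string.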


### Analyze runtime complexity ###

The algorithm above uses a nested for loop that compares each character of $S$ to each character of $R$, doing a constant amount of work per comparison.

Let $n$ be the length of $S$ and $m$ the length of $R$. The table has $nm$ entries, so both the running time and the space used are:

$$O(nm)$$
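Because each entry depends only on the entry diagonally above and to the left of it, only the previous row of the table is ever needed. The following is a minimal sketch of that variation, which keeps the $O(nm)$ running time but stores two rows of length $m + 1$ at a time. The function name `longest_common_substring_length` is hypothetical, and this is an optional refinement rather than part of the solution above.

```python
def longest_common_substring_length(string1, string2):
    """Length of the longest common substring, keeping only two table rows."""
    m = len(string2)
    # previous[j] holds the LCSuf length for the previous character of string1
    # and string2[j-1]; index 0 is a zero pad for the base case.
    previous = [0] * (m + 1)
    best = 0
    for ch in string1:
        current = [0] * (m + 1)
        for j in range(1, m + 1):
            if ch == string2[j - 1]:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


print(longest_common_substring_length("GAMMAMUOMICRON", "BETAGAMMAPHI"))  # 5
```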