# Longest Repeated Subsequence - LRS

We want to find the longest subsequence of a string that occurs at least twice. It is a variation of the Longest Common Subsequence (LCS) problem: we compute LCS(X, X), but exclude matches where both indices are the same ($i = j$), so that no character is ever matched with itself.


For example, for the sequence ATACTCGGA, the LRS has length 4 (ATCG):

- <span style="color:red">A T</span> A <span style="color:red">C</span> T C <span style="color:red">G</span> G A
- A T <span style="color:red">A</span> C <span style="color:red">T C</span> G <span style="color:red">G</span> A

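Note that each matched pair of characters must come from different indices of the input string (the $i \ne j$ condition); apart from that, the two occurrences are allowed to overlap.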
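To make the index constraint concrete, the sketch below checks the two highlighted occurrences from the example. The helper name `is_valid_repeat` is hypothetical, and the 0-based index tuples are read off the highlighted positions (A T C G at positions 0, 1, 3, 6 and at 2, 4, 5, 7).

```python
def is_valid_repeat(X: str, first: tuple, second: tuple) -> bool:
    """Hypothetical helper: check that two index tuples spell the same
    subsequence of X and never use the same index at the same position."""
    if len(first) != len(second):
        return False
    # every index must lie inside the string
    if any(i < 0 or i >= len(X) for i in first + second):
        return False
    # both index tuples must be strictly increasing to be subsequences
    for idx in (first, second):
        if any(a >= b for a, b in zip(idx, idx[1:])):
            return False
    # corresponding characters must be equal and come from different indices
    return all(X[a] == X[b] and a != b for a, b in zip(first, second))


X = "ATACTCGGA"
first = (0, 1, 3, 6)   # A T C G in the first highlighted line
second = (2, 4, 5, 7)  # A T C G in the second highlighted line
print("".join(X[i] for i in first), is_valid_repeat(X, first, second))  # ATCG True
```

Overlap is permitted: `is_valid_repeat("AAA", (0, 1), (1, 2))` returns `True`, because index 1 is used by both occurrences but at different positions, while pairing a tuple with itself returns `False`.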

## Recurrence equation

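$LRS[i, j] = \left\{\begin{matrix}
0 & \text{if } i = 0 \text{ or } j = 0 \\ 
LRS[i-1, j-1] + 1 & \text{if } X[i] = X[j] \text{ and } i \ne j\\ 
\max \left (LRS[i-1, j], LRS[i, j-1]  \right ) & \text{otherwise}
\end{matrix}\right.$

Here $X$ is indexed from 1, $LRS[i, j]$ is the length of the longest common subsequence of the prefixes $X[1..i]$ and $X[1..j]$ that never pairs a position with itself, and the answer for a string of length $n$ is $LRS[n, n]$. The "otherwise" case covers both $X[i] \ne X[j]$ and $i = j$.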
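As a worked check of two cells, take $X$ = ATACTCGGA indexed from 1, so $X[2] = X[5] = \text{T}$ and $X[4] = \text{C}$. The first cell uses the match case; the second sits on the diagonal, where the characters trivially agree but $i = j$, so the third case applies. Both values agree with the lookup table printed by `LRS` at the end of this notebook.

$$
\begin{aligned}
LRS[2, 5] &= LRS[1, 4] + 1 = 1 + 1 = 2 && (X[2] = X[5],\ 2 \ne 5) \\
LRS[4, 4] &= \max\left(LRS[3, 4], LRS[4, 3]\right) = \max(1, 1) = 1 && (i = j)
\end{aligned}
$$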


```python
# recursive solution: exponential time in the worst case,
# because overlapping sub-problems are recomputed
def LRS_R(X: str, m: int, n: int) -> int:
    # base case: one of the two prefixes is empty
    if m == 0 or n == 0:
        return 0

    # characters at positions m and n match and come from different indices
    if X[m - 1] == X[n - 1] and m != n:
        return LRS_R(X, m - 1, n - 1) + 1

    # otherwise (characters differ, or m == n): drop the last character of one prefix
    return max(LRS_R(X, m, n - 1), LRS_R(X, m - 1, n))
```

```python
def LRS_Recursive(X: str) -> int:
    return LRS_R(X, len(X), len(X))
```

```python
LRS_Recursive("ATACTCGGA")
```

```
4
```
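The recursive version recomputes the same `(m, n)` sub-problems many times. A minimal sketch of a memoized (top-down) variant of the same recurrence, assuming Python's standard `functools.lru_cache` as the cache; the name `LRS_memo` is hypothetical. Each pair `(m, n)` is evaluated at most once, giving $O(n^2)$ time and space.

```python
from functools import lru_cache


def LRS_memo(X: str) -> int:
    """Hypothetical top-down (memoized) version of the LRS recurrence."""

    @lru_cache(maxsize=None)
    def solve(m: int, n: int) -> int:
        # base case: one of the two prefixes is empty
        if m == 0 or n == 0:
            return 0
        # matching characters at different indices extend the LRS
        if X[m - 1] == X[n - 1] and m != n:
            return solve(m - 1, n - 1) + 1
        # otherwise drop the last character of one prefix
        return max(solve(m, n - 1), solve(m - 1, n))

    return solve(len(X), len(X))


print(LRS_memo("ATACTCGGA"))  # 4
```

Because it still recurses, very long strings can exceed Python's default recursion limit; the bottom-up table avoids recursion entirely.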

```python
def LRS(X: str, verbose: bool = False) -> int:
    # tabulation (bottom-up dynamic programming):
    # the lookup table stores solutions to already computed sub-problems
    n = len(X)

    # lookup[i][j] stores the length of the LRS of prefixes X[0..i-1] and X[0..j-1]
    # the first row and the first column are all 0 (empty prefix)
    lookup = [[0] * (n + 1) for _ in range(n + 1)]

    # fill the lookup table in bottom-up manner
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if X[i - 1] == X[j - 1] and i != j:
                # characters at positions i and j match and come from different indices
                lookup[i][j] = lookup[i - 1][j - 1] + 1
            else:
                # characters differ, or i == j (a character cannot match itself)
                lookup[i][j] = max(lookup[i - 1][j], lookup[i][j - 1])

    if verbose:
        # print the lookup matrix
        print(" " * 5, " ".join(["{:>2s}".format(v) for v in X]))
        for i in range(n + 1):
            print("{:>2s}".format(X[i - 1] if i > 0 else ""), " ".join(["{:2d}".format(v) for v in lookup[i]]))

    # the LRS length is the last entry in the lookup table
    return lookup[n][n]
```

```python
LRS("ATACTCGGA", verbose=True)
```

```
       A  T  A  C  T  C  G  G  A
    0  0  0  0  0  0  0  0  0  0
 A  0  0  0  1  1  1  1  1  1  1
 T  0  0  0  1  1  2  2  2  2  2
 A  0  1  1  1  1  2  2  2  2  3
 C  0  1  1  1  1  2  3  3  3  3
 T  0  1  2  2  2  2  3  3  3  3
 C  0  1  2  2  3  3  3  3  3  3
 G  0  1  2  2  3  3  3  3  4  4
 G  0  1  2  2  3  3  3  4  4  4
 A  0  1  2  3  3  3  3  4  4  4
```

```
4
```
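`LRS` returns only the length. A sketch of recovering one longest repeated subsequence by walking back from `lookup[n][n]`: a diagonal step is taken exactly where the table was filled by the match case, otherwise we move toward the larger neighbour. `LRS_string` is a hypothetical name, the table is rebuilt so the block runs on its own, and ties are broken toward the upper cell, so when several answers of the same length exist, which one is returned depends on that choice.

```python
def LRS_string(X: str) -> str:
    """Hypothetical helper: return one longest repeated subsequence of X."""
    n = len(X)
    lookup = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if X[i - 1] == X[j - 1] and i != j:
                lookup[i][j] = lookup[i - 1][j - 1] + 1
            else:
                lookup[i][j] = max(lookup[i - 1][j], lookup[i][j - 1])

    # walk back from the bottom-right corner, collecting matched characters
    i, j = n, n
    chars = []
    while i > 0 and j > 0:
        if X[i - 1] == X[j - 1] and i != j:
            # this cell was filled from the diagonal: the character belongs to the LRS
            chars.append(X[i - 1])
            i -= 1
            j -= 1
        elif lookup[i - 1][j] >= lookup[i][j - 1]:
            # ties go to the upper cell
            i -= 1
        else:
            j -= 1

    # characters were collected from the end, so reverse them
    return "".join(reversed(chars))


print(LRS_string("ATACTCGGA"))  # ATCG
```

The length of the returned string always equals `lookup[n][n]`; for `"AAA"` the sketch returns `"AA"`, using the overlapping occurrences at indices (0, 1) and (1, 2).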