# Diff

Diff compares two character sequences and shows the difference between them, that is, how the first string can be transformed into the second.

The result is a sequence made up of three kinds of characters: characters of the first string that also appear in the second string, characters marked with + that come from the second string and need to be added, and characters marked with - that come from the first string and need to be removed to produce the final result (the second string).

The idea is to use the longest common subsequence (LCS), that is, a longest sequence of characters that occurs in both strings in the same order. Then:
* if a character is in the subsequence, just keep it
* if a character is not in the subsequence and is in the second string, then it should be added to the result (+)
* if a character is not in the subsequence and is in the first string, then it should be removed (-)

In [3]:
def lcs(X: str, Y: str):
    """Build the LCS lookup table for X and Y.

    lookup[i][j] holds the length of the longest common subsequence
    of the prefixes X[0..i-1] and Y[0..j-1]."""
    m = len(X)
    n = len(Y)

    # row 0 and column 0 stay 0: an empty prefix has an empty LCS
    lookup = [[0] * (n + 1) for _ in range(m + 1)]

    # fill the lookup table in bottom-up manner
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                # current characters match: extend the LCS of the shorter prefixes
                lookup[i][j] = lookup[i - 1][j - 1] + 1
            else:
                # no match: keep the better result of dropping one character
                lookup[i][j] = max(lookup[i - 1][j], lookup[i][j - 1])
    return lookup

In [23]:
def diff(X: str, Y: str, m, n, lookup):
    """Print the diff of X[0..m-1] and Y[0..n-1] by walking the lookup table backwards.

    Building the table with lcs() takes O(mn) time and space; the walk itself
    takes at most m + n steps, so the recursion depth can reach m + n."""
    if m > 0 and n > 0 and X[m - 1] == Y[n - 1]:
        # last characters of X and Y match: the character is kept
        diff(X, Y, m - 1, n - 1, lookup)
        print(" ", X[m - 1], end="")
    elif n > 0 and (m == 0 or lookup[m][n - 1] >= lookup[m - 1][n]):
        # current character of Y is not part of the LCS: it must be added
        diff(X, Y, m, n - 1, lookup)
        print(" +{}".format(Y[n - 1]), end="")
    elif m > 0 and (n == 0 or lookup[m][n - 1] < lookup[m - 1][n]):
        # current character of X is not part of the LCS: it must be removed
        diff(X, Y, m - 1, n, lookup)
        print(" -{}".format(X[m - 1]), end="")

In [22]:
X = "ABCDFGHJQZ"
Y = "ABCDEFGIJKRXYZ"
diff(X, Y, m=len(X), n=len(Y), lookup=lcs(X, Y))

  A  B  C  D +E  F  G -H +I  J -Q +K +R +X +Y  Z
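
The recursive `diff` above prints its result directly and recurses once per step of the walk, so long inputs can exceed Python's default recursion limit. As a rough sketch of one way around that, the same walk can be written as a loop that returns the operations instead of printing them. The names `lcs_table` and `diff_ops` are hypothetical and not part of the original notebook. The branch order is copied from `diff`, so the result should match it.

```python
def lcs_table(x, y):
    """Same bottom-up LCS table as the notebook's lcs function."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def diff_ops(x, y):
    """Return a list of (op, char) pairs, where op is ' ', '+' or '-'.

    Hypothetical iterative variant of diff: walks the table back from
    (len(x), len(y)) with a loop instead of recursion."""
    table = lcs_table(x, y)
    i, j = len(x), len(y)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and x[i - 1] == y[j - 1]:
            ops.append((" ", x[i - 1]))  # kept character
            i, j = i - 1, j - 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            ops.append(("+", y[j - 1]))  # character added from y
            j -= 1
        else:
            ops.append(("-", x[i - 1]))  # character removed from x
            i -= 1
    ops.reverse()  # the walk collects operations back to front
    return ops


X = "ABCDFGHJQZ"
Y = "ABCDEFGIJKRXYZ"
ops = diff_ops(X, Y)
print(" ".join(ch if op == " " else op + ch for op, ch in ops))
# A B C D +E F G -H +I J -Q +K +R +X +Y Z
```

Returning pairs rather than printing also makes the result easy to check: dropping the `+` entries gives back the first string, and dropping the `-` entries gives back the second.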