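# Summary - Bottom-Up Dynamic Programming
Let `array[i][j]` be the length of the longest common subsequence (LCS) of the suffixes `text1[i:]` and `text2[j:]`. If `text1[i] == text2[j]`, the answer is 1 plus the answer for `text1[i + 1:]` and `text2[j + 1:]`. Otherwise it is the larger of the answers obtained by skipping `text1[i]` or skipping `text2[j]`. Filling the table backwards from the last row and column guarantees both subproblems are already computed, and `array[0][0]` holds the final answer.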
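Written out as a recurrence, with $m$ and $n$ the lengths of `text1` and `text2` and $L(i, j)$ standing for `array[i][j]`:

$$
L(i, j) =
\begin{cases}
0 & \text{if } i = m \text{ or } j = n, \\
1 + L(i + 1, j + 1) & \text{if } \texttt{text1}[i] = \texttt{text2}[j], \\
\max\bigl(L(i + 1, j),\ L(i, j + 1)\bigr) & \text{otherwise,}
\end{cases}
$$

and the answer is $L(0, 0)$.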

## Time Complexity
$O(m \cdot n)$, where $m$ and $n$ are the lengths of `text1` and `text2`: we fill every cell of an $(m + 1) \times (n + 1)$ table and do constant work per cell.

## Space Complexity
$O(m \cdot n)$ because the whole $(m + 1) \times (n + 1)$ table of intermediate results is kept in memory.

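```python
class Solution:
    def longestCommonSubsequence(self, text1: str, text2: str) -> int:
        # array[i][j] = LCS length of text1[i:] and text2[j:]; the extra
        # last row and column stand for empty suffixes and stay 0.
        array = [[0] * (len(text2) + 1) for _ in range(len(text1) + 1)]

        # Fill from the bottom-right corner so that array[i + 1][*] and
        # array[i][j + 1] are ready before array[i][j] is computed.
        for i in range(len(text1) - 1, -1, -1):
            for j in range(len(text2) - 1, -1, -1):
                if text1[i] == text2[j]:
                    array[i][j] = array[i + 1][j + 1] + 1
                else:
                    array[i][j] = max(array[i + 1][j], array[i][j + 1])

        return array[0][0]
```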
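Each row `i` of the table only reads row `i + 1` and its own cells to the right, so the full table is not strictly needed. The following is a minimal sketch (the function name `longest_common_subsequence_two_rows` is hypothetical) that keeps just two rows, cutting the extra space to $O(n)$ with $n$ = `len(text2)`:

```python
def longest_common_subsequence_two_rows(text1: str, text2: str) -> int:
    """Same recurrence as the full table, but only two rows are stored."""
    n = len(text2)
    # next_row plays the role of array[i + 1][*]; it starts as the
    # all-zero row for the empty suffix of text1.
    next_row = [0] * (n + 1)
    for i in range(len(text1) - 1, -1, -1):
        # curr_row plays the role of array[i][*]; curr_row[n] stays 0.
        curr_row = [0] * (n + 1)
        for j in range(n - 1, -1, -1):
            if text1[i] == text2[j]:
                curr_row[j] = next_row[j + 1] + 1
            else:
                curr_row[j] = max(next_row[j], curr_row[j + 1])
        next_row = curr_row
    return next_row[0]


print(longest_common_subsequence_two_rows("abcde", "ace"))  # 3
```

Because the LCS length does not depend on argument order, passing the shorter string as `text2` brings the extra space down to $O(\min(m, n))$.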

# Summary - Memoization

We can break this problem down into subproblems on suffixes: `helper(p1, p2)` returns the length of the longest common subsequence of `text1[p1:]` and `text2[p2:]`.

If the current character `text1[p1]` also occurs in `text2[p2:]`, we can include it. We match it with its first occurrence, at `match_index` (an earlier match leaves more of `text2` available for later characters), and the answer is 1 plus the subproblem on the suffixes just after the matched characters, `text1[p1 + 1:]` and `text2[match_index + 1:]`.

Alternatively, `text1[p1]` might not be part of the longest subsequence at all. Then the subproblem is `text1[p1 + 1:]` against the same `text2[p2:]`. The answer for `(p1, p2)` is the larger of these two cases.

The function calls itself recursively until it reaches the end of `text1` or `text2`. Either case means one suffix is empty, so the recursion stops there and returns 0.

Along the way, the same subproblem is often reached through different paths, so we cache each result to avoid recomputing it.

For example, with `text1 = "abc"` and `text2 = "abc"`, the subproblem `(2, 2)` (suffixes `"c"` and `"c"`) is reached both by matching `'a'` and then `'b'`, and by skipping `'a'` and then matching `'b'`.
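To see the repeated subproblems directly, the sketch below uses a hypothetical helper, `count_calls`, that runs the same inclusion/exclusion recursion with the cache removed and counts how many times each `(p1, p2)` state is requested:

```python
from collections import Counter


def count_calls(text1: str, text2: str) -> Counter:
    """Run the inclusion/exclusion recursion WITHOUT a cache and count
    how often each (p1, p2) state is requested."""
    calls = Counter()

    def helper(p1, p2):
        calls[(p1, p2)] += 1
        if p1 >= len(text1) or p2 >= len(text2):
            return 0
        # Exclusion case: skip text1[p1].
        exclusion_case = helper(p1 + 1, p2)
        # Inclusion case: match text1[p1] with its first occurrence in text2[p2:].
        match_index = text2.find(text1[p1], p2)
        inclusion_case = 0 if match_index == -1 else 1 + helper(p1 + 1, match_index + 1)
        return max(exclusion_case, inclusion_case)

    helper(0, 0)
    return calls


calls = count_calls("abc", "abc")
print(calls[(2, 2)])  # 2
```

For `"abc"` and `"abc"`, state `(2, 2)` is requested twice: once after matching `'a'` and `'b'`, and once after skipping `'a'` and matching `'b'`. On longer strings these repeats multiply, which is what the cache avoids.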


## Time Complexity
$O(m \cdot n^2)$. There are at most $(m + 1)(n + 1)$ distinct `(p1, p2)` states, so $O(m \cdot n)$ subproblems, and the cache ensures each one is computed only once.

However, each subproblem also scans `text2` from `p2` onward to find the first occurrence of `text1[p1]`. That scan takes up to $n$ steps, so the total is $O(m \cdot n^2)$.

Note that this is an upper bound, because how many subproblems actually get explored depends on the specific strings.

For example, if `text1[p1]` does not occur anywhere in `text2[p2:]`, the inclusion case is skipped and only the exclusion subproblem is explored from that state.
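A small experiment makes the input dependence visible. The hypothetical helper `explored_states` below is a copy of the memoized inclusion/exclusion recursion that reports how many `(p1, p2)` states end up in its cache:

```python
def explored_states(text1: str, text2: str) -> int:
    """Run the memoized inclusion/exclusion recursion and return the
    number of (p1, p2) states stored in its cache."""
    cache = {}

    def helper(p1, p2):
        if p1 >= len(text1) or p2 >= len(text2):
            return 0
        if (p1, p2) in cache:
            return cache[(p1, p2)]
        exclusion_case = helper(p1 + 1, p2)
        match_index = text2.find(text1[p1], p2)
        inclusion_case = 0 if match_index == -1 else 1 + helper(p1 + 1, match_index + 1)
        cache[(p1, p2)] = max(exclusion_case, inclusion_case)
        return cache[(p1, p2)]

    helper(0, 0)
    return len(cache)


print(explored_states("abc", "abc"))  # 6 of the 3 * 3 = 9 possible states
print(explored_states("abc", "xyz"))  # 3: no character ever matches
```

When nothing matches, only the exclusion chain `(0, 0), (1, 0), (2, 0)` is visited, far below the $m \cdot n$ bound.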

## Space Complexity

$O(m \cdot n)$ for the cache, which holds at most one entry per `(p1, p2)` pair.

Every recursive call advances `p1` by one, so the recursion can go at most $m$ levels deep before hitting the base case and returning 0. The call stack therefore needs only $O(m)$ memory, which is smaller than $O(m \cdot n)$.

```python
class Solution:
    def longestCommonSubsequence(self, text1: str, text2: str) -> int:
        # cache[(p1, p2)] = LCS length of text1[p1:] and text2[p2:]
        cache = {}

        def helper(p1, p2):
            # Base case: one of the suffixes is empty.
            if p1 >= len(text1) or p2 >= len(text2):
                return 0

            if (p1, p2) in cache:
                return cache[(p1, p2)]

            # Exclusion case: text1[p1] is not part of the subsequence.
            exclusion_case = helper(p1 + 1, p2)

            # Inclusion case: match text1[p1] with its first occurrence in
            # text2[p2:]. str.find scans up to n characters and returns -1
            # when there is no occurrence.
            match_index = text2.find(text1[p1], p2)
            inclusion_case = 0 if match_index == -1 else 1 + helper(p1 + 1, match_index + 1)

            cache[(p1, p2)] = max(exclusion_case, inclusion_case)
            return cache[(p1, p2)]

        return helper(0, 0)
```

# Summary - Memoization, Improved

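We can break this problem down into subproblems on the suffixes `text1[p1:]` and `text2[p2:]`.

If the current two characters match, there is no harm in just including the match. We are scanning from left to right, so every later match lies to the right of `p1` in `text1` and to the right of `p2` in `text2`; there is no way a future match would need to "cross over" this one. Any common subsequence that leaves this pair unused can have its first matched pair swapped for `(p1, p2)` without getting shorter.

The remaining subproblem is then the same common subsequence question on the suffixes after the matched characters, `text1[p1 + 1:]` and `text2[p2 + 1:]`.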
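A quick empirical check of this argument (not a proof): the hypothetical function `lcs` below runs the memoized recursion either always taking a match (`take_match_greedily=True`, the rule above) or treating a match as just one of three options, alongside skipping either character. On random short strings the two should always agree.

```python
import random
from functools import lru_cache


def lcs(text1: str, text2: str, take_match_greedily: bool) -> int:
    """LCS length of text1 and text2 via memoized recursion over (p1, p2)."""

    @lru_cache(maxsize=None)
    def helper(p1: int, p2: int) -> int:
        if p1 >= len(text1) or p2 >= len(text2):
            return 0
        if text1[p1] == text2[p2]:
            take_match = 1 + helper(p1 + 1, p2 + 1)
            if take_match_greedily:
                return take_match
            # Make no assumption: also try skipping either character.
            return max(take_match, helper(p1 + 1, p2), helper(p1, p2 + 1))
        return max(helper(p1 + 1, p2), helper(p1, p2 + 1))

    return helper(0, 0)


rng = random.Random(0)  # fixed seed so the check is reproducible
mismatches = []
for _ in range(500):
    a = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
    b = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
    if lcs(a, b, take_match_greedily=True) != lcs(a, b, take_match_greedily=False):
        mismatches.append((a, b))

print(mismatches)  # []
```

A small alphabet such as `"abc"` is chosen deliberately: it produces many repeated characters, which is where a wrong greedy rule would most likely show up.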

If they don't match, at least one of the two characters is not part of the subsequence, so we only have two choices:

1. Skip the current character of `text1`, and solve for `text1[p1 + 1:]` with the current `text2[p2:]`.

2. Skip the current character of `text2`, and solve for `text2[p2 + 1:]` with the current `text1[p1:]`.

The larger of the two answers is the longest common subsequence length at the current `p1` and `p2` positions.

## Time Complexity
$O(m \cdot n)$ because there are at most $(m + 1)(n + 1)$ distinct `(p1, p2)` states, and thanks to the cache each one is computed only once, with constant work beyond its recursive calls. We no longer search through `text2` for a matching character, which removes the extra factor of $n$ from the previous $O(m \cdot n^2)$.

## Space Complexity

$O(m \cdot n)$ for the cache, which holds at most one entry per `(p1, p2)` pair.

Unlike the previous version, a call may advance only `p2`, so the recursion can run through both strings before hitting the base case. Each call advances `p1`, `p2`, or both, so the depth is at most $m + n$ and the call stack needs $O(m + n)$ memory on top of the cache.
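To check the stack depth empirically, the hypothetical helper `max_recursion_depth` below instruments the improved recursion and reports the deepest call reached (counting the top-level call as depth 1). When no characters match, every call advances only one pointer, so the depth reaches `len(text1) + len(text2)`:

```python
def max_recursion_depth(text1: str, text2: str) -> int:
    """Run the improved memoized recursion and return the deepest
    call depth reached, counting the top-level call as depth 1."""
    cache = {}
    deepest = 0

    def helper(p1, p2, depth):
        nonlocal deepest
        deepest = max(deepest, depth)
        if p1 >= len(text1) or p2 >= len(text2):
            return 0
        if (p1, p2) in cache:
            return cache[(p1, p2)]
        if text1[p1] == text2[p2]:
            cache[(p1, p2)] = 1 + helper(p1 + 1, p2 + 1, depth + 1)
        else:
            cache[(p1, p2)] = max(
                helper(p1 + 1, p2, depth + 1),
                helper(p1, p2 + 1, depth + 1),
            )
        return cache[(p1, p2)]

    helper(0, 0, 1)
    return deepest


print(max_recursion_depth("abc", "xyz"))  # 6 = 3 + 3
print(max_recursion_depth("abc", "abc"))  # 4: every call takes the match
```

This matters in Python, whose default recursion limit (see `sys.getrecursionlimit()`) can be smaller than $m + n$ for long inputs.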

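```python
class Solution:
    def longestCommonSubsequence(self, text1: str, text2: str) -> int:
        # cache[(p1, p2)] = LCS length of text1[p1:] and text2[p2:]
        cache = {}

        def helper(p1, p2):
            # Base case: one of the suffixes is empty.
            if p1 >= len(text1) or p2 >= len(text2):
                return 0

            if (p1, p2) in cache:
                return cache[(p1, p2)]

            if text1[p1] == text2[p2]:
                # A matching pair is always safe to include.
                cache[(p1, p2)] = 1 + helper(p1 + 1, p2 + 1)
            else:
                # Skip the current character of one string or the other.
                cache[(p1, p2)] = max(
                    helper(p1 + 1, p2),
                    helper(p1, p2 + 1)
                )
            return cache[(p1, p2)]

        return helper(0, 0)
```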

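# Summary - Brute Force

A true brute force enumerates the subsequences of the shorter string, longest first, and returns the length of the first one that is also a subsequence of the longer string.

First comes the string itself, then every way of deleting one character, then every way of deleting two characters, and so on. With $m$ the length of the shorter string and $n$ the length of the longer one, there are $2^m$ candidates and each check takes $O(n)$ time, so the time complexity is $O(2^m \cdot n)$, not $O(n^2)$.

Note that `SolutionBruteForce` does not perform this exhaustive search. For each suffix of the shorter string it greedily matches every character to the earliest available position in the longer string, which is fast but can undercount: on the example strings it returns 5, even though `"mhziwb"` is a common subsequence of length 6.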

```python
class SolutionBruteForce:
    # NOTE: despite the name, this is a greedy heuristic over suffixes of
    # the shorter string, not an exhaustive search, so it can return less
    # than the true LCS length.
    def longestCommonSubsequence(self, text1: str, text2: str) -> int:
        def overlap_finder(text1, text2):
            short_text = text1 if len(text1) <= len(text2) else text2
            long_text = text1 if len(text1) > len(text2) else text2

            # Iterate over short_text character by character. Once a match
            # is found, increment the counter, move on to the next character,
            # and resume searching long_text just after the matched index.
            idx_long = 0
            counter = 0
            for c1 in short_text:
                for i in range(idx_long, len(long_text)):
                    c2 = long_text[i]
                    if c1 == c2:
                        counter += 1
                        idx_long = i + 1
                        break
            return counter

        short_text = text1 if len(text1) <= len(text2) else text2
        long_text = text1 if len(text1) > len(text2) else text2

        # Try the greedy match starting from every suffix of the shorter string.
        counter = 0
        for i in range(len(short_text)):
            counter = max(counter, overlap_finder(short_text[i:], long_text))
        return counter
```

```python
text1 = "mhunuzqrkzsnidwbun"
text2 = "szulspmhwpazoxijwbq"
```

```python
s = SolutionBruteForce()
s.longestCommonSubsequence(text1, text2)
```

```
5
```

```python
len(text1)
```

```
18
```

```python
# First find out which string is the shorter one,
# then test whether the full shorter string is a subsequence of the longer one.

# If not, move on: delete one character anywhere. Is any result a subsequence?

# If not, delete two characters. Is any result a subsequence?

# ...
```
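The plan in the notes above can be turned into a genuine exhaustive search. The following is a minimal sketch with hypothetical names `is_subsequence` and `lcs_exhaustive`: deleting $d$ characters from the shorter string is the same as keeping $k = m - d$ of them, so `itertools.combinations` enumerates the candidates from longest to shortest. It is exponential and only practical for short inputs.

```python
from itertools import combinations


def is_subsequence(candidate, text: str) -> bool:
    # Consuming one shared iterator forces the characters to appear in order.
    it = iter(text)
    return all(ch in it for ch in candidate)


def lcs_exhaustive(text1: str, text2: str) -> int:
    """Exhaustive LCS length, O(2^m * n) with m = len(shorter), n = len(longer)."""
    short_text, long_text = sorted((text1, text2), key=len)
    # Keep all characters first, then all but one, then all but two, ...
    for k in range(len(short_text), 0, -1):
        for candidate in combinations(short_text, k):
            if is_subsequence(candidate, long_text):
                return k
    return 0


text1 = "mhunuzqrkzsnidwbun"
text2 = "szulspmhwpazoxijwbq"
print(lcs_exhaustive(text1, text2))  # 6
```

On the example strings this finds a common subsequence of length 6 (for instance `"mhziwb"`), one more than the greedy `SolutionBruteForce` reports, after checking roughly a quarter of a million candidates.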