# Question 391

## Description

This problem was asked by Facebook.

We have some historical clickstream data gathered from our site anonymously using cookies. The histories contain URLs that users have visited in chronological order.

Write a function that takes two users' browsing histories as input and returns the longest contiguous sequence of URLs that appear in both.

For example, given the following two users' histories:

```python
user1 = ['/home', '/register', '/login', '/user', '/one', '/two']
user2 = ['/home', '/red', '/login', '/user', '/one', '/pink']
```

You should return the following:

`['/login', '/user', '/one']`


## Solution

To solve this problem, we can use a dynamic programming approach. The idea is to build a matrix `dp` in which `dp[i+1][j+1]` holds the length of the longest common contiguous run of URLs ending at `user1[i]` and `user2[j]`. If `user1[i]` equals `user2[j]`, the run ending at the previous pair of positions is extended by one. Otherwise, no common run ends at that pair, and the cell stays 0.

Here's a step-by-step approach to implement this:

1. Initialize a matrix `dp` with dimensions `(len(user1) + 1) x (len(user2) + 1)`, where all elements are initialized to 0.
2. Iterate over every pair of indices `(i, j)`, with `i` ranging over `user1` and `j` over `user2`.
3. When a common URL is found (`user1[i] == user2[j]`), set `dp[i+1][j+1] = dp[i][j] + 1`.
4. Keep track of the maximum length seen so far and the index in `user1` at which that run ends.
5. After filling the matrix, slice `user1` using the maximum length and the ending index to recover the longest contiguous sequence.

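For the given example, this approach finds `['/login', '/user', '/one']`, which matches the expected result. Note that although the function below is named `longest_common_subsequence`, it computes the longest common contiguous run (a common substring of the two lists), not a subsequence in the classical sense, where gaps would be allowed.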
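To make the table concrete, here is a minimal sketch that only builds the matrix for the example histories and prints it. The helper name `build_dp_table` is hypothetical and used purely for illustration; it does not extract the answer.

```python
def build_dp_table(user1, user2):
    # Hypothetical helper: fills the table described above and returns it.
    # dp[i + 1][j + 1] is the length of the common run ending at user1[i], user2[j].
    dp = [[0] * (len(user2) + 1) for _ in range(len(user1) + 1)]
    for i, url1 in enumerate(user1):
        for j, url2 in enumerate(user2):
            if url1 == url2:
                dp[i + 1][j + 1] = dp[i][j] + 1
    return dp


user1 = ['/home', '/register', '/login', '/user', '/one', '/two']
user2 = ['/home', '/red', '/login', '/user', '/one', '/pink']

for row in build_dp_table(user1, user2):
    print(row)
# [0, 0, 0, 0, 0, 0, 0]
# [0, 1, 0, 0, 0, 0, 0]
# [0, 0, 0, 0, 0, 0, 0]
# [0, 0, 0, 1, 0, 0, 0]
# [0, 0, 0, 0, 2, 0, 0]
# [0, 0, 0, 0, 0, 3, 0]
# [0, 0, 0, 0, 0, 0, 0]
```

The values 1, 2, 3 running down the diagonal correspond to `/login`, `/user`, `/one`. The isolated 1 in the second row is the shared `/home`, whose run is broken by the differing URLs that follow it.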


```python
def longest_common_subsequence(user1, user2):
    # dp[i + 1][j + 1] = length of the common contiguous run
    # ending at user1[i] and user2[j]
    dp = [[0 for _ in range(len(user2) + 1)] for _ in range(len(user1) + 1)]

    # length of the best run found so far and the index in user1 where it ends
    max_len = 0
    end_len = 0

    # fill the dp matrix
    for i in range(len(user1)):
        for j in range(len(user2)):
            if user1[i] == user2[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
                if dp[i + 1][j + 1] > max_len:
                    max_len = dp[i + 1][j + 1]
                    end_len = i

    # slice the longest common contiguous run out of user1
    return user1[end_len - max_len + 1 : end_len + 1]
```

```python
user1 = ["/home", "/register", "/login", "/user", "/one", "/two"]
user2 = ["/home", "/red", "/login", "/user", "/one", "/pink"]

longest_common_subsequence(user1, user2)
```

```
['/login', '/user', '/one']
```

## Complexity Analysis

Let's analyze the time and space complexity of the `longest_common_subsequence` function:

### Time Complexity

1. **Initialization of the DP Matrix**: The DP matrix of size `(len(user1) + 1) x (len(user2) + 1)` is initialized with zeros. This step takes `O(m * n)` time, where `m` is the length of `user1` and `n` is the length of `user2`.

2. **Filling the DP Matrix**: The nested loop iterates through each element in `user1` and `user2`. For each pair of elements, it performs constant time operations. Therefore, this step also takes `O(m * n)` time.

Overall, the time complexity of the function is **O(m \* n)**, where `m` is the length of `user1` and `n` is the length of `user2`.

### Space Complexity

1. **DP Matrix**: The space complexity is dominated by the DP matrix, which is of size `(len(user1) + 1) x (len(user2) + 1)`. Thus, the space complexity is `O(m * n)`.

2. **Auxiliary Space**: The variables `max_len` and `end_len` take constant space. The output list (the longest contiguous sequence) has length at most `min(m, n)`, which is dominated by the DP matrix and does not affect the overall space complexity.

Hence, the overall space complexity of the function is **O(m \* n)**.

In summary, the function has a time complexity of `O(m * n)` and a space complexity of `O(m * n)`, where `m` is the length of `user1` and `n` is the length of `user2`. This makes the algorithm quite efficient for moderately sized input histories but can become resource-intensive for very large histories.
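
If memory is the bottleneck for very large histories, one possible variation is to keep only the previous row of the table, since each cell depends only on `dp[i][j]` from the row above. The sketch below is an alternative to the solution above, not a replacement for it. The name `longest_common_run` is hypothetical, and when several runs tie for the longest it may return a different one than the full-matrix version.

```python
def longest_common_run(user1, user2):
    # Hypothetical name. Put the shorter history along the row dimension so
    # each row needs only min(m, n) + 1 cells.
    if len(user2) > len(user1):
        user1, user2 = user2, user1

    prev = [0] * (len(user2) + 1)  # stands in for dp[i]
    max_len = 0
    end = 0  # exclusive end index of the best run in user1

    for i, url1 in enumerate(user1):
        curr = [0] * (len(user2) + 1)  # stands in for dp[i + 1]
        for j, url2 in enumerate(user2):
            if url1 == url2:
                curr[j + 1] = prev[j] + 1
                if curr[j + 1] > max_len:
                    max_len = curr[j + 1]
                    end = i + 1
        prev = curr

    return user1[end - max_len:end]


user1 = ["/home", "/register", "/login", "/user", "/one", "/two"]
user2 = ["/home", "/red", "/login", "/user", "/one", "/pink"]
print(longest_common_run(user1, user2))  # ['/login', '/user', '/one']
```

The running time is still `O(m * n)`, but the extra space drops to `O(min(m, n))` in addition to the returned list.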
