Write a function called shortest_superstring that implements a greedy (approximate) solution to the shortest superstring problem. The function should take a set of strings as input and return a superstring containing all strings in the input set. The superstring it produces should be the one generated by the greedy approach described in the article: iteratively merge the two strings having the greatest overlap. You may assume that no string in the input set is a substring of another.
I recommend writing a helper function that takes two strings string1 and string2 as input and determines their overlap (assuming string1 to be on the left and string2 to be on the right). String slicing will be helpful in accomplishing this. Make sure to check all possible overlap lengths: the check might fail for lengths 1 and 2 but succeed for length 3.
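A small sketch of why every overlap length has to be checked. The helper name matching_overlaps is hypothetical and not part of the assignment; it lists every length at which the end of string1 equals the start of string2.

```python
def matching_overlaps(string1, string2):
    """Return every length k for which the last k characters of string1
    equal the first k characters of string2 (hypothetical helper)."""
    limit = min(len(string1), len(string2))
    return [k for k in range(1, limit + 1) if string1[-k:] == string2[:k]]


# Lengths 1 and 2 fail here, yet length 3 succeeds.
print(matching_overlaps('XABC', 'ABCY'))
# Matching lengths need not be contiguous, so stopping at the first
# failure (or the first success when counting upward) can miss the maximum.
print(matching_overlaps('ABAB', 'ABAB'))
```

Searching from the largest possible length downward and returning the first match avoids this pitfall.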
It will also be useful to write a function that takes as input string1, string2, and overlap and returns the merged string. String slicing and concatenation will be helpful for this.
To determine which two strings have the largest overlap, you will need to initially search the space of all ordered pairs of strings (requiring a double for loop). Consider storing this information in a list of triples (string1, string2, overlap) and sorting it by overlap.
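The list of triples described above could be built roughly as follows. This is a sketch only: overlap_triples is a hypothetical name, and find_overlap is restated here so the block runs on its own.

```python
from itertools import permutations


def find_overlap(string1, string2):
    # Longest suffix of string1 that is also a prefix of string2.
    for k in range(min(len(string1), len(string2)), 0, -1):
        if string1[-k:] == string2[:k]:
            return k
    return 0


def overlap_triples(strings):
    """Build (string1, string2, overlap) for every ordered pair of
    distinct strings, largest overlap first (hypothetical helper)."""
    triples = [(s1, s2, find_overlap(s1, s2))
               for s1, s2 in permutations(strings, 2)]
    triples.sort(key=lambda t: t[2], reverse=True)
    return triples


triples = overlap_triples(['ABCDEF', 'BCDEFX', 'XCDEFY', 'DEFYZ'])
print(triples[0])  # the pair with the greatest overlap
```

itertools.permutations(strings, 2) yields the same ordered pairs as a double for loop that skips i == j.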
When you merge two strings, you can remove all occurrences of each of them from the list. You will then need to compute the overlap of the new merged string with all the other remaining strings and update your list of triples. For this reason, it is probably helpful to keep a list of just the strings (with no overlap information) in addition to the list of triples (string1, string2, overlap).
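One possible shape for this update step is sketched below. The name merge_best_pair is hypothetical, and the sketch assumes the triples list is kept sorted with the largest overlap first and that all strings are distinct.

```python
def find_overlap(string1, string2):
    # Longest suffix of string1 that is also a prefix of string2.
    for k in range(min(len(string1), len(string2)), 0, -1):
        if string1[-k:] == string2[:k]:
            return k
    return 0


def merge_best_pair(strings, triples):
    """Merge the pair at the head of the sorted triples list and return
    the updated (strings, triples). Hypothetical helper."""
    string1, string2, overlap = triples[0]
    merged = string1 + string2[overlap:]
    used = (string1, string2)
    # Drop the two merged strings and every triple that mentions either.
    strings = [s for s in strings if s not in used]
    triples = [t for t in triples if t[0] not in used and t[1] not in used]
    # Only pairs involving the new string need fresh overlap computations.
    for other in strings:
        triples.append((merged, other, find_overlap(merged, other)))
        triples.append((other, merged, find_overlap(other, merged)))
    strings.append(merged)
    triples.sort(key=lambda t: t[2], reverse=True)
    return strings, triples


strings = ['ABCDEF', 'BCDEFX', 'XCDEFY', 'DEFYZ']
triples = [(a, b, find_overlap(a, b)) for a in strings for b in strings if a != b]
triples.sort(key=lambda t: t[2], reverse=True)
while len(strings) > 1:
    strings, triples = merge_best_pair(strings, triples)
print(strings[0])
```

Keeping the plain list of strings alongside the triples is what makes the "compute overlaps with the remaining strings" loop straightforward.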
Export to HTML a notebook containing your pseudocode, code, and the output of the following test cases:
shortest_superstring({'ABCDEF', 'BCDEFX', 'XCDEFY', 'DEFYZ'}) returns 'ABCDEFXCDEFYZ'
shortest_superstring({'UV', 'VW', 'XY', 'YZ'}) returns either 'UVWXYZ' or 'XYZUVW' (both are correct)

function shortest_superstring(strings):
    while |strings| > 1:
        max_overlap = -1
        best_pair = (None, None)
        for each ordered pair (s1, s2) of distinct strings in strings:
            overlap = find_overlap(s1, s2)
            if overlap > max_overlap:
                max_overlap = overlap
                best_pair = (s1, s2)
        merged = merge_strings(best_pair[0], best_pair[1], max_overlap)
        remove best_pair[0] and best_pair[1] from strings
        add merged to strings
    return the single remaining string
function find_overlap(s1, s2):
    max_possible = min(len(s1), len(s2))
    for overlap_len from max_possible down to 1:
        if s1[-overlap_len:] == s2[:overlap_len]:
            return overlap_len
    return 0
function merge_strings(s1, s2, overlap):
    return s1 + s2[overlap:]

In [1]:
def find_overlap(string1, string2):
    """
    Find the maximum overlap between string1 (left) and string2 (right):
    the longest suffix of string1 that is also a prefix of string2.
    Returns the length of the overlap (0 if there is none).
    """
    max_possible = min(len(string1), len(string2))
    # check from largest possible overlap down to 1
    for overlap_len in range(max_possible, 0, -1):
        if string1[-overlap_len:] == string2[:overlap_len]:
            return overlap_len
    return 0

In [2]:
def merge_strings(string1, string2, overlap):
    """
    Merge two strings given the overlap length: the first `overlap`
    characters of string2 are dropped so the shared part appears once.
    """
    return string1 + string2[overlap:]

In [3]:
def shortest_superstring(strings):
    """
    Greedy approximation: iteratively merges the two strings with
    maximum overlap until a single superstring remains.
    """
    # convert set to list for mutability
    strings = list(strings)
    while len(strings) > 1:
        max_overlap = -1
        best_i, best_j = -1, -1
        # find the pair with maximum overlap
        for i in range(len(strings)):
            for j in range(len(strings)):
                if i != j:
                    overlap = find_overlap(strings[i], strings[j])
                    if overlap > max_overlap:
                        max_overlap = overlap
                        best_i, best_j = i, j
        # merge best pair
        merged = merge_strings(strings[best_i], strings[best_j], max_overlap)
        # remove the merged strings (larger index first so the smaller index stays valid)
        for index in sorted((best_i, best_j), reverse=True):
            strings.pop(index)
        # add merged string
        strings.append(merged)
    return strings[0]

In [6]:
shortest_superstring({'ABCDEF', 'BCDEFX', 'XCDEFY', 'DEFYZ'})

'ABCDEFXCDEFYZ'

In [5]:
shortest_superstring({'UV', 'VW', 'XY', 'YZ'})

'XYZUVW'
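
The second test can return either accepted string because the iteration order of a Python set is not fixed, so ties between equal overlaps may be broken differently from run to run. A quick sanity check, sketched here with a hypothetical helper is_superstring, confirms that a result contains every input string regardless of which order came out.

```python
def is_superstring(candidate, strings):
    """True when every input string occurs somewhere in candidate
    (hypothetical helper, not part of the assignment)."""
    return all(s in candidate for s in strings)


inputs = {'UV', 'VW', 'XY', 'YZ'}
for candidate in ('UVWXYZ', 'XYZUVW'):
    print(candidate, is_superstring(candidate, inputs))
```

This only checks containment; it does not check that the string is the one the greedy procedure would produce.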