### Find Longest Subsequence

Given a string S and a set of words D, find the longest word in D that is a subsequence of S.

Word W is a subsequence of S if some number of characters, possibly zero, can be deleted from S to form W, without reordering the remaining characters.
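
The definition can be checked with a single left-to-right scan of S. A minimal sketch, assuming plain Python strings; `is_subsequence_scan` is a hypothetical helper name and is not part of the problem statement:

```python
def is_subsequence_scan(w, s):
    """Return True if w can be formed by deleting characters from s."""
    remaining = iter(s)
    # `c in remaining` advances the iterator until it finds c, so each
    # character of w must be found after the previous match.
    return all(c in remaining for c in w)

print(is_subsequence_scan("ace", "abcde"))  # True
print(is_subsequence_scan("aec", "abcde"))  # False: 'c' comes before 'e' in s
print(is_subsequence_scan("", "abcde"))     # True: delete every character
```

Because the iterator is shared across all characters of `w`, the scan never moves backwards in `s`, which is what enforces the "without reordering" part of the definition.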

Note: D can appear in any format (list, hash table, prefix tree, etc.).

For example, given the input S = "abppplee" and D = {"able", "ale", "apple", "bale", "kangaroo"}, the correct output would be "apple".

The words "able" and "ale" are both subsequences of S, but they are shorter than "apple".

The word "bale" is not a subsequence of S because even though S has all the right letters, they are not in the right order.

The word "kangaroo" is the longest word in D, but it isn't a subsequence of S.
Simple string operations.

See [The Google example problem](https://techdevguide.withgoogle.com/resources/find-longest-word-in-dictionary-that-subsequence-of-given-string/#!)

In [32]:
debugging = True  # set to False to silence dbg() output

logging = True

def dbg(f, *args):
    if debugging:
        print(('  DBG:' + f).format(*args))

def log(f, *args):
    if logging:
        print((f).format(*args))
        
def logError(f, *args):
    if logging:
        print(('*** ERROR:' + f).format(*args))
        
def className(instance):
    return type(instance).__name__

In [33]:
class TestCase(object):
    def __init__(self, name, method, inputs, expected):
        self.name = name
        self.method = method
        self.inputs = inputs
        self.expected = expected
        
    def run(self):
        return self.method(*self.inputs)

In [34]:
import time
from datetime import timedelta

class TestSet(object):
    def __init__(self, cases):
        self.cases = cases
    
    def run_tests(self):
        count = 0
        errors = 0
        total_time = 0
        for case in self.cases:
            count += 1
            start_time = time.time()
            result = case.run()
            elapsed_time = time.time() - start_time
            total_time += elapsed_time
            if callable(case.expected):
                if not case.expected(result):
                    errors += 1
                    logError("Test {0} failed. Returned {1}", case.name, result)
            elif result != case.expected:
                errors += 1
                logError('Test {0} failed. Returned "{1}", expected "{2}"', case.name, result, case.expected)
        if errors:
            logError("Tests passed: {0}; Failures {1}", count-errors, errors)
        else:
            log("All {0} tests passed.", count)
        log("Elapsed test time: {0}", timedelta(seconds=total_time))

In [79]:
def is_subsequence(w, s):
    """ True if w (e.g. 'abc') is a subsequence of s (e.g. 'xaabbyyz'). """
    if len(w) == 0:
        return True
    if len(s) < len(w):
        return False
    dbg('w="{0}", s="{1}"', w, s)
    c = w[0]
    offset = s.find(c)
    while offset >= 0:
        if is_subsequence(w[1:], s[offset+1:]):
            return True
        s = s[offset+1:] # skip this instance.
        offset = s.find(c)

    return False

def longest_subsequence(sourceChars, words):
    """ Find the longest word in words that is a subsequence of sourceChars. """
    # Try candidates longest first, so the first match is the answer.
    sortedWords = sorted(words, key=len, reverse=True)
    for w in sortedWords:
        if is_subsequence(w, sourceChars):
            return w
    return ""


In [87]:
import collections
def find_longest_word_in_string(letters, words):
    letter_positions = collections.defaultdict(list)
    # For each letter in 'letters', collect all the indices at which it appears.
    # O(#letters) space and speed.
    for index, letter in enumerate(letters):
        letter_positions[letter].append(index)
    # For words, in descending order by length...
    # Bails out early on first matched word, and within word on
    # impossible letter/position combinations, but worst case is
    # O(#words * avg-len) * O(#letters / 26) time; constant space.
    # With some work, could be O(#words * avg-len) * log2(#letters / 26).
    # But since binary search has more overhead than simple iteration,
    # the two cost about the same as long as the position list for
    # each letter is "small". If letters are randomly present in the
    # search string, binary search is about equal in speed to simple
    # traversal up to lengths of a few hundred characters.
    for word in sorted(words, key=lambda w: len(w), reverse=True):
        pos = 0
        for letter in word:
            if letter not in letter_positions:
                break
        # Find any remaining valid positions in the search string where this
        # letter appears. It would be better to do this with binary search,
        # but a list comprehension is very Pythonic.
        possible_positions = [p for p in letter_positions[letter] if p >= pos]
        if not possible_positions:
            break
        else:
            pos = possible_positions[0] + 1
            # We didn't break out of the loop, so all letters have valid positions  
            return word
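
The comments in the cell above mention that the position lookup could use binary search. A hedged sketch of that idea using the standard `bisect` module follows. `find_longest_word_bisect` is a hypothetical name, and running the lookup once per letter inside the inner loop with a `for`/`else` is an assumption about the intended control flow:

```python
import bisect
import collections

def find_longest_word_bisect(letters, words):
    """Longest word in `words` that is a subsequence of `letters`, or None."""
    # Map each letter to the ascending list of indices where it occurs.
    letter_positions = collections.defaultdict(list)
    for index, letter in enumerate(letters):
        letter_positions[letter].append(index)

    for word in sorted(words, key=len, reverse=True):
        pos = 0  # smallest index in `letters` that is still available
        for letter in word:
            positions = letter_positions.get(letter, [])
            # Index of the first occurrence at or after `pos`, by binary search.
            i = bisect.bisect_left(positions, pos)
            if i == len(positions):
                break  # no usable occurrence left: not a subsequence
            pos = positions[i] + 1
        else:
            # Inner loop finished without `break`: every letter was placed.
            return word
    return None

print(find_longest_word_bisect("abppplee", ["able", "ale", "apple", "bale", "kangaroo"]))  # apple
```

As the original comments suggest, the gain over a linear scan is likely to be small unless individual letters occur many times in the search string.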

In [94]:
simplecompare = lambda s1, s2: longest_subsequence(s1, s2)
examplecompare = lambda letters, words: find_longest_word_in_string(letters, words)

word_list = ["ale", "apple", "bale", "kangaroo"]
c1 = TestCase('Provided example',
              examplecompare,
              ["abppplee", word_list],
              "apple")
c2 = TestCase('Provided example',
              simplecompare,
              ["apple", word_list],
              "apple")

tester = TestSet([c1, c2])

In [95]:
tester.run_tests()

*** ERROR:Test Provided example failed. Returned "None", expected "apple"
w="apple", s="apple"
w="pple", s="pple"
w="ple", s="ple"
w="le", s="le"
w="e", s="e"
*** ERROR:Tests passed: 1; Failures 1
Elapsed test time: 0:00:00.001072
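
The `None` returned for the first test is consistent with the indentation in `find_longest_word_in_string`. The `possible_positions` lookup sits outside the `for letter in word` loop, so for "kangaroo" the inner loop breaks on "k", the lookup for "k" finds nothing, and the outer loop breaks before "apple" is ever tried. Below is a sketch with that block moved inside the inner loop and the `return` attached to a `for`/`else`, assuming this was the intended control flow; `find_longest_word_fixed` is a hypothetical name:

```python
import collections

def find_longest_word_fixed(letters, words):
    # For each letter in `letters`, collect the indices at which it appears.
    letter_positions = collections.defaultdict(list)
    for index, letter in enumerate(letters):
        letter_positions[letter].append(index)

    for word in sorted(words, key=len, reverse=True):
        pos = 0
        for letter in word:
            if letter not in letter_positions:
                break
            # Positions of this letter at or after the current scan position.
            possible_positions = [p for p in letter_positions[letter] if p >= pos]
            if not possible_positions:
                break
            pos = possible_positions[0] + 1
        else:
            # No break: every letter of `word` was matched in order.
            return word
    return None

words = ["ale", "apple", "bale", "kangaroo"]
print(find_longest_word_fixed("abppplee", words))  # apple
```

With this control flow a failed word only ends its own inner loop, and the search continues with the next-longest candidate.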


In [51]:
a = ["a", "this is a long one", "able", "ale", "apple", "bale", "kangaroo"]
sorted(a, key = lambda x: -len(x))

['this is a long one', 'kangaroo', 'apple', 'able', 'bale', 'ale', 'a']
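
Sorting by negated length, as in the cell above, and sorting by length with `reverse=True` should give the same order. Python's sort is stable in both cases, so equal-length words keep their input order. A small check:

```python
a = ["a", "this is a long one", "able", "ale", "apple", "bale", "kangaroo"]

by_negated_length = sorted(a, key=lambda x: -len(x))
by_length_reversed = sorted(a, key=len, reverse=True)

# 'able' precedes 'bale' in both results because it comes first in `a`.
print(by_negated_length == by_length_reversed)  # True
```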

In [63]:
"This is a test".find('z')

-1
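
`str.find` also accepts an optional start index, which allows scanning without slicing the string at each step. A sketch of an iterative subsequence check built on that; `is_subsequence_find` is a hypothetical name:

```python
def is_subsequence_find(w, s):
    """True if w is a subsequence of s, using str.find with a start offset."""
    start = 0
    for c in w:
        # Search for c at or after `start`; -1 means no occurrence remains.
        offset = s.find(c, start)
        if offset < 0:
            return False
        start = offset + 1
    return True

print(is_subsequence_find("apple", "abppplee"))  # True
print(is_subsequence_find("bale", "abppplee"))   # False
```

Taking the leftmost available match for each character is sufficient here: any later match would leave fewer characters of `s` for the rest of `w`.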

In [86]:
import collections
[collections.defaultdict(list)]

[defaultdict(list, {})]