# Chapter 7: Combinatorial Search and Heuristic Methods (Completed 5/19: 26%)

## Backtracking

Below is a general-purpose backtracking routine, adapted into Python from the book's C implementation, that will be used in some of the exercises.

In [1]:
def backtrack(A, k, **data):
    if is_a_solution(A, k, **data):
        process_solution(A,k, **data)
    else:
        k += 1
        candidates = construct_candidates(A,k, **data)
        for i, _ in enumerate(candidates):
            A.append(candidates[i])
            make_move(A, k, **data)
            backtrack(A, k, **data)
            unmake_move(A, k, **data)
            A.pop()

### 7.1 [3]

A *derangement* is a permutation $p$ of $\{1, \ldots, n\}$ such that no item is in its proper position, i.e. $p_i \neq i$ for all $1 \leq i \leq n$. Write an efficient backtracking program with pruning that constructs all the derangements of $n$ items.

*Solution:*

For simplicity, the program computes the derangements of the numbers $\{0, \ldots, n-1\}$.

A closed form expression for the number of derangements, notated as $!n$, from [Wikipedia](http://www.wikiwand.com/en/Derangement), is given by:

$$ !n = \left[ \frac{n!}{e} \right] = \left \lfloor \frac{n!}{e} + \frac{1}{2} \right \rfloor$$

where $[x]$ is the nearest integer function. Therefore this number grows proportionally with $n!$, the number of permutations.
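
As a quick sanity check of the closed form, the sketch below compares it with the standard recurrence $!n = (n-1)\,(!(n-1) + !(n-2))$ and with a brute-force count over all permutations. The function names are illustrative and not part of the original solution, and the floating-point closed form should only be trusted for small $n$.

```python
import math
from itertools import permutations


def derangements_closed_form(n):
    """Nearest integer to n!/e (valid for n >= 1, small n only: uses floats)."""
    return math.floor(math.factorial(n) / math.e + 0.5)


def derangements_recurrence(n):
    """Count derangements with !n = (n - 1) * (!(n - 1) + !(n - 2))."""
    a, b = 1, 0  # a = !0, b = !1
    if n == 0:
        return a
    for i in range(2, n + 1):
        a, b = b, (i - 1) * (a + b)
    return b


def derangements_brute_force(n):
    """Count permutations of range(n) that have no fixed point."""
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )


for n in range(1, 8):
    print(n,
          derangements_closed_form(n),
          derangements_recurrence(n),
          derangements_brute_force(n))
```

For $n = 4$ all three functions return 9, which agrees with the nine solutions listed in the output of `generate_derangements(4)`.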

This problem is very similar to constructing permutations, a partial solution to which is provided in the text. However, we must ensure that we never insert number $i$ at position $i$, so we include the check `(i != k)` when screening potential candidates in the `construct_candidates` function. This prunes partial solutions that are doomed to fail as soon as they arise. The alternative, putting the check in `is_a_solution`, would generate all $n!$ permutations and scan each one linearly for a position with `A[i] == i`, which is exhaustive search rather than pruning.

In [2]:
## Problem-specific functions  
def construct_candidates(A, k, n, V, m):
    candidates = []
    for i in range(n):
        if (V[i] == False) and (i != k):
            candidates.append(i)
    return candidates

def make_move(A, k, n, V, m):
    V[A[k]] = True
    
def unmake_move(A, k, n, V, m):
    V[A[k]] = False

def is_a_solution(A, k, n, V, m):
    return k == n - 1

def process_solution(A, k, n, V, m):
    print(A)
    m[0] += 1
    print("Solution number:", m[0])
    
def generate_derangements(n):
    A = []
    V = []
    m = [0]   #number of solutions
    
    for i in range(n):
        V.append(False)
        
    backtrack(A, -1, n = n, V = V, m = m)

generate_derangements(4)

#No number is in the position equal to its value

[1, 0, 3, 2]
Solution number: 1
[1, 2, 3, 0]
Solution number: 2
[1, 3, 0, 2]
Solution number: 3
[2, 0, 3, 1]
Solution number: 4
[2, 3, 0, 1]
Solution number: 5
[2, 3, 1, 0]
Solution number: 6
[3, 0, 1, 2]
Solution number: 7
[3, 2, 0, 1]
Solution number: 8
[3, 2, 1, 0]
Solution number: 9


### 7.2 [4]

*Multisets* are allowed to have repeated elements. A multiset of $n$ items may thus have fewer than $n!$ distinct permutations. For example, $\{1, 1, 2, 2\}$ has only six different permutations: $\{1, 1, 2, 2\}$, $\{1, 2, 1, 2\}$, $\{1, 2, 2, 1\}$, $\{2, 1, 1, 2\}$, $\{2, 1, 2, 1\}$, and $\{2, 2, 1, 1\}$. Design and implement an efficient algorithm for constructing all
permutations of a multiset.

*Solution:*

I originally tried to solve this problem by enumerating permutations of the indices and pruning them appropriately. (Skip this paragraph if you want.) For example, $S = \{1, 1, 2, 2\}$ contains 4 numbers, so permutations of the numbers 1 through 4 will correspond to permutations of $S$. The only problem is that some of these index permutations will correspond to identical set permutations, due to the repeated elements. For example, both index permutations $\{1, 3, 2, 4\}$ and $\{1, 4, 2, 3 \}$ produce the set permutation $\{1, 2, 1, 2\}$. However, we may note that only one of these satisfies the condition that indices corresponding to identical set elements are in sorted order. Specifically, both indices $3$ and $4$ correspond to the number $2$ in the set $S$. So if we require that such indices be in sorted order, then only one of these index permutations is valid, $\{1, 3, 2, 4\}$. However, I was unable to encode this without using a "stack" data structure. Specifically, for set values already included, a second auxiliary data structure would be needed to record which index corresponding to that value was used most recently. (The first data structure is a bit vector for included indices.) For example, when adding index $3$ to the partial solution $\{1, 4, 2\}$, we would need to check whether $S[3] = 2$ was already included using index $4$, and since $3 < 4$ this would break our requirement that indices corresponding to equal set values be in sorted order. The problem is how to implement the `unmake_move` function when there are more than 2 identical elements. For example, if there are five 3's in the set $S$, then our second auxiliary data structure would essentially have to record the order in which these 5 different indices were added. This could be done with a stack. However, I was able to find a more elementary and direct solution to the problem.

To solve this problem, we can compute a histogram-like structure that tells us how many more times we can include a given value. For example, say you have 3 unique numbers, like $S = [1, 1, 2, 2, 2, 7, 7]$. We can turn this into two arrays $H = [2, 3, 2]$ and $I = [1, 2, 7]$, where the first tells you how many of each number you have, and the second is what the number is. Together, these form a sort of histogram, with $I$ the x values and $H$ the y values.
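
For illustration, the same two arrays can also be built with `collections.Counter` from the standard library. This is only a sketch with a hypothetical helper name, `histogram_arrays`; it is an alternative to the hand-written construction used in the solution, not a description of it.

```python
from collections import Counter


def histogram_arrays(S):
    """Return (H, I): the counts and the matching distinct values, sorted by value."""
    counts = Counter(S)
    I = sorted(counts)            # distinct values, ascending
    H = [counts[v] for v in I]    # how many copies of each value
    return H, I


H, I = histogram_arrays([1, 1, 2, 2, 2, 7, 7])
print(H)  # [2, 3, 2]
print(I)  # [1, 2, 7]
```

Unlike the sorted-scan approach, this version does not need the input to be sorted first, at the cost of hashing every element.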

At each iteration in the backtracking algorithm, the potential candidates for the next position in our solution vector $A$ are the indices $i$ from $0$ to $m-1$ such that $H[i] \neq 0$, where $m$ is the number of unique values in $S$. (In the code, the number of unique values is simply the length of $I$, and the name `m` is used for the solution counter instead.) When an index $i$ is added to $A$, we decrement $H[i]$ by one to indicate that we have one fewer of that value available for future positions. To undo this, we increment $H[i]$ by one.

Interestingly, although this solution is conceptually different from how the permutation problem is treated, it may in fact be a generalization. The permutation problem uses a bit vector to indicate what indices from $1$ to $n$ have already been included. This is analogous to our $H$ array, only in the permutation problem we know that each number appears exactly once, and so True/False values can be used instead of 1's and 0's. Also, there is no need for an $I$ array in the permutation case because the value and the index are equal.

In [3]:
## Problem-specific functions  
def construct_candidates(A, k, n, H, I, m):
    candidates = []
    for i, _ in enumerate(I):
        if H[i] != 0:
            candidates.append(i)       
    return candidates

def make_move(A, k, n, H, I, m):
    H[A[k]] -= 1
    
def unmake_move(A, k, n, H, I, m):
    H[A[k]] += 1

def is_a_solution(A, k, n, H, I, m):
    return k == n - 1

def process_solution(A, k, n, H, I, m):
    m[0] += 1
    Solution = []
    for i, _ in enumerate(A):
        Solution.append(I[A[i]])
        
    print("Solution number:", m[0])
    print('Permutation;', Solution)
    print()
 
def construct_histogram(S):
    H = []
    I = []
    
    for i, _ in enumerate(S):
        if i == 0 or S[i] > S[i - 1]: #New number
            I.append(S[i])
            H.append(1)
        else:
            H[-1] += 1
    return H, I

def generate_multiset_permutations(S):
    S.sort()
    A = []
    H, I = construct_histogram(S)
    m = [0]     #number of solutions
    n = len(S)

    backtrack(A, -1, n = n, H = H, I = I, m = m)

S = [1, 1, 2, 2]
print('The set:', S, '\n')
generate_multiset_permutations(S)

# Possible Multisets from the given example:
#{1,1,2,2}
#{1,2,1,2}
#{1,2,2,1}
#{2,1,1,2}
#{2,1,2,1}
#{2,2,1,1}

The set: [1, 1, 2, 2] 

Solution number: 1
Permutation; [1, 1, 2, 2]

Solution number: 2
Permutation; [1, 2, 1, 2]

Solution number: 3
Permutation; [1, 2, 2, 1]

Solution number: 4
Permutation; [2, 1, 1, 2]

Solution number: 5
Permutation; [2, 1, 2, 1]

Solution number: 6
Permutation; [2, 2, 1, 1]
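
As a check on the six permutations printed above, the number of distinct permutations of a multiset with multiplicities $h_1, \ldots, h_m$ is the multinomial coefficient $n! / (h_1! \cdots h_m!)$. The sketch below computes it; the function name `count_multiset_permutations` is an illustrative assumption and not part of the solution.

```python
import math
from collections import Counter


def count_multiset_permutations(S):
    """n! divided by the factorial of each value's multiplicity."""
    total = math.factorial(len(S))
    for multiplicity in Counter(S).values():
        # Each partial quotient is still an integer, so // is exact here.
        total //= math.factorial(multiplicity)
    return total


print(count_multiset_permutations([1, 1, 2, 2]))           # 6
print(count_multiset_permutations([1, 1, 2, 2, 2, 7, 7]))  # 210
```

When every element is distinct, all multiplicities are 1 and the count reduces to $n!$, which fits the view of ordinary permutations as a special case.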



---

## Combinatorial Optimization

---

## Interview Problems

For this section I will not use the general-purpose backtracking algorithm at the beginning of this notebook, but rather will write the algorithms from scratch as if I were in an interview.

### 7.14 [4]

Write a function to find all permutations of the letters in a particular string.

*Solution:*

If we may use Python's dictionary structure, we can convert the string into a dictionary mapping each letter to the number of times it appears in the string. Think of this as a histogram. We can then use a backtracking algorithm to successively build up solution strings by taking letters off the histogram.

In [4]:
def string_to_histogram(s):
    D = {}
    for c in s:
        if c in D:
            D[c] += 1
        else:
            D[c] = 1
    return D

def string_backtrack(A, k, D, n, m):
    if k == n - 1:
        m[0] += 1
        #print('Solution', m[0])  #Uncomment to count solutions
        print(''.join(A))
    else:
        k += 1
        candidates = construct_candidates(D)
        for c in candidates:
            A.append(c)
            D[c] -= 1
            string_backtrack(A, k, D, n, m)
            D[c] += 1
            A.pop()

def construct_candidates(D):
    candidates = []
    for s in D:
        if D[s] != 0:
            candidates.append(s)
    return candidates
            
def string_permutations(s):
    A = []
    k = -1
    D = string_to_histogram(s)
    n = len(s)
    m = [0]
     
    string_backtrack(A, k, D, n, m)

string_permutations('ello')

ello
elol
eoll
lelo
leol
lleo
lloe
loel
lole
oell
olel
olle
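
The twelve strings above can be cross-checked with a brute-force sketch built on `itertools.permutations`, deduplicating with a set. It generates all $n!$ orderings before discarding repeats, so it is only a verification aid and not an efficient alternative; the function name is a hypothetical one chosen for this example.

```python
from itertools import permutations


def distinct_permutations_brute_force(s):
    """All distinct orderings of the letters of s, sorted alphabetically."""
    return sorted({''.join(p) for p in permutations(s)})


words = distinct_permutations_brute_force('ello')
print(len(words))  # 12
print(words[:3])   # ['ello', 'elol', 'eoll']
```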


### 7.15 [4]

Implement an efficient algorithm for listing all $k$-element subsets of $n$ items.

*Solution:*

I assume that all $n$ items are distinguishable. We can then number the items from $0$ to $n-1$ and enumerate all subsets of size $k$ of these numbers. We can do this using a backtracking algorithm. An array $A$ will hold our partial solutions, and we'll use a bit vector to record which items have already been included. Candidate values to add next are those values which haven't yet been included, which is a constant time lookup with the bit vector.

We also require that each element added be greater than the previous element. For example, of all the permutations that correspond to the size-3 subset $\{1, 2, 4\}$, only the increasing one survives; every other ordering at some point adds an element lower than the previous one and so breaks the rule. This condition therefore prevents equivalent permutations of the same subset from showing up, and significantly prunes our search.

NOTE: An elementary but still recursive function for doing this was written as a subproblem for problem 4.10. It was quite challenging, so it is remarkable that with backtracking the problem becomes somewhat easy.

In [5]:
# Note that j plays the role of k
# compared to most of the other
# backtracking implementations here

def k_subset_backtrack(A, j, n, k, V):
    if j == k - 1:
        print(A)
    else:
        j += 1
        candidates = construct_candidates(A, j, n, k, V)
        for c in candidates:
            A.append(c)
            V[c] = True
            k_subset_backtrack(A, j, n, k, V)
            V[c] = False
            A.pop()

def construct_candidates(A, j, n, k, V):
    candidates = []
    for i, v in enumerate(V):
        if v == False and ((j == 0) or (A[j-1] < i)):
            candidates.append(i)
    return candidates
            
def k_subsets(n, k):
    A = []
    V = []
    j = -1
    
    for i in range(n): #Initialize bit-vector
        V.append(False)
        
    k_subset_backtrack(A, j, n, k, V)
    
k_subsets(5, 3)    

[0, 1, 2]
[0, 1, 3]
[0, 1, 4]
[0, 2, 3]
[0, 2, 4]
[0, 3, 4]
[1, 2, 3]
[1, 2, 4]
[1, 3, 4]
[2, 3, 4]
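
Because every new element must exceed the previous one, all elements already in $A$ are smaller than any admissible candidate, so the bit vector is not strictly needed: passing along the smallest value still allowed is enough. The sketch below is a variation under that observation, not the solution above; the names `k_subsets_ascending` and `extend` are illustrative, and `itertools.combinations` is used only as a cross-check.

```python
from itertools import combinations


def k_subsets_ascending(n, k):
    """Yield each k-element subset of range(n) as an increasing list."""
    def extend(partial, start):
        if len(partial) == k:
            yield list(partial)
            return
        # Stop early enough that the remaining slots can still be filled.
        for c in range(start, n - (k - len(partial)) + 1):
            partial.append(c)
            yield from extend(partial, c + 1)
            partial.pop()

    yield from extend([], 0)


subsets = list(k_subsets_ascending(5, 3))
print(len(subsets))  # 10
print(subsets == [list(c) for c in combinations(range(5), 3)])  # True
```

The upper bound on the loop is an extra pruning step: a candidate is skipped when too few larger values remain to complete a subset of size $k$.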


---

### 7.16 [5]

An anagram is a rearrangement of the letters in a given string into a sequence of dictionary words, like *Steven Skiena* into *Vainest Knees*. Propose an algorithm to construct all the anagrams of a given string.

*Solution:*

This problem is similar to enumerating all the permutations of a set of $n$ numbers, where $n$ is the length of the string, but with three complications. First, strings are multisets, meaning they can have repeated elements. Permuting equal letters doesn't change a word, so unless we account for this, the same anagram will appear multiple times. The second complication is that anagrams are free to include different numbers of words separated by spaces, as long as they contain the same multiset of letters.

The third complication is the requirement that the resulting strings be dictionary words.

To construct the anagrams, we will convert the original string into a histogram-like structure that will track how many more times we may use each letter. "Space" will also be an option unless the previously added character was also a "space". We will use a backtracking algorithm to recursively extend partial string vectors $A$ one character at a time, using the histogram to track what characters remain available for insertion.

To enforce the requirement that the words be found in the dictionary, each time we add a "space" we will check if the word is contained in a list of dictionary words, found in `'Other/Dictionary_words.txt'`. The `is_word()` function makes use of the `bisect` module to implement a binary search. When we have found a potential solution, we use the `is_word()` function again to check if the last word in the solution is valid.

The program implemented below works, but takes about a minute to run on the string 'steven skien', and is too slow to run on 'steven skiena'.

One important area where the program can be improved is the `is_word()` function. Currently a binary search is performed on a list of valid words that originally was stored at `/usr/share/dict/words` on my computer. However, this list **does not contain plurals** even though it contains 235886 words. I found a list of dictionary words online that contained 354986 words, but many of them did not seem to be valid words, like "wl", and many other two-letter combinations. For example, the word "pillow" produces 33 or 2498 different anagrams depending on which of these two dictionaries is used. I opted for the smaller one.
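
One further pruning idea, which is not part of the program in this notebook, is to abandon a partial word as soon as no dictionary word begins with it. With a sorted word list, the same `bisect_left` call used for exact lookup can answer that prefix question. The sketch below uses a tiny hypothetical word list, `WORDS`, in place of the dictionary file, and the helper name `is_prefix` is an assumption made for this example.

```python
import bisect

# Hypothetical miniature word list (sorted); stands in for the full dictionary.
WORDS = sorted(['ill', 'lip', 'lisp', 'low', 'owl', 'pill',
                'plow', 'slip', 'slow', 'sow', 'will'])


def is_word(x, words=WORDS):
    """Exact membership test by binary search on a sorted list."""
    i = bisect.bisect_left(words, x)
    return i != len(words) and words[i] == x


def is_prefix(x, words=WORDS):
    """True if at least one word in the sorted list starts with x."""
    # Words sharing the prefix x are contiguous and start at the
    # first entry >= x, so checking that single entry is enough.
    i = bisect.bisect_left(words, x)
    return i != len(words) and words[i].startswith(x)


print(is_word('lisp'))   # True
print(is_prefix('pl'))   # True
print(is_prefix('lw'))   # False
```

In the backtracking routine, a letter would then be offered as a candidate only if the current word extended by that letter still passes `is_prefix`. Whether this is enough to make the longer inputs tractable has not been measured here.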

Instead of manually performing a binary search on a list of dictionary words, I could have used a specialized spell checker library that would probably provide better and faster results, but I am not sure if I can import modules in an IPython Notebook hosted on GitHub.

In [6]:
import bisect

# Adapted from the index() recipe
# on the bisect module page
def index(a, x):
    'Return True if the sorted list a contains a value exactly equal to x'
    i = bisect.bisect_left(a, x)
    return i != len(a) and a[i] == x

def is_word(x):
    return index(dictionary_words, x)

with open('Other/Dictionary_words.txt', 'r') as file:
    dictionary_words = [line.rstrip() for line in file]
dictionary_words.sort()

print(len(dictionary_words))
is_word('all')

235886


True

In [7]:
# A = partial solution string
# D = dictionary "histogram" of remaining letters
# m = number of letters currently in solution string
# n = number of letters in original string
# sol = solution number

def anagram_backtracking(A, D, m, n, sol):
    if m == n and last_word_in_dict(A, D, m, n, sol):
        sol[0] += 1
        print('Anagram %s:' % sol[0], ''.join(A))
    else:
        candidates = construct_candidates_anagram(A, D, m, n, sol)
        for c in candidates:
            A.append(c)
            if c != ' ':
                D[c] -= 1
                m += 1
            anagram_backtracking(A, D, m, n, sol)
            if c != ' ':
                D[c] += 1
                m -= 1
            A.pop()

def construct_candidates_anagram(A, D, m, n, sol):
    candidates = []
    for c, h in D.items():
        if h != 0:
            candidates.append(c)    
    if m != 0 and A[-1] != ' ': # Consider 'space'
        if last_word_in_dict(A, D, m, n, sol):
            candidates.append(' ')
    return candidates

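def last_word_in_dict(A, D, m, n, sol):
    # s = number of characters after the most recent 'space'.
    # If A contains no space, s stays 0 and A[-0:] is all of A,
    # so the whole partial string is treated as the last word.
    s = 0
    for i in range(len(A)):
        if A[-(i+1)] == ' ':
            s = i
            break
    word = ''.join(A[-s:])
    if len(word) == 1:
        # Only 'a' and 'i' are accepted as one-letter words
        return (word == 'a' or word == 'i')
    else:
        #return word in dictionary_words
        return is_word(word)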
         
def construct_histogram(s):
    D = {}
    for c in s:
        if c == ' ': # Ignore spaces
            continue
        if c in D:
            D[c] += 1
        else:
            D[c] = 1
    return D

def find_anagrams(s):
    s = s.lower().replace(' ', '')
    print(s)
    A = []
    D = construct_histogram(s)
    m = 0
    n = len(s)
    sol = [0]
    print(D)
    anagram_backtracking(A, D, m, n, sol)

%time find_anagrams('pillows')

pillows
{'l': 2, 'i': 1, 'p': 1, 'o': 1, 's': 1, 'w': 1}
Anagram 1: lip slow
Anagram 2: lip sowl
Anagram 3: lisp low
Anagram 4: lisp owl
Anagram 5: lis plow
Anagram 6: low lisp
Anagram 7: low slip
Anagram 8: ill wops
Anagram 9: plow lis
Anagram 10: plow sil
Anagram 11: pill sow
Anagram 12: poll wis
Anagram 13: pow sill
Anagram 14: po swill
Anagram 15: owl lisp
Anagram 16: owl slip
Anagram 17: ow spill
Anagram 18: slip low
Anagram 19: slip owl
Anagram 20: slow lip
Anagram 21: sill pow
Anagram 22: sill wop
Anagram 23: sil plow
Anagram 24: spill ow
Anagram 25: spill wo
Anagram 26: sop will
Anagram 27: sowl lip
Anagram 28: sow pill
Anagram 29: swill po
Anagram 30: will sop
Anagram 31: wis poll
Anagram 32: wops ill
Anagram 33: wop sill
Anagram 34: wo spill
CPU times: user 101 ms, sys: 2.35 ms, total: 103 ms
Wall time: 104 ms


### 7.17 [5] Unfinished

Telephone keypads have letters on each numerical key. Write a program that generates all possible words resulting from translating a given digit sequence (e.g., 145345) into letters.

*Solution:*

In [8]:
keypad = {1 : ['a', 'b', 'c'],
          2 : ['d', 'e', 'f']}

def keypad_backtracking(A, n, D, m):
    if n == m:
        print(''.join(A))
    else:
        candidates = keypad_construct_candidates(A, n, D, m)
        for c in candidates:
            A.append(c)
            keypad_backtracking(A, n, D, m)
            A.pop()
            
#def keypad_construct_candidates(A, n, D, m):
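
One possible way to finish this, offered only as a sketch and not as the intended solution, is shown below. It assumes the standard telephone keypad layout, in which keys 2 through 9 carry letters and keys 0 and 1 carry none, and it keeps letterless digits as themselves. The names `KEYPAD`, `keypad_strings`, and `extend` are hypothetical and do not match the signatures started above.

```python
# Standard telephone keypad layout (an assumption; 0 and 1 have no letters).
KEYPAD = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}


def keypad_strings(digits, keypad=KEYPAD):
    """Yield every letter string that the digit sequence could spell."""
    def extend(partial, pos):
        if pos == len(digits):
            yield ''.join(partial)
            return
        # The candidates for this position are the letters on the key;
        # a digit with no letters is passed through unchanged.
        for c in keypad.get(digits[pos], digits[pos]):
            partial.append(c)
            yield from extend(partial, pos + 1)
            partial.pop()

    yield from extend([], 0)


print(list(keypad_strings('23')))
# ['ad', 'ae', 'af', 'bd', 'be', 'bf', 'cd', 'ce', 'cf']
```

This enumerates all letter strings rather than only dictionary words. To answer the problem as stated, each result could be filtered with a lookup such as the `is_word()` function from problem 7.16.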
    
    