# 609 Assignment 1

The R packages $\texttt{partitions}$ and $\texttt{purrr}$ may need to be installed to run this notebook.

## Generic Backtracking Algorithm
### Explanation of design approach
The backtrack algorithm uses higher-order programming: four of its inputs are functions, namely accept, reject, first and next (the last is named nextt in the code, since next is a reserved word in R). This keeps the backtrack algorithm itself generic and reusable, since these four functions can be tailored to any specific backtracking problem. Within the function, the output is stored in a global variable. This variable is an empty list until $C$ is accepted as a solution by the accept function, at which point it is assigned the solution $C$. An if statement inside the loop returns the output once its length is non-zero, i.e. once a solution has been found and assigned to the output variable. The Recall function is used to perform the recursion, which aids reusability because the recursive call keeps working even if the backtrack function is renamed.
### Source code
General backtracking algorithm.

In [1]:
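backtrack <- function(accept, reject, first, nextt, P, C) {
  # Reset the global result holder; it stays empty until a solution is accepted.
  output <<- list()
  # Prune: C cannot be extended to a valid solution.
  if(reject(P, C)) {
    return(NULL)
  }
  # C is a complete solution: record it and stop exploring this branch.
  if(accept(P, C)) {
    output <<- C
    return(output)
  }
  # Try each extension of C in turn.
  s <- first(P, C)
  while(!is.null(s)) {
    Recall(accept, reject, first, nextt, P, s)
    # A deeper call found a solution: pass it up the call stack.
    if(length(output) != 0) {
      return(output)
    }
    s <- nextt(P, s)
  }
}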

## Integer Partition Problem
### How can problem be solved with backtracking?
The integer partition problem can be solved using backtracking as follows. The first and next functions find a partition length $M$ for which the current list of partitions is still incomplete, and then generate a potential partition of length $M$ by sampling $M$ integers from a list of integers that could plausibly appear in a partition of that length. The reject function rejects any candidate that does not add to $n$ or that is already in the list of partitions. The accept function accepts the list of partitions once its length equals $P(n)$, where $P$ is the partition function, which counts the number of unique integer partitions of $n$.
### Explanation of design approach
I chose to represent the partitions of $n$ by a list, where each entry is a vector of integers forming a partition of $n$. Each recursion of the backtracking algorithm adds a new partition to the list. Generating every unique combination of $M$ values drawn from the numbers that could plausibly appear in a partition of $n$ of length $M$ is very computationally expensive for larger $n$, so the first and next functions instead randomly generate a single such combination.

All previously tried potential partitions are stored in a list to avoid wasting computation on trying the same potential partition twice. Since this list is stored in a global variable, when the complete list of integer partitions is successfully found, this list is emptied to avoid issues when running the function again. 

Since a while loop which ran until an untried potential partition was found could potentially be computationally expensive once most of the potential partitions of a given length have already been tried, $reps$ potential partitions are generated at a time using the replicate function, and then are checked against the list of tried potential partitions. To be more efficient, this parameter could potentially be specified in terms of $n$, and partition length $M$, and even how many potential partitions of $n$ of length $M$ have already been tried, since when there are a large number of potential partitions but only a small number of potential partitions left untried, we want a large $reps$. However here this parameter is just set to 500.

The reject_partition function is a partial application of a more general reject function, which checks whether a list of conditions are true.
### Source code
#### Load libraries
Load the $\texttt{partitions}$ library, which provides functions for calculating the number of partitions of an integer $n$ and the number of partitions of $n$ of length $M$, and $\texttt{purrr}$, which provides a function for partial application.

In [39]:
library(partitions); library(purrr)
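
As a sanity check on what these two counts mean, the sketch below recomputes them in base R. It assumes that `P(n)` is the number of unordered partitions of $n$ and that `R(M, n)` is the number of partitions of $n$ into exactly $M$ positive parts. The helper names `count_parts_exact` and `count_partitions` are hypothetical and are not part of either package.

```r
# Number of partitions of n into exactly M positive parts, using the recurrence
# p(n, M) = p(n - 1, M - 1) + p(n - M, M):
# either the smallest part is 1 (remove it), or every part is at least 2
# (subtract 1 from each of the M parts).
count_parts_exact <- function(n, M) {
  if (n == 0 && M == 0) return(1)
  if (n <= 0 || M <= 0) return(0)
  count_parts_exact(n - 1, M - 1) + count_parts_exact(n - M, M)
}

# Total number of partitions of n: sum over every possible length M.
count_partitions <- function(n) {
  sum(sapply(seq_len(n), function(M) count_parts_exact(n, M)))
}

count_parts_exact(7, 3)  # 4: 5+1+1, 4+2+1, 3+3+1, 3+2+2
count_partitions(7)      # 15
```

If the assumptions above hold, these values should agree with `R(3, 7)` and `P(7)` from $\texttt{partitions}$.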

#### Preliminary functions
Function which, for each $i \in \{1,\dots,n\}$, finds the number of times $i$ goes into $n$, $n_{i} = \lfloor n/i \rfloor$, and creates a list containing $n_{i}$ copies of each $i$. If $i$ is too big to be part of a partition of $n$ of length $M$, i.e. $i > n - M + 1$, it is removed from the list. This returns a list which can be sampled from to generate potential partitions of $n$ of length $M$.

In [3]:
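possible_numbers <- function(n, M){
  # Each i can appear at most floor(n/i) times in a partition of n.
  f <- function(i, n) rep(i, floor(n/i))
  pos_numbers <- unlist(sapply(1:n, f, n=n))
  # With M parts, each at least 1, no part can exceed n - M + 1.
  pos_numbers[pos_numbers <= n - M + 1]
}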
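
To make the sampling pool concrete, here is a small worked example for $n = 5$ and $M = 2$. The function is restated in an equivalent compact form so that the example runs on its own; the variable name `pool` is only for illustration.

```r
# Equivalent restatement of possible_numbers, repeated so this example is self-contained.
possible_numbers <- function(n, M) {
  pos_numbers <- unlist(lapply(1:n, function(i) rep(i, floor(n / i))))
  pos_numbers[pos_numbers <= n - M + 1]
}

pool <- possible_numbers(5, 2)
pool
# The pool is 1 1 1 1 1 2 2 3 4:
# 1 appears five times (floor(5/1)), 2 twice (floor(5/2)), 3 and 4 once each,
# and 5 is dropped because a partition of 5 into 2 parts cannot contain a part above 4.
```

Sampling $M = 2$ values from this pool can give valid partitions such as $(4, 1)$ or $(3, 2)$, but also invalid candidates such as $(1, 1)$, which is why the reject function is still needed.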

Function which attempts to find untried potential partitions of length $M$. Generates $reps$ number of potential partitions by sampling from vec. Returns an untried potential partition if there is one, and a tried potential partition if not.

In [3]:
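find_untried <- function(vec, M, reps){
  # Draw reps random candidates of size M; replicate stacks them column by column.
  pos_new_vec <- replicate(reps, sample(vec, M))
  # Split back into candidates of length M, sort each in decreasing order and drop duplicates.
  pos_new_list <- unique(lapply(split(pos_new_vec, ceiling(seq_along(pos_new_vec)/M)),
                                sort, decreasing = TRUE))
  # Keep only candidates that are not in the global list of tried partitions.
  untried <- pos_new_list[!(pos_new_list %in% part_tries)]
  if(length(untried) > 0) {
    new <- untried[[1]]
  } else {
    new <- unlist(sample(pos_new_list, 1))
  }
  new
}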

Function which finds an untried potential partition of $n$ of length $M$, by running the find_untried function until one is found. Once found, it adds this potential partition to the list of tried potential partitions.

In [5]:
unique_sample <- function(vec, M){
  if(!exists('part_tries')) {
    part_tries <<- list()
  }
  new <- sort(sample(vec, M), decreasing = TRUE)
  while(list(new) %in% part_tries){
    new <- find_untried(vec, M, 500)
  }
  m <- length(part_tries)
  part_tries[[m+1]] <<- new
  new
}

Function to count the number of partitions of length $M$ in $C$

In [6]:
length_M_partition_count <- function(M, C){
  sum(M == sapply(C, length)) 
}

Function to test whether all elements in list of partitions $C$ add to $n$

In [7]:
partitions_add_to_n <- function(n, C){
  sum(sapply(C, sum) == rep(n, length(C))) == length(C)
}

Function to check whether a list of partitions $C$ has the same length as the number of unique partitions of $n$

In [8]:
P_partitions <- function(n, C){
  length(C) == P(n)
}

Function to remove last item from list lst

In [9]:
lst_remove_last <- function(lst){
  # head(lst, -1) drops the last element and, unlike lst[1:(l-1)],
  # returns an empty list when lst has only one element.
  head(lst, -1)
}

Function to add element new to end of list lst

In [10]:
lst_add <- function(lst, new){
  l <- length(lst)
  lst[[l + 1]] <- new
  lst
}

Function to test whether a list contains duplicates

In [42]:
no_duplicates <- function(n, C){
  length(unique(C)) == length(C)
}

General reject function that takes a list of functions that check different conditions, checks, as an input. If any of these checking functions return false for $C$, then $C$ is rejected.

In [44]:
reject <- function(n, C, checks){
  # Run every check on C; reject if at least one of them returns FALSE.
  FALSE %in% lapply(checks, function(f) f(n, C))
}
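
The way the general reject function composes checks can be seen on a toy example. The sketch below restates reject so that it runs on its own, and fixes the `checks` argument with a plain closure, which is a base-R stand-in for partial application. The checks `all_positive` and `at_most_n` and the name `reject_toy` are hypothetical and exist only for this illustration.

```r
# General reject, restated: reject C if any check returns FALSE.
reject <- function(n, C, checks) {
  FALSE %in% lapply(checks, function(f) f(n, C))
}

# Two toy checks with the same (n, C) signature as the real ones.
all_positive <- function(n, C) all(unlist(C) > 0)
at_most_n    <- function(n, C) all(unlist(C) <= n)

# Fix the checks argument with a closure, leaving a function of (n, C) only.
reject_toy <- function(n, C) reject(n, C, checks = list(all_positive, at_most_n))

reject_toy(5, list(c(3, 2), c(4, 1)))  # FALSE: every check passes, so C is kept
reject_toy(5, list(c(6, -1)))          # TRUE: both checks fail, so C is rejected
```

Any function of `(n, C)` that returns a single logical value can be added to the list of checks without changing reject itself.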

#### Functions to input into backtracking algorithm
Reject function rejects $C$ if there are any duplicates or if any of the partitions are not partitions of $n$. This is a partial application of the reject function with the specific checking criteria for the integer partition problem.

In [47]:
reject_partition <- partial(reject, checks = list(no_duplicates, partitions_add_to_n))

Accept function accepts the list of partitions of $n$ if it contains the correct number of partitions. If it accepts $C$, it also empties the list of tried partitions, as this is a global variable.

In [13]:
accept_partition <- function(n, C){
  accept <- P_partitions(n, C)
  if(accept == TRUE){
    part_tries <<- list()
  }
  accept
}

First function finds the smallest partition length $M$ for which not all partitions have yet been found. It then finds a new potential partition of $n$ of length $M$ which has not yet been tried.

In [14]:
first_partition <- function(n, C){
  # Start from the shortest partition length present in C.
  M <- min(sapply(C, length))
  # Skip lengths for which every partition has already been found.
  while(length_M_partition_count(M, C) == R(M, n)) {
    M <- M + 1
    if(M > n) {
      return(NULL)
    }
  }
  pos_numbers <- possible_numbers(n, M)
  new <- unique_sample(pos_numbers, M)
  lst_add(C, new)
}

Next function does the same as the first function except that it first removes the potential partition which has just been rejected from $C$

In [15]:
next_partition <- function(n, C){
  # Drop the candidate that has just been rejected.
  C <- lst_remove_last(C)
  M <- 1
  # Skip lengths for which every partition has already been found.
  while(length_M_partition_count(M, C) == R(M, n)) {
    M <- M + 1
    if(M > n) {
      return(NULL)
    }
  }
  pos_numbers <- possible_numbers(n, M)
  new <- unique_sample(pos_numbers, M)
  lst_add(C, new)
}

### Concrete example

In [46]:
n <- 7
backtrack(accept_partition, reject_partition, first_partition, next_partition, n, list(n))

## Gray Code
### How can problem be solved with backtracking?
A Gray code of length $n$ can be found with backtracking by using the first and next functions to generate a potential next element of the Gray code by changing exactly one of the bits in the previous element (each element is a sequence of $n$ zeroes and ones). The reject function rejects a sequence of these elements if it contains any repeated elements or if the Hamming distance between any two consecutive elements is not 1. The accept function accepts the list of elements once it has length $2^n$, as this is the number of elements in a Gray code of length $n$.
### Explanation of design approach
The Gray code is stored in a list, where each element is a vector of bits which makes up an element of the Gray code. The first and next functions find the potential next item in the list/Gray code.

As in the integer partition problem, the reject_gray function is a partial application of the more general reject function.
### Source code
#### Preliminary functions
Function that finds the Hamming distance between two vectors

In [17]:
hamming_distance <- function(v1, v2){
  sum(v1 != v2)
}

Function to ensure that Hamming distance between consecutive code words in prospective Gray code is exactly one

In [41]:
gray_switch_check <- function(n, C){
  l <- length(C)
  sum(mapply(hamming_distance, v1=C[-l], v2=C[-1]) == 1) == l-1
}
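
As a quick illustration of this check, the sketch below applies the same idea to two hand-written sequences of 2-bit code words. `consecutive_differ_by_one` is a hypothetical, simplified variant of the check (it drops the unused `n` argument), restated so that the example runs on its own.

```r
hamming_distance <- function(v1, v2) sum(v1 != v2)

# TRUE when every pair of consecutive code words differs in exactly one bit.
consecutive_differ_by_one <- function(C) {
  l <- length(C)
  all(mapply(hamming_distance, C[-l], C[-1]) == 1)
}

valid   <- list(c(0, 0), c(0, 1), c(1, 1), c(1, 0))
invalid <- list(c(0, 0), c(1, 1), c(0, 1), c(1, 0))

consecutive_differ_by_one(valid)    # TRUE
consecutive_differ_by_one(invalid)  # FALSE: 00 -> 11 changes two bits
```

Pairing `C[-l]` with `C[-1]` lines each code word up with its successor, so the check needs a single pass over the list.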

Function to check that there are $2^n$ code words in Gray code

In [19]:
gray_length_check <- function(n, C){
  length(C) == 2^n
}

Function that switches a $1$ to a $0$ and vice versa

In [20]:
bin_switch <- function(b) 1 - b

#### Functions to input into backtracking algorithm
Accept function accepts the Gray code if it has length $2^n$

In [21]:
accept_gray <- function(n, C){
  gray_length_check(n, C)
}

Reject function rejects $C$ if it contains duplicate code words or if any two consecutive code words don't have a Hamming distance of $1$. This is a partial application of the reject function with the specific checking criteria for the Gray code problem.

In [48]:
reject_gray <- partial(reject, checks = list(no_duplicates, gray_switch_check))

First function generates a first potential code word by switching the final digit of the last of the current list of code words from a $0$ to a $1$ or vice versa

In [23]:
first_gray <- function(n, C){
  l <- length(C)
  if(l == 0) {
    return(list(rep(0, n)))
  }
  if(l == 2^n) {
    return(NULL)
  }
  # Copy the last code word and flip its final bit (this also works when n = 1).
  switch_last <- C[[l]]
  switch_last[n] <- bin_switch(switch_last[n])
  lst_add(C, switch_last)
}

Next function finds the index of the digit which was switched between the prior two code words and then switches the digit before that to generate a new potential code word

In [24]:
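next_gray <- function(n, C){
  l <- length(C)
  if(l == 2^n) {
    return(NULL)
  }
  # Index of the bit that differs between the last two code words.
  switch_index <- max(which(C[[l]] != C[[l - 1]]))
  # Bits n, n-1, ..., 1 are tried in turn. After bit 1 no candidates remain,
  # so return NULL instead of wrapping round to bit n, which was already tried.
  if(switch_index == 1) {
    return(NULL)
  }
  new_switch_index <- switch_index - 1
  # Replace the rejected code word with its predecessor, with one bit flipped.
  C[[l]] <- C[[l - 1]]
  C[[l]][new_switch_index] <- bin_switch(C[[l]][new_switch_index])
  C
}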

### Concrete example

In [50]:
n <- 6
backtrack(accept_gray, reject_gray, first_gray, next_gray, n, list(rep(0,n)))

## Conclusions
### Reusability
This code is fairly reusable, as all tasks are broken down into a lot of small functions which just perform one purpose. This makes it easier for the code to be understood, and also means that the functions could be repurposed. Additionally, each function has a comment explaining what exactly the function does (in this notebook all of the comments are in Markdown blocks before the function). This makes the code more reusable as it allows anyone reading the code to get a good understanding of what it does. The code also follows R style conventions, again making it more readable for anyone else looking at it, and thus aiding reusability.
### Limitations and improvements
One limitation of this implementation of backtracking is its use of global variables within functions. This is bad coding practice, since by defining a global variable within a function we may accidentally overwrite a variable of the same name outside of the function. This could be improved by making an output function that returned a value for the outermost backtrack function from the innermost backtrack function. This would allow the backtrack function to take the exact form of the backtrack pseudocode given on Wikipedia, but I was struggling to find a way to break out of the outer function from the nested function.
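
One possible way around the global variable is sketched below, under the assumption that `NULL` is never itself a valid solution. Each call returns either the solution or `NULL`, and a caller returns as soon as a recursive call hands back something non-`NULL`. The name `backtrack_pure` and the toy problem (find a binary vector of a given length with a given sum) are hypothetical and only serve to exercise the function.

```r
backtrack_pure <- function(accept, reject, first, nextt, P, C) {
  if (reject(P, C)) return(NULL)
  if (accept(P, C)) return(C)
  s <- first(P, C)
  while (!is.null(s)) {
    result <- Recall(accept, reject, first, nextt, P, s)
    if (!is.null(result)) return(result)  # solution found deeper: pass it up
    s <- nextt(P, s)
  }
  NULL  # no extension of C leads to a solution
}

# Toy problem: P holds the required length and the required sum of the bits.
toy_reject <- function(P, C) sum(C) > P$target
toy_accept <- function(P, C) length(C) == P$len && sum(C) == P$target
# First extension: append a 0, unless C is already full length.
toy_first  <- function(P, C) if (length(C) < P$len) c(C, 0) else NULL
# Next sibling: turn the trailing 0 into a 1; after that there are no more siblings.
toy_next   <- function(P, s) {
  k <- length(s)
  if (s[k] == 0) { s[k] <- 1; s } else NULL
}

result <- backtrack_pure(toy_accept, toy_reject, toy_first, toy_next,
                         list(len = 3, target = 2), numeric(0))
result  # 0 1 1
```

Because the solution travels back up through return values, no state outside the function is created or overwritten.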

Another limitation is that, for Gray codes, any $n$ greater than $11$ leads to a recursion error due to stack overflow. This is because the number of elements in the Gray code increases exponentially ($2^n$) with $n$, leading to $2^n$ nested function calls. This could potentially be improved by running the code on a more powerful computer system, or alternatively by finding a method of backtracking Gray codes with fewer recursions. I did briefly consider implementing backtracking of Gray codes by generating potential 'columns' of Gray codes (i.e. $2^{n}$ binary values representing the $i^{th}$ bits in each element of the Gray code, $i\in\{1,\dots,n\}$), as this would only involve $n$ nested function calls; however, implementing this could be very computationally slow, as there is no obvious way to generate these columns that wouldn't involve a lot of rejection.
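
For comparison, and as a way to check outputs for larger $n$, a Gray code can also be built without any recursion. The sketch below is not backtracking: it swaps in the standard reflected binary construction, in which the $i^{th}$ code word is $i$ XOR $\lfloor i/2 \rfloor$. The function name `reflected_gray` is hypothetical.

```r
# Reflected binary Gray code: the i-th code word is i XOR floor(i / 2).
reflected_gray <- function(n) {
  lapply(0:(2^n - 1), function(i) {
    g <- bitwXor(i, i %/% 2L)
    # intToBits gives the least significant bit first;
    # keep n bits and reverse them so the most significant bit comes first.
    rev(as.integer(intToBits(g))[1:n])
  })
}

code <- reflected_gray(12)
length(code)  # 4096 code words, with no nested function calls
```

Since this needs no call stack, it is not subject to the recursion limit described above, though it produces only one particular Gray code rather than searching for one.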

A limitation of the integer partition code is that it is very computationally slow for large $n$. This is because, as $n$ increases, the number of possible lists of $M$ numbers increases rapidly, and so does the number of wrong potential partitions that are generated. A way to improve the algorithm would be to create first and next functions which were more limited in the potential partitions they returned, thus cutting some of the computational inefficiency.
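
One way such restricted generation might look is sketched below. A partial partition is only ever extended by a part that is no larger than the previous part and no larger than the amount still to be allocated, so no candidate overshoots $n$ and no partition is produced twice in a different order. This is a plain recursive enumeration rather than a drop-in pair of first and next functions, and the name `enumerate_partitions` is hypothetical.

```r
# Enumerate the partitions of a number as non-increasing integer vectors.
# remaining: amount still to be allocated; max_part: cap on the next part;
# prefix: parts chosen so far.
enumerate_partitions <- function(remaining, max_part = remaining, prefix = integer(0)) {
  if (remaining == 0) return(list(prefix))
  out <- list()
  for (part in seq(min(remaining, max_part), 1)) {
    out <- c(out, enumerate_partitions(remaining - part, part, c(prefix, part)))
  }
  out
}

parts <- enumerate_partitions(7)
length(parts)  # 15, the number of partitions of 7
parts[[1]]     # 7
```

Every vector produced is a valid partition, so nothing has to be rejected; the recursion depth is at most $n$.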

Another improvement that could be made to the code would be to apply more functional patterns, for example using partial application to make general accept, first, and next functions similar to the general reject function I made. This would make the code even more reusable as it would make the backtracking algorithm even easier to adapt to other problems.