# 609 Assessment 1

## Generic Backtracking Algorithm
The generic backtracking algorithm can be written as a recursive function that calls five helper functions (**reject**, **accept**, **output**, **first** and **next**), whose contents are provided for the scenario being implemented.

The backtracking function is called with data about the problem and the current partial candidate. It first checks whether the candidate can be rejected; if so, it returns to the previous level of recursion, because nothing in that branch can lead to a solution. Otherwise, the partial candidate is checked to see whether it is a full solution and should therefore be accepted. If it is, an output function is called to do something with the solution, for example print it or store it in a variable. Next, the first extension of the current partial candidate is produced and, if one exists, backtracking is performed on it. Once that branch has been explored, the next extension is produced and explored in the same way, until no more extensions can be produced. At this point every possibility in this branch has been exhausted, and the function returns to the previous level of recursion.

In [1]:
backtracking <- function(P, c){
  #if current partial solution can not be viable, reject and return
  if (reject(P,c) == TRUE){
    return()
  }
  #if current partial solution is a true full solution, accept it, and perform output function
  if (accept(P,c) == TRUE){
    output(P,c)
  }
  s <- first(P,c)
  #iterate through all possible candidates for adding to the current partial solution
  while(length(s)!=0){
    backtracking(P,s)
    s <- next_val(P,s)
  }
  return()
}


In the backtracking algorithm above, the function has two inputs: $P$, which holds the data provided for the specific problem, and $c$, which represents the current partial solution being evaluated. The **next** function is called next_val in the code because next is a reserved word in R.
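
As a minimal sketch of how the five functions plug into the driver, the toy problem below (not part of the assessment) lists every binary vector of length $P$. The helper bodies and the global variable found are assumptions made for illustration, and the driver is repeated only so that the block runs on its own.

```r
# Generic driver, repeated from the cell above so that this sketch is self-contained
backtracking <- function(P, c){
  if (reject(P, c)) return()
  if (accept(P, c)) output(P, c)
  s <- first(P, c)
  while (length(s) != 0){
    backtracking(P, s)
    s <- next_val(P, s)
  }
  return()
}

# Toy problem (hypothetical): list every binary vector of length P
reject   <- function(P, c) length(c) > P      # too long: prune this branch
accept   <- function(P, c) length(c) == P     # exactly P bits: a full solution
output   <- function(P, c) found <<- c(found, paste(c, collapse = ""))
first    <- function(P, c) c(c, 0)            # extend with a 0 first
next_val <- function(P, c){
  # last bit was 0: try 1 next; last bit was 1: no siblings left
  if (tail(c, 1) == 0) return(c(head(c, -1), 1))
  return(c())
}

found <- c()
backtracking(2, c())
print(found)   # "00" "01" "10" "11"
```

The driver itself does not change between problems; only the five helpers encode the scenario.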

## Integer Partitioning
In this problem, we are given a positive integer and have to produce all of its unique integer partitions (groups of positive integers that sum to the value, where the order of the integers does not matter).
One way to solve this is to take the highest integer that can be used in a partitioning (the value itself) and find all partitionings that start with it. Once these are exhausted, the starting number is decreased by one and the process is repeated. Each subsequent number in a partitioning must be less than or equal to the previous number, and the running sum must be less than or equal to the target. This can be done using the backtracking algorithm, where each node in the backtracking 'tree' represents a new value added to the partial solution for a partitioning.
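
The ordering described above can be illustrated with a short stand-alone sketch that does not use the generic driver. The function partitions_desc and its arguments are hypothetical names chosen for this example. It tries the largest allowed part first and never lets a part exceed the one before it, so each partition appears exactly once.

```r
# Sketch (hypothetical helper): partitions of `remaining` whose parts never increase
partitions_desc <- function(remaining, max_part = remaining, prefix = c()){
  # nothing left to split: the prefix is a complete partition
  if (remaining == 0) return(list(prefix))
  out <- list()
  # try the largest allowed part first, then work down to 1
  for (k in seq(min(remaining, max_part), 1)){
    out <- c(out, partitions_desc(remaining - k, k, c(prefix, k)))
  }
  out
}

parts <- partitions_desc(4)
print(sapply(parts, paste, collapse = ", "))
# "4" "3, 1" "2, 2" "2, 1, 1" "1, 1, 1, 1"
print(length(partitions_desc(10)))   # 42
```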

To use the backtracking algorithm for this, each of the five functions called in the backtracking algorithm needs to be defined. 

**Reject**: The main case that needs to be rejected is when the sum of the proposed partitioning is more than the target value. As only positive integers can be added to a partitioning, such a partial solution can never be made into a correct partitioning. The other case that needs to be rejected is when the last value in the partitioning is 0. Only positive integers are allowed in a partitioning, so such a candidate is not valid. It also cannot be extended into a valid one: each integer added must be less than or equal to the previously added integer, so no further positive integers can follow a 0.

**Accept**: In order for a partitioning to be correct and a full solution, the partial candidate needs to sum to the value that we are producing the partitionings for. It is important for this to be checked after we have checked to see if the solution must be rejected as this makes sure that correct partitionings with a 0 added to the end will not be accepted.

**Output**: If a partitioning is accepted as a solution, it is appended to a list stored in a global variable, $solution$. The partitioning is passed into this function as $c$, a list of the integers that make up the partitioning. Before it is added to the list of solutions it is turned into a string of the values separated by commas, which makes the solutions easier to read in the output.

**First**: At each step of the recursion, this produces the first new partial solution to consider by appending an integer to the partitioning. Each appended integer must be less than or equal to the previous value so that solutions are not repeated, so where possible the first proposal appends a copy of the last integer already in the partitioning. This value can then be decreased by the **next** function. There are two other cases to consider. The first is when appending this value would make the partitioning sum to more than the target. Although this would be caught by the **reject** step at the next level of recursion, it is more efficient to spot it now and instead append the largest value that keeps the sum less than or equal to the target. The second is when the backtracking algorithm is first called. The partial solution provided is then empty, so the recursion is started by adding the target value itself, creating a partitioning of size 1.

**Next**: This is the final function that needs to be designed. Each time this is called, we want to take the last value in the partitioning, and decrease it by 1. This is then returned as the next partial solution to test. As a partitioning cannot include negative integers, if the last number in the partitioning is equal to 0, then all the possible partial solutions have been tested. In this case, we can return an empty list which means that there are no more solutions to try and the while loop will be left and the recursion will return to the level above.
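
To make the interplay between **first** and **next** concrete, the sketch below walks through the sibling candidates produced for a target of 10 when the partial solution is 6. The functions first_demo and next_demo are stand-alone copies written for this illustration under the rules described above, and seen is a hypothetical variable used to record the candidates.

```r
# Stand-alone copies of the two candidate generators (illustrative names)
first_demo <- function(P, c){
  if (length(c) == 0) return(c(P))             # empty start: propose the target itself
  last <- tail(c, 1)
  if (P - sum(c) > last) return(c(c, last))    # repeat the last part if it still fits
  c(c, P - sum(c))                             # otherwise add what is left to reach P
}
next_demo <- function(P, c){
  last <- tail(c, 1)
  if (last == 0) return(c())                   # nothing smaller left to try
  c(head(c, -1), last - 1)                     # decrease the last part by one
}

seen <- c()
s <- first_demo(10, c(6))
while (length(s) != 0){
  seen <- c(seen, paste(s, collapse = ", "))
  s <- next_demo(10, s)
}
print(seen)   # "6, 4" "6, 3" "6, 2" "6, 1" "6, 0"
```

The final candidate ends in 0; it is proposed here and it is the job of **reject** to discard it.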

The function can be called with a single input, the number whose integer partitions are wanted. After the five functions have been defined, the backtracking function is called with two inputs: $P$, the data for the problem, which in this case is the number being partitioned, and $c$, the partial solution, which is initialised as an empty list.

In [2]:
integer_partition <- function(n){
  P <- n
  
  reject <<- function(P,c){
    #case for when the partial solution is empty as other cases can not be evaluated
    if (length(c)==0){return(FALSE)}
    #not valid if there is a 0 in the solution
    if (tail(c, n = 1)[1]== 0){return(TRUE)}
    #also reject if sum of partitioning is too high
    if (sum(c) > P){return(TRUE)}
    else {return(FALSE)}
  }
  
  accept <<- function(P,c){
    #one accept case-if a true partitioning so sums to P
    if (sum(c) == P){return(TRUE)}
    else {return(FALSE)}
  }
  
  output <<- function(P,c){
    #append the correct solution to a list
    solution <<- c(solution,paste(c,collapse = ', '))
  }
  first <<- function(P,c){
    #if beginning of recursion, set partial solution to the target
    if(length(c)==0){return(c(P))}
    #if the last integer (smallest in partition) can be added again without exceeding target
    if(P-sum(c)>tail(c, n = 1)[1]){
      return(c(c,(tail(c, n = 1)[1])))
    }
    #else add the largest value that does not go over target
    return (c(c,(P-sum(c))))
  }
  next_val <<- function(P,c){
    #if last integer 0, no more possible solutions so return empty list
    if (tail(c, n = 1)[1]== 0){return(c())}
    #otherwise decrease last integer in partition by one
    return(c(head(c,-1),(tail(c, n = 1)[1]-1)))
  }
  solution <<- c()
  backtracking(P,c())
}


This algorithm can be tested, for example by getting it to find all the integer partitionings for 10. Printing the solution shows all the partitionings found, and the length of the list of solutions is 42, which is the number of unique integer partitionings of 10.

In [4]:
integer_partition(10)
print(solution)
print(length(solution))

NULL

 [1] "10"                           "9, 1"                        
 [3] "8, 2"                         "8, 1, 1"                     
 [5] "7, 3"                         "7, 2, 1"                     
 [7] "7, 1, 1, 1"                   "6, 4"                        
 [9] "6, 3, 1"                      "6, 2, 2"                     
[11] "6, 2, 1, 1"                   "6, 1, 1, 1, 1"               
[13] "5, 5"                         "5, 4, 1"                     
[15] "5, 3, 2"                      "5, 3, 1, 1"                  
[17] "5, 2, 2, 1"                   "5, 2, 1, 1, 1"               
[19] "5, 1, 1, 1, 1, 1"             "4, 4, 2"                     
[21] "4, 4, 1, 1"                   "4, 3, 3"                     
[23] "4, 3, 2, 1"                   "4, 3, 1, 1, 1"               
[25] "4, 2, 2, 2"                   "4, 2, 2, 1, 1"               
[27] "4, 2, 1, 1, 1, 1"             "4, 1, 1, 1, 1, 1, 1"         
[29] "3, 3, 3, 1"                   "3, 3, 2, 2"              
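
Because the solutions are stored as strings, they can be turned back into numbers if they need to be checked. The sketch below does this for a few of the strings printed above. The function is_partition_string and the vector sample_solutions are hypothetical names used only for this example.

```r
# Hypothetical checker: is "a, b, c" a partition of n with non-increasing positive parts?
is_partition_string <- function(s, n){
  parts <- as.integer(strsplit(s, ", ")[[1]])
  all(parts > 0) && sum(parts) == n && all(diff(parts) <= 0)
}

sample_solutions <- c("10", "7, 2, 1", "4, 3, 2, 1", "3, 3, 2, 2")
print(sapply(sample_solutions, is_partition_string, n = 10))   # all TRUE
print(is_partition_string("6, 4, 0", 10))                      # FALSE: contains a 0
```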

## Gray Code
In this problem, we are given a positive integer $n$ and have to create a full Gray code for binary numbers of this length, that is, a sequence of $2^n$ binary numbers. The sequence must satisfy two criteria. Firstly, no two numbers may be the same, so each of the $2^n$ binary values of length $n$ appears exactly once. Secondly, two successive values must differ in exactly one bit. This differs from standard binary counting, where successive values can differ in several bits (for example, 011 and 100 differ in all three).
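
The two criteria can be written down as a small checker. This is a sketch for illustration only: is_gray_code is a hypothetical helper, not part of the solution that follows.

```r
# Hypothetical validator for the two criteria above
is_gray_code <- function(code, n){
  # every value must be a binary string of length n
  if (!all(grepl(paste0("^[01]{", n, "}$"), code))) return(FALSE)
  # all 2^n values must appear, with no repeats
  if (length(code) != 2^n || anyDuplicated(code) != 0) return(FALSE)
  # successive values must differ in exactly one bit
  bits <- strsplit(code, "")
  for (i in seq_len(length(code) - 1)){
    if (sum(bits[[i]] != bits[[i + 1]]) != 1) return(FALSE)
  }
  TRUE
}

print(is_gray_code(c("00", "01", "11", "10"), 2))   # TRUE
print(is_gray_code(c("00", "01", "10", "11"), 2))   # FALSE: 01 -> 10 changes two bits
```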

This problem can be solved using a backtracking algorithm, where in each level of the recursion, the next binary number is assigned. If a sequence of binary numbers gets to a point where no value can be assigned next such that the two criteria above are satisfied, the algorithm can then backtrack to change the previous value to see if this can instead produce a valid code. 

The same 5 main functions need to be defined to use in the backtracking function.

**Reject**: There are two main criteria that need to be checked when new values are suggested for the code. The easiest approach is to check one of them in the reject function and the other when proposing new values to append to the partial solution. This function therefore checks whether any value is repeated in the partial solution, and rejects the candidate if so.

**Accept**: This checks whether a full Gray code has been found. It is enough to check that the partial solution has the correct length ($2^n$): by this point repeats have already been rejected, and only values one bit away from their predecessor are ever proposed, so a candidate of this length is a valid code.

**Output**: If the code is a valid Gray code and is therefore accepted, it is stored in a global variable so that the solution can be accessed outside the function. The values of the Gray code are stored as a list, with each value represented by a string. 

**First**: When deciding on the first value to try at each level of recursion, there are a few cases to consider. Firstly, if a code has already been accepted, the recursion needs to be exited, because only one solution is being looked for, not all possible solutions as in the integer partitioning code. In this case an empty list is returned, which allows the function to leave the recursion. If this is the first level of recursion, a starting value for the Gray code has to be chosen. The simplest choice is a string of 0's of length $n$, because every other candidate can then be reached by repeatedly incrementing the number that the binary string represents. If there are already some values in the partial solution, a candidate has to be found for the next value while also checking the other essential criterion: consecutive values differ in exactly one bit (a hamming distance of 1). This is done by setting the suggested value to a string of 0's of length $n$ and checking the hamming distance between it and the previous value, using a **hamming distance** function that counts the number of bits that differ between two binary values of the same length. While this criterion is not met, the suggestion is replaced by the next candidate, which is found by converting the binary number to decimal, adding one, and converting it back to a binary string. The one remaining case is when the suggestion reaches a string of 1's of length $n$ and this also fails the hamming distance check. There are then no more values that can be proposed, so an empty list is returned, which allows this level of recursion to be left. 

**Next**: This works in a similar way to **First**. It increments the previously suggested value by one and checks its hamming distance from the value before it. If this is equal to one, the value may be part of a solution, so the partial solution with this value appended is returned and backtracking is performed on it. Otherwise the suggested value is incremented again, and this repeats until a value that meets the criterion is found or a string of 1's of length $n$ is reached. In the latter case an empty list is returned, as there are no more partial solutions to check for a Gray code that begins in the way the current partial solution does.
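
The increment step used by **First** and **Next** can be sketched on its own. The helper names increment_binary and hamming are assumptions made for this illustration; the conversions are the ones described above (binary string to decimal, add one, back to a binary string of length $P$).

```r
# Hypothetical helper: the next P-bit binary string in counting order
increment_binary <- function(bits, P){
  value <- strtoi(bits, base = 2) + 1L
  # intToBits gives 32 bits, least significant first, so reverse before pasting
  all_bits <- paste(rev(as.integer(intToBits(value))), collapse = "")
  # keep only the last P characters
  substring(all_bits, nchar(all_bits) - P + 1)
}

# Hypothetical helper: number of positions where two equal-length strings differ
hamming <- function(a, b) sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

print(increment_binary("0011", 4))   # "0100"
print(hamming("0011", "0100"))       # 3
print(increment_binary("1111", 4))   # "0000"
```

The last call shows that incrementing a string of 1's wraps around to all 0's, which is why the all 1's case has to be checked before incrementing.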

The function below can then be called with one input, $n$, a positive integer giving the length of each value in the Gray code to be produced. It stores the result in the global variable $solution$ as a list of strings, each binary string being the next value in an acceptable Gray code.

In [5]:

grey_code_full <- function(n){
  P <- n
  reject <<- function(P,c){
    #reject solution if any of the entries repeat
    if (anyDuplicated(c)==0){
      return(FALSE)
    }
    return(TRUE)
  }
    
  accept <<- function(P,c){
    #if the length of the code is 2^P then a full grey code has been found
    if (length(c)==2^P){
      return(TRUE)
    }
    return(FALSE)
  }
    
  output <<- function(P,c){
    #store the solution in a global variable
    solution <<- c
  }
    
  first <<- function(P,c){
    #if solution has been found, return empty list
    if(length(solution)!=0){return(c())}
    #if first level of recursion, suggest string of all 0s
    if(length(c)==0){
      return(c(strrep(0,P)))
    }
    #otherwise suggest string of all 0s
    poss <- c(strrep(0,P))
    #increment suggestion until hamming distance=1
    while(hamming_dist(tail(c, n = 1), poss)!= 1){
      #if string all 1s, all strings have been checked
      if (identical(strrep(1,P),poss)){return(c())}
      #binary to decimal
      poss_decimal <- strtoi(poss, base=2)
      #increment by one
      new_decimal <- poss_decimal+1
      #convert back to binary
      new_binary <- paste(rev(as.integer(intToBits(new_decimal))),collapse='')
      poss <- substring(new_binary,(nchar(new_binary)-P+1))
    }
    return(c(c,poss))
      
  }
    
  next_val<<- function(P,c){
    #if solution has been found, return empty list
    if(length(solution)!=0){return(c())}
    #last value checked=tail of partial solution
    poss <- tail(c, n = 1)
    #return empty list if all 1s
    if (identical(strrep(1,P),poss)){return(c())}
    #increment last value by one
    poss_decimal <- strtoi(poss, base=2)
    new_decimal <- poss_decimal+1
    new_binary <- paste(rev(as.integer(intToBits(new_decimal))),collapse='')
    poss <- substring(new_binary,(nchar(new_binary)-P+1))
    #partial solution, without last value that is being incremented
    c <- c[-(length(c))]
    #check value of hamming distance (skipped if there is no previous value to compare with)
    while(length(c)>0 && hamming_dist(tail(c, n = 1), poss)!= 1){
      #increment by 1
      if (identical(strrep(1,P),poss)){return(c())}
      poss_decimal <- strtoi(poss, base=2)
      new_decimal <- poss_decimal+1
      new_binary <- paste(rev(as.integer(intToBits(new_decimal))),collapse='')
      poss <- substring(new_binary,(nchar(new_binary)-P+1))
    }
    return(c(c,poss))
  }

  #function to calculate hamming distance
  hamming_dist <<- function(c_last,c_new){
    #turn strings into lists
    c_last <- strsplit(c_last,'')
    c_new <- strsplit(c_new,'')
    difference <- 0
    #compare each item in each list
    for (i in 1:length(c_last[[1]])){
      if (c_last[[1]][i] != c_new[[1]][i]){
        difference <- difference + 1
      }
    }
    #return number of differences (hamming distance)
    return(difference)
  }

    
  solution <<- c()
  backtracking(P,c())
}



This algorithm can be tested, for example by getting it to find a Gray code for binary values of length 5. Printing the solution shows each individual value in an acceptable Gray code, and the length of the solution is 32, which is the length that is expected ($2^5$).

In [11]:
grey_code_full(5)
print(solution)
print(length(solution))

NULL

 [1] "00000" "00001" "00011" "00010" "00110" "00100" "00101" "00111" "01111"
[10] "01011" "01001" "01000" "01010" "01110" "01100" "01101" "11101" "10101"
[19] "10001" "10000" "10010" "10011" "10111" "10110" "10100" "11100" "11000"
[28] "11001" "11011" "11010" "11110" "11111"
[1] 32


The performance of the backtracking algorithm for this problem could be improved, as it runs slowly for Gray codes with binary values of length 9 or more. One way to do this is to check both criteria when proposing values to append to the partial solution, instead of checking one of them in the reject step. This would reduce the number of recursive calls that need to be made and so should increase the speed.
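
A possible shape for this improvement is sketched below, under the assumption that candidates are generated by flipping one bit at a time and skipping values already in the code. The names unused_neighbours and gray_search are hypothetical, and the search is a stand-alone depth-first search rather than the generic driver, so it tries candidates in a different order and may return a different (but still valid) code.

```r
# Hypothetical helper: values one bit away from the last value that are not yet used
unused_neighbours <- function(code){
  last <- strsplit(tail(code, 1), "")[[1]]
  out <- c()
  for (i in seq_along(last)){
    flipped <- last
    flipped[i] <- if (last[i] == "0") "1" else "0"
    candidate <- paste(flipped, collapse = "")
    if (!(candidate %in% code)) out <- c(out, candidate)
  }
  out
}

# Depth-first search that only recurses on candidates meeting both criteria
gray_search <- function(n, code = strrep("0", n)){
  if (length(code) == 2^n) return(code)
  for (candidate in unused_neighbours(code)){
    result <- gray_search(n, c(code, candidate))
    if (!is.null(result)) return(result)
  }
  NULL   # dead end: the caller tries its next candidate
}

print(unused_neighbours(c("000", "001", "011")))   # "111" "010"
code3 <- gray_search(3)
print(code3)
```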

Overall, these two examples show the wide range of problems that a backtracking algorithm can be applied to, which demonstrates the reusability of the function. This is because, even though the individual functions supplied to it can vary a lot, the structure of the backtracking is always the same. One difference between the two problems is that integer partitioning requires all solutions to be found, whereas creating a Gray code requires only one. A variable indicating whether all solutions or only one is wanted could be added to the inputs of the backtracking function. This would also increase reusability: if, for example, all Gray codes of a given length were needed rather than just one, the functions used in the backtracking would not need to change, only this input variable.
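
One way such an input could work is sketched below. The function backtracking_flag and its find_all argument are hypothetical, and unlike the original driver this version returns TRUE or FALSE so that an early stop can be passed back up through the levels of recursion. The five helper functions are a toy problem (all binary vectors of length $P$) used only to show the effect of the flag.

```r
# Sketch: driver with an assumed find_all flag; returns TRUE when the search should stop
backtracking_flag <- function(P, c, find_all = TRUE){
  if (reject(P, c)) return(FALSE)
  if (accept(P, c)){
    output(P, c)
    if (!find_all) return(TRUE)       # one solution is enough: stop everything
  }
  s <- first(P, c)
  while (length(s) != 0){
    if (backtracking_flag(P, s, find_all)) return(TRUE)
    s <- next_val(P, s)
  }
  FALSE
}

# Toy helpers (hypothetical): binary vectors of length P
reject   <- function(P, c) length(c) > P
accept   <- function(P, c) length(c) == P
output   <- function(P, c) found <<- c(found, paste(c, collapse = ""))
first    <- function(P, c) c(c, 0)
next_val <- function(P, c) if (tail(c, 1) == 0) c(head(c, -1), 1) else c()

found <- c()
stopped_early <- backtracking_flag(2, c(), find_all = FALSE)
first_only <- found      # "00"

found <- c()
invisible(backtracking_flag(2, c(), find_all = TRUE))
all_found <- found       # "00" "01" "10" "11"
```

With a flag like this, the helper functions would no longer need their own check for whether a solution has already been stored.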

One limitation of the backtracking algorithm is that some problems that can be solved using backtracking and recursion could also be solved using iteration alone. Compared to iteration, the recursive approach has the problem that, depending on the size of the problem, the recursion limit could be reached. This can lead to stack overflow errors, or to the program crashing because of the number of values that have to be stored each time a new level of recursion is entered. In the Gray code search, for example, each level of recursion adds one value to the code, so the recursion depth grows to $2^n$. Iteration does not have these problems.
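
As an illustration of the iterative alternative, the sketch below builds a Gray code without any recursion. It uses the standard reflected binary Gray code formula, in which the $i$-th value is $i$ XOR $\lfloor i/2 \rfloor$; this is a different technique from the backtracking search above, and gray_iterative is a hypothetical name.

```r
# Sketch: reflected binary Gray code built iteratively (no recursion)
gray_iterative <- function(n){
  i <- seq_len(2^n) - 1L            # 0, 1, ..., 2^n - 1 as integers
  g <- bitwXor(i, i %/% 2L)         # i XOR floor(i / 2)
  # write each value as an n-bit string, most significant bit first
  sapply(g, function(v) paste(rev(as.integer(intToBits(v))[1:n]), collapse = ""))
}

print(gray_iterative(3))
# "000" "001" "011" "010" "110" "111" "101" "100"
```

Because the values are computed directly from their position in the sequence, the depth of the call stack does not grow with $n$.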