Updated README

mailund · Aug 27, 2018 · 32055d5
1 parent 8feeb25
commit 32055d5
Showing 12 changed files with 470 additions and 162 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -52,7 +52,23 @@ devtools::install_github("mailund/tailr")
 
 ## Examples
 
-We can take a classical recursive function and write it in a tail-recursive form using an accumulator:
+Consider a classical recursive function, `factorial`:
+
+```{r}
+factorial <- function(n) {
+    if (n <= 1) 1
+    else n * factorial(n - 1)
+}
+```
+
+(I know R already has a built-in `factorial` function, but please ignore that.) This function computes the factorial of `n`, but if `n` is too large, the recursion exceeds the stack limit:
+
+```r
+> factorial(3000)
+Error: C stack usage  7970184 is too close to the limit
+```
+
+A classical way out of this problem is to turn it into a tail-recursive function:
 
 ```{r}
 factorial <- function(n, acc = 1) {
@@ -61,18 +77,31 @@ factorial <- function(n, acc = 1) {
 }
 ```
 
-We can then, automatically, translate that into a looping version:
+R doesn't implement the tail-recursion optimisation, though, so it doesn't help us.
+
+```r
+> factorial(3000)
+Error: C stack usage  7970184 is too close to the limit
+```
+
+With `tailr` we can, automatically, translate a tail-recursive function into a looping one, essentially implementing the tail-recursion optimisation this way.
 
 ```{r}
-tr_factorial <- tailr::loop_transform(factorial, byte_compile = FALSE, set_srcref = FALSE)
-tr_factorial
+tr_factorial <- tailr::loop_transform(factorial, byte_compile = FALSE)
+```
+
+I have disabled byte compilation to keep the running-time comparisons below fair; by default it is enabled. For a function as simple as `factorial`, though, byte compiling will not change the running time by any substantial amount.
+
+This version, because it loops instead of recursing, doesn't have the stack limit problem:
 
-tr_factorial(100)
+```{r}
+tr_factorial(3000)
 ```
 
-I have disabled byte compilation to make running time comparisons fair; by default it is enabled. For a function as simple as `factorial`, though, byte compiling will not affect the running time in any substantial amount. 
 
-We can compare the running time with the recursive function and a version that is written using a loop:
+We get the result `Inf` because the number we compute is too large to represent as a floating-point number, but that is not the point of the example. The point is that the recursion doesn't get too deep for the stack because we avoid recursion altogether.
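To see where the `Inf` comes from: doubles top out at roughly 1.8e308, and the factorial passes that limit right after 170!. The base-R sketch below is not part of `tailr`; it only checks the overflow point and, as one possible workaround, computes the logarithm of the factorial instead. The variable name `log_fact_3000` is made up for illustration.

```r
# The largest finite double is about 1.8e308.
.Machine$double.xmax

# 170! still fits in a double; 171! overflows to Inf.
is.finite(prod(1:170))
is.finite(prod(1:171))

# Summing logarithms avoids the overflow; lgamma(n + 1) equals log(n!).
log_fact_3000 <- sum(log(1:3000))
all.equal(log_fact_3000, lgamma(3001))
```

Working in log-space is a common choice when only the magnitude of a factorial matters; it does not change anything about the stack-depth issue discussed here.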
+
+With something as simple as computing the factorial, it is easy to write a looping function by hand, and it will be much faster than both the (tail-)recursive and the transformed function:
 
 ```{r}
 loop_factorial <- function(n) {
@@ -87,13 +116,14 @@ loop_factorial <- function(n) {
 
 n <- 1000
 bm <- microbenchmark::microbenchmark(factorial(n), 
-                                     loop_factorial(n), 
-                                     tr_factorial(n))
+                                     tr_factorial(n), 
+                                     loop_factorial(n))
 bm
 boxplot(bm)
 ```
 
-There is *some* overhead in using the automatically translated version over the hand-written, naturally, and for a simple function such as `factorial`, it is not hard to write the loop-variant instead of the recursive function.
+The transformed version runs in about the same time as the recursive one, but the looping function is much faster.
+
 
 However, consider a more complicated example. Using the `pmatch` package, we can create a linked list data structure as this:
 
@@ -112,46 +142,6 @@ llength <- function(llist, acc = 0) {
 }
 ```
 
-It is reasonably simple to understand this function, whereas a looping version is somewhat more complicated. An initial attempt could look like this:
-
-```r
-loop_llength <- function(llist) {
-    acc <- 0
-    repeat {
-        cases(llist,
-              NIL -> return(acc),
-              CONS(car, cdr) -> {
-                  acc <- acc + 1
-                  llist <- cdr
-              })
-    }
-}
-```
-
-This version will not function, however, since it tries to `return` from inside a call to `cases`, and `return` only works inside the immediate scope.
-
-Instead, we can use `callCC` to implement a non-local return like this:
-
-```{r}
-loop_llength <- function(llist) {
-    callCC(function(escape) {
-        acc <- 0
-        repeat {
-            cases(llist,
-                  NIL -> escape(acc),
-                  CONS(car, cdr) -> {
-                      acc <<- acc + 1
-                      llist <<- cdr
-                  })
-        }    
-    })
-}
-```
-
-Notice that we have to use the `<<-` assignment operator here. This is for the same reason that we need a non-local return. The expression inside the call to `cases` is evaluated in a different environment than the local function environment, so to get to the actual variables we want to assign to, we need the non-local assignment operator.
-
-It is possible to avoid `cases` using other functions from the `pmatch` package, but the result is vastly more compliated since pattern matching and expressions that should be evaluated per case needs to handle scoping. We can automatically make such a function using `tailr`, however:
-
 ```{r}
 tr_llength <- tailr::loop_transform(llength)
 ```
@@ -164,7 +154,21 @@ body(tr_llength)
 
 but, then, it is not one we want to manually inspect in any case.
 
-The automatically generated function is complicated, but it actually outcompetes the hand-written loop version.
+It is not too hard to implement this function with a loop either, but it is not as simple as the recursive function:
+
+```{r}
+is_nil <- function(llist) cases(llist, NIL -> TRUE, otherwise -> FALSE)
+loop_llength <- function(llist) {
+    len <- 0
+    while (!is_nil(llist)) {
+        len <- len + 1
+        llist <- llist$cdr
+    }
+    len
+}
+```
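For readers without `pmatch` installed, the same recursive-versus-loop contrast can be sketched in base R. This is only an illustration under simplifying assumptions: `cons`, `rec_length` and `iter_length` are hypothetical names, `NULL` stands in for `NIL`, and a cell is a plain list rather than a `pmatch` constructor.

```r
# Hypothetical stand-ins for pmatch's NIL/CONS: NULL is the empty list,
# and a cell is a plain list with `car` and `cdr` fields.
cons <- function(car, cdr) list(car = car, cdr = cdr)

# Tail-recursive length with an accumulator.
rec_length <- function(lst, acc = 0) {
    if (is.null(lst)) acc
    else rec_length(lst$cdr, acc + 1)
}

# The same computation as an explicit loop.
iter_length <- function(lst) {
    len <- 0
    while (!is.null(lst)) {
        len <- len + 1
        lst <- lst$cdr
    }
    len
}

lst <- cons(1, cons(2, cons(3, NULL)))
rec_length(lst)
iter_length(lst)
```

Both calls count the three cells of `lst`; only the recursive one grows the call stack with the length of the list.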
+
+If we compare the running time of these three functions, the transformed function is faster than the recursive one but not as fast as the hand-written loop:
 
 ```{r}
 make_llist <- function(n) {
@@ -176,14 +180,12 @@ make_llist <- function(n) {
 }
 test_llist <- make_llist(100)
 bm <- microbenchmark::microbenchmark(llength(test_llist),
-                                     loop_llength(test_llist),
-                                     tr_llength(test_llist))
+                                     tr_llength(test_llist),
+                                     loop_llength(test_llist))
 bm
 boxplot(bm)
 ```
 
-It is, of course, possible to write a faster hand-written function to deal with this case, but it will be about as complicated as the automatically generated function, and you don't really want to write that by hand.
-
 As you have no doubt noticed about `llength`, it is not in fact tail-recursive, from the look of it, since the final recursion is enclosed by a call to `cases`. The function is only tail-recursive because it can be translated into one by rewriting the `cases` function call to a sequence of `if`-statements. The `tailr` package doesn't handle `cases` from `pmatch` by knowing about this package. Instead, it has a mechanism that lets you provide re-writing rules.
 
 If you set the attribute "tailr_transform" on a function, and set this attribute to a function, then that function will be called when `tailr` sees the function, before it attempts any other processing. The attribute must be a function that maps an expression to another, re-written, expression. The one for `cases` looks like this:
@@ -201,3 +203,151 @@ attr(cases, "tailr_transform") <- tailr_transform_call
 ```
 
 You can use this mechanism to support tail-recursion for non-tail-recursive functions that can be rewritten to be tail-recursive.
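As a sketch of what such a rule might look like, assume a hypothetical helper `my_if_else` that simply wraps `if`. The names `my_if_else`, `rewrite_my_if_else` and `example_call` are invented here, and I am assuming the rule receives the whole call as a single expression, as described above; none of this is taken from `tailr` or `pmatch`.

```r
# Hypothetical wrapper; thanks to lazy evaluation only one branch is evaluated.
my_if_else <- function(test, yes, no) if (test) yes else no

# Hypothetical rewriting rule: turn my_if_else(test, yes, no)
# into the equivalent `if (test) yes else no` expression.
rewrite_my_if_else <- function(expr) {
    args <- as.list(expr)[-1]              # drop the function name
    as.call(c(list(as.name("if")), args))  # rebuild the call as an `if`
}
attr(my_if_else, "tailr_transform") <- rewrite_my_if_else

# Applying the rule by hand to a call expression:
example_call <- quote(my_if_else(n <= 1, acc, f(n - 1, acc * n)))
rewrite_my_if_else(example_call)
```

The rule only manipulates the unevaluated call, so the recursive call in the last argument ends up in the tail position of an ordinary `if` expression.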
+
+More examples:
+
+```{r}
+llcontains <- function(lst, x) {
+    cases(lst, 
+          NIL -> FALSE,
+          CONS(car, cdr) -> if (car == x) TRUE else llcontains(cdr, x)
+    )
+}
+tr_llcontains <- tailr::loop_transform(llcontains)
+
+loop_contains <- function(lst, x) {
+    while (!is_nil(lst)) {
+        if (x == lst$car) return(TRUE)
+        else lst <- lst$cdr
+    }
+    FALSE  # reached the end of the list without finding x
+}
+
+lst <- make_llist(100)
+bm <- microbenchmark::microbenchmark(llcontains(lst, 1001),
+                                     tr_llcontains(lst, 1001),
+                                     loop_contains(lst, 1001))
+bm
+boxplot(bm)
+
+```
+
+```{r}
+llrev <- function(llist, acc = NIL) {
+    pmatch::cases(
+        llist,
+        NIL -> acc,
+        CONS(car, cdr) -> llrev(cdr, CONS(car, acc))
+    )
+}
+
+bubble <- function(llist, swapped = FALSE, acc = NIL) {
+    cases(llist,
+          CONS(first, CONS(second, rest)) -> 
+              if (first > second) bubble(CONS(first, rest), TRUE, CONS(second, acc))
+              else bubble(CONS(second, rest), swapped, CONS(first, acc)),
+          CONS(x, NIL) -> list(new_list = llrev(CONS(x, acc)), swapped = swapped)
+    )
+}
+
+bubble_sort <- function(lst) {
+    if (is_nil(lst)) return(lst)
+    bind[lst, swapped] <- bubble(lst)
+    while (swapped) {
+        bind[lst, swapped] <- bubble(lst)
+    }
+    lst
+}
+
+lst <- CONS(3, CONS(2, CONS(5, CONS(1, NIL))))
+bubble_sort(lst)
+```
+
+```{r}
+tr_llrev <- function(llist, acc = NIL) {
+    pmatch::cases(
+        llist,
+        NIL -> acc,
+        CONS(car, cdr) -> tr_llrev(cdr, CONS(car, acc))
+    )
+}
+tr_llrev <- tailr::loop_transform(tr_llrev)
+
+tr_bubble <- function(llist, swapped = FALSE, acc = NIL) {
+    cases(llist,
+          CONS(first, CONS(second, rest)) -> 
+              if (first > second) tr_bubble(CONS(first, rest), TRUE, CONS(second, acc))
+              else tr_bubble(CONS(second, rest), swapped, CONS(first, acc)),
+          CONS(x, NIL) -> list(new_list = tr_llrev(CONS(x, acc)), swapped = swapped)
+    )
+}
+tr_bubble <- tailr::loop_transform(tr_bubble)
+
+tr_bubble_sort <- function(lst) {
+    if (is_nil(lst)) return(lst)
+    bind[lst, swapped] <- tr_bubble(lst)
+    while (swapped) {
+        bind[lst, swapped] <- tr_bubble(lst)
+    }
+    lst
+}
+
+lst <- CONS(3, CONS(2, CONS(5, CONS(1, NIL))))
+tr_bubble_sort(lst)
+```
+
+
+```{r}
+loop_llrev <- function(lst) {
+    acc <- NIL
+    while (!is_nil(lst)) {
+        acc <- CONS(lst$car, acc)
+        lst <- lst$cdr
+    }
+    acc
+}
+loop_bubble <- function(lst, swapped = FALSE) {
+    acc <- NIL
+    repeat {
+        if (is_nil(lst$cdr)) 
+            return(list(new_list = loop_llrev(CONS(lst$car, acc)),
+                        swapped = swapped))
+        
+        first <- lst$car
+        second <- lst$cdr$car
+        rest <- lst$cdr$cdr
+        if (first > second) {
+            acc <- CONS(second, acc)
+            lst <- CONS(first, rest)
+            swapped <- TRUE
+        } else {
+            acc <- CONS(first, acc)
+            lst <- CONS(second, rest)
+        }
+    }
+}
+
+loop_bubble_sort <- function(lst) {
+    if (is_nil(lst)) return(lst)
+    bind[lst, swapped] <- loop_bubble(lst)
+    while (swapped) {
+        bind[lst, swapped] <- loop_bubble(lst)
+    }
+    lst
+}
+
+lst <- CONS(3, CONS(2, CONS(5, CONS(1, NIL))))
+loop_bubble_sort(lst)
+```
+
+```{r}
+lst <- make_llist(10)
+bm <- microbenchmark::microbenchmark(bubble_sort(lst),
+                                     tr_bubble_sort(lst),
+                                     loop_bubble_sort(lst))
+bm
+boxplot(bm)
+```
+
+The package primarily solves the problem of exceeding the stack limit. The transformed functions are not as fast as those we can code by hand using loops. It *should* be possible to improve the running time of the transformed functions with some program analysis. The cost of that analysis would have to be counted in the running time, though, and the comparison would probably still favour hand-written loops over transformed functions. Recursive functions can be a lot easier to read than their corresponding looping versions, though, especially with pattern matching.
+
+