accumulate_while #253

dgrtwo · 2016-10-11T12:54:06Z

There's a functional pattern that I think would be useful in purrr: repeat a function until some predicate is met, where the output would then be of an unpredictable size. This would be very useful in iterative algorithms like this EM that run until convergence.

For example, consider this implementation of accumulate_while, which (like accumulate) iteratively changes a value while keeping intermediate states, then stops when a predicate is not met:

accumulate_while <- function(.x, .f, .p, ..., .compare = FALSE, .max = 100000) {
  .f <- as_function(.f, ...)
  .p <- as_function(.p)

  ret <- vector("list", .max)
  ret[[1]] <- .x

  so_far <- .x
  for (i in seq(2, .max)) {
    result <- .f(so_far)
    ret[[i]] <- result

    if (.compare && !.p(so_far, result)) {
      break
    }
    if (!.compare && !.p(result)) {
      break
    }

    so_far <- result
  }
  if (i == .max) {
    message("Reached .max limit of ", .max)
  }

  head(ret, i)
}

As one minimal example:

# add one until not less than 10
as_vector(accumulate_while(1, ~ . + 1, ~ . < 10))
# [1]  1  2  3  4  5  6  7  8  9 10

There's also a .compare option where the predicate takes the last two values, which is very useful for "run until convergence"

as_vector(accumulate_while(1, ~ . / 2, ~ abs(.x - .y) > 1e-4, .compare = TRUE))
 [1] 1.000000e+00 5.000000e-01 2.500000e-01 1.250000e-01 6.250000e-02 3.125000e-02 1.562500e-02
 [8] 7.812500e-03 3.906250e-03 1.953125e-03 9.765625e-04 4.882812e-04 2.441406e-04 1.220703e-04
[15] 6.103516e-05

I can think of other variations as well:

reduce_while: same, but drop the intermediate states
rerun_while: performs each trial independently until the result satisfies a condition (like a geometric process), keeps the intermediate result. (Technically a special case of accumulate_while).

If you support this I can turn it into a pull request with docs and tests.

(Also note that the final version could easily support an option of .max being infinite, and successively doubling the size of ret when it is reached- I left it out for simplicity here).

The text was updated successfully, but these errors were encountered:

dgrtwo · 2016-10-11T13:33:21Z

For example of the usefulness of accumulate_while, here's a function for performing clustering on univariate data with a flexible model, based on my example here:

library(purrr)
library(dplyr)

em_cluster <- function(data, expectation, log_likelihood, clusters = 2) {
  expectation_lifted <- lift(expectation)

  iterate <- function(last) {
    # expectation: find the mean/sd of current clusters
    estimates <- last %>%
      group_by(assignment) %>%
      do(expectation_lifted(.)) %>%
      tidyr::crossing(data)

    # maximization: pick best for each
    estimates$likelihood <- do.call(log_likelihood, estimates)

    estimates %>%
      group_by(observation) %>%
      top_n(1, likelihood) %>%
      ungroup()
  }

  not_converged <- ~ !all(.x$assignment == .y$assignment)

  data %>%
    mutate(assignment = factor(sample(clusters, nrow(data), replace = TRUE))) %>%
    accumulate_while(iterate, not_converged, .compare = TRUE) %>%
    map2_df(seq_along(.), ~ mutate(.x, iteration = .y))
}

For example, Gaussian clustering would be done with:

set.seed(2016)

# setup: simulate data
observations <- data_frame(observation = seq_len(1000)) %>%
  mutate(oracle = sample(c("A", "B"), n(), TRUE)) %>%
  mutate(mu = ifelse(oracle == "A", 0, 4),
         x = rnorm(n(), mu, sd = 1))

estimate_normal <- function(x, ...) {
  data_frame(mu_hat = mean(x), sd_hat = sd(x))
}
likelihood_normal <- function(x, mu_hat, sd_hat, ...) {
  dnorm(x, mu_hat, sd_hat)
}

result <- em_cluster(observations, estimate_normal, likelihood_normal)
result
#> # A tibble: 8,000 × 9
#>    observation oracle    mu          x assignment iteration mu_hat sd_hat
#>          <int>  <chr> <dbl>      <dbl>     <fctr>     <int>  <dbl>  <dbl>
#> 1            1      A     0  0.9005550          2         1     NA     NA
#> 2            2      A     0 -0.4388189          2         1     NA     NA
#> 3            3      B     4  4.1400180          1         1     NA     NA
#> 4            4      A     0  0.1586751          1         1     NA     NA
#> 5            5      A     0  0.9073702          1         1     NA     NA
#> 6            6      A     0  1.6201261          2         1     NA     NA
#> 7            7      B     4  4.0466888          1         1     NA     NA
#> 8            8      B     4  4.7145865          1         1     NA     NA
#> 9            9      A     0  1.0421298          2         1     NA     NA
#> 10          10      A     0  0.1820228          2         1     NA     NA
#> # ... with 7,990 more rows, and 1 more variables: likelihood <dbl>

library(ggplot2)
ggplot(result, aes(x, fill = assignment)) +
  geom_histogram() +
  facet_wrap(~ iteration)

hadley · 2017-03-03T19:01:10Z

I think this seems reasonable, so I'd be happy to review a PR. (purrr development will kick off seriously in about a month, so there's no rush)

lionel- · 2017-03-07T11:19:33Z

This would be useful in conjunction with a way of saving temporary results. This is a case where the user might also want to interrupt, check, and resume computations.

I thought at first that mapping environments would be useful for this but @hadley feels this is out of scope for purrr (cf #135).

But how about this: saving the current computations in a list, à la ggplot2? Then if interrupted you could check .purrr_last. This would not solve the resume-from-where-you-left part, but at least you wouldn't lose expensive computations when something goes wrong. Or maybe save the call and environment as well?

This is starting to feel like base R muckery, and it would feel a bit icky to program with this. Indeed this kind of stuff is for interactive data analysis and one-time scripts. Should it be an adverb? Using the condition system we could check if the user requested saving of computations earlier on the call stack. Maybe we could have something even more general, i.e., you'd supply { blocks and all intermediate quantities would be saved and you'd have a way of restarting computations where you interrupted. Such a system would probably belong in its own package (but would still require support in client packages).

jankislinger · 2017-05-04T07:06:08Z

I thing that the naming of this function is a bit misleading. Functions accumulate and accumulate_while should do the same if predicate is never met, e.g. .p = function(x) return(F). The difference is that accumulate_while takes single-valued .x and unary .f and iterates using only last result (so_far) while accumulate takes vector .x and binary .f and iterates using last result and next value of .x.

I suggest renaming this function to something else, for example iterate_while. It would be nice to have also function iterate that would do simply

iterate <- function(.x, .f, .n, ...) {
  for(i in seq_len(.n)) {
    .x <- f(.x, ...)
  }
  return(.x)
}

cfhammill · 2018-10-26T17:40:31Z

I'm happy to add reduce_while. Super simple sketch modelled after reduce_impl:

reduce_while <- function(.x, .f, .p, ..., .init){
    out <- reduce_init(.x, .init, left = TRUE)
    idx <- reduce_index(.x, .init, left = TRUE)
    .f <- as_mapper(.f, ...)
    .p <- as_mapper(.p)
    for (i in idx) {
        if(is_false(.p(out))) break
        out <- .f(out, .x[[i]], ...)
    }
    out
}

I implemented this with tail recursive functions using the trampoline pattern but it was very very slow.

cfhammill · 2018-10-26T19:41:32Z

The solution above is a bit incomplete, there is some ambiguity on whether to return the result after the first fail, or from before. I defaulted to before, but implemented this as an argument, issuing PR now.

In partial resolution of tidyverse#253

lionel- · 2018-10-27T06:27:15Z

This sort of streaming problems might be more relevant for the flowery package.

cfhammill · 2018-10-27T11:34:49Z

Didn't realize this problem was already solved in flowery, perhaps close the issue so others don't spend time on this?

lionel- · 2018-10-27T11:43:48Z

We are not sure yet what direction we are going to take regarding flowery. This is still experimental.

cfhammill · 2018-10-27T12:05:24Z

Ahh I see, I misunderstood to message on the PR. I'll reopen and try to implement trapping.

lionel- · 2018-10-27T12:32:08Z

I think it's better to leave the PR in its current state so that your work is saved in the git refs of the tidyverse/purrr repo. You could implement the sentinel in another PR.

But it's not clear yet whether we want to do this at all, and whether we'd do it with a sentinel or a signalled condition as you suggested in the PR. At this point it may be more helpful to experiment outside purrr, this could help us take the right design decisions.

About sentinel vs condition, I think I'd prefer a sentinel value here, which is the approach taken by Clojure and flowery for this particular problem. I'm not sure I can articulate why though. Another case where we might want to use conditions is to continue mapping on error and throw an aggregated error at the end. This would be an optional behaviour, e.g. with_lazy_errors(map(x, f)). This is different from using safely() and would be a good candidate for using conditions, because it's about exceptional cases in the error path.

Closes tidyverse#253 Closes tidyverse#566

hadley added the feature a feature request or enhancement label Mar 3, 2017

garrettgman mentioned this issue Oct 13, 2017

accumulate_n #394

Closed

cfhammill added a commit to cfhammill/purrr that referenced this issue Oct 26, 2018

Implement reduce_while and variants

61db74c

In partial resolution of tidyverse#253

cfhammill mentioned this issue Oct 26, 2018

Implement reduce_while and variants #566

Closed

lionel- added the reduce 🔨 label Nov 16, 2018

lionel- added a commit to lionel-/lowliner that referenced this issue Dec 11, 2018

Support done sentinels in reduce() and reduce2()

f01c0c5

Closes tidyverse#253 Closes tidyverse#566

lionel- mentioned this issue Dec 11, 2018

Early termination with done sentinels #600

Merged

lionel- added a commit to lionel-/lowliner that referenced this issue Dec 12, 2018

Support done sentinels in reduce() and reduce2()

a5da31c

Closes tidyverse#253 Closes tidyverse#566

lionel- closed this as completed in #600 Dec 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

accumulate_while #253

accumulate_while #253

dgrtwo commented Oct 11, 2016

dgrtwo commented Oct 11, 2016

hadley commented Mar 3, 2017

lionel- commented Mar 7, 2017

jankislinger commented May 4, 2017

cfhammill commented Oct 26, 2018 •

edited

cfhammill commented Oct 26, 2018

lionel- commented Oct 27, 2018

cfhammill commented Oct 27, 2018

lionel- commented Oct 27, 2018

cfhammill commented Oct 27, 2018

lionel- commented Oct 27, 2018

accumulate_while #253

accumulate_while #253

Comments

dgrtwo commented Oct 11, 2016

dgrtwo commented Oct 11, 2016

hadley commented Mar 3, 2017

lionel- commented Mar 7, 2017

jankislinger commented May 4, 2017

cfhammill commented Oct 26, 2018 • edited

cfhammill commented Oct 26, 2018

lionel- commented Oct 27, 2018

cfhammill commented Oct 27, 2018

lionel- commented Oct 27, 2018

cfhammill commented Oct 27, 2018

lionel- commented Oct 27, 2018

cfhammill commented Oct 26, 2018 •

edited