Making Rcompadre pipe-friendly #25

patrickbarks · 2018-08-21T14:09:47Z

As noted briefly in PR #24, I'm interested in making Rcompadre work nicely with dplyr's pipe operator (%>%). The pipe operator passes an object on the left side to a function on the right side (e.g. x %>% mean() == mean(x)), and when used in series can make code more readable.

A particular piping sequence I often perform with compadre is to calculate some quantity for every row of the db, add it as a column to the metadata, and then subset the db based on that new column (and repeat). In the past I've worked with a tibble version of the db (i.e. metadata + list-columns for matA, matU, ..., matrixClass) to make this sequence easier:

library(dplyr)

compadre_tb <- as_tibble(compadre$metadata) %>% 
  mutate(matA = lapply(compadre$mat, function(x) x$matA),
         matU = lapply(compadre$mat, function(x) x$matU),
         matF = lapply(compadre$mat, function(x) x$matF),
         matC = lapply(compadre$mat, function(x) x$matC),
         matrixClass = compadre$matrixClass)

For instance, say I want to work with a set of matrices reflecting populations in decline (lambda < 1), and I only want ergodic matrices with no NAs. With the tibble version I can use a sequence of dplyr and purrr functions to repeatedly add columns (mutate) and subset (filter) based on those new columns.

library(purrr)

compadre_use <- compadre_tb %>% 
  mutate(na_matA = map_lgl(matA, ~ any(is.na(.x)))) %>% 
  filter(na_matA == FALSE) %>% 
  mutate(ergodic = map_lgl(matA, popdemo::isErgodic)) %>% 
  filter(ergodic == TRUE) %>% 
  mutate(lambda = map_dbl(matA, popbio::lambda)) %>% 
  filter(lambda < 1)

With a CompadreData object, the equivalent sequence might look something like this:

compadre_s4_use <- compadre_s4 %>% 
  cleanDB() %>% 
  subsetDB(check_NA_A == FALSE & check_ergodic == TRUE)

compadre_s4_use@metadata$lambda <- sapply(compadre_s4_use@mat,
                                          function(x) popbio::lambda(x@matA))

compadre_s4_use <- subsetDB(compadre_s4_use, lambda < 1)

So piping works fine with subsetting (and cleanDB), but I can't replicate the fully-piped sequence without an Rcompadre equivalent to dplyr::mutate(). What we would need is a function that takes a CompadreData object as the first argument, and returns a CompadreData object with an additional metadata column (based on some transformation specified in the second argument). I don't know what this function would entail in practice, but I think this general type of functionality would be desirable (to me anyway). Thoughts?


iainmstott · 2018-08-21T15:42:25Z

I think this would require a new definition for %>%, which is an exported method from dplyr. You're passing it an object (class CompadreData) which it won't understand, so we'll need to import it from dplyr and add it to the methods for CompadreData objects. This would mean the package importing dplyr and all of its imports (a fair few).

However, I think dplyr imports the pipe from magrittr which has no imports. Maybe we should look into using that instead.

(Someone else also should weigh in here: this is outside my personal experience)

tdjames1 · 2018-08-21T16:17:26Z

I'm not sure that piping is the problem here, it's having a function that does what @patrickbarks wants to do (in the example, take the output from subsetDB() and modify its metadata, in order to then subset on the new column).

What this comes down to is that people may want to be able to subset based on some properties of the matrices (is_ergodic, has_NAs, lambda). Perhaps then it's a question of making subsetDB able to do those calculations on the fly? I've no idea how this would work, but I'm thinking something like:

subsetDB(check_NA(matA) == FALSE)

levisc8 · 2018-08-21T16:33:48Z

As best I can tell, all you need to do to make it pipe friendly is make sure the first argument of the subsequent function is the CompadreData object. %>% doesn't really recognize classes, just passes the output of the left side to the first part of the right side.

I've only tried this with S3 for other projects, but I don't think it should be any different for S4.
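To illustrate the point about classes, here is a minimal sketch showing the pipe handing an S4 object to a function's first argument. It assumes magrittr is installed; `ToyDB` and `keep_rows()` are hypothetical stand-ins invented for this example, not part of Rcompadre.

```r
library(methods)
library(magrittr)  # provides %>%; dplyr re-exports the same operator

# Toy S4 class standing in for CompadreData (hypothetical, illustration only)
setClass("ToyDB", representation(metadata = "data.frame", mat = "list"))

# Hypothetical helper: the object is the first argument, so it fits in a pipe
keep_rows <- function(db, keep) {
  new("ToyDB",
      metadata = db@metadata[keep, , drop = FALSE],
      mat = db@mat[keep])
}

db <- new("ToyDB",
          metadata = data.frame(id = 1:3),
          mat = list(diag(2), diag(3), diag(4)))

# Equivalent to keep_rows(db, c(TRUE, FALSE, TRUE)); no S4 method for %>% needed
db_sub <- db %>% keep_rows(c(TRUE, FALSE, TRUE))

nrow(db_sub@metadata)  # 2
length(db_sub@mat)     # 2
```

The pipe never inspects the class of `db`; it only rewrites the call, so no extra method definition is involved.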

iainmstott · 2018-08-21T16:35:32Z

I don't think I'm grasping the issue properly!

But I think I get it now.

In Patrick's example, I guess it would work if:

compadre_s4_use <- compadre_s4 %>% 
  cleanDB()

compadre_s4_use <- subsetDB(compadre_s4_use, check_NA_A == FALSE & check_ergodic == TRUE)

I guess it's not working in the pipe sequence because the variables check_NA_A and check_ergodic aren't saved in the object yet (they normally would be after a call to cleanDB()).

iainmstott · 2018-08-21T16:47:59Z

@levisc8 beat me to it.

But I guess it doesn't change the original problem? I can see now it may be useful to have some sort of function that adds to the metadata to make the objects more pipe-friendly, so that (as @tdjames1 pointed out) @patrickbarks's first code sequence would work with a CompadreData object, adding to the metadata.

patrickbarks · 2018-08-22T07:00:19Z

This thread helped me clarify what I'm after. I guess I was really only interested in creating a mutate-like function that takes a CompadreData db, and adds a column to db@metadata based on some transformation of the original db@metadata and/or db@mat. It's the latter part that I think would be tricky to code (i.e. allowing the user to specify how the new column is derived, and specifically, allowing the transformation to be a function of db@metadata and/or db@mat).
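One possible way to let the new column depend on metadata and/or mat is to evaluate the user's expression in a scope that contains both. The sketch below uses base R only; `ToyDB` and `add_column()` are hypothetical names, and the `mat` elements are plain matrices rather than real CompadreM objects.

```r
library(methods)

# Toy S4 class standing in for CompadreData (hypothetical, illustration only)
setClass("ToyDB", representation(metadata = "data.frame", mat = "list"))

# Hypothetical helper: `expr` is captured unevaluated, then evaluated with the
# metadata columns and the list `mat` both visible by name
add_column <- function(db, name, expr) {
  mask <- c(as.list(db@metadata), list(mat = db@mat))
  db@metadata[[name]] <- eval(substitute(expr), mask, parent.frame())
  db
}

db <- new("ToyDB",
          metadata = data.frame(id = 1:2, n_years = c(4, 10)),
          mat = list(diag(2), diag(3)))

# New column derived from `mat` only
db <- add_column(db, "dim", vapply(mat, nrow, integer(1)))

# New column derived from `mat` and an existing metadata column together
db <- add_column(db, "years_per_dim", n_years / vapply(mat, nrow, integer(1)))

db@metadata
```

Because the object is the first argument and the modified object is returned, a helper of this shape could be chained with a pipe.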

In any case, the proposal in #27 would make this unnecessary because with the flat structure of the proposed data slot, a user could literally just use dplyr::mutate.

tdjames1 · 2018-08-24T12:33:09Z

Stepping back a bit, I wonder if there is any mileage in thinking about implementing a method that allows users to mutate metadata with reference to the CompadreM object, e.g.

mutateDB(compadre_obj, <new_col> = <function of CompadreM>)

The user-specified function would be applied to the mat slot and the (required to be data frame friendly) output appended to metadata as a new column. There could be optional args to indicate a particular CompadreM slot(s) to be used as args to the function to make it easy to use with predefined functions that take a single matrix argument (e.g. popbio::lambda). The function would return the modified CompadreData object which you could then use in pipes with subsetDB etc.

This is an alternative approach to flattening the data (cf. #27, #29) that maintains the integrity of the matrix objects while allowing users access to their contents to calculate derived values.
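A rough sketch of how such a function might behave, assuming magrittr is installed. Everything here is hypothetical: `ToyM` and `ToyDB` are simplified stand-ins for CompadreM and CompadreData, `mutate_db()` and `subset_db()` are invented names, and `dominant_eig()` is a base R substitute for a function like popbio::lambda.

```r
library(methods)
library(magrittr)

# Simplified stand-ins for CompadreM and CompadreData (hypothetical)
setClass("ToyM", representation(matA = "matrix"))
setClass("ToyDB", representation(metadata = "data.frame", mat = "list"))

# Apply `fn` to one slot of every element of `mat` and store the result as a
# new metadata column; assumes `fn` returns a single number per matrix
mutate_db <- function(db, col, fn, slot_name = "matA") {
  db@metadata[[col]] <- vapply(db@mat,
                               function(m) fn(methods::slot(m, slot_name)),
                               numeric(1))
  db
}

# Keep the rows (and matching matrices) where `cond` holds in the metadata
subset_db <- function(db, cond) {
  keep <- eval(substitute(cond), db@metadata, parent.frame())
  new("ToyDB", metadata = db@metadata[keep, , drop = FALSE], mat = db@mat[keep])
}

# Largest real part among the eigenvalues of a matrix
dominant_eig <- function(A) max(Re(eigen(A)$values))

db <- new("ToyDB",
          metadata = data.frame(id = 1:2),
          mat = list(new("ToyM", matA = matrix(c(0, 0.5, 2, 0.1), 2)),
                     new("ToyM", matA = matrix(c(0, 0.3, 1, 0.2), 2))))

# Fully piped: derive lambda from the matrices, then subset on it
db_decline <- db %>%
  mutate_db("lambda", dominant_eig) %>%
  subset_db(lambda < 1)

db_decline@metadata
```

The matrices stay inside their own objects throughout; only the derived value is copied into the metadata.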

patrickbarks · 2018-12-07T08:20:50Z

closed by #32 and #48

patrickbarks added the enhancement label Aug 21, 2018

patrickbarks mentioned this issue Aug 21, 2018

Adding derived vectors/matrices to a CompadreData object #26

Closed

levisc8 mentioned this issue Aug 21, 2018

Restructure the CompadreData class? #27

Closed

patrickbarks mentioned this issue Dec 2, 2018

Change fn names to object_verb format, and add ggplot method #48

Merged

patrickbarks closed this as completed Dec 7, 2018
