Making Rcompadre pipe-friendly #25

Closed
patrickbarks opened this issue Aug 21, 2018 · 8 comments

@patrickbarks
Collaborator

As noted briefly in PR #24, I'm interested in making Rcompadre work nicely with dplyr's pipe operator (%>%). The pipe operator passes an object on the left side to a function on the right side (e.g. x %>% mean() == mean(x)), and when used in series can make code more readable.

A particular piping sequence I often perform with compadre is to calculate some quantity for every row of the db, add it as a column to the metadata, and then subset the db based on that new column (and repeat). In the past I've worked with a tibble version of the db (i.e. metadata + list-columns for matA, matU, ..., matrixClass) to make this sequence easier:

library(dplyr)

compadre_tb <- as_tibble(compadre$metadata) %>% 
  mutate(matA = lapply(compadre$mat, function(x) x$matA),
         matU = lapply(compadre$mat, function(x) x$matU),
         matF = lapply(compadre$mat, function(x) x$matF),
         matC = lapply(compadre$mat, function(x) x$matC),
         matrixClass = compadre$matrixClass)

For instance, say I want to work with a set of matrices reflecting populations in decline (lambda < 1), and I only want ergodic matrices with no NAs. With the tibble version I can use a sequence of dplyr and purrr functions to repeatedly add columns (mutate) and subset (filter) based on those new columns.

library(purrr)

compadre_use <- compadre_tb %>% 
  mutate(na_matA = map_lgl(matA, ~ any(is.na(.x)))) %>% 
  filter(na_matA == FALSE) %>% 
  mutate(ergodic = map_lgl(matA, popdemo::isErgodic)) %>% 
  filter(ergodic == TRUE) %>% 
  mutate(lambda = map_dbl(matA, popbio::lambda)) %>% 
  filter(lambda < 1)

With a CompadreData object, the equivalent sequence might look something like this:

compadre_s4_use <- compadre_s4 %>% 
  cleanDB() %>% 
  subsetDB(check_NA_A == FALSE & check_ergodic == TRUE)

compadre_s4_use@metadata$lambda <- sapply(compadre_s4_use@mat,
                                          function(x) popbio::lambda(x@matA))

compadre_s4_use <- subsetDB(compadre_s4_use, lambda < 1)

So piping works fine with subsetting (and cleanDB), but I can't replicate the fully-piped sequence without an Rcompadre equivalent to dplyr::mutate(). What we would need is a function that takes a CompadreData object as the first argument, and returns a CompadreData object with an additional metadata column (based on some transformation specified in the second argument). I don't know what this function would entail in practice, but I think this general type of functionality would be desirable (to me anyway). Thoughts?

@iainmstott
Collaborator

I think this would require a new definition for %>%, which is an exported method from dplyr. You're passing it an object (class CompadreData) which it won't understand, so we'll need to import it from dplyr and add it to the methods for CompadreData objects. This would mean the package importing dplyr and all of its imports (a fair few).

However, I think dplyr imports the pipe from magrittr which has no imports. Maybe we should look into using that instead.

(Someone else also should weigh in here: this is outside my personal experience)

@tdjames1
Collaborator

I'm not sure that piping is the problem here. The issue is having a function that does what @patrickbarks wants to do (in the example, take the output from subsetDB() and modify its metadata, in order to then subset on the new column).

What this comes down to is that people may want to be able to subset based on some properties of the matrices (is_ergodic, has_NAs, lambda). Perhaps then it's a question of making subsetDB able to do those calculations on the fly? I've no idea how this would work, but I'm thinking something like:

subsetDB(check_NA(matA) == FALSE)
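
One way the on-the-fly idea might work is sketched below. It assumes toy S4 classes in place of the real CompadreData structure, and `ToyMat`, `ToyDB`, `subset_on_the_fly()` and `has_na()` are hypothetical names, not Rcompadre functions. The condition is captured unevaluated and then evaluated against the metadata columns plus a list of the matA matrices.

```r
library(methods)

# Toy stand-ins for the real classes (assumed structure, for illustration only)
setClass("ToyMat", slots = c(matA = "matrix"))
setClass("ToyDB", slots = c(metadata = "data.frame", mat = "list"))

# Hypothetical subsetter: `cond` may refer to metadata columns or to `matA`,
# which is bound to the list of matA matrices when the condition is evaluated
subset_on_the_fly <- function(db, cond) {
  data <- c(as.list(db@metadata),
            list(matA = lapply(db@mat, function(m) m@matA)))
  keep <- eval(substitute(cond), data, parent.frame())
  keep <- !is.na(keep) & keep  # treat NA conditions as FALSE
  new("ToyDB",
      metadata = db@metadata[keep, , drop = FALSE],
      mat = db@mat[keep])
}

# Hypothetical check, vectorised over a list of matrices
has_na <- function(mats) vapply(mats, anyNA, logical(1))

db <- new("ToyDB",
          metadata = data.frame(species = c("a", "b"), stringsAsFactors = FALSE),
          mat = list(new("ToyMat", matA = matrix(c(0, 0.5, 2, 0), 2)),
                     new("ToyMat", matA = matrix(c(NA, 0.5, 2, 0), 2))))

# Keep only the rows whose matA contains no NAs
db_ok <- subset_on_the_fly(db, has_na(matA) == FALSE)
db_ok@metadata$species
```

The cost of this design is that every check used inside the condition has to accept a list of matrices and return one value per matrix.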

@levisc8
Collaborator

levisc8 commented Aug 21, 2018

As best I can tell, all you need to do to make it pipe friendly is make sure the first argument of the subsequent function is the CompadreData object. %>% doesn't really recognize classes, just passes the output of the left side to the first part of the right side.

I've only tried this with S3 for other projects, but I don't think it should be any different for S4.
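
A minimal sketch of that point for S4, assuming a toy class rather than the real CompadreData definition (`ToyDB` and `drop_na_mats()` are hypothetical names). The pipe only needs the function on the right to take the object as its first argument.

```r
library(magrittr)  # provides %>%
library(methods)

# Toy S4 class standing in for CompadreData (assumed structure)
setClass("ToyDB", slots = c(metadata = "data.frame", mat = "list"))

# Hypothetical helper with the object as its first argument:
# drops every row whose matrix contains an NA
drop_na_mats <- function(db) {
  keep <- !vapply(db@mat, anyNA, logical(1))
  new("ToyDB",
      metadata = db@metadata[keep, , drop = FALSE],
      mat = db@mat[keep])
}

db <- new("ToyDB",
          metadata = data.frame(id = 1:2),
          mat = list(matrix(c(0, 0.5, 2, 0), 2),
                     matrix(c(NA, 0.5, 2, 0), 2)))

piped  <- db %>% drop_na_mats()  # pipe form
direct <- drop_na_mats(db)       # plain call
identical(piped, direct)
```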

@iainmstott
Collaborator

I don't think I'm grasping the issue properly!

But I think I get it now.

In Patrick's example, I guess it would work if:

compadre_s4_use <- compadre_s4 %>% 
  cleanDB()
subsetDB(compadre_s4_use, check_NA_A == FALSE & check_ergodic == TRUE)

I guess it's not working in the pipe sequence because the variables check_NA_A and check_ergodic aren't saved in the object yet (they normally would be after a call to cleanDB()).

@iainmstott
Collaborator

@levisc8 beat me to it.

But I guess it doesn't change the original problem? I can see now it may be useful to have some sort of function that adds to the metadata to make the objects more pipe-friendly, so that (as @tdjames1 pointed out) @patrickbarks's first code sequence would work with a CompadreData object, adding to the metadata.

@patrickbarks
Collaborator Author

This thread helped me clarify what I'm after. I guess I was really only interested in creating a mutate-like function that takes a CompadreData db, and adds a column to db@metadata based on some transformation of the original db@metadata and/or db@mat. It's the latter part that I think would be tricky to code (i.e. allowing the user to specify how the new column is derived, and specifically, allowing the transformation to be a function of db@metadata and/or db@mat).

In any case, the proposal in #27 would make this unnecessary because with the flat structure of the proposed data slot, a user could literally just use dplyr::mutate.

@tdjames1
Collaborator

tdjames1 commented Aug 24, 2018

Stepping back a bit, I wonder if there is any mileage in thinking about implementing a method that allows users to mutate metadata with reference to the CompadreM object, e.g.

mutateDB(compadre_obj, <new_col> = <function of CompadreM>)

The user-specified function would be applied to the mat slot and the (required to be data frame friendly) output appended to metadata as a new column. There could be optional args to indicate a particular CompadreM slot(s) to be used as args to the function to make it easy to use with predefined functions that take a single matrix argument (e.g. popbio::lambda). The function would return the modified CompadreData object which you could then use in pipes with subsetDB etc.

This is an alternative approach to flattening the data (cf. #27, #29) that maintains the integrity of the matrix objects while allowing users access to their contents to calculate derived values.
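
A rough sketch of what such a method could look like, assuming toy S4 classes in place of CompadreData and CompadreM. `ToyMat`, `ToyDB`, `mutate_db()`, its `mat_slot` argument and `dominant_eigenvalue()` are all hypothetical, and the eigenvalue helper is only a base-R stand-in for popbio::lambda.

```r
library(magrittr)  # provides %>%
library(methods)

# Toy stand-ins for CompadreM and CompadreData (assumed structure)
setClass("ToyMat", slots = c(matA = "matrix"))
setClass("ToyDB", slots = c(metadata = "data.frame", mat = "list"))

# Hypothetical mutate-like helper: applies `fun` to one slot of every entry
# in @mat and appends the results to @metadata as column `name`
mutate_db <- function(db, name, fun, mat_slot = "matA") {
  values <- lapply(db@mat, function(m) fun(slot(m, mat_slot)))
  # require one value per matrix so the result fits in a data frame column
  stopifnot(all(lengths(values) == 1L))
  db@metadata[[name]] <- unlist(values)
  db
}

# Stand-in for popbio::lambda: real part of the dominant eigenvalue
dominant_eigenvalue <- function(A) {
  max(Re(eigen(A, only.values = TRUE)$values))
}

db <- new("ToyDB",
          metadata = data.frame(species = c("a", "b"), stringsAsFactors = FALSE),
          mat = list(new("ToyMat", matA = diag(c(0.5, 0.8))),
                     new("ToyMat", matA = diag(c(1.2, 0.3)))))

# The helper returns the modified object, so it can sit in a pipe
db_lambda <- db %>% mutate_db("lambda", dominant_eigenvalue)
db_lambda@metadata
```

Because the object goes in and comes out as the first argument, the result could be handed straight to a subsetting step on the new column.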

@patrickbarks
Collaborator Author

closed by #32 and #48
