Flatten the matrix slot? #29

Closed
patrickbarks opened this issue Aug 24, 2018 · 3 comments

@patrickbarks
Collaborator

(With apologies for the length)

Rather than the nested structure of the current mat slot (or "column", à la #27), I propose having separate list-columns for matA, matU, matF, matC, MatrixClassOrganized, and MatrixClassAuthor. The reason is that most MPM functions (e.g. in popdemo, popbio) act on a matrix, so we should make it easier for users to access matrices, and particularly, to vectorize over a set of matrices.

Vectorizing

Currently, vectorizing with popdemo or popbio functions requires writing a custom function to access the slots within mat, e.g.

ergodic <- sapply(db@mat, function(x) popdemo::isErgodic(x@matA))
lambda <- sapply(db@mat, function(x) popbio::lambda(x@matA))

With a flat structure a user could vectorize over matA without having to write a custom function, e.g.

ergodic <- sapply(db$matA, popdemo::isErgodic)
lambda <- sapply(db$matA, popbio::lambda)
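
For illustration, here is a minimal self-contained sketch of the flat layout, assuming a plain data.frame with a list-column named matA. The toy matrices are invented, and the helper `dominant_eigenvalue` is a hypothetical stand-in for popbio::lambda so that the snippet runs without extra packages:

```r
# Toy stand-in for a flat database: one row per matrix, matA as a list-column.
# The species names and matrix values are invented for illustration only.
db <- data.frame(SpeciesAccepted = c("sp_a", "sp_b"), stringsAsFactors = FALSE)
db$matA <- list(
  matrix(c(0.0, 0.5, 2.0, 0.8), nrow = 2),  # filled column by column
  matrix(c(0.2, 0.3, 1.5, 0.9), nrow = 2)
)

# Hypothetical helper standing in for popbio::lambda():
# the dominant eigenvalue (asymptotic growth rate) of a projection matrix.
dominant_eigenvalue <- function(mat) {
  max(Mod(eigen(mat, only.values = TRUE)$values))
}

# No wrapper function needed: the list-column goes straight into sapply().
lambda <- sapply(db$matA, dominant_eigenvalue)
```

Because matA is an ordinary list, any function that takes a single matrix can be dropped in where `dominant_eigenvalue` is.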

The situation is more nuanced when it comes to Rage, because Rage functions could eventually work with CompadreM objects, in which case a user could vectorize over db@mat without writing custom functions. However, for the Rage functions that take a single matrix (i.e. kEntropy, qsdConverge, reprodStages, identifyReproStages, splitMatrix), vectorizing with the flat version would be just as easy, e.g.

# CompadreM version
k_ent <- sapply(db@mat, Rage::kEntropy)

# flat version
k_ent <- sapply(db$matU, Rage::kEntropy)

For some Rage functions that take multiple matrix arguments (e.g. matrixElementPerturbation), vectorizing with the CompadreM object would admittedly be nicer, e.g.

# CompadreM version
mat_pert <- lapply(db@mat, Rage::matrixElementPerturbation)

# flat version
mat_pert <- mapply(Rage::matrixElementPerturbation, db$matU, db$matF, SIMPLIFY = FALSE)
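
A runnable sketch of the multi-argument case under the flat layout. Everything here is an assumption for illustration: the toy values, the list-columns matU and matF on a plain data.frame, and `toy_r0`, which is not a Rage function but a simple two-matrix example (net reproductive rate as the dominant eigenvalue of F N, with N = (I - U)^-1):

```r
# Toy flat database with list-columns matU and matF (values invented).
db <- data.frame(id = 1:2)
db$matU <- list(
  matrix(c(0.1, 0.4, 0.0, 0.7), nrow = 2),
  matrix(c(0.2, 0.3, 0.0, 0.8), nrow = 2)
)
db$matF <- list(
  matrix(c(0, 0, 1.5, 0), nrow = 2),
  matrix(c(0, 0, 2.0, 0), nrow = 2)
)

# Hypothetical two-matrix function (not part of Rage).
toy_r0 <- function(matU, matF) {
  N <- solve(diag(nrow(matU)) - matU)  # fundamental matrix (I - U)^-1
  max(Mod(eigen(matF %*% N, only.values = TRUE)$values))
}

# mapply() walks the two list-columns in parallel, one row at a time.
r0 <- mapply(toy_r0, db$matU, db$matF)
```

The cost relative to a single-object method is only that each matrix column has to be named explicitly in the call.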

For Rage functions that will often be used with additional row-specific arguments (e.g. longevity, rearrangeMatrix, reprodStages), vectorizing with the flat version is just as easy (e.g. calculating lifespan wrt the first non-propagule stage):

# CompadreM version
start_life <- sapply(db@mat, function(x) min(which(x@matrixClass$MatrixClassOrganized == "active")))
lifespan <- mapply(Rage::longevity, db@mat, startLife = start_life, SIMPLIFY = FALSE)

# flat version
start_life <- sapply(db$MatrixClassOrganized, function(x) min(which(x == "active")))
lifespan <- mapply(Rage::longevity, db$matU, startLife = start_life, SIMPLIFY = FALSE)

Finally, there are a few Rage functions (R0, dEntropy, lifeTimeRepEvents, and makeLifeTable) for which having a CompadreM method would make the function overly difficult to document and understand (in my opinion), because the user may wish to apply them to matF-only, matC-only, or matF and matC (or in the extreme case of makeLifeTable, also matU-only).

As outlined in jonesor/Rage#19, I think we should simplify these functions so that they only take one 'reproductive matrix' argument (e.g. matR), and correspondingly make it easier for users to derive and store matFC (= matF + matC) within db.
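
As a rough sketch of what deriving and storing matFC could look like under the flat layout (assuming db is a plain data.frame with list-columns matF and matC; the toy values are invented):

```r
# Toy flat database with sexual (matF) and clonal (matC) reproduction matrices.
db <- data.frame(id = 1:2)
db$matF <- list(
  matrix(c(0, 0, 1.5, 0), nrow = 2),
  matrix(c(0, 0, 2.0, 0), nrow = 2)
)
db$matC <- list(
  matrix(c(0, 0, 0, 0.1), nrow = 2),
  matrix(0, nrow = 2, ncol = 2)
)

# Element-wise sum, row by row, stored as a new list-column next to the others.
db$matFC <- Map(`+`, db$matF, db$matC)
```

The derived column then sits alongside matF and matC and can be passed to any function expecting a single reproductive matrix.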

Printing

The flat version makes it easier to examine a series of matrices from the same study (e.g. reflecting different years or populations)

# CompadreM version
db@mat[[450]]@matA
db@mat[[451]]@matA
db@mat[[452]]@matA

# flat version
db$matA[450:452]

Though the CompadreM version makes it easier to examine all the matrices for a given row

# CompadreM version
db@mat[[450]]

# flat version
db$matA[[450]]
db$matU[[450]]
db$matF[[450]]
db$matC[[450]]

Matrix validation

I think the matrix-validation function of the CompadreM class could be moved to a new Rcompadre function (e.g. db_validate). I don't see too much benefit of 'built-in' validation of things like matrix dimension, non-negative values, etc. Definitely those things should be validated on the COMPADRE side, but after that I think it's fine to leave things to the user.
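
A minimal sketch of what such a standalone check might look like. The name db_validate comes from the suggestion above, but the signature and the specific checks are assumptions, not an existing Rcompadre function:

```r
# Hypothetical standalone validator: takes a list of matrices and returns
# one TRUE/FALSE per matrix. The checks chosen here are illustrative only.
db_validate <- function(mats) {
  vapply(mats, function(m) {
    is.matrix(m) && is.numeric(m) &&
      nrow(m) == ncol(m) &&  # square
      !anyNA(m) &&           # no missing entries
      all(m >= 0)            # non-negative
  }, logical(1))
}

mats <- list(
  matrix(c(0, 0.5, 2, 0.8), nrow = 2),   # passes all checks
  matrix(c(0, -0.5, 2, 0.8), nrow = 2),  # contains a negative entry
  matrix(1:6 / 10, nrow = 2)             # 2 x 3, not square
)
valid <- db_validate(mats)
# valid is c(TRUE, FALSE, FALSE)
```

Keeping the check as an ordinary function leaves it to the user to decide when, and on which matrix column, to run it.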

@levisc8
Collaborator

levisc8 commented Aug 27, 2018

I don't know, I think the benefits of keeping it outweigh the advantages of removal. Responses in reverse order:

  1. I think in general, user-flexibility is good. But I also think the class definition isn't overly restrictive and only keeps them from shooting themselves in the foot. If we can provide that functionality without burdening them (and I don't think the class is so complicated as to inhibit usability), I think we should keep it.

  2. You can print multiple S4 objects using something like db$matdata[1:3, ] (to see all matrices) or using matA(db[1:3, ]) (to see specific ones). Note that my branch currently does still have the CompadreData class, but I'll change that shortly.

  3. I feel like the benefit of vectorizing over multiple arguments for certain functions makes it worth keeping. It should be equally easy to vectorize for functions that take single matrices too. For the Rage functions that you mention at the end, we could include something like this in the examples:

# Using matF and matC
matR <- purrr::map2(.x = matF(db), .y = matC(db), .f = function(.x, .y) .x + .y)

purrr::map(matR, Rage::R0)

# matF only. Skips the first mapping of calculations
purrr::map(matF(db), Rage::R0)

In the examples above, you could use `lapply` too, it's just a bit trickier.

I don't feel too strongly about this though, just that it's worth considering the utility of the CompadreM class before we ditch it entirely. What do others think?

@patrickbarks
Collaborator Author

Good points. I see that the accessor functions could be helpful if we retain the nested structure of mat.

My biggest qualm pertains to the matF+matC functions (jonesor/Rage#19). What I meant was that having a CompadreM method would make those functions harder to document and understand.

E.g. In the tidy version of R0 there would be 3 arguments, all 'mandatory':

  • matU
  • matR
  • startLife

In a version with CompadreM functionality there would have to be an extra argument (e.g. reproType: if x is a CompadreM object, whether to use 'matF', 'matC', or 'matF+matC'), and 2 of the 4 arguments would become conditional:

  • x: either CompadreM or matU
  • matR: (only needed if x = matU)
  • startLife:
  • reproType: (only needed if x = CompadreM)

I just prefer the simplicity of the tidy version.
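
To make the contrast concrete, here is a hedged sketch of the two signatures. The names r0_tidy and r0_dispatch are hypothetical, the bodies are toy stand-ins rather than Rage's actual R0, and a plain list with elements matU, matF, and matC stands in for a CompadreM object:

```r
# Tidy version: three arguments, all of them always used.
r0_tidy <- function(matU, matR, startLife = 1) {
  N <- solve(diag(nrow(matU)) - matU)  # fundamental matrix (I - U)^-1
  (matR %*% N)[startLife, startLife]   # toy definition, for illustration
}

# Version accepting either an object or a matrix: matR and reproType are
# each only meaningful for one of the two input types.
r0_dispatch <- function(x, matR = NULL, startLife = 1,
                        reproType = c("matF", "matC", "matF+matC")) {
  if (is.list(x)) {  # stand-in for "x is a CompadreM object"
    reproType <- match.arg(reproType)
    matR <- switch(reproType,
                   "matF" = x$matF,
                   "matC" = x$matC,
                   "matF+matC" = x$matF + x$matC)
    matU <- x$matU
  } else {           # x is matU, so matR must be supplied by the caller
    if (is.null(matR)) stop("matR is required when x is a matrix")
    matU <- x
  }
  r0_tidy(matU, matR, startLife)
}

m <- list(matU = matrix(c(0.1, 0.4, 0, 0.7), nrow = 2),
          matF = matrix(c(0, 0, 1.5, 0), nrow = 2),
          matC = matrix(0, nrow = 2, ncol = 2))
a <- r0_tidy(m$matU, m$matF + m$matC)
b <- r0_dispatch(m, reproType = "matF+matC")
```

Both calls give the same number; the difference is that the second signature needs documentation explaining which arguments apply to which kind of input.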


More pedantically... I don't love the asymmetry of the single nested column in an otherwise tidy object. The accessor functions are nice, but they're slightly unintuitive for a user used to tidy data frames, particularly in an object where the 47 other variables can all be accessed with db$.

There's also asymmetry in that user-derived MPMs (e.g. collapsed, rearranged, λ-standardized) or MPM components will live outside mat and therefore require slightly different methods to program with. E.g. calculating a group mean matrix (#6) wrt matF (lives within mat) is slightly different than calculating wrt matR or matF_collapsed (live outside mat).

A counterpoint to the convenience of only passing one CompadreM object is that it's a bit more black-box-y. mat has 4 slots, and the various Rage functions may act on any possible combination... if a user passes CompadreM it's not inherently clear what a given function will act on (obviously this can be remedied with good documentation).

iainmstott added a commit to iainmstott/Rcompadre that referenced this issue Nov 12, 2018
…cessor functions for data, small updates to functions. Closes 4 issues.

1. Changed the structure of CompadreData object, so that now there are two slots: 1. 'data', which contains a data.frame with the original 'metadata' plus a list column 'mat' containing the CompadreM objects, and 2. a 'version' slot which contains version information. This addresses issues jonesor#27 and jonesor#29 at github.com/jonesor/Rcompadre and ties the order metadata and matrices more closely together.
2. Accessor functions for almost anything contained in the databases have been added, which addresses issue jonesor#23. It is still possible to access matrices and metadata separately to one another, even though they are now contained in one data.frame.
3. All functions and methods have been updated to use these accessor functions rather than accessing information directly from the slots (completes issue jonesor#15). If we wish to change the structure of the classes in the future, this minimises the work needed to go into making sure the methods and functions work with the new structure, as we only have to change the accessor functions rather than everything. Accessor functions should be used in the future rather than accessing information directly from the slots.
4. Added a pseudo superclass CompadreMorData which allows methods to be utilised by both classes (these pertain to matrices and matrixClass information).
5. Changed collapseMatrix, getMeanMatF, IdentifyReprodStages, rearrangeMatrix, splitMatrix so that they work with CompadreM objects as well as specific matrices that have been passed to them.
@patrickbarks
Collaborator Author

I've mostly come around on this. I personally prefer working with a totally flat structure, but I see the benefits of having a single mat slot/column in the package, so closing this issue.
