Flatten the matrix slot? #29

patrickbarks · 2018-08-24T12:20:59Z

(With apologies for the length)

Rather than the nested structure of the current mat slot (or "column", à la #27), I propose having separate list-columns for matA, matU, matF, matC, MatrixClassOrganized, and MatrixClassAuthor. The reason is that most MPM functions (e.g. in popdemo, popbio) act on a matrix, so we should make it easier for users to access matrices, and particularly, to vectorize over a set of matrices.

Vectorizing

Currently, vectorizing with popdemo or popbio functions requires writing a custom function to access the slots within mat, e.g.

ergodic <- sapply(db@mat, function(x) popdemo::isErgodic(x@matA))
lambda <- sapply(db@mat, function(x) popbio::lambda(x@matA))

With a flat structure a user could vectorize over matA without having to write a custom function, e.g.

ergodic <- sapply(db$matA, popdemo::isErgodic)
lambda <- sapply(db$matA, popbio::lambda)
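To make the flat pattern concrete without any package dependencies, here is a minimal sketch in base R. The toy `db` data frame and the `dominant_eigenvalue` helper are hypothetical stand-ins (for a flattened database and for `popbio::lambda`), not part of Rcompadre or popbio.

```r
# Toy stand-in for a flattened database: one row per MPM,
# with the matrices stored in a list-column called matA
db <- data.frame(SpeciesAuthor = c("sp_a", "sp_b"))
db$matA <- list(
  matrix(c(0.0, 0.5, 2.0, 0.8), nrow = 2),  # filled column by column
  matrix(c(0.5, 0.0, 0.0, 0.5), nrow = 2)
)

# Hypothetical helper: growth rate as the largest eigenvalue modulus
dominant_eigenvalue <- function(mat) {
  max(Mod(eigen(mat, only.values = TRUE)$values))
}

# Any single-matrix function can be passed straight to sapply()
lambda <- sapply(db$matA, dominant_eigenvalue)
```

Because `db$matA` is an ordinary list, no wrapper function is needed to reach the matrix.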

The situation is more nuanced when it comes to Rage, because Rage functions could eventually work with CompadreM objects, in which case a user could vectorize over db@mat without writing custom functions. However, for the Rage functions that take a single matrix (i.e. kEntropy, qsdConverge, reprodStages, identifyReproStages, splitMatrix), vectorizing with the flat version would be just as easy, e.g.

# CompadreM version
k_ent <- sapply(db@mat, Rage::kEntropy)

# flat version
k_ent <- sapply(db$matU, Rage::kEntropy)

For some Rage functions that take multiple matrix arguments (e.g. matrixElementPerturbation), vectorizing with the CompadreM object would admittedly be nicer, e.g.

# CompadreM version
mat_pert <- lapply(db@mat, Rage::matrixElementPerturbation)

# flat version
mat_pert <- mapply(Rage::matrixElementPerturbation, db$matU, db$matF, SIMPLIFY = FALSE)
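The two-column case can be tried out in base R as follows. This is only a sketch: the toy `db` and the `total_matrix` function are hypothetical, with `total_matrix` standing in for any function that takes matU and matF as its first two arguments.

```r
# Toy flattened database with two list-columns
db <- data.frame(id = 1:2)
db$matU <- list(matrix(c(0.1, 0.4, 0, 0.7), nrow = 2),
                matrix(c(0.2, 0.3, 0, 0.9), nrow = 2))
db$matF <- list(matrix(c(0, 0, 3, 0), nrow = 2),
                matrix(c(0, 0, 1.5, 0), nrow = 2))

# Hypothetical two-matrix function (stand-in for a Rage function)
total_matrix <- function(matU, matF) matU + matF

# Map(f, ...) is equivalent to mapply(f, ..., SIMPLIFY = FALSE)
mat_sum <- Map(total_matrix, db$matU, db$matF)
```

`Map()` walks both list-columns in parallel and always returns a list, so the result can be stored back in `db` as another list-column.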

For Rage functions that will often be used with additional row-specific arguments (e.g. longevity, rearrangeMatrix, reprodStages), vectorizing with the flat version is just as easy, e.g. calculating lifespan with respect to the first non-propagule stage:

# CompadreM version
start_life <- sapply(db@mat, function(x) min(which(x@matrixClass$MatrixClassOrganized == "active")))
lifespan <- mapply(Rage::longevity, db@mat, startLife = start_life, SIMPLIFY = FALSE)

# flat version
start_life <- sapply(db$MatrixClassOrganized, function(x) min(which(x == "active")))
lifespan <- mapply(Rage::longevity, db$matU, startLife = start_life, SIMPLIFY = FALSE)
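A runnable sketch of the same row-specific pattern, using only base R. The toy `db` and the `stage_survival` function are hypothetical; `stage_survival` merely stands in for a function such as `Rage::longevity` that takes a matrix plus a per-row `startLife` index.

```r
# Toy flattened database: matrices of different dimension per row
db <- data.frame(id = 1:2)
db$matU <- list(
  matrix(c(0, 0.3, 0,  0, 0.5, 0.2,  0, 0, 0.8), nrow = 3),
  matrix(c(0.4, 0.3, 0, 0.9), nrow = 2)
)
db$MatrixClassOrganized <- list(c("prop", "active", "active"),
                                c("active", "active"))

# Index of the first non-propagule ("active") stage in each row
start_life <- sapply(db$MatrixClassOrganized,
                     function(x) min(which(x == "active")))

# Hypothetical stand-in: total survival out of the starting stage
stage_survival <- function(matU, startLife) sum(matU[, startLife])

# mapply() pairs each matrix with its own startLife value
surv <- mapply(stage_survival, db$matU, startLife = start_life)
```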

Finally, there are a few Rage functions (R0, dEntropy, lifeTimeRepEvents, and makeLifeTable) for which having a CompadreM method would make the function overly difficult to document and understand (in my opinion), because the user may wish to apply them to matF-only, matC-only, or matF and matC (or in the extreme case of makeLifeTable, also matU-only).

As outlined in jonesor/Rage#19, I think we should simplify these functions so that they only take one 'reproductive matrix' argument (e.g. matR), and correspondingly make it easier for users to derive and store matFC (= matF + matC) within db.
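With a flat structure, deriving and storing matFC could be a one-liner. A minimal sketch, assuming list-columns named matF and matC as proposed above; the toy `db` is hypothetical.

```r
# Toy flattened database with sexual (matF) and clonal (matC) components
db <- data.frame(id = 1:2)
db$matF <- list(matrix(c(0, 0, 2, 0), nrow = 2),
                matrix(c(0, 0, 1, 0), nrow = 2))
db$matC <- list(matrix(c(0, 0, 0.5, 0), nrow = 2),
                matrix(0, nrow = 2, ncol = 2))

# Element-wise sum of each matF/matC pair, stored as a new list-column
db$matFC <- Map(`+`, db$matF, db$matC)
```

The derived column then sits alongside matF and matC and can be passed to any function expecting a single reproductive matrix.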

Printing

The flat version makes it easier to examine a series of matrices from the same study (e.g. reflecting different years or populations)

# CompadreM version
db@mat[[450]]@matA
db@mat[[451]]@matA
db@mat[[452]]@matA

# flat version
db$matA[450:452]

Though the CompadreM version makes it easier to examine all the matrices for a given row

# CompadreM version
db@mat[[450]]

# flat version
db$matA[[450]]
db$matU[[450]]
db$matF[[450]]
db$matC[[450]]
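For the flat layout, pulling every matrix for one row could be wrapped in a small helper. A minimal sketch, assuming list-columns named as in the proposal; `mats_for_row` and the toy `db` are hypothetical and not part of Rcompadre.

```r
# Toy flattened database with two matrix list-columns
db <- data.frame(id = 1:3)
db$matA <- list(diag(2), 2 * diag(2), 3 * diag(2))
db$matU <- list(0.5 * diag(2), 0.6 * diag(2), 0.7 * diag(2))

# Hypothetical helper: extract element i from each requested list-column
mats_for_row <- function(db, i, cols = c("matA", "matU")) {
  lapply(db[cols], `[[`, i)
}

row2 <- mats_for_row(db, 2)  # named list with elements matA and matU
```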

Matrix validation

I think the matrix-validation function of the CompadreM class could be moved to a new Rcompadre function (e.g. db_validate). I don't see too much benefit of 'built-in' validation of things like matrix dimension, non-negative values, etc. Definitely those things should be validated on the COMPADRE side, but after that I think it's fine to leave things to the user.
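As a rough idea of what a standalone validator might look like, here is a minimal sketch. Only the name `db_validate` comes from the suggestion above; the checks performed, the `matA` column name, and the returned data frame are assumptions for illustration.

```r
# Hypothetical validator: one row of check results per matrix in db$matA
db_validate <- function(db) {
  is_square <- vapply(db$matA, function(m) nrow(m) == ncol(m), logical(1))
  non_negative <- vapply(db$matA, function(m) all(m >= 0, na.rm = TRUE),
                         logical(1))
  data.frame(row = seq_along(db$matA), is_square, non_negative)
}

# Toy database: a valid matrix, one with a negative entry, one non-square
db <- data.frame(id = 1:3)
db$matA <- list(diag(2),
                matrix(c(0.5, -0.1, 0.2, 0.3), nrow = 2),
                matrix(0, nrow = 2, ncol = 3))

checks <- db_validate(db)
```

Returning the results rather than throwing an error leaves the decision about what to do with problem rows to the user.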

levisc8 · 2018-08-27T16:03:40Z

I don't know; I think the benefits of keeping it outweigh the advantages of removal. Responses in reverse order:

I think in general, user-flexibility is good. But I also think the class definition isn't overly restrictive and only keeps them from shooting themselves in the foot. If we can provide that functionality without burdening them (and I don't think the class is so complicated as to inhibit usability), I think we should keep it.
You can print multiple S4 objects using something like db$matdata[1:3, ] (to see all matrices) or using matA(db[1:3, ]) (to see specific ones). Note that my branch currently does still have the CompadreData class, but I'll change that shortly.
I feel like the benefit of vectorizing over multiple arguments for certain functions makes it worth keeping. It should be equally easy to vectorize for functions that take single matrices too. For the Rage functions that you mention at the end, we could include something like this in the examples:

# Using matF and matC
matR <- purrr::map2(.x = matF(db), .y = matC(db), .f = function(.x, .y) .x + .y)

# Pass the function itself; `~Rage::R0` would return R0 unevaluated
purrr::map(matR, Rage::R0)

# matF only. Skips the first mapping of calculations
purrr::map(matF(db), Rage::R0)

In the examples above, you could use `lapply` too, it's just a bit trickier.
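A base R equivalent might look like the following sketch. The two toy lists stand in for the output of matF(db) and matC(db), and `total_repro` is a hypothetical single-matrix function standing in for a Rage function.

```r
# Toy stand-ins for the lists returned by the matF() and matC() accessors
matF_list <- list(matrix(c(0, 0, 2, 0), nrow = 2),
                  matrix(c(0, 0, 1, 0), nrow = 2))
matC_list <- list(matrix(c(0, 0, 0.5, 0), nrow = 2),
                  matrix(0, nrow = 2, ncol = 2))

# lapply() walks a single list, so the two lists are paired by index
matR <- lapply(seq_along(matF_list),
               function(i) matF_list[[i]] + matC_list[[i]])

# Map() walks both lists in parallel, closer in spirit to purrr::map2()
matR_alt <- Map(`+`, matF_list, matC_list)

# Hypothetical single-matrix function
total_repro <- function(mat) sum(mat)

repro_FC <- lapply(matR, total_repro)      # matF and matC combined
repro_F  <- lapply(matF_list, total_repro) # matF only
```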

I don't feel too strongly about this though, just that it's worth considering the utility of the CompadreM class before we ditch it entirely. What do others think?

patrickbarks · 2018-08-28T08:32:14Z

Good points. I see that the accessor functions could be helpful if we retain the nested structure of mat.

My biggest qualm pertains to the matF+matC functions (jonesor/Rage#19). What I meant was that having a CompadreM method would make those functions harder to document and understand.

E.g. In the tidy version of R0 there would be 3 arguments, all 'mandatory':

matU
matR
startLife

In a version with CompadreM functionality there would have to be an extra argument (e.g. reproType: if CompadreM whether to use 'matF', 'matC', or 'matF+matC'), and 2 of the 4 arguments become conditional:

x: either CompadreM or matU
matR: (only needed if x = matU)
startLife:
reproType: (only needed if x = CompadreM)

I just prefer the simplicity of the tidy version.
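To illustrate the difference, here is a rough sketch of the two signatures. Everything in it is hypothetical: `r0_tidy` and `r0_nested` are not Rage functions, a plain list stands in for a CompadreM object, and R0 is assumed to be the [startLife, startLife] entry of matR %*% (I - matU)^-1, which is one common definition and not necessarily what Rage implements.

```r
# Tidy version: three mandatory arguments, one reproductive matrix
r0_tidy <- function(matU, matR, startLife) {
  N <- solve(diag(nrow(matU)) - matU)  # fundamental matrix (I - U)^-1
  R <- matR %*% N
  R[startLife, startLife]
}

# Nested version: which arguments matter depends on the type of x
r0_nested <- function(x, matR = NULL, startLife,
                      reproType = c("matF", "matC", "matF+matC")) {
  if (is.list(x)) {  # toy stand-in for a CompadreM object
    reproType <- match.arg(reproType)
    matR <- switch(reproType,
                   "matF" = x$matF,
                   "matC" = x$matC,
                   "matF+matC" = x$matF + x$matC)
    x <- x$matU
  }
  r0_tidy(x, matR, startLife)
}

matU <- matrix(c(0, 0.5, 0, 0.5), nrow = 2)
matF <- matrix(c(0, 0, 3, 0), nrow = 2)
matC <- matrix(c(0, 0, 1, 0), nrow = 2)
nested <- list(matU = matU, matF = matF, matC = matC)

r0_a <- r0_tidy(matU, matF, startLife = 1)
r0_b <- r0_nested(nested, startLife = 1, reproType = "matF+matC")
r0_c <- r0_nested(matU, matR = matF, startLife = 1)
```

In the nested version, matR is ignored when x is the nested object and reproType is ignored when x is a matrix, which is the conditional behaviour that would need documenting.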

More pedantically... I don't love the asymmetry of the single nested column in an otherwise tidy object. The accessor functions are nice, but they're slightly unintuitive for a user used to tidy data frames, particularly in an object where the 47 other variables can all be accessed with db$.

There's also asymmetry in that user-derived MPMs (e.g. collapsed, rearranged, λ-standardized) or MPM components will live outside mat and therefore require slightly different methods to program with. E.g. calculating a group mean matrix (#6) with respect to matF (lives within mat) is slightly different from calculating it with respect to matR or matF_collapsed (which live outside mat).
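In a flat structure, the same code would work for any matrix column, original or derived. A minimal sketch, in which the toy `db`, the `group` column, and the `mean_matrix` helper are all hypothetical, and matrices within a group are assumed to share the same dimension:

```r
# Toy flattened database: two populations of species "a", one of "b"
db <- data.frame(group = c("a", "a", "b"))
db$matF <- list(matrix(c(0, 0, 2, 0), nrow = 2),
                matrix(c(0, 0, 4, 0), nrow = 2),
                matrix(c(0, 0, 1, 0), nrow = 2))

# Hypothetical helper: element-wise mean of a list of same-sized matrices
mean_matrix <- function(mats) Reduce(`+`, mats) / length(mats)

# split() groups the list-column; swap db$matF for any other matrix column
mean_by_group <- lapply(split(db$matF, db$group), mean_matrix)
```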

A counterpoint to the convenience of only passing one CompadreM object is that it's a bit more black-box-y. mat has 4 slots, and the various Rage functions may act on any possible combination... if a user passes CompadreM it's not inherently clear what a given function will act on (obviously this can be remedied with good documentation).

…cessor functions for data, small updates to functions. Closes 4 issues.

1. Changed the structure of CompadreData object, so that now there are two slots: (a) 'data', which contains a data.frame with the original 'metadata' plus a list column 'mat' containing the CompadreM objects, and (b) a 'version' slot which contains version information. This addresses issues jonesor#27 and jonesor#29 at github.com/jonesor/Rcompadre and ties the order metadata and matrices more closely together.
2. Accessor functions for almost anything contained in the databases have been added, which addresses issue jonesor#23. It is still possible to access matrices and metadata separately to one another, even though they are now contained in one data.frame.
3. All functions and methods have been updated to use these accessor functions rather than accessing information directly from the slots (completes issue jonesor#15). If we wish to change the structure of the classes in the future, this minimises the work needed to go into making sure the methods and functions work with the new structure, as we only have to change the accessor functions rather than everything. Accessor functions should be used in the future rather than accessing information directly from the slots.
4. Added a pseudo superclass CompadreMorData which allows methods to be utilised by both classes (these pertain to matrices and matrixClass information).
5. Changed collapseMatrix, getMeanMatF, IdentifyReprodStages, rearrangeMatrix, splitMatrix so that they work with CompadreM objects as well as specific matrices that have been passed to them.

patrickbarks · 2018-11-15T07:16:59Z

I've mostly come around on this. I personally prefer working with a totally flat structure, but I see the benefits of having a single mat slot/column in the package, so closing this issue.

tdjames1 mentioned this issue Aug 24, 2018

Making Rcompadre pipe-friendly #25

Closed

patrickbarks mentioned this issue Aug 26, 2018

Restructure the CompadreData class? #27

Closed

iainmstott mentioned this issue Nov 12, 2018

New CompadreData structure, new accessor methods, all functions use accessor methods #32

Merged

patrickbarks closed this as completed Nov 15, 2018
