-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
matchApply #85
matchApply #85
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR Andrea! Looks good, but seeing the code I somehow questioned if this would be userfriendly. Maybe simply iterating over the length of object
and passing each Matched
for a query element to the function might be easier to understand for the user instead having the 3 parameters matches
, query
and target
(which are anyway part of the Matched
). I tried to describe this also in the comments. Think over it and then let's discuss again.
R/Matched.R
Outdated
@@ -81,7 +81,15 @@ | |||
#' small (or, depending on parameter `decreasing`, large) values for | |||
#' `"score"` **and** `"score_rt"` are returned. | |||
#' | |||
#' - `pruneTarget` *cleans* the object by removing non-matched | |||
#' - `matchApply`: allows to apply a user defined function `FUN` to each subset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs mode documentation. You should mention:
FUN
is expected to be a function taking argumentsx
,query
,target
and tell what the individual arguments are.- By default, with
returnMatched = TRUE
,FUN
is expected to return adata.frame
with at least the same columns asx
(additional columns would be allowed) and in this case aMatched
object is returned. - If
returnMatched = FALSE
FUN
can return any value andapplyMatched
will return alist
of length equal to the number of query elements with these values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have mentioned it in the @param
section for FUN
and returnMatched
but you are right it's good to be more clear. I will expand also based on the changes we are discussing.
R/Matched.R
Outdated
stop("`FUN` must have each one of the arguments ", | ||
"\"matches\", \"query\", \"target\"") | ||
tmp <- split.data.frame(object@matches, object@matches$query_idx) | ||
tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not directly call rbind
here, but only call rbind
if returnMatched = TRUE
. Otherwise I would simply return the result of lapply
.
R/Matched.R
Outdated
tmp <- split.data.frame(object@matches, object@matches$query_idx) | ||
tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...)) | ||
rownames(tmp) <- seq_len(nrow(tmp)) | ||
if(!returnMatched) return(tmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I think it would be important, for returnMatched = FALSE
that the list
has the same length than there are query elements. Otherwise the user will not know to which element the result belongs to. Maybe that could be possible with splitting the data specifying as levels a sequence along the number of query elements? Alternatively, you could create an empty list
with length equal to the number of query elements and then assign the results from lapply
using the $query_idx
in @matches
?
R/Matched.R
Outdated
#' representing updated matches between `query` and `target`. Such | ||
#' `data.frame` can be returned directly (`returnMatches = FALSE`) or a | ||
#' `Matched` object can be returned to represent modified matches | ||
#' (`returnMatches = TRUE`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a simple example to the example section would be helpful. With something like "select for each query the target with the highest score" and show that the results are the same as with filterMatched,TopRankedMatchesParam
.
R/Matched.R
Outdated
#' | ||
#' @export | ||
matchApply <- function(object, FUN, returnMatched = TRUE, ...) { | ||
if (any(!c("matches", "query", "target") %in% formalArgs(FUN))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's actually a nice check - but I would maybe not be that restrictive and just mention in the documentation that the matches
data.frame
for one query element will be passed to the function as first element, the full query
object as second and the full target
element as third. Then also simple functions like function(x, ...)
could be used, that simply do something with the matches
data.frame
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was not sure of this check and I was thinking too to skip it and just write in the documentation which kind of function the user should provide. Regarding passing a function like function(x,...)
it's not clear how that could work. Beacuse of the way the cose is written at the moment (also imagining that the check was not there) FUN
should have necessarily params query
and target
to receive object@query
and object@target
within matchApply
. But anyway I think what you suggested below is a good idea and so maybe it's good to leave this and go in that direction.
R/Matched.R
Outdated
if (any(!c("matches", "query", "target") %in% formalArgs(FUN))) | ||
stop("`FUN` must have each one of the arguments ", | ||
"\"matches\", \"query\", \"target\"") | ||
tmp <- split.data.frame(object@matches, object@matches$query_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code I was actually wondering - would it not be much easier to simply iterate over the whole object and pass a Matched
of length 1 to the FUN
? Something like:
res <- vector("list", length(object))
for (i in seq_along(object)) {
res[[i]] <- FUN(object[i], ...)
}
Happy to discuss this further - also if it would be possible, e.g. to merge then the result back into a single Matched
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have the feeling that it might be easier for the user to define a function that gets a Matched
object for one query as input to do something on that object. That would be similar to e.g. spectrapply
in Spectra
that applies a function to each element in a Spectra
object. Here, matchedApply
(maybe even a better name than matchApply
?) would apply a function to a Matched
of each query. The FUN
would thus only need a single parameter (and optional ones) and the user would thus not have to learn about the contents of the 3 different parameters. What do you think @andreavicini ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that the code at the moment is not very user friendly (the user has to know how the object works) and also I think it's a good idea what you proposed. But maybe also defining a FUN
acting on just the object itself would not be immediate for the user. Maybe we could define some of these functions if we have some frequent use cases in mind so that the less experienced user can use them while the more experienced user can also write its own. For example considering the second use case that you mentioned in the issue #84 we could create another param object for the filterMatches
that given a set of values for a coulmn in target
(or query
?) keeps (or removes) only the matches corresponding to target
elements having that value. For the first one we could add new filtering functions based on a certain threshold for score. But actually these functions don't need looping over subsets of matches for each query element so maybe it's not the best example. Regardless of that I like what you proposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree with you on the more user-friendly functions (that then eventually don't need any loop as you suggest).
I would just (more for the advanced users) like to have a function that allows users to loop over the Matched
object and apply a function to each individual one. We have a similar function (spectrapply
) in Spectra
. Thus, maybe we should adapt the matchedapply
function to apply FUN
to each element of the Matched
object. Let's then check how that feels to see if some more changes are needed.
Hi, I should have updated
I’m not sure how to call the parameter for these filter functions though. Maybe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Andrea! It looks all good - but I changed my mind again - sorry for that ;)
See the comment, but I think it might be better to separate the functionality into two different functions, one lapply
that allows to apply any function to each Matched
and returns whatever the function returns, and one endoapply
that returns a Matched
result object. Just by looking at the code I realized that merging these two possibilities into one single function might result in a pretty complicated function with too many if
.
R/Matched.R
Outdated
tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...)) | ||
rownames(tmp) <- seq_len(nrow(tmp)) | ||
if(!returnMatched) return(tmp) | ||
res <- lapply(seq_along(object), function(i) FUN(object[i], ...)@matches) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this requires that FUN
returns a Matched
object (otherwise the @matches
would not work). Seeing this code I now realize that maybe it might be better to split the functionality into two separate functions:
-
lapply
to apply any function to each element in aMatched
and return simply its result (i.e. simply callinglapply(seq_along(object), function(i) FUN(object[i], ...))
-
endoapply
to apply a user function (that must return aMatched
object) and returns aMatched
object of same length than the input object. This function would then be yourmatchApply
, just without thereturnMatched
.
Sorry for this confusion, but I think this would make more sense code-wise (less if
statements) and would also maybe clearer to the user. Note on the endoapply
: that is a function implemented in S4Vectors
that performs an endomorphic operation, i.e. the result object is from the same type as the input object. See here for more information.
For lapply
, it would need to be implemented as a method for Matched
object (the generic is available in BiocGenerics
, so you would need to import the method from there and implement a method for Matched
). endoapply
is a little more difficult since it's a function. Maybe define the generic method in MetaboAnnotation
and have implement a method for ANY
that calls S4Vectors::endoapply
and implement one for Matched
objects.
You would need to import the lapply
method from BiocGenerics
and implement one for Matched
. endoapply
is unfortunately only a function
- maybe there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks Andrea!
No description provided.