matchApply #85

andreavicini · 2022-05-25T12:21:55Z

No description provided.

jorainer

Thanks for the PR, Andrea! Looks good, but seeing the code I wondered whether this would be user-friendly. Simply iterating over the length of object and passing the Matched for each query element to the function might be easier for the user to understand than having the 3 parameters matches, query and target (which are part of the Matched anyway). I tried to describe this in the comments as well. Think it over and then let's discuss again.

jorainer · 2022-05-26T12:04:46Z

R/Matched.R

@@ -81,7 +81,15 @@
 #'     small (or, depending on parameter `decreasing`, large) values for
 #'     `"score"` **and** `"score_rt"` are returned.
 #'
-#' - `pruneTarget` *cleans* the object by removing non-matched
+#' - `matchApply`: allows to apply a user defined function `FUN` to each subset


I think this needs more documentation. You should mention:

FUN is expected to be a function taking arguments x, query, target and tell what the individual arguments are.

By default, with returnMatched = TRUE, FUN is expected to return a data.frame with at least the same columns as x (additional columns would be allowed) and in this case a Matched object is returned.

If returnMatched = FALSE, FUN can return any value and matchApply will return a list of length equal to the number of query elements with these values.

I think I have mentioned it in the @param section for FUN and returnMatched but you are right it's good to be more clear. I will expand also based on the changes we are discussing.

jorainer · 2022-05-26T12:08:02Z

R/Matched.R

+        stop("`FUN` must have each one of the arguments ",
+             "\"matches\", \"query\", \"target\"")
+    tmp <- split.data.frame(object@matches, object@matches$query_idx)
+    tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...))


I would not directly call rbind here, but only call rbind if returnMatched = TRUE. Otherwise I would simply return the result of lapply.
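A minimal sketch of this suggestion, assuming a plain data.frame stands in for the `@matches` slot. The helper name `apply_matches` and the toy data are hypothetical and not code from the PR.

```r
# Toy stand-in for the `@matches` slot of a Matched object (assumed layout).
matches <- data.frame(query_idx = c(1L, 1L, 2L),
                      target_idx = c(3L, 5L, 2L),
                      score = c(0.1, 0.4, 0.2))

# Hypothetical helper: bind the per-query results into one data.frame only
# when a Matched-like result is requested, otherwise return the plain list.
apply_matches <- function(matches, FUN, returnMatched = TRUE, ...) {
    res <- lapply(split(matches, matches$query_idx), FUN, ...)
    if (!returnMatched)
        return(res)
    res <- do.call(rbind, res)
    rownames(res) <- NULL
    res
}

# Keep the row with the smallest score for each query.
best <- function(x, ...) x[which.min(x$score), , drop = FALSE]

as_df <- apply_matches(matches, best)
as_list <- apply_matches(matches, function(x, ...) nrow(x),
                         returnMatched = FALSE)
```

With `returnMatched = FALSE` the function passed as `FUN` is free to return anything, since nothing is forced through `rbind`.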

jorainer · 2022-05-26T12:12:21Z

R/Matched.R

+    tmp <- split.data.frame(object@matches, object@matches$query_idx)
+    tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...))
+    rownames(tmp) <- seq_len(nrow(tmp))
+    if(!returnMatched) return(tmp)


Note: I think it would be important, for returnMatched = FALSE, that the list has the same length as the number of query elements. Otherwise the user will not know which element a result belongs to. Maybe that would be possible by splitting the data with a sequence along the number of query elements specified as levels? Alternatively, you could create an empty list with length equal to the number of query elements and then assign the results from lapply using the $query_idx in @matches.
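Both options can be sketched in base R. The toy `matches` data.frame and the value of `n_query` are assumptions for illustration; here queries 2 and 3 have no match.

```r
matches <- data.frame(query_idx = c(1L, 1L, 4L),
                      target_idx = c(3L, 5L, 2L),
                      score = c(0.1, 0.4, 0.2))
n_query <- 4L  # assumed number of query elements

# Option 1: use all query indices as factor levels, so that split() keeps
# an (empty) group for queries without any match.
f <- factor(matches$query_idx, levels = seq_len(n_query))
res_levels <- lapply(split(matches, f), nrow)

# Option 2: pre-allocate a list and fill only the positions with matches.
res_fill <- vector("list", n_query)
idx <- sort(unique(matches$query_idx))
res_fill[idx] <- lapply(split(matches, matches$query_idx), nrow)
```

In the first option `FUN` is also called on zero-row data.frames, while in the second option unmatched queries simply stay `NULL`.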

jorainer · 2022-05-26T12:13:54Z

R/Matched.R

+#'   representing updated matches between `query` and `target`. Such
+#'   `data.frame` can be returned directly (`returnMatches = FALSE`) or a
+#'   `Matched` object can be returned to represent modified matches
+#'   (`returnMatches = TRUE`).


I think adding a simple example to the example section would be helpful. With something like "select for each query the target with the highest score" and show that the results are the same as with filterMatches,TopRankedMatchesParam.
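The selection logic for such an example could look roughly like this, again on a toy data.frame instead of a real Matched object (the data and the name `top_match` are hypothetical). The comparison against the `TopRankedMatchesParam` filter is left out because it needs a real Matched object.

```r
matches <- data.frame(query_idx = c(1L, 1L, 2L, 2L, 2L),
                      target_idx = c(3L, 5L, 2L, 4L, 6L),
                      score = c(0.1, 0.4, 0.9, 0.2, 0.5))

# For one query: keep only the match with the highest score.
top_match <- function(x, ...) x[which.max(x$score), , drop = FALSE]

res <- do.call(rbind, lapply(split(matches, matches$query_idx), top_match))
rownames(res) <- NULL
```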

jorainer · 2022-05-26T12:17:25Z

R/Matched.R

+#'
+#' @export
+matchApply <- function(object, FUN, returnMatched = TRUE, ...) {
+    if (any(!c("matches", "query", "target") %in% formalArgs(FUN)))


That's actually a nice check, but I would maybe not be that restrictive and just mention in the documentation that the matches data.frame for one query element will be passed to the function as the first argument, the full query object as the second and the full target object as the third. Then simple functions like function(x, ...), which just do something with the matches data.frame, could be used as well.

Yes, I was not sure about this check, and I was also thinking of skipping it and just writing in the documentation which kind of function the user should provide. Regarding passing a function like function(x, ...), it's not clear to me how that could work. Because of the way the code is written at the moment (even imagining that the check was not there), FUN would necessarily need the parameters query and target to receive object@query and object@target within matchApply. But anyway, I think what you suggested below is a good idea, so maybe it's good to leave this and go in that direction.
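On the question of how function(x, ...) could work: `lapply` passes its extra arguments to `FUN` by position, so a function with a `...` parameter absorbs arguments it does not name. A small base R sketch, with toy data and hypothetical function names:

```r
query <- data.frame(id = c("q1", "q2"), stringsAsFactors = FALSE)
target <- data.frame(id = c("t1", "t2", "t3"), stringsAsFactors = FALSE)
matches <- data.frame(query_idx = c(1L, 1L, 2L),
                      target_idx = c(1L, 3L, 2L))
pieces <- split(matches, matches$query_idx)

# Uses only the matches; `query` and `target` are absorbed by `...`.
n_matches <- function(x, ...) nrow(x)

# Uses all three arguments, under arbitrary parameter names.
target_ids <- function(m, qry, tgt, ...) tgt$id[m$target_idx]

a <- lapply(pieces, n_matches, query, target)
b <- lapply(pieces, target_ids, query, target)
```

Neither function needs parameters called `query` or `target`, which is why the `formalArgs` check is stricter than necessary.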

jorainer · 2022-05-26T12:25:20Z

R/Matched.R

+    if (any(!c("matches", "query", "target") %in% formalArgs(FUN)))
+        stop("`FUN` must have each one of the arguments ",
+             "\"matches\", \"query\", \"target\"")
+    tmp <- split.data.frame(object@matches, object@matches$query_idx)


Looking at the code I was actually wondering - would it not be much easier to simply iterate over the whole object and pass a Matched of length 1 to the FUN? Something like:

    res <- vector("list", length(object))
    for (i in seq_along(object)) {
        res[[i]] <- FUN(object[i], ...)
    }

Happy to discuss this further - also if it would be possible, e.g. to merge then the result back into a single Matched.

I just have the feeling that it might be easier for the user to define a function that gets a Matched object for one query as input to do something on that object. That would be similar to e.g. spectrapply in Spectra that applies a function to each element in a Spectra object. Here, matchedApply (maybe even a better name than matchApply?) would apply a function to a Matched of each query. The FUN would thus only need a single parameter (and optional ones) and the user would thus not have to learn about the contents of the 3 different parameters. What do you think @andreavicini ?

Yes, I agree that the code at the moment is not very user-friendly (the user has to know how the object works), and I also think what you proposed is a good idea. But defining a FUN acting on just the object itself might not be straightforward for the user either. Maybe we could define some of these functions ourselves if we have some frequent use cases in mind, so that less experienced users can use them while more experienced users can also write their own. For example, considering the second use case that you mentioned in issue #84, we could create another param object for filterMatches that, given a set of values for a column in target (or query?), keeps (or removes) only the matches corresponding to target elements having one of those values. For the first one we could add new filtering functions based on a certain threshold for score. But actually these functions don't need to loop over subsets of matches for each query element, so maybe it's not the best example. Regardless of that, I like what you proposed.

I totally agree with you on the more user-friendly functions (that then eventually don't need any loop as you suggest).

I would just (more for the advanced users) like to have a function that allows users to loop over the Matched object and apply a function to each individual one. We have a similar function (spectrapply) in Spectra. Thus, maybe we should adapt the matchedapply function to apply FUN to each element of the Matched object. Let's then check how that feels to see if some more changes are needed.

andreavicini · 2022-07-06T08:58:56Z

Hi, I should have updated matchApply. So if we think they are useful, I could also add the filter functions we discussed above, i.e.:

- filter the matches with score below (or above) a certain threshold.
- filter the matches where the query or target element is in a certain set of values.

I’m not sure how to call the parameter for these filter functions though. Maybe ScoreThresholdParam for the first one? Do you have maybe some ideas?
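To make the two proposed filters concrete, here is a rough base R sketch on a toy matches data.frame. The helper names, their arguments and the `name` column in `target` are all hypothetical; neither filter needs a loop over query elements.

```r
matches <- data.frame(query_idx = c(1L, 1L, 2L, 3L),
                      target_idx = c(3L, 5L, 2L, 5L),
                      score = c(0.1, 0.4, 0.2, 0.7))
target <- data.frame(name = c("a", "b", "c", "d", "e", "f"),
                     stringsAsFactors = FALSE)

# Hypothetical: keep matches with a score below (or above) a threshold.
filter_score <- function(m, threshold, above = FALSE) {
    keep <- if (above) m$score >= threshold else m$score <= threshold
    m[keep, , drop = FALSE]
}

# Hypothetical: keep (or remove) matches whose target element has one of
# the given values in `column`.
filter_target_values <- function(m, target, column, values, keep = TRUE) {
    hit <- target[[column]][m$target_idx] %in% values
    if (!keep)
        hit <- !hit
    m[hit, , drop = FALSE]
}

low <- filter_score(matches, 0.3)
only_e <- filter_target_values(matches, target, "name", "e")
```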

jorainer

Thanks Andrea! It looks all good - but I changed my mind again - sorry for that ;)

See the comment, but I think it might be better to separate the functionality into two different functions, one lapply that allows to apply any function to each Matched and returns whatever the function returns, and one endoapply that returns a Matched result object. Just by looking at the code I realized that merging these two possibilities into one single function might result in a pretty complicated function with too many if.

jorainer · 2022-07-07T06:25:55Z

R/Matched.R

-    tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...))
-    rownames(tmp) <- seq_len(nrow(tmp))
-    if(!returnMatched) return(tmp)
+    res <- lapply(seq_along(object), function(i) FUN(object[i], ...)@matches)


Note: this requires that FUN returns a Matched object (otherwise the @matches would not work). Seeing this code I now realize that maybe it might be better to split the functionality into two separate functions:

lapply to apply any function to each element in a Matched and simply return its result (i.e. simply calling lapply(seq_along(object), function(i) FUN(object[i], ...)))

endoapply to apply a user function (that must return a Matched object) and return a Matched object of the same length as the input object. This function would then be your matchApply, just without the returnMatched.

Sorry for the confusion, but I think this would make more sense code-wise (fewer if statements) and would maybe also be clearer to the user. Note on endoapply: it is a function implemented in S4Vectors that performs an endomorphic operation, i.e. the result object is of the same type as the input object. See here for more information.

For lapply, it would need to be implemented as a method for Matched objects (the generic is available in BiocGenerics, so you would need to import the generic from there and implement a method for Matched). endoapply is a little more difficult since it's a function. Maybe define the generic in MetaboAnnotation, implement a method for ANY that calls S4Vectors::endoapply, and implement one for Matched objects.
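The split into two functions can be illustrated with a toy S4 class. This is only a sketch under stated assumptions: `ToyMatched`, `one_query`, `toy_lapply` and `toy_endoapply` are hypothetical stand-ins, plain functions are used instead of BiocGenerics or S4Vectors methods, and the real Matched class has more slots.

```r
library(methods)

# Toy stand-in for Matched: one element per query, matches in a data.frame.
setClass("ToyMatched",
         representation(query = "character", matches = "data.frame"))
setMethod("length", "ToyMatched", function(x) length(x@query))

# Hypothetical helper: subset to one query element and re-index it to 1.
one_query <- function(object, i) {
    m <- object@matches[object@matches$query_idx == i, , drop = FALSE]
    m$query_idx <- rep(1L, nrow(m))
    new("ToyMatched", query = object@query[i], matches = m)
}

# lapply-like: FUN may return anything; the result is a plain list.
toy_lapply <- function(object, FUN, ...)
    lapply(seq_len(length(object)),
           function(i) FUN(one_query(object, i), ...))

# endoapply-like: FUN must return a ToyMatched; results are merged back.
toy_endoapply <- function(object, FUN, ...) {
    res <- toy_lapply(object, FUN, ...)
    stopifnot(all(vapply(res, is, logical(1), "ToyMatched")))
    m <- do.call(rbind, lapply(seq_along(res), function(i) {
        mi <- res[[i]]@matches
        mi$query_idx <- rep(i, nrow(mi))  # restore the original query index
        mi
    }))
    rownames(m) <- NULL
    new("ToyMatched", query = object@query, matches = m)
}

obj <- new("ToyMatched", query = c("a", "b"),
           matches = data.frame(query_idx = c(1L, 1L, 2L),
                                target_idx = c(3L, 5L, 2L),
                                score = c(0.1, 0.4, 0.2)))
keep_best <- function(x, ...) {
    x@matches <- x@matches[which.min(x@matches$score), , drop = FALSE]
    x
}
counts <- toy_lapply(obj, function(x) nrow(x@matches))
best <- toy_endoapply(obj, keep_best)
```

Keeping the two paths apart means the list-returning function never has to inspect what `FUN` returned, while the endomorphic one can insist on getting an object of the input class back.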

You would need to import the lapply method from BiocGenerics and implement one for Matched. endoapply is unfortunately only a function - maybe there

jorainer

Very nice! Thanks Andrea!

matchApply

b9155f2

andreavicini requested a review from jorainer May 25, 2022 12:22

jorainer requested changes May 26, 2022


matchApply update

73cd5ed

jorainer requested changes Jul 7, 2022


jorainer mentioned this pull request Jul 7, 2022

Additional filterMatched filters #86

Open

lapply and endoapply

a61bb9e

jorainer approved these changes Jul 28, 2022


jorainer merged commit 2e23d9b into master Jul 28, 2022


