Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matchApply #85

Merged
merged 3 commits into from
Jul 28, 2022
Merged

matchApply #85

merged 3 commits into from
Jul 28, 2022

Conversation

andreavicini
Copy link
Collaborator

No description provided.

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR Andrea! Looks good, but seeing the code I somehow questioned if this would be userfriendly. Maybe simply iterating over the length of object and passing each Matched for a query element to the function might be easier to understand for the user instead having the 3 parameters matches, query and target (which are anyway part of the Matched). I tried to describe this also in the comments. Think over it and then let's discuss again.

R/Matched.R Outdated
@@ -81,7 +81,15 @@
#' small (or, depending on parameter `decreasing`, large) values for
#' `"score"` **and** `"score_rt"` are returned.
#'
#' - `pruneTarget` *cleans* the object by removing non-matched
#' - `matchApply`: allows to apply a user defined function `FUN` to each subset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs mode documentation. You should mention:

  • FUN is expected to be a function taking arguments x, query, target and tell what the individual arguments are.
  • By default, with returnMatched = TRUE, FUN is expected to return a data.frame with at least the same columns as x (additional columns would be allowed) and in this case a Matched object is returned.
  • If returnMatched = FALSE FUN can return any value and applyMatched will return a list of length equal to the number of query elements with these values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have mentioned it in the @param section for FUN and returnMatched but you are right it's good to be more clear. I will expand also based on the changes we are discussing.

R/Matched.R Outdated
stop("`FUN` must have each one of the arguments ",
"\"matches\", \"query\", \"target\"")
tmp <- split.data.frame(object@matches, object@matches$query_idx)
tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not directly call rbind here, but only call rbind if returnMatched = TRUE. Otherwise I would simply return the result of lapply.

R/Matched.R Outdated
tmp <- split.data.frame(object@matches, object@matches$query_idx)
tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...))
rownames(tmp) <- seq_len(nrow(tmp))
if(!returnMatched) return(tmp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think it would be important, for returnMatched = FALSE that the list has the same length than there are query elements. Otherwise the user will not know to which element the result belongs to. Maybe that could be possible with splitting the data specifying as levels a sequence along the number of query elements? Alternatively, you could create an empty list with length equal to the number of query elements and then assign the results from lapply using the $query_idx in @matches?

R/Matched.R Outdated
#' representing updated matches between `query` and `target`. Such
#' `data.frame` can be returned directly (`returnMatches = FALSE`) or a
#' `Matched` object can be returned to represent modified matches
#' (`returnMatches = TRUE`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a simple example to the example section would be helpful. With something like "select for each query the target with the highest score" and show that the results are the same as with filterMatched,TopRankedMatchesParam.

R/Matched.R Outdated
#'
#' @export
matchApply <- function(object, FUN, returnMatched = TRUE, ...) {
if (any(!c("matches", "query", "target") %in% formalArgs(FUN)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's actually a nice check - but I would maybe not be that restrictive and just mention in the documentation that the matches data.frame for one query element will be passed to the function as first element, the full query object as second and the full target element as third. Then also simple functions like function(x, ...) could be used, that simply do something with the matches data.frame.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was not sure of this check and I was thinking too to skip it and just write in the documentation which kind of function the user should provide. Regarding passing a function like function(x,...) it's not clear how that could work. Beacuse of the way the cose is written at the moment (also imagining that the check was not there) FUN should have necessarily params query and target to receive object@query and object@target within matchApply. But anyway I think what you suggested below is a good idea and so maybe it's good to leave this and go in that direction.

R/Matched.R Outdated
if (any(!c("matches", "query", "target") %in% formalArgs(FUN)))
stop("`FUN` must have each one of the arguments ",
"\"matches\", \"query\", \"target\"")
tmp <- split.data.frame(object@matches, object@matches$query_idx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code I was actually wondering - would it not be much easier to simply iterate over the whole object and pass a Matched of length 1 to the FUN? Something like:

res <- vector("list", length(object))
for (i in seq_along(object)) {
    res[[i]] <- FUN(object[i], ...)
}

Happy to discuss this further - also if it would be possible, e.g. to merge then the result back into a single Matched.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have the feeling that it might be easier for the user to define a function that gets a Matched object for one query as input to do something on that object. That would be similar to e.g. spectrapply in Spectra that applies a function to each element in a Spectra object. Here, matchedApply (maybe even a better name than matchApply?) would apply a function to a Matched of each query. The FUN would thus only need a single parameter (and optional ones) and the user would thus not have to learn about the contents of the 3 different parameters. What do you think @andreavicini ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that the code at the moment is not very user friendly (the user has to know how the object works) and also I think it's a good idea what you proposed. But maybe also defining a FUN acting on just the object itself would not be immediate for the user. Maybe we could define some of these functions if we have some frequent use cases in mind so that the less experienced user can use them while the more experienced user can also write its own. For example considering the second use case that you mentioned in the issue #84 we could create another param object for the filterMatches that given a set of values for a coulmn in target (or query?) keeps (or removes) only the matches corresponding to target elements having that value. For the first one we could add new filtering functions based on a certain threshold for score. But actually these functions don't need looping over subsets of matches for each query element so maybe it's not the best example. Regardless of that I like what you proposed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with you on the more user-friendly functions (that then eventually don't need any loop as you suggest).

I would just (more for the advanced users) like to have a function that allows users to loop over the Matched object and apply a function to each individual one. We have a similar function (spectrapply) in Spectra. Thus, maybe we should adapt the matchedapply function to apply FUN to each element of the Matched object. Let's then check how that feels to see if some more changes are needed.

@andreavicini
Copy link
Collaborator Author

Hi, I should have updated matchApply. So if we think they are useful I could also add the filter functions we discussed above I.e.

  1. filter the matches with score below (or above) a certain threshold.
  2. filter the matches where the query or target element is in a certain set of values.

I’m not sure how to call the parameter for these filter functions though. Maybe ScoreThresholdParam for the first one? Do you have maybe some ideas?

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrea! It looks all good - but I changed my mind again - sorry for that ;)

See the comment, but I think it might be better to separate the functionality into two different functions, one lapply that allows to apply any function to each Matched and returns whatever the function returns, and one endoapply that returns a Matched result object. Just by looking at the code I realized that merging these two possibilities into one single function might result in a pretty complicated function with too many if.

R/Matched.R Outdated
tmp <- do.call(rbind, lapply(tmp, FUN, object@query, object@target, ...))
rownames(tmp) <- seq_len(nrow(tmp))
if(!returnMatched) return(tmp)
res <- lapply(seq_along(object), function(i) FUN(object[i], ...)@matches)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this requires that FUN returns a Matched object (otherwise the @matches would not work). Seeing this code I now realize that maybe it might be better to split the functionality into two separate functions:

  • lapply to apply any function to each element in a Matched and return simply its result (i.e. simply calling lapply(seq_along(object), function(i) FUN(object[i], ...))

  • endoapply to apply a user function (that must return a Matched object) and returns a Matched object of same length than the input object. This function would then be your matchApply, just without the returnMatched.

Sorry for this confusion, but I think this would make more sense code-wise (less if statements) and would also maybe clearer to the user. Note on the endoapply: that is a function implemented in S4Vectors that performs an endomorphic operation, i.e. the result object is from the same type as the input object. See here for more information.

For lapply, it would need to be implemented as a method for Matched object (the generic is available in BiocGenerics, so you would need to import the method from there and implement a method for Matched). endoapply is a little more difficult since it's a function. Maybe define the generic method in MetaboAnnotation and have implement a method for ANY that calls S4Vectors::endoapply and implement one for Matched objects.

You would need to import the lapply method from BiocGenerics and implement one for Matched. endoapply is unfortunately only a function - maybe there

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks Andrea!

@jorainer jorainer merged commit 2e23d9b into master Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants