Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite topN and remove getTopIdx/subsetBy #199

Merged
merged 11 commits into from
Mar 22, 2017
Merged

rewrite topN and remove getTopIdx/subsetBy #199

merged 11 commits into from
Mar 22, 2017

Conversation

sgibb
Copy link
Collaborator

@sgibb sgibb commented Mar 20, 2017

This PR replaces getTopIdx and subsetBy by .topIdx.

.topIdx runs the summarise function just once over the whole matrix (instead of groupwise as getTopIdx did) and returns global row indicies that could be directly used to subset the matrix (no need for subsetBy anymore).
The code is much easier to read and it greatly improves the runtime (in this example 4x but in our current synapter set with 30000 peptides it is nearly 40x):

library("git2r")
library("devtools")

repo <- repository("MSnbase")

checkout(repo, branch="master")
load_all("MSnbase")
data("msnset2")
system.time({mas <- topN(msnset2, groupBy = fData(msnset2)$accession, n = 3) })
#    user  system elapsed
#   0.092   0.000   0.091

checkout(repo, branch="topN")
load_all("MSnbase")
system.time({top <- topN(msnset2, groupBy = fData(msnset2)$accession, n = 3) })
#    user  system elapsed
#   0.019   0.000   0.018

all.equal(mas, top)
#  [1] "Attributes: < Component “.__classVersion__”: Component “R”: Mean relative difference: 0.6666667 >"
#  [2] "Attributes: < Component “.__classVersion__”: Component “Biobase”: Mean relative difference: 0.2352941 >"
#  [3] "Attributes: < Component “.__classVersion__”: Component “pSet”: Mean relative difference: 1 >"
#  [4] "Attributes: < Component “processingData”: Attributes: < Component “processing”: Lengths (2, 3) differ (string compare on first 2) > >"
#  [5] "Attributes: < Component “processingData”: Attributes: < Component “processing”: 1 string mismatch > >"
#  [6] "Attributes: < Component “qual”: Attributes: < Names: 3 string mismatches > >"
#  [7] "Attributes: < Component “qual”: Attributes: < Length mismatch: comparison on first 3 components > >"
#  [8] "Attributes: < Component “qual”: Attributes: < Component 2: Attributes: < Modes: list, NULL > > >"
#  [9] "Attributes: < Component “qual”: Attributes: < Component 2: Attributes: < names for target but not for current > > >"
# [10] "Attributes: < Component “qual”: Attributes: < Component 2: Attributes: < current is not list-like > > >"
# [11] "Attributes: < Component “qual”: Attributes: < Component 2: Lengths (1, 0) differ (string compare on first 0) > >"
# [12] "Attributes: < Component “qual”: Attributes: < Component 3: Modes: character, numeric > >"
# [13] "Attributes: < Component “qual”: Attributes: < Component 3: target is character, current is numeric > >"

## synapter
library("synapter")
r <- readRDS("kuharev2015/synapter2/output/combMSnSet.RDS")
g <- fData(r)$protein.Accession.S130423_05

repo <- repository("MSnbase")

checkout(repo, branch="master")
load_all("MSnbase")
system.time({mas <- topN(r, groupBy = g, n = 3) })
#       User      System verstrichen
#     37.928       0.364      38.297

checkout(repo, branch="topN")
load_all("MSnbase")
system.time({top <- topN(r, groupBy = g, n = 3) })
#    user  system elapsed
#   0.088   0.000   0.089

Obviously the MSnSet are not identical anymore because I use subset instead of recreation now (but recreation would also be possible if wished).

There is also a new function .summariseRow that just run apply(x, 1, fun, ...) in most cases. But if the user want to use sum/mean it uses rowSums/rowMeans. We could use .summariseRow in other places too (if you like). Please note that summariseRow could easily be fouled: summariseRow(x, fun=function(y)sum(y)) will use apply instead of rowSums.

@sgibb sgibb requested a review from lgatto March 20, 2017 12:07
@codecov-io
Copy link

codecov-io commented Mar 20, 2017

Codecov Report

Merging #199 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #199      +/-   ##
==========================================
- Coverage    71.8%   71.75%   -0.06%     
==========================================
  Files          67       67              
  Lines        7182     7165      -17     
==========================================
- Hits         5157     5141      -16     
+ Misses       2025     2024       -1
Impacted Files Coverage Δ
R/utils.R 61.55% <100%> (+0.42%) ⬆️
R/methods-MSnSet.R 56.22% <100%> (-1.92%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb2fb86...bfddd5f. Read the comment docs.

@lgatto lgatto merged commit bb5d5e3 into master Mar 22, 2017
@lgatto
Copy link
Owner

lgatto commented Mar 22, 2017

@sgibb thank you very much! Could you document this in the news file, please, possibly with the PR number. That will make it easier if, in the future, we need to track that change back.

sgibb added a commit that referenced this pull request Mar 23, 2017
@sgibb sgibb deleted the topN branch March 23, 2017 00:04
lgatto pushed a commit that referenced this pull request Apr 9, 2017
From: Sebastian Gibb <mail@sebastiangibb.de>

git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@127877 bc3139a8-67e5-0310-9ffc-ced21a209358
lgatto pushed a commit that referenced this pull request Sep 7, 2017
From: Sebastian Gibb <mail@sebastiangibb.de>

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@127877 bc3139a8-67e5-0310-9ffc-ced21a209358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants