
Potential fixes for issue #40 (efficient sleuth object storage) #63

Merged
merged 32 commits into devel on Mar 2, 2016

Conversation

psturmfels
Contributor

No description provided.

@psturmfels
Contributor Author

Some stuff that breaks with the current update:
analyses/transcript view – "Error: subscript out of bounds" (missing bootstraps)

summaries/fragment length distribution plot – "Error: kallisto object does not contain the fragment length distribution. Please rerun with a new version of kallisto." (can't tell whether this is due to my branch changes, or due to a sample set actually run with an older version of kallisto, more updates to come)
diagnostics/bias weights – same error as above

Update: it looks like only analyses/transcript view is actually broken!
The other errors seem to occur because my test data sets were not generated with the current version of kallisto.

@@ -7,4 +7,4 @@ linters: with_defaults(
single_quotes_linter = NULL,
trailing_blank_lines_linter = NULL
)
exclusions: list("R/hexamers.R", "inst/doc/intro.R")
exclusions: list("R/hexamers.R", "inst/doc/intro.html")
Collaborator


Why did this change? Was this giving you issues? The linter shouldn't be checking anything other than .R files.

Contributor Author


"inst/doc/intro.R" does not exist – the intro file is an html file.
I added intro.html to the exclusions since I assumed it shouldn't be checking
the html file (and I assumed that exclusions was a list of files lintr shouldn't check)
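
For what it's worth, one quick way to confirm the exclusion behaves as intended is to re-lint the package and check that nothing is reported for the vignette output (assuming lintr::lint_package() is how the project runs the linter):

# re-lint the whole package; files listed under exclusions should be skipped
lintr::lint_package()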

@pimentel
Collaborator

FYI, I'm planning to do a thorough code review Thursday night.

Pascal Sturmfels added 3 commits February 20, 2016 15:05
Last commit turned out to be calculating the bootstrap quantiles
incorrectly. This commit should address that issue, and also
correctly plot the boxplots of each transcript.
@psturmfels
Contributor Author

Hey Harold, I want to tentatively claim that everything is working. Take a look!

bs_var
}))

ret$target_id <- target_id
Collaborator


why does this need to be returned in the object?

obs_counts <- obs_to_matrix(ret, "est_counts")
obs_counts <- transformation_function(obs_counts)

bs_test_summary$obs_counts <- obs_counts[rownames(obs_counts)
Collaborator


is it just me, or are these filter commands a bit obscure? are they not simply the same filter command?
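
For concreteness, the kind of consolidation being asked about might look roughly like the following sketch – keep_ids and the surrounding context are illustrative only, not sleuth's actual filter:

# compute the shared row subset once, then reuse it for both objects
keep_ids <- intersect(rownames(obs_counts), bs_test_summary$target_id)
obs_counts <- obs_counts[keep_ids, , drop = FALSE]
bs_test_summary <- bs_test_summary[match(keep_ids, bs_test_summary$target_id), ]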

bs_test_summary <- adf(ret$bs_summary)
bs_test_summary$target_id <- target_id
bs_test_summary <- bs_test_summary[order(bs_test_summary$target_id), ]
bs_test_summary <- data.frame(varMeans =
Collaborator


Given that you perform some operations on this, I think it makes sense to leave it as a matrix until the last possible moment. Please refactor so that you can perform all operations on the matrix, then have one very basic final step that turns it into a data frame and adds the correct columns (target_id).
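
A minimal sketch of that shape (the variable names and the rowMeans stand-in are illustrative, not the actual sleuth internals): do the ordering and summarizing while the bootstrap summary is still a matrix, and only build the data frame in one final step.

# keep the bootstrap summary as a matrix for all the heavy lifting
ord <- order(target_id)
bs_mat <- ret$bs_summary[ord, , drop = FALSE]
var_means <- rowMeans(bs_mat)  # placeholder for the actual summary operations

# one basic final step: convert to a data frame and attach target_id
bs_test_summary <- data.frame(target_id = target_id[ord], varMeans = var_means,
  stringsAsFactors = FALSE)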

@pimentel
Collaborator

okay, I did a very quick code review and made some minor changes. I haven't yet tested for correctness -- I will do so after you make these changes. I will probably have some minor changes after that, then I think we will be ready to merge this into the main branch.

good work!

Also: please use 2-space indentation rather than 4.

thanks!

@psturmfels
Contributor Author

Hey Harold, I've been slowly working on new updates. In the sleuth_live transcript view, can you justify having the different units for the boxplots? Won't the trends be the same regardless of units?
If so, could we potentially choose either est_counts or tpm to display the boxplots in? This will cut down prep time – right now I have to calculate quantile data twice – once for each unit.

@pimentel
Collaborator

There are a number of caveats here, and potentially I would like to leave it as an option.

Can you add an option to sleuth_prep named read_bootstrap_tpm? If TRUE, it reads the TPM bootstraps; otherwise it doesn't.

Thanks
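
From the calling side, that would presumably look something like this (assuming the new argument defaults to FALSE so existing behaviour is unchanged):

# opt in to reading and summarizing TPM bootstraps in addition to est_counts
so <- sleuth_prep(study_mapping, ~condition, read_bootstrap_tpm = TRUE)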

@psturmfels
Contributor Author

Done. The code now uses a pre-allocated matrix, does not repeat calculations, and uses for-loops. I've also fixed up some minor style problems.

@pimentel
Collaborator

Awesome, thanks!

Can you also address the comment on: https://github.com/pachterlab/sleuth/pull/63/files#diff-f7d482db290842229ad430c971048216R335

(sleuth.R line 335)?

read_bootstrap_mat <- function(fname, num_bootstraps, num_transcripts, est_count_sf) {
  bs_mat <- matrix(nrow=num_bootstraps, ncol=num_transcripts)
  for(i in 1:nrow(bs_mat)) {
    bs_mat[i, ] <- rhdf5::h5read(fname, paste0("bootstrap/bs", i - 1)) / est_count_sf
Collaborator


note to self: est_count_sf usage does not look correct here. I am fairly certain it should be est_count_sf[i]. should check for correctness. UPDATE: probably okay due to the way it is called
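
As a generic illustration of the scalar vs. vector distinction behind that note (plain R recycling, nothing sleuth-specific):

row <- c(10, 20, 30)
row / 2        # a single per-sample size factor scales every entry the same way
row / c(2, 4)  # a longer vector would be recycled, with a warning – likely not intended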

@pimentel
Collaborator

pimentel commented Mar 2, 2016

@psturmfels I'm having trouble running this version. Here is the error that I get:

d> so <- sleuth_prep(study_mapping, ~condition)
reading in kallisto results
......
normalizing est_counts
6280 targets passed the filter
normalizing tpm
merging in metadata
normalizing bootstrap samples
summarizing bootstraps
Reading bootstraps from sample: 
Error in ret$bs_quants[[samp_name]] <- list(est_counts = bs_quant_est_counts) : 
  attempt to select less than one element

This is the code that generated that error:

library('devtools')
dev_mode()

install('../..')

library('sleuth')

data_path <- 'ellahi'

sample_ids <- grep('^SR', dir(data_path), value = TRUE)
sample_ids <- rev(sample_ids)

study_mapping <- read.table(file.path(data_path, 'study_design.txt'), header = TRUE,
  stringsAsFactors = FALSE)
study_mapping <- dplyr::select(study_mapping, sample = run, condition)

stopifnot(sample_ids == study_mapping$sample)

result_paths <- file.path(data_path, sample_ids, 'kallisto')
study_mapping <- dplyr::mutate(study_mapping, path = result_paths)

so <- sleuth_prep(study_mapping, ~condition)

Run from within tests/testthat.

Can you look into it, please?

Thanks

@psturmfels
Contributor Author

Hey Harold,
This issue is happening because the above code does not give the study_mapping$path column names – per the getting started manual, I'm fairly certain sleuth expects "A list of paths to the kallisto results indexed by the sample IDs" for the path column.

With that said, I'll change the code not to rely on this assumption.
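
For reference, the script above would satisfy that expectation if the path column were named by sample ID – a base-R sketch:

# give the existing path column names matching the sample IDs
names(study_mapping$path) <- study_mapping$sample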

@pimentel
Collaborator

pimentel commented Mar 2, 2016

@psturmfels I think I am comfortable merging this now. Take a look at the recent commits/modifications that I made so that you are familiar with them. In particular, I cleaned up a lot of the stuff at the end (when assigning things to ret$bs_summary).

Anyway, I checked for correctness on 2 data sets and they seem to give the exact same values. Good work!

pimentel added a commit that referenced this pull request Mar 2, 2016
Potential fixes for issue #40 (efficient sleuth object storage)
@pimentel pimentel merged commit 4b23abc into devel Mar 2, 2016
@pimentel pimentel mentioned this pull request May 29, 2017