
Potential fixes for issue #40 (efficient sleuth object storage) #63

Merged
merged 32 commits into devel on Mar 2, 2016

Conversation

psturmfels
Contributor

No description provided.

@psturmfels
Contributor Author

Some stuff that breaks with the current update:
analyses/transcript view – "Error: subscript out of bounds" (missing bootstraps)

summaries/fragment length distribution plot – "Error: kallisto object does not contain the fragment length distribution. Please rerun with a new version of kallisto." (can't tell whether this is due to my branch changes, or due to a sample set actually run with an older version of kallisto, more updates to come)
diagnostics/bias weights – same error as above

Update: it looks like only analyses/transcript view is actually broken!
The other errors seem to occur because my test data sets were not generated with the current version of kallisto.

@@ -7,4 +7,4 @@ linters: with_defaults(
single_quotes_linter = NULL,
trailing_blank_lines_linter = NULL
)
exclusions: list("R/hexamers.R", "inst/doc/intro.R")
exclusions: list("R/hexamers.R", "inst/doc/intro.html")
Collaborator


Why did this change? Was this giving you issues? The linter shouldn't be checking anything other than .R files.

Contributor Author


"inst/doc/intro.R" does not exist – the intro file is an html file.
I added intro.html to the exclusions since I assumed it shouldn't be checking
the html file (and I assumed that exclusions was a list of files lintr shouldn't check)
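
For what it's worth, one quick way to confirm the exclusion behaves as intended is to re-lint the package and check that nothing is reported for the vignette output (assuming lintr::lint_package() is how the project runs the linter):

# re-lint the whole package; files listed under exclusions should be skipped
lintr::lint_package()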

@pimentel
Collaborator

FYI, I'm planning to do a thorough code review Thursday night.

Pascal Sturmfels added 3 commits February 20, 2016 15:05
Last commit turned out to be calculating the bootstrap quantiles
incorrectly. This commit should address that issue, and also
correctly plot the boxplots of each transcript.
@psturmfels
Contributor Author

Hey Harold, I want to tentatively claim that everything is working. Take a look!

bs_var
}))

ret$target_id <- target_id
Collaborator


why does this need to be returned in the object?

obs_counts <- obs_to_matrix(ret, "est_counts")
obs_counts <- transformation_function(obs_counts)

bs_test_summary$obs_counts <- obs_counts[rownames(obs_counts)
Collaborator


is it just me, or are these filter commands a bit obscure? are they not simply the same filter command?
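
For concreteness, the kind of consolidation being asked about might look roughly like the following sketch – keep_ids and the surrounding context are illustrative only, not sleuth's actual filter:

# compute the shared row subset once, then reuse it for both objects
keep_ids <- intersect(rownames(obs_counts), bs_test_summary$target_id)
obs_counts <- obs_counts[keep_ids, , drop = FALSE]
bs_test_summary <- bs_test_summary[match(keep_ids, bs_test_summary$target_id), ]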

bs_test_summary <- adf(ret$bs_summary)
bs_test_summary$target_id <- target_id
bs_test_summary <- bs_test_summary[order(bs_test_summary$target_id), ]
bs_test_summary <- data.frame(varMeans =
Collaborator


Given that you perform some operations on this, I think it makes sense to leave it as a matrix until the last possible moment. Please refactor so that you can perform all operations on the matrix, then have one very basic final step that turns it into a data frame and adds the correct columns (target_id).
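
A minimal sketch of that shape (the variable names and the rowMeans stand-in are illustrative, not the actual sleuth internals): do the ordering and summarizing while the bootstrap summary is still a matrix, and only build the data frame in one final step.

# keep the bootstrap summary as a matrix for all the heavy lifting
ord <- order(target_id)
bs_mat <- ret$bs_summary[ord, , drop = FALSE]
var_means <- rowMeans(bs_mat)  # placeholder for the actual summary operations

# one basic final step: convert to a data frame and attach target_id
bs_test_summary <- data.frame(target_id = target_id[ord], varMeans = var_means,
  stringsAsFactors = FALSE)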

@pimentel
Collaborator

okay, I did a very quick code review and made some minor changes. I haven't yet tested for correctness -- I will do so after you make these changes. I will probably have some minor changes after that, then I think we will be ready to merge this into the main branch.

good work!

Also: please use 2-space indentation rather than 4.

thanks!

@psturmfels
Contributor Author

Hey Harold, I've been slowly working on new updates. In the sleuth_live transcript view, can you justify having the different units for the boxplots? Won't the trends be the same regardless of units?
If so, could we potentially choose either est_counts or tpm to display the boxplots in? This will cut down prep time – right now I have to calculate quantile data twice – once for each unit.

@pimentel
Collaborator

There are a number of caveats here, and potentially I would like to leave it as an option.

Can you add an option to sleuth_prep named read_bootstrap_tpm? If TRUE, it reads the TPM bootstraps; otherwise it doesn't.

Thanks
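
From the calling side, that would presumably look something like this (assuming the new argument defaults to FALSE so existing behaviour is unchanged):

# opt in to reading and summarizing TPM bootstraps in addition to est_counts
so <- sleuth_prep(study_mapping, ~condition, read_bootstrap_tpm = TRUE)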

@psturmfels
Contributor Author

Done. The code now uses a pre-allocated matrix, does not repeat calculations, and uses for-loops. I've also fixed up some minor style problems.

@pimentel
Collaborator

Awesome, thanks!

Can you also address the comment on: https://github.com/pachterlab/sleuth/pull/63/files#diff-f7d482db290842229ad430c971048216R335

(sleuth.R line 335)?

read_bootstrap_mat <- function(fname, num_bootstraps, num_transcripts, est_count_sf) {
  bs_mat <- matrix(nrow=num_bootstraps, ncol=num_transcripts)
  for(i in 1:nrow(bs_mat)) {
    bs_mat[i, ] <- rhdf5::h5read(fname, paste0("bootstrap/bs", i - 1)) / est_count_sf
Collaborator


note to self: est_count_sf usage does not look correct here. I am fairly certain it should be est_count_sf[i]. should check for correctness. UPDATE: probably okay due to the way it is called
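
As a generic illustration of the scalar vs. vector distinction behind that note (plain R recycling, nothing sleuth-specific):

row <- c(10, 20, 30)
row / 2        # a single per-sample size factor scales every entry the same way
row / c(2, 4)  # a longer vector would be recycled, with a warning – likely not intended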

@pimentel
Collaborator

pimentel commented Mar 2, 2016

@psturmfels I'm having trouble running this version. Here is the error that I get:

d> so <- sleuth_prep(study_mapping, ~condition)
reading in kallisto results
......
normalizing est_counts
6280 targets passed the filter
normalizing tpm
merging in metadata
normalizing bootstrap samples
summarizing bootstraps
Reading bootstraps from sample: 
Error in ret$bs_quants[[samp_name]] <- list(est_counts = bs_quant_est_counts) : 
  attempt to select less than one element

This is the code that generated that error:

library('devtools')
dev_mode()

install('../..')

library('sleuth')

data_path <- 'ellahi'

sample_ids <- grep('^SR', dir(data_path), value = TRUE)
sample_ids <- rev(sample_ids)

study_mapping <- read.table(file.path(data_path, 'study_design.txt'), header = TRUE,
  stringsAsFactors = FALSE)
study_mapping <- dplyr::select(study_mapping, sample = run, condition)

stopifnot(sample_ids == study_mapping$sample)

result_paths <- file.path(data_path, sample_ids, 'kallisto')
study_mapping <- dplyr::mutate(study_mapping, path = result_paths)

so <- sleuth_prep(study_mapping, ~condition)

Run from within tests/testthat.

Can you look into it, please?

Thanks

@psturmfels
Contributor Author

Hey Harold,
This issue is happening because the above code does not give the study_mapping$path column names – per the getting started manual, I'm fairly certain sleuth expects "A list of paths to the kallisto results indexed by the sample IDs" for the path column.

With that said, I'll change the code not to rely on this assumption.
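
For reference, the script above would satisfy that expectation if the path column were named by sample ID – a base-R sketch:

# give the existing path column names matching the sample IDs
names(study_mapping$path) <- study_mapping$sample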

@pimentel
Collaborator

pimentel commented Mar 2, 2016

@psturmfels I think I am comfortable merging this now. Take a look at the recent commits/modifications that I made so that you are familiar with them. In particular, I cleaned up a lot of the stuff at the end (when assigning things to ret$bs_summary).

Anyway, I checked for correctness on 2 data sets and they seem to give the exact same values. Good work!

pimentel added a commit that referenced this pull request Mar 2, 2016
Potential fixes for issue #40 (efficient sleuth object storage)
@pimentel pimentel merged commit 4b23abc into devel Mar 2, 2016
@pimentel pimentel mentioned this pull request May 29, 2017