Input at gene count levels #43

Closed
gevro opened this issue May 21, 2020 · 9 comments

Comments

@gevro

gevro commented May 21, 2020

Hi, saving all the control BAM files as input for DROP takes a lot of space. After I run DROP the first time, is there an intermediate file for each sample (gene and exon/intron counts, for example) that I can save instead, so that the next time I run those samples I don't have to keep the original BAM files?

@gevro
Author

gevro commented May 28, 2020

Hi, just checking in about the above question. The key issue is that every time we have a new rare disease family, we would like to avoid rerunning the entire analysis for all the many control samples and instead start from some intermediate step. This is analogous to the n + 1 problem in joint genotyping analysis. Thanks.

@vyepez88
Collaborator

Hi, for the time being, you have to keep all the BAM files. The BAM files are the main input of DROP. Snakemake (and the way we designed DROP) checks that the BAM files of the samples that are going to be processed exist, and then begins with the analysis.
Nevertheless, if you add a new sample, only that sample will be counted (both gene-level and split reads) and then merged with the rest. The other samples will not be re-counted, but their BAM files must still exist.

@gevro
Author

gevro commented May 28, 2020

I see. How do I configure the pipeline so it merges the new samples with the prior samples? Does it have to be in the same master directory of the original analysis, with a new config.yaml file?
Regardless, it would be good to have an option to skip the counting step so that the original BAM files don't have to be kept for control samples.

@vyepez88
Collaborator

You have to add the new samples as rows in the sample annotation and assign them to the DROP_GROUP that you want to merge them with. Snakemake will then recognize that there are new jobs to run.
Yes, we're considering that option; it would also be useful when merging with external counts.
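
Editor's sketch of the step described above, appending new samples as rows to the sample annotation table. It assumes a tab-separated table with the columns `RNA_ID`, `RNA_BAM_FILE` and `DROP_GROUP`; a real DROP annotation has more columns, and the sample IDs, paths and group names here are hypothetical. It is an illustration, not part of DROP itself.

```python
import csv
import io

# Hypothetical, minimal sample annotation (assumed columns; a real table has more).
EXISTING = (
    "RNA_ID\tRNA_BAM_FILE\tDROP_GROUP\n"
    "control_001\t/data/bam/control_001.bam\toutrider,fraser\n"
)


def add_samples(annotation_tsv, new_rows):
    """Return the annotation text with new_rows appended, skipping IDs already present."""
    reader = csv.DictReader(io.StringIO(annotation_tsv), delimiter="\t")
    fieldnames = reader.fieldnames
    rows = list(reader)
    known = {row["RNA_ID"] for row in rows}
    for row in new_rows:
        if row["RNA_ID"] not in known:
            rows.append(row)
            known.add(row["RNA_ID"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# The new sample joins the same groups as the samples it should be merged with.
new_rows = [
    {
        "RNA_ID": "patient_011",
        "RNA_BAM_FILE": "/data/bam/patient_011.bam",
        "DROP_GROUP": "outrider,fraser",
    }
]
updated = add_samples(EXISTING, new_rows)
print(updated)
```

Existing rows are left untouched and only new IDs are appended, so running it twice with the same input does not duplicate samples.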

@gevro
Author

gevro commented May 28, 2020

If I add a new sample to the sample annotation, does it have to be in the same original DROP analysis folder? I'm guessing yes, but just want to make sure.

@vyepez88
Collaborator

What exactly has to be in the same original analysis folder?
Every time a new analysis is run, everything is rewritten in the processed_data and processed_results folders. A new copy of the OUTRIDER data set (ods) object is saved, but not of the FRASER data set (fds) object, because it's too big.

@gevro
Author

gevro commented May 28, 2020

To clarify: suppose I start with one analysis of 10 of my samples + 100 control samples.
Then I want to run another analysis with 5 new samples together with the previous 10 samples from our lab + the 100 control samples.

How do I set this up exactly? Do I just change the sample annotation table in the same DROP project folder of the first analysis? Because above you wrote that Snakemake can do this without having to recalculate all the processing for the samples from the first analysis of 10 + 100 samples. But in order for that to work, that means that all the analysis files must still exist from the first analysis, which I am guessing means that the second analysis must occur in the same folder as the first analysis. Is that correct?

@vyepez88
Collaborator

Yes, you change the sample annotation in the same DROP project folder and then execute snakemake ....
Because it's in the same folder, it will recognize the samples that are already processed and the ones that need to be processed.
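
Editor's sketch of why the same folder matters: a file-based workflow can skip a sample only if that sample's output file is already present where the workflow looks for it. The `<sample>.counts.tsv` naming and the directory layout are hypothetical, not DROP's actual layout, and Snakemake's real logic is richer (it also compares input and output timestamps, for example).

```python
from pathlib import Path
import tempfile


def samples_to_count(sample_ids, counts_dir):
    """Return the IDs whose per-sample count file does not exist yet."""
    counts_dir = Path(counts_dir)
    return [s for s in sample_ids if not (counts_dir / f"{s}.counts.tsv").exists()]


with tempfile.TemporaryDirectory() as tmp:
    # Pretend two control samples were counted in an earlier run.
    for done in ["control_001", "control_002"]:
        (Path(tmp) / f"{done}.counts.tsv").write_text("gene\tcount\n")

    # The annotation now lists one additional sample.
    pending = samples_to_count(["control_001", "control_002", "patient_011"], tmp)
    print(pending)  # ['patient_011']
```

Pointing the second analysis at a fresh folder would make every sample look unprocessed, because none of the earlier output files would be found.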

@gevro
Author

gevro commented May 28, 2020

Ok thanks.

@gevro gevro closed this as completed May 28, 2020
2 participants