
Bugs in conda mae pipeline #105

Closed
gevro opened this issue Aug 4, 2020 · 56 comments

gevro commented Aug 4, 2020

Hi,
Here is the next bug I am hitting in the mae pipeline. This one looks much more like a genuine bug: gatk errors out because a .dict file already exists. That should not be an error; the gatk command should be told to overwrite the file if it exists.
Please let me know when it is fixed.

There are 3 other bugs in the conda mae pipeline that prevent it from working. But I think it will be easier to fix them one by one. I can post the next one after this is fixed.

[Mon Aug 3 22:51:08 2020]
rule create_dict:
input: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa
output: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.dict
jobid: 31

INFO 2020-08-03 22:51:54 CreateSequenceDictionary Output dictionary will be written in /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.dict
22:51:54.446 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/data/bin/drop_conda/share/gatk4-4.1.8.1-0/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Aug 03 22:51:54 EDT 2020] CreateSequenceDictionary --REFERENCE /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Mon Aug 03 22:51:56 EDT 2020] Executing as evrong01@cn-0044 on Linux 3.10.0-693.17.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_192-b01; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1
[Mon Aug 03 22:51:57 EDT 2020] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2667577344
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
picard.PicardException: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.dict already exists. Delete this file and try again, or specify a different output file.
at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gpfs/data/bin/drop_conda/share/gatk4-4.1.8.1-0/gatk-package-4.1.8.1-local.jar
Running:

gevro (Author) commented Aug 6, 2020

Hi, Any update about this bug? Thanks.

mumichae (Collaborator) commented:

Hi,

That is indeed a bug, thanks for pointing it out; we will fix it soon. The issue is that the extra periods in the filename cause the expected output filename to be truncated too early.

For the moment, you can create a symlink to your fasta file (so you don't have to copy the file) whose name does not contain any periods before the .fa extension. In your case this should work:

ln -s /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa

and replacing the corresponding entry in the config to

mae:
  genome: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa

Hope this solves the problem for now.
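[Editorial note] For illustration, a minimal Python sketch of the truncation behaviour the workaround avoids. It is not DROP's code, but it matches the fasta_dict helper in the Snakefile pasted later in this thread: splitting the path at the first period makes the rule expect GRCh38.dict, while Picard writes its default GRCh38.primary_assembly.genome.dict next to the fasta and then refuses to overwrite it on the next run.

```python
import os

# Illustration only: deriving the .dict name from the fasta path.
fasta = "/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa"

# Splitting at the first '.' truncates everything after "GRCh38":
naive = fasta.split('.')[0] + ".dict"
print(naive)   # .../GRCh38_gencode-STAR/GRCh38.dict

# A period-tolerant alternative strips only the final extension:
robust = os.path.splitext(fasta)[0] + ".dict"
print(robust)  # .../GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.dict
```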

gevro (Author) commented Aug 13, 2020

I tried, but still getting errors...

MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Thu Aug 13 00:55:45 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-13T005543.671615.snakemake.log
check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
MissingInputException in line 56 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_dict:
/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly.genome.fa

mumichae (Collaborator) commented Aug 13, 2020

Make sure that you specify the filename correctly; the fasta file doesn't seem to exist. Also, remove the remaining dot before the file extension, changing

/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly.genome.fa

to

/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa

otherwise the error will persist.

If the unlock is giving you problems, try deleting the .snakemake directory in the project directory.

gevro (Author) commented Aug 14, 2020

Next bug I'm getting. This is with the conda environment...

Error: package or namespace load failed for ‘tMAE’:
package ‘tMAE’ was installed before R 4.0.0: please re-install it
Execution halted

mumichae (Collaborator) commented:

Which R version are you using?

which R

It should be the one in your conda environment

gevro (Author) commented Aug 14, 2020

I'm using the drop conda environment. So if there is an issue, it is because of an issue in the drop conda environment configuration.

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)

mumichae (Collaborator) commented Aug 14, 2020

Hm, seems as though R might access different library paths. Which paths do you get when you call

.libPaths()

in R in the environment? You should only have a single path that links to the conda environment.
If it has more than one path, we could have some issues reinstalling tMAE.

Also, which Bioconductor version do you have?

BiocManager::version()

gevro (Author) commented Aug 14, 2020

Ok, .libPaths() was sourcing my regular R library first instead of the conda R library. This is a bug in conda: the whole point of conda is to have a separate environment, yet conda still sources the original user R library paths. See here for details:
https://waoverholt.com/conda-and-R/

The solution: in your conda package, add the following line to drop_conda/etc/conda/activate.d/activate-r-base.sh:

export R_LIBS=DROPCONDA_ENVIRONMENT/lib/R/library

where DROPCONDA_ENVIRONMENT is the path to the conda environment.
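[Editorial note] A small, hypothetical verification sketch (Python, to match the rest of the thread's code): after re-activating the environment with the fix above, R_LIBS should point inside the conda env and .libPaths() should report only that directory.

```python
import os
import subprocess

# Hypothetical check after re-activating the drop conda environment:
# R_LIBS should point into the environment, and R should see only that path.
print("R_LIBS =", os.environ.get("R_LIBS"))

result = subprocess.run(
    ["Rscript", "-e", ".libPaths()"],   # uses the Rscript found first on PATH
    stdout=subprocess.PIPE, universal_newlines=True, check=True,
)
print(result.stdout)  # expect a single library path inside the conda env
```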

gevro (Author) commented Aug 14, 2020

Next bug in mae conda. How do I fix this?

Started with deseq
[1] "Running DESeq..."
Error in DESeqDataSet(se, design = design, ignoreRank) :
counts matrix should be numeric, currently it has mode: logical
Calls: DESeq4MAE ... deseq_for_allele_specific_expression -> DESeqDataSetFromMatrix -> DESeqDataSet
Execution halted
[Fri Aug 14 12:31:02 2020]
Error in rule Scripts_MAE_deseq_mae_R:
jobid: 9
output: /gpfs/scratch/evrong01/droptest2/root/processed_results/mae/samples/1841982--UDP-1003_RNA_res.Rds

mumichae (Collaborator) commented:

Thanks for finding the issue. For some reason, I don't get this behaviour on my machine, despite having a system installation of R. I will need some time to figure out what is going on.

mumichae (Collaborator) commented:

> Next bug in mae conda. How do I fix this?
>
> Started with deseq
> [1] "Running DESeq..."
> Error in DESeqDataSet(se, design = design, ignoreRank) :
> counts matrix should be numeric, currently it has mode: logical
> Calls: DESeq4MAE ... deseq_for_allele_specific_expression -> DESeqDataSetFromMatrix -> DESeqDataSet
> Execution halted
> [Fri Aug 14 12:31:02 2020]
> Error in rule Scripts_MAE_deseq_mae_R:
> jobid: 9
> output: /gpfs/scratch/evrong01/droptest2/root/processed_results/mae/samples/1841982--UDP-1003_RNA_res.Rds

This error is most likely due to a failed GATK run. Try removing all the MAE output ($ROOT/processed_data where $ROOT is the project root path specified in config.yaml) before rerunning the pipeline.

gevro (Author) commented Aug 14, 2020

Ok I will try removing failed MAE output. Why doesn't snakemake or drop detect this automatically and rerun? This should be fixed.

gevro (Author) commented Aug 14, 2020

Deleting the mae directory in the root/processed_results folder gives the prior bug/error:
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Fri Aug 14 12:53:06 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-14T125304.030683.snakemake.log

gevro (Author) commented Aug 14, 2020

It also gives this error:

Building DAG of jobs...
MissingInputException in line 62 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_SNVs:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
[Fri Aug 14 12:51:59 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

gevro (Author) commented Aug 14, 2020

In general, mae is a lot more buggy than the other pipelines. There must be something in its foundation that causes it to have so many problems compared to the other drop pipelines.

gevro (Author) commented Aug 14, 2020

This directory that it complains about doesn't even exist...
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/

I tried 'drop update' and it still doesn't work.

So the suggestion of deleting $ROOT/processed_data/mae broke the pipeline. I'm not sure how to fix it now.

gevro (Author) commented Aug 14, 2020

Sorry, the mae-pipeline folder exists, but the mae-pipeline/.drop folder does not exist:

[evrong01@bigpurple-ln1 droptest2]$ ls -a /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/
. .. mae_readme.md resource Scripts Snakefile .snakemake

So this error points to files that do not exist. This is definitely a bug in the underlying code: the pipeline references path names that don't exist, and this is what causes the unlock commands to error every time. I suspect the unlock command has several bugs.

MissingInputException in line 62 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_SNVs:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt

mumichae (Collaborator) commented:

Can you print the contents of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile? There might be a command that doesn't always give the same result for some reason. I'd just fix it manually for now.

Yes, MAE is a bit buggy: GATK commands don't fail in the same way as other commands, and the module requires unlocking, which doesn't work as it should for submodules. We are still working on these issues for the new release. It will take some time until the main features are tested more systematically, so that we stop running into the reproducibility issues we see now.

gevro (Author) commented Aug 14, 2020

I think the unlock command or drop update has a bug in how it sets the base folder location, because the same folder path appears twice:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh

= /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline + .drop/modules/mae-pipeline + Scripts/MAE/filterSNVs.sh

But that is the wrong path. The correct path is: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
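[Editorial note] One plausible way the doubled prefix can arise (an assumption, not a confirmed diagnosis): if the module path comes back relative to the project root and is then resolved against the module's own working directory, the ".drop/modules/mae-pipeline" segment gets prepended twice. A minimal pathlib sketch of that arithmetic:

```python
from pathlib import Path

# Hypothetical reconstruction of the doubled path reported above.
module_workdir = Path("/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline")

# If SCRIPT_ROOT comes back relative to the project root instead of absolute...
script_root_relative = Path(".drop/modules/mae-pipeline")

# ...joining it under the module's working directory duplicates the segment:
wrong = module_workdir / script_root_relative / "Scripts/MAE/filterSNVs.sh"
print(wrong)   # .../mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh

# An absolute SCRIPT_ROOT avoids the duplication:
right = module_workdir / "Scripts/MAE/filterSNVs.sh"
print(right)   # .../mae-pipeline/Scripts/MAE/filterSNVs.sh
```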

gevro (Author) commented Aug 14, 2020

Here is the mae Snakefile:

```
### SNAKEFILE MONOALLELIC EXPRESSION
import os
import drop
import pathlib

METHOD = 'MAE'
SCRIPT_ROOT = drop.getMethodPath(METHOD, type_='workdir', str_=False)
CONF_FILE = drop.getConfFile()

parser = drop.config(config, METHOD)
config = parser.parse()
include: config['wBuildPath'] + "/wBuild.snakefile"

### FUNCTIONS

def fasta_dict(fasta_file):
    return fasta_file.split('.')[0] + ".dict"

def getVcf(rna_id, vcf_id="qc"):
    if vcf_id == "qc":
        return config["mae"]["qcVcf"]
    else:
        return parser.getProcDataDir() + f"/mae/snvs/{vcf_id}--{rna_id}.vcf.gz"

def getQC(format):
    if format == "UCSC":
        return config["mae"]["qcVcf"]
    elif format == "NCBI":
        return parser.getProcDataDir() + "/mae/qc_vcf_ncbi.vcf.gz"
    else:
        raise ValueError(f"getQC: {format} is an invalid chromosome format")

def getChrMap(SCRIPT_ROOT, conversion):
    if conversion == 'ncbi2ucsc':
        return SCRIPT_ROOT/"resource"/"chr_NCBI_UCSC.txt"
    elif conversion == 'ucsc2ncbi':
        return SCRIPT_ROOT/"resource"/"chr_UCSC_NCBI.txt"
    else:
        raise ValueError(f"getChrMap: {conversion} is an invalid conversion option")

def getScript(type, name):
    return SCRIPT_ROOT/"Scripts"/type/name

rule all:
    input:
        rules.Index.output,
        config["htmlOutputPath"] + "/mae_readme.html",
        rules.Scripts_MAE_Datasets_R.output,
        rules.Scripts_QC_Datasets_R.output
    output: touch(drop.getMethodPath(METHOD, type_='final_file'))

rule sampleQC:
    input: rules.Scripts_QC_Datasets_R.output
    output: touch(drop.getTmpDir() + "/sampleQC.done")

rule create_dict:
    input: config['mae']['genome']
    output: fasta_dict(config['mae']['genome'])
    shell: "gatk CreateSequenceDictionary --REFERENCE {input[0]}"

### MAE

rule create_SNVs:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.vcf,
                                                        file_type='DNA_VCF_FILE'),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        script = getScript("MAE", "filterSNVs.sh")
    output:
        snvs_filename = parser.getProcDataDir() + "/mae/snvs/{vcf}--{rna}.vcf.gz",
        snvs_index = parser.getProcDataDir() + "/mae/snvs/{vcf}--{rna}.vcf.gz.tbi"
    shell:
        """
        {input.script} {input.ncbi2ucsc} {input.ucsc2ncbi} {input.vcf_file}
        {wildcards.vcf} {input.bam_file} {output.snvs_filename}
        {config[tools][bcftoolsCmd]} {config[tools][samtoolsCmd]}
        """

rule allelic_counts:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file = lambda wildcards: getVcf(wildcards.rna, wildcards.vcf),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        fasta = config['mae']['genome'],
        dict = fasta_dict(config['mae']['genome']),
        script = getScript("MAE", "ASEReadCounter.sh")
    output:
        counted = parser.getProcDataDir() + "/mae/allelic_counts/{vcf}--{rna}.csv.gz"
    shell:
        """
        {input.script} {input.ncbi2ucsc} {input.ucsc2ncbi}
        {input.vcf_file} {input.bam_file} {wildcards.vcf}--{wildcards.rna}
        {input.fasta} {config[mae][gatkIgnoreHeaderCheck]} {output.counted}
        {config[tools][bcftoolsCmd]}
        """

### QC

rule renameChrQC:
    input:
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        ncbi_vcf = getQC(format="UCSC")
    output:
        ncbi_vcf = getQC(format="NCBI")
    shell:
        """
        bcftools={config[tools][bcftoolsCmd]}
        echo 'converting from UCSC to NCBI format'
        $bcftools annotate --rename-chrs {input.ucsc2ncbi} {input.ncbi_vcf}
        | bgzip > {output.ncbi_vcf}
        $bcftools index -t {output.ncbi_vcf}
        """

rule allelic_counts_qc:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file_ucsc = getQC(format="UCSC"),
        vcf_file_ncbi = getQC(format="NCBI"),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        fasta = config['mae']['genome'],
        dict = fasta_dict(config['mae']['genome']),
        script_qc = getScript("QC", "ASEReadCounter.sh"),
        script_mae = getScript("MAE", "ASEReadCounter.sh")
    output:
        counted = parser.getProcDataDir() + "/mae/allelic_counts/qc_{rna}.csv.gz"
    shell:
        """
        {input.script_qc} {input.ncbi2ucsc} {input.ucsc2ncbi}
        {input.vcf_file_ucsc} {input.vcf_file_ncbi} {input.bam_file}
        {wildcards.rna} {input.fasta} {config[mae][gatkIgnoreHeaderCheck]}
        {output.counted} {config[tools][bcftoolsCmd]}
        {config[tools][samtoolsCmd]} {input.script_mae}
        """

rulegraph_filename = f'{config["htmlOutputPath"]}/{METHOD}_rulegraph'

rule produce_rulegraph:
    input:
        expand(rulegraph_filename + ".{fmt}", fmt=["svg", "png"])

rule create_graph:
    output:
        svg = f"{rulegraph_filename}.svg",
        png = f"{rulegraph_filename}.png"
    shell:
        """
        snakemake --configfile {CONF_FILE} --rulegraph | dot -Tsvg > {output.svg}
        snakemake --configfile {CONF_FILE} --rulegraph | dot -Tpng > {output.png}
        """

rule unlock:
    output: touch(drop.getMethodPath(METHOD, type_="unlock"))
    shell: "snakemake --unlock --configfile {CONF_FILE}"
```

gevro (Author) commented Aug 14, 2020

The bugs are not from gatk; they are from the mae drop scripts. There is an issue in the path setup and in the unlock scripts. There is currently no feasible way to run mae.

There's a bunch of unlock errors that happen due to bugs in the mae scripts themselves. This is before any gatk is ever run:

Building DAG of jobs...
MissingInputException in line 116 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule allelic_counts_qc:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/QC/ASEReadCounter.sh
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/ASEReadCounter.sh
[Fri Aug 14 13:11:34 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-14T131129.648275.snakemake.log

mumichae (Collaborator) commented:

Yes, I know that error, but it's difficult to find out why it works sometimes and breaks at other times.

Try creating the empty processed_data/mae directory if it doesn't exist. That may solve the issue. Otherwise, try changing the SCRIPT_ROOT variable in the .drop/modules/mae-pipeline/Snakefile to SCRIPT_ROOT = os.getcwd(). Note that this modification will be overwritten every time you call drop update.

mumichae (Collaborator) commented:

> The bugs are not from gatk; they are from the mae drop scripts. There is an issue in the path setup and in the unlock scripts. There is currently no feasible way to run mae.
>
> There's a bunch of unlock errors that happen due to bugs in the mae scripts themselves. This is before any gatk is ever run

The fact that Scripts_MAE_deseq_mae_R was called before it failed means that gatk was run. However, errors in the gatk command don't get recognised by snakemake, so the failure only surfaces when the broken output is read in R. That's why I was trying to get you to rerun the mae pipeline from scratch.

mumichae (Collaborator) commented Aug 14, 2020

Seeing as you are working with the demo and we have resolved the dependency issues, it might be easier to just redo the setup in a new empty directory instead of trying to isolate the mae pipeline for now.

gevro (Author) commented Aug 14, 2020

Neither of these things fixes the problem. Changing SCRIPT_ROOT to os.getcwd() now gives this error when running snakemake unlock:
TypeError in line 34 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
unsupported operand type(s) for /: 'str' and 'str'
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 64, in
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 34, in getChrMap

gevro (Author) commented Aug 14, 2020

How do I redo the setup in a new empty directory without having to rerun all the other pipelines, which take 3 days and very significant compute resources?

mumichae (Collaborator) commented:

It takes 3 days to execute the demo? It should only take about 20 minutes, and the total input is just about 600 MB, so total memory consumption shouldn't be high. What are your resources, and what system are you working on?

gevro (Author) commented Aug 14, 2020

I ran drop for 110 samples, not the demo. Is there any way to transfer all the results of the other pipelines (except for mae) to the new directory?

mumichae (Collaborator) commented:

Try running drop on a new instance of the drop demo first, to see if the problem still occurs. Right now I don't have a good explanation for why the code suddenly breaks, as it worked before.

mumichae (Collaborator) commented:

OK, good to know that that's a way we can fix the issue. You should definitely keep a copy of the output you have at the moment, at least until you have a full pipeline run. Also be careful if you save it in a scratch directory, as these are often used for temporary data.
You can simply copy the output to the new directory, run drop init and adapt the config.yaml. Always double-check with a dry run (snakemake -n) before running the pipeline fully. In order to prevent the pipeline from overwriting your output, call snakemake --touch. This should update the timestamps of all the necessary, existing output files. If snakemake -n still tells you to redo the steps, there might be some upstream input missing in the output.

gevro (Author) commented Aug 15, 2020

Thanks. Which specific directories do I need to copy from the old directory to the new directory?

mumichae (Collaborator) commented Aug 15, 2020

Create the new project first and call snakemake -n, then copy the files. The computationally expensive files are in the processed_data/ and processed_results/ directories. These should be auto-created as empty directories in your new project directory, so make sure they exist before copying. Copy the directories of each submodule separately, e.g.

processed_data/aberrant_expression/
processed_data/aberrant_splicing
processed_results/...
etc

and don't touch the mae directory.
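[Editorial note] A minimal Python sketch of that copy step, with hypothetical old/new project roots (adjust the paths to your setup; a plain cp -r per directory works just as well):

```python
import shutil
from pathlib import Path

# Hypothetical paths: adjust to your old and new project roots.
old_root = Path("/gpfs/scratch/evrong01/droptest2/root")
new_root = Path("/gpfs/scratch/evrong01/droptest3/root")

for sub in ("processed_data", "processed_results"):
    for module_dir in sorted((old_root / sub).iterdir()):
        if not module_dir.is_dir():
            continue
        if module_dir.name == "mae":          # leave MAE out so it gets recomputed
            continue
        target = new_root / sub / module_dir.name
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():               # don't clobber anything already copied
            shutil.copytree(str(module_dir), str(target))

print("done; now run 'snakemake --touch' and check with 'snakemake -n'")
```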

gevro (Author) commented Aug 15, 2020

I made a new directory, then I did drop init, then drop update, then I copied the config.yaml and samples.tsv file, then snakemake -n. It ran fine. However, there is no root directory. I only have these files in the directory...

-rw-rw---- 1 evrong01 evrong01 1.8K Aug 15 15:54 config.yaml
-rw-rw---- 1 evrong01 evrong01 0 Aug 15 15:54 readme.md
-rwxrwx--- 1 evrong01 evrong01 952 Aug 15 15:55 runpipeline.sh
-rw-rw---- 1 evrong01 evrong01 179K Aug 15 15:54 samples.tsv
drwxrwx--- 6 evrong01 evrong01 4.0K Aug 15 15:54 Scripts
-rw-rw---- 1 evrong01 evrong01 2.9K Aug 15 15:54 Snakefile

gevro (Author) commented Aug 15, 2020

Now I created the root directory manually. Then I copied all the processed_data and processed_results except for mae. Then I ran snakemake --touch and it ran fine. Then I ran snakemake -n and it ran fine.

But now I am running snakemake unlock again, and again I'm getting the same error. There must be some bug in the mae code.

Building DAG of jobs...
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Sat Aug 15 16:02:50 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest3/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest3/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest3/.drop/modules/mae-pipeline/.snakemake/log/2020-08-15T160249.624667.snakemake.log

gevro (Author) commented Aug 15, 2020

Also, running 'snakemake aberrantExpression' finishes and says "Nothing to be done.", but it didn't recreate the html_output directory.

So just moving the processed_data and processed_results files to the new directory doesn't help, because then it doesn't recreate the html_output directory. I guess I can just copy those over too.

But still the problem is I can't get mae to run. Unlock doesn't work, and there is no way around it.

gevro (Author) commented Aug 15, 2020

Also, if I run snakemake mae without doing unlock in the new directory, it also says "Nothing to be done", even though there are no mae results in the folder. I suspect that snakemake --touch makes it think that mae is already finished.

gevro (Author) commented Aug 15, 2020

And if I run snakemake mae without doing unlock in the old directory, I get this error. Maybe this is where the bug is.

Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.

TypeError in line 34 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
unsupported operand type(s) for /: 'str' and 'str'
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 64, in
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 34, in getChrMap

gevro (Author) commented Aug 15, 2020

In the new directory, I manually deleted the two MAE.done files and now mae seems to run. However, it hit another new error:

Error in eval(jsub, SDenv, parent.frame()) :
object 'gene_status' not found

Calls: [ -> [.data.table -> eval -> eval
Execution halted
[Sat Aug 15 16:19:32 2020]
Error in rule Scripts_MAE_gene_name_mapping_R:
jobid: 14
output: /gpfs/scratch/evrong01/droptest3/root/processed_data/mae/gene_name_mapping_v32.tsv

mumichae (Collaborator) commented Aug 16, 2020

I think things will just get too messy from here. I'd suggest that you try to use the version of drop that I'm still developing on, as it will make reruns so much easier. The output structure of the HTML will be a bit different and it will still miss some features, but the pipeline core should be fully functional.

If you already have a local clone of the drop repo, add a new remote pointing to git@github.com:mumichae/drop.git:

git remote add mumichae git@github.com:mumichae/drop.git

Otherwise just clone the above URL.

Then checkout import_counts

git checkout import_counts

Next, you need to install drop into your current drop conda environment (as you already have all the dependencies).
Also, you need to use a different version of wbuild, as I am developing there as well, so you need to remove the currently installed wbuild first.

conda remove wbuild
pip install -e <root_of_drop_repo> # should install the correct version of wbuild

Let me know once you have installed the new version and whether you can get the drop demo running with dryrun and proper execution. Then we can proceed.

gevro (Author) commented Aug 16, 2020

I'm getting an installation error:
(/gpfs/home/evrong01/bin/drop_conda) [evrong01@bigpurple-ln2 drop_dev]$ pip install -e .
Traceback (most recent call last):
File "/gpfs/home/evrong01/bin/drop_conda/bin/pip", line 6, in
from pip._internal.cli.main import main
ModuleNotFoundError: No module named 'pip._internal.cli.main'

mumichae (Collaborator) commented:

Seems as though your pip is not working. It could be that a wrong version is referenced (seeing as it has worked before). Make sure you are using the pip provided by conda: conda list should definitely contain pip, and its version should match the output of pip --version.

gevro (Author) commented Aug 16, 2020

conda remove wbuild broke pip. I reinstalled pip and then managed to install the alternate drop version.

snakemake -n gives this next error:

The downloaded source packages are in
‘/tmp/RtmpkQB0PU/downloaded_packages’
Installation path not writeable, unable to update packages: AnnotationDbi,
AnnotationHub, BiocFileCache, bit, bit64, broom, data.table, DelayedArray,
dplyr, DT, ExperimentHub, fs, GenomicFeatures, Hmisc, httr, loo, mice,
pillar, pkgbuild, ps, quantreg, RcppArmadillo, Rhdf5lib, rlang, rstan,
StanHeaders, SummarizedExperiment, sys, tibble, tidyr, vctrs, xfun, XML
installed OUTRIDER
install c-mertes/FRASER
Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.0 (2020-04-24)
Installing github package(s) 'c-mertes/FRASER'
Error: package 'remotes' not installed in library path(s)
/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0
/gpfs/share/apps/R/4.0.0/lib64/R/library
install with 'install("remotes")'
Execution halted
CalledProcessError in line 5 of /gpfs/scratch/evrong01/dropdemo/Snakefile:
Command '['Rscript', PosixPath('/gpfs/data/bin/drop_dev/drop/installRPackages.R'), PosixPath('/gpfs/data/bin/drop_dev/drop/requirementsR.txt')]' returned non-zero exit status 1.
File "/gpfs/scratch/evrong01/dropdemo/Snakefile", line 5, in
File "/gpfs/data/bin/drop_dev/drop/setupDrop.py", line 33, in installRPackages
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/subprocess.py", line 369, in check_returncode

I went into R and manually installed remotes. Then I get this next error:

  • installing source package ‘Rhdf5lib’ ...
    ** using staged installation
    checking for gcc... gcc
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking whether we are using the GNU C++ compiler... yes
    checking whether g++ -std=gnu++11 accepts -g... yes
    checking whether C++ compiler accepts -w... yes
    checking how to run the C++ preprocessor... g++ -std=gnu++11 -E
    checking for grep that handles long lines and -e... /usr/bin/grep
    checking for egrep... /usr/bin/grep -E
    checking for ANSI C header files... yes
    checking for sys/types.h... yes
    checking for sys/stat.h... yes
    checking for stdlib.h... yes
    checking for string.h... yes
    checking for memory.h... yes
    checking for strings.h... yes
    checking for inttypes.h... yes
    checking for stdint.h... yes
    checking for unistd.h... yes
    checking for zlib.h... yes
    checking curl/curl.h usability... yes
    checking curl/curl.h presence... yes
    checking for curl/curl.h... yes
    checking openssl/evp.h usability... yes
    checking openssl/evp.h presence... yes
    checking for openssl/evp.h... yes
    checking openssl/hmac.h usability... yes
    checking openssl/hmac.h presence... yes
    checking for openssl/hmac.h... yes
    checking openssl/sha.h usability... yes
    checking openssl/sha.h presence... yes
    checking for openssl/sha.h... yes
    checking for curl_global_init in -lcurl... no
    checking for EVP_sha256 in -lcrypto... yes
    untarring hdf5small_cxx_hl_1.10.6.tar.gz ...
    building the szip library...
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
    checking for gawk... gawk
    checking whether make sets $(MAKE)... yes
    checking whether make supports nested variables... yes
    checking whether to enable maintainer-specific portions of Makefiles... no
    checking build system type... x86_64-unknown-linux-gnu
    checking host system type... x86_64-unknown-linux-gnu
    checking for config x86_64-unknown-linux-gnu... no
    checking for config x86_64-unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-unknown... no
    checking for config linux-gnu... found
    compiler 'gcc' is GNU gcc-6.3.0
    checking for gcc... gcc
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking whether gcc understands -c and -o together... yes
    checking for style of include used by make... GNU
    checking dependency style of gcc... gcc3
    checking how to run the C preprocessor... /gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp
    configure: error: in `/tmp/RtmpxYUhev/R.INSTALL7409582d709f/Rhdf5lib/src/hdf5/szip':
    configure: error: C preprocessor "/gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp" fails sanity check
    See `config.log' for more details
    make: *** No targets specified and no makefile found. Stop.
    make: *** No rule to make target `install'. Stop.
    building the hdf5 library...
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
    checking for gawk... gawk
    checking whether make sets $(MAKE)... yes
    checking whether make supports nested variables... yes
    checking whether make supports nested variables... (cached) yes
    checking whether to enable maintainer-specific portions of Makefiles... no
    checking build system type... x86_64-unknown-linux-gnu
    checking host system type... x86_64-unknown-linux-gnu
    checking shell variables initial values... done
    checking if basename works... yes
    checking if xargs works... yes
    checking for cached host... none
    checking for config x86_64-unknown-linux-gnu... no
    checking for config x86_64-unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-unknown... no
    checking for config linux-gnu... found
    compiler 'gcc' is GNU gcc-6.3.0
    compiler 'g++ -std=gnu++11' is GNU g++-6.3.0
    checking for config ./config/site-specific/host-bigpurple-ln2... no
    checking build mode... production
    checking for gcc... gcc
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking whether gcc understands -c and -o together... yes
    checking for style of include used by make... GNU
    checking dependency style of gcc... gcc3
    checking if unsupported combinations of configure options are allowed... no
    checking how to run the C preprocessor... /gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp
    configure: error: in `/tmp/RtmpxYUhev/R.INSTALL7409582d709f/Rhdf5lib/src/hdf5':
    configure: error: C preprocessor "/gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp" fails sanity check
    See `config.log' for more details
    sh configure
    configure: error: cannot find sources (src/H5.c) in /gpfs/share/apps/samtools/1.9/bcftools-1.9 or ..
    make: *** [_config] Error 1
    configure: HDF5_INCLUDE=hdf5/src
    configure: HDF5_CXX_INCLUDE=hdf5/c++/src
    configure: HDF5_HL_INCLUDE=hdf5/hl/src
    configure: HDF5_HL_CXX_INCLUDE=hdf5/hl/c++/src
    configure: HDF5_LIB=hdf5/src/.libs/libhdf5.a
    configure: HDF5_CXX_LIB=hdf5/c++/src/.libs/libhdf5_cpp.a
    configure: HDF5_HL_LIB=hdf5/hl/src/.libs/libhdf5_hl.a
    configure: HDF5_HL_CXX_LIB=hdf5/hl/c++/src/.libs/libhdf5_hl_cpp.a
    configure: SZIP_LIB=hdf5/szip/src/.libs/libsz.a
    configure: creating ./config.status
    config.status: creating src/Makevars
    ** libs
    mkdir -p "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/src/".h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/c++/src/"
    .h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/hl/src/".h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/hl/c++/src/"
    .h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    mkdir -p "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/lib/"
    cp "hdf5/src/.libs/libhdf5.a" "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/lib/"
    cp: cannot stat ‘hdf5/src/.libs/libhdf5.a’: No such file or directory
    make: *** [copying] Error 1
    ERROR: compilation failed for package ‘Rhdf5lib’
  • removing ‘/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/Rhdf5lib’
    Error: Failed to install 'FRASER' from GitHub:
    (converted from warning) installation of package ‘Rhdf5lib’ had non-zero exit status
    Execution halted
    CalledProcessError in line 5 of /gpfs/scratch/evrong01/dropdemo/Snakefile:
    Command '['Rscript', PosixPath('/gpfs/data/bin/drop_dev/drop/installRPackages.R'), PosixPath('/gpfs/data/bin/drop_dev/drop/requirementsR.txt')]' returned non-zero exit status 1.
    File "/gpfs/scratch/evrong01/dropdemo/Snakefile", line 5, in
    File "/gpfs/data/bin/drop_dev/drop/setupDrop.py", line 33, in installRPackages
    File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/subprocess.py", line 369, in check_returncode

gevro (Author) commented Aug 16, 2020

Have you considered distributing drop as a Docker image? It might solve all of these installation/configuration issues.

mumichae (Collaborator) commented Aug 17, 2020

Yes, we have a Docker image (https://github.com/c-mertes/docker_drop), but it uses the old version of drop and we would have to take the same steps to update it. You can try to use that instead, but you'll need to update drop, as the bugs you are experiencing are still there. Just note that it will need a considerable amount of storage (15 GB compressed). I haven't managed to test it on my machine yet due to limited storage, so I can only help you with the issues that are not related to docker.

As for your error message, it seems as though you aren't accessing the conda environment properly, as you are still picking up other versions of R. In conda, you shouldn't be reinstalling the R packages, as they should already exist. Maybe check your PATH variable (echo $PATH), as it could put other R and Python locations ahead of conda. And make sure you are using the conda R version, not any other local one.

You can decide which environment to use. For docker, note that you'll have to mount your data in the specific folder structure as described in the README https://github.com/c-mertes/docker_drop. I'm not sure if using your old output data works, but you can definitely give it a try.

gevro (Author) commented Aug 17, 2020

Are you updating docker_drop regularly on https://github.com/c-mertes/docker_drop? Does it have the most up-to-date version of drop?


When I uninstalled wbuild, it messed up the conda environment. It removed and switched many dependencies and for some reason caused R to get uninstalled. I'm not sure why.

This is why, in my opinion, Docker containers are better. Conda is in general a buggy system, because it is not completely isolated from the local environment and is therefore not good at handling complicated dependencies. Even though Docker requires more disk space, it will save users the headache of trying to get the pipeline working.

gevro (Author) commented Aug 17, 2020

I fixed the R PATH issue for the conda environment. But snakemake -n for the demo still gave an error:
Warning messages:
1: In install.packages(...) :
installation of package ‘BiocFileCache’ had non-zero exit status
2: In install.packages(...) :
installation of package ‘biomaRt’ had non-zero exit status
3: In install.packages(...) :
installation of package ‘GenomicFeatures’ had non-zero exit status
4: In install.packages(...) :
installation of package ‘VariantAnnotation’ had non-zero exit status
5: In INSTALL(packages[i, 1]) :
installation of package ‘ggplot2’ had non-zero exit status
AttributeError in line 12 of /gpfs/scratch/evrong01/droptest3/Snakefile:
'Config' object has no attribute 'getConfig'
File "/gpfs/scratch/evrong01/droptest3/Snakefile", line 12, in
File "/gpfs/data/bin/drop_dev/drop/config/DropConfig.py", line 26, in init

I tried to manually install these, and it looks like the error is because dplyr and associated packages were not installed. R gives an option to install them, so it must be that when DROP runs the demo it doesn't answer 'yes' to installing dependencies such as dplyr.

gevro (Author) commented Aug 17, 2020

I succeeded in getting all the packages installed. Now I'm getting this error for the demo:

droptest3]$ snakemake -n
check for missing R packages
AttributeError in line 12 of /gpfs/scratch/evrong01/droptest3/Snakefile:
'Config' object has no attribute 'getConfig'
File "/gpfs/scratch/evrong01/droptest3/Snakefile", line 12, in
File "/gpfs/data/evronylab/bin/drop_dev/drop/config/DropConfig.py", line 26, in init

mumichae (Collaborator) commented Aug 18, 2020

That's a bit surprising, as you had all the dependencies before, and wbuild doesn't depend on any R packages, so removing wbuild shouldn't remove any R dependencies. Are you still using the correct R library path (.libPaths())? You need to make sure that the standard packages are running properly in R (and don't have to be reinstalled, as they should already be present in conda). Otherwise you will get errors later on.

If you want to use the docker instead, you only need to download the developer version of drop (as you did before on your local machine) and remove wbuild using pip, as described below. Then you can reinstall the newest drop version from the local repo. Don't forget to prepare and mount your input data as described in the README.

Updating DROP and wbuild
The error you got is what we want, as you are now using the new drop version. Unfortunately, wbuild doesn't seem to have been uninstalled properly, so try:

pip uninstall wbuild

then conda list should not contain wbuild anymore.
Continue by reinstalling drop:

pip install -e <path-to-drop>

And make sure wbuild is installed. In case that doesn't work, try pip uninstall drop before reinstalling it.

Then you should have the correct dependencies to get a successful dryrun.

gevro (Author) commented Aug 18, 2020

For some reason, gatk is now not being found in my conda environment. It was there before. I think it's too complicated, and I'm not sure what to try next. If there is a simpler way to set up the drop environment, I'm happy to try it, but conda seems complicated. If you have a docker that is built and ready to go, I'm happy to try that.

mumichae (Collaborator) commented:

It seems as though you aren't properly accessing your conda environment. You can verify with conda env list to see which one you're in.

If you are working with docker, just follow the instructions in https://github.com/c-mertes/docker_drop and then follow my previous instructions on updating to the developer version of drop (git@github.com:mumichae/drop.git, branch import_counts) and wbuild (git@github.com:mumichae/wBuild.git branch subindex). As they are still work in progress, there is no dedicated docker container for that version.

gevro (Author) commented Aug 18, 2020

Thanks. To make it simpler for users, do you have a Docker image that is already built and confirmed to work on Docker Hub?

mumichae (Collaborator) commented:

Well yes, we have a prebuilt Docker container that is about 15 GB in compressed size. You don't need to rebuild it; just run it as described in the README and it'll be downloaded from the mertes/drop repository (which will take time):

docker run --rm --network=host -ti mertes/drop

for running the demo, or if you are mounting your own data:

docker run --rm -ti -v ... etc.

It has the old version of drop, so you'll need to manually update it to use any GitHub developer version (as I explained before).

Does that answer your question?

gevro (Author) commented Aug 18, 2020

Ok, I think to keep things simple I will just wait until a more stable version of drop is released with the above issues resolved. I would appreciate it if you let me know once it is released. Thanks.

gevro (Author) commented Sep 3, 2020

Hi, we have a new RNA-seq sample from a syndromic family that we would like to try on DROP. Are any of the new versions of DROP available yet? Thanks.

mumichae self-assigned this Oct 21, 2020
mumichae added the bug label Oct 21, 2020
This was referenced Oct 26, 2020
mumichae closed this as completed Nov 4, 2020