Error in rule AberrantSplicing_pipeline_Counting_00_define_datasets_from_anno_R #492

Closed
frankaugs opened this issue Sep 13, 2023 · 3 comments

Comments

@frankaugs

Dear DROP team,
Thank you for developing DROP. I recently ran into an error while running snakemake aberrantSplicing with my own and external RNA-seq data:

rule AberrantSplicing_pipeline_Counting_00_define_datasets_from_anno_R:
input: /home/ngs/rnaseq/03align_out_5/drop_1/sampleAnnotation_s.tsv, Scripts/AberrantSplicing/pipeline/Counting/00_define_datasets_from_anno.R
output: /home/ngs/rnaseq/03align_out_5/drop_1/project3/Output/processed_data/aberrant_splicing/annotations/fraser.tsv, /home/ngs/rnaseq/03align_out_5/drop_1/project3/htmlOutput/AberrantSplicing/annotations/fraser.html
log: /home/ngs/rnaseq/03align_out_5/drop_1/.drop/tmp/AS/fraser/00_defineDataset.Rds
jobid: 11
reason: Missing output files: /home/ngs/rnaseq/03align_out_5/drop_1/project3/Output/processed_data/aberrant_splicing/annotations/fraser.tsv
wildcards: dataset=fraser
resources: tmpdir=/tmp

Config
projectTitle: "DROP: RNAseq"
root: /home/ngs/rnaseq/03align_out_5/drop_1/project3/Output # root directory of all output objects and tables
htmlOutputPath: /home/ngs/rnaseq/03align_out_5/drop_1/project3/htmlOutput # path for HTML rendered reports
indexWithFolderName: true # whether the root base name should be part of the index name

hpoFile: null # if null, downloads it from webserver
sampleAnnotation: /home/ngs/rnaseq/03align_out_5/drop_1/sampleAnnotation_s.tsv # path to sample annotation (see documentation on how to create it)

geneAnnotation:
v29: /home/ngs/rnaseq/03align_out_5/drop_1/gencode.v29.gtf
genomeAssembly: hg19
genome: /home/ngs/rnaseq/03align_out_5/drop_1/hg19_ucsc.fa # path to reference genome sequence in fasta format.
# You can define multiple reference genomes in yaml format, ncbi: path/to/ncbi, ucsc: path/to/ucsc
# the keywords that define the path should be in the GENOME column of the sample annotation table

random_seed: false # just for demo data, remove for analysis

exportCounts:
# specify which gene annotations to include and which
# groups to exclude when exporting counts
geneAnnotations:
- v29
excludeGroups:
- null

aberrantExpression:
run: fasle
groups:
- outrider
fpkmCutoff: 1
implementation: autoencoder
padjCutoff: 0.05
zScoreCutoff: 0
genesToTest: null
maxTestedDimensionProportion: 3
yieldSize: 2000000

aberrantSplicing:
run: true
groups:
- fraser
recount: false
longRead: false
keepNonStandardChrs: false
filter: true
minExpressionInOneSample: 20
quantileMinExpression: 10
minDeltaPsi: 0.05
implementation: PCA
padjCutoff: 0.1
maxTestedDimensionProportion: 6

mae:
run: false
groups:
- group1
- group2
- group3
gatkIgnoreHeaderCheck: true
padjCutoff: 0.05
allelicRatioCutoff: 0.8
addAF: true
maxAF: 0.001
maxVarFreqCohort: 0.05
# VCF-BAM matching
qcVcf: Data/qc_vcf_1000G.vcf.gz
qcGroups:
- mae
dnaRnaMatchCutoff: 0.85

rnaVariantCalling:
run: false
groups:
- batch_0
highQualityVCFs:
- Data/Mills_and_1000G_gold_standard.indels.hg19.sites.chrPrefix.vcf.gz
- Data/1000G_phase1.snps.high_confidence.hg19.sites.chrPrefix.vcf.gz
dbSNP: Data/00-All.vcf.gz
repeat_mask: Data/hg19_repeatMasker_sorted.chrPrefix.bed
createSingleVCF: true
addAF: true
maxAF: 0.001
maxVarFreqCohort: 0.05
hcArgs: ""
minAlt: 3
yieldSize: 100000

tools:
gatkCmd: gatk
bcftoolsCmd: bcftools
samtoolsCmd: samtools

Command: snakemake aberrantSplicing --cores 10

(94GB memory)
I wonder if you could provide a solution to this problem? Thank you!
Best regards,
Frank
2023-09-12.snakemake.log
sampleAnnotation_s.csv

@vyepez88
Collaborator

Hi Frank,
Thanks for using DROP and reporting this.

  • which DROP version are you using?
  • we saw a typo in the aberrantExpression dictionary: run: fasle should be run: false
  • the sample annotation file seems to contain quotation marks, can you please remove them?
  • did you manage to run the demo?
  • can you please execute snakemake sampleAnnotation and inspect the output html file. did everything look the way it should?
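On the quotation-mark point, a minimal sketch of one way to strip the quotes from a tab-separated sample annotation using only Python's standard library. The function name strip_quotes and the column names in the test data are illustrative assumptions, not something posted in this thread; check the result against your own file before overwriting anything.

```python
import csv
import io


def strip_quotes(text: str, delimiter: str = "\t") -> str:
    """Return `text` with the double quotes around each field removed."""
    # The reader interprets "..." as quoting and yields the bare field values.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    out = io.StringIO()
    # QUOTE_NONE writes fields without quotes; escapechar is only used if a
    # field itself contains the delimiter or another special character.
    writer = csv.writer(
        out,
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,
        escapechar="\\",
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

To apply it to a file, read the file's contents, pass them through strip_quotes, and write the result to a new path so the original stays available for comparison.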

@frankaugs
Author

Hi Vicente,

Thank you so much for your response.

1. The DROP version is 1.3.3.
2. Sure, I have corrected the typo. By the way, if the typo appears here, DROP will not run the module, right?
3. Thank you for your suggestion, I have checked the sample annotation file.
4. I tried to run the demo at the beginning, but it didn't work. When I ran drop demo in the terminal, only six files were produced (config.yaml, readme.md, Scripts, Snakefile, .drop, .wBuild). I guess there's something wrong with the environment configuration.
5. Yes, all the files look good.
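Regarding the typo in point 2: a YAML parser reads an unquoted fasle as the string "fasle" rather than as a boolean, and a non-empty string counts as true in Python. How DROP itself treats such a value is not stated in this thread, so it is safer to catch it before running. Below is a minimal sketch, assuming the config has already been loaded into a dict (for example with PyYAML's yaml.safe_load); the function name non_boolean_run_flags is hypothetical.

```python
def non_boolean_run_flags(config: dict) -> dict:
    """Return {module: value} for every module whose `run` entry is not a boolean."""
    bad = {}
    for module, settings in config.items():
        # Only module sections (dicts) with a `run` key are relevant.
        if isinstance(settings, dict) and "run" in settings:
            value = settings["run"]
            if not isinstance(value, bool):
                bad[module] = value
    return bad
```

With the config from the first post, this would report aberrantExpression with the value "fasle" and nothing for the other modules.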

Here are the command responses:

$ mamba create -n drop3 -c conda-forge -c bioconda drop --override-channels
... ...
... ...
Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: \
SafetyError: The package for r-base located at /home/ngs/anaconda3/pkgs/r-base-4.3.1-h29c4799_3
appears to be corrupted. The path 'lib/R/doc/html/packages.html'
has an incorrect size.
reported size: 3423 bytes
actual size: 54045 bytes

done
Executing transaction: \
|
done

To activate this environment, use

 $ mamba activate drop3

To deactivate an active environment, use

 $ mamba deactivate

$ drop demo
create /home/ngs/rnaseq/03align_out_6/drop2/Scripts
create /home/ngs/rnaseq/03align_out_6/drop2/.drop
create /home/ngs/rnaseq/03align_out_6/drop2/.drop/tmp
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/AberrantExpression/pipeline is not a directory, copy over from drop base
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/AberrantSplicing/pipeline is not a directory, copy over from drop base
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/MonoallelicExpression/pipeline is not a directory, copy over from drop base
/home/ngs/rnaseq/03align_out_6/drop2/Scripts/rnaVariantCalling/pipeline is not a directory, copy over from drop base
init...done
download data
File ‘/tmp/main.zip’ already there; not retrieving.

Archive: /tmp/main.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /tmp/main.zip or
/tmp/main.zip.zip, and cannot find /tmp/main.zip.ZIP, period.
Traceback (most recent call last):
File "/home/ngs/anaconda3/envs/rna/bin/drop", line 10, in <module>
sys.exit(main())
File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/drop/cli.py", line 174, in demo
response.check_returncode()
File "/home/ngs/anaconda3/envs/rna/lib/python3.10/subprocess.py", line 457, in check_returncode
raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['bash', '/home/ngs/anaconda3/envs/rna/lib/python3.10/site-packages/drop/download_data.sh']' returned non-zero exit status 9.
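The log above says /tmp/main.zip was "already there; not retrieving", and unzip then rejects it, which suggests a stale or incomplete download left over from an earlier attempt. A minimal sketch for checking and removing such a file follows; the function name remove_if_corrupt is hypothetical, and whether drop demo then downloads a fresh copy is an inference from the log message, not something confirmed in this thread.

```python
import os
import zipfile


def remove_if_corrupt(path: str) -> bool:
    """Delete `path` if it exists but is not a readable zip archive.

    Returns True if the file was removed, False otherwise.
    """
    # is_zipfile looks for the zip end-of-central-directory record, the same
    # signature that unzip reported as missing.
    if os.path.exists(path) and not zipfile.is_zipfile(path):
        os.remove(path)
        return True
    return False
```

Calling remove_if_corrupt("/tmp/main.zip") before rerunning drop demo would clear the broken archive. Note that is_zipfile is only a quick signature check and does not verify every member of the archive.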

Then I ran snakemake --cores 1 -n. As expected, it did not work properly. So I tried running the individual modules directly. Surprisingly, both the Expression and the Splicing module ran normally. Could you tell me whether this problem will affect the Expression or Splicing module in some way?

On the original question:

After some testing, I found the cause of the problem: while trying to run the Expression module, I had deleted some content from the Splicing section of the config file by mistake. Now it works!

The content deleted by mistake:

FRASER1 configuration

FRASER_version: "FRASER" 
deltaPsiCutoff : 0.3 
quantileForFiltering: 0.95 
### For FRASER2, use the following parameters instead of the 3 lines above:
# FRASER_version: "FRASER2"
# deltaPsiCutoff : 0.1
# quantileForFiltering: 0.75
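For anyone hitting the same error, a minimal sketch of a pre-flight check for the three keys listed above. It assumes the config has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and that the keys sit under the aberrantSplicing section, as described in this comment; the names FRASER_KEYS and missing_fraser_keys are hypothetical, and the thread does not confirm that every DROP version requires all three.

```python
# Keys that were deleted by mistake in this issue.
FRASER_KEYS = ("FRASER_version", "deltaPsiCutoff", "quantileForFiltering")


def missing_fraser_keys(config: dict) -> list:
    """Return the FRASER keys absent from the aberrantSplicing section."""
    # Treat a missing or empty section as an empty dict.
    section = config.get("aberrantSplicing") or {}
    return [key for key in FRASER_KEYS if key not in section]
```

An empty list means all three keys are present; anything else names the entries to restore before running snakemake aberrantSplicing.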

Many thanks!

Frank

@vyepez88
Collaborator

Great that it worked!
