Commit

Merge pull request #696 from asp8200/md5_check_of_test_output
Adding md5-sums to the test-yml-files
asp8200 committed Aug 23, 2022
2 parents 15ac00d + 7749eb7 commit f84cae4
Showing 21 changed files with 1,481 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#679](https://github.com/nf-core/sarek/pull/679) - Back to `dev`
- [#685](https://github.com/nf-core/sarek/pull/685) - Updating the nf-core modules used by Sarek.
- [#691](https://github.com/nf-core/sarek/pull/691) - To run the same pytest as before locally, use `PROFILE=docker`
- [#696](https://github.com/nf-core/sarek/pull/696) - Adding check of md5-sums in CI-tests.

### Fixed

78 changes: 78 additions & 0 deletions tests/test_aligner.yml
@@ -6,40 +6,73 @@
- preprocessing
files:
- path: results/csv/markduplicates.csv
md5sum: 0d6120bb99e92f6810343270711ca53e
- path: results/csv/markduplicates_no_table.csv
md5sum: 2a2d3d4842befd4def39156463859ee3
- path: results/csv/recalibrated.csv
md5sum: 42628ec994c16f565e5407b40a9c1ac3
- path: results/multiqc
- path: results/preprocessing/markduplicates/test/test.md.cram
# binary changing on reruns
- path: results/preprocessing/markduplicates/test/test.md.cram.crai
# binary changing on reruns
- path: results/preprocessing/recal_table/test/test.recal.table
md5sum: 4ac774bf5f1157e77426fd82f5ac0fbe
- path: results/preprocessing/recalibrated/test/test.recal.cram
# binary changing on reruns
- path: results/preprocessing/recalibrated/test/test.recal.cram.crai
# binary changing on reruns
- path: results/reference/bwamem2/genome.fasta.0123
md5sum: d73300d44f733bcdb7c988fc3ff3e3e9
- path: results/reference/bwamem2/genome.fasta.amb
md5sum: 1891c1de381b3a96d4e72f590fde20c1
- path: results/reference/bwamem2/genome.fasta.ann
md5sum: 2df4aa2d7580639fa0fcdbcad5e2e969
- path: results/reference/bwamem2/genome.fasta.bwt.2bit.64
md5sum: cd4bdf496eab05228a50c45ee43c1ed0
- path: results/reference/bwamem2/genome.fasta.pac
md5sum: 8569fbdb2c98c6fb16dfa73d8eacb070
- path: results/reference/dbsnp/dbsnp_146.hg38.vcf.gz.tbi
md5sum: 628232d0c870f2dbf73c3e81aff7b4b4
- path: results/reference/dict/genome.dict
md5sum: 2433fe2ba31257337bf4c4bd4cb8da15
- path: results/reference/fai/genome.fasta.fai
md5sum: 3520cd30e1b100e55f578db9c855f685
- path: results/reference/intervals/chr22_1-40001.bed
md5sum: 87a15eb9c2ff20ccd5cd8735a28708f7
- path: results/reference/intervals/chr22_1-40001.bed.gz
md5sum: d3341fa28986c40b24fcc10a079dbb80
- path: results/reference/intervals/genome.bed
md5sum: a87dc7d20ebca626f65cc16ff6c97a3e
- path: results/reference/known_indels/mills_and_1000G.indels.vcf.gz.tbi
md5sum: 1bb7ab8f22eb798efd796439d3b29b7a
- path: results/reports/fastqc/test-test_L1
- path: results/reports/markduplicates/test/test.md.metrics
contains: ["test 8547 767 84 523391 3882 0 0 0.385081", "1.0 767 767"]
- path: results/reports/mosdepth/test/test.md.mosdepth.global.dist.txt
md5sum: 76fa71922a3f748e507c2364c531dfcb
- path: results/reports/mosdepth/test/test.md.mosdepth.region.dist.txt
md5sum: abc5df85e302b79985627888870882da
- path: results/reports/mosdepth/test/test.md.mosdepth.summary.txt
md5sum: d536456436eb275159b8c6af83213d80
- path: results/reports/mosdepth/test/test.md.regions.bed.gz
md5sum: 38fe39894abe62e38f8ac214cba64f2b
- path: results/reports/mosdepth/test/test.md.regions.bed.gz.csi
md5sum: b1c2a861f64e20a94108a6de3b76c582
- path: results/reports/mosdepth/test/test.recal.mosdepth.global.dist.txt
md5sum: 76fa71922a3f748e507c2364c531dfcb
- path: results/reports/mosdepth/test/test.recal.mosdepth.region.dist.txt
md5sum: abc5df85e302b79985627888870882da
- path: results/reports/mosdepth/test/test.recal.mosdepth.summary.txt
md5sum: d536456436eb275159b8c6af83213d80
- path: results/reports/mosdepth/test/test.recal.regions.bed.gz
md5sum: 38fe39894abe62e38f8ac214cba64f2b
- path: results/reports/mosdepth/test/test.recal.regions.bed.gz.csi
md5sum: b1c2a861f64e20a94108a6de3b76c582
- path: results/reports/samtools/test/test.md.cram.stats
md5sum: dcf70bbcfb92e01027978f28d2035d78
- path: results/reports/samtools/test/test.recal.cram.stats
md5sum: 5528d952f5dc74a39f28e27165bf96be
- name: Run dragmap
command: nextflow run main.nf -profile test,docker --aligner dragmap --save_reference
tags:
@@ -48,40 +81,85 @@
- preprocessing
files:
- path: results/csv/markduplicates.csv
md5sum: 0d6120bb99e92f6810343270711ca53e
- path: results/csv/markduplicates_no_table.csv
md5sum: 2a2d3d4842befd4def39156463859ee3
- path: results/csv/recalibrated.csv
md5sum: 42628ec994c16f565e5407b40a9c1ac3
- path: results/multiqc
- path: results/preprocessing/markduplicates/test/test.md.cram
# binary changing on reruns
- path: results/preprocessing/markduplicates/test/test.md.cram.crai
# binary changing on reruns
- path: results/preprocessing/recal_table/test/test.recal.table
md5sum: 75ba4376a17ca69c5134153302f82e92
- path: results/preprocessing/recalibrated/test/test.recal.cram
# binary changing on reruns
- path: results/preprocessing/recalibrated/test/test.recal.cram.crai
# binary changing on reruns
- path: results/reference/dbsnp/dbsnp_146.hg38.vcf.gz.tbi
md5sum: 628232d0c870f2dbf73c3e81aff7b4b4
- path: results/reference/dict/genome.dict
md5sum: 2433fe2ba31257337bf4c4bd4cb8da15
- path: results/reference/dragmap/hash_table.cfg
# hash_table.cfg contains many strings which we could test for - which do we want to test?
contains:
[
"reference_sequences = 1",
"reference_len = 368640",
"reference_len_raw = 40001",
"reference_len_not_n = 40001",
"reference_alt_seed = 204800",
]
- path: results/reference/dragmap/hash_table.cfg.bin
# binary changing on reruns
- path: results/reference/dragmap/hash_table.cmp
md5sum: 1caab4ffc89f81ace615a2e813295cf4
- path: results/reference/dragmap/hash_table_stats.txt
# hash_table_stats.txt contains many strings which we could test for - which do we want to test?
contains: ["A bases: 10934", "C bases: 8612", "G bases: 8608", "T bases: 11847"]
- path: results/reference/dragmap/ref_index.bin
md5sum: dbb5c7d26b974e0ac338024fe4535044
- path: results/reference/dragmap/reference.bin
md5sum: be67b80ee48aa96b383fd72f1ccfefea
- path: results/reference/dragmap/repeat_mask.bin
md5sum: 294939f1f80aa7f4a70b9b537e4c0f21
- path: results/reference/dragmap/str_table.bin
md5sum: 45f7818c4a10fdeed04db7a34b5f9ff1
- path: results/reference/fai/genome.fasta.fai
md5sum: 3520cd30e1b100e55f578db9c855f685
- path: results/reference/intervals/chr22_1-40001.bed
md5sum: 87a15eb9c2ff20ccd5cd8735a28708f7
- path: results/reference/intervals/chr22_1-40001.bed.gz
md5sum: d3341fa28986c40b24fcc10a079dbb80
- path: results/reference/intervals/genome.bed
md5sum: a87dc7d20ebca626f65cc16ff6c97a3e
- path: results/reference/known_indels/mills_and_1000G.indels.vcf.gz.tbi
md5sum: 1bb7ab8f22eb798efd796439d3b29b7a
- path: results/reports/fastqc/test-test_L1
- path: results/reports/markduplicates/test/test.md.metrics
contains: ["LB0 13607 543 161 518779 6410 0 0 0.436262"]
- path: results/reports/mosdepth/test/test.md.mosdepth.global.dist.txt
md5sum: be1a800868fc1ce26711654525224e59
- path: results/reports/mosdepth/test/test.md.mosdepth.region.dist.txt
md5sum: 2a3f0fab66518ef0786235470f1f28d0
- path: results/reports/mosdepth/test/test.md.mosdepth.summary.txt
md5sum: d38ab9b0e0e551dc22919304929dd71c
- path: results/reports/mosdepth/test/test.md.regions.bed.gz
md5sum: 0d92f4c698a6476ccaf798aa31a557bc
- path: results/reports/mosdepth/test/test.md.regions.bed.gz.csi
md5sum: d5f1c9389ecf52ba839e834780a94549
- path: results/reports/mosdepth/test/test.recal.mosdepth.global.dist.txt
md5sum: be1a800868fc1ce26711654525224e59
- path: results/reports/mosdepth/test/test.recal.mosdepth.region.dist.txt
md5sum: 2a3f0fab66518ef0786235470f1f28d0
- path: results/reports/mosdepth/test/test.recal.mosdepth.summary.txt
md5sum: d38ab9b0e0e551dc22919304929dd71c
- path: results/reports/mosdepth/test/test.recal.regions.bed.gz
md5sum: 0d92f4c698a6476ccaf798aa31a557bc
- path: results/reports/mosdepth/test/test.recal.regions.bed.gz.csi
md5sum: d5f1c9389ecf52ba839e834780a94549
- path: results/reports/samtools/test/test.md.cram.stats
md5sum: f2ae8b531aa1fb2fbffe9a92e4c81493
- path: results/reports/samtools/test/test.recal.cram.stats
md5sum: f7bab59db4fb8ab49eea71b668d351d5
61 changes: 61 additions & 0 deletions tests/test_annotation.yml
@@ -5,20 +5,52 @@
- snpeff
files:
- path: results/annotation/test/test_snpEff.ann.vcf.gz
md5sum: 01f24fdd76f73eefd695beea7b3d3d8e
- path: results/annotation/test/test_snpEff.ann.vcf.gz.tbi
md5sum: 51e418d9be9bb33f1d4123493b15b6c9
- path: results/multiqc
- path: results/reports/snpeff/test/snpEff_summary.html
# snpEff_summary.html changes md5sums on reruns.
contains: ["<b> Genome total length </b>", "<td> 100,286,402 </td>", "<td> MT192765.1 </td>"]
- path: results/reports/snpeff/test/test_snpEff.csv
# test_snpEff.csv changes md5sums on reruns.
contains:
[
"Values , 50,100",
"Count , 1,8",
"Reference , 0",
"Het , 1",
"Hom , 8",
"Missing , 0",
"MT192765.1, Position,0,1",
"MT192765.1,Count,0,0",
]
- path: results/reports/snpeff/test/test_snpEff.genes.txt
md5sum: 130536bf0237d7f3f746d32aaa32840a
- name: Run VEP
command: nextflow run main.nf -profile test,annotation --tools vep --skip_tools multiqc
tags:
- annotation
- vep
files:
- path: results/annotation/test/test_VEP.ann.vcf.gz
# binary changes md5sums on reruns.
- path: results/annotation/test/test_VEP.ann.vcf.gz.tbi
md5sum: 4cb176febbc8c26d717a6c6e67b9c905
- path: results/reports/EnsemblVEP/test/test_VEP.summary.html
# test_VEP.summary.html changes md5sums on reruns.
contains:
[
"<tr><td>Input file</td><td>test.vcf.gz</td></tr><tr><td>Output file</td><td>test_VEP.ann.vcf</td></tr>",
"General statistics",
"Lines of input read",
"Variants processed",
"Variants filtered out",
"Novel / existing variants",
"Overlapped genes",
"Overlapped transcripts",
"Overlapped regulatory features",
]
- name: Run snpEff followed by VEP
command: nextflow run main.nf -profile test,annotation --tools merge --skip_tools multiqc
tags:
@@ -28,8 +60,23 @@
- vep
files:
- path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz
# binary changes md5sums on reruns.
- path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz.tbi
md5sum: 4cb176febbc8c26d717a6c6e67b9c905
- path: results/reports/EnsemblVEP/test/test_snpEff_VEP.summary.html
# test_snpEff_VEP.summary.html changes md5sums on reruns.
contains:
[
"<tr><td>Input file</td><td>test_snpEff.ann.vcf.gz</td></tr><tr><td>Output file</td><td>test_snpEff_VEP.ann.vcf</td></tr>",
"General statistics",
"Lines of input read",
"Variants processed",
"Variants filtered out",
"Novel / existing variants",
"Overlapped genes",
"Overlapped transcripts",
"Overlapped regulatory features",
]
- path: results/annotation/test/test_snpEff.ann.vcf.gz
should_exist: false
- path: results/annotation/test/test_snpEff.ann.vcf.gz.tbi
Expand All @@ -55,22 +102,36 @@
- vep
files:
- path: results/annotation/test/test_VEP.ann.vcf.gz
# binary changes md5sums on reruns.
- path: results/annotation/test/test_VEP.ann.vcf.gz.tbi
md5sum: 4cb176febbc8c26d717a6c6e67b9c905
- path: results/annotation/test/test_snpEff.ann.vcf.gz
md5sum: 01f24fdd76f73eefd695beea7b3d3d8e
- path: results/annotation/test/test_snpEff.ann.vcf.gz.tbi
md5sum: 51e418d9be9bb33f1d4123493b15b6c9
- path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz
# binary changes md5sums on reruns.
- path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz.tbi
md5sum: 4cb176febbc8c26d717a6c6e67b9c905
- path: results/reports/EnsemblVEP/test/test_VEP.summary.html
# text-based file changes md5sums on reruns.
- path: results/reports/EnsemblVEP/test/test_snpEff_VEP.summary.html
# text-based file changes md5sums on reruns.
- path: results/reports/snpeff/test/snpEff_summary.html
# text-based file changes md5sums on reruns.
- path: results/reports/snpeff/test/test_snpEff.csv
# text-based file changes md5sums on reruns.
- path: results/reports/snpeff/test/test_snpEff.genes.txt
md5sum: 130536bf0237d7f3f746d32aaa32840a
- name: Run VEP with fasta
command: nextflow run main.nf -profile test,annotation --tools vep --vep_include_fasta --skip_tools multiqc
tags:
- annotation
- vep
files:
- path: results/annotation/test/test_VEP.ann.vcf.gz
# binary changes md5sums on reruns.
- path: results/annotation/test/test_VEP.ann.vcf.gz.tbi
md5sum: 4cb176febbc8c26d717a6c6e67b9c905
- path: results/reports/EnsemblVEP/test/test_VEP.summary.html
# text-based file changes md5sums on reruns.
36 changes: 36 additions & 0 deletions tests/test_bam_remap.yml
@@ -4,40 +4,76 @@
- alignment_to_fastq
files:
- path: results/cat/test-1_1.merged.fastq.gz
md5sum: 27b1dd4720d589cda1f33028798e859b
- path: results/cat/test-1_2.merged.fastq.gz
md5sum: 2bbac774fffd1a9df53f9ab2fc2b86ab
- path: results/collate/test-1.mapped_1.fq.gz
md5sum: 992b824d00359782db5240eee42d5f06
- path: results/collate/test-1.mapped_2.fq.gz
md5sum: 118bff0ec11c9cc0427a7db21bdebc9c
- path: results/collate/test-1.mapped_other.fq.gz
md5sum: 709872fc2910431b1e8b7074bfe38c67
- path: results/collate/test-1.mapped_singleton.fq.gz
md5sum: 709872fc2910431b1e8b7074bfe38c67
- path: results/collate/test-1.unmapped_1.fq.gz
md5sum: b79faf89e96948ea52f3ca41bee7de9a
- path: results/collate/test-1.unmapped_2.fq.gz
md5sum: 8e18a94bfd77739e184856ac95d5b26a
- path: results/collate/test-1.unmapped_other.fq.gz
md5sum: 709872fc2910431b1e8b7074bfe38c67
- path: results/collate/test-1.unmapped_singleton.fq.gz
md5sum: 709872fc2910431b1e8b7074bfe38c67
- path: results/csv/markduplicates.csv
md5sum: 0d6120bb99e92f6810343270711ca53e
- path: results/csv/markduplicates_no_table.csv
md5sum: 2a2d3d4842befd4def39156463859ee3
- path: results/csv/recalibrated.csv
md5sum: 42628ec994c16f565e5407b40a9c1ac3
- path: results/multiqc
- path: results/preprocessing/markduplicates/test/test.md.cram
# binary changes md5sums on reruns.
- path: results/preprocessing/markduplicates/test/test.md.cram.crai
# binary changes md5sums on reruns.
- path: results/preprocessing/recal_table/test/test.recal.table
md5sum: 9c0517ffdc5d30a5c73b9f7df1ff3060
- path: results/preprocessing/recalibrated/test/test.recal.cram
# binary changes md5sums on reruns.
- path: results/preprocessing/recalibrated/test/test.recal.cram.crai
# binary changes md5sums on reruns.
- path: results/reports/fastqc/test-1
- path: results/reports/markduplicates/test/test.md.metrics
contains: ["test 0 2820 2 2 0 828 0 0.293617 3807", "1.0 0.999986 1178 1178", "2.0 1.47674 800 800", "100.0 1.911145 0 0"]
- path: results/reports/mosdepth/test/test.md.mosdepth.global.dist.txt
md5sum: 9cb9b181119256ed17a77dcf44d58285
- path: results/reports/mosdepth/test/test.md.mosdepth.region.dist.txt
md5sum: 75e1ce7e55af51f4985fa91654a5ea2d
- path: results/reports/mosdepth/test/test.md.mosdepth.summary.txt
md5sum: dbe376360e437c89190139ef0ae6769a
- path: results/reports/mosdepth/test/test.md.regions.bed.gz
md5sum: d9b53915d473710ff0260a0ff694fd32
- path: results/reports/mosdepth/test/test.md.regions.bed.gz.csi
md5sum: d0713716f63ac573f4a3385733e9a537
- path: results/reports/mosdepth/test/test.recal.mosdepth.global.dist.txt
md5sum: 9cb9b181119256ed17a77dcf44d58285
- path: results/reports/mosdepth/test/test.recal.mosdepth.region.dist.txt
md5sum: 75e1ce7e55af51f4985fa91654a5ea2d
- path: results/reports/mosdepth/test/test.recal.mosdepth.summary.txt
md5sum: dbe376360e437c89190139ef0ae6769a
- path: results/reports/mosdepth/test/test.recal.regions.bed.gz
md5sum: d9b53915d473710ff0260a0ff694fd32
- path: results/reports/mosdepth/test/test.recal.regions.bed.gz.csi
md5sum: d0713716f63ac573f4a3385733e9a537
- path: results/reports/samtools/test/test.md.cram.stats
md5sum: 5201890d36c1dd127b930373b6e823e5
- path: results/reports/samtools/test/test.recal.cram.stats
md5sum: bb2fc6118a1404c45f9e828600df8fb1
- path: results/samtools/test-1.bam
# binary changes md5sums on reruns.
- path: results/samtools/test-1.map_map.bam
md5sum: e1d347ccaec52f690c0313047fecf7e6
- path: results/samtools/test-1.map_unmap.bam
md5sum: 0be5ce27b94e047a1437596a91560982
- path: results/samtools/test-1.unmap_map.bam
md5sum: 53423525e9bf327c60916aded73ba8a6
- path: results/samtools/test-1.unmap_unmap.bam
md5sum: 60a80b7e380e228555b8d90990e1c788
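
The `files:` entries in these test YAML files combine three kinds of checks: `md5sum` for outputs that are byte-stable, `contains` for text outputs whose checksum changes on reruns, and `should_exist: false` for outputs that must be absent. As a rough illustration only, the Python sketch below mimics how one such entry could be evaluated. The helper names `md5_of` and `check_entry` are hypothetical, and this is not the implementation of the test runner used in CI.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_entry(entry: dict, base_dir: Path) -> list:
    """Evaluate one `files:` entry (hypothetical helper, illustration only).

    Returns a list of failure messages; an empty list means the entry passed.
    """
    path = base_dir / entry["path"]
    should_exist = entry.get("should_exist", True)

    # Existence is always checked, even for entries with no md5sum/contains.
    if path.exists() != should_exist:
        expected = "exist" if should_exist else "be absent"
        return [f"{entry['path']}: expected to {expected}"]
    if not should_exist:
        return []

    failures = []
    # Byte-exact comparison, suitable only for outputs that are stable on reruns.
    if "md5sum" in entry:
        actual = md5_of(path)
        if actual != entry["md5sum"]:
            failures.append(f"{entry['path']}: md5sum {actual} != {entry['md5sum']}")
    # Substring checks, for text outputs whose checksum changes on reruns.
    for needle in entry.get("contains", []):
        if needle not in path.read_text(errors="replace"):
            failures.append(f"{entry['path']}: missing string {needle!r}")
    return failures
```

For example, `check_entry({"path": "results/csv/recalibrated.csv", "md5sum": "42628ec994c16f565e5407b40a9c1ac3"}, Path("."))` returns an empty list only if that file exists and hashes to the recorded value. Entries that carry only a `path`, such as the CRAM files commented as changing on reruns, reduce to an existence check.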
