Fix inclusivity test #26

Closed
wants to merge 214 commits into from
214 commits
e31ff7b
Add .circleci/config.yml
afishman Sep 8, 2020
d2a9e8a
Update Readme.md
afishman Sep 8, 2020
5a0df0d
Merge pull request #1 from APHA-CSU/circleci-project-setup
afishman Sep 11, 2020
efed897
docker build job
afishman Sep 11, 2020
19f498f
disabling docker layer caching
afishman Sep 11, 2020
3cca391
cleaning up circleci config
afishman Sep 11, 2020
460c3f7
updating readme
afishman Sep 11, 2020
a29d485
testing error in dockerfile
afishman Sep 11, 2020
db551c9
touching dockerfile
afishman Sep 11, 2020
a9c552b
removing error line in Dockerfile
afishman Sep 11, 2020
1511ea9
updating Readme.md
afishman Sep 11, 2020
9038316
Merge pull request #2 from APHA-CSU/docker-build
afishman Sep 11, 2020
b037fad
adding .gitignore
afishman Sep 14, 2020
7104833
updating .gitignore
afishman Sep 14, 2020
d23a505
setting Kraken paths in Dockerfile
afishman Sep 14, 2020
eb00baf
biotools path
afishman Sep 15, 2020
f72befe
embedding Install_dependancies.sh into Dockerfile
afishman Sep 15, 2020
da579b4
including kraken2 db
afishman Sep 15, 2020
73b1c44
upgrading to jdk11, fixing Kraken2db path
Sep 16, 2020
178a735
Dockerfile CMD nextflow run
afishman Sep 16, 2020
b11185d
putting WGSCluster and BovPos cvs in outdir
afishman Sep 16, 2020
0b4f991
assert first csv row py
afishman Sep 16, 2020
f472379
minimal test data
afishman Sep 16, 2020
3c08d3e
run-pipeline-docker.sh
afishman Sep 16, 2020
cdad748
print todays results
afishman Sep 16, 2020
47dd28b
renaming minimal read pair
afishman Sep 16, 2020
bf95393
validation
afishman Sep 16, 2020
997b909
fixing config.yml
afishman Sep 16, 2020
0dd4ba4
CIRCLE_BRANCH tag
afishman Sep 16, 2020
6f4a771
setting wd on job
afishman Sep 17, 2020
87c1168
comments
afishman Sep 17, 2020
a72dc52
test script as circleci replacement
afishman Sep 17, 2020
3489308
job script
afishman Sep 17, 2020
f632a63
fixing integration tests
afishman Sep 17, 2020
7a32109
Add extra fork for VarCall process as this is the most time consuming
ellisrichardj Sep 17, 2020
ad0ae80
adding .gitignore
afishman Sep 14, 2020
4ac4075
updating .gitignore
afishman Sep 14, 2020
f8c538c
setting Kraken paths in Dockerfile
afishman Sep 14, 2020
d0de697
biotools path
afishman Sep 15, 2020
c59161f
embedding Install_dependancies.sh into Dockerfile
afishman Sep 15, 2020
18612c7
including kraken2 db
afishman Sep 15, 2020
18d2ccb
upgrading to jdk11, fixing Kraken2db path
Sep 16, 2020
e0adced
Dockerfile CMD nextflow run
afishman Sep 16, 2020
6568f35
putting WGSCluster and BovPos cvs in outdir
afishman Sep 16, 2020
ff754f6
assert first csv row py
afishman Sep 16, 2020
77d1c46
minimal test data
afishman Sep 16, 2020
cc4a5a0
run-pipeline-docker.sh
afishman Sep 16, 2020
a041d10
print todays results
afishman Sep 16, 2020
acba99a
renaming minimal read pair
afishman Sep 16, 2020
965425c
validation
afishman Sep 16, 2020
12d7e21
fixing config.yml
afishman Sep 16, 2020
05b69e4
CIRCLE_BRANCH tag
afishman Sep 16, 2020
1365503
setting wd on job
afishman Sep 17, 2020
a6f8e70
comments
afishman Sep 17, 2020
83621da
test script as circleci replacement
afishman Sep 17, 2020
23ef9cb
job script
afishman Sep 17, 2020
b76bdc8
fixing integration tests
afishman Sep 17, 2020
c875542
Update Readme.md
afishman Sep 17, 2020
e7b4768
Update Readme.md
afishman Sep 17, 2020
3a8d6e9
Update bov-tb
afishman Sep 17, 2020
94215eb
restrict memory for AssignClusterCSS process
ellisrichardj Sep 17, 2020
39b2fa4
Filter vcf to only descriminatory positions for cluster assignment
ellisrichardj Sep 17, 2020
3522167
Remove dependancy location
ellisrichardj Sep 17, 2020
6606714
Corrected poor spelling
ellisrichardj Sep 17, 2020
1d15c06
Merge branch 'Restrict-Memory' of https://github.com/APHA-CSU/BovTB-n…
ellisrichardj Sep 17, 2020
9a21e88
adding second kraken2db
afishman Sep 18, 2020
6b75cb9
circleCI build artifacts + DRYing up minimal-test-job
afishman Sep 18, 2020
b129db1
Merge branch 'master' into Restrict-Memory to fully update
ellisrichardj Sep 18, 2020
70d40d6
upgrading to Bracken 2.6.0
afishman Sep 18, 2020
92f7855
memory map on kraken2
afishman Sep 18, 2020
8a56ad5
circleci syntax
afishman Sep 18, 2020
3111a0d
lowmem --memory-map during minimal-test
afishman Sep 18, 2020
9670ec6
Dockerfile: vim, nano and updating kraken2dbs
afishman Sep 18, 2020
4c3af3c
sp
afishman Sep 18, 2020
828ea6f
fixing merge conflicts
afishman Sep 18, 2020
16540c1
Merge pull request #4 from APHA-CSU/Restrict-Memory
ellisrichardj Sep 18, 2020
ada73b2
Merge branch 'master' into kraken2-testdb
afishman Sep 18, 2020
15f9e37
lowmem option bug
afishman Sep 21, 2020
0baa53c
inclusivity test
afishman Sep 21, 2020
4cf2f90
Merge pull request #5 from APHA-CSU/IDnonbovis-lowmem
afishman Sep 21, 2020
657122e
adding bc to Dockerfile
afishman Sep 22, 2020
718066b
set -e on ReadStats.sh
afishman Sep 22, 2020
4ae644e
testing for B6-16
afishman Sep 22, 2020
d9921ca
obliterate fastq
afishman Sep 23, 2020
af94ebb
updating .gitignore
afishman Sep 23, 2020
39ac254
adding work to gitignore
afishman Sep 23, 2020
3393043
obliterate the fastq quality
afishman Sep 23, 2020
ce23d80
adding bc to Dockerfile
afishman Sep 23, 2020
b819893
adding Results to .gitignore
afishman Sep 24, 2020
2ecb9fb
moving Dockerfile nextflow settings into nextflow.config@
afishman Sep 24, 2020
6fb193d
ignoring Braken errors
afishman Sep 24, 2020
f01081b
set uniform fastq quality
afishman Sep 24, 2020
9d6c025
quality test
afishman Sep 24, 2020
78c9fb3
quality test
afishman Sep 24, 2020
952ae6c
downloading fastq files for quality data
afishman Sep 24, 2020
0074894
quality test working
afishman Sep 24, 2020
ceace9b
fixing minimal pipeline
afishman Sep 24, 2020
bbeb018
adding quality test to local test runner
afishman Sep 24, 2020
817751a
adding circleci job for quality
afishman Sep 25, 2020
2c4037b
renaming files
afishman Sep 25, 2020
55b6631
tinyreads bug
afishman Sep 25, 2020
4ec3010
cleaning up
afishman Sep 25, 2020
24aa2eb
Update Readme.md
afishman Sep 25, 2020
e8b6b92
Update Readme.md
afishman Sep 25, 2020
c71501f
Update tinyreads.bash
afishman Sep 25, 2020
7e9464e
Update set_uniform_fastq_quality.py
afishman Sep 25, 2020
1ae8739
Update tinyreads.bash
afishman Sep 25, 2020
9232db7
Update Readme.md
afishman Sep 25, 2020
c639d59
Update Readme.md
afishman Sep 25, 2020
ea36a78
Merge pull request #6 from APHA-CSU/quality-test
afishman Sep 25, 2020
2d0c216
input arguments to quality
afishman Sep 28, 2020
35c0628
cleaning up tests
afishman Sep 28, 2020
e0b6eda
circleci makes the dirs
afishman Sep 28, 2020
0836cf7
setting the name
afishman Sep 28, 2020
e7eee3b
reducing quality code
afishman Sep 28, 2020
dd69591
nargs on filepath
afishman Sep 28, 2020
0b766ed
slightly neater num reads
afishman Sep 28, 2020
188f76e
setting up aliases
afishman Sep 28, 2020
be3b860
comments
afishman Sep 28, 2020
6c8d1de
print todays cluster alias
afishman Sep 28, 2020
bbc667a
parameterising config.yml
afishman Sep 28, 2020
0519a4a
low and adequate cases
afishman Sep 28, 2020
e471f9c
deleting old install files
afishman Sep 28, 2020
4f78dbb
debugging cleanup
afishman Sep 28, 2020
fa24d60
toadys WGS cluster bug
afishman Sep 28, 2020
ef115e5
fixing merge conflicts
afishman Sep 28, 2020
080160e
nextflow test
afishman Sep 28, 2020
a427127
moving Dockerfile into docker/
afishman Sep 28, 2020
48a3983
updating config.yml to new Dockerfile
afishman Sep 28, 2020
fb1dde8
referencing Dockerfile exactly
afishman Sep 28, 2020
8f46846
Merge pull request #7 from APHA-CSU/cleanup
afishman Sep 29, 2020
9aa9a9c
Include flag for low quality sequence data
ellisrichardj Oct 5, 2020
8f3bf1e
Add explanation comment
ellisrichardj Oct 5, 2020
18a5e94
alter order of elifs
ellisrichardj Oct 5, 2020
f22fe7d
Explicitily call bash and use double square brackets for test
ellisrichardj Oct 7, 2020
fa5ddc3
Change expected outcome for tiny reads test
ellisrichardj Oct 7, 2020
bc8a32f
Corrected comment
ellisrichardj Oct 7, 2020
df19c13
Merge pull request #8 from APHA-CSU/LowQualFlag
ellisrichardj Oct 7, 2020
a32efea
updating reads links
afishman Oct 7, 2020
82cd11c
updating links for qualitytest
afishman Oct 7, 2020
aab829c
Merge pull request #9 from APHA-CSU/qualitytest-update
afishman Oct 8, 2020
a120eeb
renaming bov-tb to run-pipeline
afishman Oct 8, 2020
0d3f24b
lod test
afishman Oct 8, 2020
f23db82
additional lod files
afishman Oct 8, 2020
3b5e175
lod
afishman Oct 8, 2020
79c080e
bumping up lod level
afishman Oct 9, 2020
529aa44
Update Readme.md
afishman Oct 9, 2020
d88ed06
lod
afishman Oct 9, 2020
b536bc1
Merge branch 'lod' of https://github.com/APHA-CSU/BovTB-nf into lod
afishman Oct 9, 2020
47a4374
Merge pull request #10 from APHA-CSU/lod
afishman Oct 16, 2020
9015c0c
Merge branch 'master' into inclusivity-test
afishman Oct 16, 2020
1656d86
single inclusivity case
afishman Oct 16, 2020
9c39c39
inclusivity test
afishman Oct 20, 2020
3f4023a
splitting circleci tests in two
afishman Oct 20, 2020
2d0a13f
lowering the number of inclusivity tests
afishman Dec 2, 2020
30e4cfc
touch
afishman Dec 2, 2020
cc72630
deploy function
afishman Dec 2, 2020
2313521
DRYing up
afishman Dec 2, 2020
7d6b2ce
DRYing up with anchors
afishman Dec 2, 2020
da6c9a3
Merge pull request #12 from APHA-CSU/deploy
afishman Dec 7, 2020
e86bbf4
using submitted ftp instead of the other one
afishman Dec 7, 2020
669e25f
Merge branch 'master' into inclusivity-test
afishman Dec 7, 2020
e839832
deduplicate in its own process
afishman Dec 7, 2020
a613a01
removing echo
afishman Dec 7, 2020
e5d8705
BASHing deduplicate
afishman Dec 7, 2020
63f8ea4
varCall.bash
afishman Dec 7, 2020
b072587
execute all the things
afishman Dec 7, 2020
2eb4e97
trim.bash
afishman Dec 7, 2020
f807417
map2Ref
afishman Dec 7, 2020
2c1829c
mask.bash
afishman Dec 7, 2020
7d9a5d2
ENA connection limit takin' my parallelism
afishman Dec 7, 2020
d393f76
maybe linear will make ENA happy
afishman Dec 8, 2020
0dec6b6
read stats and dockerfile bug
afishman Dec 8, 2020
181e8be
VCF2Consensus
afishman Dec 8, 2020
42adb60
readStats
afishman Dec 8, 2020
2ddfd86
assign cluster
afishman Dec 8, 2020
0d0f556
idNonBovis
afishman Dec 8, 2020
43d714d
pc_aft_trim changed to proportion of uniq reads
ellisrichardj Dec 8, 2020
46de200
pipe grep through cat to avoid notfound error, fixes #11
ellisrichardj Dec 8, 2020
ad1ebfc
Merge pull request #13 from APHA-CSU/Thresholds
ellisrichardj Dec 15, 2020
6b2f3f5
adding sra-toolkit to dockerfile@
afishman Dec 16, 2020
3250494
prefetch
afishman Dec 16, 2020
b97afe1
removing broken install code
afishman Dec 16, 2020
21b7e41
running ln separately
afishman Dec 16, 2020
b903eed
using sra toolkit for inclusivity download cases
afishman Dec 16, 2020
4c7493b
Merge branch 'master' into inclusivity-test
afishman Dec 16, 2020
40ab862
depends on tinyreads
afishman Dec 16, 2020
3bd27a8
Merge pull request #14 from APHA-CSU/inclusivity-test
afishman Jan 5, 2021
4438e35
fixing merge conflicts
afishman Jan 5, 2021
c663b10
bin it.
afishman Jan 8, 2021
b6b14dd
updating Dockerfile to make everything in bin executable@
afishman Jan 8, 2021
fdaa36f
Merge pull request #15 from APHA-CSU/processes
afishman Jan 11, 2021
df22830
python unit tests
afishman Jan 11, 2021
d71ae4d
updating config.yml to include unit tests
afishman Jan 11, 2021
bcecc5e
deduplicate test job
afishman Jan 11, 2021
add50e8
Change method for counting reads, fixes #16
ellisrichardj Jan 11, 2021
7d4f6de
prevent no zero coverage positions error
ellisrichardj Jan 11, 2021
3862ace
removing depend_path arg
afishman Jan 12, 2021
e57c9a7
comment
afishman Jan 12, 2021
fc2f205
Merge pull request #17 from APHA-CSU/FixReadCount
ellisrichardj Jan 12, 2021
0f7ae25
Merge pull request #18 from APHA-CSU/unit_tests
afishman Jan 13, 2021
c313d16
FastUniq on linux path
afishman Jan 13, 2021
2e50729
trimmomatic on path
afishman Jan 13, 2021
4cb9aee
bwa mem on linux path
afishman Jan 13, 2021
26247dd
docker variable
afishman Jan 13, 2021
7765fb8
dropping depend_path for map2ref
afishman Jan 13, 2021
15bd31e
dropping depend_path for varCall.bash
afishman Jan 13, 2021
67d2a75
dropping depend_path for varCall.bash
afishman Jan 13, 2021
c3bbbd9
bug with the trim arg
afishman Jan 13, 2021
ccb8a9f
dropping depend_path for mask.bash
afishman Jan 13, 2021
3b44fe1
dropping depend_path for idnonbovis
afishman Jan 13, 2021
9ec403d
dropping depend_path altogether
afishman Jan 13, 2021
7d21b62
Merge pull request #19 from APHA-CSU/drop_depend_path
afishman Jan 13, 2021
a350b69
Corrected accession column and removed duplication
ellisrichardj Jan 14, 2021
142 changes: 142 additions & 0 deletions .circleci/config.yml
@@ -0,0 +1,142 @@
version: 2.1

commands:
  # Run tests under BovTB-nf/tests/jobs/ and store artifacts
  run-test:
    parameters:
      script:
        description: This test calls 'bash -e tests/jobs/<< script >>'
        type: string

    steps:
      - run: |
          cd /BovTB-nf/
          mkdir /reads/ /results/ /artifacts/
          bash -e tests/jobs/<< parameters.script >>

      - store_artifacts:
          path: /artifacts/

  build-and-push:
    parameters:
      tag:
        description: Tag to push built docker image to
        type: string

    steps:
      - checkout
      - setup_remote_docker
      - run: |
          TAG=<< parameters.tag >>
          docker build -t aaronsfishman/bov-tb:$TAG -f ./docker/Dockerfile .
          echo $DOCKER_PASS | docker login -u $DOCKER_USER --password-stdin
          docker push aaronsfishman/bov-tb:$TAG

# Run tests under BovTB-nf/tests/jobs/ and store artifacts
jobs:
  # Docker image containing the nextflow pipeline $CIRCLE_BRANCH
  build:
    docker:
      # Circleci base ubuntu image
      - image: &build_img cimg/base:2020.01

    steps:
      - build-and-push:
          tag: $CIRCLE_BRANCH

  deploy:
    docker:
      # Circleci base ubuntu image
      - image: *build_img

    steps:
      - build-and-push:
          tag: latest

  unittests:
    executor: nf-pipeline
    steps:
      - run: |
          cd /BovTB-nf/
          python tests/jobs/unit_tests.py

  tinyreads:
    executor: nf-pipeline
    steps:
      - run-test:
          script: "tinyreads.bash"

  quality:
    executor: nf-pipeline
    parameters:
      case:
        type: string
    steps:
      - run-test:
          script: "quality.bash << parameters.case >>"

  lod:
    executor: nf-pipeline
    parameters:
      case:
        type: string
    steps:
      - run-test:
          script: "lod.bash << parameters.case >>"

  inclusivity:
    executor: nf-pipeline
    parameters:
      case:
        type: string
    steps:
      - run-test:
          script: "inclusivity.bash << parameters.case >>"

# Run a job on the current branch's Dockerfile
executors:
  nf-pipeline:
    docker:
      - image: "aaronsfishman/bov-tb:$CIRCLE_BRANCH"

# Orchestrates the validation tests
workflows:
  validation:
    jobs:
      - build

      - deploy:
          requires:
            - build
          filters:
            branches:
              only: master

      - unittests:
          requires:
            - build

      - tinyreads:
          requires:
            - build

      - quality:
          requires:
            - tinyreads
          matrix:
            parameters:
              case: ["low", "adequate"]

      - lod:
          requires:
            - tinyreads
          matrix:
            parameters:
              case: ["0", "1", "2", "3"]

      - inclusivity:
          requires:
            - tinyreads
          matrix:
            parameters:
              case: ["0", "1", "2", "3"]
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
.nextflow.log*
.nextflow/
.DS_Store
temp/
work/
Results*
61 changes: 0 additions & 61 deletions Dockerfile

This file was deleted.

78 changes: 0 additions & 78 deletions Install_dependancies.sh

This file was deleted.

57 changes: 37 additions & 20 deletions Readme.md
@@ -1,10 +1,10 @@
# **BovTB-nf**

------------
[![APHA-CSU](https://circleci.com/gh/APHA-CSU/BovTB-nf.svg?style=svg)](https://app.circleci.com/pipelines/github/APHA-CSU)

This is the updated pipeline for APHA's processing of *Mycobacterium bovis* WGS data. BovTB-nf is designed to process a batch (1 or more samples) of paired-end fastq files generated on an Illumina sequencer. It will first remove duplicate reads from the dataset (FastUniq) and then trim the unique reads based on base-call quality and the presence of adapters (Trimmomatic). Reads are then mapped to the *M. bovis* AF2122 reference genome and variants called (bwa/samtools/bcftools).

It has been built to run using nextflow, using standard bioinformatic tools for the most part. The external dependencies are:
It has been built to run using [nextflow](https://www.nextflow.io/docs/latest/getstarted.html), using standard bioinformatic tools for the most part. The external dependencies are:
- FastUniq
- Trimmomatic
- bwa
@@ -13,34 +13,51 @@ It has been built to run using nextflow, using standard bioinformatic tools for
- Kraken2 (and database)
- Bracken

## Installation

Of course Nextflow itself is a prerequisite and should be installed as described in the [Nextflow Documentation](https://www.nextflow.io/docs/latest/getstarted.html)
## Run pipeline in Docker

If you have the dependencies installed, the pipeline can be run by simply typing:
We recommend running the pipeline with our docker image.

nextflow run ellisrichardj/BovTB-nf
To pull the latest image (if it's not already fetched) and run the nextflow container on data:
```
./bov-tb /PATH/TO/READS/ /PATH/TO/OUTPUT/RESULTS/
```

Alternatively, clone the repository:
The `/PATH/TO/READS/` directory should contain fastq files named with the `*_{S*_R1,S*_R2}*.fastq.gz` pattern. For example, a directory with `bovis_S1_R1.fastq.gz` and `bovis_S1_R2.fastq.gz` contains a single pair of reads.
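
As a rough illustration of the naming convention, the sketch below groups filenames into read pairs. It is not the pipeline's own code: `READ_PATTERN` and `pair_reads` are hypothetical names, and the regular expression is only one possible reading of the glob pattern above.

```python
import re
from collections import defaultdict

# Assumption: one regex interpretation of `*_{S*_R1,S*_R2}*.fastq.gz`.
# The pipeline itself matches files with the glob, not with this regex.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+)_S(?P<index>[^_]*)_R(?P<mate>[12]).*\.fastq\.gz$"
)


def pair_reads(filenames):
    """Group fastq filenames into {sample: {1: r1_name, 2: r2_name}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            # Not a fastq file following the naming convention
            continue
        pairs[match.group("sample")][int(match.group("mate"))] = name
    # Keep only samples for which both mates are present
    return {sample: mates for sample, mates in pairs.items() if set(mates) == {1, 2}}


if __name__ == "__main__":
    print(pair_reads(["bovis_S1_R1.fastq.gz", "bovis_S1_R2.fastq.gz"]))
```

Files that do not match the pattern, or samples missing one mate, are dropped in this sketch; how the pipeline treats such files is not covered here.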

git clone https://github.com/ellisrichardj/BovTB-nf.git

If required, there is a simple script for installing the dependencies (helpfully called Install_dependancies.sh), which will also update the nextflow config file with their locations.
### Build image from source

# Docker build
You can also build and run the image directly from source:
```
docker build /PATH/TO/REPO/ -t my-bov-tb
./bov-tb /PATH/TO/READS/ /PATH/TO/OUTPUT/RESULTS/ my-bov-tb
```

Alternatively, the pipeline can run in an ubuntu image on docker.
To build the image:
`docker build /PATH/TO/REPO/ -t bov-tb`

Run a docker container in bash:
`docker run --rm -it bov-tb`
## Validation

-------------
The pipeline is validated against real-world biological samples sequenced with Illumina NextSeq machines at APHA. The code for the validation tests is stored under `tests/jobs/`. Each test is summarised below.

## Examples

In its simplest form just run the Nextflow process from the directory containing the fastq files:
### Quality Test

The quality test ensures that low quality bases (base quality < 20) are not considered for variant calling and genotyping. This is performed by setting uniform quality values on a real-world *M. bovis* sample and asserting the pipeline output. Low quality bases are removed from the sequence using `Trimmomatic`, which uses a sliding window that trims a read once the average base quality within the window drops below 20. A table of expected results is shown below.

| Base Quality | Outcome | Flag | Group |
| ------------- | ------------- | ------------- | ------------- |
| 19 | CheckRequired | LowCoverage | NA |
| 20 | Pass | BritishbTB | B6-16 |
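
The effect of the threshold can be sketched with a simplified sliding-window trim. This illustrates the idea only and is not Trimmomatic's exact algorithm; the function name and the window size of 4 are assumptions made for the example.

```python
def sliding_window_trim(qualities, window=4, threshold=20):
    """Return how many leading bases survive a simplified sliding-window trim.

    Scans from the 5' end and cuts at the first window whose mean
    quality is below `threshold`. The window size is an assumed value.
    """
    if not qualities:
        return 0
    # Reads shorter than the window are evaluated as a single window
    last_start = max(len(qualities) - window, 0)
    for start in range(last_start + 1):
        chunk = qualities[start:start + window]
        if sum(chunk) / len(chunk) < threshold:
            return start
    return len(qualities)


if __name__ == "__main__":
    for base_quality in (19, 20):
        kept = sliding_window_trim([base_quality] * 100)
        print(f"uniform quality {base_quality}: {kept} of 100 bases kept")
```

With uniform quality values, every window has the same mean, so a read is either kept whole (quality 20) or trimmed away entirely (quality 19), which is consistent with the two rows of the table.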

### Limit of Detection (LoD)

The limit of detection test ensures that mixtures of *M. avium* and *M. bovis* at varying proportions give the correct Outcome. This is performed by taking random reads from reference samples of *M. bovis* and *M. avium*.


| *M. bovis* (%) | *M. avium* (%) | Outcome |
| ------------- | ------------- | ------------- |
| 100% | 0% | Pass |
| 65% | 35% | BritishbTB |
| 60% | 40% | CheckRequired |
| 0% | 100% | Contaminated |
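
A minimal sketch of how such a mixture could be assembled is shown below. It is not the code under `tests/jobs/`: `mix_reads` is a hypothetical helper, reads are represented as plain strings, and a real paired-end test would need to sample both mates of each pair together.

```python
import random


def mix_reads(bovis_reads, avium_reads, bovis_percent, total, seed=0):
    """Draw `total` reads at random, `bovis_percent` % of them from M. bovis.

    Assumption: reads are sampled without replacement from each
    reference, and the seed is fixed so the mixture is reproducible.
    """
    rng = random.Random(seed)
    n_bovis = total * bovis_percent // 100
    n_avium = total - n_bovis
    mixture = rng.sample(bovis_reads, n_bovis) + rng.sample(avium_reads, n_avium)
    rng.shuffle(mixture)
    return mixture


if __name__ == "__main__":
    bovis = [f"bovis_read_{i}" for i in range(1000)]
    avium = [f"avium_read_{i}" for i in range(1000)]
    for percent in (100, 65, 60, 0):
        mixture = mix_reads(bovis, avium, percent, total=200)
        n_bovis = sum(read.startswith("bovis") for read in mixture)
        print(f"{percent}% requested: {n_bovis} of {len(mixture)} reads from M. bovis")
```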

cd /path/to/Data
nextflow run BovTB-nf