Add support for presupplied annotation files (FAA + GFF or FAA + GBK) #340

Closed · wants to merge 54 commits (changes from 34 commits shown)

Commits:
e280092
Start modifying samplesheet check (untested)
jfy133 Apr 26, 2023
9fbf52c
Made the samplesheet work if columns are existing (python error if th…
jfy133 May 10, 2023
cf2b8b7
Continue work
jfy133 May 10, 2023
80106d8
Apply suggestions from code review
jfy133 May 24, 2023
1b60b24
Get most of the log working, needs more testing (particularly non FAA…
jfy133 May 24, 2023
348b16c
Merge branch 'dev' into presupplied-orfs
jfy133 May 31, 2023
43ee615
Sync latest dev changes from annotation into workflow
jfy133 May 31, 2023
183b81a
Update all modules to get right container version and also pyRodigal …
jfy133 May 31, 2023
258a49a
Merge branch 'presupplied-orfs' of github.com:nf-core/funcscan into p…
jfy133 May 31, 2023
a7b03a7
Add test nothing config
jfy133 Feb 7, 2024
29670ab
Merge branch 'dev' into presupplied-orfs
jfy133 Feb 7, 2024
ac0a25d
Get back to previous starting point before bad merge removed old changes
jfy133 Feb 7, 2024
da993dd
Merge branch 'dev' into presupplied-orfs
jfy133 Feb 14, 2024
4370106
Refactor - have amp/arg working. Includes better fargene tagging
jfy133 Feb 14, 2024
7eb5bed
Have it working
jfy133 Feb 14, 2024
28a50ee
Add test profile and docs
jfy133 Feb 14, 2024
9caae3e
Include preannotated files in one of the CI runs
jfy133 Feb 14, 2024
23e3929
Fix prettier linting
jfy133 Feb 14, 2024
663a7e9
Fix ci command
jfy133 Feb 14, 2024
6177c90
Fix BAKTA to multiqc channel name
jfy133 Feb 14, 2024
bf0f572
Add a preannotated test to BGC workflows
jfy133 Feb 14, 2024
ea555ab
Make preannotated bgc config accessible
jfy133 Feb 14, 2024
80c5b0c
Install newer version of antismash to see if it'll work with the GFF …
jfy133 Feb 14, 2024
106b76e
Use correct dummy files
jfy133 Feb 14, 2024
c25cab1
Add warning about Prokka GBK/GFF
jfy133 Feb 14, 2024
89fdf9a
Merge branch 'dev' into presupplied-orfs
jasmezz Apr 4, 2024
f9f808d
Wrapping my head around it
jasmezz Apr 4, 2024
9b483ac
Excluded GFF support, fixed multiqc report, update variables etc.
jasmezz Apr 5, 2024
55224f1
Merge branch 'dev' into presupplied-orfs
jasmezz Apr 5, 2024
43e77cf
Update usage docs and samplesheet
jasmezz Apr 5, 2024
311f77c
Update modules.config, fix linting, variable typos
jasmezz Apr 5, 2024
2ac179d
Fix variable typos, fix multiqc channel for bakta
jasmezz Apr 8, 2024
212ce0c
Fix linting
jasmezz Apr 8, 2024
15f9fbf
Prefer pyrodigal in tests, add warning when prodigal + antismash are …
jasmezz Apr 8, 2024
a8716f8
Apply suggestions from code review
jasmezz Apr 10, 2024
fd52fee
Apply suggestions from code review, fix linting
jasmezz Apr 10, 2024
4e0a61f
Change feature to gbk, remove gff from docs
jasmezz Apr 10, 2024
de6bd40
Merge branch 'dev' into presupplied-orfs
jasmezz Apr 10, 2024
b5fc8f4
Fix "feature" renaming to "gbk"
jasmezz Apr 10, 2024
95f8fb5
Fix linting
jasmezz Apr 10, 2024
de0a7bf
Merge branch 'dev' into presupplied-orfs
jasmezz Apr 22, 2024
bf43049
Fix variables
jasmezz Apr 22, 2024
0d2ef7c
Fix channels, missing warning/docs about no splitting for preanno
jfy133 Apr 24, 2024
7ed3594
Use correct GBK channel
jfy133 Apr 24, 2024
619479a
Add log warning when BGC and preannotated input
jfy133 Apr 24, 2024
85c4359
Start trying to fix taxonomy, not working yet as MMSEQS_TAXONOMYDB no…
jfy133 May 6, 2024
b11bed1
Add more GBK/GBFF updates
jfy133 May 15, 2024
800dff9
Remove dumps
jfy133 May 15, 2024
58787f6
Merge branch 'dev' into presupplied-orfs
jfy133 May 15, 2024
50aa076
Only do splitting when BGC workflow executed
jfy133 May 22, 2024
c419ce9
Fix taxonomy workflow from possibly getting async between two input c…
jfy133 May 22, 2024
d1d0177
Fix prokka annotation MQC collection
jfy133 May 22, 2024
8f1c7ba
Fix linting
jfy133 May 22, 2024
2d8b238
Make it so deepBGC actually produces output, and START send only lon…
jfy133 May 22, 2024
30 changes: 15 additions & 15 deletions .github/workflows/ci.yml
@@ -17,7 +17,7 @@ concurrency:

jobs:
test:
name: Run pipeline with test data (AMP and ARG workflows)
name: Run pipeline with test data (AMP/ARG)
# Only run on push if this is the nf-core dev branch (merged PRs)
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
runs-on: ubuntu-latest
@@ -27,9 +27,9 @@ jobs:
- "23.04.0"
- "latest-everything"
parameters:
- "--annotation_tool prodigal"
- "--annotation_tool prokka"
- "--annotation_tool bakta --annotation_bakta_db_downloadtype light --arg_skip_deeparg --arg_skip_amrfinderplus" # Skip deeparg and amrfinderplus due to otherwise running out of space on GitHub Actions
- "-profile docker,test_preannotated --annotation_tool prodigal"
- "-profile docker,test --annotation_tool prokka"
- "-profile docker,test --annotation_tool bakta --annotation_bakta_db_downloadtype light --arg_skip_deeparg --arg_skip_amrfinderplus" # Skip deeparg and amrfinderplus due to otherwise running out of space on GitHub Actions

steps:
- name: Check out pipeline code
@@ -45,10 +45,10 @@ jobs:

- name: Run pipeline with test data (AMP and ARG workflows)
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results ${{ matrix.parameters }}
nextflow run ${GITHUB_WORKSPACE} ${{ matrix.parameters }} --outdir ./results

test_bgc:
name: Run pipeline with test data (BGC workflow)
name: Run pipeline with test data (BGC)
# Only run on push if this is the nf-core dev branch (merged PRs)
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
runs-on: ubuntu-latest
@@ -58,9 +58,9 @@ jobs:
- "23.04.0"
- "latest-everything"
parameters:
- "--annotation_tool prodigal"
- "--annotation_tool prokka"
- "--annotation_tool bakta --annotation_bakta_db_downloadtype light"
- "-profile docker,test_preannotated_bgc --annotation_tool prodigal"
- "-profile docker,test_bgc --annotation_tool prokka"
- "-profile docker,test_bgc --annotation_tool bakta --annotation_bakta_db_downloadtype light"

steps:
- name: Check out pipeline code
@@ -76,10 +76,10 @@ jobs:

- name: Run pipeline with test data (BGC workflow)
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_bgc,docker --outdir ./results ${{ matrix.parameters }} --bgc_skip_deepbgc
nextflow run ${GITHUB_WORKSPACE} ${{ matrix.parameters }} --outdir ./results --bgc_skip_deepbgc

test_taxonomy:
name: Run pipeline with test data (AMP, ARG and BGC taxonomy workflows)
name: Run pipeline with test data (AMP, ARG and BGC taxonomy)
# Only run on push if this is the nf-core dev branch (merged PRs)
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
runs-on: ubuntu-latest
@@ -89,9 +89,9 @@ jobs:
- "23.04.0"
- "latest-everything"
parameters:
- "--annotation_tool prodigal"
- "--annotation_tool prokka"
- "--annotation_tool bakta --annotation_bakta_db_downloadtype light"
- "-profile docker,test_taxonomy --annotation_tool prodigal" # TODO: Add test_taxonomy_preannotated.config
Member (Author): Is that in yet?

Collaborator: Yes, need to pull dev

- "-profile docker,test_taxonomy --annotation_tool prokka"
- "-profile docker,test_taxonomy --annotation_tool bakta --annotation_bakta_db_downloadtype light"

steps:
- name: Check out pipeline code
@@ -107,4 +107,4 @@ jobs:

- name: Run pipeline with test data (AMP, ARG and BGC taxonomy workflows)
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_taxonomy,docker --outdir ./results ${{ matrix.parameters }}
nextflow run ${GITHUB_WORKSPACE} ${{ matrix.parameters }} --outdir ./results
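The recurring change in this file is that each matrix `parameters` entry now carries its own `-profile` (so the pre-annotated test profile can be paired with a specific annotation tool), while the run step only appends `--outdir`. As a minimal sketch of how such a two-axis matrix fans out into concrete commands (version and parameter values copied from the AMP/ARG job above; the command template is illustrative, not the exact Actions runner behaviour):

```python
# Assumed values, copied from the AMP/ARG matrix in the diff above.
nxf_versions = ["23.04.0", "latest-everything"]
parameters = [
    "-profile docker,test_preannotated --annotation_tool prodigal",
    "-profile docker,test --annotation_tool prokka",
]

def expand(versions, params):
    """Mimic GitHub Actions matrix expansion: one command per combination."""
    return [
        f"NXF_VER={v} nextflow run . {p} --outdir ./results"
        for v in versions
        for p in params
    ]

for cmd in expand(nxf_versions, parameters):
    print(cmd)
```

Each of the 2 × 3 matrix entries in the real workflow becomes one CI job, so the pre-annotated profile is exercised against both Nextflow versions.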
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#332](https://github.com/nf-core/funcscan/pull/332) & [#327](https://github.com/nf-core/funcscan/pull/327) Merged pipeline template of nf-core/tools version 2.12.1 (by @jfy133, @jasmezz)
- [#338](https://github.com/nf-core/funcscan/pull/338) Set `--meta` parameter to default for Bakta, with singlemode optional. (by @jasmezz)
- [#343](https://github.com/nf-core/funcscan/pull/343) Added contig taxonomic classification using [MMseqs2](https://github.com/soedinglab/MMseqs2/). (by @darcy220606)
- [#340](https://github.com/nf-core/funcscan/pull/340) Added support for supplying pre-annotated sequences to the pipeline. (by @jfy133, @jasmezz)

### `Fixed`

8 changes: 8 additions & 0 deletions assets/multiqc_config.yml
@@ -10,6 +10,14 @@ report_section_order:
"nf-core-funcscan-summary":
order: -1002

run_modules:
- prokka
- custom_content

table_columns_visible:
Prokka:
organism: False

export_plots: true

disable_version_detection: true
7 changes: 4 additions & 3 deletions assets/samplesheet.csv
@@ -1,3 +1,4 @@
sample,fasta
sample_1,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_1.fasta.gz
sample_2,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_2.fasta.gz
sample,fasta,protein,feature
sample_1,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_1.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.faa,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.gbk
sample_2,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_2.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.faa.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.gbk.gz
sample_3,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs.fasta
22 changes: 20 additions & 2 deletions assets/schema_input.json
@@ -18,9 +18,27 @@
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(fasta|fas|fa|fna)(\\.gz)?$",
"errorMessage": "Fasta file for reads must be provided, cannot contain spaces and must have extension '.fasta', '.fas', '.fa' or '.fna' (any of these can be optionally compressed as '.gz')",
"pattern": "^\\S+\\.(fasta|fas|fna|fa)(\\.gz)?$",
"errorMessage": "Fasta file for reads must be provided, cannot contain spaces and must have extension '.fasta', '.fas', '.fna' or '.fa' (any of these can be optionally compressed as '.gz')",
"unique": true
},
"protein": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(faa)(\\.gz)?$",
"errorMessage": "Input file for peptide annotations has incorrect file format. File must end in '.faa' (optionally compressed as '.gz')",
"unique": true,
"dependentRequired": ["feature"]
},
"feature": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.g(bk|ff)(\\.gz)?$",
"errorMessage": "Input file for feature annotations has incorrect file format. File must end in '.gbk' or '.gff' (optionally compressed as '.gz')",
"unique": true,
"dependentRequired": ["protein"]
}
},
"required": ["sample", "fasta"]
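The new `protein` and `feature` columns are each declared `dependentRequired` on the other, so a row must supply both pre-annotation files or neither. A minimal, self-contained Python sketch of the same rules (the regexes are copied from the schema above; the helper and sample file names are illustrative, not pipeline code):

```python
import csv
import io
import re

# Extension rules mirroring assets/schema_input.json in this PR.
FASTA_RE = re.compile(r"^\S+\.(fasta|fas|fna|fa)(\.gz)?$")
FAA_RE = re.compile(r"^\S+\.faa(\.gz)?$")
GBK_RE = re.compile(r"^\S+\.g(bk|ff)(\.gz)?$")

def check_row(row):
    """Return a list of validation errors for one samplesheet row."""
    errors = []
    if not FASTA_RE.match(row.get("fasta", "")):
        errors.append("fasta: missing or bad extension")
    protein = row.get("protein", "")
    feature = row.get("feature", "")
    # dependentRequired: supply both pre-annotation files, or neither.
    if bool(protein) != bool(feature):
        errors.append("protein and feature must be supplied together")
    if protein and not FAA_RE.match(protein):
        errors.append("protein: must end in .faa(.gz)")
    if feature and not GBK_RE.match(feature):
        errors.append("feature: must end in .gbk or .gff(.gz)")
    return errors

sheet = io.StringIO(
    "sample,fasta,protein,feature\n"
    "s1,contigs.fa.gz,orfs.faa,orfs.gbk\n"   # fully pre-annotated: OK
    "s2,contigs.fasta,,\n"                   # no pre-annotation: OK
    "s3,contigs.fna,orfs.faa,\n"             # protein without feature: error
)
for row in csv.DictReader(sheet):
    print(row["sample"], check_row(row))
```

This mirrors why `assets/samplesheet.csv` can mix fully pre-annotated rows (sample_1, sample_2) with a fasta-only row (sample_3).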
29 changes: 9 additions & 20 deletions conf/modules.config
@@ -95,6 +95,7 @@ process {
}

withName: PROKKA {
ext.prefix = { "${meta.id}_prokka" } // to prevent pigz symlink problems of input files if already uncompressed during post-annotation gzipping
publishDir = [
path: { "${params.outdir}/annotation/prokka/" },
mode: params.publish_dir_mode,
@@ -113,7 +114,7 @@ process {
params.annotation_prokka_rawproduct ? '--rawproduct' : '',
params.annotation_prokka_rnammer ? '--rnammer' : '',
params.annotation_prokka_compliant ? '--compliant' : '',
params.annotation_prokka_addgenes ? '--addgenes' : ''
params.annotation_prokka_addgenes ? '--addgenes' : '',
].join(' ').trim()
}

@@ -130,6 +131,7 @@ process {
}

withName: BAKTA_BAKTA {
ext.prefix = { "${meta.id}_bakta" } // to prevent pigz symlink problems of input files if already uncompressed during post-annotation gzipping
publishDir = [
path: { "${params.outdir}/annotation/bakta" },
mode: params.publish_dir_mode,
@@ -159,28 +161,13 @@ process {
].join(' ').trim()
}

withName: PRODIGAL_GFF {
withName: PRODIGAL {
ext.prefix = { "${meta.id}_prodigal" } // to prevent pigz symlink problems of input files if already uncompressed during post-annotation gzipping
publishDir = [
path: { "${params.outdir}/annotation/prodigal/${meta.id}" },
mode: params.publish_dir_mode,
enabled: params.save_annotations,
pattern: "*.{faa,fna,gff}.gz",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
ext.args = [
params.annotation_prodigal_singlemode ? "-p single" : "-p meta",
params.annotation_prodigal_closed ? "-c" : "",
params.annotation_prodigal_forcenonsd ? "-n" : "",
"-g ${params.annotation_prodigal_transtable}"
].join(' ').trim()
}

withName: PRODIGAL_GBK {
publishDir = [
path: { "${params.outdir}/annotation/prodigal/${meta.id}" },
mode: params.publish_dir_mode,
enabled: params.save_annotations,
pattern: "*.gbk.gz",
pattern: "*.{faa,fna,gbk,faa.gz,fna.gz,gbk.gz}",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
ext.args = [
@@ -192,11 +179,12 @@ process {
}

withName: PYRODIGAL {
ext.prefix = { "${meta.id}_pyrodigal" } // to prevent pigz symlink problems of input files if already uncompressed during post-annotation gzipping
publishDir = [
path: { "${params.outdir}/annotation/pyrodigal/${meta.id}" },
mode: params.publish_dir_mode,
enabled: params.save_annotations,
pattern: "*.{faa,fna,gff,score}.gz",
pattern: "*.{faa,fna,gbk,score}.gz",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
ext.args = [
@@ -270,6 +258,7 @@ process {
}

withName: FARGENE {
tag = {"${meta.id}|${hmm_model}"}
publishDir = [
[
path: { "${params.outdir}/arg/fargene/${meta.id}" },
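Several modules in this file gain an `ext.prefix` of the form `"${meta.id}_<tool>"`; per the inline comments, this keeps the annotation output name distinct from an identically-named, already-uncompressed staged input, so post-annotation gzipping cannot clash with the input symlink. A toy Python sketch of the naming rule (function and file names are hypothetical):

```python
def output_name(sample_id: str, tool: str, ext: str = "faa") -> str:
    """Mirror ext.prefix = { "${meta.id}_<tool>" }: suffix outputs with the tool name."""
    return f"{sample_id}_{tool}.{ext}"

# Hypothetical staged input that arrives already uncompressed:
staged_input = "sample1.faa"

# Without the suffix the annotation output would also be "sample1.faa",
# and gzipping it could collide with the staged input symlink.
out = output_name("sample1", "prodigal")
assert out != staged_input
print(out)  # → sample1_prodigal.faa
```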
2 changes: 1 addition & 1 deletion conf/test.config
@@ -23,7 +23,7 @@ params {
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/samplesheet_reduced.csv'
amp_hmmsearch_models = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/hmms/mybacteriocin.hmm'

annotation_tool = 'prodigal'
annotation_tool = 'pyrodigal'

run_arg_screening = true
arg_fargene_hmmmodel = 'class_a,class_b_1_2'
2 changes: 1 addition & 1 deletion conf/test_bgc.config
@@ -23,7 +23,7 @@ params {
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/samplesheet_reduced.csv'
bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'

annotation_tool = 'prodigal'
annotation_tool = 'pyrodigal'

run_arg_screening = false
run_amp_screening = false
27 changes: 23 additions & 4 deletions conf/test_nothing.config
@@ -4,10 +4,8 @@
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Although in this case we turn everything off

Use as follows:
nextflow run nf-core/funcscan -profile test,<docker/singularity> --outdir <OUTDIR>
nextflow run nf-core/funcscan -profile test_nothing,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/
@@ -24,10 +22,31 @@ params {
// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/samplesheet_reduced.csv'
amp_hmmsearch_models = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/hmms/mybacteriocin.hmm'
bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'

annotation_tool = 'prodigal'
annotation_tool = 'pyrodigal'

run_arg_screening = false
run_amp_screening = false
run_bgc_screening = false

arg_fargene_hmmmodel = 'class_a,class_b_1_2'

amp_skip_amplify = true
amp_skip_macrel = true
amp_skip_ampir = true
amp_skip_hmmsearch = true

arg_skip_deeparg = true
arg_skip_fargene = true
arg_skip_rgi = true
arg_skip_amrfinderplus = true
arg_skip_abricate = true

bgc_skip_antismash = true
bgc_skip_deepbgc = true
bgc_skip_gecco = true
bgc_skip_hmmsearch = true

}
32 changes: 32 additions & 0 deletions conf/test_preannotated.config
@@ -0,0 +1,32 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/funcscan -profile test,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

params {
config_profile_name = 'Test profile - preannotated input'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/samplesheet_preannotated.csv'
amp_hmmsearch_models = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/hmms/mybacteriocin.hmm'

annotation_tool = 'pyrodigal'

run_arg_screening = true
arg_fargene_hmmmodel = 'class_a,class_b_1_2'

run_amp_screening = true
}
31 changes: 31 additions & 0 deletions conf/test_preannotated_bgc.config
@@ -0,0 +1,31 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/funcscan -profile test_bgc,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

params {
config_profile_name = 'BGC test profile - preannotated input BGC'
config_profile_description = 'Minimal test dataset to check BGC workflow function'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/samplesheet_preannotated.csv'
bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'

annotation_tool = 'pyrodigal'

run_arg_screening = false
run_amp_screening = false
run_bgc_screening = true
}
2 changes: 1 addition & 1 deletion conf/test_taxonomy.config
@@ -25,7 +25,7 @@ params {
amp_hmmsearch_models = 'https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/hmms/mybacteriocin.hmm'

run_taxa_classification = true
annotation_tool = 'prodigal'
annotation_tool = 'pyrodigal'

run_arg_screening = true
arg_skip_deeparg = true