Add module bcftools/norm to sarek (in progress) #1483

Open
wants to merge 30 commits into
base: dev
Commits
c081571
Rename cloudtest back to awstest
adamrtalbot Jan 17, 2024
79fd632
Change logic of cloud tests workflow
adamrtalbot Jan 17, 2024
7c81a4a
fixup
adamrtalbot Jan 17, 2024
5b2cad0
Add repo protection back in
adamrtalbot Jan 17, 2024
e2778aa
[automated] Fix linting with Prettier
nf-core-bot Jan 17, 2024
0420159
Change logic of if statement in cloud tests
adamrtalbot Jan 17, 2024
3347823
Trying contains statement
adamrtalbot Jan 17, 2024
2aff87c
Add comments|
adamrtalbot Jan 17, 2024
a328f6d
CHANGELOG
adamrtalbot Jan 17, 2024
7843389
feat(nf-prov): pin nf-prov to 1.2.2
maxulysse Apr 24, 2024
4772da1
First modification to contribute to the bcftools/norm module in Sarek
JC-Delmas Apr 25, 2024
451aaec
Changes in the GERMLINE_VCFS_NORM process
JC-Delmas Apr 25, 2024
d97726b
Add fasta argument to POST_VARIANTCALLING process.
JC-Delmas Apr 25, 2024
e034ff0
add fasta input as argument
JC-Delmas Apr 25, 2024
8469832
remove vcfs in the GERMLINE_VCFS_NORM process, replaced by germline_v…
JC-Delmas Apr 25, 2024
f2e082c
feat(CHANGELOG): update CHANGELOG
maxulysse Apr 25, 2024
176b50d
Merge branch 'dev' into improve_cloud_tests_matrix
maxulysse Apr 25, 2024
f338898
Update .devcontainer/devcontainer.json
maxulysse Apr 25, 2024
1275645
Rename awstest to cloudtest
adamrtalbot Apr 25, 2024
55e6839
Merge pull request #1482 from maxulysse/nf-prov
maxulysse Apr 25, 2024
752494f
Merge pull request #1378 from nf-core/improve_cloud_tests_matrix
maxulysse Apr 25, 2024
e885888
First modification to contribute to the bcftools/norm module in Sarek
JC-Delmas Apr 25, 2024
9e94a05
Changes in the GERMLINE_VCFS_NORM process
JC-Delmas Apr 25, 2024
2bdba7e
Add fasta argument to POST_VARIANTCALLING process.
JC-Delmas Apr 25, 2024
1214f10
add fasta input as argument
JC-Delmas Apr 25, 2024
b7ba4f2
remove vcfs in the GERMLINE_VCFS_NORM process, replaced by germline_v…
JC-Delmas Apr 25, 2024
34bf47b
Update workflows/sarek/main.nf
JC-Delmas Apr 25, 2024
6dff9af
Resolved merge conflict by keeping changes from branch 34bf47baa9d61f…
JC-Delmas Apr 30, 2024
d289261
Refactor normalization and concatenation of VCF files
JC-Delmas Apr 30, 2024
c78af62
Modify and adjust two scripts to add normalization and integrate FAST…
JC-Delmas May 16, 2024
85 changes: 41 additions & 44 deletions .github/workflows/cloudtest.yml
@@ -31,79 +31,76 @@ on:
default: true

jobs:
trigger-profile-test:
name: Run AWS tests
trigger-test:
name: launch
runs-on: ubuntu-latest
if: ${{ github.repository == 'nf-core/sarek' }}
strategy:
fail-fast: false
matrix:
include:
- profile: test_aws
enabled: ${{ ( github.repository == 'nf-core/sarek' ) && ( github.event_name != 'workflow_dispatch' || ( inputs.test && inputs.aws ) ) }}
test: test
cloud: aws
compute_env: TOWER_COMPUTE_ENV
workdir: TOWER_BUCKET_AWS
- profile: test_azure
enabled: ${{ ( github.repository == 'nf-core/sarek' ) && ( github.event_name != 'workflow_dispatch' || ( inputs.test && inputs.azure ) ) }}
test: test
cloud: azure
compute_env: TOWER_CE_AZURE_CPU
workdir: TOWER_BUCKET_AZURE
- profile: test_full_aws
enabled: ${{ ( github.repository == 'nf-core/sarek' ) && ( github.event_name != 'workflow_dispatch' || ( inputs.somatic && inputs.aws ) ) }}
test: somatic
cloud: aws
compute_env: TOWER_COMPUTE_ENV
workdir: TOWER_BUCKET_AWS
- profile: test_full_azure
enabled: ${{ ( github.repository == 'nf-core/sarek' ) && ( github.event_name != 'workflow_dispatch' || ( inputs.somatic && inputs.azure ) ) }}
test: somatic
cloud: azure
compute_env: TOWER_CE_AZURE_CPU
workdir: TOWER_BUCKET_AZURE
- profile: test_full_germline_aws
enabled: ${{ ( github.repository == 'nf-core/sarek' ) && ( github.event_name != 'workflow_dispatch' || ( inputs.germline && inputs.aws ) ) }}
test: germline
cloud: aws
compute_env: TOWER_COMPUTE_ENV
workdir: TOWER_BUCKET_AWS
- profile: test_full_germline_azure
enabled: ${{ ( github.repository == 'nf-core/sarek' ) && ( github.event_name != 'workflow_dispatch' || ( inputs.germline && inputs.azure ) ) }}
- profile: test_full_germline_ncbench_agilent_aws
enabled: ${{ ( github.repository == 'nf-core/sarek' ) && ( github.event_name != 'workflow_dispatch' || ( inputs.germline_ncbench_agilent && inputs.aws ) ) }}
test: germline
cloud: azure
compute_env: TOWER_CE_AZURE_CPU
workdir: TOWER_BUCKET_AZURE
- profile: test_full_germline_ncbench_agilent
test: germline_ncbench_agilent
cloud: aws
compute_env: TOWER_COMPUTE_ENV
workdir: TOWER_BUCKET_AWS

steps:
# Launch workflow on AWS Batch
- name: AWS Launch
- name: Launch
uses: seqeralabs/action-tower-launch@v2
if: ${{ matrix.enabled && ( github.event_name != 'workflow_dispatch' || inputs.aws ) }}
# If inputs item exists (i.e. workflow_dispatch), then we find matrix.test and check it is false
# If is false, we negate and run the job
if: ( !contains(inputs[matrix.test], 'false') && !contains(inputs[matrix.cloud], 'false') )
with:
run_name: sarek_${{ matrix.profile }}
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
compute_env: ${{ secrets[matrix.compute_env] }}
revision: ${{ github.sha }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/sarek/work-${{ github.sha }}/${{ matrix.profile }}
workdir: ${{ secrets[matrix.workdir] }}/work/sarek/work-${{ github.sha }}/${{ matrix.profile }}
parameters: |
{
"hook_url": "${{ secrets.MEGATESTS_ALERTS_SLACK_HOOK_URL }}",
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/sarek/results-${{ github.sha }}/${{ matrix.profile }}/"
"outdir": "${{ secrets[matrix.workdir] }}/sarek/results-${{ github.sha }}/${{ matrix.profile }}/"
}
profiles: ${{ matrix.profile }}

- uses: actions/upload-artifact@v3
name: Save AWS Logs
name: Save Logs
if: success() || failure()
with:
name: tower-aws-${{ matrix.profile }}-log
path: |
tower_action_*.log
tower_action_*.json

# Launch workflow using Tower CLI tool action
- name: Azure Launch
uses: seqeralabs/action-tower-launch@v2
if: ${{ matrix.enabled && ( github.event_name != 'workflow_dispatch' || inputs.azure ) }}
with:
run_name: sarek_${{ matrix.profile }}
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_CE_AZURE_CPU }}
revision: ${{ github.sha }}
workdir: ${{ secrets.TOWER_BUCKET_AZURE}}/sarek/work-${{ github.sha }}/${{ matrix.profile }}
parameters: |
{
"hook_url": "${{ secrets.MEGATESTS_ALERTS_SLACK_HOOK_URL }}",
"outdir": "${{ secrets.TOWER_BUCKET_AZURE }}/sarek/results-${{ github.sha }}/${{ matrix.profile }}/"
}
profiles: ${{ matrix.profile }}

- uses: actions/upload-artifact@v3
name: Save Azure Logs
if: success() || failure()
with:
name: tower-azure-${{ matrix.profile }}-log
name: tower-${{ matrix.profile }}-log
path: |
tower_action_*.log
tower_action_*.json
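
The comment above the new `if:` condition on the Launch step is terse, so here is a minimal sketch of how that expression appears to evaluate. It is plain Groovy, not part of the workflow, and rests on one assumption about GitHub Actions: `contains()` casts its first argument to a string (null becomes an empty string, booleans become `true`/`false`) and compares case-insensitively. The helper names `coerce`, `containsFalse`, and `shouldLaunch` are hypothetical.

```groovy
// Hypothetical model of:
//   !contains(inputs[matrix.test], 'false') && !contains(inputs[matrix.cloud], 'false')
// Assumption: contains() string-casts its first argument and ignores case.
String coerce(Object value) {
    value == null ? '' : value.toString()
}

boolean containsFalse(Object value) {
    coerce(value).toLowerCase().contains('false')
}

boolean shouldLaunch(Map inputs, Map matrix) {
    !containsFalse(inputs[matrix.test]) && !containsFalse(inputs[matrix.cloud])
}

def awsSomatic = [test: 'somatic', cloud: 'aws']

// Non-dispatch events carry no inputs, so nothing contains 'false' and the job runs.
assert shouldLaunch([:], awsSomatic)
// workflow_dispatch with both boxes ticked.
assert shouldLaunch([somatic: true, aws: true], awsSomatic)
// workflow_dispatch with the somatic box unticked.
assert !shouldLaunch([somatic: false, aws: true], awsSomatic)
// workflow_dispatch with the AWS box unticked.
assert !shouldLaunch([somatic: true, aws: false], awsSomatic)
```

Under that assumption, a matrix entry is skipped only when one of its two matching inputs is explicitly `false`. This would explain why the per-entry `enabled` expressions are no longer needed.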
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -12,9 +12,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed

- [#1477](https://github.com/nf-core/sarek/pull/1477) - Back to dev
- [#1482](https://github.com/nf-core/sarek/pull/1482) - Pin `nf-prov` plugin to `1.2.2`

### Fixed

- [#1378](https://github.com/nf-core/sarek/pull/1378) - Improve cloud tests launch workflow to use matrix

### Removed

### Dependencies
12 changes: 12 additions & 0 deletions conf/modules/post_variant_calling.config
@@ -16,6 +16,18 @@

process {

withName: 'GERMLINE_VCFS_NORM'{
ext.args = { [
'--multiallelics - both', //split multiallelic sites into biallelic records and both SNPs and indels should be merged separately into two records
'--rm-dup all' //output only the first instance of a record which is present multiple times
].join(' ') }
ext.when = { params.concatenate_vcfs }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/concat/${meta.id}/" }
]
}

withName: 'GERMLINE_VCFS_CONCAT'{
ext.args = { "-a" }
ext.when = { params.concatenate_vcfs }
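
One detail of the `GERMLINE_VCFS_NORM` arguments may be worth checking. The sketch below is plain Groovy that evaluates the same list-and-join outside Nextflow. It assumes the module pastes the resulting string unquoted into the command line (the usual nf-core `$args` convention; the module's `main.nf` is not rendered on this page), so `tokenize(' ')` is used as a rough stand-in for shell word splitting.

```groovy
// The ext.args list from the config, joined the same way.
def args = [
    '--multiallelics - both',
    '--rm-dup all'
].join(' ')
assert args == '--multiallelics - both --rm-dup all'

// Rough model of how the shell would split the string into words.
assert args.tokenize(' ') == ['--multiallelics', '-', 'both', '--rm-dup', 'all']

// Variant with the sign and the type written as one word.
def joined = ['--multiallelics -both', '--rm-dup all'].join(' ')
assert joined.tokenize(' ') == ['--multiallelics', '-both', '--rm-dup', 'all']
```

With the space, `both` becomes a separate word, which bcftools would probably treat as a positional argument and not as the type for `--multiallelics`. The bcftools manual writes the option as `-m, --multiallelics -|+[snps|indels|both|any]`, with sign and type together, so `-both` is likely the intended form.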
7 changes: 7 additions & 0 deletions modules/nf-core/bcftools/norm/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

60 changes: 60 additions & 0 deletions modules/nf-core/bcftools/norm/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

61 changes: 61 additions & 0 deletions modules/nf-core/bcftools/norm/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion nextflow.config
@@ -330,7 +330,7 @@ singularity.registry = 'quay.io'
// Nextflow plugins
plugins {
id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet
- id 'nf-prov' // Provenance reports for pipeline runs
+ id 'nf-prov@1.2.2' // Provenance reports for pipeline runs
}

// Load igenomes.config if required
3 changes: 2 additions & 1 deletion subworkflows/local/post_variantcalling/main.nf
Author

I tried a different approach, but now I get a new error:
Process NFCORE_SAREK:SAREK:POST_VARIANTCALLING:CONCATENATE_GERMLINE_VCFS:GERMLINE_VCFS_NORM declares 2 input channels but 1 were specified

-- Check script './workflows/sarek/../../subworkflows/local/post_variantcalling/../vcf_concatenate_germline/main.nf' at line: 27 or see '.nextflow.log' file for more details

I've never written Groovy before, so I think I've either broken everything or made it too complex ^^'
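
A possible reading of this error, offered as a sketch and not a confirmed diagnosis: Nextflow processes take their inputs positionally, whereas Groovy collects named arguments such as `vcf: ..., fasta: ...` into a single Map. A call written that way would hand the process one argument, which matches "declares 2 input channels but 1 were specified". The toy script below illustrates the positional form. `TWO_INPUT_SKETCH` and its inputs are hypothetical stand-ins, not the real bcftools/norm module. That module's `main.nf` is not rendered on this page, so its actual tuple shapes need to be checked against its `input:` block.

```groovy
// Toy stand-in for a process with two declared inputs (NOT the real module).
process TWO_INPUT_SKETCH {
    input:
    tuple val(meta), val(vcf)
    val fasta

    output:
    stdout

    script:
    """
    echo "${meta.id}: normalise ${vcf} against ${fasta}"
    """
}

workflow {
    // Illustrative values only; a real pipeline would pass file channels.
    vcf_ch   = Channel.of([[id: 'sample1'], 'sample1.vcf'])
    fasta_ch = Channel.value('genome.fasta')

    // Positional call: two arguments for two declared inputs.
    TWO_INPUT_SKETCH(vcf_ch, fasta_ch)
    TWO_INPUT_SKETCH.out.view()

    // Named form, left commented out: Groovy packs both entries into ONE Map,
    // so the process would see a single input.
    // TWO_INPUT_SKETCH(vcf: vcf_ch, fasta: fasta_ch)
}
```

If that reading is right, the call at line 27 of the subworkflow would take the two channels as plain positional arguments, in the order the module declares them.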

@@ -8,13 +8,14 @@ workflow POST_VARIANTCALLING {

take:
vcfs
+ fasta
concatenate_vcfs

main:
versions = Channel.empty()

if (concatenate_vcfs){
- CONCATENATE_GERMLINE_VCFS(vcfs)
+ CONCATENATE_GERMLINE_VCFS(vcfs, fasta)

vcfs = vcfs.mix(CONCATENATE_GERMLINE_VCFS.out.vcfs)
versions = versions.mix(CONCATENATE_GERMLINE_VCFS.out.versions)
41 changes: 29 additions & 12 deletions subworkflows/local/vcf_concatenate_germline/main.nf
@@ -1,42 +1,59 @@
//
// CONCATENATE Germline VCFs
//

// Concatenation of germline vcf-files
include { ADD_INFO_TO_VCF } from '../../../modules/local/add_info_to_vcf/main'
include { TABIX_BGZIPTABIX as TABIX_EXT_VCF } from '../../../modules/nf-core/tabix/bgziptabix/main'
include { BCFTOOLS_CONCAT as GERMLINE_VCFS_CONCAT } from '../../../modules/nf-core/bcftools/concat/main'
include { BCFTOOLS_SORT as GERMLINE_VCFS_CONCAT_SORT } from '../../../modules/nf-core/bcftools/sort/main'
include { TABIX_TABIX as TABIX_GERMLINE_VCFS_CONCAT_SORT } from '../../../modules/nf-core/tabix/tabix/main'
include { ADD_INFO_TO_VCF } from '../../../modules/local/add_info_to_vcf/main'
include { TABIX_BGZIPTABIX as TABIX_EXT_VCF } from '../../../modules/nf-core/tabix/bgziptabix/main'
include { BCFTOOLS_NORM as GERMLINE_VCFS_NORM } from '../../../modules/nf-core/bcftools/norm/main'
include { BCFTOOLS_CONCAT as GERMLINE_VCFS_CONCAT } from '../../../modules/nf-core/bcftools/concat/main'
include { BCFTOOLS_SORT as GERMLINE_VCFS_CONCAT_SORT } from '../../../modules/nf-core/bcftools/sort/main'
include { TABIX_TABIX as TABIX_GERMLINE_VCFS_CONCAT_SORT } from '../../../modules/nf-core/tabix/tabix/main'

workflow CONCATENATE_GERMLINE_VCFS {

take:
vcfs
fasta

main:
versions = Channel.empty()

// Concatenate vcf-files
// Add additional information to VCF files
ADD_INFO_TO_VCF(vcfs)

// Compress the VCF files with bgzip
TABIX_EXT_VCF(ADD_INFO_TO_VCF.out.vcf)

// Normalize the VCF files with BCFTOOLS_NORM
GERMLINE_VCFS_NORM(vcf: ADD_INFO_TO_VCF.out.vcf, fasta: fasta)

// Compress the normalized VCF files with bgzip
TABIX_EXT_VCF(GERMLINE_VCFS_NORM.out.vcf)

// Index the compressed normalized VCF files
TABIX_GERMLINE_VCFS_CONCAT_SORT(TABIX_EXT_VCF.out.gz)

// Gather vcfs and vcf-tbis for concatenating germline-vcfs
germline_vcfs_with_tbis = TABIX_EXT_VCF.out.gz_tbi.map{ meta, vcf, tbi -> [ meta.subMap('id'), vcf, tbi ] }.groupTuple()
germline_vcfs_with_tbis = TABIX_GERMLINE_VCFS_CONCAT_SORT.out.map { meta, vcf, tbi -> [meta.subMap('id'), vcf, tbi] }.groupTuple()

// Concatenate the VCF files
GERMLINE_VCFS_CONCAT(germline_vcfs_with_tbis)

// Sort the concatenated VCF files
GERMLINE_VCFS_CONCAT_SORT(GERMLINE_VCFS_CONCAT.out.vcf)

// Index the sorted concatenated VCF files
TABIX_GERMLINE_VCFS_CONCAT_SORT(GERMLINE_VCFS_CONCAT_SORT.out.vcf)

// Gather versions of all tools used
versions = versions.mix(ADD_INFO_TO_VCF.out.versions)
versions = versions.mix(TABIX_EXT_VCF.out.versions)
versions = versions.mix(GERMLINE_VCFS_NORM.out.versions)
versions = versions.mix(GERMLINE_VCFS_CONCAT.out.versions)
versions = versions.mix(GERMLINE_VCFS_CONCAT.out.versions)
versions = versions.mix(GERMLINE_VCFS_CONCAT_SORT.out.versions)
versions = versions.mix(TABIX_GERMLINE_VCFS_CONCAT_SORT.out.versions)

emit:
vcfs = germline_vcfs_with_tbis // post processed vcfs

vcfs = TABIX_GERMLINE_VCFS_CONCAT_SORT.out.gz_tbi // post-processed VCFs
versions // channel: [ versions.yml ]
}
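
The "Gather vcfs and vcf-tbis" step in this subworkflow relies on `map` followed by `groupTuple`. As a minimal sketch of what that pair does, the toy workflow below uses made-up sample names and plain strings in place of files. The `variantcaller` key and all file names are illustrative assumptions.

```groovy
workflow {
    Channel.of(
        [[id: 'sample1', variantcaller: 'deepvariant'],     's1.deepvariant.vcf.gz',     's1.deepvariant.vcf.gz.tbi'],
        [[id: 'sample1', variantcaller: 'haplotypecaller'], 's1.haplotypecaller.vcf.gz', 's1.haplotypecaller.vcf.gz.tbi'],
        [[id: 'sample2', variantcaller: 'deepvariant'],     's2.deepvariant.vcf.gz',     's2.deepvariant.vcf.gz.tbi']
    )
        // Reduce meta to the sample id so that records from different callers share a key.
        .map { meta, vcf, tbi -> [meta.subMap('id'), vcf, tbi] }
        // Group on the first element: one item per sample, shaped [meta, [vcfs], [tbis]].
        .groupTuple()
        .view()
}
```

Each emitted item then carries every VCF and index for one sample, which is the shape a concatenation step would consume. The order of files within a group is not guaranteed.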

3 changes: 2 additions & 1 deletion workflows/sarek/main.nf
@@ -794,7 +794,8 @@ workflow SAREK {

// POST VARIANTCALLING
POST_VARIANTCALLING(BAM_VARIANT_CALLING_GERMLINE_ALL.out.vcf_all,
- params.concatenate_vcfs)
+ fasta,
+ params.concatenate_vcfs)

// Gather vcf files for annotation and QC
vcf_to_annotate = Channel.empty()