Add more entry points to the workflow #261

Open
ramprasadn opened this issue Nov 29, 2022 · 7 comments

@ramprasadn
Collaborator

Description of feature

It would be nice to have entry points for different parts of the pipeline, e.g. SNV/SV annotation and mitochondrial analysis.

@ramprasadn ramprasadn added the enhancement Improvement for existing functionality label Nov 29, 2022
@ramprasadn
Collaborator Author

Break this down into smaller issues:

  1. Start from duplicate-marked BAM files
  2. Start from variant-called VCFs

@fa2k
Contributor

fa2k commented May 30, 2023

I've made a draft version of this. This is just to have something concrete to look at - I don't think it's necessarily the right way, as I don't know the pipeline that well.

  • Add an input_type parameter to the workflow, which can either be reads for FASTQ (default) or alignments for BAM. In the future there could also be another value for VCF+BAM.
  • Add a column bam to the sample sheet. The CHECK_INPUT process is used to get the BAM and BAI files based on the sample sheet.
  • Add a test_bam config to test it
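
As a rough illustration of the first two bullets, the sketch below picks the required sample-sheet columns from an `input_type` value and checks a header against them. Only `input_type`, its values `reads` and `alignments`, and the `bam` column come from the comment above; the other column names (`sample`, `fastq_1`, `fastq_2`) and both function names are assumptions made for illustration, not the pipeline's actual schema.

```python
import csv
import io

# Hypothetical mapping from the proposed input_type values to required columns.
REQUIRED_COLUMNS = {
    "reads": ["sample", "fastq_1", "fastq_2"],
    "alignments": ["sample", "bam"],
}


def required_columns(input_type="reads"):
    """Return the columns a sample sheet must contain for the given input_type."""
    try:
        return REQUIRED_COLUMNS[input_type]
    except KeyError:
        raise ValueError(
            f"Unknown input_type {input_type!r}; expected one of {sorted(REQUIRED_COLUMNS)}"
        ) from None


def missing_columns(samplesheet_text, input_type="reads"):
    """Return required columns that are absent from the sample sheet header."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    header = reader.fieldnames or []
    return [col for col in required_columns(input_type) if col not in header]
```

A future VCF+BAM entry point would then only need one more entry in the mapping.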

Incomplete:

  • Modify SAMPLESHEET_CHECK and its script
  • The test BAM file and sample sheet are not yet uploaded to the test data repository

Problem:

  • BWAMEM2_MEM_MT crashes with '[E::bwa_set_rg] the read group line is not started with @RG'. (I don't know yet whether this is a serious problem with the approach or a trivial fix.)
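
For context, bwa and bwa-mem2 print this error when the string passed with `-R` does not begin with `@RG`. The sketch below builds a well-formed read group string. The helper name, the choice of fields and the default platform value are hypothetical, and whether a missing or empty read group is the actual cause of the crash is not established in this thread.

```python
def read_group_line(rg_id, sample, platform="ILLUMINA"):
    """Build an @RG string in the form accepted by bwa's -R option.

    Fields are joined with a literal backslash-t, which bwa converts to tabs.
    """
    for name, value in (("ID", rg_id), ("SM", sample), ("PL", platform)):
        if not value:
            raise ValueError(f"Read group field {name} must not be empty")
    return "\\t".join(["@RG", f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}"])
```

Failing early on an empty field gives a clearer message than the aligner's own error when a BAM-derived sample has no read group metadata.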

See changes in my fork fa2k@ff78904

@fa2k
Contributor

fa2k commented Jun 5, 2023

I've made it run with both the existing test and a new test for bam input (and cleaned up a bit).

The test outputs are not identical, but I've checked two vcf files:

annotate_snv/justhusky_rohann_vcfanno_filter_vep.vcf: Identical up to different timestamps in headers
annotate_sv/justhusky_svdbquery_vep.vcf: Unknown differences
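
One way to make the "identical up to header timestamps" check repeatable is to compare only the lines outside the `##` meta-information header. This is a minimal sketch, assuming uncompressed VCF text and assuming that every difference of no interest sits in a `##` line; the function names are made up for illustration.

```python
def vcf_body(lines):
    """Drop '##' meta-information lines; keep the #CHROM header and records."""
    return [line.rstrip("\n") for line in lines if not line.startswith("##")]


def same_vcf_body(path_a, path_b):
    """Return True if two uncompressed VCF files match outside their '##' lines."""
    with open(path_a) as a, open(path_b) as b:
        return vcf_body(a) == vcf_body(b)
```

For a file such as the annotate_sv output, a line-by-line diff of the two `vcf_body` results would show where the remaining differences are.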

In check_samplesheet.py I made a polymorphic RowChecker. It's a bit strange, and we can consider alternatives. Overall, here are the changes compared to the dev branch:

dev...fa2k:raredisease:multiple-entry-points

The test_bam profile requires an override sample sheet and needs the BAM file to exist locally.
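
To make the polymorphic RowChecker idea concrete, here is one possible shape for it. This is a sketch and not the code in the fork: the subclass names, the `make_checker` helper, the column names (`sample`, `fastq_1`, `bam`) and the accepted file extensions are all assumptions.

```python
class RowChecker:
    """Base checker: validates fields shared by every input type."""

    def validate(self, row):
        if not row.get("sample"):
            raise ValueError("Sample name is required")
        self.validate_files(row)
        return row

    def validate_files(self, row):
        raise NotImplementedError


class FastqRowChecker(RowChecker):
    """Checker for FASTQ input (hypothetical column name fastq_1)."""

    def validate_files(self, row):
        if not row.get("fastq_1", "").endswith((".fastq.gz", ".fq.gz")):
            raise ValueError("fastq_1 must end with .fastq.gz or .fq.gz")


class BamRowChecker(RowChecker):
    """Checker for BAM input, using the proposed bam column."""

    def validate_files(self, row):
        if not row.get("bam", "").endswith(".bam"):
            raise ValueError("bam must end with .bam")


def make_checker(input_type):
    """Pick the checker subclass matching the proposed input_type value."""
    checkers = {"reads": FastqRowChecker, "alignments": BamRowChecker}
    return checkers[input_type]()
```

An alternative to subclassing would be a single checker driven by a per-input-type table of required columns, which may be simpler if the checks stay this small.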

@fa2k
Contributor

fa2k commented Jul 20, 2023

I have updated to integrate the upstream changes from dev. Will make a pull request.

@Jakob37
Contributor

Jakob37 commented Dec 21, 2023

Any updates on this feature? Starting from bam-files would be extremely useful.

@fa2k
Contributor

fa2k commented Jan 2, 2024

> Any updates on this feature? Starting from bam-files would be extremely useful.

Sorry for the late reply. As far as I know, no work has been started on this, and I'm unlikely to start in the near future. I still need it, and will start eventually if nobody else takes it.

@maxulysse
Member

I'd happily share the logic we have in Sarek for this, and really we should converge on more shared subworkflows and bits of code for this kind of thing.

@jemten jemten modified the milestones: Release 1.2.0, Release 1.3.0 Jan 15, 2024