nf-test quantify pseudoalignment #1246

Merged: 13 commits into nf-core:dev on Mar 12, 2024

Contributor

@adamrtalbot adamrtalbot commented Mar 8, 2024

Draft PR for an nf-test for QUANTIFY_PSEUDO_ALIGNMENT

Problems:

  • Segmentation fault when running SALMON_QUANT
  • Incompatible files when running SE_GENE_LENGTH_SCALED

All fixed!

This changes one global parameter when using Salmon, which means it now keeps duplicate transcripts (i.e. the same sequence, not the same transcript ID).

The rest is pretty straightforward testing.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Mar 8, 2024

This PR is against the master branch ❌

  • Do not close this PR
  • Click Edit and change the base to dev
  • This CI test will remain failed until you push a new commit

Hi @adamrtalbot,

It looks like this pull-request has been made against the adamrtalbot/rnaseq master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the adamrtalbot/rnaseq dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

@adamrtalbot adamrtalbot changed the base branch from master to dev March 8, 2024 11:50

github-actions bot commented Mar 8, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 35ff7ad

  • ✅ 169 tests passed
  • ❔ 8 tests were ignored
  • ❗ 7 tests had warnings

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

  • files_exist - File is ignored: conf/modules.config
  • nextflow_config - Config default ignored: params.ribo_database_manifest
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
  • multiqc_config - 'assets/multiqc_config.yml' not found

✅ Tests passed:

Run details

  • nf-core/tools version 2.13.1
  • Run at 2024-03-12 11:40:00

adamrtalbot and others added 6 commits March 8, 2024 11:51
Fixes:
 - a single sample will now work and not raise an error when trying to parse the count data
 - duplicate transcripts now work with Salmon (previously it dropped one of them)
@adamrtalbot adamrtalbot requested review from maxulysse, drpatelh and pinin4fjords and removed request for maxulysse March 8, 2024 18:02
@adamrtalbot
Contributor Author

@drpatelh this changes the behaviour of Salmon when dealing with duplicate transcripts. Previously, it combined any transcripts with the same sequence. Now, it will keep all transcripts with the same sequence, which will affect mainly alternate haplotypes but I think it's the right™️ option? Want to check if you think this behaviour is OK before we merge.
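
To make "duplicate transcripts" concrete, here is a minimal Python sketch that groups transcript IDs sharing an identical sequence in a FASTA file. It is an illustration only, not part of the pipeline: the function names `read_fasta` and `duplicate_groups` and the example records are hypothetical, and it does not reproduce Salmon's own duplicate handling.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into a list of (transcript_id, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first whitespace-delimited token as the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def duplicate_groups(records):
    """Group transcript IDs that share an identical sequence."""
    by_seq = defaultdict(list)
    for name, seq in records:
        by_seq[seq].append(name)
    # Only sequences seen under more than one ID count as duplicates.
    return [ids for ids in by_seq.values() if len(ids) > 1]


# Hypothetical example: tx1 and tx2 have different IDs but the same sequence.
example = """>tx1 geneA
ACGTACGT
>tx2 geneA_alt_haplotype
ACGTACGT
>tx3 geneB
TTTTCCCC
"""

groups = duplicate_groups(read_fasta(example))
print(groups)
```

Any group reported here is a set of IDs that would be affected by the choice between collapsing and keeping duplicates.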

@@ -39,7 +39,8 @@ process {
     withName: 'SALMON_INDEX' {
         ext.args = { [
             params.gencode ? '--gencode' : '',
-            params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": ''
+            params.pseudo_aligner_kmer_size ? "-k ${params.pseudo_aligner_kmer_size}": '',
+            '--keepDuplicates'
Member

@drpatelh drpatelh Mar 12, 2024

Not sure what sort of behaviour this is going to have @adamrtalbot so a bit reluctant to add it in without some proper testing. Did we need to add it to fix something else? Otherwise maybe we create an issue outlining why we need it in and do a proper assessment before adding to the pipeline.

Contributor Author

We discovered it while fixing the tests. When two transcripts have identical sequences, Salmon will just drop one of them. Downstream, the tools for matching transcripts/genes/names then raise an error because some of the data is missing. This flag disables that behaviour and makes Salmon match STAR-RSEM, Kallisto etc.

@pinin4fjords has fixed as many downstream problems as he can, but silently dropping transcripts feels like the wrong behaviour in the first place.
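
The downstream failure described above comes down to the annotation listing transcripts that have no row in the quantification output. A minimal Python sketch of that check follows; the function `missing_transcripts` and the example IDs are hypothetical, and it does not show how the pipeline or summarizedexperiment actually handles the gap.

```python
def missing_transcripts(tx2gene, quantified_ids):
    """Return annotation transcript IDs absent from the quantification table, sorted."""
    return sorted(set(tx2gene) - set(quantified_ids))


# Hypothetical transcript-to-gene mapping taken from the annotation.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

# Hypothetical quantification output in which tx2 was dropped as a duplicate of tx1.
quantified = ["tx1", "tx3"]

missing = missing_transcripts(tx2gene, quantified)
print(missing)
```

A non-empty result is the situation in which a strict join between annotation and counts would fail.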

Contributor Author

Reverted SALMON_INDEX so it no longer keeps duplicates, and raised this issue to discuss further: #1259

When combined with the fix for the module, the tests should pass now (famous last words).

Changes:
 - SALMON_INDEX will keep duplicates
 - summarizedexperiment will handle the missing transcripts
 - Version numbers checked in QUANTIFY_PSEUDO_ALIGNMENT subworkflow
Member

@pinin4fjords pinin4fjords left a comment

Looks like you got it now :-)

Member

@maxulysse maxulysse left a comment

LGTM

@adamrtalbot adamrtalbot merged commit e93915d into nf-core:dev Mar 12, 2024
30 checks passed
Successfully merging this pull request may close these issues.

Add nf-test to QUANTIFY_PSEUDO_ALIGNMENT subworkflow
4 participants