Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use multiple genomes and annotation for transcriptome mapping #46

Merged
merged 64 commits into from
Dec 11, 2019

Conversation

drpatelh
Copy link
Member

@drpatelh drpatelh commented Dec 9, 2019

Many thanks for contributing to nf-core/nanoseq!

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/nanoseq branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: https://github.com/nf-core/nanoseq/tree/master/.github/CONTRIBUTING.md

Im not going to have much time to work on this for the rest of the week because Ive spent the past few days pretty much glued to getting this solved. Some things left to do:

  • There still may be some bugs in the logic so it will need extensive testing with different entries for genome and transcriptome, and by using the different --skip flags to see if the channels are all defined properly.
  • Tests will also fail because the samplesheet.csv on test-datasets doesnt have a transcriptome entry
  • Add documentation.

If someone fancies addressing the remaining points then its probably wise to update the samplesheet.csv on test-datasets, re-trigger the Travis tests here, merge after passing (hopefully will work first time, if not I can push a commit or two if required) and then some heavy testing 👍 😅

Copy link
Contributor

@lwratten lwratten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good! 👍
Very good timing too! I just started implementing this yesterday - you really saved me a whole lot of headache with this one so thank you very much! 😁
Just a few things here and there from me that might make it a bit more robust/speed up certain processes.

Also was thinking it might be good to alter the output sub directories to reflect whether a sample was aligned to genome or txome - as this will not show up in the file name. What do you reckon?

Also, I'll go through and change the samplesheet now so we can get tests passing. Happy to implement some of the suggestions below too after a merge!

bin/check_samplesheet.py Outdated Show resolved Hide resolved
bin/check_samplesheet.py Outdated Show resolved Hide resolved
bin/check_samplesheet.py Outdated Show resolved Hide resolved
bin/check_samplesheet.py Outdated Show resolved Hide resolved
bin/check_samplesheet.py Show resolved Hide resolved
main.nf Outdated
Comment on lines 491 to 498
// def fix_channel(ArrayList ch) {
//
// if (ch.size() == 7) {
// return [ file(ch[0]), file(ch[1]), ch[2], ch[5], ch[6] ] // [ fasta, gtf, bed, sizes, is_transcripts ]
// } else if (ch.size == 5) {
// return [ file(ch[0]), false, false, ch[3], ch[4] ]
// }
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should have been removed 🤔

main.nf Show resolved Hide resolved
main.nf Show resolved Hide resolved
main.nf Show resolved Hide resolved
@drpatelh
Copy link
Member Author

drpatelh commented Dec 10, 2019

Ok. Ive tested all of the commands in .travis.yml and its working now.

Thanks for adding the transcriptome entry to the test-datasets @lwratten ! I was sitting here thinking the tests should be failing until I looked 😄 We still do need to resolve #37 too.

I dont think we should complicate things by having genome/transcriptome in the file/directory name. It seems tempting but everything should be stored in the samplesheet.csv. Ive tried to factor in as many possibilities as I can but its most likely that people will run the pipeline on a single reference genome/transcriptome. I think properly documenting the exact behaviour of the pipeline is probably more important. Also, please do test the pipeline by using separate genome/transcriptome entries in different formats to see if we are getting the expected behaviour.

@drpatelh
Copy link
Member Author

Would be good if you can double-check the logic I have used for minimap2, graphmap2 for DNA, RNA etc.

Copy link
Contributor

@lwratten lwratten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic for minimap2 and graphmap2 call look good!
Will have a solid test with lots of different references formats and combos once we're merged! 👍

@drpatelh drpatelh merged commit bb81383 into nf-core:dev Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants