Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concat gz files #338

Closed
wants to merge 1 commit into from
Closed

concat gz files #338

wants to merge 1 commit into from

Conversation

lindenb
Copy link

@lindenb lindenb commented Jan 29, 2020

Hi all, I'm passing by thanks to @apeltzer https://twitter.com/alex_peltzer/status/1222317924486127616 :-)

I cannot test the workflow for now (PR checklist...) but, for the line:

gzcat *.gz | gzip > out.gz

you don't need to gunzip and re-gzip the files. A simple cat works and will be faster. See https://stackoverflow.com/questions/8005114


Many thanks to contributing to nf-core/eager!

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/eager branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: https://github.com/nf-core/eager/tree/master/.github/CONTRIBUTING.md

Hi all, I'm passing by thanks to @apeltzer https://twitter.com/alex_peltzer/status/1222317924486127616 :-)

I cannot test the workflow but, for this line:

`gzcat *.gz | gzip > out.gz`

you don't need to gunzip and re-gzip the files. A simple `cat` works. See https://stackoverflow.com/questions/8005114
@jfy133
Copy link
Member

jfy133 commented Jan 29, 2020

Thanks @lindenb ! Very good point!

The current master is very 'old' now, and there is 'rebuilt' version in dev which is much more mature and should be being released in a couple of months.

That said, I see that zcat is still used there. I will close this PR, and feel free to make the changes to the dev branch if you have time.

Otherwise I'll add an issue and will make the swtich to cat myself when I have a moment.

@lindenb
Copy link
Author

lindenb commented Jan 29, 2020

Hi again,

sorry if it's not the best place for asking a question about this workflow.

For the line: https://github.com/nf-core/eager/blob/master/main.nf#L647 you're concatenating all the *.gz file and treat all the files as a single end file.

Wouldn't it be a better strategy to keep the unmerged paired files apart from the single ends and map them as real paired end data ?

@jfy133
Copy link
Member

jfy133 commented Jan 29, 2020

No problem.

In ancient DNA we typically have such short reads almost everything gets merged (>90%). The number of unmerged paired reads is so low it's just easier to lump them into one (so basically the same command as SE mapping), as we get very little extra info from having them unmerged and mapping separately.

We could consider having an option to map them separately if there are more use cases though.

@apeltzer can advise more on the reasoning behind this decision.

@lindenb
Copy link
Author

lindenb commented Jan 29, 2020

@jfy133 thanks !

@jfy133
Copy link
Member

jfy133 commented Jan 29, 2020

@lindenb there is also an EAGER channel on the nf-core slack if you prefer https://nf-co.re/join/slack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants