[18.01] Add queryname sorted and input sorted datatypes #5589

mvdbeek commented Feb 22, 2018

The queryname sorted datatype (BamQuerynameSorted) ensures that the file is
queryname ordered. The BamInputSorted datatype can be used to describe the
output of aligners, which usually keep mate pairs adjacent. It allows using
tools that require mates to be adjacent without an explicit sorting step.
This can reduce the time and space required by duplicate marking tools, Hi-C
tools and structural variant detection tools that need mates to be grouped
together in an alignment file.
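To illustrate the difference between the two orderings, here is a minimal sketch that looks only at read names in file order. It assumes one record per mate (no secondary or supplementary alignments) and uses plain string comparison for queryname order, which is a simplification: `samtools sort -n` uses a natural alphanumeric ordering. The function names are hypothetical and not part of Galaxy.

```python
from itertools import groupby
from typing import List


def mates_adjacent(qnames: List[str]) -> bool:
    """True if every read name occurs in a single contiguous run.

    This is the property BamInputSorted describes: pairs may appear in any
    order relative to each other, but records sharing a name are grouped.
    """
    seen = set()
    for name, _run in groupby(qnames):
        if name in seen:
            # The name reappears after a different name, so mates are split.
            return False
        seen.add(name)
    return True


def queryname_sorted(qnames: List[str]) -> bool:
    """True if names are in non-decreasing order (simplified lexicographic
    comparison, not samtools' natural ordering)."""
    return all(a <= b for a, b in zip(qnames, qnames[1:]))


aligner_output = ["r7", "r7", "r2", "r2", "r9", "r9"]     # pairs kept together
name_sorted = ["r2", "r2", "r7", "r7", "r9", "r9"]        # ordered by name
coordinate_sorted = ["r7", "r2", "r9", "r7", "r2", "r9"]  # mates split apart

print(mates_adjacent(aligner_output), queryname_sorted(aligner_output))        # True False
print(mates_adjacent(name_sorted), queryname_sorted(name_sorted))              # True True
print(mates_adjacent(coordinate_sorted), queryname_sorted(coordinate_sorted))  # False False
```

Any list that passes `queryname_sorted` also passes `mates_adjacent`, because sorting places equal names next to each other; the reverse does not hold, as the aligner-style example shows.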

The BamQuerynameSorted datatype has a converter that works for all datatypes
that can be consumed by samtools. There is no converter for BamInputSorted,
since BamQuerynameSorted is valid input for tools requiring BamInputSorted
input.
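One way to model "BamQuerynameSorted is valid input wherever BamInputSorted is required" is subclassing, so a check against the required datatype also accepts the stricter one. The sketch below uses hypothetical stand-in classes (not Galaxy's actual datatype classes) and only builds, without running, a `samtools sort -n` command line as an assumed stand-in for the converter; the real converter is a Galaxy tool wrapper whose exact command may differ.

```python
from typing import List, Optional, Type


class BamNative:
    """Stand-in: a BAM file with no sort-order guarantee."""


class BamInputSorted(BamNative):
    """Stand-in: records sharing a read name are adjacent."""


class BamQuerynameSorted(BamInputSorted):
    """Stand-in: records are ordered by read name (implies adjacency)."""


def resolve(required: Type[BamNative], provided: Type[BamNative],
            src: str, dest: str) -> Optional[List[str]]:
    """Return None if `provided` already satisfies `required`; otherwise
    return a command converting to BamQuerynameSorted, which satisfies
    both BamInputSorted and BamQuerynameSorted requirements."""
    if issubclass(provided, required):
        return None  # no conversion needed
    if issubclass(BamQuerynameSorted, required):
        # samtools sort -n orders records by read name; -o names the output.
        return ["samtools", "sort", "-n", "-o", dest, src]
    raise ValueError(f"no conversion path to {required.__name__}")


print(resolve(BamInputSorted, BamQuerynameSorted, "in.bam", "out.bam"))  # None
print(resolve(BamInputSorted, BamNative, "in.bam", "out.bam"))
# ['samtools', 'sort', '-n', '-o', 'out.bam', 'in.bam']
```

With this layout, a single queryname-sort conversion covers both stricter requirements, which mirrors why no separate BamInputSorted converter is needed.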

I'm targeting 18.01 because I think it would be very unfortunate to start
annotating aligners with the very loose BamNative datatype, which makes no
promises about sort order at all.

This closes #5497, goes a long way towards addressing the problems mentioned in
#5496, and would improve galaxyproject/tools-iuc#1732 and galaxyproject/tools-iuc#1591.

mvdbeek force-pushed the mvdbeek:samtools_queryname_and_input_order_datatypes branch from 0fd7253 to bb19c7e Feb 22, 2018

bgruening commented Feb 23, 2018

Cool, thanks a lot @mvdbeek! 👍 from my side.

martenson added this to the 18.01 milestone Feb 26, 2018

jmchilton merged commit 77d085a into galaxyproject:release_18.01 Mar 1, 2018

6 checks passed

api test: Build finished. 351 tests run, 4 skipped, 0 failed.
continuous-integration/travis-ci/pr: The Travis CI build passed.
framework test: Build finished. 173 tests run, 0 skipped, 0 failed.
integration test: Build finished. 79 tests run, 4 skipped, 0 failed.
selenium test: Build finished. 118 tests run, 2 skipped, 0 failed.
toolshed test: Build finished. 577 tests run, 0 skipped, 0 failed.

jmchilton commented Mar 1, 2018

Great, thanks a bunch @mvdbeek!

jmchilton commented Mar 9, 2018

Is it odd that we call these "xxx.bam" but call BAM native "bam_native"? Should we make a last-minute change and call bam_native native.bam instead?

mvdbeek commented Mar 9, 2018

I think that would be good, yes. Maybe even unsorted.bam?

mvdbeek deleted the mvdbeek:samtools_queryname_and_input_order_datatypes branch Jun 12, 2018
