Major changes to filter_reads #141
* Renamed to subset_reads, which more closely reflects its actual function
* Now ignores taxonomic assignments not passing filter
* Added --include-lowconf, which allows users to include taxonomic assignments not passing filter
* Modified test TSV to make some taxonomic assignments fail filter
* Recalculated test files and hashes manually
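The read-selection rules in the commit message above (drop assignments that fail the filter unless --include-lowconf is given, optionally invert with --exclude-reads) can be sketched roughly as below. This is a hypothetical illustration, not the code in this PR: the column names `Header`, `Tax ID` and `Passed Filter` (with `T`/`F` values) and the function name `select_read_ids` are assumptions, and --with-children is left out because it needs a taxonomy tree.

```python
import csv


def select_read_ids(tsv_lines, tax_ids, include_lowconf=False, exclude_reads=False):
    """Return the set of read IDs to write to the subset output.

    Hypothetical sketch: assumes a tab-separated file with "Header",
    "Tax ID" and "Passed Filter" (T/F) columns.
    """
    keep = set()
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        passed = row["Passed Filter"] == "T"
        # Assignments failing the filter are ignored unless include_lowconf is set
        matches = row["Tax ID"] in tax_ids and (passed or include_lowconf)
        # exclude_reads inverts the selection: keep everything that did not match
        if matches != exclude_reads:
            keep.add(row["Header"])
    return keep
```

In this sketch a low-confidence read assigned to a requested taxon counts as "not matching", so with exclude_reads it is kept; that is one possible choice, and the PR's tests are the authority on the intended behavior.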
* Switch from FASTXIterator to skbio.io (slow)
* Passing --no-validate only verifies reads begin with '@'
* Restructure iterators to reduce statements in innermost loop
* Add support for bzip2-compressed FASTX files
* Catch mismatch of TSV and FASTX file lengths
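Three of the items above (bzip2 support, the minimal '@' check used with --no-validate, and detecting a TSV/FASTX length mismatch) can be illustrated with the standard library alone. This is a hypothetical sketch with made-up helper names (`open_fastx`, `iter_fastq`, `pair_with_tsv`); the PR itself reads records through skbio.io, which is not shown here.

```python
import bz2
import gzip
import itertools


def open_fastx(path):
    """Open a plain, gzip- or bzip2-compressed FASTX file in text mode (by extension)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)


def iter_fastq(lines):
    """Yield 4-line FASTQ records, only checking that each header begins with '@'."""
    lines = iter(lines)
    while True:
        header = next(lines, "")
        if not header:
            return  # end of input
        if not header.startswith("@"):
            raise ValueError("FASTQ record does not begin with '@': %r" % header)
        # Sequence, '+' separator and quality lines are passed through unchecked
        yield [header] + [next(lines, "") for _ in range(3)]


def pair_with_tsv(records, tsv_rows):
    """Walk FASTX records and TSV rows in lockstep, failing on a length mismatch."""
    missing = object()  # sentinel that cannot collide with real data
    for record, row in itertools.zip_longest(records, tsv_rows, fillvalue=missing):
        if record is missing or row is missing:
            raise ValueError("TSV and FASTX files contain different numbers of records")
        yield record, row
```

Using `zip_longest` with a sentinel, rather than plain `zip`, is what makes the mismatch detectable: `zip` would silently stop at the shorter input.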
Let's also alias this to filter_reads with a deprecation note. We've sent that command to users.
We should also update our canned Intercom response using this.
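One way to keep the old command working while steering users to the new name is a thin wrapper that prints a deprecation note and delegates. The sketch below is hypothetical: the function signatures and message text are placeholders, not this project's code. In the actual click CLI the note would more naturally be printed with `click.echo(..., err=True)`.

```python
import sys

DEPRECATION_NOTE = "filter_reads is deprecated and will be removed; use subset_reads instead."


def subset_reads(tsv_path, fastx_path, **options):
    """Hypothetical stand-in for the renamed command's implementation."""
    return {"tsv": tsv_path, "fastx": fastx_path, **options}


def filter_reads(tsv_path, fastx_path, **options):
    """Deprecated alias, kept so commands already sent to users keep working."""
    # Write the note to stderr so it does not mix with any output on stdout
    print(DEPRECATION_NOTE, file=sys.stderr)
    return subset_reads(tsv_path, fastx_path, **options)
```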
(False, False, True, False),  # --with-children
(False, False, True, False),  # --exclude-reads
(False, False, True, True),  # --with-children --exclude-reads
(False, False, False, False, False),
This is insane, but OK 😱
It's just well-tested.
onecodex/scripts/subset_reads.py
@click.option('--split-pairs', default=False, is_flag=True,
              help='By default, if either read in a pair matches, both will match. Choose this '
                   'option to consider each paired-end read separately. Resulting files may *not* '
                   'have the same number of reads!')
Let's call it --subset-pairs-independently and use the following help message:
By default, if either read in a pair matches, both will be retained in the subset file. With this option, R1 and R2 files will be evaluated independently. Note that the subset output FASTQs are *not* guaranteed to have the same number of reads!
Changed as requested.
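The pairing behavior described in the review (keep both mates if either matches, unless the pairs are subset independently) reduces to a small decision per pair. This is a hypothetical sketch; `subset_pair` and `count_kept` are illustrative names, not functions from subset_reads.py.

```python
def subset_pair(r1_matches, r2_matches, subset_pairs_independently=False):
    """Return (keep_r1, keep_r2) for one read pair."""
    if subset_pairs_independently:
        # Each mate is judged on its own; R1 and R2 outputs may differ in length
        return r1_matches, r2_matches
    # Default: if either mate matches, keep both so the output files stay in sync
    either = r1_matches or r2_matches
    return either, either


def count_kept(pair_matches, subset_pairs_independently=False):
    """Count how many reads would be written to the R1 and R2 subset files."""
    kept_r1 = kept_r2 = 0
    for r1, r2 in pair_matches:
        keep_r1, keep_r2 = subset_pair(r1, r2, subset_pairs_independently)
        kept_r1 += keep_r1
        kept_r2 += keep_r2
    return kept_r1, kept_r2
```

For the pairs `[(True, False), (False, False), (True, True)]`, the default mode keeps two reads in each file, while independent mode keeps two R1 reads but only one R2 read. This is why the help text warns that the outputs are not guaranteed to have the same number of reads.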
This pull request addresses #79 by:
In addition, this PR might break existing code by: