bcftools mpileup #414
This is the beginning of bringing @pd3 bcftools mpileup changes over. Have started from scratch by bringing over the mpileup related files from current samtools develop and adding changes on top of that (current state is essentially a redo of pd3/bcftools@d7a93f5 and the non-gvcf part of pd3/bcftools@cf3219c). Before bringing in the new things it would be good if people see if everything looks okay. Candidates for removal:
Candidates for updates to default:
There are basically 3 features on the @pd3 fork to be added in after this.
putc(c, fp);
} else putc(p->is_refskip? (bam_is_rev(p->b)? '<' : '>') : '*', fp);
if (p->indel > 0) {
    putc('+', fp); printw(p->indel, fp);
Can also remove `printw()`, which is only used here in `pileup_seq()`.
My thoughts. In general, unless it can be kept 100% identical (which for good reasons it cannot), we should make a clean break and feel free to change whatever we wish. Our end goal should be a command within bcftools that is internally consistent with other bcftools commands. It's preferable for this to happen in one step rather than spread across two releases that each change behaviour. The caveat is that it obviously needs careful and explicit documentation outlining what has changed and what the mapping from old to new (if one exists) is.
Other things I don't really have an opinion about.
Thanks James. I agree with all that. One other change in the name of consistency within bcftools should probably be to drop the
On Wed, May 25, 2016 at 03:53:00AM -0700, Shane McCarthy wrote:
Rather confusingly this is -b BED-file in samtools depth and -l in
It took me some time to parse the double negative and lack of
Maybe reword it as "Only output specified positions (chr pos) or regions (BED)".

James

James Bonfield (jkb@sanger.ac.uk)
bf1c255 to cddc380
OK. I've added three more commits. First one removes the easily deprecated options. The second introduces the The final sync-up for mpileup with the @pd3 branch is pd3/bcftools@cfd7cf9, which is waiting on the decision on the htslib regidx changes. In terms of further possible option rearrangements:
7b48fd6 to 79a2317
* prefix with ^ to negate the selection
* assign/rename samples by providing second field:
    RG_ID_1 SAMPLE_A
    RG_ID_2 SAMPLE_A
    RG_ID_3 SAMPLE_B
* on read group name conflict give the alignment file, asterisk for all reads in the file:
    RG_ID_1 FILE_1.bam SAMPLE_A
    RG_ID_2 FILE_2.bam SAMPLE_A
    * FILE_3.bam SAMPLE_C

Resolves 4th item in #414 (comment) and samtools/samtools#324.
This branch is ready to be merged now. We should review docs and consider further changes once people have tried out the functionality on develop.
Not functional yet. Just copying over the files.
* bam_plcmd.c for the mpileup command
* bam2bcf.[ch] bam2bcf_indel.c for the VCF/BCF creation
* sample.[ch] for RG:SM handling
minimal changes to files copied from samtools in order to compile in bcftools.
* use regidx from htslib rather than bedidx from samtools
* remove sam_opts calls as sam_opts.h not copied from samtools

Todo:
* copy over relevant functionality from sam_opts.h
* remove text based mpileup output
* update options and defaults
* bring over `--gvcf` and other changes from @pd3 fork
* deprecate `-g -v -u` options (still functional, but with warning)
* exit with message to use `samtools mpileup` if `-s/--output-MQ` used
* `-O` option was `--output-BP` and is now `--output-type` for consistency with other bcftools commands. Will warn if the argument is not one of `[buzv]`

These catches for old text output options are probably not necessary as users may not expect text output from `bcftools mpileup`.
* mpileup.1.out, mpileup.2.out, mpileup.3.out and mpileup.4.out are from samtools, with mpileup.1.out and mpileup.3.out converted from the text output
* mpileup.5.out is a new test with the newer AD, etc. annotations
* sam/bam/cram test files are all stored. Perhaps there is some way to store one version and convert within the test, à la the vcf-miniview in samtools?
adds to @4e7c8fb86349761fed1b290357dbc792222ecdcb
* remove deprecated `-g`, `-v`, `-u`, `-D`, `-V`, `-S`
* remove `-R` short option to make way for `--regions-file` option later
This commit brings over the `--gvcf` functionality from @pd3's branch, consisting of relevant bits from pd3/bcftools@cf3219c and pd3/bcftools@ee8210d.

Reference-only blocks will be merged into gVCF blocks when the minimum per-sample depth falls in the intervals defined by the argument to the `--gvcf` option. Documentation added to explain the merging and a test added.
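The merging rule can be pictured with a small sketch. Everything here is an assumption for illustration only: the helper names `depth_interval` and `count_blocks` are hypothetical, the interval argument is taken to be a list of ascending lower bounds, and sites below the first bound are assumed to stay unmerged. It is not the code brought over from the fork.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the bcftools implementation: map a site's
 * minimum per-sample depth to the index of the depth interval it falls in.
 * `bounds` holds ascending lower bounds, e.g. {5, 15} for the intervals
 * [5,15) and [15,inf). Returns -1 when the depth is below the first bound. */
static int depth_interval(int min_dp, const int *bounds, size_t n)
{
    int idx = -1;
    assert(bounds != NULL || n == 0);
    for (size_t i = 0; i < n && min_dp >= bounds[i]; i++)
        idx = (int)i;
    return idx;
}

/* Count the records produced when consecutive reference-only sites that
 * share a depth interval are merged into one block. Assumption: a site
 * below the first bound is never merged and counts as its own record. */
static size_t count_blocks(const int *min_dp, size_t nsites,
                           const int *bounds, size_t nbounds)
{
    size_t nblocks = 0;
    int prev = -1;
    for (size_t i = 0; i < nsites; i++) {
        int cur = depth_interval(min_dp[i], bounds, nbounds);
        if (cur < 0 || cur != prev)
            nblocks++;              /* start a new block or lone record */
        prev = cur;
    }
    return nblocks;
}
```

With bounds `{5, 15}`, a run of sites with minimum depths 6 and 7 would share one block, while a following site with depth 20 would open a new one.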
pulling over of pd3/bcftools@cf5c354, adding in the `-S,--samples-file` option and exiting if no samples are read from the file or list

TODO: add exclude logic with `^` prefix as in other bcftools commands

removed `config.h` from `sample.c` as a leftover: it is in samtools, but not in bcftools at the moment
switch `-t/--output-tags` option to `-a/--annotate` to make room for the `-t/--targets` option

available annotations are now listed on request with `-a ?` rather than cluttering up the help output.
This is meant as a temporary change while we extend the regidx api, but allow bcftools code to use these changes before they appear in some form in htslib. This commit does not add new features, just copies over `regidx.[ch]` and rejiggers the linking to use these local bcftools copies. the `*_c` are removed due to relying on `hts_internal.h` (see fc9aeb6f77668afed412119701c5c58b0fca8091)
* added functions to loop over all regions
* lazy index build in case random access is not required
* support for chromosome names only, beg-end coordinates not mandatory
* cap the maximum coordinate at 2147483647, as hts_itr does not support larger
* tab and reg parsers will throw on finding a `0` to catch user error of using 0-based rather than 1-based coords
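A rough sketch of the coordinate checks listed above follows. The function names `parse_coord` and `parse_region` are hypothetical and this is not the regidx API; it only accepts `chr` and `chr:beg-end`, assumes 1-based inclusive coordinates, and does not handle chromosome names that contain a colon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COOR 2147483647LL   /* cap named in the commit message */

/* Parse one 1-based coordinate; reject 0 and negatives, cap large values. */
static int parse_coord(const char *s, char **rest, long long *out)
{
    char *tmp;
    long long v = strtoll(s, &tmp, 10);
    if (tmp == s || v <= 0) return -1;   /* 0 suggests 0-based input */
    if (v > MAX_COOR) v = MAX_COOR;
    *out = v;
    *rest = tmp;
    return 0;
}

/* Hypothetical parser for "chr" or "chr:beg-end".
 * Returns 0 on success and -1 on malformed input. */
static int parse_region(const char *str, char *chr, size_t chr_len,
                        long long *beg, long long *end)
{
    assert(str && chr && beg && end);
    const char *colon = strchr(str, ':');
    size_t name_len = colon ? (size_t)(colon - str) : strlen(str);
    if (name_len == 0 || name_len >= chr_len) return -1;
    memcpy(chr, str, name_len);
    chr[name_len] = '\0';

    *beg = 1;                /* chromosome name only: whole chromosome */
    *end = MAX_COOR;
    if (!colon) return 0;

    char *rest;
    if (parse_coord(colon + 1, &rest, beg) != 0) return -1;
    if (*rest != '-') return -1;
    if (parse_coord(rest + 1, &rest, end) != 0) return -1;
    if (*rest != '\0' || *end < *beg) return -1;
    return 0;
}
```

Rejecting `0` outright, rather than silently shifting it, matches the stated intent of surfacing a likely 0-based/1-based mix-up to the user.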
* `-r/--region` replaced by `-r/--regions` which will accept a comma separated list of regions as in other bcftools commands. `--region` still accepted
* `-R/--regions-file` option added to read regions from a file

This commit lifts over work originally done in pd3/bcftools@cfd7cf9

Note: when more than one region is given, all indices are stored in memory, which can be a problem when running on many bams. An alternative would be to cache pre-filled `hts_itr`'s for each region.

Resolves #369
…ools commands

the point of `--no-version` is to remove invocation-specific metadata from the header lines for pipeline systems that are tracking this separately. we are outputting the `##reference` line though in mpileup. could drop this as well when `--no-version` used. seems silly to add a separate option.
* prefix with ^ to negate the selection
* assign/rename samples by providing second field:
    RG_ID_1 SAMPLE_A
    RG_ID_2 SAMPLE_A
    RG_ID_3 SAMPLE_B
* on read group name conflict give the alignment file, asterisk for all reads in the file:
    RG_ID_1 FILE_1.bam SAMPLE_A
    RG_ID_2 FILE_2.bam SAMPLE_A
    * FILE_3.bam SAMPLE_C

Resolves 4th item in #414 (comment) and samtools/samtools#324.
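To make the two- and three-field layouts concrete, here is a minimal sketch of reading one such line. The struct `rg_map_t` and the functions `copy_field` and `parse_rg_line` are hypothetical names for illustration, the field widths are arbitrary, and the `^` negation prefix is not handled; this is not the code in `sample.c`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of a read-group mapping file. */
typedef struct {
    char rg[64];       /* read group ID, or "*" for all reads in the file */
    char file[256];    /* alignment file, empty when not given */
    char sample[64];   /* sample name to assign, empty when not given */
} rg_map_t;

/* Copy src into dst, failing instead of truncating. */
static int copy_field(char *dst, size_t dst_len, const char *src)
{
    if (strlen(src) >= dst_len) return -1;
    strcpy(dst, src);
    return 0;
}

/* Accepts "RG", "RG SAMPLE" or "RG FILE SAMPLE" (whitespace separated).
 * Returns 0 on success, -1 on an empty line or an over-long field. */
static int parse_rg_line(const char *line, rg_map_t *out)
{
    char f1[256], f2[256], f3[256];
    assert(line && out);
    int n = sscanf(line, "%255s %255s %255s", f1, f2, f3);
    if (n < 1) return -1;

    out->file[0] = '\0';
    out->sample[0] = '\0';
    if (copy_field(out->rg, sizeof out->rg, f1) != 0) return -1;
    if (n == 2)
        return copy_field(out->sample, sizeof out->sample, f2);
    if (n == 3) {
        if (copy_field(out->file, sizeof out->file, f2) != 0) return -1;
        return copy_field(out->sample, sizeof out->sample, f3);
    }
    return 0;
}
```

Parsing `* FILE_3.bam SAMPLE_C` with this sketch yields a record whose `rg` is `*`, which a caller could then treat as "all reads in the file".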
…us behaviour

Such reads can be matched explicitly using the question mark "?"
d6664de to 0d95b8e
Rebased prior to merge.
Bring mpileup over to bcftools. The motivation being that the vcf/bcf generation by mpileup is tightly coupled to the calling process in `bcftools call`. Having these in separate tools/repos made mistakes with mismatched versions too likely and hampered development as release of new features often needed to happen in both samtools and bcftools simultaneously.

See the individual commit messages for more information.