Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: optional output for zero-coverage positions in samtools mpileup #496

Closed
shiraz3 opened this issue Dec 1, 2015 · 10 comments
Closed
Assignees
Milestone

Comments

@shiraz3
Copy link

shiraz3 commented Dec 1, 2015

At the moment, mpileup skips positions which don't have any reads mapping to them.

However, if you want to use mpileup output downstream in custom scripts, having lines output for positions with zero coverage becomes important. E.g. if you want to compare separate mpileup outputs for different mappings to the same reference.

Could you implement a -a and -aa option like you have for samtools depth?

For zero-coverage positions, the base and quality columns should ideally consist of empty tabs, or some other placeholder such that the number of columns in each line is kept consistent.

@tseemann
Copy link

tseemann commented Dec 5, 2015

+1

@jmarshall jmarshall added this to the 1.4 milestone Dec 7, 2015
@jkbonfield
Copy link
Contributor

Hmm, anyone know more about the mysteries of mpileup and the existing zero-depth support?
I produce two shallow files from test/dat.mpileup.2.sam and then mpileup them:

samtools view -h -s 0.3 test/dat/mpileup.2.sam > shallow.sam
samtools view -h -s 1.3 test/dat/mpileup.2.sam > shallow2.sam
./samtools mpileup -AB shallow.sam shallow2.sam
...
17      551     N       3       GGG     AFG     1       G       F
17      552     N       3       C$CC    D=H     1       C       =
17      553     N       2       AA      GG      1       A       G
17      554     N       1       A       M       0               
17      555     N       2       AA      9H      1       A       9
17      556     N       2       CC      AI      1       C       A
17      557     N       2       AA      FJ      1       A       F
17      558     N       2       AA      EM      1       A       E
17      559     N       2       CC      =I      1       C       =
17      560     N       2       CC      @J      1       C       @
17      561     N       2       AA      BJ      1       A       B
17      562     N       2       C$C     AJ      1       C$      A
17      563     N       1       A       K       0       *       *
17      564     N       1       C       C       0       *       *
17      565     N       2       G^]g    =:      0       *       *
17      566     N       2       Cc      I?      0       *       *
17      567     N       2       Cc      GD      0       *       *
17      568     N       2       Cc      DF      0       *       *
17      569     N       2       Tt      CI      0       *       *
17      570     N       2       Tt      DJ      0       *       *
...

So I get regions where shallow2.sam had data lacking and is displayed as "0 * *", likewise for shallow1.sam, but some regions it's just "0" with no * *. Why? Sometimes even 0 on both:

17      674     N       1       G$      D       1       G$      D
17      681     N       0                       0               
17      682     N       2       aa      @B      1       a       @
17      683     N       3       aa^]A   EAB     2       a^]A    EB

I can't find any description of the difference between a blank column and a column containing *.

It looks quite easy to fill out the missing data (as per depth -a) with "0\t\t" or with "0\t*\t*", but I'm a bit clueless as to what I'm really inferring by making that choice. Anyone know?

@tseemann
Copy link

Confession: I'm terrified of mileup.

@jkbonfield
Copy link
Contributor

Think yourself lucky you're not having to look into its weirder quirks!

Pop quiz - should a cigar snippet of 2I3D yield something like +2CG-3TAA combination in the pileup output? I think yes, for clarity and completeness. Samtools thinks no... I don't even want to contemplate that nugget yet!

jkbonfield added a commit that referenced this issue Feb 16, 2016
Also fixed the tests for c1#pad1.bam and c1#pad2.bam; they should have
been known failures instead of know passes (see issues #57 and #59).

[NEWS]
Mpileup has gained -a and -aa options to output all coordinates
including those with zero coverage.  Single -a limits this to
reference sequences containing some coverage with double -a (-aa or -a
-a) also displays references with zero reads mapped to them.
jkbonfield added a commit that referenced this issue Feb 16, 2016
Also fixed the tests for c1#pad1.bam and c1#pad2.bam; they should have
been known failures instead of know passes (see issues
samtools/htslib/#57 and samtools/htslib/#59).

[NEWS]
Mpileup has gained -a and -aa options to output all coordinates
including those with zero coverage.  Single -a limits this to
reference sequences containing some coverage with double -a (-aa or -a
-a) also displays references with zero reads mapped to them.
@daniazi
Copy link

daniazi commented Apr 5, 2016

Hi, I installed very latest samtools to run mpileup with avail "-aa" option but it gives me following error: mpileup: invalid option -- 'a'

@jkbonfield
Copy link
Contributor

Which version do you mean by the "very latest samtools"? 1.3? Develop? Or develop + pull requests applied?

The change you refer to is in #537, which has not yet been merged into develop. You will need to manually checkout develop and merge in this PR to your local branch if you wish to get this feature immediately.

@daniazi
Copy link

daniazi commented Apr 5, 2016

I installed the version 1.3-23-gcd7e005 (using htslib 1.3-50-g653f45e). I tried to find as you suggested but since I am not familiar with github and other terms I could not find the relevant source code. Can you please guide me where to locate the relevant version?

@daniazi
Copy link

daniazi commented Apr 5, 2016

Hi again, I found it at https://github.com/samtools/samtools/tree/mpileup_zero_%23496

Earlier I was copying git clone link which was downloading develop version I guess but when I clicked "Download Zip" it downloaded the right version. Thanks for your help.

@y9c
Copy link

y9c commented Oct 31, 2023

Hi @jmarshall. I find a bug relative to the -aa argument. When the sam file is empty, -aa won't output and records. Could you make it compatible with empty files, so it is more robust in analysis multiple files in a pipeline?

@whitwham
Copy link
Contributor

whitwham commented Nov 1, 2023

@y9c if you have found a new bug please could you open a new issue. This issue is closed and comes from 8 years ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants