Thanks for your work on Sequana. Really appreciate that you are making parts of the pipeline usable standalone, like sequana_coverage. I got a couple of requests regarding it.
First, you call the input file "BED", but technically it is not one. You require the 3rd column to be the coverage:
- a BED file that is a tabulated file at least 3 columns.
The first column being the reference, the second is the position
and the third column contains the coverage itself.
However, according to the standard, the 3rd column must be the end coordinate of a region, with the 2nd column being the 0-based start of that region:
The first three required BED fields are:
chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
chromEnd - The ending position of the feature in the chromosome or scaffold.
I guess it's usually fine to put values such as coverage into the optional 4th column, but the first 3 columns should really stay coordinates. So I wonder whether you could disable this check?
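To illustrate the coordinate point, here is a minimal Python sketch (not Sequana code) that turns a per-base `chrom pos coverage` row into a proper BED4 record: coordinates stay in columns 2-3 and the coverage moves to the optional 4th column. It assumes the position column is 1-based, as in `samtools depth` output; the function names are hypothetical.

```python
from typing import Iterable, Iterator


def depth_row_to_bed4(line: str) -> str:
    """Convert one 'chrom<TAB>pos<TAB>coverage' row (pos assumed 1-based,
    as written by `samtools depth`) into a BED4 row."""
    chrom, pos, cov = line.rstrip("\n").split("\t")[:3]
    start = int(pos) - 1  # BED starts are 0-based
    end = int(pos)        # BED ends are exclusive, so one base is [pos-1, pos)
    return f"{chrom}\t{start}\t{end}\t{cov}"


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield BED4 rows for an iterable of per-base depth rows, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield depth_row_to_bed4(line)


if __name__ == "__main__":
    for row in convert(["chr1\t1\t5\n", "chr1\t2\t7\n"]):
        print(row)
```

With the two example rows above, the first base of `chr1` becomes the interval `0`-`1` with coverage `5`, which is what a BED-aware tool would expect.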
This leads to the next request: mosdepth generates per-base coverage about twice as fast as samtools depth. It also produces a genuine BED file, compressing consecutive bases with the same coverage into regions; e.g. at the beginning of a chromosome it would typically have:
21 0 9411191 0
21 9411191 9411192 1
...
Instead of repeated
21 0 0
21 1 0
21 2 0
...
This saves a lot of disk space (samtools depth output for a whole genome took 45 GB in my test run).
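The region compression described above is essentially run-length encoding of the per-base coverage. A minimal sketch of the idea, assuming per-base records with 0-based positions sorted by chromosome and position (the function name is hypothetical, and this is not mosdepth's actual implementation):

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Interval = Tuple[str, int, int, int]  # (chrom, start, end, coverage), half-open


def collapse_runs(records: Iterable[Tuple[str, int, int]]) -> Iterator[Interval]:
    """Merge consecutive per-base (chrom, pos0, coverage) records into BED intervals.

    A new interval starts whenever the chromosome changes, the coverage changes,
    or a position is skipped (pos0 != end of the current interval)."""
    current: Optional[List] = None  # [chrom, start, end, coverage]
    for chrom, pos, cov in records:
        if (current is not None and chrom == current[0]
                and pos == current[2] and cov == current[3]):
            current[2] = pos + 1  # extend the current run by one base
            continue
        if current is not None:
            yield (current[0], current[1], current[2], current[3])
        current = [chrom, pos, pos + 1, cov]
    if current is not None:
        yield (current[0], current[1], current[2], current[3])


if __name__ == "__main__":
    per_base = [("21", 0, 0), ("21", 1, 0), ("21", 2, 0), ("21", 3, 1)]
    for interval in collapse_runs(per_base):
        print(*interval, sep="\t")
```

Positions that are skipped (for example the zero-coverage bases `samtools depth` omits unless run with `-a`) end the current run instead of being bridged, so no coverage value is invented for them.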
Also, mosdepth can compute window-based coverage, which could be used directly for sequana_coverage visualizations, saving even more computation and disk space. Would you consider accepting mosdepth output as input, and even running mosdepth internally for BAM inputs?
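Until native support exists, a small adapter could bridge the two formats. This hypothetical sketch expands BED4 intervals (such as mosdepth's compressed per-base output) into per-base `chrom pos coverage` rows, assuming sequana_coverage expects 1-based positions like `samtools depth`. Expanding gives back the disk savings in exchange for compatibility, so reading intervals natively would still be preferable.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def expand_bed4(lines: Iterable[str]) -> Iterator[Tuple[str, int, float]]:
    """Expand BED4 intervals (chrom, start, end, value) into per-base
    (chrom, pos, value) rows with 1-based positions.
    Blank, comment, track and browser lines are skipped."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.rstrip("\n").split("\t")[:4]
        for pos0 in range(int(start), int(end)):  # half-open [start, end)
            yield chrom, pos0 + 1, float(value)


def read_bed4(path: str) -> Iterator[Tuple[str, int, float]]:
    """Open a plain or gzip-compressed BED4 file and expand it per base."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from expand_bed4(handle)


if __name__ == "__main__":
    bed = ["21\t0\t3\t0\n", "21\t3\t4\t1\n"]
    for chrom, pos, cov in expand_bed4(bed):
        print(chrom, pos, cov, sep="\t")
```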
Vlad
Hi Vlad, thanks, this is very helpful. This won't be done immediately, but it looks very promising indeed, and I will implement this feature (mosdepth). As for the BED file, thanks for the clarification. We were indeed a bit lazy in calling the input file a BED file. I will leave this issue aside for now but will come back to it in Feb/March if possible.