Skip to content

Show how the Rsamtools pileup function could be improve to include details about indels

Notifications You must be signed in to change notification settings

mfoll/Rsamtools_pileup_indels

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

##Show how the Rsamtools pileup function could be improve to include details about indels

library(Rsamtools)
bam_file <- "test.bam"
sbp <- ScanBamParam(which = GRanges("chr17", IRanges(7578425, 7578425)))
p_param <- PileupParam(max_depth = 30000, min_base_quality = 0, include_insertions = TRUE)
res <- pileup(bam_file, scanBamParam = sbp, pileupParam = p_param)
res
##    seqnames     pos strand nucleotide count           which_label
## 1     chr17 7578425      +          +     7 chr17:7578425-7578425
## 2     chr17 7578425      -          +    14 chr17:7578425-7578425
## 3     chr17 7578425      +          -    61 chr17:7578425-7578425
## 4     chr17 7578425      -          -     2 chr17:7578425-7578425
## 5     chr17 7578425      +          A    11 chr17:7578425-7578425
## 6     chr17 7578425      -          A     5 chr17:7578425-7578425
## 7     chr17 7578425      +          C    21 chr17:7578425-7578425
## 8     chr17 7578425      -          C    23 chr17:7578425-7578425
## 9     chr17 7578425      +          G     7 chr17:7578425-7578425
## 10    chr17 7578425      -          G     5 chr17:7578425-7578425
## 11    chr17 7578425      +          T  4866 chr17:7578425-7578425
## 12    chr17 7578425      -          T  6273 chr17:7578425-7578425

Note that first four lines don't give details about the actual insertions and deletions. Samtools pilepup actually gives this information and it would be good to have them (optionally) here too. For example here:

  • The 7 insertions on the + strand consist of 6 insertions of a G and 1 insertion of TCA.
  • The 14 insertions on the - strand consist of 11 deletions of a G, 2 of a A and one of a C.
  • The 61 deletions on the + strand consist of 41 insertions of TGT, 18 of a T and 2 of TG.
  • The 2 deletions on the - strand consist of 2 deletions of a T.

So one possible way would be to output a data frame like this containing the full data:

##    seqnames     pos strand nucleotide count           which_label
## 1     chr17 7578425      +         +G     6 chr17:7578425-7578425
## 2     chr17 7578425      +       +TCA     1 chr17:7578425-7578425
## 3     chr17 7578425      -         +G    11 chr17:7578425-7578425
## 4     chr17 7578425      -         +A     2 chr17:7578425-7578425
## 5     chr17 7578425      -         +C     1 chr17:7578425-7578425
## 6     chr17 7578425      +       -TGT    41 chr17:7578425-7578425
## 7     chr17 7578425      +         -T    18 chr17:7578425-7578425
## 8     chr17 7578425      +        -TG     2 chr17:7578425-7578425
## 9     chr17 7578425      -         -T     2 chr17:7578425-7578425
## 51    chr17 7578425      +          A    11 chr17:7578425-7578425
## 61    chr17 7578425      -          A     5 chr17:7578425-7578425
## 71    chr17 7578425      +          C    21 chr17:7578425-7578425
## 81    chr17 7578425      -          C    23 chr17:7578425-7578425
## 91    chr17 7578425      +          G     7 chr17:7578425-7578425
## 10    chr17 7578425      -          G     5 chr17:7578425-7578425
## 11    chr17 7578425      +          T  4866 chr17:7578425-7578425
## 12    chr17 7578425      -          T  6273 chr17:7578425-7578425

Note that the last 8 lines are just the same as above, only the indels lines now show more details.

About

Show how the Rsamtools pileup function could be improve to include details about indels

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published