samtools rmdup removes single read from pair #497

Closed
mfz opened this issue Dec 3, 2015 · 3 comments

mfz commented Dec 3, 2015

samtools rmdup removes a single read from a pair when two pairs are present in which all four reads share the same position.

Input file:

WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 163 chr2 175312986 42 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 163 chr2 175312986 23 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 83 chr2 175312986 42 34M = 175312986 34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 83 chr2 175312986 23 34M = 175312986 34

Output after samtools rmdup:

WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 163 chr2 175312986 42 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 163 chr2 175312986 23 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 83 chr2 175312986 23 34M = 175312986 34
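
One way to confirm the symptom on a larger file is to count how often each QNAME occurs among paired reads. The following is a minimal sketch, not part of samtools: the function name `find_orphans` is made up for illustration, and it assumes there are no secondary or supplementary alignments, so every paired QNAME should appear exactly twice.

```python
from collections import Counter

# Records copied from the rmdup output above (only the fields shown there).
RMDUP_OUTPUT = """\
WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 163 chr2 175312986 42 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 163 chr2 175312986 23 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 83 chr2 175312986 23 34M = 175312986 34
"""

PAIRED = 0x1  # SAM FLAG bit: template has multiple segments


def find_orphans(sam_text):
    """Return QNAMEs of paired reads that do not occur exactly twice.

    Hypothetical helper: assumes no secondary/supplementary records.
    """
    counts = Counter()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.split()
        qname, flag = fields[0], int(fields[1])
        if flag & PAIRED:
            counts[qname] += 1
    return sorted(q for q, n in counts.items() if n != 2)


orphans = find_orphans(RMDUP_OUTPUT)
print(orphans)
```

On the output above this reports the 1124:56798 read, whose mate (flag 83) was removed.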

samtools git version 1.2-242-g4d56437
htslib git version 1.2.1-256-ga356746

Example file to reproduce error
test.sam.txt
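
For comparison, here is a sketch of the behaviour I would have expected: duplicates are decided per pair, so a pair is either kept whole or removed whole. This is not the rmdup algorithm. The function `dedup_pairs` is hypothetical, it uses the leftmost POS and strand of each read as the key (no clipping adjustment), and keeping the pair with the highest summed MAPQ is an assumption made only for this illustration.

```python
from collections import defaultdict

# The four input records from the report above.
INPUT = """\
WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 163 chr2 175312986 42 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 163 chr2 175312986 23 34M = 175312986 -34
WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 83 chr2 175312986 42 34M = 175312986 34
WPHISEQ06:99:C6L3PANXX:5:1101:13415:91660_1:N:0: 83 chr2 175312986 23 34M = 175312986 34
"""

REVERSE = 0x10  # SAM FLAG bit: read is reverse complemented


def dedup_pairs(sam_text):
    """Keep one whole pair per (RNAME, POS, strand) signature of its reads."""
    # Collect the reads of each template by QNAME.
    by_name = defaultdict(list)
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue
        fields = line.split()
        by_name[fields[0]].append(fields)

    # Pairs whose reads have identical coordinates and strands are duplicates.
    groups = defaultdict(list)
    for qname, reads in by_name.items():
        key = tuple(sorted((r[2], int(r[3]), int(r[1]) & REVERSE) for r in reads))
        groups[key].append(reads)

    # Assumption: keep the pair with the highest summed MAPQ, both reads together.
    kept = []
    for pairs in groups.values():
        best = max(pairs, key=lambda reads: sum(int(r[4]) for r in reads))
        kept.extend(" ".join(r) for r in best)
    return kept


kept = dedup_pairs(INPUT)
for line in kept:
    print(line)
```

With the input above, both reads of the 1124:56798 pair (MAPQ 42) are kept and both reads of the 13415:91660 pair are dropped, so no read is left without its mate.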

@jmarshall jmarshall added this to the 1.4 milestone Dec 7, 2015
@kirkmcclure

The rmdup logic assumes that the QNAME fields are identical for duplicates.
After manually changing the QNAME for the second pair in the example:

./samtools view -b /tmp/Issue497.sam |./samtools sort -n -o - - |./samtools rmdup - - |./samtools view -h - |tail
[bam_rmdup_core] processing reference chr2...
[bam_rmdup_core] 1 / 2 = 0.5000 in library '    '
@SQ SN:chrUn_KI270757v1 LN:71251
@SQ SN:chrUn_GL000214v1 LN:137718
@SQ SN:chrUn_KI270742v1 LN:186739
@SQ SN:chrUn_GL000216v2 LN:176608
@SQ SN:chrUn_GL000218v1 LN:161147
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrY_KI270740v1_random   LN:37240
WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 163 chr2    175312986   23  34M =   175312986   -34 ATACAAAAATTTACCGCTTTACTAATAATCCACT  ;?B1DFEGGFGGG11?/<=<FGGGGGGGGGGGGG
WPHISEQ06:99:C6L3PANXX:5:1101:1124:56798_1:N:0: 83  chr2    175312986   23  34M =   175312986   34  ATACAAAAATTTACCGCTTGACTAATAATCCACT  GGGGB:1GGF=E=;/1GF=1BEF;1>11GB@A3B

@jrandall
Contributor

Confirmed that the provided test file still exhibits this issue with current samtools/htslib ("1.3-5-g664cc5f (using htslib 1.3-5-gdf4a80e)").

@mcshane mcshane removed this from the 1.4 milestone Feb 6, 2017
@whitwham
Contributor

As rmdup is deprecated and nobody has touched this for two years I am going to close this issue.
