Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected assertion for specific kmer #35

Closed
iqbal-lab opened this issue Aug 4, 2016 · 3 comments
Closed

Unexpected assertion for specific kmer #35

iqbal-lab opened this issue Aug 4, 2016 · 3 comments
Assignees

Comments

@iqbal-lab
Copy link
Collaborator

iqbal-lab commented Aug 4, 2016

Sorina spotted some samples were hitting/failing an assertion. Basically, if a read was mapping horizontally uniquely but not vertically uniquely, this assert was checking that at that time, nothing had been pushed onto the sites list, in the knowledge that it was going to be pushed after exiting the current function. Actually the problem can be reproduced using just the last kmer of the read, and putting it in the kmers input file.

So, what is the command?

gramtools -p Chr10_div_in_msp34_prg.in 
-c csa_file -i test.fastq -s Mask_sites -a Mask_alleles 
-v Coverages -r Reads_processed -b int_alphabet_file 
-l csa_memory -k 9 -f  onekmer

where

cat onekmer
TATGAAGAA

cat test.fastq
@0
AAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAGAA
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

OK, so what happens. First of all, if you look in the PRG file, you can see that the kmer overlaps site 79, and overlaps the end of many alleles, with AA poking out of the right hand end of the site.
So, when we precalculate the sa_interval and sites for this kmer, you should see all these alleles in the sites object. But in fact somehow it reduces it all down to just alleles 24 and 34.

I guess the thing to do is run it in gdb and look at where sa_intervals and sites are erased.

@iqbal-lab
Copy link
Collaborator Author

iqbal-lab commented Aug 4, 2016

These are the alleles at site 79 - you will need to scroll right to see the end, and to notice
I have marked alleles 24 and 34

AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCAT**TATGAAG**
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCCAAAAAGGATGAATGTAAAGGTGCATTAAAGCAT**TATGAAG**
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGAATGATTCACAACTATTAGAAATATCCAAAAAGGATGAATGTAAAGGTGCATTAAAGCAT**TATGAAG**
AGGATGTAATAAAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAACAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAGAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATGAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAA
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGGTGCATTAAAGAATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAAATTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGAATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAACAAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATGTCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAAGACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAATAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAA
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AAAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG 24
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCCAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATCAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGGTTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGATGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAATTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTGAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTTACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AAGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG 34
AGGATGTAATAAAACTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG
AGGATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGGAGCATTAAAGCATTATGAAG
AGGATGTAATAAAATTATGTAATATCCAACAATTTACAAACCAGGATGATTCACAACTATTAGAAATATCAAAAGAGGATAAATGTAAAGGTGCATTAAAGCATTATGAAG
AGAATGTAATAGAATTATGTAATGTAAGACAAATTACAAACCAGAATGGTTCACAACTATTAGAAATATCAAAAAAGGATAAATGTAAAGAAGCATTAAAGCATTATGAAG

The next two bases after this site are AA

So our kmer TATGAAGAA overlaps many (in fact all except two) of these alleles, running past the right hand 79 marker and then finishing on the AA adjacent. I've highlighted the first few.

@iqbal-lab
Copy link
Collaborator Author

I think this issue is v old andmaybe dead @sm0179 ?

@sm0179
Copy link
Collaborator

sm0179 commented Jun 8, 2017

yes

@sm0179 sm0179 closed this as completed Jun 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants