How to interpret the color dump file? #28
Hi Jarno,
Right! That was intentional, but it's better to keep the style consistent. I'll add it. Thanks!
It would be nice to have, though not urgent! It might also benefit others, for interoperability between tools.
Sure!
So, there is one line for each piece of information, with fields separated by a single space. Please provide feedback. Q: Is it better to call … Note that unitigs will be output sorted by color list in this format, as this is the way they are stored in Fulgor.
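Here is a minimal Python sketch of a reader for a one-record-per-line, single-space-separated layout. The exact field order is not spelled out in this thread, so the assumed layout (`<color_set_id> <num_colors> <color_1> ... <color_n>`) and the function names are hypothetical. Adjust them to the actual dump.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_color_set_line(line: str) -> Tuple[int, List[int]]:
    """Parse one line of a HYPOTHETICAL layout:
    <color_set_id> <num_colors> <color_1> ... <color_n>
    with fields separated by a single space."""
    fields = line.rstrip("\n").split(" ")
    set_id, num_colors = int(fields[0]), int(fields[1])
    colors = [int(c) for c in fields[2:]]
    # The declared length lets us detect truncated or malformed lines.
    if len(colors) != num_colors:
        raise ValueError(
            f"color set {set_id}: declared {num_colors} colors, found {len(colors)}"
        )
    return set_id, colors


def parse_color_sets(lines: Iterable[str]) -> Iterator[Tuple[int, List[int]]]:
    """Yield (color_set_id, colors) for every non-empty line; the last line
    is accepted even without a trailing newline."""
    for line in lines:
        if line.strip():
            yield parse_color_set_line(line)
```

Passing an open file handle to `parse_color_sets` reads the dump line by line. Writing the list length explicitly is redundant with the number of fields, but it makes a truncated line detectable.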
That looks good to me!
In my terminology, that would be num_colors, but either of those two is fine. For reference, in Themisto I have a colored unitig dump command that produces two files:
Those ids are actually colexicographic ranks of some k-mer in the unitig, if I remember correctly.
The unitigs are listed in the same order as they appear in the FASTA file, so the unitig id here is actually a bit redundant. I don't write the lengths of the lists, but that is a minor detail. Anyway, my format should be easy to compare with yours.
I see. Your format makes sense, although I'd prefer to keep everything in one file.
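Because Themisto lists unitigs in the same order as the FASTA file, its two files can be joined by position. The following is a minimal Python sketch. It assumes, hypothetically, that each line of the color file is `<unitig_id> <color_1> ... <color_n>` with no list length, as described above. The function names are illustrative only. The id is kept but not used for the join, since it is a colexicographic k-mer rank rather than a 0-based position.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence of each FASTA record, joining wrapped lines."""
    chunks: List[str] = []
    in_record = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                yield "".join(chunks)
            chunks, in_record = [], True
        else:
            chunks.append(line)
    if in_record:
        yield "".join(chunks)


def join_unitigs_and_colors(
    fasta_lines: Iterable[str], color_lines: Iterable[str]
) -> List[Tuple[int, str, List[int]]]:
    """Pair the i-th unitig with the i-th color line (HYPOTHETICAL layout
    '<unitig_id> <color_1> ... <color_n>'). Returns (id, sequence, colors)."""
    seqs = list(read_fasta(fasta_lines))
    rows = [line.split() for line in color_lines if line.strip()]
    # Order is the only link between the two files, so the counts must agree.
    if len(seqs) != len(rows):
        raise ValueError(f"{len(seqs)} unitigs but {len(rows)} color lines")
    return [
        (int(row[0]), seq, [int(c) for c in row[1:]])
        for seq, row in zip(seqs, rows)
    ]
```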
(I'm reporting part of our conversation on X here, just to keep track of it.)
OK, done as of 9d5901e. Can you check it? For the small example with the 10 Salmonella files shipped with the repo, we can build the three files above as follows.
@jnalanko, have you had a chance to try it?
Not yet! I'll try to verify it against my Themisto index this weekend.
Update: I'm now running my verifier on a big dataset. I could not compare the dumps directly because Themisto outputs both strands, whereas Fulgor outputs only canonical ones, and cyclic unitigs are also tricky. It's not a very optimized verifier, so it might take a day to run.
But what are you trying to verify? Recall that GGCAT does not necessarily output maximal unitigs, so there might be discrepancies in the unitigs. The k-mers and their color sets, however, must always be the same.
I'm verifying the color set of each k-mer, which should be the same in both tools; otherwise there is a bug somewhere.
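The per-k-mer check can be sketched as follows. This is not the actual verifier, just an in-memory Python illustration with hypothetical function names. Canonicalizing every k-mer (taking the lexicographically smaller of the k-mer and its reverse complement) makes a both-strand dump comparable with a canonical-only one. Comparing k-mers rather than unitigs also sidesteps non-maximal unitigs. The sketch assumes each input is a list of `(unitig_sequence, color_set)` pairs over the ACGT alphabet, and that both strands were indexed, so a k-mer and its reverse complement share a color set. A large dataset would need a streaming or disk-backed approach instead of Python dicts.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of kmer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_color_map(
    unitigs: Iterable[Tuple[str, Set[int]]], k: int
) -> Dict[str, FrozenSet[int]]:
    """Map every canonical k-mer to its color set, checking consistency."""
    colors_of: Dict[str, FrozenSet[int]] = {}
    for seq, colors in unitigs:
        cs = frozenset(colors)
        for i in range(len(seq) - k + 1):
            key = canonical(seq[i : i + k])
            # Assumes both strands were indexed, so a k-mer and its reverse
            # complement must carry the same color set.
            if colors_of.setdefault(key, cs) != cs:
                raise ValueError(f"k-mer {key} has two different color sets")
    return colors_of


def mismatched_kmers(
    a: Dict[str, FrozenSet[int]], b: Dict[str, FrozenSet[int]]
) -> List[str]:
    """K-mers present in only one map, or with different color sets."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))
```

An empty result from `mismatched_kmers` means the two indexes agree on every k-mer and its color set.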
Oh, I see. When building the indexes with
Finished! It matches the Themisto color sets 100%! Love to see it.
Good! Weren't you sure? :)
99% sure :). I've been hit by some really rare and subtle bugs before.
We've all been there, but great to know!
Alright! Closing this now.
Hello team Fulgor!
I'm trying to dump the color sets out of a Fulgor index. The dump looks like this:
This seems to list all the distinct color sets, but this is missing the information about which color set corresponds to which unitig. Is it possible to somehow easily extract that information from the index? This would let me verify my Themisto indexes against Fulgor, and vice versa.
By the way, this dump is missing a newline at the end of the file :).