[mergeMSTs] Problems with mst and query #24

Open
eseiler opened this issue Jun 2, 2023 · 0 comments

Hey there,

While using the mergeMSTs branch, I ran into some trouble with mst and query.

mst

mantis mst doesn't seem to work.

It wants to load eqclass_rr.cls files:

mantis/src/mst.cc

Lines 33 to 34 in 7406e8f

eqclass_files =
    mantis::fs::GetFilesExt(prefix.c_str(), mantis::EQCLASS_FILE);

This will later lead to a segmentation fault because the files do not exist.
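One way to turn the crash into a readable error would be to check the file list right after it is collected. The following is only a sketch: require_nonempty is a hypothetical helper, not something that exists in mantis, and it assumes the segmentation fault follows from the file list being empty.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of mantis): fail early with a clear message
// when no files of the expected kind were found, instead of continuing with
// an empty list.
inline const std::vector<std::string>& require_nonempty(
    const std::vector<std::string>& files, const std::string& description) {
  if (files.empty()) {
    throw std::runtime_error("No " + description + " files found");
  }
  return files;
}
```

Wrapping the result of GetFilesExt in such a check would report the missing eqclass_rr.cls files directly.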

mantis build will always delete eqclass_rr.cls files at the end:

mantis/src/mst.cc

Lines 729 to 737 in 7406e8f

if (opt.remove_colorClasses && !opt.keep_colorclasses) {
  for (auto &f : mantis::fs::GetFilesExt(opt.prefix.c_str(), mantis::EQCLASS_FILE)) {
    std::cerr << f.c_str() << "\n";
    if (std::remove(f.c_str()) != 0) {
      std::cerr << "Unable to delete file " << f << "\n";
      std::exit(1);
    }
  }
}

mantis build doesn't have an option to toggle this behavior.
Changing qopt.remove_colorClasses = true; to qopt.remove_colorClasses = false; in the following line fixes the issue:

qopt.prefix = bopt.out;
qopt.numThreads = bopt.numthreads;
qopt.remove_colorClasses = true;
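Instead of hard-coding the value, build could derive it from a user-facing switch. This is a rough sketch with mock types: BuildOpts, MstOpts and make_mst_opts are made up for illustration and only mirror the field names visible in the snippets above; the real option structs in mantis may look different.

```cpp
// Mock option types for illustration only; not the actual mantis structs.
struct BuildOpts {
  bool keep_colorclasses = false;  // hypothetical build-time switch
};

struct MstOpts {
  bool remove_colorClasses = true;
};

// Derive the MST options from the build options so that the deletion of
// eqclass_rr.cls files can be switched off by the user.
inline MstOpts make_mst_opts(const BuildOpts& bopt) {
  MstOpts qopt;
  qopt.remove_colorClasses = !bopt.keep_colorclasses;
  return qopt;
}
```

With the default options the files would still be removed, so existing behavior stays the same unless the switch is set.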

query

The default non-bulk query only works if the eqclass_rr.cls files are present and -1 is used:

mantis query -1 -k 20 -p index/ reads.fasta

To have eqclass_rr.cls files, the above fix is needed, and mst must have been run with -k.

Alternatively, bulk mode (-b) works without the eqclass_rr.cls files, so mst can also be run with -d.

mantis query -b -k 20 -p index/ reads.fasta

The problem in non-bulk query seems to be that findSamples is called for every query sequence:

mantis/src/mstQuery.cc

Lines 492 to 498 in 7406e8f

while (ipfile >> read) {
  mstQuery.reset();
  mstQuery.parseKmers(numOfQueries, read, indexK);
  mstQuery.findSamples(cdbg, cache_lru, &rs, queryStats, 1);
  output_results(mstQuery, opfile, sampleNames, queryStats, 1);
  numOfQueries++;
}

The function then accesses cdbg.get_current_cqf()->keybits():

uint64_t ksize{cdbg.get_current_cqf()->keybits()}, numBlocks{cdbg.get_numBlocks()};

This works fine for the first query, but for the second one there is no CQF to access because it has been replaced with an invalid one:

cdbg.replaceCQFInMemory(invalid);
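The pattern can be modeled in a few lines. This is a simplified sketch with mock types (MockCqf, MockCdbg, find_samples and drop_current_cqf are made up and only stand in for the mantis classes). Unlike the real code, the mock checks the pointer, so the second call returns an empty optional instead of crashing.

```cpp
#include <cstdint>
#include <memory>
#include <optional>

// Mock stand-ins for illustration only; not the actual mantis classes.
struct MockCqf {
  std::uint64_t keybits;
};

class MockCdbg {
 public:
  explicit MockCdbg(std::uint64_t keybits)
      : current_(std::make_unique<MockCqf>(MockCqf{keybits})) {}

  const MockCqf* get_current_cqf() const { return current_.get(); }

  // Models replacing the in-memory CQF with an invalid one.
  void drop_current_cqf() { current_.reset(); }

 private:
  std::unique_ptr<MockCqf> current_;
};

// Models one per-query call: read keybits from the current CQF, then
// invalidate that CQF before returning.
inline std::optional<std::uint64_t> find_samples(MockCdbg& cdbg) {
  const MockCqf* cqf = cdbg.get_current_cqf();
  if (cqf == nullptr) {
    return std::nullopt;  // the real code dereferences here
  }
  const std::uint64_t ksize = cqf->keybits;
  cdbg.drop_current_cqf();
  return ksize;
}
```

Calling find_samples twice on the same MockCdbg gives a value the first time and nothing the second time, which matches what happens in the query loop.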

I tried loading the first block (block 0) at the beginning of findSamples and just passing the keybits as an extra parameter.
But then there is an out-of-bounds access at:

allQueries[q][numSamples]++;
