-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auxiliary methods for taxonomic classification #343
Conversation
Signed-off-by: Radu Muntean <radu.muntean.2@gmail.com>
throw std::runtime_error( | ||
folly::to<string>("The current 'call_annotated_rows' call has returned, ", unique_matrix_rows.size(), | ||
"rows. The maximum number of rows that can be returned is ", | ||
std::numeric_limits<uint32_t>::max(), | ||
". Please reduce the query batch size")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use fmt::format
if (prefix.size() > str.size()) { | ||
return false; | ||
} | ||
return prefix == str.substr(0, static_cast<int>(prefix.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return prefix == str.substr(0, static_cast<int>(prefix.size())); | |
return prefix == std::string_view(str).substr(0, prefix.size()); |
void AnnotatedDBG::call_annotated_rows(const std::vector<node_index> &rows, | ||
std::function<void(const std::string&)> callback_cell, | ||
std::function<void()> callback_row) const { | ||
assert(check_compatibility()); | ||
|
||
auto unique_matrix_rows = annotator_->get_matrix().get_rows(rows); | ||
|
||
//TODO make sure that this function works even if we have duplications in 'rows'. Then, delete this error catch. | ||
if (rows.size() != unique_matrix_rows.size()) { | ||
throw std::runtime_error("The current 'call_annotated_rows' call contains duplication."); | ||
} | ||
|
||
if (unique_matrix_rows.size() >= std::numeric_limits<uint32_t>::max()) { | ||
throw std::runtime_error( | ||
folly::to<string>("The current 'call_annotated_rows' call has returned, ", unique_matrix_rows.size(), | ||
"rows. The maximum number of rows that can be returned is ", | ||
std::numeric_limits<uint32_t>::max(), | ||
". Please reduce the query batch size")); | ||
} | ||
const auto &label_encoder = annotator_->get_label_encoder(); | ||
for (auto row : unique_matrix_rows) { | ||
for (auto cell : row) { | ||
callback_cell(label_encoder.decode(cell)); | ||
} | ||
callback_row(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite see a reason to add this function. You have access to all members, so can implement this directly where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! answered in the other comment
". Please reduce the query batch size")); | ||
} | ||
const auto &label_encoder = annotator_->get_label_encoder(); | ||
for (auto row : unique_matrix_rows) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unnecessarily making a copy
for (auto row : unique_matrix_rows) { | |
for (const auto &row : unique_matrix_rows) { |
@@ -156,6 +156,10 @@ class AnnotatedDBG : public AnnotatedSequenceGraph { | |||
int32_t match_score = 1, | |||
int32_t mismatch_score = 2) const; | |||
|
|||
void call_annotated_rows(const std::vector<node_index> &rows, | |||
std::function<void(const std::string&)> callback_cell, | |||
std::function<void()> callback_row) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the second callback supposed to be used for? It's not clear...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to remove this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments! I need to have access to annotator_
field and I can't implement it directly where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is using 2 callback functions: one for doing a preprocessing per cell (i.e. get the taxid from label) and one for a final computation on an annotated row (i.e. get the LCA taxid). If you think that it would be more clear, I can reimplement it to use only one callback that receives the list of labels on one row (but this will use more memory -> store list of labels instead of list of integers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do you need to have a non-const reference to the annotator? If not, you can just call get_annotator().get_matrix()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I missed that get_annotator()
. Did the changes now, thanks for the feedback
78ad12f
to
b92912e
Compare
Add some small methods that will be further used for tax class
Signed-off-by: Radu Muntean radu.muntean.2@gmail.com