[Search] Use a buffer to cache the intermediate results between search invocations so that memory allocations are reduced across multiple invocations of search. #29
Comments
So for regular usage nothing changes, but users can optionally pass in an external buffers object via the config system. The internals of this type are not exposed and can therefore be changed at any time.
Resolution 10.06.2020
Hm, that sounds quite complicated (for the user), but I don't know what you have discussed yet. Why do we need polymorphism? We know the type of the buffers. My proposal would look like this:
seqan3::search_buffers buffers; // the user doesn't know what's inside, but we do
seqan3::configuration const default_cfg = seqan3::search_cfg::max_error_total{zero_errors} |
seqan3::search_cfg::max_error_substitution{zero_errors} |
seqan3::search_cfg::max_error_insertion{zero_errors} |
seqan3::search_cfg::max_error_deletion{zero_errors} |
seqan3::search_cfg::output_query_id |
seqan3::search_cfg::output_reference_id |
seqan3::search_cfg::output_reference_begin_position |
seqan3::search_cfg::hit_all |
seqan3::search_cfg::with_buffers{buffers}; // <--------
// after that, search normally with the config
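The idea of an opaque buffers object that the user owns but cannot inspect can be sketched in plain C++ without SeqAn3. Everything in this sketch is hypothetical and only mirrors the names used in the proposal: `search_buffers`, `with_buffers` and `searcher` are illustrative stand-ins, not existing SeqAn3 types, and the naive exact matching only serves as a placeholder for the real search.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the proposed buffers type: all members are private,
// so the library could change the contents without affecting user code.
class search_buffers
{
    friend class searcher;              // only the (hypothetical) algorithm sees the internals
    std::vector<std::size_t> positions; // intermediate hits; the allocation survives between calls
};

// Hypothetical config element that simply refers to the user's buffers object.
struct with_buffers
{
    search_buffers & buffers;
};

// Toy algorithm: naive exact matching that stores its hits in the external buffers.
class searcher
{
public:
    static std::vector<std::size_t> const & find_all(std::string_view text,
                                                     std::string_view query,
                                                     with_buffers cfg)
    {
        std::vector<std::size_t> & hits = cfg.buffers.positions;
        hits.clear(); // removes old hits but keeps the allocated memory
        for (std::size_t pos = text.find(query); pos != std::string_view::npos;
             pos = text.find(query, pos + 1))
        {
            hits.push_back(pos);
        }
        return hits;
    }
};
```

A caller would create one `search_buffers` object and pass `with_buffers{buffers}` to every invocation; the second and later calls can then reuse the memory allocated by the first.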
Note that this whole issue is not about how results are returned from the top-level search but how intermediate results are stored internally (cursors and results of the cursors' locate operation).
We know. The main point is:
We (briefly) discussed how we could model this with a sound interface. (Just passing a std::vector is a bad interface, because it is too specific to the internals.) I noticed that we already solved this problem for a different data structure (e.g. IBF), where we used a client-agent pattern to model this. And René noted that If you want and have time, we could talk about this in a web-call. 👍
Yep, that's why I propose to just create a named
Maybe we can do a Lambda call sometime soon and then have a 30min nerd session on this topic afterwards ;)
Well, that's the point. I am not completely into the details of the index cursors, but often we end up having different types depending on the configuration (and I am not sure what will come in the future: different indices, different cursors, etc.). So while these types usually depend on the configuration, you need to allocate a buffer for something whose type you don't know yet, and so you get into difficulties and hard-to-use interfaces with strange dependencies (get the correct buffer type for the configuration element, but then add this to the configuration again). So I have seen (but not pursued any further, i.e. it has to be investigated) that with the pmr memory resource you as a user can simply provide memory (chunks of std::byte). That is great, because now we can use these buffers to manage and reuse the memory inside of the algorithms and do not have to care too much about the types. Of course we can wrap a default implementation of this inside of a dedicated struct. But the user can also do more advanced stuff by providing memory specifically allocated on high-bandwidth memory or the stack, etc.
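To make the pmr idea concrete, here is a minimal sketch using only the C++17 standard library (`<memory_resource>`). The functions `count_hits_pmr` and `count_all` are hypothetical and not part of SeqAn3; counting characters stands in for the real search. The point is that the caller hands over untyped bytes, while the element type of the intermediate container stays an internal detail of the algorithm.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string_view>
#include <vector>

// Hypothetical algorithm step: collects the positions of `needle` in `text`.
// All intermediate storage is taken from `memory`; the caller never sees its type.
std::size_t count_hits_pmr(std::string_view text, char needle, std::pmr::memory_resource & memory)
{
    std::pmr::vector<std::size_t> hits(&memory);
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == needle)
            hits.push_back(i);
    return hits.size();
}

// Hypothetical driver: one chunk of stack memory serves every query.
std::vector<std::size_t> count_all(std::string_view text, std::string_view needles)
{
    std::array<std::byte, 1024> storage; // user-provided bytes, here on the stack
    // Falls back to the default resource if the 1024 bytes are ever exhausted.
    std::pmr::monotonic_buffer_resource pool{storage.data(), storage.size()};

    std::vector<std::size_t> counts;
    for (char needle : needles)
    {
        counts.push_back(count_hits_pmr(text, needle, pool));
        pool.release(); // rewind to the start of `storage` so the next query reuses it
    }
    return counts;
}
```

Swapping the stack array for memory obtained elsewhere only changes the driver; the algorithm keeps taking a `std::pmr::memory_resource &`.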
It's true that the buffers might depend on the configuration at some point. My initial response would be to just put all potentially usable buffers into While the approach you suggest is powerful, it sounds like it will add complexity to the simple use-case. And while some algorithms might profit from specialised storage, the things we are talking about for search are small, simple objects that need to be in main memory (AFAICT). An alternative with a non-generic buffer type could be exposing the buffers type as a member typedef of the config, but this is already suboptimal usability-wise from my POV:
seqan3::configuration const cfg = seqan3::search_cfg::max_error_total{zero_errors} |
seqan3::search_cfg::max_error_substitution{zero_errors} |
seqan3::search_cfg::max_error_insertion{zero_errors} |
seqan3::search_cfg::max_error_deletion{zero_errors} |
seqan3::search_cfg::output_query_id |
seqan3::search_cfg::output_reference_id |
seqan3::search_cfg::output_reference_begin_position |
seqan3::search_cfg::hit_all;
decltype(cfg)::buffers_type buffers;
cfg.set_buffers(buffers);
// after that, search normally with the config
edit: let's talk about this on a call sometime.
2020-09-28 We briefly discussed this, and we think this is a spike and needs a prototype to be able to discuss this further.
Description
A thread-local buffer used for the search can have an impact on performance by avoiding unnecessary memory allocations, which also have to be synchronised by the OS.
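As a rough illustration of the thread-local variant, the following sketch keeps one buffer per thread and reuses its allocation on every call. The names `thread_hit_buffer` and `count_with_thread_buffer` are hypothetical, and counting characters is only a placeholder for the actual search; this is not how SeqAn3 implements it.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns a buffer that exists once per thread,
// so no locking is needed when several threads search concurrently.
std::vector<std::size_t> & thread_hit_buffer()
{
    thread_local std::vector<std::size_t> buffer; // constructed on first use in each thread
    return buffer;
}

// Toy stand-in for a search invocation: counts the occurrences of `needle` in `text`.
std::size_t count_with_thread_buffer(std::string_view text, char needle)
{
    std::vector<std::size_t> & hits = thread_hit_buffer();
    hits.clear(); // drops the old hits but keeps the allocated capacity
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == needle)
            hits.push_back(i);
    return hits.size();
}
```

After the first call on a thread has grown the buffer, later calls with the same or fewer hits do not allocate again.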
Also see here for possible improvements: seqan/seqan3#1528
Acceptance Criteria
Tasks
Definition of Done