-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DOC] Improve search documentation #1989
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one question.
* The k-mer can be either an exact string of length k or it can contain one or more wildcards, | ||
* which denote positions of arbitrary characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to give an example of a k-mer with wildcards? Or refer to the documentation where the use of wildcards is explained?
Codecov Report
@@ Coverage Diff @@
## master #1989 +/- ##
=======================================
Coverage 97.88% 97.88%
=======================================
Files 261 261
Lines 9825 9838 +13
=======================================
+ Hits 9617 9630 +13
Misses 208 208
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the addition! :)
Thanks, that was fast ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but we do not offer a k-mer index implementation
* SeqAn currently implements two kinds of indices: FM indices and k-mer indices (also known as q-gram indices). | ||
* Generally speaking, k-mer indices support very fast searching of exact k-mers (strings of length k) or k-mers with | ||
* predefined wildcard positions, which match with any character. | ||
* FM indices on the other hand are more versatile and work | ||
* with arbitrary pattern lengths and error numbers or positions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not have a k-mer index implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's why I couldn't find it! I attempted to write a snippet for it 😅
include/seqan3/search/all.hpp
Outdated
* In the future we plan to improve the optimum search schemes, such that they handle unidirectional indices and | ||
* higher error counts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* In the future we plan to improve the optimum search schemes, such that they handle unidirectional indices and | |
* higher error counts. | |
* In the future, we plan to improve the optimum search schemes to handle higher error counts. |
Unless I'm mistaken, with unidirectional indices you can only do backtracking, so there is no point for the search schemes.
include/seqan3/search/all.hpp
Outdated
* # K-mer Indices | ||
* | ||
* A k-mer index can be used to efficiently retrieve all occurrences of a certain k-mer in the text. | ||
* The k-mer can be either an exact string of length k or it can contain one or more wildcards, | ||
* which denote positions of arbitrary characters. | ||
* | ||
* An exact k-mer is represented as seqan3::ungapped, and wildcards can be defined with seqan3::shape. | ||
* Please check the respective documentation for details and examples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either we \if
the k-mer index stuff, or clearly state that they are not here yet.
For the in-general k-mer description I would add a link to the kmer_hash_view / minimiser_view (maybe via \sa
)
* \brief Implementation of a k-mer Index. | ||
* | ||
* \details | ||
* | ||
* A k-mer index is a data structure that stores all occurrences of k-mers in a text. | ||
* A k-mer can be either an exact string of length k or it can contain one or more wildcards, | ||
* which denote positions of arbitrary characters. | ||
* | ||
* The index is very fast for retrieving all occurrences of a k-mer in the underlying text. | ||
* Usually the query length (k) is small and the underlying text is very large. | ||
* The parameter k and the position(s) of wildcards must be fixed at index creation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing here is currently the shape
:)
I removed the part in the introduction. Below in the index section I marked it as not implemented, because users may be interested in what's coming. In the k-mer index landing page I phrased it "shapes for a k-mer index". |
include/seqan3/search/all.hpp
Outdated
* with arbitrary pattern lengths and error numbers or positions. | ||
* \else | ||
* SeqAn currently implements only the FM index and a k-mer index is planned. The FM index works | ||
* with arbitrary pattern lengths and error numbers or positions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* with arbitrary pattern lengths and error numbers or positions. | |
* with arbitrary pattern lengths and error numbers. |
The positions probably refers to "you can have an error anywhere in the query" ?
But I think it's rather confusing without additional explanation and would just remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was my intention...
include/seqan3/search/all.hpp
Outdated
* predefined wildcard positions that do not have to match. FM indices on the other hand are more versatile and work | ||
* with arbitrary pattern lengths and error numbers / positions. | ||
* \todo k-mer indices are coming soon. Stay tuned! | ||
* Add a description of the DREAM index here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Add a description of the DREAM index here. | |
* \todo Add a description of the DREAM index here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a continuation of the line above and is currently displayed in the same todo box.
Probably you intend to make a new todo item, so I change it :)
include/seqan3/search/all.hpp
Outdated
* Latter are currently only available for searches with up to and including three errors using bidirectional indices. | ||
* The optimum search schemes will be improved in the future to handle unidirectional indices and higher error counts. | ||
* The latter are currently only available for searches with up to three errors using bidirectional indices. | ||
* In the future we plan to improve the optimum search schemes, to handle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* In the future we plan to improve the optimum search schemes, to handle | |
* In the future we plan to improve the optimum search schemes to handle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
(travis is green but did not report back)
I fixed some inconsistencies and missing documentation in the search module.
Closes seqan/product_backlog#30.