Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] Improve search documentation #1989

Merged
merged 4 commits into from
Jul 22, 2020
Merged

Conversation

joergi-w
Copy link
Member

I fixed some inconsistencies and missing documentation in the search module.

Closes seqan/product_backlog#30.

Copy link
Contributor

@MitraDarja MitraDarja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one question.

Comment on lines +78 to +79
* The k-mer can be either an exact string of length k or it can contain one or more wildcards,
* which denote positions of arbitrary characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to give an example of a k-mer with wildcards? Or refer to the documentation where the use of wildcards is explained?

@codecov
Copy link

codecov bot commented Jul 21, 2020

Codecov Report

Merging #1989 into master will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1989   +/-   ##
=======================================
  Coverage   97.88%   97.88%           
=======================================
  Files         261      261           
  Lines        9825     9838   +13     
=======================================
+ Hits         9617     9630   +13     
  Misses        208      208           
Impacted Files Coverage Δ
...an3/search/configuration/default_configuration.hpp 100.00% <ø> (ø)
include/seqan3/search/kmer_index/shape.hpp 100.00% <ø> (ø)
include/seqan3/range/views/minimiser_hash.hpp 100.00% <0.00%> (ø)
include/seqan3/core/algorithm/configuration.hpp 100.00% <0.00%> (ø)
include/seqan3/range/views/minimiser.hpp 99.03% <0.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30e8966...1c425de. Read the comment docs.

Copy link
Contributor

@MitraDarja MitraDarja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the addition! :)

@joergi-w
Copy link
Member Author

Thanks, that was fast ;)

@joergi-w joergi-w requested review from eseiler and removed request for smehringer July 21, 2020 09:50
Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but we do not offer a k-mer index implementation

Comment on lines +23 to +27
* SeqAn currently implements two kinds of indices: FM indices and k-mer indices (also known as q-gram indices).
* Generally speaking, k-mer indices support very fast searching of exact k-mers (strings of length k) or k-mers with
* predefined wildcard positions, which match with any character.
* FM indices on the other hand are more versatile and work
* with arbitrary pattern lengths and error numbers or positions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have a k-mer index implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's why I couldn't find it! I attempted to write a snippet for it 😅

Comment on lines 43 to 44
* In the future we plan to improve the optimum search schemes, such that they handle unidirectional indices and
* higher error counts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* In the future we plan to improve the optimum search schemes, such that they handle unidirectional indices and
* higher error counts.
* In the future, we plan to improve the optimum search schemes to handle higher error counts.

Unless I'm mistaken, with unidirectional indices you can only do backtracking, so there is no point for the search schemes.

Comment on lines 75 to 82
* # K-mer Indices
*
* A k-mer index can be used to efficiently retrieve all occurrences of a certain k-mer in the text.
* The k-mer can be either an exact string of length k or it can contain one or more wildcards,
* which denote positions of arbitrary characters.
*
* An exact k-mer is represented as seqan3::ungapped, and wildcards can be defined with seqan3::shape.
* Please check the respective documentation for details and examples.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we \if the k-mer index stuff, or clearly state that they are not here yet.
For the in-general k-mer description I would add a link to the kmer_hash_view / minimiser_view (maybe via \sa)

Comment on lines 14 to 24
* \brief Implementation of a k-mer Index.
*
* \details
*
* A k-mer index is a data structure that stores all occurrences of k-mers in a text.
* A k-mer can be either an exact string of length k or it can contain one or more wildcards,
* which denote positions of arbitrary characters.
*
* The index is very fast for retrieving all occurrences of a k-mer in the underlying text.
* Usually the query length (k) is small and the underlying text is very large.
* The parameter k and the position(s) of wildcards must be fixed at index creation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing here is currently the shape :)

@joergi-w
Copy link
Member Author

I removed the part in the introduction. Below in the index section I marked it as not implemented, because users may be interested in what's coming. In the k-mer index landing page I phrased it "shapes for a k-mer index".

@joergi-w joergi-w requested a review from eseiler July 22, 2020 08:37
* with arbitrary pattern lengths and error numbers or positions.
* \else
* SeqAn currently implements only the FM index and a k-mer index is planned. The FM index works
* with arbitrary pattern lengths and error numbers or positions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* with arbitrary pattern lengths and error numbers or positions.
* with arbitrary pattern lengths and error numbers.

The positions probably refers to "you can have an error anywhere in the query" ?
But I think it's rather confusing without additional explanation and would just remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was my intention...

* predefined wildcard positions that do not have to match. FM indices on the other hand are more versatile and work
* with arbitrary pattern lengths and error numbers / positions.
* \todo k-mer indices are coming soon. Stay tuned!
* Add a description of the DREAM index here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Add a description of the DREAM index here.
* \todo Add a description of the DREAM index here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a continuation of the line above and is currently displayed in the same todo box.
Probably you intend to make a new todo item, so I change it :)

* Latter are currently only available for searches with up to and including three errors using bidirectional indices.
* The optimum search schemes will be improved in the future to handle unidirectional indices and higher error counts.
* The latter are currently only available for searches with up to three errors using bidirectional indices.
* In the future we plan to improve the optimum search schemes, to handle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* In the future we plan to improve the optimum search schemes, to handle
* In the future we plan to improve the optimum search schemes to handle

@joergi-w joergi-w requested a review from eseiler July 22, 2020 14:58
Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

(travis is green but did not report back)

@eseiler eseiler merged commit 87cecde into seqan:master Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the documentation for the search module
3 participants