Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MISC] Refactor fm_index #1892

Merged
merged 5 commits into from
Jun 30, 2020
Merged

[MISC] Refactor fm_index #1892

merged 5 commits into from
Jun 30, 2020

Conversation

eseiler
Copy link
Member

@eseiler eseiler commented Jun 10, 2020

Resolves #1542

@eseiler eseiler force-pushed the fix/search_bug branch 4 times, most recently from 6f7409b to ace6907 Compare June 10, 2020 20:24
@codecov
Copy link

codecov bot commented Jun 11, 2020

Codecov Report

Merging #1892 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1892   +/-   ##
=======================================
  Coverage   97.76%   97.77%           
=======================================
  Files         258      258           
  Lines        9685     9687    +2     
=======================================
+ Hits         9469     9471    +2     
  Misses        216      216           
Impacted Files Coverage Δ
...ix/detail/two_dimensional_matrix_iterator_base.hpp 100.00% <100.00%> (ø)
...ude/seqan3/range/detail/random_access_iterator.hpp 100.00% <100.00%> (ø)
include/seqan3/search/fm_index/bi_fm_index.hpp 100.00% <100.00%> (ø)
...lude/seqan3/search/fm_index/bi_fm_index_cursor.hpp 100.00% <100.00%> (ø)
include/seqan3/search/fm_index/fm_index.hpp 100.00% <100.00%> (ø)
...clude/seqan3/range/views/enforce_random_access.hpp 100.00% <0.00%> (ø)
include/seqan3/range/views/kmer_hash.hpp 94.65% <0.00%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35e15cd...214e23f. Read the comment docs.

@eseiler eseiler force-pushed the fix/search_bug branch 2 times, most recently from 4fc615d to cd8abc5 Compare June 12, 2020 10:24
@marehr
Copy link
Member

marehr commented Jun 15, 2020

auto genome = std::vector<"CGTAGGTCGATGC"_dna4, "CAAAA"_dna4>;
seqan3::bi_fm_index index{genome};

auto it = index.cursor(); // bidirectional cursor
it.extend_right("CGT"_dna4); // search CGT
seqan3::debug_stream << it.locate() << '\n'; // (0,0)

auto rev_it = it.to_rev_cursor();
seqan3::debug_stream << rev_it.locate() << '\n'; // (?,?)

Possible Outputs:

  • (1,10) old behaviour
  • (0,10) sequence indices are not reversed
  • (0,0) sequence indices are not reversed and text positions are from right to left

Open Questions:

  • Is this even used anywhere? Do we need it for some search implementation?
  • Does this need to be public?
  • to_rev_cursor.locate() is extremely slow (because sample size is non-existing)
    • if locate is SO slow is it even worth to use this cursor?
    • would this alternative definition of reverse_cursor be better?
      • reverse_cursor(text) = forward_cursor(reverse(text))

2020-06-15 Resolution:

Ask some people, if they don't need it or see an immediate use-case, otherwise throw it from the API.

Make it private in the mean-time and add a new issue.

2020-06-24 Resolution:

  • We remove the to_rev_interface since it is in the current state not stable or clearly defined what the expected behaviour should be.

@eseiler eseiler force-pushed the fix/search_bug branch 2 times, most recently from 18f9539 to cbb0faa Compare June 24, 2020 17:47
@eseiler eseiler marked this pull request as ready for review June 24, 2020 17:48
@eseiler eseiler requested review from a team and joergi-w and removed request for a team June 24, 2020 17:48
@eseiler eseiler force-pushed the fix/search_bug branch 2 times, most recently from 17a08aa to d2e2a8e Compare June 24, 2020 20:06
CHANGELOG.md Outdated Show resolved Hide resolved
test/unit/search/search_collection_test.cpp Show resolved Hide resolved
include/seqan3/search/fm_index/fm_index.hpp Outdated Show resolved Hide resolved
include/seqan3/search/fm_index/fm_index.hpp Outdated Show resolved Hide resolved
include/seqan3/search/fm_index/fm_index.hpp Show resolved Hide resolved
include/seqan3/search/fm_index/fm_index.hpp Outdated Show resolved Hide resolved
include/seqan3/search/fm_index/fm_index.hpp Outdated Show resolved Hide resolved
include/seqan3/search/fm_index/fm_index.hpp Outdated Show resolved Hide resolved
@eseiler eseiler requested a review from joergi-w June 25, 2020 09:26
void construct(text_t && text)
{
detail::fm_index_validator::validate<alphabet_t, text_layout_mode_>(text);

fwd_fm = fm_index_type{text};
rev_fm = rev_fm_index_type{text};
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are spaces accidentally inserted 💅

include/seqan3/search/fm_index/fm_index.hpp Show resolved Hide resolved
Copy link
Member

@joergi-w joergi-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks good now! The most recent commit added some unintended spaces, which can be fixed together with the 2nd review.

@joergi-w joergi-w requested review from a team and smehringer and removed request for a team June 25, 2020 10:10
Copy link
Member

@smehringer smehringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly some questions

tmp_text.back() = delimiter;

std::ranges::reverse(tmp_text);
std::ranges::reverse(tmp_text);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we reversing the text here in the if (!reverse) branch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reverse is whether the original text was reversed (reverse_fm_index).
If it's a "normal" fm_index, you can just reverse the test; if the text comes from the reverse_fm_index you have to do special stuff if there is only one item in the collection

Comment on lines 629 to 623
}
public:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
public:
}
public:

@@ -274,51 +284,46 @@ class fm_index
//!\cond
requires (text_layout_mode_ == text_layout::collection)
//!\endcond
void construct(text_t && text)
void construct(text_t && text, bool reverse = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this construct has a boolean for reversing in the normal fm_index when you have a whole class that can handle all the reversing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just handles the reversing of the text; the alternative would be to copy the complete code of the construct method to the reverse_fm_index and add the tiny if I need...

@eseiler eseiler requested a review from smehringer June 30, 2020 09:17
@smehringer smehringer merged commit 85469f8 into seqan:master Jun 30, 2020
@eseiler eseiler deleted the fix/search_bug branch June 30, 2020 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Search on bi_fm_index finds too many results
4 participants