Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Node access for fm_index_cursor #2076

Merged
merged 2 commits into from
Sep 22, 2020

Conversation

eseiler
Copy link
Member

@eseiler eseiler commented Aug 27, 2020

Resolves #2061

@eseiler eseiler requested review from a team and marehr and removed request for a team August 27, 2020 08:51
@codecov
Copy link

codecov bot commented Aug 27, 2020

Codecov Report

Merging #2076 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2076   +/-   ##
=======================================
  Coverage   97.95%   97.96%           
=======================================
  Files         264      264           
  Lines       10045    10050    +5     
=======================================
+ Hits         9840     9845    +5     
  Misses        205      205           
Impacted Files Coverage Δ
.../seqan3/search/fm_index/detail/fm_index_cursor.hpp 100.00% <ø> (ø)
include/seqan3/search/fm_index/fm_index_cursor.hpp 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 205674c...279273a. Read the comment docs.

Copy link
Member

@marehr marehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we wanted to expose the intervals; I'm not in favour for exposing the node. If we can, I would want to hide them. I'm marking this as WIP for now, because it is still not clear what the use case is.

@marehr marehr marked this pull request as draft August 27, 2020 12:56
@eseiler eseiler requested a review from marehr August 31, 2020 12:14
CHANGELOG.md Outdated Show resolved Hide resolved
@eseiler eseiler marked this pull request as ready for review August 31, 2020 12:17
Copy link
Member

@marehr marehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks already pretty good, just one question

@eseiler eseiler requested a review from marehr September 2, 2020 08:11
@marehr
Copy link
Member

marehr commented Sep 8, 2020

Core Meeting 2020-09-08

We want a suffix_array_interval() function that returns a seqan3::suffix_array_interval which is an aggregate struct and has the two members left_bound and right_bound (of size_t) and behaves like a std::pair.

Document if half-open or not.

// Using half-open interval
struct suffix_array_interval
{
   size_t begin_position;
   size_t end_position;
};

// Using closed interval
struct suffix_array_interval
{
   size_t left_bound;
   size_t right_bound;
};

Use closed (which seems to be the way the sdsl has implemented this) if it can handle empty texts (and other corner cases). If it can't this is bad and the sdsl must fix it :(

We want to give half-open intervals back.

Copy link
Member

@marehr marehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Waiting on implementing the half-open interval accessor.)

@eseiler
Copy link
Member Author

eseiler commented Sep 8, 2020

@marehr can you have a quick look at include/seqan3/search/fm_index/fm_index_cursor.hpp

If it is fine, I can start adjusting all the tests and documentation

@eseiler eseiler requested a review from marehr September 8, 2020 14:28
@marehr
Copy link
Member

marehr commented Sep 8, 2020

@marehr can you have a quick look at include/seqan3/search/fm_index/fm_index_cursor.hpp

If it is fine, I can start adjusting all the tests and documentation

looks fine :) Or do you think that something seems odd?

@eseiler eseiler removed the request for review from marehr September 8, 2020 16:47
@eseiler
Copy link
Member Author

eseiler commented Sep 8, 2020

@marehr I also added a conversion operator to tuple ... it is nice for the use case where you want to reuse variables (see snippet, structured binding works with aggregate type, but std::tie needs the conversion)

open for discussion

auto [left_bound, right_bound] = cur.suffix_array_interval();
/// .. do some stuff with cursor
std::tie(left_bound, right_bound) = cur.suffix_array_interval();

Copy link
Member

@marehr marehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome :)

Do we really need the tuple thingy?

Did you find any instances where the interval can be empty?

size_t end_position;

//!\brief Convert to tuple.
operator std::tuple<size_t &, size_t &>()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
operator std::tuple<size_t &, size_t &>()
operator std::tuple<size_t &, size_t &>() &

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, you can also do this:

struct foo
{
  int a;
  int b;
};

int main()
{
    foo bar{1, 3};

    auto & [a, b] = bar;

    b = 5;

    return bar.b; // has value 5
}

so we don't need a std::tie.

Copy link
Member Author

@eseiler eseiler Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what I want is

int main()
{
    foo bar{1, 3};

    auto & [a, b] = bar;

    b = 5;

    foo bar2{2, 4};
    std::tie(a, b) = bar2;

    return bar.b; // has value 5
}

I'm not sure if this is really, really needed, but when I wrote the snippets, I assumed that it will work just to find out that it does not. Hence, I added it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the &:

https://godbolt.org/z/4E6KcT

It doesn't work because .. ehm .. not so sure...
It would work if we the suffix_array_interval struct wasn't generated by another struct.

include/seqan3/search/fm_index/fm_index_cursor.hpp Outdated Show resolved Hide resolved
@eseiler
Copy link
Member Author

eseiler commented Sep 17, 2020

Did you find any instances where the interval can be empty?

No, the only thing that could happen is that you construct the cursor on an empty index.
Internally, closed intervals are used, so rb would underflow. But this was already a problem before...
I added an assert when constructing the cursor, such that it will assert if constructed on an empty index.

The most elegant solution would be to use half-open intervals in the FM-index, because then the cursor is always in a valid state (The cursor on an empty index would then use [0,0) instead of [0, size_t{-1}].)

size_t end_position;

//!\brief Convert to tuple.
operator std::tuple<size_t &, size_t &>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
operator std::tuple<size_t &, size_t &>()
operator std::tuple<size_t, size_t>() const noexcept

So this wouldn't work?

Because your version should require that this is an lvalue reference as well. Then the question remains, what is with the other versions const & should be also added. Move version are not required since we move/copy built-in types anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marehr marehr requested a review from rrahn September 18, 2020 12:14
Copy link
Contributor

@rrahn rrahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. Some minor change requrests.

@@ -32,6 +32,39 @@ namespace seqan3
* \{
*/

//!\brief The underlying suffix array intervals.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//!\brief The underlying suffix array intervals.
//!\brief The underlying suffix array interval.

@@ -44,6 +43,10 @@ TYPED_TEST_P(fm_index_cursor_collection_test, ctr)
// custom constructor
TypeParam it0{fm};
EXPECT_EQ(it0.query_length(), 0u);
if constexpr (!seqan3::bi_fm_index_specialisation<typename TypeParam::index_type>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this not be part of the test fixture?

@@ -47,6 +46,10 @@ TYPED_TEST_P(fm_index_cursor_test, ctr)
// custom constructor
TypeParam it0{fm};
EXPECT_EQ(it0.query_length(), 0u);
if constexpr (!seqan3::bi_fm_index_specialisation<typename TypeParam::index_type>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. So I would call:

if constexpr (!TestFixture::is_bi_fm_index)
{
...
}

Copy link
Contributor

@rrahn rrahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot 👍

@eseiler eseiler merged commit ac39faa into seqan:master Sep 22, 2020
@eseiler eseiler deleted the feature/node_access branch May 14, 2021 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FM index cursor enable node access
3 participants