Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

near-phrase: fix bug that phrase specified as last not used as last #1531

Merged
merged 4 commits into from
Mar 3, 2023

Conversation

HashidaTKS
Copy link
Contributor

@HashidaTKS HashidaTKS commented Mar 2, 2023

Fix a bug that a phrase specified as last is not used as last.
This bug occurs when a size of must_last token is smaller than the sizes of other tokens in the same phrase.
If the size of must_last token is max of all tokens in the same phrase but there are the same size tokens in it, this bug also occurs depending on environment. (See also: #1530 )

This change fixes the failure of the following tests.

https://github.com/groonga/groonga/actions/runs/4280133750/jobs/7451550455#step:25:294

This bug occers when a size of `must_last` token is smaller than
the sizes of other tokens in the same phrase.
@HashidaTKS
Copy link
Contributor Author

@kou

Would please you review this?

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a comment to grn_ii_select_data_find_phrase() that this returns the must last token info if exists and why?

It may not be good that the current btr::n_must_lasts approach but this may be a workaround with the current approach.

@@ -11582,13 +11582,17 @@ grn_ii_select_data_find_phrase(grn_ctx *ctx,
token_info **tip;
token_info **tis = data->token_infos;
token_info **tie = tis + data->n_token_infos;
token_info *must_last_ti = NULL;
uint32_t n_tokens_in_same_pos = 0;
for (tip = tis; ; tip++) {
if (tip == tie) { tip = tis; }
data->token_info = *tip;
if (data->token_info->phrase_id != phrase_id) {
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to reset here to avoid returning wrong phrase's token info?

Suggested change
continue;
must_last_ti = NULL;
continue;

Copy link
Contributor Author

@HashidaTKS HashidaTKS Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think.
must_last_ti is updated or assigned to data->token_info->phrase_id only when data->token_info->phrase_id == phrase_id, and phrase_id is never updated in this function.
So wrong phrase's token info is never assigned to data->token_info->phrase_id.

Also, if resetting must_last_ti at this line, this function don't work as expected in the case below:

phrase_id specified with argument: 1
token1: phrase_id = 1, must_last = true
token2: phrase_id = 2
token3: phrase_id = 1, must_last = false

Processing token must_last_ti
token1 token1
token2 NULL
token3 NULL

When processing token3, must_last_ti should be token1, but NULL.

But... we never get this sort order because the tokens of same phrase_id are continuous...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

@HashidaTKS
Copy link
Contributor Author

Could you also add a comment to grn_ii_select_data_find_phrase() that this returns the must last token info if exists and why?

Sure, I have added.

@kou kou merged commit c33b004 into master Mar 3, 2023
@kou kou deleted the fix-near-phrase-last-bug branch March 3, 2023 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants