-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
near-phrase: fix bug that phrase specified as last not used as last #1531
Conversation
This bug occers when a size of `must_last` token is smaller than the sizes of other tokens in the same phrase.
Would please you review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add a comment to grn_ii_select_data_find_phrase()
that this returns the must last token info if exists and why?
It may not be good that the current btr::n_must_lasts
approach but this may be a workaround with the current approach.
@@ -11582,13 +11582,17 @@ grn_ii_select_data_find_phrase(grn_ctx *ctx, | |||
token_info **tip; | |||
token_info **tis = data->token_infos; | |||
token_info **tie = tis + data->n_token_infos; | |||
token_info *must_last_ti = NULL; | |||
uint32_t n_tokens_in_same_pos = 0; | |||
for (tip = tis; ; tip++) { | |||
if (tip == tie) { tip = tis; } | |||
data->token_info = *tip; | |||
if (data->token_info->phrase_id != phrase_id) { | |||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to reset here to avoid returning wrong phrase's token info?
continue; | |
must_last_ti = NULL; | |
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think.
must_last_ti
is updated or assigned to data->token_info->phrase_id
only when data->token_info->phrase_id == phrase_id
, and phrase_id
is never updated in this function.
So wrong phrase's token info is never assigned to data->token_info->phrase_id
.
Also, if resetting must_last_ti
at this line, this function don't work as expected in the case below:
phrase_id
specified with argument: 1
token1
: phrase_id = 1, must_last = true
token2
: phrase_id = 2
token3
: phrase_id = 1, must_last = false
Processing token | must_last_ti |
---|---|
token1 |
token1 |
token2 |
NULL |
token3 |
NULL |
When processing token3
, must_last_ti
should be token1
, but NULL
.
But... we never get this sort order because the tokens of same phrase_id
are continuous...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
Sure, I have added. |
Fix a bug that a phrase specified as last is not used as last.
This bug occurs when a size of
must_last
token is smaller than the sizes of other tokens in the same phrase.If the size of
must_last
token is max of all tokens in the same phrase but there are the same size tokens in it, this bug also occurs depending on environment. (See also: #1530 )This change fixes the failure of the following tests.
https://github.com/groonga/groonga/actions/runs/4280133750/jobs/7451550455#step:25:294