Strange behavior when searching for tokens #143

Closed
ximion opened this issue Aug 14, 2023 · 4 comments

Collaborator

ximion commented Aug 14, 2023

Hi!

I was debugging some strange behavior in AppStream which may actually be an issue with libxmlb (or at the very least I need some guidance on what the correct approach is here).

AppStream performs its own stemming, so a term like strategy is stemmed to strategi (which means it no longer matches in a full-text search). AppStream adds the stemmed terms to a node using xb_builder_node_add_token().
I have verified that the tokens are actually added: they show up in the xb-tool dump output in the correct places.
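
To make the mismatch concrete, here is a minimal sketch of why a stemmed term stops matching the original text. It is not AppStream's stemmer: `toy_stem_terminal_y` and `toy_text_contains` are hypothetical helpers, and the single rule implemented (a Porter-style "terminal y becomes i after a consonant") is an assumption chosen only because it reproduces the strategy/strategi example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT AppStream's stemmer: applies only a Porter-style
 * "terminal y -> i" rule, and only when the y follows a consonant that is
 * not the first letter of the word. Modifies `word` in place. */
void
toy_stem_terminal_y(char *word)
{
	assert(word != NULL);
	size_t len = strlen(word);

	/* need at least one letter before the consonant that precedes 'y' */
	if (len < 3 || word[len - 1] != 'y')
		return;
	/* leave words such as "play" alone: the y follows a vowel */
	if (strchr("aeiouy", word[len - 2]) != NULL)
		return;
	word[len - 1] = 'i';
}

/* Plain case-sensitive substring check, standing in for a naive
 * full-text search (an assumption, not libxmlb's matching rules). */
bool
toy_text_contains(const char *text, const char *term)
{
	assert(text != NULL && term != NULL);
	return strstr(text, term) != NULL;
}
```

Calling `toy_stem_terminal_y` on a writable copy of "strategy" leaves "strategi" in the buffer, and `toy_text_contains("A strategy game", "strategi")` is then false, which is why the match has to come from the tokens rather than from the summary text.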

Now, I first perform a query to get all component nodes in the XML, like this: cpt_nodes = xb_silo_query (csec->silo, "components/component", 0, &tmp_error);

A new query is constructed like this:

query = xb_query_new (csec->silo,
			queries[j].xpath,
			&error_query);

With XPath summary[text()~=?]

Then I iterate over all component nodes cpt_node with search term term (a single, stemmed word):

g_auto(XbQueryContext) context = XB_QUERY_CONTEXT_INIT ();

xb_value_bindings_bind_str (xb_query_context_get_bindings (&context),
			        0,
			        term,
			        NULL);
n = xb_node_query_with_context (cpt_node, query, &context, NULL);

So, the stemmed strategi does not appear in the full text of summary, "A strategy game", but it was added to the node as a token beforehand. I would therefore expect this code to find the node and yield a result. That is not what happens, though.

In XbSilo's xb_silo_machine_func_search_cb, only op2 has XB_OPCODE_FLAG_TOKENIZED set, so we do not perform the fast TOKEN/TOKEN search: https://github.com/hughsie/libxmlb/blob/main/src/xb-silo.c#L1520

Instead, the code always falls back to the slower full-text search below, which in this case does not yield any result.
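
The dispatch described here can be sketched with a toy model. This is not libxmlb's code: `ToyOperand`, `toy_tokens_match` and `toy_search` are hypothetical names, and the matching rules (exact token equality, case-sensitive substring fallback) are assumptions made for illustration only. The one thing it takes from the description above is that the token comparison runs only when both operands are tokenized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a search operand: a string plus an optional
 * NULL-terminated token list. NOT libxmlb's XbOpcode. */
typedef struct {
	const char *str;
	const char *const *tokens; /* NULL means "not tokenized" */
} ToyOperand;

/* True if any token in `needle` equals any token in `haystack`. */
bool
toy_tokens_match(const char *const *haystack, const char *const *needle)
{
	for (size_t i = 0; haystack[i] != NULL; i++) {
		for (size_t j = 0; needle[j] != NULL; j++) {
			if (strcmp(haystack[i], needle[j]) == 0)
				return true;
		}
	}
	return false;
}

/* Compare tokens only when BOTH operands are tokenized; otherwise
 * fall back to a substring search over the raw strings. */
bool
toy_search(const ToyOperand *text, const ToyOperand *term)
{
	assert(text != NULL && term != NULL);
	if (text->tokens != NULL && term->tokens != NULL)
		return toy_tokens_match(text->tokens, term->tokens);
	return strstr(text->str, term->str) != NULL;
}
```

In this model, a node with text "A strategy game" and tokens {"strategi", "game"} is found only if the search term "strategi" also carries a token list. With an untokenized term the function drops to the substring branch and returns false, mirroring the behavior reported in this issue.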

Why does op1 (which contains the "strategi" string) not carry that string as a token, and why does this always fall back to the slower search path? Am I supposed to bind the query value differently? Dirty hacks such as also adding the string to the opcode as a token do not seem to work, so I think I am missing something here, or I do not understand how tokens relate to text search in libxmlb conceptually.

I'm glad for any help :-)

Owner

hughsie commented Aug 14, 2023

xb_silo_machine_fixup_attr_search_token_cb() should be tokenizing the op1, no? If you could do a small extra [failing] test in ./src/xb-self-test.c I can take a look tomorrow.

Collaborator Author

ximion commented Aug 14, 2023

> xb_silo_machine_fixup_attr_search_token_cb() should be tokenizing the op1, no?

It does not; I checked that early on. I'll try to figure out why, and see if I can create a small test case today (if not today, then I'll try tomorrow). Thank you!

ximion added a commit to ximion/libxmlb that referenced this issue Aug 14, 2023
Collaborator Author

ximion commented Aug 14, 2023

Okay, I couldn't wait, here's the test case. I noticed that this works just fine when using xb_silo_query_first, so either something is wrong in the way I manually construct the query, or fetching all component nodes first is the wrong approach...

ximion added a commit to ximion/libxmlb that referenced this issue Aug 15, 2023
hughsie added a commit that referenced this issue Aug 15, 2023
hughsie added a commit that referenced this issue Aug 16, 2023
Collaborator Author

ximion commented Aug 16, 2023

Neat! I'll see if I can add a workaround to AppStream to work with older libxmlb versions as well :-)
