Strange behavior when searching for tokens #143
Comments
xb_silo_machine_fixup_attr_search_token_cb() should be tokenizing the op1, no? If you could do a small extra [failing] test in ./src/xb-self-test.c I can take a look tomorrow.
It does not, I checked that early... I'll try to figure out why, and see if I can create a small testcase today (if not today, then I'll try tomorrow). Thank you!
Okay, I couldn't wait, here's the test case. I noticed that this works just fine when using
Neat! I'll see if I can add a workaround to AppStream to work with older libxmlb versions as well :-)
Hi!
I was debugging some strange behavior in AppStream which may actually be an issue with libxmlb (or at the very least I need some guidance on what the correct approach is here).
AppStream performs its own stemming, so a term like `strategy` is stemmed to `strategi` (making it no longer a fulltext search match). AppStream adds the stemmed terms to a node using `xb_builder_node_add_token ();`.
I have verified that the tokens are actually added, they show up in `xb-tool dump` output in the correct places.
Now, I first perform a query to get all component nodes in the XML, like this:
cpt_nodes = xb_silo_query (csec->silo, "components/component", 0, &tmp_error);
A new query is constructed like this:
With XPath `summary[text()~=?]`
Then I iterate over all component nodes `cpt_node` with search term `term` (a single, stemmed word):
So, the stemmed `strategi` does not show up in the full text of `summary`, `"A strategy game"`, it has however been added to the node as token before. So I would expect this code to find the node and yield a result. This is not what happens though.
In XbSilo's `xb_silo_machine_func_search_cb`, only `op2` has `XB_OPCODE_FLAG_TOKENIZED` set, so we do not perform the fast TOKEN/TOKEN search: https://github.com/hughsie/libxmlb/blob/main/src/xb-silo.c#L1520
Instead, the code always falls back to the slower full-text search below, which in this case does not yield any result.
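To make the two paths concrete, here is a minimal standalone C sketch of the behavior as I understand it. It is not libxmlb's actual implementation: `SketchOperand`, `sketch_match_tokens` and `sketch_search` are hypothetical names, and the fallback is simplified to a plain `strstr` substring check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a search operand: its raw text plus an
 * optional NULL-terminated token list. This is NOT libxmlb's XbOpcode. */
typedef struct {
    const char *text;
    const char *const *tokens; /* NULL when the operand is not tokenized */
} SketchOperand;

/* Fast path: true if any needle token equals any haystack token. */
bool
sketch_match_tokens(const char *const *haystack, const char *const *needle)
{
    for (size_t i = 0; haystack[i] != NULL; i++) {
        for (size_t j = 0; needle[j] != NULL; j++) {
            if (strcmp(haystack[i], needle[j]) == 0)
                return true;
        }
    }
    return false;
}

/* Token comparison only when BOTH operands carry tokens; otherwise fall
 * back to a substring search over the raw text (a simplification). */
bool
sketch_search(const SketchOperand *needle, const SketchOperand *haystack)
{
    assert(needle != NULL && haystack != NULL);
    if (needle->tokens != NULL && haystack->tokens != NULL)
        return sketch_match_tokens(haystack->tokens, needle->tokens);
    return strstr(haystack->text, needle->text) != NULL;
}
```

With a haystack whose text is "A strategy game" and whose tokens include "strategi", a needle that carries no tokens takes the substring path and misses, while the same needle with "strategi" attached as a token takes the token path and matches. That is the difference I am seeing between what happens and what I expected.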
Why does `op1` (containing the "strategi" string) not have that one as token, and why does this inevitably always fall back to the slower search path? Am I supposed to bind the query value differently? Dirty hacks like just adding the string as a token as well to the opcode do not seem to work, so I think I am missing something here / do not understand how tokens relate to text search in libxmlb in concept.
I'm glad for any help :-)