NTA query lacks support for special characters #46
Comments
An alternative solution is to strip diacritics from the query: searching for |
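For reference, stripping diacritics on the client side could look roughly like the sketch below. It is only an illustration in Python (the function name `strip_diacritics` is made up here, not taken from the Network of Terms code), and whether it actually helps depends on how each source indexes its text, as the replies below point out.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits 'é' into 'e' plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("Sébastien"))  # Sebastien
```

Note that this only covers characters that decompose into a base letter plus a mark; letters such as 'ø' or 'ß' are left unchanged.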
I suspect the current behavior might change over time, as KB is still working on this issue, but currently this is the case and we could fix it this way for the NTA query. If you are thinking of stripping the diacritics before sending the query to the different sources, then I am hesitant to do so because it might impact the results from other sources. In that case we should investigate the impact on the different sources.
It indeed depends on the Virtuoso config. Virtuoso on NDE's data platform distinguishes between diacritics - "Sebastien" and "Sébastien" are distinct words (i.e. you cannot find the latter if you remove the diacritic). I have a solution in mind for fixing the query - I'll post it in this ticket. |
@EnnoMeijers and @ddeboer: A straightforward solution (for NTA, but also for other sources that use Virtuoso, such as RKDartists) would be to wrap search words inside (single or double) quotes. The approach looks like this: Example search phrase of a user:
Following this approach, a query for the NTA could look like this (for testing only):
Shortcut to Yasgui: https://api.triplydb.com/s/sSRMMT5bu
This query is a stab at solving the problem; it's not perfect. For instance, it is unclear how we should handle search phrases with boolean parenthesizations. The overarching question perhaps is whether, and to what extent, we want to 'rewrite' or 'preprocess' the search phrases of users. Rewriting could make the searches 'smarter': users don't have to know the search syntaxes of the sources; the Network of Terms takes care of this. On the other hand, rewriting could have undesired side effects, depending on the user's input. What do you think?
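To make the quoting idea from this comment concrete, here is a rough sketch in Python. It is not the query linked above (that one does the rewriting inside SPARQL); it only illustrates the intended transformation on the client side. The function name, the operator list and the tokenizing regex are assumptions made for this example.

```python
import re

# Assumption: these are the words to leave untouched as boolean operators.
OPERATORS = {"AND", "OR", "NOT", "NEAR"}


def quote_search_words(phrase: str) -> str:
    """Wrap every plain search word in single quotes (hypothetical helper)."""
    # A token is a double-quoted phrase, a single-quoted phrase, or a run of
    # non-whitespace characters.
    tokens = re.findall(r'"[^"]*"|\'[^\']*\'|\S+', phrase)
    result = []
    for token in tokens:
        if token.upper() in OPERATORS or token[0] in "\"'":
            # Operators and already-quoted phrases are passed through as-is.
            result.append(token)
        else:
            result.append(f"'{token}'")
    return " ".join(result)


print(quote_search_words("Sébastien AND Japrisot"))  # 'Sébastien' AND 'Japrisot'
```

The sketch deliberately ignores parentheses and words that contain a quote character themselves (such as O'Brien), which are exactly the kind of edge cases raised in this comment.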
I think this is an interesting approach for fixing a major part of the problems currently experienced by the LM. The processing seems quite heavy, though, resulting in a slow response from the KB SPARQL endpoint; are we still in an acceptable range here? I tried expanding the query to include searching in schema:alternateName as well (using the property path '|'), but that resulted in a mysterious error: "Virtuoso 37000 Error SP031: SPARQL compiler: The group does not contain triple pattern with '$name' object before bif:contains() predicate". At some point it will probably be inevitable to preprocess the input, but I think we should be cautious with this because it might introduce more complexity and less predictable behavior. I suggest we explore the real need for boolean search syntax further. My impression is that the current user expectation is the everyday Google search experience. I think we should aim for a similar experience and stay away from introducing complexity to support boolean search operations.
Thanks!
I haven't noticed a real loss in performance. Virtuoso should execute the conversion/pre-processing in the FILTER (e.g. with STR() and REPLACE()) just once - before doing the actual searching - so it should have minimal impact.
Yes, I'm experiencing the same issue - The query underneath uses VALUES for including other predicates, such as
Shortcut to Yasgui: https://api.triplydb.com/s/u04WhDLr3
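As a rough illustration of the VALUES idea (not the query behind the link above), the sketch below builds a query string in Python. The plain triple pattern with `?name` as object comes before `bif:contains`, which is presumably what the SP031 error asks for and what the property path '|' breaks. The function name, the `schema:` prefix IRI, the variable names and the LIMIT are assumptions made for this example.

```python
def build_contains_query(search_expression: str, predicates: list[str]) -> str:
    """Build a SPARQL query searching several predicates (hypothetical helper)."""
    if not predicates:
        raise ValueError("at least one predicate is required")
    # Escape backslashes and double quotes for use in a SPARQL string literal.
    escaped = search_expression.replace("\\", "\\\\").replace('"', '\\"')
    values = " ".join(predicates)
    return (
        "PREFIX schema: <http://schema.org/>\n"  # assumed prefix IRI
        "SELECT DISTINCT ?uri ?name WHERE {\n"
        f"  VALUES ?predicate {{ {values} }}\n"
        "  ?uri ?predicate ?name .\n"
        f'  ?name bif:contains "{escaped}" .\n'
        "}\n"
        "LIMIT 10"
    )


print(build_contains_query("'Sébastien'", ["schema:name", "schema:alternateName"]))
```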
That makes sense! |
Ok, looks good, let's go ahead with it! |
🎉 This issue has been resolved in version 5.5.0 🎉 Your semantic-release bot 📦🚀
The current nta.rq uses the bif:contains operator for Virtuoso but lacks support for searching strings with special characters, such as 'Sébastien' in schema:givenName. Additional single quotes surrounding the searchTerm should be added.
Searching for this name in the demonstrator results in an empty set. Searching in the KB's endpoints results in matches for both 'Sébastien' and 'Sebastien', see query. Fixing nta.rq is not straightforward because of the current magic in the query for handling boolean operators in the search string.