#306 - Unable to link "Indonesia" #358
Split query for candidate concept retrieval in two parts - one that matches the label exactly and one using full-text-search. Apply limit only for the full-text-search part.
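A minimal sketch of what such a split query could look like, not the PR's actual code. It assumes Virtuoso's `bif:contains` predicate for the full-text part (the real `getFullTextMatchingQueryPart` is not shown here), and the class name, the `sanitize` helper and the `?label` variable are hypothetical. The exact-match branch has no limit, while the full-text branch is wrapped in a subquery so that `LIMIT` applies to it alone.

```java
class CandidateQuerySketch {
    /** Simplistic escaping (assumption): drop quotes and backslashes from the mention. */
    static String sanitize(String s) {
        return s.replaceAll("[\"'\\\\]", "");
    }

    /** Builds a query with an unlimited exact-match branch and a limited full-text branch. */
    static String buildQuery(String mention, int ftsLimit) {
        String text = sanitize(mention);
        return String.join("\n",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>",
            "SELECT DISTINCT ?e2 ?label WHERE {",
            "  {",
            // Part 1: exact label match, no limit applied.
            "    VALUES ?labelpredicate { rdfs:label skos:altLabel }",
            "    ?e2 ?labelpredicate \"" + text + "\"@en .",
            "    BIND(\"" + text + "\"@en AS ?label)",
            "  }",
            "  UNION",
            "  {",
            // Part 2: full-text search in a subquery, so LIMIT only affects this branch.
            "    SELECT ?e2 ?label WHERE {",
            "      VALUES ?labelpredicate { rdfs:label skos:altLabel }",
            "      ?e2 ?labelpredicate ?label .",
            "      ?label bif:contains \"'" + text + "'\" .",
            "    }",
            "    LIMIT " + ftsLimit,
            "  }",
            "}");
    }
}
```

With this shape, an entity whose label equals the mention is returned even if the full-text branch would have cut it off at the limit.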
Fixed a syntax error
Still WIP?
Yes, can you try to comment out MaxQueryCostEstimationTime in Virtuoso's .ini file please? #306?
@rcffc the max seems to be set to 400 seconds - that seems pretty generous?!
Ok, commented out and server is re-starting. This may take a few hours.
Since the PR is still WIP, I expect you don't need a review immediately. Please request one when you're done. I might still comment on the code from time to time. If something needs to be discussed, just add a comment.
- pass both unmodified typed string and processed mention to the candidate retrieval query builder
- try matching both surface form and typed string directly
- first rank by edit distance to typed string, then by signature overlap score to make it easier to link instances with many candidates by typing
- remove occurrences of GRAPH, it is unclear what its benefits are
- TODO reactivate Caching
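The two-level ranking from the commit message could be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the `Candidate` class, its fields and the `rank` method are hypothetical, the signature overlap score is taken as a precomputed integer, and plain Levenshtein distance on lower-cased labels stands in for whatever edit distance the PR uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class CandidateRankingSketch {
    /** Hypothetical candidate: a label plus a precomputed signature overlap score. */
    static final class Candidate {
        final String label;
        final int signatureOverlap;

        Candidate(String label, int signatureOverlap) {
            this.label = label;
            this.signatureOverlap = signatureOverlap;
        }
    }

    /** Levenshtein distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Sorts by ascending edit distance to the typed string, then by descending overlap. */
    static List<Candidate> rank(List<Candidate> candidates, String typed) {
        String needle = typed.toLowerCase(Locale.ENGLISH);
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
            .comparingInt((Candidate c) -> editDistance(c.label.toLowerCase(Locale.ENGLISH), needle))
            .thenComparing(Comparator.comparingInt((Candidate c) -> c.signatureOverlap).reversed()));
        return sorted;
    }
}
```

Putting the edit distance first means that typing more of the intended label moves the matching candidates to the top; the overlap score only breaks ties among equally close labels.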
{
    aString = RenderUtils.escape(aString).toLowerCase(Locale.ENGLISH);

    String fullTextMatchingString = getFullTextMatchingQueryPart("string", aLimit);
In that case, why is this a parameter at all?
" {",
" VALUES ?labelpredicate {rdfs:label skos:altLabel}",
" {",
" ?e2 ?labelpredicate ?" + aString + " @en .",
And why not hard-code the variable name in the same way that e.g. ?labelpredicate is hard-coded?
Sorry, I didn't notice the ? that introduces the variable name here. So escaping here doesn't make sense indeed. But I still wonder why the variable name is not simply hard-coded.
@rcffc seems to work reasonably. I believe that it would be nice if the "exact match" could be case-insensitive. E.g. one of the texts I have the Wired magazine is written as So I think the PR makes improvements, but it feels like we're not there yet. I would suggest merging it and then opening new issues for case-insensitive search and also for unit tests that check whether specific concepts are found when certain search strings are used.
@rcffc btw, is the caching still disabled? Somehow doing entity linking feels very slow.
Caching has been reactivated.
@rcffc are you sure that you can reach the KB server?
I read that it would be expensive because we would need to use FILTER, comparing over all concepts in a knowledge base. (https://www.cray.com/blog/dont-use-hammer-screw-nail-alternatives-regex-sparql/)
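To make the cost concern concrete, here is a rough sketch of what a case-insensitive exact match via FILTER might look like. It is not code from this PR; the class and method names are hypothetical and the escaping is deliberately simplistic. Because the comparison runs on `LCASE(STR(?label))`, the label cannot be bound directly in the triple pattern, so in principle the filter has to be evaluated for every label binding.

```java
import java.util.Locale;

class CaseInsensitiveMatchSketch {
    /** Builds a query that compares lower-cased labels against the lower-cased mention. */
    static String buildQuery(String mention) {
        // Simplistic escaping (assumption): drop double quotes and backslashes.
        String needle = mention.replaceAll("[\"\\\\]", "").toLowerCase(Locale.ENGLISH);
        return String.join("\n",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>",
            "SELECT DISTINCT ?e2 ?label WHERE {",
            "  VALUES ?labelpredicate { rdfs:label skos:altLabel }",
            // ?label is unbound here, so every English label is a potential match.
            "  ?e2 ?labelpredicate ?label .",
            "  FILTER(LANG(?label) = \"en\" && LCASE(STR(?label)) = \"" + needle + "\")",
            "}");
    }
}
```

Compare this with the exact-match pattern in the diff, where the label is fixed in the triple pattern itself and no FILTER is needed.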
Yes.
@rcffc Maybe case sensitivity is something that needs to be configured on the KB server (e.g. Virtuoso) such that the data is simply all indexed internally in lower case?
@rcffc Does entity linking work for you on master?
No.
Cannot test it out at the moment, since the server is still down.