Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#306 - Unable to link "Indonesia" #358

Merged
merged 24 commits into from
Aug 23, 2018

Conversation

rcffc
Copy link
Contributor

@rcffc rcffc commented Jul 1, 2018

Split query for candidate concept retrieval in two parts - one that matches the label exactly and one using full-text-search.
Apply limit only for the full-text-search part.
Cannot test it out at the moment, since the server is still down.

Split query for candidate concept retrieval in two parts - one that matches the label exactly and one using full-text-search.
Apply limit only for the full-text-search part.
@rcffc rcffc added 🐛Bug Something isn't working WIP Module: Entity Linking labels Jul 1, 2018
@rcffc rcffc added this to the 0.4.0 milestone Jul 1, 2018
@rcffc rcffc self-assigned this Jul 1, 2018
@rcffc rcffc requested review from jcklie and reckart July 1, 2018 19:39
@ukp-svc-jenkins
Copy link

28% (0.0%) vs master 28%

@ukp-svc-jenkins
Copy link

28% (0.0%) vs master 28%

@reckart reckart changed the base branch from master to 0.4.x July 5, 2018 15:22
@ukp-svc-jenkins
Copy link

28% (0.0%) vs master 28%

@reckart reckart added the Needs attention Needs attention by the assignees to update, or resolve. Use if an issue/PR becomes stale. label Jul 5, 2018
@reckart reckart modified the milestones: 0.4.0, 0.4.1 Jul 5, 2018
Fixed a syntax error
@ukp-svc-jenkins
Copy link

28% (-0.06%) vs master 28%

@reckart
Copy link
Member

reckart commented Jul 6, 2018

Still WIP?

@rcffc
Copy link
Contributor Author

rcffc commented Jul 6, 2018

Yes, can you try to comment out MaxQueryCostEstimationTime in Virtuoso's .ini file please? #306?

@reckart
Copy link
Member

reckart commented Jul 6, 2018

@rcffc the max seems to be set to 400 seconds - that seems the pretty generous?!

@reckart
Copy link
Member

reckart commented Jul 6, 2018

Ok, commented out and server is re-starting. This may take a few hours.

@ukp-svc-jenkins
Copy link

28% (0.0%) vs master 28%

@reckart reckart removed the Needs attention Needs attention by the assignees to update, or resolve. Use if an issue/PR becomes stale. label Jul 6, 2018
@reckart
Copy link
Member

reckart commented Jul 6, 2018

Since the PR is still WIP, I expect you don't need a review immediately. Please request one when you're done. I might still comment on the code from time to time. If something needs to be discussed, just add a comment.

@reckart reckart removed request for jcklie and reckart July 6, 2018 22:23
- pass both unmodified type string and processed mention to the candidate retrieval query builder
- try matching both surface form and type string directly 
- first rank by edit distance to typed string, then by signature overlap score to make it easier to link instances with many candidates by typing
- remove occurrences of GRAPH, it is unclear what it's benefits are
- TODO reactivate Caching
@betoboullosa betoboullosa added this to the 0.5.1 milestone Aug 13, 2018
@ukp-svc-jenkins
Copy link

28% (-11.92%) vs master 40%

{
aString = RenderUtils.escape(aString).toLowerCase(Locale.ENGLISH);

String fullTextMatchingString = getFullTextMatchingQueryPart("string", aLimit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, why is this a parameter at all?

" {",
" VALUES ?labelpredicate {rdfs:label skos:altLabel}",
" {",
" ?e2 ?labelpredicate ?" + aString + " @en .",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why not hard-code the variable name in the same way that e.g. ?labelpredicate is hardcoded?

" {",
" VALUES ?labelpredicate {rdfs:label skos:altLabel}",
" {",
" ?e2 ?labelpredicate ?" + aString + " @en .",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't notice the ? that introduces the variable name here. So escaping here doesn't make sense in deed. But I still wonder why the variable name is not simply hard-coded.

@reckart
Copy link
Member

reckart commented Aug 14, 2018

@rcffc seems to work reasonably. I believe that it would be nice if the "exact match" could be case-insensitive. E.g. one of the texts I have the Wired magazine is written as WIRED but that is not found in the ontology - I have to search for Wired instead. Furthermore, if I search for Wired magazine, then I don't find it at all even though the description is monthly American magazine.

So I think the PR makes improvements, but it feels like we're not there yet. I would suggest to merge it and then to open new issues for case-insensitive search and also for unit tests that check if specific concepts are found when certain search strings are being used.

@reckart
Copy link
Member

reckart commented Aug 14, 2018

@rcffc btw, is the caching still disabled? Somehow doing entity linking feels very slow.

@reckart
Copy link
Member

reckart commented Aug 22, 2018

@rcffc @rentier is this only slow for me?

@rcffc
Copy link
Contributor Author

rcffc commented Aug 22, 2018

For me it does not even work anymore, the page lock times out, same as in #475. (0.5.0, commit b48f649 and 0.4.3 commit 9ac30ad)

@rcffc
Copy link
Contributor Author

rcffc commented Aug 22, 2018

Caching has been reactivated.

@reckart
Copy link
Member

reckart commented Aug 22, 2018

@rcffc are you sure that you can reach the KB server?

@rcffc
Copy link
Contributor Author

rcffc commented Aug 22, 2018

I believe that it would be nice if the "exact match" could be case-insensitive.

I read that it would be expensive because we would need to use FILTER, comparing over all concepts in a knowledge base. (https://www.cray.com/blog/dont-use-hammer-screw-nail-alternatives-regex-sparql/)

@rcffc
Copy link
Contributor Author

rcffc commented Aug 22, 2018

@rcffc are you sure that you can reach the KB server?

Yes.

@reckart
Copy link
Member

reckart commented Aug 22, 2018

@rcffc Maybe case sensitivity is something that needs to be configured on the KB server (e.g. Virtuoso) such that the data is simply all indexed internally in lower case?

@reckart
Copy link
Member

reckart commented Aug 22, 2018

@rcffc Does entity linking work for you on master?

@rcffc
Copy link
Contributor Author

rcffc commented Aug 22, 2018

@rcffc Does entity linking work for you on master?

No.

@reckart
Copy link
Member

reckart commented Aug 22, 2018

@rcffc My trusty entity linking project on blinky experimental seems to work (build b48f649). Can you test on blinky?

@reckart reckart modified the milestones: 0.5.1, 0.5.0 Aug 22, 2018
@rcffc
Copy link
Contributor Author

rcffc commented Aug 23, 2018

Do you mean REC Entity Linking Playground? Yes it works, but it does not use UKP_WIKIDATA. If I use my own project with UKP_WIKIDATA on blinky it does not load the annotations:
peek 2018-08-23 13-49

@reckart
Copy link
Member

reckart commented Aug 23, 2018

@rcffc ok, I see the problem. I have opened a separate issue for it: #486

Since it doesn't seem to be related to the present PR, I'm merging it now.

@reckart reckart merged commit b7f27dd into 0.4.x Aug 23, 2018
@reckart reckart deleted the bugifx/306-combine-exact-and-intelligent-matching branch August 23, 2018 13:32
@reckart reckart restored the bugifx/306-combine-exact-and-intelligent-matching branch August 23, 2018 13:32
@reckart reckart deleted the bugifx/306-combine-exact-and-intelligent-matching branch August 23, 2018 13:35
@reckart reckart modified the milestones: 0.5.0, 0.4.3 Aug 23, 2018
@rcffc rcffc removed the Needs attention Needs attention by the assignees to update, or resolve. Use if an issue/PR becomes stale. label Aug 26, 2018
@reckart reckart modified the milestones: 0.4.3, 0.5.0 Aug 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛Bug Something isn't working Module: Entity Linking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants