Question about alias map #12

Closed
hitercs opened this issue Dec 16, 2020 · 4 comments

@hitercs

hitercs commented Dec 16, 2020

Hi,

Thanks for your great work! Super cool.

I have one question about the alias map generation.
As I have seen in the alias2qids_kore50.json and alias2qids_rss500.json files, the same alias may have different candidate sets in different examples.
For example, in alias2qids_kore50.json, the candidate lists of "david_0" and "david_1" are not identical.

In detail:
In "david_1" but not in "david_0":
{'Q5240660', 'Q5236763', 'Q5240530', 'Q16079082', 'Q27827705', 'Q17318723', 'Q18632066', 'Q5238957', 'Q768479', 'Q3017915', 'Q5239424', 'Q20684456', 'Q5234065', 'Q5234667', 'Q5230766', 'Q10264386', 'Q672856', 'Q1174097', 'Q5240118', 'Q583264'}

In "david_0" but not in "david_1":
{'Q312649', 'Q1173922', 'Q19668637', 'Q5236091', 'Q178517', 'Q2420499', 'Q353983', 'Q24248231', 'Q5239917', 'Q336640', 'Q5241350', 'Q184903', 'Q338628', 'Q2071', 'Q1175688', 'Q41564', 'Q1173934', 'Q214601', 'Q5236705', 'Q1177021'}
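For reference, here is roughly how I computed those differences, assuming each alias maps to a list of [QID, score] pairs as shown below:

```python
import json

# Load the contextual candidate map (alias -> list of [QID, score] pairs).
with open("alias2qids_kore50.json") as f:
    alias2qids = json.load(f)

# Collect just the QIDs for the two "david" aliases.
david_0 = {qid for qid, score in alias2qids["david_0"]}
david_1 = {qid for qid, score in alias2qids["david_1"]}

print("In david_1 but not david_0:", david_1 - david_0)
print("In david_0 but not david_1:", david_0 - david_1)
```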

So I am wondering how the candidates are generated. Are they context dependent?

By the way, could you please explain the score (e.g. ["Q8016", 5947]) associated with each candidate entity? What does it mean and how is it calculated?

Thanks a lot.

@lorr1 lorr1 added the question Further information is requested label Dec 17, 2020
@lorr1 lorr1 self-assigned this Dec 17, 2020
@lorr1
Contributor

lorr1 commented Dec 17, 2020

Hello! Great question.

Yes, we do have a contextual candidate generator that we used for kore50 and rss500. It takes into account contextual similarity between an entity's Wikipedia page and the sentence itself, so because the sentences are different, the candidate lists are different.

The score for a candidate is based on a few features that we used for this contextual generation: the similarity between the mention and the entity, the overall entity popularity, and the similarity between the sentence and the entity's Wikipedia page. We only use this score for filtering the lists.
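If it helps, here is a minimal, self-contained sketch of that kind of score-based filtering. The feature functions and weights are placeholders for illustration, not the exact ones behind the released candidate lists:

```python
from dataclasses import dataclass

@dataclass
class Entity:
    qid: str
    title: str
    popularity: float   # e.g. a normalized Wikipedia occurrence count
    page_text: str      # text of the entity's Wikipedia page

def token_overlap(a: str, b: str) -> float:
    """Crude bag-of-words Jaccard similarity, standing in for the real features."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def score_candidate(mention: str, sentence: str, ent: Entity,
                    w_mention: float = 1.0, w_pop: float = 1.0, w_ctx: float = 1.0) -> float:
    # Three terms mirroring the features above: mention/entity similarity,
    # overall entity popularity, and sentence/Wikipedia-page similarity.
    return (w_mention * token_overlap(mention, ent.title)
            + w_pop * ent.popularity
            + w_ctx * token_overlap(sentence, ent.page_text))

def filter_candidates(mention: str, sentence: str, cands: list, k: int = 30) -> list:
    # Keep only the top-k candidates by combined score.
    return sorted(cands, key=lambda e: score_candidate(mention, sentence, e), reverse=True)[:k]
```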

@hitercs
Author

hitercs commented Dec 18, 2020

@lorr1 Thanks for your answer! How about the score in the data/wiki_entity_data/entity_mappings/alias2qids.json file? Is it generated by averaging the scores of the same alias-entity pairs over Wikipedia anchor texts using the same contextual candidate generator?
Thanks.

@lorr1
Contributor

lorr1 commented Dec 19, 2020

That one is used just for training, so it is not contextual. We could certainly make it contextual (and are exploring these ideas!), but we didn't use a contextual generator for training. That score is based on an entity's overall occurrence count in Wikipedia, so it's a ranking based on entity popularity. Note that it is not conditioned on a specific alias; it's just overall entity popularity. We found this was necessary when incorporating aliases from Wikidata that may never have been seen in Wikipedia yet are still valid aliases.
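Conceptually, building that training-time map looks something like the sketch below. The inputs, counts, and variable names are made up for illustration; the real pipeline does more cleaning and filtering:

```python
import json
from collections import Counter, defaultdict

# Hypothetical inputs: anchor-text mentions mined from Wikipedia as (alias, QID) pairs,
# plus extra aliases pulled from Wikidata that may never appear as anchors.
wiki_anchor_mentions = [("david", "Q8016"), ("david", "Q8016"), ("david", "Q41564")]
wikidata_aliases = {"david": ["Q8016", "Q41564", "Q178517"]}

# Overall entity popularity: how often each QID is linked anywhere in Wikipedia.
entity_counts = Counter(qid for _, qid in wiki_anchor_mentions)

alias2qids = defaultdict(dict)
for alias, qids in wikidata_aliases.items():
    for qid in qids:
        # The score is the entity's global count, not conditioned on this alias,
        # so Wikidata-only aliases still get a sensible ranking.
        alias2qids[alias][qid] = entity_counts[qid]

# Store as alias -> list of [QID, score], sorted by popularity.
alias2qids = {a: sorted(([q, c] for q, c in m.items()), key=lambda x: -x[1])
              for a, m in alias2qids.items()}
print(json.dumps(alias2qids, indent=2))
```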

@hitercs
Author

hitercs commented Dec 19, 2020

Great, got it, all understood. Thanks!

@hitercs hitercs closed this as completed Dec 19, 2020
lorr1 added a commit that referenced this issue Mar 25, 2022