Do you have any plans to release the frequency of the entity? #9

Closed
xiahaoyun opened this issue Mar 23, 2022 · 4 comments

Comments

@xiahaoyun

The paper states: “In our analysis, we use the Wikipedia hyperlink count as a proxy for an entity’s frequency.”
My understanding is that this means traversing all of Wikipedia and counting the hyperlinks that point to an entity's page, then using that count as the entity's frequency.
This is difficult to reproduce, and different implementations can introduce bias. Do you have any plans to release a file with the entity frequencies, or the code that counts them?

Thank you so much!

@a3616001
Member

Yes, we just counted the number of hyperlinks as the frequency.
Please find the entity frequency that we used here.
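
For readers who want to reproduce the counting themselves, here is a minimal sketch of hyperlink counting over raw wikitext. It is not the maintainers' code: the regular expression, the `normalize_title` helper, and the `count_hyperlinks` function are all assumptions made for illustration. It does not resolve redirects or filter out namespaced links such as `File:` or `Category:`, so counts from a real dump will differ from the released file.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|anchor]] and [[Target#Section|anchor]].
# Only the target title (before any '#' or '|') is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize_title(target: str) -> str:
    """Approximate MediaWiki title normalization (an assumption, not exact)."""
    title = " ".join(target.replace("_", " ").split())
    # MediaWiki treats the first character of a title as case-insensitive.
    return title[:1].upper() + title[1:]


def count_hyperlinks(pages):
    """Count how often each page title is the target of an internal link.

    `pages` is any iterable of wikitext strings, e.g. parsed from a dump.
    """
    counts = Counter()
    for wikitext in pages:
        for match in WIKILINK.finditer(wikitext):
            title = normalize_title(match.group(1))
            if title:
                counts[title] += 1
    return counts


if __name__ == "__main__":
    # Toy input, made up for illustration only.
    toy_pages = [
        "He studied in [[Princeton, New Jersey|Princeton]].",
        "[[princeton, New Jersey]] hosts [[Princeton University#History|a university]].",
    ]
    for title, n in count_hyperlinks(toy_pages).most_common():
        print(title, n)
```

Differences in choices like these (redirect handling, namespaces, title normalization) are exactly where independent implementations can diverge, so the released file is the safer reference.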

@xiahaoyun
Author

Thanks for your prompt reply!
The file you provided was helpful.

@xiahaoyun
Author

I'm trying to extract the entities from the existing queries and look up their frequencies, but a large number of entities have no corresponding frequency in the file. Is this because of a bug in my code, or are some entities really missing from the file?

Below is my code:
https://gist.github.com/xiahaoyun/237534fc756da928cfbcae0e4f54f457

@xiahaoyun reopened this Apr 1, 2022
@a3616001
Member

a3616001 commented Apr 4, 2022

We used the labels of the Wikidata items to generate the queries, and a label cannot be used to identify an entity. In the entity-frequency file, by contrast, the key is the item's corresponding Wikipedia title, which is a unique ID for the entity.

For example, the label of the Wikidata item Q138518 is "Princeton", while the corresponding Wikipedia title is "Princeton, New Jersey".

When building the dataset, we extracted (subject, relation, object) triples from TREx and used the labels to generate queries, but we didn't record the corresponding item ID for each query. So, unfortunately, we are not able to map an entity back to its ID and get its frequency.
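
The mismatch described above can be illustrated with a small sketch. Everything here is hypothetical: the function names, the `id_to_title` mapping, and the frequency value are made up for illustration, and only the Q138518 label/title pair comes from this thread. It shows why a lookup keyed by label misses, and how a lookup would work if the Wikidata item ID had been recorded.

```python
def frequency_by_label(label, freq):
    """Look up a frequency using the Wikidata label as the key.

    This is what a query-derived lookup effectively does; it misses
    whenever the label differs from the Wikipedia title.
    """
    return freq.get(label)


def frequency_by_item_id(item_id, id_to_title, freq):
    """Look up a frequency via item ID -> Wikipedia title -> count.

    `id_to_title` is an assumed mapping; the dataset does not ship one.
    """
    title = id_to_title.get(item_id)
    return None if title is None else freq.get(title)


if __name__ == "__main__":
    # Toy stand-ins; the count 3 is a placeholder, not a real frequency.
    freq = {"Princeton, New Jersey": 3}
    id_to_title = {"Q138518": "Princeton, New Jersey"}

    print(frequency_by_label("Princeton", freq))                 # label: no match
    print(frequency_by_item_id("Q138518", id_to_title, freq))    # ID: match
```

Because the item IDs were not recorded, any label-based recovery (for example, matching labels against titles) is a heuristic and may pick the wrong entity when several items share a label.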

Let me know if you have any further questions!
