Source code for "Efficient Entity Candidate Generation for Low-Resource Languages"

epfl-dlab/pti-candgen

Source code for "Entity-for-mention Candidate Generation for Low-resource Languages"

Data

Download the data from this link

The download contains 18 folders (one per language), which should be placed inside the mentions_dumps folder.

The files interlanguage_links_wikidata-20190901.csv and qid2title_CHAR.pkl have to be in the same folder as the README file.
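Before running anything, it can help to confirm that the data landed where the scripts expect it. The following is a minimal sketch, not part of the repository: the function name check_data_layout is hypothetical, and it assumes that "the same folder as the README file" means the repository root and that mentions_dumps sits directly under it.

```python
from pathlib import Path

# File names taken from the README; everything else here is an assumption.
REQUIRED_FILES = (
    "interlanguage_links_wikidata-20190901.csv",
    "qid2title_CHAR.pkl",
)
EXPECTED_LANGUAGE_FOLDERS = 18


def check_data_layout(repo_root):
    """Return a list of problems found; an empty list means the layout matches."""
    root = Path(repo_root)
    problems = []

    # The two data files are expected next to the README (assumed: repo root).
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")

    # One sub-folder per language is expected inside mentions_dumps.
    dumps = root / "mentions_dumps"
    if not dumps.is_dir():
        problems.append("missing folder: mentions_dumps")
    else:
        found = sum(1 for entry in dumps.iterdir() if entry.is_dir())
        if found != EXPECTED_LANGUAGE_FOLDERS:
            problems.append(
                f"expected {EXPECTED_LANGUAGE_FOLDERS} language folders "
                f"in mentions_dumps, found {found}"
            )
    return problems
```

Calling check_data_layout(".") from the repository root and printing the result gives a quick list of anything that still needs to be moved into place.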

Virtual Environment

Set up the virtual environment named pti, which installs all the required dependencies: conda env create -f pti.yml

Activate the environment: conda activate pti

Running PTI and Charagram

PTI and Charagram can be run by simply executing

python PTI/main.py alpha lambda target_lang pivot_lang

and

python CHARAGRAM/main.py mu amount_training_data target_lang pivot_lang

respectively. The zero-shot setting is enabled by setting alpha=-1 (for PTI) or mu=-1 (for Charagram). Following the guidelines of the Charagram authors, amount_training_data is set to 80,000 for the main experiments in the paper. Both the pivot and the target language are given as the two- or three-character code indicated in the submission.
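The positional argument order is easy to get wrong, so here is a small sketch of helpers that assemble the two command lines. The helper names pti_command and charagram_command are hypothetical, the numeric values in the usage note are placeholders rather than settings from the paper, and "xx" stands in for a real language code from the submission. The only facts taken from the README are the script paths, the argument order, the value -1 for the zero-shot setting, and the 80,000 default for amount_training_data.

```python
def pti_command(alpha, lam, target_lang, pivot_lang, zero_shot=False):
    """Build the argument list for PTI/main.py (order as given in the README)."""
    if zero_shot:
        alpha = -1  # the README states that alpha=-1 enables zero-shot
    return ["python", "PTI/main.py", str(alpha), str(lam), target_lang, pivot_lang]


def charagram_command(mu, target_lang, pivot_lang,
                      amount_training_data=80000, zero_shot=False):
    """Build the argument list for CHARAGRAM/main.py (order as given in the README)."""
    if zero_shot:
        mu = -1  # the README states that mu=-1 enables zero-shot
    return ["python", "CHARAGRAM/main.py", str(mu), str(amount_training_data),
            target_lang, pivot_lang]
```

For example, pti_command(0.5, 0.1, "xx", "en", zero_shot=True) yields a list that can be passed to subprocess.run from the repository root with the pti environment active.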

Running WikiPriors

To run WikiPriors simply execute

python WikiPriors/main.py args

where the arguments args are listed below:

  • --tlang: target language.
  • --plang: pivot language.
  • --ncands: number of retrieved candidate entities. The default value is 30.
  • --zeroshot: boolean to indicate whether the learning setting is zero-shot or not.
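To make the options above concrete, the sketch below declares an equivalent interface with argparse. It is an illustration only and may differ from what WikiPriors/main.py actually does: in particular, whether --tlang and --plang are required and how the --zeroshot boolean is parsed (here, from strings such as True or False) are assumptions, and the helper names str_to_bool and build_parser are hypothetical. Only the option names and the default of 30 for --ncands come from the list above.

```python
import argparse


def str_to_bool(value):
    """Parse a command-line string as a boolean (assumed convention)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Declare the four options listed in the README."""
    parser = argparse.ArgumentParser(description="WikiPriors interface (sketch)")
    parser.add_argument("--tlang", required=True, help="target language")
    parser.add_argument("--plang", required=True, help="pivot language")
    parser.add_argument("--ncands", type=int, default=30,
                        help="number of retrieved candidate entities")
    parser.add_argument("--zeroshot", type=str_to_bool, default=False,
                        help="whether the learning setting is zero-shot")
    return parser
```

Under these assumptions, build_parser().parse_args(["--tlang", "xx", "--plang", "en"]) leaves ncands at its default of 30, where "xx" is again a placeholder language code.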
