Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport (EMNLP 2022)

kellymarchisio/goat-for-bli

Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport

This is an implementation of the experiments presented in:

If you use this software for academic research, please cite the 2022 paper above.

This codebase extends the one used in the following paper, whose code is available at github.com/kellymarchisio/euc-v-graph-bli

Requirements

  • Python3
  • CuPy
  • sklearn
  • scipy

Setup

Run each setup script from its own folder:

  • To download pretrained word embeddings, run sh get_data.sh from the embs/ folder.
  • To download MUSE dictionaries and create development sets, run sh create_dicts.sh from the dicts/ folder.
  • To download GOAT, run sh get_packages.sh from the third_party/ folder.

Note: The GOAT implementation contains a small bug that should be fixed before running this code (the bug was fixed locally for the experiments in the published paper). The one-line fix is given in third_party/get_packages.sh.
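
The setup steps above can be chained from the repository root. The following is a minimal sketch, not part of the repository: the helper names run_in and setup_all and the DRY_RUN switch are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run each setup script from its own folder, in the order listed above.
# DRY_RUN is a hypothetical switch (not part of the repository):
#   1 (default) prints each command, 0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run_in() {
    # $1 = folder, $2 = script to run from inside that folder
    if [ "$DRY_RUN" = "1" ]; then
        echo "(cd $1 && sh $2)"
    else
        # Subshell so the caller's working directory is left unchanged.
        (cd "$1" && sh "$2")
    fi
}

setup_all() {
    # Stop at the first step that fails.
    run_in embs get_data.sh &&
    run_in dicts create_dicts.sh &&
    run_in third_party get_packages.sh
}

setup_all
```

Setting DRY_RUN=0 executes the scripts instead of printing them. The sketch does not apply the GOAT fix itself; see third_party/get_packages.sh for that.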

Usage

Note: All results are written to the exp/ directory.

To run the main experiments presented in Table 1 and Tables A3 and A4 of the publication, run, for example:

sh exps.sh single en de goat 100
sh exps.sh single en de sgm 100
sh exps.sh single en de proc 100

for a run of English-German with 100 seeds using GOAT, SGM, or Procrustes.
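
To run all three methods in one go, a small wrapper loop may help. This is a sketch under assumptions: the argument order (mode, source language, target language, method, number of seeds) is inferred from the example commands above rather than from exps.sh itself, and run_single and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch: run the single-step experiment once per method (goat, sgm, proc).
# Argument order is inferred from the README examples, not from exps.sh.
SRC="${SRC:-en}"
TRG="${TRG:-de}"
SEEDS="${SEEDS:-100}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints commands, 0 runs them

run_single() {
    for method in goat sgm proc; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "sh exps.sh single $SRC $TRG $method $SEEDS"
        else
            # Abort the loop if one run fails.
            sh exps.sh single "$SRC" "$TRG" "$method" "$SEEDS" || return 1
        fi
    done
}

run_single
```

With the defaults, this prints the three commands shown above; set DRY_RUN=0 to execute them.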

To run the iterative experiments presented in Table A5, one may run:

sh exps.sh stoch-add en de proc 100

For the combination system, one may run the below for English-German starting with Iterative Procrustes and 100 seeds (-EG from Table 3, or GOAT -PG from Table A6).

sh combo-exps.sh en de proc 100 barycenter goat

This command runs GOAT -PP from Table A6 (Start with GOAT, end with Iterative Procrustes):

sh combo-exps.sh en de goat 100 barycenter goat

To use SGM instead of GOAT, you can run either of the below:

sh combo-exps.sh en de proc 100 randomized sgm
sh combo-exps.sh en de sgm 100 randomized sgm
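
In the combination examples above, barycenter always appears with goat and randomized with sgm. The sketch below builds the command line from the starting method and the matcher under that assumption. The function name combo_cmd is hypothetical, the meaning of each argument is inferred from the examples, and the function only prints the command.

```shell
#!/bin/sh
# Sketch: print a combo-exps.sh command for a starting method and a matcher.
# The pairing (barycenter with goat, randomized with sgm) is copied from the
# README examples; whether other pairings are supported is not stated there.
combo_cmd() {
    # $1 = starting method (proc, goat, or sgm), $2 = graph matcher (goat or sgm)
    case "$2" in
        goat) init=barycenter ;;
        sgm)  init=randomized ;;
        *)    echo "unknown matcher: $2" >&2; return 1 ;;
    esac
    echo "sh combo-exps.sh ${SRC:-en} ${TRG:-de} $1 ${SEEDS:-100} $init $2"
}

combo_cmd proc goat
combo_cmd sgm sgm
```

Piping the printed line to sh from the repository root would run it.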
