Code for the paper "End-to-End Reinforcement Learning for Automatic Taxonomy Induction" (ACL 2018) [arXiv]
Dependencies
- python 2.7
- dynet 2.0
- tqdm
Data
- Preprocessed pickled data, including everything else needed for the WordNet experiments, can be downloaded here.
- Preprocessed pickled data, including everything else needed for both the WordNet and SemEval-2016 experiments, can be downloaded here. If you run on SemEval-2016, use dev_twodatasets.tsv instead of dev_wnbo_hyper.tsv. Caution: this may require 40+ GB of memory.
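As a quick sanity check, the downloaded pickles can be inspected before training. This is a minimal sketch; the file name data.pkl is an assumption, so substitute whatever the download actually provides.

```python
import cPickle as pickle  # Python 2.7, matching the repo's requirements

# NOTE: 'data.pkl' is a placeholder name; use the actual downloaded file.
with open('data.pkl', 'rb') as f:
    data = pickle.load(f)

# Peek at the top-level structure before wiring it into training.
print type(data)
if isinstance(data, dict):
    print 'keys:', data.keys()
```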
Go to https://morningmoni.github.io/wordnet-vis/ to see the visualization of WordNet subtrees.
DIY
- To build everything from scratch, first download corpora such as Wikipedia, UMBC, and the 1 Billion Word Language Model Benchmark.
- To preprocess the corpus, generate a vocabulary file and use the scripts under ./corpus/ to find dependency paths between terms in the vocabulary (see the vocabulary sketch after this list). The scripts are adapted from LexNET; instructions can be found here. This step may take several hours.
- Run train_RL.py; it will compute all the features and save them into pickle files.
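As referenced in the preprocessing step above, here is a rough sketch of how a vocabulary file might be generated from a tokenized corpus. The file names, the one-sentence-per-line format, and the frequency cutoff are illustrative assumptions, not the repo's actual preprocessing.

```python
from collections import Counter

MIN_COUNT = 100  # assumed frequency cutoff, not the paper's actual value

# Count whitespace-separated tokens in a one-sentence-per-line corpus.
counts = Counter()
with open('corpus.txt') as f:  # placeholder corpus file name
    for line in f:
        counts.update(line.strip().split())

# Write frequent terms, one per line, as the vocabulary file.
with open('vocab.txt', 'w') as out:
    for term, count in counts.most_common():
        if count >= MIN_COUNT:
            out.write('%s\n' % term)
```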
Run
Run train_RL.py for training and testing. All parameters are handled by argparse and have default values, so you can run the script without specifying any of them (but feel free to tune them).
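For illustration, this is a hedged sketch of the argparse pattern the script follows; the flag names and defaults below are assumptions, not necessarily train_RL.py's actual arguments (run python train_RL.py -h to list the real ones).

```python
import argparse

parser = argparse.ArgumentParser(description='Train/test the RL taxonomy inducer')
# Illustrative flags only -- check `python train_RL.py -h` for the real ones.
parser.add_argument('--num_epochs', type=int, default=50)
parser.add_argument('--lr', type=float, default=0.001)
args = parser.parse_args()

print 'epochs=%d, lr=%g' % (args.num_epochs, args.lr)
```

With this pattern, python train_RL.py runs entirely on defaults, while something like python train_RL.py --lr 0.01 would override a single parameter.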
In each epoch, the performance on training/validation/test sets is reported. You may exit the program at any time.
Citation
@InProceedings{P18-1229,
  author    = "Mao, Yuning and Ren, Xiang and Shen, Jiaming and Gu, Xiaotao and Han, Jiawei",
  title     = "End-to-End Reinforcement Learning for Automatic Taxonomy Induction",
  booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year      = "2018",
  publisher = "Association for Computational Linguistics",
  pages     = "2462--2472",
  location  = "Melbourne, Australia",
  url       = "http://aclweb.org/anthology/P18-1229"
}