This repo contains the output files and analysis results reported in the paper "Grammar Induction with Neural Language Models: An Unusual Replication", where we perform an in-depth analysis of the Parsing-Reading-Predict Networks (PRPN).
The parsed files can be downloaded here. The parsed files are named in the following way:
- Example: parsed_WSJ_PRPNUP_WSJFull_ESUP.jsonl
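The parsed files use the .jsonl extension, so each line should hold one JSON record. The following is a minimal sketch for inspecting such a file, assuming standard JSON Lines formatting. The helper names read_jsonl and summarize_keys are hypothetical, and no particular field names are assumed: the second helper only reports the top-level keys it finds.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err


def summarize_keys(path, limit=100):
    """Return the sorted top-level keys seen in the first `limit` records."""
    keys = set()
    for index, record in enumerate(read_jsonl(path)):
        if index >= limit:
            break
        if isinstance(record, dict):
            keys.update(record)
    return sorted(keys)
```

Calling summarize_keys("parsed_WSJ_PRPNUP_WSJFull_ESUP.jsonl") on a downloaded file is a quick way to see which fields the records contain before writing any analysis code.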
We also share the pretrained model that achieves the best F1 score (PRPN-LM trained on AllNLI with the language modeling criterion), which can be downloaded here.
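For readers unfamiliar with parsing F1, here is a rough sketch of unlabeled bracketing F1 between a predicted tree and a reference tree. It is an illustration under stated assumptions, not the evaluation code used in the paper: trees are assumed to be nested Python lists with string leaves, the function names are hypothetical, and the full-sentence span is counted (evaluation conventions differ on trivial spans and punctuation).

```python
def spans(tree, start=0):
    """Return (number of leaves, set of (start, end) spans) for a nested tree.

    Leaves are strings; internal nodes are lists or tuples of subtrees.
    """
    if isinstance(tree, str):
        return 1, set()
    length = 0
    found = set()
    for child in tree:
        child_length, child_spans = spans(child, start + length)
        found |= child_spans
        length += child_length
    found.add((start, start + length))  # span covered by this node
    return length, found


def unlabeled_f1(predicted, gold):
    """Harmonic mean of bracket precision and recall for two trees."""
    _, predicted_spans = spans(predicted)
    _, gold_spans = spans(gold)
    overlap = len(predicted_spans & gold_spans)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_spans)
    recall = overlap / len(gold_spans)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing [["the", "cat"], ["sat", "down"]] against ["the", ["cat", ["sat", "down"]]] gives three brackets on each side, two of them shared, so precision, recall, and F1 all equal 2/3.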
You will need the original PTB corpus to use NLTK for reading the WSJ trees in data_ptb.py, which is used in PRPN_UP and parse_data.py. The original PTB corpus can be downloaded here.
The vocabulary files for all models as well as the preprocessed PTB data files used in PRPN_LM (main_LM.py) can be downloaded here.
To produce parses using the pretrained model:
python parse_data.py --data path_to_data --checkpoint path_to_model/model_lm.pt --seed 1111 --eval_data path_to_multinli/multinli_1.0_dev_matched.jsonl --save_eval_path save_path/parsed_MNLI.jsonl
Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman. Grammar Induction with Neural Language Models: An Unusual Replication. To appear in Proceedings of EMNLP. 2018.
Yikang Shen, Zhouhan Lin, Chin-Wei Huang, and Aaron Courville. Neural language modeling by jointly learning syntax and lexicon. Proceedings of the International Conference on Learning Representations. 2018. [code]