This repository contains code and data used in the following paper:
@inproceedings{lan2018subword,
author = {Lan, Wuwei and Xu, Wei},
title = {Character-based Neural Networks for Sentence Pair Modeling},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
year = {2018}
}
The original PWIM model was introduced in this paper:
@inproceedings{he-lin:2016:N16-1,
author = {He, Hua and Lin, Jimmy},
title = {Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
year = {2016}
}
-
This repository contains only the MSRP dataset; the Twitter-URL and PIT-2015 datasets must be obtained separately.
-
We follow this code to do data preprocessing.
-
The model was implemented with PyTorch 0.4.0 and torchtext 0.1.1.
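
Because the code targets these specific library versions, it can help to confirm what is installed before running anything. The snippet below is a minimal sketch and not part of this repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes the packages are registered under the distribution names `torch` and `torchtext`. It uses an exact string comparison, which may be too strict for builds that carry a local version suffix.

```python
# Versions stated in this README; adjust if you deliberately use others.
REQUIRED = {"torch": "0.4.0", "torchtext": "0.1.1"}


def installed_version(package):
    """Return the installed version string of `package`, or None if not found."""
    try:
        # Available in the standard library from Python 3.8.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Fallback for older interpreters (assumes setuptools is installed).
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (found, wanted)} for each package whose version differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    found = {name: installed_version(name) for name in REQUIRED}
    for name, (have, want) in find_mismatches(found).items():
        print("%s: found %s, README states %s" % (name, have, want))
```

Running the script prints one line per mismatched package and prints nothing when both versions match.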
-
To run with the default settings: python main.py (see main.py for the additional arguments it accepts).
-
A demo is also available (first download save_dir, which contains a model trained on Twitter-URL with the unigram CNN):
python -W ignore demo.py 'do you know where my book is' 'i cannot find my book, do you know where is it'