This is a word aligner for English: given two English sentences, it aligns related words in the two sentences. It exploits the semantic and contextual similarities of the words to make alignment decisions.
This project started as a fork of ma-sultan/monolingual-word-aligner, the aligner presented in Sultan et al. (2015), which has performed strongly in the SemEval STS (Semantic Textual Similarity) tasks.
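To make "aligning related words" concrete, here is a toy sketch of only the simplest case: pairing identical content words across two tokenized sentences. It is not this repository's code or API. `toy_align` and the small `STOPWORDS` set (a stand-in for the NLTK list) are hypothetical, and the real aligner also uses semantic similarity and dependency context.

```python
# Toy illustration only: aligns identical content words (case-insensitive).
# The real aligner also aligns related (non-identical) words using
# semantic similarity and the words' context.

# Hypothetical stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "on", "in", "of", "to", "and"}


def toy_align(tokens1, tokens2):
    """Return (i, j) index pairs where tokens1[i] and tokens2[j] are the same
    content word. Each token in tokens2 is used at most once; the first
    unused match wins."""
    alignments = []
    used = set()
    for i, w1 in enumerate(tokens1):
        key = w1.lower()
        if key in STOPWORDS:
            continue  # skip function words
        for j, w2 in enumerate(tokens2):
            if j not in used and w2.lower() == key:
                alignments.append((i, j))
                used.add(j)
                break
    return alignments


s1 = "The cat sat on the mat".split()
s2 = "A cat was sitting on a mat".split()
print(toy_align(s1, s2))  # → [(1, 1), (5, 6)]
```

Note that "sat" and "sitting" stay unaligned here; catching pairs like that is exactly what the lemma, similarity, and context features of the real aligner are for.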
To install, clone the repository, download the NLTK stopwords, and start a Stanford CoreNLP server:

```bash
# download the repo
git clone https://github.com/rgtjf/monolingual-word-aligner.git
# download the stopword list from NLTK
python -m nltk.downloader stopwords
# download Stanford CoreNLP
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2015-12-09.zip
unzip stanford-corenlp-full-2015-12-09.zip
# launch the Stanford CoreNLP server (it keeps running in the foreground)
cd stanford-corenlp-full-2015-12-09/
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
# the server is now available at http://localhost:9000/
```
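Before running the scripts, you may want to check that the server responds. The sketch below uses only the Python standard library and the CoreNLP server's HTTP interface (POST the raw text, pass a JSON `properties` query parameter). The helper names `corenlp_url` and `annotate` are hypothetical and are not how this repository talks to the server.

```python
import json
import urllib.parse
import urllib.request

# Default address of the StanfordCoreNLPServer started during setup.
SERVER = "http://localhost:9000/"


def corenlp_url(annotators="tokenize,ssplit,pos,lemma", server=SERVER):
    """Build a server URL whose `properties` query parameter requests
    the given annotators with JSON output."""
    props = {"annotators": annotators, "outputFormat": "json"}
    return server + "?" + urllib.parse.urlencode({"properties": json.dumps(props)})


def annotate(text, annotators="tokenize,ssplit,pos,lemma", timeout=30):
    """POST raw text to the running server and return the parsed JSON."""
    req = urllib.request.Request(
        corenlp_url(annotators), data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the server to be running):
# doc = annotate("The cat sat on the mat.")
# print([t["word"] for t in doc["sentences"][0]["tokens"]])
```

If `annotate` raises a connection error, the server is not running or is listening on a different port.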
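Then, in another terminal, return to the `monolingual-word-aligner` directory and run the test script:

```bash
python test_align.py
```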
To evaluate on STSbenchmark, download the data and run the evaluation script:

```bash
sh download.sh
python run_stsbenchmark.py
```
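STSbenchmark results are conventionally reported as the Pearson correlation between predicted and gold similarity scores. Assuming that is what the Dev/Test numbers report (the README does not name the metric), a dependency-free version of the metric looks like this. It is a sketch, not the code in `run_stsbenchmark.py`, and `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


gold = [0.0, 2.5, 5.0]  # gold STS scores lie on a 0-5 scale
pred = [0.1, 0.4, 0.9]  # made-up predictions for illustration
print(round(pearson(gold, pred), 4))
```

Because Pearson correlation is invariant to linear rescaling, predictions on a 0-1 scale can be compared directly against 0-5 gold scores.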
Results on STSbenchmark:

| Method | Dev | Test |
|---|---|---|
| aligner | 0.6991 | 0.6379 |
| idf_aligner | 0.7969 | 0.7622 |
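As described by Sultan et al., aligner-based STS systems score a sentence pair by the proportion of content words that are aligned in both sentences. The sketch below implements that proportion with an optional per-word weight. Plugging in idf weights is only a guess at what `idf_aligner` does, since the README does not describe it, and `idf_table` and `alignment_score` are hypothetical names.

```python
import math


def idf_table(documents):
    """Hypothetical idf weights: log(N / df) over a list of token lists."""
    n = len(documents)
    df = {}
    for doc in documents:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / c) for w, c in df.items()}


def alignment_score(content1, content2, aligned1, aligned2, weight=None):
    """Weighted fraction of content words aligned across both sentences.

    content1/content2: content-word tokens of each sentence.
    aligned1/aligned2: sets of indices into content1/content2 that were aligned.
    weight: optional function word -> weight (e.g. idf); defaults to 1 per word.
    """
    w = weight or (lambda _word: 1.0)
    num = sum(w(content1[i]) for i in aligned1) + sum(w(content2[j]) for j in aligned2)
    den = sum(w(t) for t in content1) + sum(w(t) for t in content2)
    return num / den if den else 0.0


c1 = ["cat", "sat", "mat"]
c2 = ["cat", "sitting", "mat"]
print(alignment_score(c1, c2, {0, 2}, {0, 2}))  # unweighted: 4 of 6 words aligned
```

With uniform weights every content word counts equally; an idf weight makes rare, informative words dominate the score, which is one plausible reason for the gap between the two rows above.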