rgtjf/monolingual-word-aligner

 
 

A Word Aligner for English

This is a word aligner for English: given two English sentences, it aligns related words in the two sentences. It exploits the semantic and contextual similarities of the words to make alignment decisions.
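To make the shape of an alignment concrete, here is a toy sketch that pairs only tokens that match exactly. It is not this repository's algorithm, which also uses semantic and contextual similarity; the function name `toy_align` and the 1-based index pairs are assumptions chosen for illustration.

```python
def toy_align(sent1, sent2):
    """Align tokens that match exactly (case-insensitive).

    Hypothetical helper for illustration only. Each token in the second
    sentence is used at most once. Returns (i, j) pairs of 1-based positions.
    """
    tokens1 = sent1.lower().split()
    tokens2 = sent2.lower().split()
    used = set()   # positions in sentence 2 that are already aligned
    pairs = []
    for i, w1 in enumerate(tokens1, start=1):
        for j, w2 in enumerate(tokens2, start=1):
            if j not in used and w1 == w2:
                pairs.append((i, j))
                used.add(j)
                break
    return pairs


pairs = toy_align("A man is playing a guitar", "A man plays a guitar")
print(pairs)  # [(1, 1), (2, 2), (5, 4), (6, 5)]
```

Note that "playing" and "plays" stay unaligned here; pairing such related but non-identical words is exactly what a similarity-based aligner is for.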

Acknowledgements

This project started as a fork of ma-sultan/monolingual-word-aligner, the aligner presented in Sultan et al., 2015, which has been very successful in the SemEval STS (Semantic Textual Similarity) task in recent years.

Install

# download the repo
git clone https://github.com/rgtjf/monolingual-word-aligner.git

# requires the stopwords corpus from nltk
python -m nltk.downloader stopwords

# requires Stanford CoreNLP
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2015-12-09.zip
unzip stanford-corenlp-full-2015-12-09.zip

# launch the Stanford CoreNLP server
cd stanford-corenlp-full-2015-12-09/
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
# the server is now listening at http://localhost:9000/

# in a second terminal, from the directory containing test_align.py
python test_align.py
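Before running the aligner, it can help to confirm that the CoreNLP server answers requests. The following is a minimal sketch that assumes the server runs at the default address shown above; the helper names `build_corenlp_url` and `annotate` are hypothetical and not part of this repository.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_url(base_url="http://localhost:9000/",
                      annotators="tokenize,ssplit,pos"):
    """Build the request URL; the server reads its settings from `properties`."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return base_url + "?" + query


def annotate(text, base_url="http://localhost:9000/"):
    """POST raw text to the server and return the parsed JSON annotation."""
    request = urllib.request.Request(build_corenlp_url(base_url),
                                     data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        result = annotate("A man is playing a guitar.")
        for token in result["sentences"][0]["tokens"]:
            print(token["word"], token["pos"])
    except (OSError, ValueError) as err:
        # URLError subclasses OSError; ValueError covers a non-JSON reply
        print("CoreNLP server not reachable or returned unexpected data:", err)
```

If the script prints one token and part-of-speech tag per line, the server is up and `python test_align.py` should be able to reach it.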

Evaluate on STSBenchmark

sh download.sh
python run_stsbenchmark.py

Results

Method (evaluated on STSBenchmark)    Dev       Test
aligner                               0.6991    0.6379
idf_aligner                           0.7969    0.7622
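As a rough illustration of how an alignment can become a sentence similarity score, the sketch below computes the proportion of aligned tokens across both sentences, following the formulation in Sultan et al., 2015. The optional per-token weights are only an assumption about what an IDF-weighted variant such as idf_aligner might do; the function name `alignment_score` and its signature are hypothetical, not taken from this repository.

```python
def alignment_score(tokens1, tokens2, pairs, weights=None):
    """Weighted proportion of aligned tokens across both sentences.

    pairs:   (i, j) pairs of 1-based positions of aligned tokens.
    weights: optional dict mapping token -> weight (for example IDF);
             tokens missing from the dict get weight 1.0.
    """
    weights = weights or {}

    def weight(token):
        return weights.get(token, 1.0)

    aligned1 = {i for i, _ in pairs}   # aligned positions in sentence 1
    aligned2 = {j for _, j in pairs}   # aligned positions in sentence 2
    aligned = (sum(weight(tokens1[i - 1]) for i in aligned1)
               + sum(weight(tokens2[j - 1]) for j in aligned2))
    total = sum(weight(t) for t in tokens1) + sum(weight(t) for t in tokens2)
    return aligned / total if total else 0.0


tokens1 = ["man", "playing", "guitar"]
tokens2 = ["man", "plays", "guitar"]
pairs = [(1, 1), (3, 3)]

print(alignment_score(tokens1, tokens2, pairs))                   # 4 of 6 tokens aligned
print(alignment_score(tokens1, tokens2, pairs, {"guitar": 3.0}))  # rarer word counts more
```

With uniform weights the score is 4/6; giving "guitar" a weight of 3.0 raises it to 8/10, since the aligned pair now carries more of the total weight.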

Reference

STSBenchmark board
