HyperVec

Hierarchical Embeddings for Hypernymy Detection and Directionality

Prerequisites

  • spaCy (version 2.0.11), used for parsing
  • a plain-text corpus, such as Wikipedia

Preprocess

  • Create the feature files:

    python create_features.py -input corpus-file.txt -output output-file-name -pos pos_tag

    where pos_tag is either NN (for noun features) or VB (for verb features).
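Since noun and verb features are created in separate runs, the two invocations can be generated together. The following is a minimal sketch, not part of the repository: the helper name `feature_commands` and the output naming scheme (`<prefix>.nn`, `<prefix>.vb`) are assumptions made for illustration only.

```python
import sys


def feature_commands(corpus_path, output_prefix, pos_tags=("NN", "VB")):
    """Build one create_features.py command line per POS tag.

    The "<prefix>.<pos>" output naming is a hypothetical convention,
    not something the repository prescribes.
    """
    commands = []
    for pos in pos_tags:
        commands.append([
            sys.executable, "create_features.py",
            "-input", corpus_path,
            "-output", f"{output_prefix}.{pos.lower()}",
            "-pos", pos,
        ])
    return commands


# Print the commands instead of running them.
# To execute one, pass it to subprocess.run(cmd, check=True).
for cmd in feature_commands("corpus-file.txt", "features"):
    print("python " + " ".join(cmd[1:]))
```

Building the argument lists first keeps the sketch safe to run on a machine where the corpus or the script is not present.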

Configuration

See config.cfg to set the arguments for the model.

Training embeddings

java -jar HyperVec.jar config.cfg vector-size window-size

For example, to train embeddings with 100 dimensions and a window size of 5:

java -jar HyperVec.jar config.cfg 100 5

Pretrained (hypervec) embeddings

The embeddings used in our paper can be downloaded with the script get-pretrainedHyperVecEmbeddings/download_embeddings.sh. Note that the script downloads 9 files and concatenates them into a single file (hypervec.txt.gz). The file uses the default word2vec format: the first line holds header information, and every other line holds a word followed by its whitespace-separated vector.
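The word2vec text format described above can be read with the standard library alone. The sketch below assumes the usual word2vec header of two integers (vocabulary size and dimensionality) and a UTF-8 encoded file; the function names `read_word2vec_text` and `load_hypervec` are illustrative and not part of this repository.

```python
import gzip
import io


def read_word2vec_text(lines):
    """Parse word2vec text format from an iterable of lines.

    Returns (dim, vectors), where vectors maps each word to a list of floats.
    """
    lines = iter(lines)
    header = next(lines).split()
    # Assumed header layout: "<vocabulary size> <dimensions>".
    _vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def load_hypervec(path="hypervec.txt.gz"):
    """Read the gzipped embedding file produced by the download script."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return read_word2vec_text(handle)


# Tiny in-memory sample with made-up values, standing in for the real file.
sample = io.StringIO("2 3\nanimal 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
dim, vectors = read_word2vec_text(sample)
print(dim, sorted(vectors))
```

Plain Python lists are convenient for inspection but memory-hungry for a vocabulary of this size; loading the same file with gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` is likely the more practical route for the full file.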

Information about the embeddings: created using the ENCOW14A corpus (14.5bn tokens), 100 dimensions, symmetric window of 5, 15 negative samples, learning rate of 0.025, threshold set to 0.05. The resulting vocabulary contains about 2.7m words.

Example usage: Evaluation on BLESS, BIBLESS, and AWBLESS

To reproduce our experiments from Table 3, use the code in datasets_classification/, assuming your vector file is located in the same folder and named hypervec.txt.gz.

  • Evaluate directionality on BLESS.txt using hyperscore:

    java -jar eval-dir.jar hypervec.txt.gz

  • Evaluate classification on BIBLESS.txt and AWBLESS.txt using 2% of the training data and 1000 random iterations:

    java -jar eval-bless.jar hypervec.txt.gz 2 1000
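The directionality evaluation relies on hyperscore, which this README does not define. As a rough illustration only, the sketch below assumes the score is the cosine similarity of the two vectors multiplied by the ratio of the hypernym norm to the hyponym norm, so that the word with the larger norm is treated as the more general one. This definition is an assumption here, the toy vectors are made up, and the actual computation in eval-dir.jar may differ.

```python
import math


def norm(u):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))


def hyperscore(hyponym_vec, hypernym_vec):
    """Assumed definition: cos(u, v) * ||v|| / ||u||, with u the hyponym."""
    return cosine(hyponym_vec, hypernym_vec) * norm(hypernym_vec) / norm(hyponym_vec)


# Toy vectors (not taken from hypervec.txt.gz).
dog = [1.0, 1.0]
animal = [2.0, 2.5]

# Under the assumed definition, the ordering with the larger-norm word
# as hypernym receives the higher score when the cosine is positive.
forward = hyperscore(dog, animal)
backward = hyperscore(animal, dog)
print("dog -> animal" if forward > backward else "animal -> dog")
```

Because the cosine is symmetric, comparing the two orderings reduces to comparing the vector norms whenever the cosine is positive.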

Citation info

If you use the code or the created feature norms, please cite our paper (Bibtex). The paper can be found here: PDF. The poster from EMNLP can be found here: Poster.