Skip to content
This is a repository for the AI LAB article "係り受けに基づく日本語単語埋込 (Dependency-based Japanese Word Embeddings)" ( Article URL https://ai-lab.lapras.com/nlp/japanese-word-embedding/)
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
README.md

README.md

Overview

This is a repository to share dependency-based Japanese word embeddings which we trained for experiments in the article 係り受けに基づく日本語単語埋込 (Dependency-based Japanese Word Embeddings).

We applied the method proposed in the paper Dependency-based Word Embeddings to Japanese.

Training Details

To prepare the training data, we first extracted sentences from Japanese Wikipedia dumps.
Then, we parsed them using an NLP framework GiNZA.
Finally, we trained the embeddings with the script provided in the page of the paper's first author.

The parameter settings for the experiments is as below where DIM is the number of dimensions written in each file name.

-size DIM -negative 15 -threads 20

Download URL

You can download the data from links below.
Download begins soon after you click on a link.

How to Use the Embeddings

You can use the embeddings in the same way as embeddings trained by using the original implementation of Word2Vec.

Here is an example code to load them from your Python script.

from gensim.models import KeyedVectors
vectors = KeyedVectors.load_word2vec_format("path/to/embeddings")

When Using Them for Your Research

When writing your paper using them, please cite this bibtex,

@misc{matsuno2019dependencybasedjapanesewordembeddings,  
    title  = {Dependency-based Japanese Word Embeddings},  
    author = {Tomoki, Matsuno},  
    affiliation = {LAPRAS inc.},
    url    = {https://github.com/lapras-inc/dependency-based-japanese-word-embeddings},  
    year   = {2019}  
}  

References

  • 松田寛, 大村舞, 浅原正幸. 短単位品詞の用法曖昧性解決と依存関係ラベリングの同時学習, 言語処理学会 第 25 回年次大会 発表論文集, 2019.
  • Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, .
  • Levy, O. & Goldberg, Y. (2014). Dependency-Based Word Embeddings. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (p./pp. 302--308), June, Baltimore, Maryland: Association for Computational Linguistics.
You can’t perform that action at this time.