Implementations of spectral word embedding methods

kadingir

This is an open source implementation of

Oshikiri, T., Fukui, K., Shimodaira, H. (2016). Cross-Lingual Word Representations via Spectral Graph Embeddings. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. (To appear)

Contents

  • src/ : Source code (C++ & Rcpp version)
  • cpp/ : Source code (C++ only version)
  • experiments/ : Code used in the experiments
  • tools/

Implemented methods

  • CL-LSI [Littman+ 1998]
  • Eigenwords [Dhillon+ 2012] [Dhillon+ 2015]
    • One-Step CCA (OSCCA)
    • Two-Step CCA (TSCCA)
  • Eigendocs
  • Cross-Lingual Eigenwords (CL-Eigenwords) [Oshikiri+ 2016]
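As a rough illustration of the kind of computation behind the Eigenwords methods listed above, the sketch below builds the scaled word-context co-occurrence matrix that one-step CCA decomposes with an SVD in [Dhillon+ 2015]. It is a minimal sketch under stated assumptions, not the code in this repository: the function name `scaled_cooccurrence`, the context window of one word on each side, and the diagonal treatment of the word and context covariance matrices are all choices made here for illustration. The SVD step itself is omitted and would normally be delegated to a linear algebra library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper for illustration only (not part of this repository).
// tokens: the corpus as word ids in [0, vocab_size).
// Returns the vocab_size x (2 * vocab_size) matrix
//   M = C_ww^{-1/2} * C_wc * C_cc^{-1/2},
// where the context of a token is its left and right neighbour and
// C_ww, C_cc are assumed to be diagonal count matrices.
Matrix scaled_cooccurrence(const std::vector<int>& tokens, int vocab_size) {
    assert(vocab_size > 0);
    for (int t : tokens) {
        assert(t >= 0 && t < vocab_size);
    }
    const std::size_t v = static_cast<std::size_t>(vocab_size);
    Matrix m(v, std::vector<double>(2 * v, 0.0));
    std::vector<double> word_count(v, 0.0);
    std::vector<double> context_count(2 * v, 0.0);

    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::size_t w = static_cast<std::size_t>(tokens[i]);
        word_count[w] += 1.0;
        if (i > 0) {
            // Left neighbour occupies columns [0, v).
            const std::size_t c = static_cast<std::size_t>(tokens[i - 1]);
            m[w][c] += 1.0;
            context_count[c] += 1.0;
        }
        if (i + 1 < tokens.size()) {
            // Right neighbour occupies columns [v, 2v).
            const std::size_t c = v + static_cast<std::size_t>(tokens[i + 1]);
            m[w][c] += 1.0;
            context_count[c] += 1.0;
        }
    }

    // Scale each raw count by 1 / sqrt(count(word) * count(context)).
    for (std::size_t w = 0; w < v; ++w) {
        for (std::size_t c = 0; c < 2 * v; ++c) {
            if (m[w][c] > 0.0) {
                m[w][c] /= std::sqrt(word_count[w] * context_count[c]);
            }
        }
    }
    return m;
}
```

Under these assumptions, the leading left singular vectors of the returned matrix (after rescaling by the word counts) would play the role of the word vectors. A dense matrix is used only to keep the sketch short; a real vocabulary would need a sparse representation.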

Required datasets for experiments

Submodules

References

  • Oshikiri, T., Fukui, K., Shimodaira, H. (2016). Cross-Lingual Word Representations via Spectral Graph Embeddings. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. (To appear)
  • Dhillon, P., Rodu, J., Foster, D., and Ungar, L. (2012). Two step CCA: A new spectral method for estimating vector models of words. In Langford, J. and Pineau, J., editors, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML ’12, pages 1551–1558, New York, NY, USA. Omnipress.
  • Dhillon, P. S., Foster, D. P., and Ungar, L. H. (2015). Eigenwords: Spectral word embeddings. Journal of Machine Learning Research, 16:3035–3078.
  • Littman, M. L., Dumais, S. T., and Landauer, T. K. (1998). Automatic cross-language information retrieval using latent semantic indexing. Cross-Language Information Retrieval, pages 51–62.

License

GPL v3