Cr5

This repository contains the code for the following paper, which proposes a novel approach to learning crosslingual word embeddings optimized for document-level aggregation.

Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, and Robert West. "Crosslingual Document Embedding as Reduced-Rank Ridge Regression". Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 2019.

Pretrained crosslingual embeddings

We also publish a dataset of pretrained word embeddings in 28 languages, where words are embedded in a shared latent space. The dataset is available here.
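Because all languages share one latent space, a document written in one language can be compared directly with a document written in another once each is turned into a single vector. The sketch below illustrates this idea under stated assumptions. This README does not describe the file format of the published embeddings or the exact aggregation rule, so the example assumes that a document vector is the sum of its in-vocabulary word vectors. It uses made-up three-dimensional toy vectors in place of the real embeddings. The names `EMBEDDINGS`, `embed_document`, and `cosine` are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical toy vectors standing in for the pretrained embeddings.
# The real files' format and dimensionality are not described here.
EMBEDDINGS: Dict[str, Dict[str, List[float]]] = {
    "en": {"cat": [0.9, 0.1, 0.0], "sleeps": [0.1, 0.8, 0.1], "stock": [0.0, 0.1, 0.9]},
    "fr": {"chat": [0.85, 0.15, 0.0], "dort": [0.1, 0.75, 0.15]},
}


def embed_document(tokens: List[str], lang: str) -> List[float]:
    """Sum the vectors of in-vocabulary tokens (assumed aggregation rule)."""
    table = EMBEDDINGS[lang]
    dim = len(next(iter(table.values())))
    doc = [0.0] * dim
    for tok in tokens:
        vec = table.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary words such as "the" or "le"
        for i, x in enumerate(vec):
            doc[i] += x
    return doc


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


en_doc = embed_document(["the", "cat", "sleeps"], "en")
fr_doc = embed_document(["le", "chat", "dort"], "fr")
other_doc = embed_document(["stock"], "en")
print(cosine(en_doc, fr_doc), cosine(other_doc, fr_doc))
```

Cosine similarity ignores vector length, so taking the mean of the word vectors instead of their sum would give the same scores here.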

If you find these resources useful, please cite the paper above. Here is a BibTeX entry you can use:

@inproceedings{josifoski-wsdm2019-cr5,
  title={Crosslingual Document Embedding as Reduced-Rank Ridge Regression},
  author={Josifoski, Martin and Paskov, Ivan S. and Paskov, Hristo S. and Jaggi, Martin and West, Robert},
  booktitle={Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining},
  organization={ACM},
  year={2019}
}

Any questions or suggestions?

Contact martin.josifoski@epfl.ch.
