09d1156 Mar 7, 2017
@EdouardGrave @VessoVit @sleepinyourhat

Pre-trained word vectors

We are publishing pre-trained word vectors for 90 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016), with default parameters.

Format

The word vectors are available in both of fastText's default formats, binary and text. In the text format, each line contains a word followed by its embedding, with values separated by spaces. Words are ordered by descending frequency.
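As an illustration of the text format, here is a minimal Python sketch of a reader. `load_vec` is a hypothetical helper, not part of fastText. It assumes the file may start with a header line of two integers (vocabulary size and dimension), as fastText's `.vec` output usually does, and skips that line if present. Because words are sorted by descending frequency, the optional `limit` argument keeps only the most frequent words.

```python
import io
from typing import Dict, List, Optional, TextIO


def load_vec(stream: TextIO, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read fastText-style text vectors: a word followed by space-separated values.

    Hypothetical helper. A first line of exactly two integers is assumed to be
    a "<vocab_size> <dim>" header and is skipped.
    """
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for lineno, line in enumerate(stream):
        if not line.strip():
            continue  # ignore blank lines
        # rstrip() also removes the trailing space some writers emit after the last value
        parts = line.rstrip().split(" ")
        if lineno == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # assumed header line
        if limit is not None and len(vectors) >= limit:
            break  # lines are frequency-sorted, so we already hold the top `limit` words
        word, values = parts[0], [float(v) for v in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"line {lineno + 1}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vectors


# Usage on a tiny in-memory sample (made-up values, 4 dimensions instead of 300)
sample = io.StringIO("3 4\nthe 0.1 0.2 0.3 0.4 \nof 0.5 0.6 0.7 0.8 \nand 0.9 1.0 1.1 1.2 \n")
vecs = load_vec(sample, limit=2)
print(list(vecs))  # ['the', 'of']
```

For a downloaded `.vec` file, pass a handle opened with `encoding="utf-8"`; streaming line by line avoids loading the whole file into memory before applying `limit`.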

License

The pre-trained word vectors are distributed under the Creative Commons Attribution-ShareAlike 3.0 License.

References

If you use these word embeddings, please cite the following paper:

P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information

@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}

Models

The models can be downloaded from: