- Set up a Python environment with gensim installed. More detailed instructions are available here. You can also follow this video tutorial about Python virtualenv.
pip install gensim
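To confirm the environment is ready, you can check that Python can find gensim. This is a minimal sketch that uses only the standard library; the helper name `gensim_available` is our own and is not part of gensim or this repository.

```python
import importlib.util


def gensim_available() -> bool:
    """Return True if the gensim package can be found in this environment."""
    # find_spec locates the package without actually importing it
    return importlib.util.find_spec("gensim") is not None


if __name__ == "__main__":
    if gensim_available():
        print("gensim is installed")
    else:
        print("gensim not found; run: pip install gensim")
```

Run it with the same interpreter (and virtualenv) you will use for training.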
Train the model
- Clone this repository or download this Python script:
git clone https://github.com/ml5js/training-word2vec/
- The script supports training from a single text file or a directory of files. Create a text file or a folder of several files, then run train.py with the name of the file or folder:
python train.py file.txt
python train.py files/
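Word2vec training in gensim consumes sentences as lists of tokens. The sketch below shows one way to turn either a single file or a directory of files into such sentences. The functions `input_files` and `iter_sentences`, and the simple lowercase/regex tokenization, are hypothetical illustrations; `train.py` may read and preprocess text differently.

```python
import os
import re
from typing import Iterator, List


def input_files(path: str) -> List[str]:
    """Return [path] for a single file, or the sorted regular files inside a directory."""
    if os.path.isdir(path):
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        )
    return [path]


def iter_sentences(path: str) -> Iterator[List[str]]:
    """Yield one list of lowercase word tokens per non-empty line of every input file."""
    for filename in input_files(path):
        with open(filename, encoding="utf-8") as f:
            for line in f:
                # Keep runs of letters, digits and apostrophes; drop punctuation
                tokens = re.findall(r"[a-z0-9']+", line.lower())
                if tokens:
                    yield tokens
```

Because gensim's `Word2Vec` passes over the corpus more than once (vocabulary building plus each training epoch), a one-shot generator should be materialized first, e.g. `Word2Vec(sentences=list(iter_sentences("files/")), vector_size=100, min_count=1)` with gensim 4.x parameter names. gensim also provides `LineSentence` and `PathLineSentences` for reading files line by line.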
- By default, the script writes its output to a vectors.json file. To choose a different output file name, pass the additional -o argument:
python train.py data.txt -o output.json
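The command-line interface described here (an input path plus an optional `-o` output name that falls back to `vectors.json`) could be declared with `argparse` as follows. This is a hypothetical sketch of the interface only, not the repository's actual code; the long option name `--output` and the function `parse_args` are our assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `train.py <input> [-o OUTPUT]` style arguments."""
    parser = argparse.ArgumentParser(description="Train word2vec and export vectors as JSON.")
    # A single text file or a directory of text files
    parser.add_argument("input", help="text file or directory of text files")
    # Optional output path; defaults to vectors.json when -o is omitted
    parser.add_argument("-o", "--output", default="vectors.json", help="output JSON file name")
    return parser.parse_args(argv)
```

Calling `parse_args()` with no argument reads `sys.argv`, so `python train.py data.txt -o output.json` would yield `input="data.txt"` and `output="output.json"`.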
- The output JSON file can now be used with the ml5.js word2vec examples.
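To sanity-check the exported file before loading it in ml5.js, you can query it directly from Python. This sketch assumes the JSON has a top-level `"vectors"` object mapping each word to a list of numbers; that layout is an assumption, so adjust `load_vectors` if your file differs. The helper `nearest` ranks words by cosine similarity, a common way to find related words in an embedding space.

```python
import json
import math
from typing import Dict, List, Tuple

Vectors = Dict[str, List[float]]


def load_vectors(path: str) -> Vectors:
    """Read {"vectors": {word: [floats]}} from a JSON file (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["vectors"]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors: Vectors, word: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k words most similar to `word`, excluding the word itself."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

For example, `nearest(load_vectors("vectors.json"), "cat", k=3)` returns the three closest words, provided "cat" is in the trained vocabulary (otherwise it raises `KeyError`).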