# Download latest available release
wget https://github.com/estamos/word2vec-thesis/releases/download/final/word2vec-thesis-final.tar.gz
tar -xvf word2vec-thesis-final.tar.gz
cd word2vec-thesis-final
cp test/test.py models
cd models
python test.py
Gensim parameter | TensorFlow parameter | Type | Details |
---|---|---|---|
alpha | learning_rate | float | The initial learning rate |
cbow_mean | - | boolean | 0: use the sum of the context word vectors; 1: use the mean. Only applies when CBOW is used |
epochs | epochs | int | Number of iterations (epochs) over the corpus |
hs | - | boolean | 0: if negative is non-zero, negative sampling will be used; 1: hierarchical softmax will be used for model training |
min_count | min_count | int | Ignores all words with a total frequency lower than this value |
negative | num_neg_samples | int | If greater than 0, negative sampling is used and this value sets how many "noise words" are drawn |
sample | subsample | float | The threshold for configuring which higher-frequency words are randomly downsampled |
sg | - | boolean | 0: CBOW; 1: Skipgram |
vector_size | embedding_dim | int | Dimensionality of the word vectors |
window | window_size | int | Maximum distance between the current and predicted word within a sentence |
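
The name mapping in the table above can be expressed as a small lookup. This is only a sketch: `to_tensorflow_params` is a hypothetical helper that is not part of this repository, and the TensorFlow-side names are taken from the table rather than from any TensorFlow API.

```python
# Gensim -> TensorFlow parameter names, copied from the table above.
# None marks Gensim parameters for which the table lists no counterpart ("-").
GENSIM_TO_TENSORFLOW = {
    "alpha": "learning_rate",
    "cbow_mean": None,
    "epochs": "epochs",
    "hs": None,
    "min_count": "min_count",
    "negative": "num_neg_samples",
    "sample": "subsample",
    "sg": None,
    "vector_size": "embedding_dim",
    "window": "window_size",
}


def to_tensorflow_params(gensim_params):
    """Hypothetical helper: rename Gensim keyword arguments using the table.

    Returns (translated, dropped), where `dropped` lists the Gensim
    parameters that have no TensorFlow counterpart in the table.
    """
    translated, dropped = {}, []
    for name, value in gensim_params.items():
        if name not in GENSIM_TO_TENSORFLOW:
            raise KeyError(f"parameter not listed in the table: {name}")
        target = GENSIM_TO_TENSORFLOW[name]
        if target is None:
            dropped.append(name)
        else:
            translated[target] = value
    return translated, dropped


if __name__ == "__main__":
    params, dropped = to_tensorflow_params(
        {"window": 10, "min_count": 2, "epochs": 10, "sg": 1}
    )
    print(params)   # renamed parameters
    print(dropped)  # parameters without a TensorFlow counterpart
```

Parameters marked `-` in the table are returned separately instead of being silently discarded, so the caller can decide how to handle them.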
Trained with parameters
Gensim parameter | Value |
---|---|
window | 10 |
min_count | 2 |
workers | 10 |
total_examples | len(documents) |
epochs | 10 |
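
A minimal sketch of how these values could be passed to Gensim, assuming `documents` is an in-memory list of tokenized sentences. The helper names `training_kwargs` and `train_model` are illustrative; the actual scripts under `train` may be organized differently.

```python
def training_kwargs(documents):
    """Split the table's values into constructor and train() arguments."""
    init_kwargs = {"window": 10, "min_count": 2, "workers": 10}
    train_kwargs = {"total_examples": len(documents), "epochs": 10}
    return init_kwargs, train_kwargs


def train_model(documents, sg):
    """Train one model: sg=0 selects CBOW, sg=1 selects Skipgram.

    `documents` must be re-iterable (for example a list of token lists),
    because Gensim scans it once for the vocabulary and again per epoch.
    """
    # Imported lazily so the helper above works without Gensim installed.
    from gensim.models import Word2Vec

    init_kwargs, train_kwargs = training_kwargs(documents)
    model = Word2Vec(sg=sg, **init_kwargs)
    model.build_vocab(documents)
    model.train(documents, **train_kwargs)
    return model


if __name__ == "__main__":
    toy_documents = [["hello", "world"], ["hello", "word2vec"]]
    print(training_kwargs(toy_documents))
```

Calling `train_model(documents, sg=0)` and `train_model(documents, sg=1)` would produce the CBOW and Skipgram variants respectively.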
CBOW | Skipgram |
---|---|
956.5 | 3768.5 |
CBOW | Skipgram |
---|---|
1327456338 | 1327454735 |
Epoch | CBOW | Skipgram |
---|---|---|
Average | 95.65 | 376.85 |
1 | 95.9 | 338.3 |
2 | 95.3 | 340.0 |
3 | 96.7 | 339.9 |
4 | 96.1 | 448.0 |
5 | 95.4 | 339.3 |
6 | 95.3 | 339.8 |
7 | 95.6 | 339.9 |
8 | 95.3 | 599.3 |
9 | 95.3 | 342.8 |
10 | 95.6 | 341.2 |
Epoch | CBOW | Skipgram |
---|---|---|
Average | 132745634 | 132745474 |
1 | 132750757 | 132744876 |
2 | 132744712 | 132741580 |
3 | 132743879 | 132750658 |
4 | 132748376 | 132743435 |
5 | 132747942 | 132749631 |
6 | 132746112 | 132744974 |
7 | 132744511 | 132745877 |
8 | 132742194 | 132744706 |
9 | 132740767 | 132745693 |
10 | 132747088 | 132743305 |
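
The two-column tables shown before the per-epoch tables hold the sums over all ten epochs, and the "Average" rows hold the means. The sketch below recomputes both from the per-epoch values as a consistency check. The tables do not state units, so the code treats the figures as plain numbers.

```python
# Per-epoch values copied from the two per-epoch tables above.
FIRST_TABLE = {
    "CBOW": [95.9, 95.3, 96.7, 96.1, 95.4, 95.3, 95.6, 95.3, 95.3, 95.6],
    "Skipgram": [338.3, 340.0, 339.9, 448.0, 339.3, 339.8, 339.9, 599.3, 342.8, 341.2],
}
SECOND_TABLE = {
    "CBOW": [132750757, 132744712, 132743879, 132748376, 132747942,
             132746112, 132744511, 132742194, 132740767, 132747088],
    "Skipgram": [132744876, 132741580, 132750658, 132743435, 132749631,
                 132744974, 132745877, 132744706, 132745693, 132743305],
}


def summarize(values):
    """Return (total, average) over all epochs."""
    total = sum(values)
    return total, total / len(values)


if __name__ == "__main__":
    for label, table in (("first", FIRST_TABLE), ("second", SECOND_TABLE)):
        for model, values in table.items():
            total, average = summarize(values)
            print(f"{label} table, {model}: total={total:.1f}, average={average:.2f}")
```

In the first table the Skipgram values for epochs 4 and 8 (448.0 and 599.3) are well above the other eight, which all lie between 338 and 343, so they pull the Skipgram average up to 376.85.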
.
├── LICENSE
├── README.md
├── dataset
│   └── wiki_en_corpus.txt
├── logs
│   ├── cbow-log.rtf
│   └── skipgram-log.rtf
├── models
│   ├── word2vec-cbow-trained.model
│   ├── word2vec-cbow-trained.model.syn1neg.npy
│   ├── word2vec-cbow-trained.model.wv.vectors.npy
│   ├── word2vec-skipgram-trained.model
│   ├── word2vec-skipgram-trained.model.syn1neg.npy
│   └── word2vec-skipgram-trained.model.wv.vectors.npy
├── test
│   └── test.py
└── train
    ├── cbow
    │   └── cbow.py
    └── skipgram
        └── skipgram.py
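
After extracting the archive, the layout above can be checked with a short script. This is a sketch, not part of the release: `missing_files` is a hypothetical helper, and it assumes that `cbow.py` and `skipgram.py` sit inside `train/cbow` and `train/skipgram`.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
EXPECTED_FILES = [
    "LICENSE",
    "README.md",
    "dataset/wiki_en_corpus.txt",
    "logs/cbow-log.rtf",
    "logs/skipgram-log.rtf",
    "models/word2vec-cbow-trained.model",
    "models/word2vec-cbow-trained.model.syn1neg.npy",
    "models/word2vec-cbow-trained.model.wv.vectors.npy",
    "models/word2vec-skipgram-trained.model",
    "models/word2vec-skipgram-trained.model.syn1neg.npy",
    "models/word2vec-skipgram-trained.model.wv.vectors.npy",
    "test/test.py",
    "train/cbow/cbow.py",
    "train/skipgram/skipgram.py",
]


def missing_files(root):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

Run it from the extracted `word2vec-thesis-final` directory. Each `.model` file is listed together with its two `.npy` companions, so the check also catches a model that was copied without them.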