Skip to content

mohammed-elkomy/word_vectors

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Word Vectors

My implementations for word vector representation algorithms (word2vec and glove) on text8 dataset I also implemented word2vec evaluation as in the paper but the results aren't good and needs more tuning that I leave as future work for me :( I pushed my trained wordvectors in the same repo :3 only 15mb .. so small :)

Directory Structure

.
├── eval_data               
│   ├── questions-phrases.txt           
│   └──questions-words.txt         
├── glove                        # glove model
|   ├── glove_temp               # my trained glove word vectors  
│   └── myglove.py               # glove model
├── word2vec                     # Test files (alternatively `spec` or `tests`)
│   ├── mygensim.py              # word2vec gensim model
│   └── tf_myword2vec.py         # my word2vec model
├── tensorboard visualization.py # saves a checkpoint for tensorboard
└── eval_wordvectors.py          # evaluation script for wordvectors and demo of KNN

k nearest neighbours are still funny :D

one
=====================================
one                  1.0000
eight                0.9855
seven                0.9838
six                  0.9809
nine                 0.9799
four                 0.9773
five                 0.9769
three                0.9717
two                  0.9651
zero                 0.9605

fish
=====================================
fish                 1.0000
aquarium             0.3756
farming              0.3731
populations          0.3715
products             0.3617
meat                 0.3462
food                 0.3401
remote               0.3191
eating               0.3190
animals              0.3181

cairo
=====================================
cairo                1.0000
strikers             0.3217
egypt                0.3199
algeria              0.3044
muhammed             0.2954
iran                 0.2923
zoo                  0.2831
amaranth             0.2795
orleans              0.2765
kantele              0.2755

human
=====================================
human                1.0000
nature               0.6536
rights               0.6256
behavior             0.6133
animal               0.5955
or                   0.5950
individual           0.5948
particular           0.5872
any                  0.5736
such                 0.5730

mohammed
=====================================
mohammed             1.0000
dora                 0.3219
shah                 0.3163
frederic             0.3102
alexius              0.3079
vin                  0.3073
irradiation          0.3006
tires                0.2967
westcott             0.2941
godard               0.2902

time
=====================================
time                 1.0000
when                 0.8355
only                 0.8259
this                 0.8234
but                  0.8196
since                0.8124
while                0.8050
it                   0.8031
however              0.8028
which                0.8006

standards
=====================================
standards            1.0000
organizations        0.4634
international        0.4552
standard             0.4307
uses                 0.4016
types                0.3887
methods              0.3741
industry             0.3650
internet             0.3638
common               0.3585

program
=====================================
program              1.0000
programs             0.6051
computer             0.5709
system               0.5676
for                  0.5636
based                0.5591
provides             0.5435
software             0.5316
information          0.5182
using                0.5115

machine
=====================================
machine              1.0000
machines             0.5007
gun                  0.4386
tools                0.4355
translation          0.4120
computer             0.4078
using                0.3988
guns                 0.3985
purpose              0.3958
method               0.3944

intelligence
=====================================
intelligence         1.0000
artificial           0.5564
agency               0.4820
service              0.4342
security             0.4068
military             0.3907
secret               0.3791
department           0.3684
agents               0.3684
national             0.3607

Resources

glove

word2vec

Releases

No releases published

Packages

No packages published

Languages