Sense2vec Similarity Question #42

Closed
dbl001 opened this issue Jan 5, 2018 · 6 comments
Labels
usage General usage

Comments

dbl001 commented Jan 5, 2018

Why do 'flies|VERB' and 'flies|NOUN' have a similarity of 1.0?
I'm running sense2vec on Anaconda, with Python 3.6 on OS X 10.11.6

$ python --version
Python 3.6.3 :: Anaconda custom (64-bit)
$ sputnik --name sense2vec --repository-url http://index.spacy.io install reddit_vectors
Downloading...
Downloaded 560.90MB 100.00% 2.15MB/s eta 0s              
archive.gz checksum/md5 OK
INFO:sputnik.pool:install reddit_vectors-1.1.0
$ conda list spacy
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
spacy                     2.0.4                    py36_0    conda-forge
spacy                     0.101.0                   <pip>
$ conda list sense2vec
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
sense2vec                 0.6.0                     <pip>
$ conda list thinc
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
thinc                     6.10.0                   py36_0    conda-forge
thinc                     5.0.8                     <pip>

Here's my example:

import sense2vec
model = sense2vec.load()
freq, query_vector1 = model["flies|NOUN"]
model.most_similar(query_vector1, n=5)
(['flies|NOUN', 'gnats|NOUN', 'snakes|NOUN', 'birds|NOUN',  'grasshoppers|NOUN'],
 <MemoryView of 'ndarray' at 0x1af394c540>)

freq, query_vector2 = model["flies|VERB"]
model.most_similar(query_vector2, n=5)

(['flies|VERB', 'flys|VERB', 'flying|VERB', 'jumps|VERB', 'swoops|VERB'],
 <MemoryView of 'ndarray' at 0x1af394c6e8>)
In [42]: model.data.similarity(query_vector1, query_vector1)
1.0


From a model I trained:

In [40]: new_model = gensim.models.Word2Vec.load('/Users/davidlaxer/LSTM-Sentiment-Analysis/corpus_output_256.txt')
In [41]: new_model.similarity('flies|NOUN', 'flies|VERB')
0.9954307438574328
In [43]: new_model.wv.vocab["flies|VERB"].index
5895
In [44]: new_model.wv.vocab["flies|NOUN"].index
7349
In [45]: new_model.wv["flies|VERB"]
array([ 0.15279259,  0.04471067,  0.0923325 , -0.07349139,  0.04180749,
     -0.71864516,  0.08252977, -0.02405624,  0.28384277,  0.01706951,
     -0.15931296, -0.21216595, -0.0352594 ,  0.13597694,  0.07868216,
     -0.15907238, -0.30132023,  0.01954124,  0.22636545, -0.19983807,
     -0.03842518,  0.49959993, -0.18679027, -0.16045345,  0.05813084,
      0.12905809,  0.1305625 ,  0.42689237,  0.19311258, -0.1002808 ,
      0.07427863, -0.19840011,  0.42542475, -0.32158205,  0.15129171,
     -0.32177079, -0.04034998, -0.05301504,  0.38441092, -0.31020632,
      0.42528978, -0.26249531, -0.25648555,  0.16558036,  0.28656447,
     -0.11909373,  0.09208378, -0.08886475, -0.40061441,  0.02873728,
      0.07275984, -0.05674595, -0.09471942, -0.01308586, -0.2777423 ,
     -0.05253473, -0.00179329, -0.15887854,  0.31784746, -0.00895729,
      0.50658983,  0.09232203,  0.16289137, -0.20241632, -0.01240843,
      0.20972176,  0.065593  ,  0.40676439, -0.16795945,  0.08079262,
      0.27334401,  0.16058736, -0.15362383, -0.13958427,  0.17041191,
     -0.08574789, -0.20200305,  0.16288304,  0.11220794,  0.44721738,
     -0.14058201,  0.13652138, -0.0134679 ,  0.20938247,  0.34156594,
      0.21730828, -0.19907214,  0.02451441,  0.12492239,  0.08635994,
     -0.29003018,  0.01458945,  0.02637799,  0.10671763, -0.17983682,
      0.01115436, -0.02827467,  0.13415532,  0.4656623 , -0.34222263,
      0.44238791, -0.29407004, -0.16681372,  0.04466435, -0.21825369,
     -0.09138768,  0.02407285, -0.57841706, -0.19544049, -0.07518575,
      0.36430466, -0.13164517, -0.01708322,  0.11068137,  0.2811991 ,
      0.02544841,  0.10672008,  0.06147943,  0.09167367, -0.71296901,
      0.04190712, -0.47360554, -0.01762259,  0.0359503 , -0.24351278,
     -0.01718491, -0.04033662,  0.03032484, -0.33736056, -0.13555804,
      0.02156358, -0.50073934, -0.0706998 ,  0.41698509, -0.23886077,
     -0.06120266, -0.0681426 ,  0.15182504,  0.13283113, -0.05899575,
     -0.11477304, -0.18594885, -0.17855589,  0.31381837,  0.25157636,
      0.41943148,  0.05070408, -0.03173119, -0.04240219, -0.25305411,
     -0.36856946,  0.20292452,  0.10858628,  0.17122397,  0.01447193,
     -0.47961271, -0.45739996,  0.17185016, -0.03916142, -0.04544915,
      0.34947339,  0.04178765,  0.37088165,  0.14284173,  0.03443905,
      0.30170318,  0.05259432, -0.22402297,  0.05495254, -0.46103877,
     -0.22059456, -0.27414244,  0.55484813,  0.1569699 ,  0.35761088,
      0.08712664,  0.23313828, -0.25803107, -0.03343969, -0.14713305,
     -0.0611255 ,  0.17435439, -0.01603068,  0.00526717, -0.08379596,
     -0.08644171, -0.12666632,  0.12955435,  0.48045933, -0.17596652,
     -0.29505005,  0.60152525, -0.01975689,  0.02343576,  0.17027852,
     -0.06638149, -0.10826188, -0.41277543, -0.12114278, -0.01596882,
      0.02660148,  0.22383556, -0.030263  , -0.0768819 , -0.32506746,
     -0.15082234, -0.16559191, -0.08502773, -0.01570902, -0.22921689,
      0.19637343, -0.4993245 ,  0.19670881,  0.17284806,  0.10345648,
      0.45276237, -0.12255403,  0.18032061,  0.05677452,  0.09869532,
     -0.23536956, -0.22449525,  0.51938456,  0.24111946,  0.26022053,
     -0.18190917, -0.01768251,  0.00435291,  0.05820792, -0.46525213,
      0.17490779,  0.15250422, -0.1760795 ,  0.14194083,  0.09954269,
     -0.89346975, -0.11642933,  0.0944154 ,  0.2134015 , -0.01955901,
     -0.02899018,  0.07254739, -0.03995875,  0.39499217, -0.05394226,
     -0.07821836, -0.29973337, -0.11607374, -0.01082127,  0.36769736,
      0.04288069, -0.0461933 ,  0.00675509,  0.25210902, -0.21784271,
     -0.18479778], dtype=float32)
In [46]: new_model.wv["flies|NOUN"]
array([ 0.1304135 ,  0.05724983,  0.06886293, -0.03062466,  0.01640639,
     -0.53799176,  0.10968599, -0.02839088,  0.18814373,  0.00147691,
     -0.11227507, -0.14502132, -0.03685957,  0.06422875,  0.07289967,
     -0.10437401, -0.23557086,  0.00153201,  0.17661473, -0.12828164,
     -0.02789859,  0.35942602, -0.1580196 , -0.13264264,  0.03343309,
      0.10922851,  0.1102568 ,  0.29480889,  0.14417146, -0.07892705,
      0.06608826, -0.14885685,  0.32329369, -0.23263605,  0.11967299,
     -0.23964159, -0.02619613,  0.00930338,  0.31111386, -0.22507732,
      0.32475442, -0.19287167, -0.19306417,  0.10722513,  0.2237518 ,
     -0.06828826,  0.07246322, -0.06233693, -0.31375739,  0.01069155,
      0.04457425, -0.00323939, -0.05079295, -0.02164256, -0.22060572,
     -0.03816675,  0.00503534, -0.10069088,  0.24429323,  0.02505454,
      0.38344654,  0.09145252,  0.11439045, -0.10801487, -0.01075712,
      0.16894275,  0.04799445,  0.3149668 , -0.13885498,  0.02068597,
      0.17856079,  0.11587915, -0.11973458, -0.0896498 ,  0.11993878,
     -0.06647626, -0.15219077,  0.10705566,  0.07842658,  0.31101131,
     -0.12788543,  0.09909476,  0.00878725,  0.1618593 ,  0.22566552,
      0.1297064 , -0.14370884,  0.02069237,  0.08489513,  0.0567583 ,
     -0.21860926,  0.01057386,  0.03844477,  0.06213358, -0.12877114,
      0.02327059, -0.00917741,  0.11733869,  0.35853127, -0.25572705,
      0.30879059, -0.20568153, -0.12405248,  0.03546307, -0.18377842,
     -0.06700096,  0.00626029, -0.42848313, -0.13129929, -0.04215423,
      0.26977378, -0.07725398,  0.01177794,  0.05952175,  0.21516307,
      0.01055368,  0.06727242,  0.05038245,  0.06739338, -0.53844106,
      0.02834721, -0.33890292, -0.02644366,  0.03540507, -0.16382404,
     -0.01353777, -0.02502321,  0.00226415, -0.24348356, -0.12502551,
      0.01489578, -0.37660655, -0.05798845,  0.28748602, -0.18512824,
     -0.06250153, -0.06967189,  0.14023623,  0.09628384, -0.09925015,
     -0.07317897, -0.14045765, -0.14597888,  0.24456802,  0.173549  ,
      0.3357946 ,  0.0424754 ,  0.00723427, -0.02120454, -0.14892557,
     -0.26496273,  0.14844348,  0.06555442,  0.11951103,  0.03691757,
     -0.36404395, -0.32292312,  0.09412326, -0.06377046, -0.02561374,
      0.24361259,  0.02616721,  0.29151902,  0.1178301 ,  0.03284379,
      0.20218852,  0.0337379 , -0.14703217,  0.02869225, -0.31447497,
     -0.15038867, -0.23353554,  0.41700551,  0.11959957,  0.26917797,
      0.04590914,  0.16029988, -0.18795538, -0.01343729, -0.10532234,
     -0.02617499,  0.12019841,  0.00673278, -0.0070972 , -0.03176219,
     -0.07582191, -0.07277017,  0.09928112,  0.36159652, -0.14404564,
     -0.21233276,  0.46463615,  0.01645906,  0.01815237,  0.12149289,
     -0.07040837, -0.06278557, -0.29605272, -0.07451538,  0.00487611,
      0.00313085,  0.13640559, -0.02045129, -0.05790693, -0.22582445,
     -0.10382047, -0.13318184, -0.05160375,  0.01498237, -0.15075362,
      0.14116266, -0.36445442,  0.1420894 ,  0.11182524,  0.10055254,
      0.33450282, -0.08930281,  0.15410167,  0.03961684,  0.06431124,
     -0.15608449, -0.1599745 ,  0.3780185 ,  0.18073064,  0.2190931 ,
     -0.16039631, -0.03769958, -0.00069833,  0.06914425, -0.33746576,
      0.11075038,  0.11626988, -0.12498619,  0.07928085,  0.0636186 ,
     -0.6352759 , -0.10650127,  0.03810085,  0.14585988, -0.01552053,
     -0.01488287,  0.04300846, -0.00500007,  0.26444513, -0.03629581,
     -0.04127173, -0.23304868, -0.08911316,  0.0029219 ,  0.27401808,
      0.00279731, -0.04162024,  0.00214672,  0.15316918, -0.14298579,
     -0.15343791], dtype=float32)


@ines ines added the usage General usage label Apr 8, 2018

elyase commented Apr 10, 2018

You have a bug in your test code. Instead of:

model.data.similarity(query_vector1, query_vector1)

you should do:

model.data.similarity(query_vector1, query_vector2)
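
To illustrate why the original call could only ever return 1.0, here is a minimal sketch that assumes the similarity in question is a cosine measure. The `cosine_similarity` helper and the two short vectors are made up for illustration; they are not sense2vec's implementation or real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up stand-ins for query_vector1 and query_vector2 (not real model output).
vec_noun = [0.2, -0.5, 0.1, 0.7]
vec_verb = [0.6, 0.1, -0.3, 0.2]

# Same vector passed twice, as in the report: 1.0 up to float rounding.
self_sim = cosine_similarity(vec_noun, vec_noun)
# Two different vectors, as intended: some value strictly below 1.0 here.
cross_sim = cosine_similarity(vec_noun, vec_verb)
print(self_sim, cross_sim)
```

Any non-zero vector compared with itself gives 1.0 under this measure, so that result says nothing about how close the NOUN and VERB senses are.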

dbl001 (Author) commented Apr 10, 2018 via email

elyase commented Apr 10, 2018

You shouldn't be using sputnik to download the reddit vectors (I think it is deprecated). You can get them from here:

https://github.com/explosion/sense2vec/releases

and then follow the updated instructions in the README.

kssalanio commented Sep 28, 2018

Hiya, I'm having a similar problem. Is the similarity function below available when sense2vec is added to a spaCy pipeline?
model.data.similarity(query_vector1, query_vector2)

I tried the code below, and the method isn't there:

import spacy
from sense2vec import Sense2VecComponent

nlp = spacy.load('en')
s2v = Sense2VecComponent('/Users/ken/eclipse-workspace/data/reddit_vectors-1.1.0')
nlp.add_pipe(s2v)
test_tkn_1 = nlp(u"name")
test_tkn_2 = nlp(u"full name")
qvector_1 = test_tkn_1[0]._.s2v_vec
qvector_2 = test_tkn_2[0]._.s2v_vec
similarity_val = s2v.similarity(qvector_1, qvector_2)  # this is the method that is missing

The call below returns spaCy's own similarity, not sense2vec's:
similarity_val = test_tkn_1[0].similarity(test_tkn_2[0])

Should I just use S2V standalone for this function?

Shinupchandran commented Aug 19, 2019

When I run the code below:

model = sense2vec.load()

I get this error:

AttributeError: module 'sense2vec' has no attribute 'load'

Not sure how to proceed with sense2vec with this error.
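
The thread does not confirm what caused this error. As a rough sketch only, assuming sense2vec 1.0 or later is installed (where vectors are loaded through a `Sense2Vec` class rather than a module-level `load()`) and that a compatible vectors directory has been unpacked locally, a standalone comparison could look like the code below. The helpers `sense_key` and `compare_senses` and the directory path are hypothetical names used for illustration.

```python
def sense_key(word, tag):
    """Build a 'word|TAG' key in the format used throughout this thread."""
    return f"{word}|{tag}"


def compare_senses(vectors_dir, key_a, key_b):
    """Load vectors from disk and return the similarity of two keys.

    Assumes sense2vec 1.0 or later; vectors_dir is a hypothetical path to
    an unpacked vectors directory compatible with that release.
    """
    # Imported lazily so the rest of the sketch runs without sense2vec.
    from sense2vec import Sense2Vec

    s2v = Sense2Vec().from_disk(vectors_dir)
    for key in (key_a, key_b):
        if key not in s2v:
            raise KeyError(f"{key!r} is not in the vector table")
    return s2v.similarity(key_a, key_b)


noun_key = sense_key("flies", "NOUN")
verb_key = sense_key("flies", "VERB")
print(noun_key, verb_key)

# Hypothetical usage, left commented out because it needs the vectors on disk:
# print(compare_senses("/path/to/s2v_vectors", noun_key, verb_key))
```

Passing the two keys, rather than the same vector twice, also avoids the mix-up discussed earlier in this thread.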

ines (Member) commented Oct 29, 2019

Closing this, since #77 will address a bunch of interoperability stuff with spaCy. Also, various things have changed under the hood in the meantime.

@ines ines closed this as completed Oct 29, 2019

5 participants