# spaCy Text Similarity
For text similarity, spaCy compares <b>word vectors</b> (word embeddings, which can be generated with algorithms such as Word2Vec) using <b>cosine similarity</b>.<br>
To compute similarity, you can use:<br>
1. Doc.similarity()
2. Span.similarity()
3. Token.similarity()
<br>
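The score these methods return is the cosine of the angle between two vectors: their dot product divided by the product of their lengths. Below is a minimal pure-Python sketch of that formula. The helper name `cosine_similarity` and the 3-dimensional toy vectors are made up for illustration; they are not spaCy's API or real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption for this sketch: a zero vector has no direction, so report 0.0
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 3-d vectors (invented, not real spaCy embeddings)
pasta = [1.0, 2.0, 0.0]
pizza = [2.0, 4.0, 0.0]
car = [0.0, 0.0, 3.0]
print(cosine_similarity(pasta, pizza))  # vectors point the same way
print(cosine_similarity(pasta, car))    # vectors are orthogonal
```

Scores range from -1 (opposite directions) to 1 (same direction). Only the direction of the vectors matters, not their length.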

### Models available in spaCy
- en_core_web_md (medium model)
- en_core_web_lg (large model)
- en_core_web_sm (small model): has no static word vectors, so it is not well suited to text similarity
<br>


```python
# Run once to download a model that includes word vectors:
# !python -m spacy download en_core_web_md
```

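```python
import spacy

# Note: en_core_web_sm has no static word vectors, so the scores below are
# only rough. For meaningful similarity, download and load en_core_web_md
# (or en_core_web_lg) instead. The outputs in this notebook were produced
# with en_core_web_sm.
nlp = spacy.load("en_core_web_sm")
```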



```python
doc1 = nlp("I like pasta")
token1 = doc1[2]  # "pasta"
doc2 = nlp("I like pizza")
token2 = doc2[2]  # "pizza"
```

#### Vector

```python
# token1 is doc1[2], i.e. "pasta"
print("Vector of the word %s: %s" % (token1, str(token1.vector)))
```


```
Vector of the word pasta: [ 3.4587069  -5.1740785   1.5724678  -1.8150723   0.16870703 -1.0707431
  4.2217684  -2.3746052  -4.827319    0.8752275  -1.0055934   1.2087524
 -0.49629104 -2.0868046   2.2549772  -1.6175449   1.9484265   1.0280021
 -0.44205624  1.2065959  -1.0452871  -4.4737735  -1.4106083   3.5614092
 -0.03724515 -0.70621526  1.6226947   1.546504    2.4825149   1.8595277
  0.97728765 -2.322699    0.22202742 -0.99675757  0.2517177   1.0331284
  0.019719   -0.6510774  -4.033538   -1.8552463  -2.125887    3.2676394
  0.07941335 -1.8486612  -1.7704881  -0.7598312  -2.0591888  -4.144084
 -3.9161177  -1.5054088  -4.325083    0.29423118 -2.2109983  -1.753673
 -1.811351   -0.59585094  2.7309854  -2.30828     0.02852792  2.5986023
  0.32365662  4.0930204   4.827987   -0.9883788   1.944063    2.099637
  1.7788304  -2.1225457  -0.76004636  1.2609104  -0.243267   -2.2987692
  3.9465563   2.1220016  -1.0930598   4.6695375  -0.3734905   1.9006069
 -2.6517713   1.3784261   2.0190027   2.1
```
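In a model that ships word vectors (such as en_core_web_md), `Token.vector` is looked up from the vocabulary's vector table, `Token.has_vector` reports whether an entry exists, and `Token.vector_norm` is the vector's L2 norm; tokens without an entry typically get an all-zero vector. The sketch below imitates that lookup with a hypothetical dict-based table. The table contents and the helper names `lookup_vector`, `has_vector` and `vector_norm` are illustrative assumptions, not spaCy internals.

```python
import math

# Hypothetical lookup table: word -> vector (toy 4-d values, not real embeddings)
VECTOR_TABLE = {
    "pasta": [0.5, -1.0, 2.0, 0.0],
    "pizza": [0.4, -0.8, 1.5, 0.3],
}
VECTOR_WIDTH = 4


def has_vector(word):
    """True if the word has an entry in the lookup table."""
    return word in VECTOR_TABLE


def lookup_vector(word):
    """Return the word's vector, or an all-zero vector for out-of-vocabulary words."""
    return VECTOR_TABLE.get(word, [0.0] * VECTOR_WIDTH)


def vector_norm(vector):
    """L2 (Euclidean) length of a vector, analogous to Token.vector_norm."""
    return math.sqrt(sum(x * x for x in vector))


for word in ["pasta", "qwertyuiop"]:
    vec = lookup_vector(word)
    print(word, has_vector(word), vec, round(vector_norm(vec), 4))
```

A zero vector has no direction, so similarity scores involving out-of-vocabulary words carry little meaning.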

#### Word Similarity

```python
word_sim_score = token1.similarity(token2)
print("word similarity between '%s' and '%s' is '%e'" % (token1, token2, word_sim_score))
```


```
word similarity between 'pasta' and 'pizza' is '7.404463e-01'
```




#### Doc Similarity

```python
# Doc similarity: cosine similarity of the two documents' vectors
sim_score = doc1.similarity(doc2)
print("Similarity between doc1 and doc2:", sim_score)
```


```
Similarity between doc1 and doc2: 0.9119401819026696
```
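`Doc.vector` and `Span.vector` default to the average of their token vectors, so `Doc.similarity` (and `Span.similarity`) compares two averaged vectors. Here is a rough pure-Python sketch of that idea. The 2-d toy vectors, the whitespace tokenization and the helper names `average_vector`, `doc_vector` and `cosine_similarity` are assumptions made for illustration, not spaCy's code.

```python
import math

# Invented 2-d vectors for illustration only (not spaCy embeddings)
TOY_VECTORS = {
    "i": [0.1, 0.9],
    "like": [0.5, 0.5],
    "pasta": [0.9, 0.2],
    "pizza": [0.8, 0.3],
}


def average_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def doc_vector(text):
    # Naive lowercase + whitespace tokenization, an assumption of this sketch
    return average_vector([TOY_VECTORS[w] for w in text.lower().split()])


doc_score = cosine_similarity(doc_vector("I like pasta"), doc_vector("I like pizza"))
word_score = cosine_similarity(TOY_VECTORS["pasta"], TOY_VECTORS["pizza"])
print(round(doc_score, 4), round(word_score, 4))
```

In this toy setup the shared words *I* and *like* pull the two averages together, so the doc score comes out higher than the word score. A similar averaging effect may help explain why the doc score above is higher than the word score in the previous section.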




#### Token and Doc Similarity

```python
tokenDoc_sim_score = doc1.similarity(token1)
print("similarity between the token '%s' and doc '%s' is %e" % (token1, doc1, tokenDoc_sim_score))
```

```
similarity between the token 'pasta' and doc 'I like pasta' is 5.976120e-01
```
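A token can be compared with a doc because both expose a `.vector`: the token's own vector is compared with the doc's averaged vector. The self-contained sketch below shows this with invented 2-d toy vectors. The names `TOY_VECTORS`, `mean` and `cosine_similarity` are hypothetical stand-ins for spaCy's objects.

```python
import math

# Invented 2-d vectors for illustration only (not spaCy embeddings)
TOY_VECTORS = {"i": [0.1, 0.9], "like": [0.5, 0.5], "pasta": [0.9, 0.2]}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Element-wise mean, standing in for the default Doc.vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


words = "I like pasta".lower().split()
doc_vec = mean([TOY_VECTORS[w] for w in words])  # stands in for Doc.vector
token_vec = TOY_VECTORS["pasta"]                 # stands in for Token.vector

print(cosine_similarity(token_vec, doc_vec))
```

Cosine similarity is symmetric, so in this sketch swapping the arguments gives the same score.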


