Gensim and spaCy are both popular Python libraries used for natural language processing (NLP) tasks, but they serve different purposes and have different focuses.

1. Functionality:
   - Gensim: Gensim is primarily focused on topic modeling, document similarity, and unsupervised learning for large text corpora. It provides efficient implementations of algorithms like Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Word2Vec.
   
   - spaCy: spaCy, on the other hand, is a more comprehensive NLP library that offers a wide range of functionalities, including tokenization, named entity recognition (NER), part-of-speech (POS) tagging, dependency parsing, and more. It is designed for building practical, production-ready NLP pipelines.

2. Design Philosophy:
   - Gensim: Gensim focuses on simple, efficient implementations of algorithms for working with large volumes of text. It emphasizes scalability, memory efficiency, and ease of use: corpora are processed as streams, so they do not need to fit in RAM. Gensim is particularly useful for tasks like document similarity and topic modeling.

   - spaCy: spaCy is designed to be a full-featured NLP library that combines efficiency with usability. It aims to provide fast and accurate results while being user-friendly. spaCy includes pre-trained models for many languages and offers convenient features for common NLP tasks, such as named entity recognition, POS tagging, and dependency parsing.

3. Pre-trained Models:
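   - Gensim: Gensim's core library focuses on algorithms and tools for training your own models on custom text corpora. It does, however, give access to pre-trained word vectors (for example the Google News Word2Vec and Twitter GloVe models used below) through its `gensim.downloader` API.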

   - spaCy: spaCy provides trained pipelines for many languages (for example `en_core_web_sm` for English) that cover tokenization, POS tagging, NER, and dependency parsing. They are installed separately, loaded with `spacy.load()`, and can then be applied directly to new text.
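Gensim's scalability rests largely on streaming: its trainers accept any iterable of tokenized documents, so a corpus can be read from disk one line at a time instead of being loaded into memory. Below is a minimal standard-library sketch of such a corpus; the class name `StreamingCorpus` and its regex tokenizer are illustrative assumptions, not part of Gensim. It is written as a class with `__iter__` rather than as a generator because models like Word2Vec read the corpus several times (once to build the vocabulary, then once per epoch), and a generator can only be consumed once.

```python
import re
import tempfile
from pathlib import Path


class StreamingCorpus:
    """Hypothetical helper: yield one tokenized line at a time from a text file."""

    def __init__(self, path):
        self.path = Path(path)

    def __iter__(self):
        # Re-opening the file on every __iter__ call makes the corpus
        # restartable, so it can be read once per training pass.
        with self.path.open(encoding="utf-8") as handle:
            for line in handle:
                tokens = re.findall(r"[a-z']+", line.lower())
                if tokens:  # skip blank lines
                    yield tokens


# Usage: write a tiny corpus to a temporary file and stream it twice.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "corpus.txt"
    path.write_text("The cat sat.\n\nDogs chase cats!\n", encoding="utf-8")
    corpus = StreamingCorpus(path)
    first_pass = list(corpus)
    second_pass = list(corpus)

print(first_pass)                 # [['the', 'cat', 'sat'], ['dogs', 'chase', 'cats']]
print(first_pass == second_pass)  # True
```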

In summary, Gensim is more specialized for topic modeling and unsupervised learning, while spaCy offers a broader set of NLP functionalities and pre-trained models for tasks like tokenization, NER, and dependency parsing. The choice between Gensim and spaCy depends on the specific requirements of your NLP project.

# Word Vectors Overview Using the Gensim Library

In [None]:
import gensim.downloader as api

# Load Word2Vec vectors pre-trained on Google News (3 million words and phrases, 300 dimensions)
wv = api.load('word2vec-google-news-300')



All models available through `gensim.downloader` are listed in the gensim-data repository: https://github.com/RaRe-Technologies/gensim-data

In [None]:
# Cosine similarity between the vectors for "great" and "good"
wv.similarity(w1="great", w2="good")

0.729151
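`wv.similarity` returns the cosine similarity of the two word vectors: their dot product divided by the product of their lengths, so 1.0 means the vectors point the same way and 0.0 means they are orthogonal. A minimal standard-library sketch of that formula follows; the three-dimensional vectors are invented for illustration (the Google News vectors have 300 dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, made up for illustration only.
great = [0.9, 0.4, 0.1]
good = [0.8, 0.5, 0.2]
print(round(cosine_similarity(great, good), 3))
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # 0.0 for orthogonal vectors
```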

In [None]:
wv.similarity(w1="profit", w2="loss")

0.34199455

In [None]:
# The ten words whose vectors are closest to "good", i.e. words that
# appear in similar contexts (antonyms such as "bad" qualify too)

wv.most_similar("good")

[('great', 0.7291510105133057),
 ('bad', 0.7190051078796387),
 ('terrific', 0.6889115571975708),
 ('decent', 0.6837348341941833),
 ('nice', 0.6836092472076416),
 ('excellent', 0.644292950630188),
 ('fantastic', 0.6407778263092041),
 ('better', 0.6120728850364685),
 ('solid', 0.5806034803390503),
 ('lousy', 0.576420247554779)]
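`most_similar` ranks the vocabulary by cosine similarity to the query word and returns the top `topn` (10 by default). The sketch below spells out that brute-force ranking over a tiny, made-up vocabulary; the helper names `unit` and `most_similar` and the toy vectors are illustrative assumptions. Gensim performs the same comparison as one matrix product over pre-normalized vectors rather than a Python loop.

```python
import math


def unit(v):
    """Scale a vector to length 1 so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def most_similar(vectors, word, topn=3):
    """Rank every other word by cosine similarity to `word` (brute force)."""
    query = unit(vectors[word])
    scores = []
    for other, vec in vectors.items():
        if other == word:
            continue  # the query word itself is excluded from the results
        score = sum(a * b for a, b in zip(query, unit(vec)))
        scores.append((other, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical toy vocabulary; the numbers are invented for illustration.
toy_vectors = {
    "good": [0.8, 0.5, 0.2],
    "great": [0.9, 0.4, 0.1],
    "bad": [0.6, 0.7, 0.2],
    "cat": [0.1, 0.2, 0.9],
}
print(most_similar(toy_vectors, "good", topn=2))
```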

In [None]:
wv.most_similar("cat")

[('cats', 0.8099379539489746),
 ('dog', 0.760945737361908),
 ('kitten', 0.7464985251426697),
 ('feline', 0.7326234579086304),
 ('beagle', 0.7150582671165466),
 ('puppy', 0.7075453400611877),
 ('pup', 0.6934291124343872),
 ('pet', 0.6891531348228455),
 ('felines', 0.6755931973457336),
 ('chihuahua', 0.6709762215614319)]

- king - man + woman ≈ queen
- France - Paris + Berlin ≈ Germany
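These analogies are answered with vector arithmetic: add the vectors passed as `positive`, subtract those passed as `negative`, and return the words closest (by cosine similarity) to the result, excluding the query words themselves. Below is a minimal sketch of this vector-offset method with invented three-dimensional vectors; the helper names and numbers are assumptions for illustration. Gensim also normalizes the input vectors, but it averages them rather than summing them, which gives the same ranking because cosine similarity ignores vector length.

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def analogy(vectors, positive, negative, topn=1):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, unit(vectors[word]))]
    for word in negative:
        target = [t - x for t, x in zip(target, unit(vectors[word]))]
    target = unit(target)
    excluded = set(positive) | set(negative)  # never return the query words
    scores = [
        (word, sum(a * b for a, b in zip(target, unit(vec))))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Invented 3-d vectors with axes [royalty, maleness, femaleness].
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "prince": [0.7, 0.8, 0.2],
}
print(analogy(toy_vectors, positive=["king", "woman"], negative=["man"]))
```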

In [None]:
wv.most_similar(positive=["france", "berlin"], negative=["paris"])

[('germany', 0.5094343423843384),
 ('european', 0.48650455474853516),
 ('german', 0.4714890420436859),
 ('austria', 0.46964022517204285),
 ('swedish', 0.4645182490348816),
 ('Wissenschaft', 0.4532880485057831),
 ('denmark', 0.4477355182170868),
 ('München', 0.4438532590866089),
 ('europe', 0.4420619308948517),
 ('belgium', 0.43769752979278564)]

In [None]:
wv.most_similar(positive=["king", "woman"], negative=["man"])

[('queen', 0.7118193507194519),
 ('monarch', 0.6189674139022827),
 ('princess', 0.5902431011199951),
 ('crown_prince', 0.5499460697174072),
 ('prince', 0.5377321839332581),
 ('kings', 0.5236844420433044),
 ('Queen_Consort', 0.5235945582389832),
 ('queens', 0.5181134343147278),
 ('sultan', 0.5098593831062317),
 ('monarchy', 0.5087411999702454)]

In [None]:
# Pick the word that fits least with the others
wv.doesnt_match(["facebook", "cat", "lion", "microsoft"])


'facebook'
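`doesnt_match` picks the odd one out: roughly, it averages the normalized vectors of the given words and returns the word least similar to that mean. Below is a minimal sketch with made-up vectors; the helper names and numbers are assumptions. The toy list has one clear outlier, whereas the real call above mixes two companies with two animals, so either company could plausibly be returned; the Word2Vec model picked "facebook".

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def doesnt_match(vectors, words):
    """Return the word least similar to the mean of the group's unit vectors."""
    units = {word: unit(vectors[word]) for word in words}
    dims = len(units[words[0]])
    # Average the unit vectors component by component, then normalize the mean.
    mean = unit([sum(u[i] for u in units.values()) / len(units) for i in range(dims)])
    # The odd one out has the lowest cosine similarity to the mean.
    return min(words, key=lambda word: sum(a * b for a, b in zip(units[word], mean)))


# Invented 3-d vectors for illustration only.
toy_vectors = {
    "cat": [0.1, 0.9, 0.3],
    "lion": [0.2, 0.8, 0.4],
    "tiger": [0.15, 0.85, 0.35],
    "microsoft": [0.9, 0.1, 0.1],
}
print(doesnt_match(toy_vectors, ["cat", "lion", "tiger", "microsoft"]))  # microsoft
```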

# Gensim: GloVe
Stanford's page on GloVe: https://nlp.stanford.edu/projects/glove/

In [None]:
# GloVe vectors pre-trained on Twitter data (2 billion tweets, 25 dimensions)
glv = api.load("glove-twitter-25")



In [None]:
glv.most_similar("good")

[('too', 0.9648017287254333),
 ('day', 0.9533665180206299),
 ('well', 0.9503170847892761),
 ('nice', 0.9438973665237427),
 ('better', 0.9425962567329407),
 ('fun', 0.9418926239013672),
 ('much', 0.9413353800773621),
 ('this', 0.9387555122375488),
 ('hope', 0.9383506774902344),
 ('great', 0.9378516674041748)]

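- The nearest neighbours depend on the training data and the model. Word2Vec trained on Google News returns synonyms and closely related adjectives for "good" ("great", "terrific", "decent"), while GloVe trained on Twitter returns words that tend to share its contexts in casual tweets ("too", "day", "well", "hope"). The scores are also on different scales (0.94 to 0.96 for GloVe versus 0.58 to 0.73 for Word2Vec), so similarity values from different models should not be compared directly.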
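Because raw scores from two models are not comparable, one simple alternative is to compare which neighbours the models agree on. The sketch below copies the two top-10 lists for "good" from the outputs above and measures their overlap with the Jaccard index (shared words divided by all distinct words); choosing Jaccard here is an illustrative assumption, and other rank-aware measures would also work.

```python
# Top-10 neighbours of "good" copied from the two outputs above.
word2vec_news = ["great", "bad", "terrific", "decent", "nice",
                 "excellent", "fantastic", "better", "solid", "lousy"]
glove_twitter = ["too", "day", "well", "nice", "better",
                 "fun", "much", "this", "hope", "great"]

shared = set(word2vec_news) & set(glove_twitter)
union = set(word2vec_news) | set(glove_twitter)
jaccard = len(shared) / len(union)

print(sorted(shared))     # ['better', 'great', 'nice']
print(round(jaccard, 3))  # 0.176
```

Only 3 of the 17 distinct words appear in both lists, which quantifies how differently the two models place "good".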