# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for the Gensim model is [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)

In [None]:
# Word2Vec, high-level overview: it learns a vector representation for each word in a
# given corpus such that words with related meanings lie near each other in vector space.
#
# What this means: words that appear in similar contexts in human language
# end up close to each other in their vector representations.
#
# Example: the vectors for "cat" and "dog" should be close together, while
# the vectors for "dog" and "soccer ball" should be somewhat farther apart.
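"Close in vector space" is usually measured with cosine similarity. The snippet below is a minimal sketch of that measure in plain Python. The three-dimensional vectors are hypothetical values made up for illustration; they are not taken from the trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, invented for this example only
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "ball": [0.1, 0.3, 0.9],
}

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
dog_ball = cosine_similarity(toy_vectors["dog"], toy_vectors["ball"])
print(f"cat vs dog:  {cat_dog:.3f}")
print(f"dog vs ball: {dog_ball:.3f}")
```

With these toy values, "cat" and "dog" score much higher than "dog" and "ball", which is the behaviour the overview describes.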

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [2]:
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

In [None]:
train_data[:5]

In [3]:
# Create an untrained model with Gensim's default hyperparameters
model = Word2Vec()

In [None]:
#model?

In [4]:
model.build_vocab(train_data)
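Conceptually, building the vocabulary means counting every token in the corpus and discarding the rare ones (Gensim's `min_count` defaults to 5). The sketch below mimics only that counting step in plain Python. `build_vocab_sketch` and `toy_corpus` are hypothetical names for illustration, and the real `build_vocab` does more than this (for example, it also prepares frequent-word downsampling).

```python
from collections import Counter

def build_vocab_sketch(sentences, min_count=5):
    """Count tokens across all sentences and keep those seen at least min_count times."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}

# Hypothetical miniature corpus: a list of tokenized "recipes"
toy_corpus = [
    ["lemon", "chicken", "garlic"],
    ["garlic", "onion", "chicken"],
    ["garlic", "saffron"],
]

# A low threshold is used here only because the toy corpus is tiny
vocab = build_vocab_sketch(toy_corpus, min_count=2)
print(vocab)
```

Words that fall below the threshold never receive a vector, so querying them on the trained model raises a `KeyError`.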

In [5]:
%%time
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

CPU times: user 3min 1s, sys: 1.1 s, total: 3min 2s
Wall time: 1min 38s


(68106197, 81403200)
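The tuple returned by `train` can be read as (effective words trained on, raw words processed) summed over all epochs. The first number is smaller because some tokens are skipped, for example words outside the vocabulary. The arithmetic below is a small sketch of how to interpret it; the `epochs = 5` value is an assumption based on Gensim's default, since the model was created without arguments.

```python
# Values copied from the training output above
effective_words, raw_words = 68106197, 81403200

# Assumption: default number of epochs, because Word2Vec() was called with no arguments
epochs = 5

raw_words_per_epoch = raw_words // epochs
kept_fraction = effective_words / raw_words

print(f"raw words per epoch: {raw_words_per_epoch:,}")
print(f"fraction of words actually trained on: {kept_fraction:.1%}")
```

Under that assumption the corpus holds roughly 16 million tokens, and a bit over four fifths of them contribute to training.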

In [6]:
# Query with several ingredients at once; the list is used as the `positive` words
res = model.wv.most_similar('lemon chicken garlic onion'.split(), topn=20)
res

[('shallot', 0.6713830232620239),
 ('shallots', 0.6090980768203735),
 ('onions', 0.5515364408493042),
 ('curry', 0.5106434226036072),
 ('scallions', 0.5018374919891357),
 ('oregano', 0.5009315013885498),
 ('cumin', 0.485931932926178),
 ('tomato', 0.4836978316307068),
 ('calamansi', 0.4811422824859619),
 ('paprika', 0.4742642641067505),
 ('chile', 0.47341448068618774),
 ('jalapeno', 0.4631919860839844),
 ('chili', 0.46036261320114136),
 ('tomatoes', 0.45945432782173157),
 ('marjoram', 0.4580693542957306),
 ('parsley', 0.45729953050613403),
 ('mushrooms', 0.4546982944011688),
 ('pepper', 0.45367431640625),
 ('fennel', 0.44458097219467163),
 ('thyme', 0.44416365027427673)]
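When several words are passed to `most_similar`, the query is, roughly speaking, the average of their normalized vectors, and every other word is ranked by cosine similarity to that average; the input words themselves are left out of the results. The sketch below reproduces that idea in plain Python. It is an approximation for intuition only, not Gensim's implementation, and `most_similar_sketch` and the toy vectors are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar_sketch(vectors, positive, topn=3):
    """Rank words by cosine similarity to the mean of the positive words' unit vectors."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += x / len(positive)
    query = unit(query)
    scores = [
        (word, sum(q * x for q, x in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in positive  # query words are excluded from the results
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy vectors, invented for this example only
toy_vectors = {
    "garlic": [0.9, 0.1, 0.0],
    "onion": [0.8, 0.3, 0.1],
    "shallot": [0.85, 0.2, 0.05],
    "sugar": [0.0, 0.2, 0.9],
}

result = most_similar_sketch(toy_vectors, ["garlic", "onion"], topn=2)
print(result)
```

This also helps explain the output above: the scores reflect closeness to a blend of all four query ingredients rather than to any single one of them.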

In [7]:
model.save('Data/w2v.model')