# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for the Gensim implementation is [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [4]:
# Load the pickled training data
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

In [6]:
#train_data[:5]

In [7]:
# Create an untrained model with Gensim's default hyperparameters
model = Word2Vec()

In [8]:
# Scan the corpus once to build the vocabulary before training
model.build_vocab(train_data)

In [9]:
%%time
# Train on the full corpus; returns (effective word count, raw word count)
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

(67836400, 81818405)
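The tuple returned by `train` can be read with a little arithmetic. The sketch below is only an illustration: it copies the two numbers from the output above and assumes the model ran with Gensim's default of 5 epochs, which this notebook does not state explicitly. The variable names are made up for the example.

```python
# Values copied from the training output above.
effective_words, raw_words = 67836400, 81818405

# Assumption: Word2Vec() was left at Gensim's default of 5 epochs.
assumed_epochs = 5

# Share of raw tokens that were actually used for training. The rest were
# skipped, for example because they were rare words or were downsampled.
kept_fraction = effective_words / raw_words

# Approximate number of raw tokens seen in a single pass over the corpus.
raw_words_per_epoch = raw_words / assumed_epochs

print(f"kept fraction: {kept_fraction:.3f}")
print(f"raw words per epoch: {raw_words_per_epoch:,.0f}")
```

Under that assumption, roughly 83% of the raw tokens contributed to training.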

In [11]:
# 20 words closest to the combined 'salad' + 'chicken' query
model.wv.most_similar(['salad', 'chicken'], topn=20)

[('dressing', 0.6101313829421997),
 ('caesar', 0.5596952438354492),
 ('pheasant', 0.5484460592269897),
 ('mesclun', 0.5353652238845825),
 ('readyr', 0.5255516171455383),
 ('vinaigrette', 0.5215499997138977),
 ('slaw', 0.5203925371170044),
 ('ranch', 0.5169309377670288),
 ('turkey', 0.515552818775177),
 ('romaine', 0.5146552324295044),
 ('island', 0.5140424966812134),
 ('watercress', 0.49467164278030396),
 ('squab', 0.4909105896949768),
 ('lettuces', 0.4863239824771881),
 ('frisee', 0.48447588086128235),
 ('arugula', 0.48353126645088196),
 ('lettuce', 0.4821893572807312),
 ('croutons', 0.4798462986946106),
 ('mizuna', 0.4787168502807617),
 ('salads', 0.476460725069046)]
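To make the scores above less of a black box, here is a minimal pure-Python sketch of the idea behind a multi-word query: average the unit-length vectors of the query words, then rank every other word by cosine similarity to that average. The 3-dimensional vectors and the helper names `unit` and `rank_similar` are invented for illustration; real vectors would come from `model.wv`, and Gensim's own implementation is vectorized and more general.

```python
import math

# Toy vectors, invented for illustration only.
toy_vectors = {
    "salad":    [0.9, 0.1, 0.0],
    "chicken":  [0.1, 0.9, 0.0],
    "dressing": [0.7, 0.5, 0.1],
    "bolt":     [0.0, 0.1, 0.9],
}

def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def rank_similar(positive, vectors, topn=2):
    """Rank words by cosine similarity to the mean of the query words."""
    dim = len(next(iter(vectors.values())))
    mean = [0.0] * dim
    for word in positive:
        for i, x in enumerate(unit(vectors[word])):
            mean[i] += x / len(positive)
    mean = unit(mean)

    scores = [
        (word, sum(a * b for a, b in zip(unit(vec), mean)))
        for word, vec in vectors.items()
        if word not in positive  # query words are excluded from the results
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

ranked = rank_similar(["salad", "chicken"], toy_vectors)
print(ranked)
```

In this toy setup "dressing" ranks first because its vector points in nearly the same direction as the averaged query, while "bolt" points elsewhere and scores much lower.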

In [12]:
model.save('Data/w2v.model')