# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for Gensim's implementation is [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)
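Both Word2Vec architectures learn from a sliding context window. CBOW (Gensim's default, `sg=0`) predicts a word from its neighbours, and skip-gram (`sg=1`) predicts each neighbour from the word. The sketch below builds the training examples each architecture would see for one toy sentence. It is only an illustration: `skipgram_pairs` and `cbow_examples` are hypothetical helpers, and the window is fixed, whereas Gensim also samples a smaller effective window per word and downsamples frequent words.

```python
def skipgram_pairs(sentence, window=2):
    """Skip-gram examples: (center, context) pairs for every neighbour
    up to `window` positions away on either side of the center word."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


def cbow_examples(sentence, window=2):
    """CBOW examples: (context_words, center), where the surrounding
    words are used together to predict the middle word."""
    examples = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        context = [sentence[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = ["toss", "the", "salad", "with", "vinaigrette"]
print(skipgram_pairs(sentence, window=1)[:3])  # [('toss', 'the'), ('the', 'toss'), ('the', 'salad')]
print(cbow_examples(sentence, window=1)[2])    # (['the', 'with'], 'salad')
```

Words that often share such contexts (for example "salad" and "vinaigrette") end up with nearby vectors. This is what `most_similar` exploits later in this task.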

```python
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm  # not used in the cells below
import pandas as pd    # not used in the cells below
import pickle
```

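```python
# Load the pickled training corpus; Word2Vec expects an iterable of
# sentences, each given as a list of string tokens
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)
```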

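```python
# Create an untrained model with Gensim's default hyperparameters
# (Gensim 4.x: CBOW/sg=0, vector_size=100, window=5, min_count=5, epochs=5, workers=3)
model = Word2Vec()
```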

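```python
# Scan the corpus once to build the vocabulary (dropping words seen fewer than
# min_count times) and record the number of sentences in model.corpus_count
model.build_vocab(train_data)
```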
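The pruning part of `build_vocab` can be pictured as plain frequency counting. The sketch below is a simplified stand-in and not Gensim's implementation: the function name `build_vocab` and the toy corpus are hypothetical, and it ignores downsampling and Gensim's internal index layout. It assumes, as above, that the corpus is a list of tokenized sentences.

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Count every token and keep only those seen at least `min_count` times.

    Simplified illustration of the vocabulary-pruning step; it does not
    model downsampling of frequent words or Gensim's internal data structures.
    """
    counts = Counter(token for sentence in sentences for token in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical toy corpus: 5 copies of one sentence, 2 copies of another
toy_corpus = [["salad", "dressing"]] * 5 + [["mesclun", "salad"]] * 2
vocab = build_vocab(toy_corpus, min_count=5)
print(vocab)  # {'salad': 7, 'dressing': 5}
```

A word pruned this way gets no vector at all. With the real model, looking it up in `model.wv` (for example through `most_similar`) raises a `KeyError` rather than returning a low-quality embedding.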

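```python
%%time
# Train for model.epochs passes over the corpus; total_examples tells Gensim
# how many sentences to expect so it can report progress and decay the learning rate.
# train() returns (effective word count, raw word count)
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)
```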

```
CPU times: user 3min 47s, sys: 2.34 s, total: 3min 49s
Wall time: 1min 34s

(68098858, 81403200)
```
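The two numbers returned by `model.train` can be checked with a little arithmetic. This sketch assumes that both counts are summed over all epochs and that `model.epochs` is Gensim's default of 5. The raw count divides evenly by 5, which is consistent with that assumption. The effective count is smaller because out-of-vocabulary tokens and downsampled frequent words are skipped during training. The CPU time in the output above exceeds the wall time because Gensim trains with several worker threads (3 by default).

```python
# Values returned by model.train in the cell above
effective_words, raw_words = 68098858, 81403200
epochs = 5  # assumed: Gensim's default, passed via model.epochs

raw_per_epoch = raw_words // epochs           # corpus size in tokens, under the assumption above
kept_fraction = effective_words / raw_words   # share of tokens actually used for updates

print(raw_per_epoch)           # 16280640
print(f"{kept_fraction:.1%}")  # 83.7%
```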

```python
# The 20 words whose vectors are closest to 'salad' by cosine similarity
model.wv.most_similar(['salad'], topn=20)
```

```
[('mesclun', 0.7682897448539734),
 ('dressing', 0.7653841972351074),
 ('vinaigrette', 0.7246369123458862),
 ('salads', 0.7182444930076599),
 ('mizuna', 0.7095512747764587),
 ('dressed', 0.6819791793823242),
 ('caesar', 0.6771970987319946),
 ('slaw', 0.6765030026435852),
 ('frisée', 0.6610428690910339),
 ('lettuces', 0.6469773054122925),
 ('tossed', 0.6343929171562195),
 ('tabbouleh', 0.6277706027030945),
 ('thousand', 0.6224751472473145),
 ('dress', 0.6223503351211548),
 ('mache', 0.6222996115684509),
 ('mâche', 0.6171838045120239),
 ('frisee', 0.6140679717063904),
 ('watercress', 0.6120907664299011),
 ('micro', 0.6115063428878784),
 ('arugula', 0.6033350229263306)]
```
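The scores above are cosine similarities: the dot product of two word vectors divided by the product of their lengths, so 1.0 means the vectors point the same way. The sketch below ranks words the same way for a single query word. It is a brute-force illustration rather than Gensim's vectorized implementation. Gensim also averages several normalized vectors when `positive` contains more than one word, which this sketch omits. The function names and the 3-dimensional vectors are hypothetical and are not taken from the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(query, vectors, topn=3):
    """Rank every other word by cosine similarity to `query`, highest first."""
    q = vectors[query]
    scores = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:topn]


# Hypothetical toy vectors, not from the trained model
toy_vectors = {
    "salad": [0.9, 0.1, 0.0],
    "dressing": [0.8, 0.2, 0.1],
    "slaw": [0.7, 0.0, 0.3],
    "bread": [0.1, 0.9, 0.2],
}
for word, score in most_similar("salad", toy_vectors, topn=2):
    print(word, round(score, 3))
```

Because only the angle between vectors matters, frequent and rare words with similar contexts can score highly against each other even if their raw vector lengths differ.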

```python
# Save the full model so it can be reloaded later with Word2Vec.load('Data/w2v.model')
model.save('Data/w2v.model')
```