# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The Gensim implementation used in this task is documented [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [4]:
# Load the preprocessed training data: one list of string tokens per document.
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

In [6]:
train_data[0]

['place',
 'chicken',
 'butter',
 'soup',
 'onion',
 'slow',
 'cooker',
 'water',
 'cover',
 'cover',
 'cook',
 'hours',
 'high',
 'minutes',
 'serving',
 'place',
 'torn',
 'biscuit',
 'dough',
 'slow',
 'cooker',
 'cook',
 'dough',
 'longer',
 'raw',
 'center']
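
Gensim's `Word2Vec` expects an iterable of documents, each given as a list of string tokens, which is the shape `train_data[0]` shows above. A minimal sketch of a sanity check that could be run before training follows. `check_corpus` is a hypothetical helper, not part of Gensim, and the toy corpus merely stands in for the real `train_data`.

```python
def check_corpus(corpus):
    """Return (n_documents, n_tokens); raise if a document is not a list of strings.

    Hypothetical helper for illustration only.
    """
    n_tokens = 0
    for i, doc in enumerate(corpus):
        if not isinstance(doc, list) or not all(isinstance(tok, str) for tok in doc):
            raise TypeError(f"document {i} is not a list of string tokens")
        n_tokens += len(doc)
    return len(corpus), n_tokens


# Toy stand-in for train_data (assumed shape: list of token lists).
toy_corpus = [["place", "chicken", "butter"], ["cook", "dough"]]
n_docs, n_tokens = check_corpus(toy_corpus)
print(n_docs, n_tokens)  # 2 5
```

On the real data, the call would be `check_corpus(train_data)`.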

In [13]:
# Create the model with Gensim's default hyperparameters; no corpus is passed yet.
model = Word2Vec()

In [14]:
# Scan the corpus once to collect the vocabulary and token counts.
model.build_vocab(train_data)
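
`build_vocab` counts every token in the corpus and keeps only those that meet the frequency threshold (`min_count`, which is 5 by default in Gensim). The sketch below imitates just that counting-and-filtering step in plain Python on a toy corpus. `build_toy_vocab` is a hypothetical name, and the real method also sets up further internal state (for example, the model weights) that is not shown here.

```python
from collections import Counter


def build_toy_vocab(corpus, min_count=5):
    """Count tokens and keep those seen at least min_count times (illustrative only)."""
    counts = Counter(token for doc in corpus for token in doc)
    return {token: n for token, n in counts.items() if n >= min_count}


toy_corpus = [
    ["slow", "cooker", "chicken"],
    ["slow", "cooker", "dough"],
    ["slow", "cooker", "cook"],
]

# A low threshold is used here because the toy corpus is tiny.
vocab = build_toy_vocab(toy_corpus, min_count=2)
print(vocab)  # {'slow': 3, 'cooker': 3}
```

Tokens below the threshold ("chicken", "dough", "cook" in the toy corpus) never receive a vector, so they cannot be queried later.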

In [15]:
%%time
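# Train on the full corpus, using the document count from build_vocab and the model's epoch setting.
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)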

CPU times: user 3min 12s, sys: 1.26 s, total: 3min 13s
Wall time: 1min 43s


(68100092, 81403200)
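
The tuple printed above is the return value of `train`. The Gensim documentation describes it as the effective word count (after ignoring unknown words and sentence length trimming) followed by the total word count, both summed over all epochs. A small worked calculation, assuming the model ran Gensim's default of 5 epochs, shows how much of the raw corpus was actually used:

```python
# Values copied from the training output above.
effective_words, raw_words = 68100092, 81403200

# Assumption: the model ran Gensim's default of 5 epochs.
epochs = 5

raw_words_per_epoch = raw_words // epochs
kept_fraction = effective_words / raw_words

print(raw_words_per_epoch)      # 16280640
print(round(kept_fraction, 3))  # 0.837
```

Under that assumption, each pass covers about 16.3 million tokens, and roughly 84% of them contribute to training.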

In [16]:
model.wv.most_similar(['salad'], topn=20)

[('dressing', 0.7689329385757446),
 ('mesclun', 0.738060474395752),
 ('vinaigrette', 0.734387218952179),
 ('salads', 0.7296278476715088),
 ('mizuna', 0.7230371832847595),
 ('dressed', 0.6885290145874023),
 ('slaw', 0.6703570485115051),
 ('frisée', 0.6523422002792358),
 ('mache', 0.6513055562973022),
 ('tossed', 0.650983989238739),
 ('lettuces', 0.6474379301071167),
 ('frisee', 0.6405917406082153),
 ('caesar', 0.6285038590431213),
 ('mâche', 0.6162219047546387),
 ('tabbouleh', 0.6149650812149048),
 ('zesty', 0.609714150428772),
 ('watercress', 0.6059816479682922),
 ('rocket', 0.6040061712265015),
 ('arugula', 0.6021600961685181),
 ('cress', 0.6012613773345947)]
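
The scores returned by `most_similar` are cosine similarities between word vectors. A minimal sketch of that measure in plain Python follows, with made-up 3-dimensional vectors standing in for the learned embeddings. The vector values and the `cosine` helper are illustrative and are not taken from the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, chosen only to make the ranking easy to follow.
toy_vectors = {
    "salad": [1.0, 0.0, 0.0],
    "dressing": [0.8, 0.6, 0.0],
    "dough": [0.0, 0.0, 1.0],
}

query = toy_vectors["salad"]
ranked = sorted(
    ((word, cosine(query, vec)) for word, vec in toy_vectors.items() if word != "salad"),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Here "dressing" ranks above "dough" because its vector points in nearly the same direction as the query, regardless of vector length.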

In [17]:
model.wv.most_similar(['salad','chicken'], topn=20)

[('dressing', 0.6051104068756104),
 ('vinaigrette', 0.54527348279953),
 ('mesclun', 0.5194751024246216),
 ('pheasant', 0.5127019286155701),
 ('mizuna', 0.5076389312744141),
 ('salads', 0.5019259452819824),
 ('turkey', 0.5018468499183655),
 ('zesty', 0.5012830495834351),
 ('watercress', 0.497923344373703),
 ('lettuces', 0.49642324447631836),
 ('caesar', 0.4825786352157593),
 ('greens', 0.48153409361839294),
 ('romaine', 0.4769057631492615),
 ('slaw', 0.47429728507995605),
 ('frisee', 0.4718213677406311),
 ('squab', 0.4694540798664093),
 ('frisée', 0.4687381982803345),
 ('dress', 0.46055471897125244),
 ('lettuce', 0.4598221778869629),
 ('dressed', 0.4592679440975189)]
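
With several query words, the Gensim documentation describes `most_similar` as computing cosine similarity against a simple mean of the query words' vectors, and the query words themselves are left out of the results. The sketch below reproduces that idea in plain Python on made-up 2-dimensional vectors. The toy values, the helper names, and the choice to unit-normalize each vector before averaging are assumptions made for illustration.

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy embeddings: axis 0 is "salad-like", axis 1 is "chicken-like".
toy_vectors = {
    "salad": [1.0, 0.0],
    "chicken": [0.0, 1.0],
    "dressing": [0.9, 0.1],
    "turkey": [0.1, 0.9],
    "caesar": [0.7, 0.7],
}

query_words = ["salad", "chicken"]
query = mean_vector([unit(toy_vectors[w]) for w in query_words])

# Score every other word against the averaged query vector.
scores = {
    word: cosine(query, vec)
    for word, vec in toy_vectors.items()
    if word not in query_words
}
best = max(scores, key=scores.get)
print(best)  # caesar
```

The word that sits between both query words wins, while words close to only one of them score lower. This matches the pattern in the output above, where salad-related and poultry-related words both appear but with lower scores than in the single-word query.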

In [18]:
# Save the trained model to disk so it can be reloaded later.
model.save('Data/w2v.model')