# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The Gensim implementation used in this task is documented [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [3]:
# Load the pickled training corpus: a list of documents, each a list of tokens
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

In [4]:
train_data[0]

['place',
 'chicken',
 'butter',
 'soup',
 'onion',
 'slow',
 'cooker',
 'water',
 'cover',
 'cover',
 'cook',
 'hours',
 'high',
 'minutes',
 'serving',
 'place',
 'torn',
 'biscuit',
 'dough',
 'slow',
 'cooker',
 'cook',
 'dough',
 'longer',
 'raw',
 'center']

In [6]:
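# Instantiate with Gensim's default hyperparameters (100-dimensional vectors, window=5, min_count=5)
model = Word2Vec()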

In [8]:
# model?  # uncomment in Jupyter to display the Word2Vec docstring and its parameters

In [12]:
# Scan the corpus once to collect the vocabulary and word frequencies
model.build_vocab(train_data)
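
As a rough illustration of the vocabulary step, the sketch below counts token frequencies and keeps only tokens that appear at least `min_count` times (Gensim's default is 5 when the model is created without arguments). The `toy_corpus` data and the `prune_vocab` helper are hypothetical and not part of Gensim. The real `build_vocab` also prepares internal sampling structures that are not shown here.

```python
from collections import Counter

def prune_vocab(corpus, min_count=5):
    """Return {token: count} for tokens that occur at least min_count times."""
    counts = Counter(token for doc in corpus for token in doc)
    return {tok: n for tok, n in counts.items() if n >= min_count}

# Hypothetical toy corpus in the same shape as train_data (a list of token lists)
toy_corpus = [
    ['cook', 'chicken', 'slow', 'cooker'],
    ['cook', 'dough', 'slow', 'cooker'],
    ['cook', 'chicken', 'soup'],
]

# A low threshold is used here only because the toy corpus is tiny
vocab = prune_vocab(toy_corpus, min_count=2)
print(sorted(vocab))
```

Tokens that fall below the threshold ('dough' and 'soup' in this toy case) never receive a vector, so they cannot be queried later through `model.wv`.
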

In [13]:
%%time
# Train on the tokenized corpus, reusing the document count and epoch count stored on the model
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

CPU times: user 2min 49s, sys: 552 ms, total: 2min 49s
Wall time: 1min 27s


(68096463, 81403200)
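
As far as the Gensim source indicates, the tuple returned by `model.train` is the effective word count (tokens actually trained on, after out-of-vocabulary tokens and downsampled frequent words are dropped) followed by the raw word count summed over all epochs. A quick arithmetic check on the numbers above, assuming the default of 5 epochs applied (an assumption, since `Word2Vec()` was created without arguments):

```python
# Values copied from the training output above
effective_words, raw_words = 68096463, 81403200
epochs = 5  # assumption: Gensim's default epoch count

# Share of raw tokens that actually contributed to training
kept_fraction = effective_words / raw_words

# Raw tokens seen in a single pass over the corpus
raw_words_per_epoch = raw_words // epochs

print(f"kept fraction: {kept_fraction:.4f}")
print(f"raw tokens per epoch: {raw_words_per_epoch}")
```

Under that assumption, roughly 84% of the raw tokens were used for training, and the corpus holds about 16.3 million tokens per pass.
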

In [15]:
# The 20 words closest (by cosine similarity) to the combined 'salad' + 'chicken' query
model.wv.most_similar(['salad', 'chicken'], topn=20)

[('dressing', 0.6186662316322327),
 ('vinaigrette', 0.5525026321411133),
 ('caesar', 0.537085235118866),
 ('pheasant', 0.534424364566803),
 ('mesclun', 0.5175861120223999),
 ('romaine', 0.5136029720306396),
 ('turkey', 0.5086547136306763),
 ('squab', 0.5047982931137085),
 ('watercress', 0.5045303106307983),
 ('slaw', 0.5032996535301208),
 ('dressed', 0.501069188117981),
 ('thousand', 0.49524760246276855),
 ('ranch', 0.48755571246147156),
 ('lettuces', 0.48373889923095703),
 ('mizuna', 0.48023971915245056),
 ('tabbouleh', 0.4789750874042511),
 ('salads', 0.47481685876846313),
 ('frisee', 0.4722917377948761),
 ('radicchio', 0.4669337868690491),
 ('island', 0.4660421311855316)]
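
To make the scores above easier to interpret, here is a minimal pure-Python sketch of how a multi-word query can be ranked: the query vectors are unit-normalized and averaged, and every other word is scored by cosine similarity against that mean. The `toy_vectors` values and the `toy_most_similar` helper are hypothetical. This mirrors the general idea behind `most_similar`, not Gensim's exact implementation.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def toy_most_similar(vectors, positive, topn=3):
    """Rank words by cosine similarity to the mean of the positive word vectors."""
    normed = {w: unit(v) for w, v in vectors.items()}
    dim = len(next(iter(normed.values())))
    # Average the unit-normalized query vectors, then re-normalize the mean
    mean = [sum(normed[w][i] for w in positive) / len(positive) for i in range(dim)]
    query = unit(mean)
    # Dot product of unit vectors equals cosine similarity; query words are excluded
    scores = [
        (w, sum(a * b for a, b in zip(query, v)))
        for w, v in normed.items() if w not in positive
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]

# Hypothetical 2-dimensional embeddings, for illustration only
toy_vectors = {
    'salad': [1.0, 0.0],
    'chicken': [0.0, 1.0],
    'dressing': [0.9, 0.6],
    'dough': [-1.0, 0.2],
}

result = toy_most_similar(toy_vectors, ['salad', 'chicken'], topn=2)
print(result)
```

In this toy setup 'dressing' ranks first because its vector points in nearly the same direction as the averaged query, while 'dough' points away from it and gets a negative score.
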

In [16]:
# Persist the trained model; it can be reloaded later with Word2Vec.load('Data/w2v.model')
model.save('Data/w2v.model')