# Reading and Exploring the Dataset
The dataset used here is a subset of Amazon reviews from the Cell Phones & Accessories category. The data is stored as a JSON file with one review per line, so it can be read with pandas by passing `lines=True`.

Link to the Dataset: http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Cell_Phones_and_Accessories_5.json.gz

In [1]:
import pandas as pd
import numpy as np
import gensim

In [2]:
df = pd.read_json('./Cell_Phones_and_Accessories_5.json',lines=True)
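
The linked file is gzip-compressed, while the cell above reads an already decompressed copy. As a minimal sketch of what "one JSON object per line" means, the snippet below writes two made-up records to a temporary `.json.gz` file and reads them back with the standard library only. The helper name `read_json_lines_gz`, the sample records, and the file name `sample_reviews.json.gz` are assumptions for illustration; they are not part of the original notebook.

```python
import gzip
import json
import os
import tempfile

# Two made-up records in the same one-object-per-line layout as the review files.
sample_records = [
    {"reviewerID": "A1", "reviewText": "Works great", "overall": 5},
    {"reviewerID": "A2", "reviewText": "Stopped charging", "overall": 2},
]


def read_json_lines_gz(path):
    """Return a list of dicts, one per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Write the sample to a temporary .json.gz file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_reviews.json.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
        for record in sample_records:
            handle.write(json.dumps(record) + "\n")
    records = read_json_lines_gz(sample_path)

print(len(records), records[0]["reviewText"])
```

With pandas, `pd.read_json(path, lines=True)` should also accept the `.json.gz` file directly, since compression is inferred from the extension by default.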

In [3]:
df.reviewText[0]

"They look good and stick good! I just don't like the rounded shape because I was always bumping it and Siri kept popping up and it was irritating. I just won't buy a product like this again"

In [4]:
reviewText = df.reviewText.apply(gensim.utils.simple_preprocess)

In [5]:
reviewText

0         [they, look, good, and, stick, good, just, don...
1         [these, stickers, work, like, the, review, say...
2         [these, are, awesome, and, make, my, phone, lo...
3         [item, arrived, in, great, time, and, was, in,...
4         [awesome, stays, on, and, looks, great, can, b...
                                ...                        
194434    [works, great, just, like, my, original, one, ...
194435    [great, product, great, packaging, high, quali...
194436    [this, is, great, cable, just, as, good, as, t...
194437    [really, like, it, becasue, it, works, well, w...
194438    [product, as, described, have, wasted, lot, of...
Name: reviewText, Length: 194439, dtype: object
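
The output above shows that `simple_preprocess` lowercases the text, strips punctuation, and drops very short tokens such as "I" and the "t" left over from "don't". The following is a rough pure-Python approximation of that behavior, not gensim's actual implementation: it handles ASCII letters only, and the function name `rough_preprocess` is hypothetical. The length bounds of 2 and 15 characters are assumed to match the `simple_preprocess` defaults.

```python
import re


def rough_preprocess(text, min_len=2, max_len=15):
    """Lowercase, split on non-letters, and keep tokens within the length bounds."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if min_len <= len(token) <= max_len]


review = (
    "They look good and stick good! I just don't like the rounded shape "
    "because I was always bumping it and Siri kept popping up"
)
tokens = rough_preprocess(review)
print(tokens[:8])
```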

In [6]:
model = gensim.models.Word2Vec(
    window=10,
    min_count=2,
    workers=8
)

In [7]:
model.build_vocab(reviewText)

In [8]:
model.corpus_count

194439
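
`corpus_count` matches the number of reviews because `build_vocab` counts one example per tokenized review. With `min_count=2`, words that occur only once in the whole corpus are left out of the vocabulary. The sketch below illustrates that pruning on a made-up toy corpus; it is a simplified stand-in for what `build_vocab` does, not gensim's code.

```python
from collections import Counter

# Made-up toy corpus: one token list per "review".
toy_corpus = [
    ["great", "case", "great", "price"],
    ["case", "fits", "well"],
    ["price", "was", "great"],
]
min_count = 2

# Count every token across all documents, then keep the frequent ones.
word_counts = Counter(token for doc in toy_corpus for token in doc)
vocab = sorted(word for word, count in word_counts.items() if count >= min_count)
corpus_count = len(toy_corpus)

print(corpus_count, vocab)
```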

In [9]:
model.epochs

5

In [10]:
model.train(reviewText,total_examples=model.corpus_count,epochs=model.epochs)

(61507268, 83868975)
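
The tuple returned by `train` can be read as a sanity check. Assuming, as gensim's documentation describes it, that the first number is the effective word count (after dropping out-of-vocabulary words and subsampling frequent ones) and the second is the raw word count summed over all epochs, a little arithmetic recovers the corpus size per epoch:

```python
# Values copied from the training output above.
trained_words, raw_words = 61507268, 83868975
epochs = 5  # model.epochs

# Raw words seen in a single pass over the corpus.
words_per_epoch = raw_words // epochs

# Share of raw words that actually contributed to training.
kept_fraction = trained_words / raw_words

print(words_per_epoch, round(kept_fraction, 3))
```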

In [11]:
model.save('W2V.model')

In [12]:
model.wv.most_similar('bad')

[('terrible', 0.6731961965560913),
 ('horrible', 0.6422681212425232),
 ('shabby', 0.6378927826881409),
 ('good', 0.5926379561424255),
 ('pathetic', 0.5830180048942566),
 ('funny', 0.566127359867096),
 ('disappointing', 0.5600265264511108),
 ('awful', 0.5301342606544495),
 ('cheap', 0.5211032032966614),
 ('ridiculous', 0.5045539140701294)]
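
The scores returned by `most_similar` are cosine similarities between word vectors. Because word2vec learns from surrounding context, words used in the same contexts end up close together, which is why an antonym such as 'good' can rank high for 'bad'. A minimal pure-Python sketch of the measure, using made-up three-dimensional vectors rather than vectors from the trained model:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, chosen only to illustrate the computation.
bad = [0.9, 0.1, 0.3]
terrible = [0.8, 0.2, 0.4]
phone = [0.1, 0.9, 0.0]

print(cosine_similarity(bad, terrible) > cosine_similarity(bad, phone))
```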

# Exercise
Train a word2vec model on the Sports & Outdoors Reviews Dataset. Once the model is trained, find the words most similar to 'awful' and compute the similarity for each of the following word pairs: ('good', 'great') and ('slow', 'steady').

http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Sports_and_Outdoors_5.json.gz



In [29]:
df1 = pd.read_json('./Sports_and_Outdoors_5.json',lines=True)

In [30]:
df1

| | reviewerID | asin | reviewerName | helpful | reviewText | overall | summary | unixReviewTime | reviewTime |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AIXZKN4ACSKI | 1881509818 | David Briner | [0, 0] | This came in on time and I am veru happy with ... | 5 | Woks very good | 1390694400 | 01 26, 2014 |
| 1 | A1L5P841VIO02V | 1881509818 | Jason A. Kramer | [1, 1] | I had a factory Glock tool that I was using fo... | 5 | Works as well as the factory tool | 1328140800 | 02 2, 2012 |
| 2 | AB2W04NI4OEAD | 1881509818 | J. Fernald | [2, 2] | If you don't have a 3/32 punch or would like t... | 4 | It's a punch, that's all. | 1330387200 | 02 28, 2012 |
| 3 | A148SVSWKTJKU6 | 1881509818 | Jusitn A. Watts "Maverick9614" | [0, 0] | This works no better than any 3/32 punch you w... | 4 | It's a punch with a Glock logo. | 1328400000 | 02 5, 2012 |
| 4 | AAAWJ6LW9WMOO | 1881509818 | Material Man | [0, 0] | I purchased this thinking maybe I need a speci... | 4 | Ok,tool does what a regular punch does. | 1366675200 | 04 23, 2013 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 296332 | A2XX2A4OJCDNLZ | B00LFPS0CY | RatherLiveInKeyWest | [2, 3] | This is a water bottle done right. It is a ver... | 5 | Hydracentials Sporty 25 Oz Stainless Steel Wat... | 1405036800 | 07 11, 2014 |
| 296333 | A3LGT6UZL99IW1 | B00LFPS0CY | Richard C. Drew "Anaal Nathra/Uthe vas Bethod... | [0, 0] | If you're looking for an insulated water bottl... | 5 | Large, incredibly well made water bottle! | 1405641600 | 07 18, 2014 |
| 296334 | ASKZO80Z1RKTR | B00LFPS0CY | Robin Lee | [0, 0] | This Hydracentials Sporty 25 OZ, double insula... | 5 | "Great Water Bottle For Hot Day"...... | 1405900800 | 07 21, 2014 |
| 296335 | APRNS6DB68LLV | B00LFPS0CY | Rob Slaven "slavenrm@gmail. com" | [1, 1] | As usual I received this item free in exchange... | 5 | A pretty impressive water bottle. Best I've s... | 1405900800 | 07 21, 2014 |


In [34]:
df1.reviewText

0         This came in on time and I am veru happy with ...
1         I had a factory Glock tool that I was using fo...
2         If you don't have a 3/32 punch or would like t...
3         This works no better than any 3/32 punch you w...
4         I purchased this thinking maybe I need a speci...
                                ...                        
296332    This is a water bottle done right. It is a ver...
296333    If you're looking for an insulated water bottl...
296334    This Hydracentials Sporty 25 OZ, double insula...
296335    As usual I received this item free in exchange...
296336    Hydracentials insulated 25 oz water bottle.Thi...
Name: reviewText, Length: 296337, dtype: object

In [35]:
reviewText1 = df1.reviewText.apply(gensim.utils.simple_preprocess)

In [36]:
model1 = gensim.models.Word2Vec(
    window=8,
    min_count=2,
    workers=30
)

In [37]:
model1.build_vocab(reviewText1)

In [39]:
model1.train(reviewText1,total_examples=model1.corpus_count,epochs=model1.epochs)

(91341881, 121496535)

In [40]:
model1.save('W2V_1.model')
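
The exercise still asks for the words most similar to 'awful' and for the two pair similarities. One way to collect those answers is sketched below. The helper `answer_exercise` is hypothetical, not part of the original notebook; it assumes the gensim 4 `KeyedVectors` interface (`key_to_index`, `most_similar`, `similarity`) and skips any word that did not make it into the vocabulary.

```python
def answer_exercise(wv, query="awful",
                    pairs=(("good", "great"), ("slow", "steady")), topn=10):
    """Collect the exercise answers from a KeyedVectors-like object `wv`."""
    results = {"most_similar": None, "similarities": {}}

    # Nearest neighbours of the query word, if it is in the vocabulary.
    if query in wv.key_to_index:
        results["most_similar"] = wv.most_similar(query, topn=topn)

    # Cosine similarity for each word pair whose words are both known.
    for first, second in pairs:
        if first in wv.key_to_index and second in wv.key_to_index:
            results["similarities"][(first, second)] = wv.similarity(first, second)

    return results

# Intended usage in this notebook (requires the trained model):
# answers = answer_exercise(model1.wv)
```

Calling `answer_exercise(model1.wv)` after training should return the neighbour list for 'awful' together with one score per pair. The exact values depend on the training run, so none are shown here.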

In [42]:
model1

<gensim.models.word2vec.Word2Vec at 0x23f334f1300>

In [43]:
import tensorflow as tf
from tensorflow.keras import layers,models