# Part 02A - NLP Preprocessing of Amazon Reviews (spaCy)

### Amazon Data Intro

In [1]:
from IPython.display import display, Markdown
with open("data/Amazon Product Reviews.md") as f:
    info = f.read()

display(Markdown(info))

# Amazon Product Reviews

- URL: https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews 

## Description

This is a large crawl of product reviews from Amazon. This dataset contains 82.83 million unique reviews, from around 20 million users.

## Basic statistics

| Ratings:  | 82.83 million        |
| --------- | -------------------- |
| Users:    | 20.98 million        |
| Items:    | 9.35 million         |
| Timespan: | May 1996 - July 2014 |

## Metadata

- reviews and ratings
- item-to-item relationships (e.g. "people who bought X also bought Y")
- timestamps
- helpfulness votes
- product image (and CNN features)
- price
- category
- salesRank

## Example

```
{
  "reviewerID": "A2SUAM1J3GNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}
```

## Download link

See the [Amazon Dataset Page](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/) for download information.

The 2014 version of this dataset is [also available](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html).

## Citation

Please cite the following if you use the data:

**Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering**

R. He, J. McAuley

*WWW*, 2016
[pdf](https://cseweb.ucsd.edu/~jmcauley/pdfs/www16a.pdf)

**Image-based recommendations on styles and substitutes**

J. McAuley, C. Targett, J. Shi, A. van den Hengel

*SIGIR*, 2015
[pdf](https://cseweb.ucsd.edu/~jmcauley/pdfs/sigir15.pdf)

In [2]:
import os, sys, joblib, json
# sys.path.append(os.path.abspath("../NLP/"))
# sys.path.append(os.path.abspath("../"))
# sys.path.append(os.path.abspath("../../"))
%load_ext autoreload
%autoreload 2
    
# import custom_functions as fn
# import project_functions as pf

# !pip install -U dojo_ds -q
import dojo_ds as ds
ds.__version__

'1.0.9'

In [3]:
import matplotlib.pyplot as plt
import missingno
import matplotlib as mpl
import seaborn as sns
import numpy as np
import pandas as pd

pd.set_option("display.max_columns",50)
# pd.set_option('display.max_colwidth', 250)

fav_style = ('ggplot','tableau-colorblind10')
fav_context  ={'context':'notebook', 'font_scale':1.1}
plt.style.use(fav_style)
sns.set_context(**fav_context)
plt.rcParams['savefig.transparent'] = False
plt.rcParams['savefig.bbox'] = 'tight'

In [4]:
from pprint import pprint
FPATHS_FILE = "config/filepaths.json"
with open(FPATHS_FILE) as f:
    FPATHS = json.load(f)
pprint(FPATHS)

{'data': {'app': {},
          'cleaned': {'asin-id-title-dict_json': 'data/metadata/amazon-groceries-asin-titles-lookup.json',
                      'metadata_csv-gz': 'data/metadata/amazon-metadata-groceries-combined.csv.gz',
                      'reviews-by-years_dict': {'dir': 'data/reviews-by-year/',
                                                'glob': 'data/reviews-by-year/*.*'}},
          'ml-nlp': {'reviews-with-target_json': 'data/modeling/processed-nlp-reviews-for-ml.json',
                     'test_joblib': 'data/modeling/testing-data.joblib',
                     'train_joblib': 'data/modeling/training-data.joblib'},
          'ml-tabular': {'reviews-with-ml-target_json': 'Data/modeling/processed-movie-data-for-ml.json',
                         'test_joblib': 'data/modeling/testing-data.joblib',
                         'train_joblib': 'data/modeling/training-data.joblib'},
          'nn': {'test_dir': 'data/modeling/testing-data-tf/',
                 'train_dir': '

# Load the Data

We will load our **corpus** of Amazon Reviews for Miracle Noodle products.

In [5]:
fpath_reviews = FPATHS['data']['subset']['reviews-subset_selected-brand_csv']
fpath_reviews

'data/subset/amazon-reviews-subset-brand-Miracle Noodle.csv'

In [6]:
df = pd.read_csv(fpath_reviews)
df.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in t...",Four Stars,4.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...
1,B007JINB0W,A3D7EFSRC6Y9MP,The texture just made it a little strange to e...,Okay but don't like texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shira...,Go for the green noodles,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half fil...,Its an awesome substitute.,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...


In [7]:
df.isna().sum()

asin          0
reviewerID    0
reviewText    0
summary       0
overall       0
year          0
title         0
brand         0
category      0
dtype: int64

In [8]:
# Check for duplicate reviews (same reviewer and same review text)
df.duplicated(subset=['reviewerID','reviewText']).sum()

0

In [9]:
df.shape

(4363, 9)

### Combine All Review Text

- Each review is split into two parts: `reviewText`, which holds the bulk of the review, and `summary`, a one-line summary that often includes the rating itself (e.g., "Four stars - best vacuum").

In [10]:
df['review-text-full'] = df['summary'] + ": " + df['reviewText']
df.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in t...",Four Stars,4.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Four Stars: Great pasta taste and feel, but th..."
1,B007JINB0W,A3D7EFSRC6Y9MP,The texture just made it a little strange to e...,Okay but don't like texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Okay but don't like texture: The texture just ...
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shira...,Go for the green noodles,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Go for the green noodles: The herb flavor make...
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half fil...,Its an awesome substitute.,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Its an awesome substitute.: I didn't have a pr...
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Five Stars: They taste like whatever you cook ...


### Removing HTML and Links (Originally from Notebook 6B)

In [11]:
df['review-text-full_raw'] = df['review-text-full'].copy()
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in t...",Four Stars,4.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Four Stars: Great pasta taste and feel, but th...","Four Stars: Great pasta taste and feel, but th..."
1,B007JINB0W,A3D7EFSRC6Y9MP,The texture just made it a little strange to e...,Okay but don't like texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Okay but don't like texture: The texture just ...,Okay but don't like texture: The texture just ...
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shira...,Go for the green noodles,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Go for the green noodles: The herb flavor make...,Go for the green noodles: The herb flavor make...
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half fil...,Its an awesome substitute.,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Its an awesome substitute.: I didn't have a pr...,Its an awesome substitute.: I didn't have a pr...
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Five Stars: They taste like whatever you cook ...,Five Stars: They taste like whatever you cook ...
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A73IG1ED6S0JR,Product arrived with two of the bags punctured...,would not recomend,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,would not recomend: Product arrived with two o...,would not recomend: Product arrived with two o...
4359,B007JINB0W,A1XZ2H0MYG54M0,Ok.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Five Stars: Ok.,Five Stars: Ok.
4360,B007JINB0W,A3I2YF0MXB7P0B,I like these noodles but the spinach ones just...,"Not awful, but now I know why these were on sale.",2.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Not awful, but now I know why these were on sa...","Not awful, but now I know why these were on sa..."
4361,B007JINB0W,A2UELLFLITPMT1,Truly horrific. Like eating dead worms.,Don't even try it.,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Don't even try it.: Truly horrific. Like eatin...,Don't even try it.: Truly horrific. Like eatin...


In [12]:
# Checking for links
df.loc[df['review-text-full'].str.contains('http')]


Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
487,B007JINB0W,A162S75UMDTC,I first heard about these Shirataki noodles on...,surprisingly decent,4.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,surprisingly decent: I first heard about these...,surprisingly decent: I first heard about these...
804,B007JINB0W,A25ZES0OTED0S5,"This stuff is repugnant. I cooked the ""Fettuc...",Disgusting,1.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Disgusting: This stuff is repugnant. I cooked...,Disgusting: This stuff is repugnant. I cooked...
1500,B007JINB0W,A25Y0KLV7I19FA,"<div id=""video-block-R2QVYQA389CT7S"" class=""a-...",Family love it !!!,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Family love it !!!: <div id=""video-block-R2QVY...","Family love it !!!: <div id=""video-block-R2QVY..."
2770,B007JINB0W,A1VDTM4ITCSHQ8,We have eaten shirataki noodles for many years...,Great alternative to heavy strarches!,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Great alternative to heavy strarches!: We have...,Great alternative to heavy strarches!: We have...
2865,B007JINB0W,A3J6ABN4ZOG502,http://www.amazon.com/gp/product/B007JINB0W?re...,http: //www. amazon. com/gp/product/B007JINB0W?,5.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,http: //www. amazon. com/gp/product/B007JINB0W...,http: //www. amazon. com/gp/product/B007JINB0W...
3566,B007JINB0W,A2PIOAUQSBG074,I used to buy yam noodles in the local asian m...,Hard to chew....,2.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Hard to chew....: I used to buy yam noodles in...,Hard to chew....: I used to buy yam noodles in...


In [13]:
# Checking for raw HTML (any text containing '<')
df.loc[df['review-text-full_raw'].str.contains('<')]

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
914,B007JINB0W,A237SW9SPH1DAD,"Holly guacamole, I love these things! Follow i...","These make your plate ""full"" and plenty.",5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"These make your plate ""full"" and plenty.: Holl...","These make your plate ""full"" and plenty.: Holl..."
1124,B007JINB0W,A14A4YYKPLYY26,"When I decided to buy this&nbsp;<a data-hook=""...",Meh! Disappointing..............Tastes NOTHING...,2.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Meh! Disappointing..............Tastes NOTHING...,Meh! Disappointing..............Tastes NOTHING...
1142,B007JINB0W,A3UEE22RNGQ2L8,This product has seriously changed my LIFE. I ...,"ZERO CALORIES, ZERO CARBS and EXACTLY like spa...",5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"ZERO CALORIES, ZERO CARBS and EXACTLY like spa...","ZERO CALORIES, ZERO CARBS and EXACTLY like spa..."
1240,B007JINB0W,A14A4YYKPLYY26,"Earlier this year, I started a wheat-free and ...","I Can Have Noodles Again! Now, If Only There C...",4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"I Can Have Noodles Again! Now, If Only There C...","I Can Have Noodles Again! Now, If Only There C..."
1500,B007JINB0W,A25Y0KLV7I19FA,"<div id=""video-block-R2QVYQA389CT7S"" class=""a-...",Family love it !!!,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Family love it !!!: <div id=""video-block-R2QVY...","Family love it !!!: <div id=""video-block-R2QVY..."
1586,B007JINB0W,A14A4YYKPLYY26,"Earlier this year, I started a wheat-free and ...","I Can Have Noodles Again! Now, If Only There C...",5.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"I Can Have Noodles Again! Now, If Only There C...","I Can Have Noodles Again! Now, If Only There C..."
1632,B007JINB0W,AN79B2EUCG5O,bought the variety pack... the rice and the an...,Be prepaired to experement to find the best wa...,3.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Be prepaired to experement to find the best wa...,Be prepaired to experement to find the best wa...
2111,B007JINB0W,A2M9IS41H1HJAI,Quick update on 11/21/14\n\nJust started putti...,"Follow the directions, and these will be reall...",5.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Follow the directions, and these will be reall...","Follow the directions, and these will be reall..."
2203,B007JINB0W,AD4TI3BYQ6U7I,My daughter swears by this product. She's on ...,Noodles on a low carb diet? Yes !!!,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Noodles on a low carb diet? Yes !!!: My daugh...,Noodles on a low carb diet? Yes !!!: My daugh...
2239,B007JINB0W,A1FFJRP833Y1MH,We love all the Miracle noodles but the&nbsp;<...,Delicious!!,5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Delicious!!: We love all the Miracle noodles b...,Delicious!!: We love all the Miracle noodles b...


### Remove HTML Tags

In [14]:
import re

# Regular expression to match HTML tags
regex_html = r"<[^>]*>"

# Apply the regex to the DataFrame column using str.replace
df['review-text-full'] = df['review-text-full'].str.replace(regex_html, '', regex=True)
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in t...",Four Stars,4.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Four Stars: Great pasta taste and feel, but th...","Four Stars: Great pasta taste and feel, but th..."
1,B007JINB0W,A3D7EFSRC6Y9MP,The texture just made it a little strange to e...,Okay but don't like texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Okay but don't like texture: The texture just ...,Okay but don't like texture: The texture just ...
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shira...,Go for the green noodles,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Go for the green noodles: The herb flavor make...,Go for the green noodles: The herb flavor make...
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half fil...,Its an awesome substitute.,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Its an awesome substitute.: I didn't have a pr...,Its an awesome substitute.: I didn't have a pr...
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Five Stars: They taste like whatever you cook ...,Five Stars: They taste like whatever you cook ...
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A73IG1ED6S0JR,Product arrived with two of the bags punctured...,would not recomend,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,would not recomend: Product arrived with two o...,would not recomend: Product arrived with two o...
4359,B007JINB0W,A1XZ2H0MYG54M0,Ok.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Five Stars: Ok.,Five Stars: Ok.
4360,B007JINB0W,A3I2YF0MXB7P0B,I like these noodles but the spinach ones just...,"Not awful, but now I know why these were on sale.",2.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,"Not awful, but now I know why these were on sa...","Not awful, but now I know why these were on sa..."
4361,B007JINB0W,A2UELLFLITPMT1,Truly horrific. Like eating dead worms.,Don't even try it.,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodl...,Don't even try it.: Truly horrific. Like eatin...,Don't even try it.: Truly horrific. Like eatin...


In [15]:
# Compare original with cleaned
compare_cols = ['review-text-full_raw','review-text-full']

pd.set_option('display.max_colwidth',250)

In [16]:
df.loc[df['review-text-full_raw'].str.contains('<'), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
914,"These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ...","These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ..."
1124,"Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Rice/dp/B00BP36S7U/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Miracle Noodle Ri...","Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;Miracle Noodle Rice&nbsp;I did so, after having bought the&nbsp;Miracle Noodle Angel Hair Pasta&nbsp;and ABSOLUTELY LOVING it. Because I am on a&nbs..."
1142,"ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ...","ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ..."
1240,"I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" cla...","I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;Wheat Belly&nbsp;diet, and among the m..."
1500,"Family love it !!!: <div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url...",Family love it !!!: &nbsp;Love this stuff !!!! Guilt Free perfect if your in a weight loss journey like I am!!! Easy to cook !!!! Will order more
1586,"I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" cla...","I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;Wheat Belly&nbsp;diet, and among the m..."
1632,"Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel...","Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel..."
2111,"Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t...","Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t..."
2203,"Noodles on a low carb diet? Yes !!!: My daughter swears by this product. She's on a (very) low carb diet and there are not many noodle like products which she can eat and keep her carbs to a minimum. Yes, these noodles smell fishy upon opening...","Noodles on a low carb diet? Yes !!!: My daughter swears by this product. She's on a (very) low carb diet and there are not many noodle like products which she can eat and keep her carbs to a minimum. Yes, these noodles smell fishy upon opening..."
2239,"Delicious!!: We love all the Miracle noodles but the&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Shirataki-Zero-Carb-Gluten-Free-Pasta-Garlic-and-Herb-Fettuccini-7-Ounce/dp/B01N91YE5A/ref=cm_cr_arp_d_rvw_tx...","Delicious!!: We love all the Miracle noodles but the&nbsp;Miracle Noodle Shirataki Zero Carb Gluten Free Pasta, Garlic and Herb Fettuccini, 7 Ounce&nbsp;is especially great. Marry it to some canned clams and basil and herbs fettuccine sauce and ..."
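
The cleaned column above still shows `&nbsp;` entities, because the tag regex only removes `<...>` markup. One possible follow-up, which is not part of the original notebook, is to decode entities with the standard library's `html.unescape` and then turn the resulting non-breaking spaces into regular spaces. The helper name `strip_html` and the sample strings are hypothetical.

```python
import html
import re

# Same tag pattern as in the cell above, compiled once for reuse
REGEX_HTML = re.compile(r"<[^>]*>")


def strip_html(text: str) -> str:
    """Remove HTML tags, decode entities, and normalize non-breaking spaces."""
    no_tags = REGEX_HTML.sub("", text)
    decoded = html.unescape(no_tags)  # "&nbsp;" -> "\xa0", "&amp;" -> "&"
    return decoded.replace("\xa0", " ")


# Hypothetical samples modeled on the rows shown above
samples = [
    'buy this&nbsp;<a class="a-link-normal" href="/x">Miracle Noodle Rice</a>&nbsp;I did',
    "Bagels &amp; Baguettes",
]
for sample in samples:
    print(strip_html(sample))
# buy this Miracle Noodle Rice I did
# Bagels & Baguettes
```

On the DataFrame this could be applied as `df['review-text-full_raw'].apply(strip_html)`, assuming every value is a string (the null check earlier suggests so).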


### Replace Links with `[LINK]`

In [17]:
# Raw string avoids invalid-escape warnings for \. and \s; "/" needs no escaping
regex_url = r"https?://(?:www\.)?[^\s]+"
df.loc[df['review-text-full'].str.contains(regex_url), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
487,"surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit...","surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit..."
804,"Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-...","Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-..."
2770,"Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles....","Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles...."
2865,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37
3566,"Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=...","Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=..."


In [18]:
df['review-text-full'] = df['review-text-full'].str.replace(regex_url, '[LINK]', regex=True)
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in the packaged is SKRONG!",Four Stars,4.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!"
1,B007JINB0W,A3D7EFSRC6Y9MP,"The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.",Okay but don't like texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.","Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices."
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is to boil it will a cube of bo...,Its an awesome substitute.,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: They taste like whatever you cook them with.,Five Stars: They taste like whatever you cook them with.
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A73IG1ED6S0JR,Product arrived with two of the bags punctured. Also smells really really bad.,would not recomend,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,would not recomend: Product arrived with two of the bags punctured. Also smells really really bad.,would not recomend: Product arrived with two of the bags punctured. Also smells really really bad.
4359,B007JINB0W,A1XZ2H0MYG54M0,Ok.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: Ok.,Five Stars: Ok.
4360,B007JINB0W,A3I2YF0MXB7P0B,"I like these noodles but the spinach ones just taste odd. They have a bitter flavor compared to the other ones. I don't think it tastes like Spinach, it just tastes bitter and odd. Now I understand why these were on sale compared to the other one...","Not awful, but now I know why these were on sale.",2.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Not awful, but now I know why these were on sale.: I like these noodles but the spinach ones just taste odd. They have a bitter flavor compared to the other ones. I don't think it tastes like Spinach, it just tastes bitter and odd. Now I understa...","Not awful, but now I know why these were on sale.: I like these noodles but the spinach ones just taste odd. They have a bitter flavor compared to the other ones. I don't think it tastes like Spinach, it just tastes bitter and odd. Now I understa..."
4361,B007JINB0W,A2UELLFLITPMT1,Truly horrific. Like eating dead worms.,Don't even try it.,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Don't even try it.: Truly horrific. Like eating dead worms.,Don't even try it.: Truly horrific. Like eating dead worms.


In [19]:
df.loc[df['review-text-full'].str.contains('http'), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
2865,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?: [LINK]
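
The comparison above shows the full URL in the raw text replaced by `[LINK]`, while the space-broken URL in the summary portion is left untouched. The cleaning code itself is not part of this section, so the following is only a minimal sketch of how such a replacement could be done with regular expressions. The function name `clean_review_text` and both patterns are assumptions for illustration, not the notebook's actual implementation.

```python
import re

# Assumed patterns: well-formed http(s)/www URLs and simple HTML tags
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_review_text(text, url_token="[LINK]"):
    """Replace HTML tags with a space and URLs with a placeholder token."""
    text = TAG_PATTERN.sub(" ", text)
    return URL_PATTERN.sub(url_token, text)


raw_url = "see http://www.amazon.com/gp/product/B007JINB0W?redirect=true"
broken_url = "http: //www. amazon. com/gp/product/B007JINB0W?"

cleaned_url = clean_review_text(raw_url)
cleaned_broken = clean_review_text(broken_url)
print(cleaned_url)
print(cleaned_broken)
```

With these patterns a URL that has been split by spaces does not match, so a row like the one above would still be found by `str.contains('http')`.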


## Part 2) Spacy Preprocessing for EDA

**1) Data Preprocessing:**

- Load and inspect the dataset.
    - How many reviews?
    - What does the distribution of ratings look like?
    - Any null values?



- Use the rating column to create a new target column with two groups: high-rating and low-rating groups.
    - We recommend defining "high-rating" reviews as any review with a rating >= 9 and "low-rating" reviews as any review with a rating <= 4. Ratings strictly between 4 and 9 are excluded from the analysis.
    - You may use an alternative definition for High and Low reviews, but justify your choice in your notebook/README.



- Use NLTK and spaCy for basic text processing, including:

    - removing stopwords
    - tokenization
    - lemmatization
    - Tips:
        - Be sure to create a custom nlp object and disable the named entity recognizer. Otherwise, processing will take a very long time!
        - **You will want to create several versions of the data: tokenized, lemmatized, tokenized and joined back to one string per review, and lemmatized and joined back to one string per review.** These versions will be useful for different analysis and modeling techniques.

    

- Save your processed DataFrame as a **joblib** file in the "Data-NLP/" folder for future modeling.
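
The high/low rating split described in the list above can be sketched as follows. The recommended cutoffs (>= 9 and <= 4) appear to assume a 10-point scale, whereas the `overall` column shown in this notebook runs from 1.0 to 5.0, so the cutoffs are left as parameters. The function name `make_rating_target` and the 5-star cutoffs in the demo are hypothetical choices, not values taken from the assignment.

```python
import numpy as np
import pandas as pd


def make_rating_target(ratings, low_max, high_min):
    """Label ratings <= low_max as 'low', >= high_min as 'high', else missing."""
    labels = pd.Series(np.nan, index=ratings.index, dtype="object")
    labels.loc[ratings <= low_max] = "low"
    labels.loc[ratings >= high_min] = "high"
    return labels


demo = pd.DataFrame({"overall": [5.0, 4.0, 3.0, 2.0, 1.0]})

# Hypothetical cutoffs for a 5-star scale (assumption, adjust and justify as needed)
demo["target"] = make_rating_target(demo["overall"], low_max=2, high_min=5)

# Middle ratings have no label and are dropped from the analysis
kept = demo.dropna(subset=["target"])
print(kept)
```

Keeping the unlabeled rows as missing values, rather than dropping them inside the function, leaves the decision about excluding middle ratings visible in the notebook.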

    

In [22]:
# import spacy
# # Disable parser and ner
# nlp_light = spacy.load("en_core_web_sm", disable=['parser','ner'])
# # Print active components
# nlp_light.pipe_names

In [23]:
import spacy
# Custom NLP object with the named entity recognizer disabled
nlp_custom = ds.nlp.make_custom_nlp(
    disable=["ner"],  # 'parser' is left enabled
    contractions=[],
    stopwords_to_add=["★"],
)
nlp_custom

<spacy.lang.en.English at 0x2afe8bc70>

> Changed review_text column to remove HTML and URLs as of 01/22/24

In [24]:
%%time
print("- Running full spacy preprocessing code (this will take several minutes).")
df = df.copy()
df["tokens-dirty"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=False,
    remove_punct=True,
    use_lemmas=False,
    nlp=nlp_custom,
)
df["tokens"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=True,
    remove_punct=True,
    use_lemmas=False,
    nlp=nlp_custom,
)
df["lemmas"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=True,
    remove_punct=True,
    use_lemmas=True,
    nlp=nlp_custom,
)

## Make string versions of processed text
df["tokens-dirty-joined"] = df["tokens-dirty"].map(lambda x: " ".join(x))
df["tokens-joined"] = df["tokens"].map(lambda x: " ".join(x))
df["lemmas-joined"] = df["lemmas"].map(lambda x: " ".join(x))

df.head()

- Running full spacy preprocessing code (this will take several minutes).


4363it [00:45, 94.99it/s]  
4363it [00:43, 100.92it/s] 
4363it [00:41, 104.01it/s] 

CPU times: user 9.67 s, sys: 1.27 s, total: 10.9 s
Wall time: 2min 11s





Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in the packaged is SKRONG!",Four Stars,4.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","[four, stars, great, pasta, taste, and, feel, but, the, spell, in, the, packaged, is, skrong]","[stars, great, pasta, taste, feel, spell, packaged, skrong]","[star, great, pasta, taste, feel, spell, package, skrong]",four stars great pasta taste and feel but the spell in the packaged is skrong,stars great pasta taste feel spell packaged skrong,star great pasta taste feel spell package skrong
1,B007JINB0W,A3D7EFSRC6Y9MP,"The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.",Okay but don't like texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.","Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.","[okay, but, do, n't, like, texture, the, texture, just, made, it, a, little, strange, to, eat, otherwise, the, flavor, is, okay, very, bland, so, add, spices]","[okay, like, texture, texture, little, strange, eat, flavor, okay, bland, add, spices]","[okay, like, texture, texture, little, strange, eat, flavor, okay, bland, add, spice]",okay but do n't like texture the texture just made it a little strange to eat otherwise the flavor is okay very bland so add spices,okay like texture texture little strange eat flavor okay bland add spices,okay like texture texture little strange eat flavor okay bland add spice
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.,"[go, for, the, green, noodles, the, herb, flavor, makes, the, odd, texture, of, shirataki, much, more, palatable]","[green, noodles, herb, flavor, makes, odd, texture, shirataki, palatable]","[green, noodle, herb, flavor, make, odd, texture, shirataki, palatable]",go for the green noodles the herb flavor makes the odd texture of shirataki much more palatable,green noodles herb flavor makes odd texture shirataki palatable,green noodle herb flavor make odd texture shirataki palatable
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is to boil it will a cube of bo...,Its an awesome substitute.,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...,"[its, an, awesome, substitute, i, did, n't, have, a, problem, at, all, with, a, half, filled, bag, or, anything, that, other, users, said, i, was, concerned, at, first, but, i, took, a, chance, and, there, were, no, problems, i, find, the, best, ...","[awesome, substitute, problem, half, filled, bag, users, said, concerned, took, chance, problems, find, best, way, rid, initial, smell, boil, cube, bouillon, texture, good, notice, replace, noodle, eat, nt, mind]","[awesome, substitute, problem, half, fill, bag, user, say, concern, take, chance, problem, find, good, way, rid, initial, smell, boil, cube, bouillon, texture, good, notice, replace, noodle, eat, not, mind]",its an awesome substitute i did n't have a problem at all with a half filled bag or anything that other users said i was concerned at first but i took a chance and there were no problems i find the best way to get rid of the initial smell is to b...,awesome substitute problem half filled bag users said concerned took chance problems 
find best way rid initial smell boil cube bouillon texture good notice replace noodle eat nt mind,awesome substitute problem half fill bag user say concern take chance problem find good way rid initial smell boil cube bouillon texture good notice replace noodle eat not mind
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: They taste like whatever you cook them with.,Five Stars: They taste like whatever you cook them with.,"[five, stars, they, taste, like, whatever, you, cook, them, with]","[stars, taste, like, cook]","[star, taste, like, cook]",five stars they taste like whatever you cook them with,stars taste like cook,star taste like cook
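
The internals of `ds.nlp.batch_preprocess_texts` are not shown in this notebook. As a rough sketch of the same idea, the loop below streams texts through spaCy's `nlp.pipe` and filters tokens by their `is_punct` and `is_stop` attributes. The function name `preprocess_texts` is hypothetical, and `spacy.blank("en")` is used only so the sketch runs without a model download; a blank pipeline has no lemmatizer, so `use_lemmas=True` would need a trained pipeline such as `en_core_web_sm`.

```python
import spacy


def preprocess_texts(texts, nlp, remove_stopwords=True, remove_punct=True,
                     use_lemmas=False):
    """Return one list of lowercase tokens (or lemmas) per input text."""
    processed = []
    # nlp.pipe processes the texts in batches instead of one call per text
    for doc in nlp.pipe(texts, batch_size=50):
        tokens = []
        for token in doc:
            if token.is_space:
                continue
            if remove_punct and token.is_punct:
                continue
            if remove_stopwords and token.is_stop:
                continue
            text = token.lemma_ if use_lemmas else token.text
            tokens.append(text.lower())
        processed.append(tokens)
    return processed


# Tokenizer-only pipeline (assumption made so the example needs no model)
nlp_blank = spacy.blank("en")
sample = ["Five Stars: They taste like whatever you cook them with."]

tokens_dirty = preprocess_texts(sample, nlp_blank, remove_stopwords=False)
tokens_clean = preprocess_texts(sample, nlp_blank, remove_stopwords=True)

# One string per review, as in the "-joined" columns above
tokens_joined = [" ".join(tokens) for tokens in tokens_clean]
print(tokens_dirty[0])
print(tokens_joined[0])
```

Running the text through the pipeline once and deriving all three token versions from the same `Doc` objects would avoid the three separate passes timed above, at the cost of a slightly more involved helper.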


## Save Preprocessed Reviews

### Saving a JSON file

In [25]:
# df = df.set_index("review_id")#, errors='ignore')
# df

In [26]:
# fpath_json = "Data-NLP/processed-nlp-data.json"
fpath_json = FPATHS['data']['processed-nlp']['processed-reviews-spacy_json']
fpath_json

'data/processed/processed-reviews.json'

In [27]:
df.head(2).to_json(orient='index')

'{"0":{"asin":"B007JINB0W","reviewerID":"A3Y51NV9HU5T2","reviewText":"Great pasta taste and feel, but the spell in the packaged is SKRONG!","summary":"Four Stars","overall":4.0,"year":2018,"title":"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)","brand":"Miracle Noodle","category":"Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki","review-text-full":"Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","review-text-full_raw":"Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","tokens-dirty":["four","stars","great","pasta","taste","and","feel","but","the","spell","in","the","packaged","is","skrong"],"tokens":["stars","great","pasta","taste","feel","spell","packaged","skrong"],"lemmas":["star","great","pasta","taste","feel","spell","package","skrong"],"tokens-dirty-joined":"four stars great pasta taste and feel but the spell in the packaged is skrong","tokens-joined":"s

In [28]:
# Save to json
df.to_json(fpath_json)

In [29]:
temp_df = pd.read_json(fpath_json)#.reset_index(drop=False)
temp_df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in the packaged is SKRONG!",Four Stars,4,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","[four, stars, great, pasta, taste, and, feel, but, the, spell, in, the, packaged, is, skrong]","[stars, great, pasta, taste, feel, spell, packaged, skrong]","[star, great, pasta, taste, feel, spell, package, skrong]",four stars great pasta taste and feel but the spell in the packaged is skrong,stars great pasta taste feel spell packaged skrong,star great pasta taste feel spell package skrong
1,B007JINB0W,A3D7EFSRC6Y9MP,"The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.",Okay but don't like texture,3,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.","Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.","[okay, but, do, n't, like, texture, the, texture, just, made, it, a, little, strange, to, eat, otherwise, the, flavor, is, okay, very, bland, so, add, spices]","[okay, like, texture, texture, little, strange, eat, flavor, okay, bland, add, spices]","[okay, like, texture, texture, little, strange, eat, flavor, okay, bland, add, spice]",okay but do n't like texture the texture just made it a little strange to eat otherwise the flavor is okay very bland so add spices,okay like texture texture little strange eat flavor okay bland add spices,okay like texture texture little strange eat flavor okay bland add spice
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles,5,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.,"[go, for, the, green, noodles, the, herb, flavor, makes, the, odd, texture, of, shirataki, much, more, palatable]","[green, noodles, herb, flavor, makes, odd, texture, shirataki, palatable]","[green, noodle, herb, flavor, make, odd, texture, shirataki, palatable]",go for the green noodles the herb flavor makes the odd texture of shirataki much more palatable,green noodles herb flavor makes odd texture shirataki palatable,green noodle herb flavor make odd texture shirataki palatable
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is to boil it will a cube of bo...,Its an awesome substitute.,5,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...,"[its, an, awesome, substitute, i, did, n't, have, a, problem, at, all, with, a, half, filled, bag, or, anything, that, other, users, said, i, was, concerned, at, first, but, i, took, a, chance, and, there, were, no, problems, i, find, the, best, ...","[awesome, substitute, problem, half, filled, bag, users, said, concerned, took, chance, problems, find, best, way, rid, initial, smell, boil, cube, bouillon, texture, good, notice, replace, noodle, eat, nt, mind]","[awesome, substitute, problem, half, fill, bag, user, say, concern, take, chance, problem, find, good, way, rid, initial, smell, boil, cube, bouillon, texture, good, notice, replace, noodle, eat, not, mind]",its an awesome substitute i did n't have a problem at all with a half filled bag or anything that other users said i was concerned at first but i took a chance and there were no problems i find the best way to get rid of the initial smell is to b...,awesome substitute problem half filled bag users said concerned took chance problems 
find best way rid initial smell boil cube bouillon texture good notice replace noodle eat nt mind,awesome substitute problem half fill bag user say concern take chance problem find good way rid initial smell boil cube bouillon texture good notice replace noodle eat not mind
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: They taste like whatever you cook them with.,Five Stars: They taste like whatever you cook them with.,"[five, stars, they, taste, like, whatever, you, cook, them, with]","[stars, taste, like, cook]","[star, taste, like, cook]",five stars they taste like whatever you cook them with,stars taste like cook,star taste like cook
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A73IG1ED6S0JR,Product arrived with two of the bags punctured. Also smells really really bad.,would not recomend,1,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,would not recomend: Product arrived with two of the bags punctured. Also smells really really bad.,would not recomend: Product arrived with two of the bags punctured. Also smells really really bad.,"[would, not, recomend, product, arrived, with, two, of, the, bags, punctured, also, smells, really, really, bad]","[recomend, product, arrived, bags, punctured, smells, bad]","[recomend, product, arrive, bag, puncture, smell, bad]",would not recomend product arrived with two of the bags punctured also smells really really bad,recomend product arrived bags punctured smells bad,recomend product arrive bag puncture smell bad
4359,B007JINB0W,A1XZ2H0MYG54M0,Ok.,Five Stars,5,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: Ok.,Five Stars: Ok.,"[five, stars, ok]","[stars, ok]","[star, ok]",five stars ok,stars ok,star ok
4360,B007JINB0W,A3I2YF0MXB7P0B,"I like these noodles but the spinach ones just taste odd. They have a bitter flavor compared to the other ones. I don't think it tastes like Spinach, it just tastes bitter and odd. Now I understand why these were on sale compared to the other one...","Not awful, but now I know why these were on sale.",2,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Not awful, but now I know why these were on sale.: I like these noodles but the spinach ones just taste odd. They have a bitter flavor compared to the other ones. I don't think it tastes like Spinach, it just tastes bitter and odd. Now I understa...","Not awful, but now I know why these were on sale.: I like these noodles but the spinach ones just taste odd. They have a bitter flavor compared to the other ones. I don't think it tastes like Spinach, it just tastes bitter and odd. Now I understa...","[not, awful, but, now, i, know, why, these, were, on, sale, i, like, these, noodles, but, the, spinach, ones, just, taste, odd, they, have, a, bitter, flavor, compared, to, the, other, ones, i, do, n't, think, it, tastes, like, spinach, it, just,...","[awful, know, sale, like, noodles, spinach, ones, taste, odd, bitter, flavor, compared, ones, think, tastes, like, spinach, tastes, bitter, odd, understand, sale, compared, ones, drawer, fridge, know]","[awful, know, sale, like, noodle, spinach, one, taste, odd, bitter, flavor, compare, one, think, taste, like, spinach, taste, bitter, odd, understand, sale, compare, one, drawer, fridge, know]",not awful but now i know why these were on sale i like these noodles but the spinach ones just taste odd they have a bitter flavor compared to the other ones i do n't think it tastes like spinach it just tastes bitter and odd now i understand why...,awful know sale like noodles spinach ones taste odd bitter flavor compared ones 
think tastes like spinach tastes bitter odd understand sale compared ones drawer fridge know,awful know sale like noodle spinach one taste odd bitter flavor compare one think taste like spinach taste bitter odd understand sale compare one drawer fridge know
4361,B007JINB0W,A2UELLFLITPMT1,Truly horrific. Like eating dead worms.,Don't even try it.,1,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Don't even try it.: Truly horrific. Like eating dead worms.,Don't even try it.: Truly horrific. Like eating dead worms.,"[do, n't, even, try, it, truly, horrific, like, eating, dead, worms]","[try, truly, horrific, like, eating, dead, worms]","[try, truly, horrific, like, eat, dead, worm]",do n't even try it truly horrific like eating dead worms,try truly horrific like eating dead worms,try truly horrific like eat dead worm


In [30]:
type(temp_df.loc[0, 'tokens'])

list
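
The type check above confirms that the token lists come back from JSON as Python lists. A minimal in-memory sketch of the same round trip on a toy frame is shown below; the column values are borrowed from the first two reviews, and the use of `io.StringIO` instead of a file path is an assumption made to keep the example self-contained.

```python
import io

import pandas as pd

toy = pd.DataFrame({
    "overall": [4.0, 3.0],
    "tokens": [["stars", "great", "pasta"], ["okay", "like", "texture"]],
})

# Lists are written as JSON arrays and read back as Python lists
json_text = toy.to_json()
restored = pd.read_json(io.StringIO(json_text))

print(type(restored.loc[0, "tokens"]))
print(restored.dtypes)
```

Numeric dtypes are inferred again on load, which is likely why `overall` is displayed as 4 rather than 4.0 in the reloaded frame above. The joblib file saved in the next section pickles the DataFrame object itself, so no such re-inference takes place there.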

### Saving a Joblib file

In [31]:
import joblib
fpath_joblib = FPATHS['data']['processed-nlp']['processed-reviews-spacy_joblib']
fpath_joblib

'data/processed/processed-reviews.joblib'

In [32]:
# Dump to the selected fpath
joblib.dump(df, fpath_joblib)

['data/processed/processed-reviews.joblib']

In [33]:
# Confirm the file was saved properly by loading it back
loaded = joblib.load(fpath_joblib)
loaded.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A3Y51NV9HU5T2,"Great pasta taste and feel, but the spell in the packaged is SKRONG!",Four Stars,4.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","Four Stars: Great pasta taste and feel, but the spell in the packaged is SKRONG!","[four, stars, great, pasta, taste, and, feel, but, the, spell, in, the, packaged, is, skrong]","[stars, great, pasta, taste, feel, spell, packaged, skrong]","[star, great, pasta, taste, feel, spell, package, skrong]",four stars great pasta taste and feel but the spell in the packaged is skrong,stars great pasta taste feel spell packaged skrong,star great pasta taste feel spell package skrong
1,B007JINB0W,A3D7EFSRC6Y9MP,"The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.",Okay but don't like texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.","Okay but don't like texture: The texture just made it a little strange to eat. Otherwise the flavor is okay, very bland so add spices.","[okay, but, do, n't, like, texture, the, texture, just, made, it, a, little, strange, to, eat, otherwise, the, flavor, is, okay, very, bland, so, add, spices]","[okay, like, texture, texture, little, strange, eat, flavor, okay, bland, add, spices]","[okay, like, texture, texture, little, strange, eat, flavor, okay, bland, add, spice]",okay but do n't like texture the texture just made it a little strange to eat otherwise the flavor is okay very bland so add spices,okay like texture texture little strange eat flavor okay bland add spices,okay like texture texture little strange eat flavor okay bland add spice
2,B007JINB0W,A4AM5KBP3I2R,The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.,Go for the green noodles: The herb flavor makes the odd texture of shirataki much more palatable.,"[go, for, the, green, noodles, the, herb, flavor, makes, the, odd, texture, of, shirataki, much, more, palatable]","[green, noodles, herb, flavor, makes, odd, texture, shirataki, palatable]","[green, noodle, herb, flavor, make, odd, texture, shirataki, palatable]",go for the green noodles the herb flavor makes the odd texture of shirataki much more palatable,green noodles herb flavor makes odd texture shirataki palatable,green noodle herb flavor make odd texture shirataki palatable
3,B007JINB0W,A3GHK4IL78DB7Y,I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is to boil it will a cube of bo...,Its an awesome substitute.,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...,Its an awesome substitute.: I didn't have a problem at all with a half filled bag or anything that other users said. I was concerned at first but I took a chance and there were no problems.\nI find the best way to get rid of the initial smell is ...,"[its, an, awesome, substitute, i, did, n't, have, a, problem, at, all, with, a, half, filled, bag, or, anything, that, other, users, said, i, was, concerned, at, first, but, i, took, a, chance, and, there, were, no, problems, i, find, the, best, ...","[awesome, substitute, problem, half, filled, bag, users, said, concerned, took, chance, problems, find, best, way, rid, initial, smell, boil, cube, bouillon, texture, good, notice, replace, noodle, eat, nt, mind]","[awesome, substitute, problem, half, fill, bag, user, say, concern, take, chance, problem, find, good, way, rid, initial, smell, boil, cube, bouillon, texture, good, notice, replace, noodle, eat, not, mind]",its an awesome substitute i did n't have a problem at all with a half filled bag or anything that other users said i was concerned at first but i took a chance and there were no problems i find the best way to get rid of the initial smell is to b...,awesome substitute problem half filled bag users said concerned took chance problems 
find best way rid initial smell boil cube bouillon texture good notice replace noodle eat nt mind,awesome substitute problem half fill bag user say concern take chance problem find good way rid initial smell boil cube bouillon texture good notice replace noodle eat not mind
4,B007JINB0W,AH3B94LQOPPY6,They taste like whatever you cook them with.,Five Stars,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: They taste like whatever you cook them with.,Five Stars: They taste like whatever you cook them with.,"[five, stars, they, taste, like, whatever, you, cook, them, with]","[stars, taste, like, cook]","[star, taste, like, cook]",five stars they taste like whatever you cook them with,stars taste like cook,star taste like cook
