<a href="https://colab.research.google.com/github/kaindoh/sentiment-Analysis-for-Amazon-baby-shop/blob/main/amazon_baby.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [121]:
import numpy as np
import pandas as pd

In [122]:
df = pd.read_csv("amazon_baby.csv")
df.head()

Unnamed: 0,name,review,rating
0,Planetwise Flannel Wipes,"These flannel wipes are OK, but in my opinion ...",3
1,Planetwise Wipe Pouch,it came early and was not disappointed. i love...,5
2,Annas Dream Full Quilt with 2 Shams,Very soft and comfortable and warmer than it l...,5
3,Stop Pacifier Sucking without tears with Thumb...,This is a product well worth the purchase. I ...,5
4,Stop Pacifier Sucking without tears with Thumb...,All of my kids have cried non-stop when I trie...,5


In [123]:
# shape is (rows, columns); the first entry is what len(df) would return
df.shape


(183531, 3)

In [124]:
df['name'].value_counts()

Vulli Sophie the Giraffe Teether                                            785
Simple Wishes Hands-Free Breastpump Bra, Pink, XS-L                         562
Infant Optics DXR-5 2.4 GHz Digital Video Baby Monitor with Night Vision    561
Baby Einstein Take Along Tunes                                              547
Cloud b Twilight Constellation Night Light, Turtle                          520
                                                                           ... 
Bebek Silicone Breast Shields, 2 Pack                                         1
Zakeez ZGR Zaky Therapeutic Positioning Pillow- Gray Right                    1
Earth's Best Chlorine Free Diapers Size 4 -- 30 Diapers                       1
MAM First Tooth Brush, Green, 6 Plus Months                                   1
Clek Standard Booster Seat, Julius Red                                        1
Name: name, Length: 32417, dtype: int64

In [125]:
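# Same counts as df['name'].value_counts() above, computed per group;
# the resulting index repeats the product name at both levels.
df.groupby('name')['name'].value_counts().sort_values(ascending=False)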

name                                                                      name                                                                    
Vulli Sophie the Giraffe Teether                                          Vulli Sophie the Giraffe Teether                                            785
Simple Wishes Hands-Free Breastpump Bra, Pink, XS-L                       Simple Wishes Hands-Free Breastpump Bra, Pink, XS-L                         562
Infant Optics DXR-5 2.4 GHz Digital Video Baby Monitor with Night Vision  Infant Optics DXR-5 2.4 GHz Digital Video Baby Monitor with Night Vision    561
Baby Einstein Take Along Tunes                                            Baby Einstein Take Along Tunes                                              547
Cloud b Twilight Constellation Night Light, Turtle                        Cloud b Twilight Constellation Night Light, Turtle                          520
                                                                                   

In [126]:
# Keep only products that have more than 20 reviews
df = df[df.groupby('name')['name'].transform('size') > 20]
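
To see what the `transform('size')` filter does, here is a minimal sketch on a toy frame. The data and the `min_reviews` threshold are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy data (hypothetical): product "A" has 3 reviews, "B" has 1, "C" has 2.
toy = pd.DataFrame({
    "name": ["A", "A", "A", "B", "C", "C"],
    "rating": [5, 4, 3, 1, 2, 5],
})

# transform('size') returns one value per row: the size of that row's group.
group_sizes = toy.groupby("name")["name"].transform("size")

# Keep only rows whose product has more than min_reviews reviews.
min_reviews = 2  # illustrative threshold; the notebook uses 20
filtered = toy[group_sizes > min_reviews]
print(filtered)
```

Because `transform` keeps the original row index, the result can be used directly as a boolean mask without merging the counts back in.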

In [127]:
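# Work on the first 1000 rows; .copy() avoids SettingWithCopyWarning
# when new columns are assigned later
df = df.head(1000).copy()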

In [128]:
len(df)

1000

In [129]:
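# Fill missing reviews first: astype(str) alone would turn NaN into the literal string "nan"
df['review'] = df['review'].fillna('').astype(str)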

In [130]:
df['review'].isnull().sum()

0

## Data Cleaning


In [131]:
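# Remove HTML remnants and entity fragments
def clean(txt):
    # The patterns are regular expressions; pandas >= 2.0 treats them as
    # literal strings unless regex=True is passed.
    txt = txt.str.replace(r"<br/>", "", regex=True)
    txt = txt.str.replace(r"<a.*>.*</a>", "", regex=True)
    txt = txt.str.replace(r"&amp", "", regex=True)
    txt = txt.str.replace(r"&gt", "", regex=True)
    txt = txt.str.replace(r"&lt", "", regex=True)
    txt = txt.str.replace("\xa0", " ", regex=False)  # non-breaking space
    return txt
df['review'] = clean(df['review'])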
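
As a rough illustration of what these substitutions do to one review, the sketch below applies equivalent patterns with the standard `re` module. The sample string and the `clean_text` helper are hypothetical. The anchor pattern here is written non-greedy (`.*?`), which is a deliberate variation on the notebook's greedy `.*`.

```python
import re

# Hypothetical review text containing the artifacts targeted by clean().
sample = 'Great cup!<br/>See <a href="http://example.com">this</a> &amp more\xa0here'

def clean_text(text):
    """Apply substitutions equivalent to clean(), one string at a time."""
    text = re.sub(r"<br/>", "", text)
    # Non-greedy .*? stops at the first closing tag; a greedy .* would
    # span from the first <a to the last </a> in the review.
    text = re.sub(r"<a.*?>.*?</a>", "", text)
    text = re.sub(r"&amp|&gt|&lt", "", text)
    return text.replace("\xa0", " ")

cleaned = clean_text(sample)
print(cleaned)
```

Note that removing `<br/>` without inserting a space can fuse neighbouring words ("cup!See" here), which may explain some of the run-together tokens that show up later among the rare words.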

In [132]:
df['review'].head()

153    We bought these for our son when he turned two...
154    My son loves stacking cups, so a friend recomm...
155    My son Cameron just loves these great little s...
156    My one year old son received these as a birthd...
157    I purchased this toy for my great grandson's f...
Name: review, dtype: object

In [133]:
# Lowercase every token and collapse repeated whitespace
df['review1'] = df['review'].apply(lambda x: " ".join(word.lower() for word in x.split()))
df['review1'].head()

153    we bought these for our son when he turned two...
154    my son loves stacking cups, so a friend recomm...
155    my son cameron just loves these great little s...
156    my one year old son received these as a birthd...
157    i purchased this toy for my great grandson's f...
Name: review1, dtype: object

In [134]:
# Remove punctuation (anything that is not a word character or whitespace)
df['review1'] = df['review1'].str.replace(r'[^\w\s]', '', regex=True)
df['review1'].head()

153    we bought these for our son when he turned two...
154    my son loves stacking cups so a friend recomme...
155    my son cameron just loves these great little s...
156    my one year old son received these as a birthd...
157    i purchased this toy for my great grandsons fi...
Name: review1, dtype: object

In [135]:
# Remove stopwords
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))  # set lookup is faster than scanning a list
df['review1'] = df['review1'].apply(lambda x: ' '.join(word for word in x.split() if word not in stop))
df['review1'].head()

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


153    bought son turned two seen playmates home love...
154    son loves stacking cups friend recommended toy...
155    son cameron loves great little stacking cars e...
156    one year old son received birthday gift loves ...
157    purchased toy great grandsons first christmas ...
Name: review1, dtype: object
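
The stopword step can be sketched on a single string as follows. The `STOPWORDS` set is a tiny hypothetical subset chosen for this example, and `remove_stopwords` is an illustrative helper; NLTK's English list is much longer.

```python
# A tiny illustrative stopword set (hypothetical subset, not NLTK's full list).
STOPWORDS = {"we", "these", "for", "our", "when", "he", "my", "so", "a"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop every whitespace-separated token that is in the stopword set."""
    return " ".join(word for word in text.split() if word not in stopwords)

result = remove_stopwords("we bought these for our son when he turned two")
print(result)  # bought son turned two
```

The text must already be lowercased for this to work, since the comparison is case-sensitive.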

In [136]:
# Remove rare words: collect the tokens that occur exactly once in the corpus
freq = pd.Series(' '.join(df['review1']).split()).value_counts()
print(freq)
less_freq = list(freq[freq == 1].index)
less_freq

diaper         834
tub            570
one            553
use            508
baby           396
              ... 
neighbors        1
hogan            1
prevents         1
togther          1
orientation      1
Length: 5921, dtype: int64


['thenthe',
 'dozens',
 'convience',
 'before4',
 'dipaers',
 'swinging',
 'consistency',
 'contacts',
 'reign',
 '18monthold',
 'waiting',
 'entering',
 'happiest',
 'stunned',
 'ziplock',
 'ithonestly',
 'usb',
 '3648',
 'results',
 'sanitary',
 'neighber',
 'unlocked',
 'blessing',
 'nowbottom',
 'peeing',
 'opposite',
 'bicycle',
 'territory',
 'bumpkiss',
 'disturbing',
 'horribly',
 'babywell',
 'securing',
 'pottyshe',
 'balancing',
 'transportable',
 'obnoxiously',
 'rides',
 'flipup',
 'worrying',
 'steel',
 'pinehurst',
 'non',
 'disinfectants',
 '36',
 'posting',
 'disinfected',
 'stayathome',
 'disposingalways',
 'ratings',
 'stability',
 'moneys',
 'wks',
 'rub',
 'smellexcuse',
 '11month',
 'compromised',
 'increase',
 'annoyingly',
 'leakinghope',
 'reproductions',
 'afford',
 'himby',
 'hotels',
 'beepquot',
 'affected',
 'moldy',
 'users',
 'allinone',
 'convienence',
 'surround',
 'nj',
 'boos',
 '4month',
 'combat',
 'positivethe',
 'exceptionit',
 'toward',
 'boombo

In [137]:
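# Convert to a set so each membership test is O(1) instead of scanning the list
rare_words = set(less_freq)
df['review1'] = df['review1'].apply(lambda x: ' '.join(word for word in x.split() if word not in rare_words))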
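
The rare-word step above can also be sketched without pandas, using `collections.Counter`. The `reviews` list is hypothetical toy data.

```python
from collections import Counter

# Hypothetical, already-cleaned reviews.
reviews = ["diaper tub great", "diaper leak", "tub great togther"]

# Count every token across the whole corpus.
counts = Counter(word for review in reviews for word in review.split())

# Tokens seen exactly once; a set gives constant-time membership tests.
rare = {word for word, n in counts.items() if n == 1}

# Rebuild each review without its rare tokens.
filtered = [" ".join(w for w in review.split() if w not in rare) for review in reviews]
print(filtered)
```

Dropping singletons removes most typos and fused tokens, but it also discards legitimate words that happen to appear once in a 1000-review sample.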


In [138]:
len(df)

1000

In [139]:
# Spelling correction
# Note: correct() can also alter valid words; in the output below "stacking" becomes "sticking".
from textblob import TextBlob, Word
df['review1'] = df['review1'].apply(lambda x: str(TextBlob(x).correct()))
df['review1'].head()

153    bought son turned two seen home loved liked bc...
154    son loves sticking cups friend recommended toy...
155    son loves great little sticking cars enjoys st...
156    one year old son received birthday gift loves ...
157    purchased toy great grandson first christmas 6...
Name: review1, dtype: object

In [140]:
# Stem each token with the Porter stemmer (e.g. "received" becomes "receiv" in the later output)
from nltk.stem import PorterStemmer
st = PorterStemmer()
df['review1'] = df['review1'].apply(lambda x: " ".join([st.stem(word) for word in x.split()]))

In [141]:
# Lemmatize the stemmed tokens with TextBlob's Word, which needs the WordNet corpus
import nltk
nltk.download('wordnet')
df['review1'] = df['review1'].apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))
df['review1'].head()

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


153    bought son turn two seen home love like bc unl...
154    son love stick cup friend recommend toy son lo...
155    son love great littl stick car enjoy stick gre...
156    one year old son receiv birthday gift love sta...
157    purchas toy great grandson first christma 6 mo...
Name: review1, dtype: object

In [142]:
# Remove any punctuation that remains after the earlier steps
df['review1'] = df['review1'].str.replace(r'[^\w\s]', '', regex=True)

In [143]:
df.head()

Unnamed: 0,name,review,rating,review1
153,Fisher Price Nesting Action Vehicles,We bought these for our son when he turned two...,5,bought son turn two seen home love like bc unl...
154,Fisher Price Nesting Action Vehicles,"My son loves stacking cups, so a friend recomm...",5,son love stick cup friend recommend toy son lo...
155,Fisher Price Nesting Action Vehicles,My son Cameron just loves these great little s...,5,son love great littl stick car enjoy stick gre...
156,Fisher Price Nesting Action Vehicles,My one year old son received these as a birthd...,5,one year old son receiv birthday gift love sta...
157,Fisher Price Nesting Action Vehicles,I purchased this toy for my great grandson's f...,5,purchas toy great grandson first christma 6 mo...
