# Amazon Speaker Reviews – Exploratory Data Analysis (EDA) with Pandas

## Step 1: Import Required Libraries

In [None]:
import pandas as pd
from google.colab import files

## Step 2: Upload the Dataset

In [None]:
uploaded = files.upload()

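# Load the uploaded CSV into a DataFrame
df = pd.read_csv('Cleaned_Speaker_Reviews_With_NumPy.csv')

# Normalize column names: trim whitespace, lowercase, and replace spaces with underscores
df.columns = [str(col).strip().lower().replace(' ', '_') for col in df.columns]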

df.head()


Saving Cleaned_Speaker_Reviews_With_NumPy.csv to Cleaned_Speaker_Reviews_With_NumPy (1).csv


,unnamed:_0,review_id,product_id,title,author,rating,content,timestamp,profile_id,is_verified,helpful_count,product_attributes,review_length,normalized_rating
0,0,RUE030N50F9EJ,B09PYVXXW5,5.0 out of 5 stars Really good with a couple o...,CTM,5.0,I love TWS earbuds. I have many including buds...,"Reviewed in the United States May 14, 2022",AEGYSY5H3ZUJC4SGGPRM3Z2OE5PA,1,32,Color: Black,410,1.0
1,1,R385JSD6KWP2QU,B09PYVXXW5,4.0 out of 5 stars I wish I could rate 5 stars...,Gianna,4.0,…I just can’t. Because as useful as these earb...,"Reviewed in the United States July 22, 2023",AHINA7A6O2I5RZSNAY4OWYN4QXVA,1,31,Color: Red,657,0.75
2,2,R1UB1V4EPP9MN3,B09PYVXXW5,"5.0 out of 5 stars Basically perfect, fantasti...",Colin M.,5.0,I needed a replacement for my Galaxy buds pro'...,"Reviewed in the United States August 18, 2022",AFG2T5XGMQCACK7JBDRHKEKWJLPA,1,21,Color: Red,579,1.0
3,3,RWYK1GXIVV6H1,B09PYVXXW5,1.0 out of 5 stars Decent but Defective,Aquila,1.0,"UPDATE 6/24/22: As of 6/23/22, my replacement ...","Reviewed in the United States June 3, 2022",AGFUWY2GO4HF5RMLUR7ZOSKID4KA,1,9,Color: Black,578,0.0
4,4,R3FBQBGQM3II4W,B09PYVXXW5,3.0 out of 5 stars I went with soundcore instead,Frankie,3.0,So out of the box these tiny buds surprised me...,"Reviewed in the United States October 28, 2022",AEOFU2SCDWYLS6DTSXIR6FWMRQMQ,1,7,Color: Black,276,0.5
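
The upload message above shows Colab saving the file under a new name (`... (1).csv`), so a hard-coded filename may read an older copy. One way around this is to read from the bytes that `files.upload()` returns. The following is a minimal sketch, assuming the returned object is a dict mapping filenames to raw bytes; `load_uploaded_csv` is a hypothetical helper, and `fake_upload` is a made-up stand-in for the real upload.

```python
import io
import pandas as pd

def load_uploaded_csv(uploaded):
    """Read the first uploaded file from its in-memory bytes (hypothetical helper)."""
    name, data = next(iter(uploaded.items()))
    df = pd.read_csv(io.BytesIO(data))
    # Same normalization as the cell above: trim, lowercase, spaces -> underscores
    df.columns = [str(col).strip().lower().replace(' ', '_') for col in df.columns]
    return name, df

# Stand-in for the dict returned by files.upload(); the contents are made up
fake_upload = {'reviews.csv': b'Unnamed: 0,Review ID, Rating \n0,R1,5.0\n1,R2,4.0\n'}
name, demo_df = load_uploaded_csv(fake_upload)
print(name, list(demo_df.columns))
```

In the notebook itself, the call would be `load_uploaded_csv(uploaded)` with the dict from the upload cell.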


## Step 3: Display the First and Last Few Rows

In [None]:
print(df.head())
print(df.tail())

   unnamed:_0       review_id  product_id  \
0           0   RUE030N50F9EJ  B09PYVXXW5   
1           1  R385JSD6KWP2QU  B09PYVXXW5   
2           2  R1UB1V4EPP9MN3  B09PYVXXW5   
3           3   RWYK1GXIVV6H1  B09PYVXXW5   
4           4  R3FBQBGQM3II4W  B09PYVXXW5   

                                               title    author  rating  \
0  5.0 out of 5 stars Really good with a couple o...       CTM     5.0   
1  4.0 out of 5 stars I wish I could rate 5 stars...    Gianna     4.0   
2  5.0 out of 5 stars Basically perfect, fantasti...  Colin M.     5.0   
3            1.0 out of 5 stars Decent but Defective    Aquila     1.0   
4   3.0 out of 5 stars I went with soundcore instead   Frankie     3.0   

                                             content  \
0  I love TWS earbuds. I have many including buds...   
1  …I just can’t. Because as useful as these earb...   
2  I needed a replacement for my Galaxy buds pro'...   
3  UPDATE 6/24/22: As of 6/23/22, my replacement ...   
4  S
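
The `unnamed:_0` column looks like an index left over from an earlier `to_csv` call. The sketch below uses a small made-up frame to show how to check the shape and dtypes and drop such a column. Whether to drop it from the real dataset is a judgment call, since it may be the only record of the original row positions.

```python
import pandas as pd

# Made-up rows that mimic the first three columns of interest
demo = pd.DataFrame({
    'unnamed:_0': [0, 1, 2],
    'review_id': ['R1', 'R2', 'R3'],
    'rating': [5.0, 4.0, 1.0],
})

print(demo.shape)    # (rows, columns)
print(demo.dtypes)   # one dtype per column

# Drop columns whose name starts with 'unnamed' (leftover index columns)
leftover = [c for c in demo.columns if c.startswith('unnamed')]
demo = demo.drop(columns=leftover)
print(list(demo.columns))
```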

## Step 4: Descriptive Statistics for Numerical Columns

In [None]:
# Replace 'rating' with any column you wish to explore
print('Mean:', df['rating'].mean())
print('Median:', df['rating'].median())
print('Mode:', df['rating'].mode()[0])
print('Variance:', df['rating'].var())
print('Standard Deviation:', df['rating'].std())

Mean: 4.291203235591507
Median: 5.0
Mode: 5.0
Variance: 1.1328210640671061
Standard Deviation: 1.064340671057489
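
The mean (about 4.29) sits below the median (5.0), which suggests the ratings pile up at 5 stars with a tail of low scores. A short sketch on made-up values shows how `describe()` and `value_counts()` summarize the same statistics in one pass. It also shows that `var()` and `std()` use the sample formula (`ddof=1`) by default.

```python
import pandas as pd

ratings = pd.Series([5.0, 5.0, 5.0, 4.0, 3.0, 1.0], name='rating')  # made-up values

print(ratings.describe())                    # count, mean, std, min, quartiles, max
print(ratings.value_counts().sort_index())   # number of reviews per star level

# pandas defaults to the sample variance (ddof=1); ddof=0 gives the population variance
sample_var = ratings.var()
population_var = ratings.var(ddof=0)
print(sample_var, population_var)
```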


## Step 5: Identify Missing Values

In [None]:
missing_values = df.isnull().sum()
print('Missing values per column:\n', missing_values)

Missing values per column:
 unnamed:_0              0
review_id               0
product_id              0
title                   0
author                  1
rating                  0
content                 8
timestamp               0
profile_id              0
is_verified             0
helpful_count           0
product_attributes    224
review_length           0
normalized_rating       0
dtype: int64
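
Raw counts are easier to judge next to the share of rows they represent. The sketch below uses a made-up frame to build a small summary table with counts and percentages, keeping only the columns that have gaps.

```python
import pandas as pd

# Made-up rows with a few gaps
demo = pd.DataFrame({
    'author': ['CTM', None, 'Colin M.', 'Aquila'],
    'content': ['good', 'fine', None, None],
    'rating': [5.0, 4.0, 5.0, 1.0],
})

summary = pd.DataFrame({
    'missing': demo.isnull().sum(),
    'percent': (demo.isnull().mean() * 100).round(2),
})
# Keep only columns that actually have gaps, worst first
summary = summary[summary['missing'] > 0].sort_values('missing', ascending=False)
print(summary)
```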


## Step 6: Handle Missing Values

In [None]:
# Example: fill missing ratings with the column mean.
# Step 5 reported no missing values in 'rating', so this should leave the column unchanged here.
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
# Assign the result back instead of calling fillna(..., inplace=True) on df['rating'];
# the chained inplace form raises a FutureWarning and will stop working in pandas 3.0.
df['rating'] = df['rating'].fillna(df['rating'].mean())
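
Step 5 shows that the gaps are in `author`, `content`, and `product_attributes`, not in `rating`. One possible treatment is sketched below on made-up rows: fill the categorical columns with placeholder labels and drop reviews that have no text. The labels `'Unknown'` and `'Not specified'` are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up rows with gaps in the same columns as the real dataset
demo = pd.DataFrame({
    'author': ['CTM', None, 'Colin M.'],
    'content': ['Great sound', 'Okay', None],
    'product_attributes': ['Color: Black', None, 'Color: Red'],
})

# Fill categorical gaps with explicit placeholder labels (assumed values)
demo = demo.fillna({'author': 'Unknown', 'product_attributes': 'Not specified'})

# Drop reviews with no text, then renumber the rows
demo = demo.dropna(subset=['content']).reset_index(drop=True)
print(demo)
```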


## Step 7: Detect Outliers Using IQR Method

In [None]:
Q1 = df['rating'].quantile(0.25)
Q3 = df['rating'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['rating'] < (Q1 - 1.5 * IQR)) | (df['rating'] > (Q3 + 1.5 * IQR))]
print('Outliers based on rating:\n', outliers)

Outliers based on rating:
       unnamed:_0       review_id  product_id  \
3              3   RWYK1GXIVV6H1  B09PYVXXW5   
5              5  R2XA1MES3TJ3W5  B09PYVXXW5   
14            14  R3H14FR01F7EJW  B09PYVXXW5   
20            20  R2J00QXBAYM6W9  B09PYVXXW5   
23            23  R1QYCYXB7TOTGQ  B09PYVXXW5   
...          ...             ...         ...   
4939        5859   RMJNP14ENJLS4  B0CY6S748H   
4941        5889   RERGCQ3PLJUTJ  B0CY6S748H   
4942        5890  R25VEKRF5EV836  B0CY6S748H   
4943        5891  R34PNR72XAULLC  B0CY6S748H   
4944        5892  R1GHYH6N4V9BRB  B0CY6S748H   

                                                  title       author  rating  \
3               1.0 out of 5 stars Decent but Defective       Aquila     1.0   
5     2.0 out of 5 stars Wish I had believed the neg...        Devin     2.0   
14                         2.0 out of 5 stars Not great          Ben     2.0   
20                     2.0 out of 5 stars Dissapointing    J. Gracia     2.0
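
Every row flagged above has a rating of 1 or 2. That is expected when most reviews are 4 or 5 stars: the interquartile range is narrow, so the lower fence lands between 2 and 3. The sketch below reproduces the effect on made-up values; `iqr_bounds` is a hypothetical helper. On a bounded 1-to-5 scale these rows are probably genuine low ratings, not data errors.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

ratings = pd.Series([5, 5, 5, 5, 4, 4, 5, 4, 2, 1], dtype=float)  # made-up values
lower, upper = iqr_bounds(ratings)
flagged = ratings[(ratings < lower) | (ratings > upper)]
print(lower, upper)
print(flagged.tolist())
```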

## Step 8: Save the Cleaned Dataset

In [None]:
df.to_csv('Cleaned_Speaker_Reviews_EDA.csv', index=False)
files.download('Cleaned_Speaker_Reviews_EDA.csv')
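
Passing `index=False` keeps the saved file from gaining another `Unnamed: 0` column the next time it is read. The sketch below is a quick round-trip check on a made-up frame, using an in-memory buffer in place of a file on disk.

```python
import io
import pandas as pd

demo = pd.DataFrame({'review_id': ['R1', 'R2'], 'rating': [5.0, 4.0]})  # made-up rows

buffer = io.StringIO()
demo.to_csv(buffer, index=False)   # index=False: do not write the row index as a column
buffer.seek(0)

reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))
```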
