# Product Comments Data Exploratory Data Analysis

## Table of Contents:
* [Goal](#goal)
* [Dataset](#dataset)
    * [Importing Necessary Libraries](#import)
    * [Reading and Viewing the Dataset](#reading)
    * [Renaming Columns](#rename)
    * [Dropping Redundant Columns](#drop)
    * [Detecting Missing Values](#missing)
    * [Preprocessing Rows and Columns](#preprocessing)
* [Conclusion](#conc)

***

## Goal <a class="anchor" id="goal"></a>

We perform Exploratory Data Analysis (EDA) on the Product Comments data to identify its characteristics and gain a better understanding of it.

***

## Dataset <a class="anchor" id="dataset"></a>

### Importing Necessary Libraries<a class="anchor" id="import"></a>

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

### Reading and Viewing the Dataset <a class="anchor" id="reading"></a>

In [None]:
df = pd.read_excel('Urun yorumlari.xlsx')

In [None]:
df.shape

(561, 25)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 561 entries, 0 to 560
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   order_id                515 non-null    object        
 1   review_title            0 non-null      float64       
 2   comments                456 non-null    object        
 3   rating                  561 non-null    int64         
 4   status                  561 non-null    object        
 5   date_created            561 non-null    datetime64[ns]
 6   sku                     561 non-null    object        
 7   Customer ID             561 non-null    object        
 8   address                 0 non-null      float64       
 9   product_sku             558 non-null    object        
 10  product_name            558 non-null    object        
 11  product_link            558 non-null    object        
 12  video_review_prompt_id  193 non-null    float64   

### Renaming Columns<a class="anchor" id="rename"></a>

In [None]:
df.rename(columns={"order_id" : "Order_id",
                   "Customer ID" : "CustomerID",
                   }, inplace=True)                           # renaming columns

### Dropping Redundant Columns<a class="anchor" id="drop"></a>

In [None]:
df = df.drop(['review_title',
              'address', 
              'reply',
              'reply_private',
              'reply_date',
              'video_first_campaign',
              'tags',
              'unpublished_images',
              'published_images',
              'published_videos', 
              'unpublished_videos',
              'timeago',
              'video_review_prompt_id'], axis=1)              # dropping redundant columns
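
The `df.info()` summary lists `review_title` and `address` with 0 non-null values, so at least those two drops could be derived from the data instead of typed by hand. A minimal sketch of that idea, assuming a small made-up frame called `sample` (hypothetical; only the column names mirror the notebook):

```python
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration.
sample = pd.DataFrame({
    "rating": [5, 4, 5],
    "review_title": [None, None, None],
    "address": [None, None, None],
    "comments": ["Great rug", None, "Soft"],
})

# Columns in which every value is missing carry no information.
empty_cols = sample.columns[sample.isna().all()].tolist()

# errors="ignore" skips names that are absent instead of raising KeyError.
trimmed = sample.drop(columns=empty_cols, errors="ignore")

print(empty_cols)
print(trimmed.columns.tolist())
```

Columns that are only partly empty, or redundant for other reasons, still need a manual decision like the explicit list above.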

### Detecting Missing Values <a class="anchor" id="missing"></a>

In [None]:
df.isnull().sum()                                             # checking the missing values

Order_id         46
comments        105
rating            0
status            0
date_created      0
sku               0
CustomerID        0
product_sku       3
product_name      3
product_link      3
source           88
location        205
dtype: int64
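
Raw counts are easier to judge next to the share of rows they represent; with 561 rows, the 205 missing `location` values are roughly 37% of the data. A minimal sketch of such a summary, assuming a small invented frame called `sample` (hypothetical, not the real dataset):

```python
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration.
sample = pd.DataFrame({
    "rating": [5, 4, 5, 3],
    "comments": ["Nice", None, "Soft", None],
    "location": [None, "Leeds, United Kingdom", None, None],
})

missing = sample.isna().sum()

# Pair each count with its percentage of all rows, largest gaps first.
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(sample) * 100).round(1),
}).sort_values("missing", ascending=False)

print(summary)
```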

### Preprocessing Rows and Columns <a class="anchor" id="preprocessing"></a>

In [None]:
df.location = df.location.fillna('unknown')                    # filling missing values in location column with unknown

In [None]:
df.location.value_counts()                                     # viewing the location column

unknown                        205
, United Kingdom                72
Glasgow, United Kingdom          8
Nottingham, United Kingdom       8
Birmingham, United Kingdom       6
                              ... 
Watford, United Kingdom          1
Flitwick, United Kingdom         1
Dunfermline, United Kingdom      1
Chester, United Kingdom          1
Crowborough, United Kingdom      1
Name: location, Length: 178, dtype: int64

***

In [None]:
df['Country'] = df.location.str.split(', ').apply(lambda x: x[-1])  # creating a country column off of location column

In [None]:
df.Country = df.Country.replace(['United Kingdom"', 'Germany"'],    # correcting typos in country column
                                ['United Kingdom', 'Germany'])

In [None]:
df.Country.value_counts(dropna=False)                               # viewing the country column

United Kingdom    344
unknown           205
United States       5
Sweden              2
Germany             2
Spain               1
Denmark             1
Turkey              1
Name: Country, dtype: int64
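
The stray trailing quotes corrected above could also be removed generically, without listing each affected country. A minimal sketch, assuming a short invented series called `location` (hypothetical values shaped like the ones in the notebook):

```python
import pandas as pd

# Hypothetical location strings, including one with a stray trailing quote.
location = pd.Series([
    "unknown",
    ", United Kingdom",
    "Glasgow, United Kingdom",
    'Berlin, Germany"',
])

# Take the part after the last ", " and strip stray quotes and spaces.
country = (
    location.str.rsplit(", ", n=1).str[-1]
    .str.strip('" ')
)

print(country.tolist())
```

This relies on the vectorised `.str` accessor rather than `apply` with a lambda; for a column of this size, either approach should be fast enough.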

***

In [None]:
df['City'] = df.location.str.split(', ').apply(lambda x: x[0])      # creating a city column off of location column

In [None]:
df['City'] = df.City.replace(['u', ''], 'unknown')                  # storing the cleaned city labels back in the column
df.City.value_counts()                                              # viewing the city column

unknown        282
Glasgow          8
Nottingham       8
Birmingham       6
Liverpool        6
              ... 
Watford          1
Flitwick         1
Dunfermline      1
Chester          1
Crowborough      1
Name: City, Length: 174, dtype: int64
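
Locations such as `, United Kingdom` have an empty city part, which is why the `unknown` count for cities is higher than for countries. A minimal sketch of the city step, assuming a short invented series called `location` (hypothetical), that stores the cleaned labels and checks that no blank ones remain:

```python
import pandas as pd

# Hypothetical location strings; one has a country but no city.
location = pd.Series([
    "unknown",
    ", United Kingdom",
    "Glasgow, United Kingdom",
])

# The city is the part before the first ", ".
city = location.str.split(", ").str[0]

# Blank city names get the same placeholder as missing locations.
city = city.replace("", "unknown")

print(city.value_counts())
```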

***

In [None]:
df['comments'] = df['comments'].fillna('No Comment')   # filling missing values in comments column with no comment

***

In [None]:
df['source'] = df['source'].fillna('unknown')          # labelling missing values in source column as unknown
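
The two fills above could also be written as a single call, since `DataFrame.fillna` accepts a dictionary that maps column names to fill values. A minimal sketch, assuming a small invented frame called `sample` (hypothetical):

```python
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration.
sample = pd.DataFrame({
    "comments": ["Lovely", None],
    "source": [None, "woocom"],
    "rating": [5, 4],
})

# One placeholder per column; columns not listed are left untouched.
filled = sample.fillna({"comments": "No Comment", "source": "unknown"})

print(filled)
```

Because the result is assigned to a new name, the original frame stays unchanged, which avoids relying on `inplace=True` on a column selection.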

***

## Conclusion <a class="anchor" id="conc"></a>

As a team, we went over every feature throughout our Exploratory Data Analysis. We cleaned, adjusted, and organized the data, and it is now ready for further analysis.

In [None]:
df

Unnamed: 0,Order_id,comments,rating,status,date_created,sku,CustomerID,product_sku,product_name,product_link,source,location,Country,City
0,44840,Very happy with this😊,5,active,2022-11-28 19:20:34,3108,C0000921,3108,Carina Collection Modern Washable Rugs in Pink...,https://www.the-rugs.com/washable-rugs/carina-...,woocom,unknown,unknown,unknown
1,44840,No Comment,5,active,2022-11-28 19:20:34,18185,C0000921,18185,Muslera Collection Faux Fur Rugs in Light Grey...,https://www.the-rugs.com/bathroom-rugs/muslera...,woocom,unknown,unknown,unknown
2,47071,Quality meets expectation,5,active,2022-11-26 21:06:08,2199,C0001003,2199,Montana Collection Modern Rugs in Blue | 3762N...,https://www.the-rugs.com/modern-rugs/montana-c...,woocom,"Coventry, United Kingdom",United Kingdom,Coventry
3,44681,"Just as the picture, creases came out quite qu...",5,active,2022-11-25 14:51:50,2169,C0020878,2169,Montana Collection Modern Rugs in Cream | 3716...,https://www.the-rugs.com/vintage-rugs/montana-...,woocom,"Cardiff, United Kingdom",United Kingdom,Cardiff
4,44836,Beautiful Rug!!,5,active,2022-11-25 14:22:34,2470,C0009678,2470,Rhapsody Collection Berber Design Shaggy Rugs ...,https://www.the-rugs.com/shaggy-rugs/rhapsody-...,woocom,"Slough, United Kingdom",United Kingdom,Slough
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
556,13584,Absolutely lovely at extremely low price A* hi...,5,active,2020-05-06 20:18:47,48060,C0024000,48060,MyShaggy Collection Shaggy Rugs in Off-White |...,https://www.the-rugs.com/shaggy-rugs/myshaggy-...,booster,unknown,unknown,unknown
557,13996,"Really nice rug, exactly as picture and great ...",5,active,2020-05-06 19:39:02,47480-my,C0008828,47480-my,TREND Collection Modern Rugs in Beige | 7406B ...,https://www.the-rugs.com/striped-rugs/trend-co...,booster,unknown,unknown,unknown
558,14013,"Amazing rug! I loved the quality and softness,...",5,active,2020-05-06 19:33:21,49478,C0032499,49478,Caimas Collection Washable Rugs in Blue | 6000...,https://www.the-rugs.com/washable-rugs/caimas-...,booster,unknown,unknown,unknown
559,13762,Good service and excellent quality products,5,active,2020-05-06 19:31:49,49206-my,C0000019,49206-my,Caimas Collection Washable Rugs in Beige | 299...,https://www.the-rugs.com/abstract-rugs/caimas-...,booster,unknown,unknown,unknown
