In [1]:
import pandas as pd
import os

# Data Products

## Extracting Data from Source Folder

In [2]:
# Creating a list with file names in the data_products folder
destination_path = 'data/data_products'
all_files = [file for file in os.listdir(destination_path) if file.endswith('.csv')]

# Read each CSV file in the data_products folder into its own dataframe
dataframes = []
for file_name in all_files:
    file_path = os.path.join(destination_path, file_name)
    df = pd.read_csv(file_path)
    dataframes.append(df)

# Concatenate all dataframes in the 'dataframes' list
df_products_original = pd.concat(dataframes, ignore_index=True)
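
The loading step above can also be wrapped in a small function. The following is only a sketch: `load_csv_folder` and the extra `source_file` column are hypothetical names introduced for illustration, and sorting the file names is an added assumption to make the row order reproducible (`os.listdir` does not guarantee any order).

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every .csv file in `folder` into one DataFrame.

    Hypothetical helper: neither this function nor the `source_file`
    column exists in the original notebook.
    """
    frames = []
    # sorted() gives a stable, reproducible file order.
    for path in sorted(Path(folder).glob('*.csv')):
        frame = pd.read_csv(path)
        frame['source_file'] = path.name  # remember which file each row came from
        frames.append(frame)
    # Note: pd.concat raises ValueError if no CSV files were found.
    return pd.concat(frames, ignore_index=True)
```

Keeping the file name next to each row makes it easier to trace odd values back to the file they came from.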

In [3]:
# Create a copy to preserve the original dataframe
df_products = df_products_original.copy()

df_products.head()

Unnamed: 0.1,name,main_category,sub_category,image,link,ratings,no_of_ratings,discount_price,actual_price,Unnamed: 0
0,Lloyd 1.5 Ton 3 Star Inverter Split Ac (5 In 1...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/31UISB90sY...,https://www.amazon.in/Lloyd-Inverter-Convertib...,4.2,2255,"₹32,999","₹58,990",
1,LG 1.5 Ton 5 Star AI DUAL Inverter Split AC (C...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/51JFb7FctD...,https://www.amazon.in/LG-Convertible-Anti-Viru...,4.2,2948,"₹46,490","₹75,990",
2,LG 1 Ton 4 Star Ai Dual Inverter Split Ac (Cop...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/51JFb7FctD...,https://www.amazon.in/LG-Inverter-Convertible-...,4.2,1206,"₹34,490","₹61,990",
3,LG 1.5 Ton 3 Star AI DUAL Inverter Split AC (C...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/51JFb7FctD...,https://www.amazon.in/LG-Convertible-Anti-Viru...,4.0,69,"₹37,990","₹68,990",
4,Carrier 1.5 Ton 3 Star Inverter Split AC (Copp...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/41lrtqXPiW...,https://www.amazon.in/Carrier-Inverter-Split-C...,4.1,630,"₹34,490","₹67,790",


## Cleaning and Transforming Data

In [4]:
df_products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1103170 entries, 0 to 1103169
Data columns (total 10 columns):
 #   Column          Non-Null Count    Dtype  
---  ------          --------------    -----  
 0   name            1103170 non-null  object 
 1   main_category   1103170 non-null  object 
 2   sub_category    1103170 non-null  object 
 3   image           1103170 non-null  object 
 4   link            1103170 non-null  object 
 5   ratings         751582 non-null   object 
 6   no_of_ratings   751582 non-null   object 
 7   discount_price  980844 non-null   object 
 8   actual_price    1067544 non-null  object 
 9   Unnamed: 0      551585 non-null   float64
dtypes: float64(1), object(9)
memory usage: 84.2+ MB


Inspecting the data. So far we can see that several columns contain null values. The columns ```name, main_category, sub_category, image, link``` should have the string data type, while ```ratings, no_of_ratings, discount_price, actual_price``` should be integer or float. Finally, there is an ```Unnamed: 0``` column that we need to investigate.
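
The dtypes and null counts discussed above can also be summarised per column. This is a minimal sketch on a made-up three-row frame standing in for `df_products`; the name `column_summary` is illustrative only.

```python
import pandas as pd

# Toy frame standing in for df_products; the values are made up.
sample = pd.DataFrame({
    'name': ['item a', 'item b', 'item c'],
    'ratings': ['4.2', None, 'Get'],
    'Unnamed: 0': [0.0, None, None],
})

# One row per column: its dtype, the number of missing values, and their share.
column_summary = pd.DataFrame({
    'dtype': sample.dtypes.astype(str),
    'missing': sample.isna().sum(),
    'missing_share': sample.isna().mean(),
})
print(column_summary)
```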

In [5]:
# Convert the name, main_category, sub_category, image, and link columns to string

df_products['name'] = df_products['name'].astype('string')
df_products['main_category'] = df_products['main_category'].astype('string')
df_products['sub_category'] = df_products['sub_category'].astype('string')
df_products['image'] = df_products['image'].astype('string')
df_products['link'] = df_products['link'].astype('string')

In [6]:
# Check the data type of each column and its number of non-null values

df_products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1103170 entries, 0 to 1103169
Data columns (total 10 columns):
 #   Column          Non-Null Count    Dtype  
---  ------          --------------    -----  
 0   name            1103170 non-null  string 
 1   main_category   1103170 non-null  string 
 2   sub_category    1103170 non-null  string 
 3   image           1103170 non-null  string 
 4   link            1103170 non-null  string 
 5   ratings         751582 non-null   object 
 6   no_of_ratings   751582 non-null   object 
 7   discount_price  980844 non-null   object 
 8   actual_price    1067544 non-null  object 
 9   Unnamed: 0      551585 non-null   float64
dtypes: float64(1), object(4), string(5)
memory usage: 84.2+ MB


Next we look at how to handle the numeric columns. On inspection, each of these columns contains some non-numeric values. Therefore, instead of the ```astype()``` method, we use the ```apply()``` method to convert values from strings (or other types) to ```int/float```. Non-numeric values are converted to missing values (```None```), but we still look at the specific cases in each column.

In [7]:
# Helper that converts a value to float and returns None when it cannot be parsed

def getFloat(number):
    try:
        number = float(number)
        return number
    except (TypeError, ValueError):
        return None
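
As an alternative to a row-wise helper such as `getFloat`, pandas provides `pd.to_numeric` with `errors='coerce'`, which turns unparseable values into `NaN` in one vectorised call. This is a sketch on a made-up sample, not the notebook's own method:

```python
import pandas as pd

# Made-up sample mixing numeric strings, floats, and scraped noise.
raw = pd.Series(['4.2', 'Get', '₹70', 3.5, None, 'FREE'], dtype=object)

# errors='coerce' replaces anything that cannot be parsed with NaN.
parsed = pd.to_numeric(raw, errors='coerce')
print(parsed)
```

On a column with over a million rows this usually runs faster than `apply()` with a Python function, and the resulting missing values are treated the same way by later aggregations.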

In [8]:
df_products['ratings'].unique()

array(['4.2', '4.0', '4.1', '4.3', '3.9', '3.8', '3.5', nan, '4.6', '3.3',
       '3.4', '3.7', '2.9', '5.0', '4.4', '3.6', '2.7', '4.5', '3.0',
       '3.1', '3.2', '4.8', '4.7', '2.5', '1.0', '2.6', '2.8', '2.3',
       '1.7', 'Get', '1.8', '2.4', '2.0', '1.5', '4.9', '1.9', '2.2',
       '1.2', '2.1', '1.4', '1.6', '1.3', 'FREE', '₹2.99', '1.1', '₹70',
       4.5, 3.3, 3.7, 4.2, 4.3, 4.1, 3.8, 4.4, 4.0, 3.5, 3.9, '₹99', 3.4,
       3.6, 3.2, 5.0, 2.6, 4.7, 3.0, 3.1, 4.6, 1.0, 4.9, 2.9, 2.7, 2.2,
       2.5, 4.8, 2.0, 1.7, 1.9, 2.8, 2.4, 1.6, 2.3, 1.8, 1.4, 1.3, 2.1,
       1.5, '₹100', '₹68.99', '₹65'], dtype=object)

In [9]:
df_products['ratings'] = df_products['ratings'].apply(getFloat)

In [10]:
# Remove ',' so the values can be converted to integers
df_products['no_of_ratings'] = df_products['no_of_ratings'].str.replace(',', '')

# Helper to inspect what the non-numeric values look like, because 'no_of_ratings' has a very large number of unique values
def getNonNumber(number):
    try:
        number = float(number)
        return 0
    except (TypeError, ValueError):
        return number

test_norating = df_products['no_of_ratings'].apply(getNonNumber)
test_norating.unique()

# Since the non-numeric values are error strings that the other columns do not contain, we can convert them to missing values

array([0, 'Only 2 left in stock.', 'Only 1 left in stock.',
       'FREE Delivery by Amazon', 'Usually dispatched in 3 to 4 weeks.',
       'Usually dispatched in 5 to 6 days.',
       'Usually dispatched in 4 to 5 days.',
       'Usually dispatched in 6 to 7 days.',
       'Usually dispatched in 7 to 8 days.',
       'Usually dispatched in 11 to 12 days.',
       'Usually dispatched in 4 to 5 weeks.', 'Only 4 left in stock.',
       'Only 3 left in stock.', 'Usually dispatched in 1 to 2 months.',
       'Only 5 left in stock.',
       'This item will be released on August 14 2023.',
       'Usually dispatched in 3 to 5 days.',
       'Usually dispatched in 2 to 3 days.',
       'Usually dispatched in 9 to 10 days.',
       'Usually dispatched in 2 to 3 weeks.',
       'Usually dispatched in 8 to 9 days.'], dtype=object)

In [11]:
df_products['no_of_ratings'] = df_products['no_of_ratings'].apply(getFloat)
df_products['no_of_ratings'] = df_products['no_of_ratings'].astype('Int64')

In [12]:
# Clean the strings so they can be converted
df_products['discount_price'] = df_products['discount_price'].str.replace('₹', '')
df_products['discount_price'] = df_products['discount_price'].str.replace(',', '')

# Because we are working with currency, we use the float data type
df_products['discount_price'] = df_products['discount_price'].astype(float)

In [13]:
# Clean the strings so they can be converted
df_products['actual_price'] = df_products['actual_price'].str.replace('₹', '')
df_products['actual_price'] = df_products['actual_price'].str.replace(',', '')

# Because we are working with currency, we use the float data type
df_products['actual_price'] = df_products['actual_price'].astype(float)
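
Because `discount_price` and `actual_price` go through the same cleaning steps, the logic could live in one helper. A minimal sketch, assuming the only non-numeric characters are the rupee sign and thousands separators; `clean_price` is a hypothetical name, not part of the notebook:

```python
import pandas as pd


def clean_price(prices):
    """Strip '₹' and ',' from a Series of price strings and return floats.

    Hypothetical helper for illustration only.
    """
    stripped = prices.str.replace(r'[₹,]', '', regex=True)
    # Missing values pass through as NaN; any other leftover text raises an error.
    return pd.to_numeric(stripped)


sample_prices = pd.Series(['₹32,999', None, '₹58,990'], dtype=object)
cleaned = clean_price(sample_prices)
print(cleaned)
```

Applying the same function to both columns keeps the two conversions from drifting apart.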

In [14]:
# We also create a new column that combines the discounted price with the actual price for products that have no discount.

def getCopyPrice(disc, actual):
    if pd.isnull(disc) and pd.notnull(actual):
        return actual
    else:
        return disc
    

df_products['current_price'] = df_products.apply(lambda x: getCopyPrice(x['discount_price'], x['actual_price']), axis=1)
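
The same rule (use `discount_price`, and fall back to `actual_price` when it is missing) can be expressed without a row-wise `apply`. A sketch using `Series.fillna` on made-up values:

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({
    'discount_price': [100.0, np.nan, np.nan],
    'actual_price': [200.0, 300.0, np.nan],
})

# Where discount_price is missing, take the value from actual_price.
# Rows where both are missing stay NaN, matching getCopyPrice.
prices['current_price'] = prices['discount_price'].fillna(prices['actual_price'])
print(prices)
```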

In [15]:
df_products[df_products['discount_price'].isnull()].head()

Unnamed: 0.1,name,main_category,sub_category,image,link,ratings,no_of_ratings,discount_price,actual_price,Unnamed: 0,current_price
76,LG 1.5 Ton 3 Star Hot & Cold DUAL Inverter Spl...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/51hbo8yQ1E...,https://www.amazon.in/LG-Inverter-Convertible-...,4.0,265,,,,
100,Hitachi 1.5 Ton 5 Star Inverter Split AC (Copp...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/41AY1pk5oR...,https://www.amazon.in/Hitachi-Inverter-Copper-...,3.8,748,,,,
114,Panasonic 1.5 Ton 4 Star Wi-Fi Twin-Cool Inver...,appliances,Air Conditioners,https://m.media-amazon.com/images/I/41Edvsb7Gh...,https://www.amazon.in/Panasonic-Conditioner-An...,4.5,195,,,,
126,"Hitachi 1.5 Ton 5 Star Window AC (Copper, Dust...",appliances,Air Conditioners,https://m.media-amazon.com/images/I/81Ei0pUgd7...,https://www.amazon.in/Hitachi-Window-Copper-Fi...,2.7,10,,49300.0,,49300.0
137,"Portable Air Conditioner,Office Air Conditione...",appliances,Air Conditioners,https://m.media-amazon.com/images/I/61zHq-twHL...,https://www.amazon.in/Portable-Conditioner-Eva...,3.2,303,,7250.0,,7250.0


Now that each column has been converted to an appropriate data type, we drop the ```Unnamed: 0``` column because it is not relevant to the analysis.
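
One check may be worth running before the column is dropped. In the `info()` output above, ```Unnamed: 0``` is non-null for 551,585 rows, exactly half of the 1,103,170 rows. That pattern could mean the source folder contains the same products twice (for example, a combined file alongside per-category files), but this is an assumption the notebook does not confirm. A minimal duplicate check on a made-up frame, assuming `name` and `link` together identify a listing:

```python
import pandas as pd

# Made-up listings; the URLs are placeholders.
listings = pd.DataFrame({
    'name': ['AC unit', 'AC unit', 'Kettle'],
    'link': ['https://example.com/ac', 'https://example.com/ac',
             'https://example.com/kettle'],
    'Unnamed: 0': [0.0, None, None],
})

key_columns = ['name', 'link']  # assumed to identify a listing
duplicate_mask = listings.duplicated(subset=key_columns, keep='first')
print('duplicate rows:', duplicate_mask.sum())

# Keep only the first occurrence of each listing.
deduplicated = listings[~duplicate_mask]
```

If duplicates are found, every `count` in the later group-by tables would be inflated by the same factor.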

In [16]:
df_products = df_products.drop('Unnamed: 0', axis=1)

In [17]:
df_products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1103170 entries, 0 to 1103169
Data columns (total 10 columns):
 #   Column          Non-Null Count    Dtype  
---  ------          --------------    -----  
 0   name            1103170 non-null  string 
 1   main_category   1103170 non-null  string 
 2   sub_category    1103170 non-null  string 
 3   image           1103170 non-null  string 
 4   link            1103170 non-null  string 
 5   ratings         739116 non-null   float64
 6   no_of_ratings   739053 non-null   Int64  
 7   discount_price  980844 non-null   float64
 8   actual_price    1067544 non-null  float64
 9   current_price   1067544 non-null  float64
dtypes: Int64(1), float64(4), string(5)
memory usage: 85.2 MB


## Data Demography

Among the available columns, two categorical columns can be used to group the data for aggregate functions: `main_category` and `sub_category`. We also have value columns from which to draw insights: `ratings`, `discount_price` (which we will process to measure the size of the discount), and `actual_price` for insights about prices.

In [18]:
df_products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1103170 entries, 0 to 1103169
Data columns (total 10 columns):
 #   Column          Non-Null Count    Dtype  
---  ------          --------------    -----  
 0   name            1103170 non-null  string 
 1   main_category   1103170 non-null  string 
 2   sub_category    1103170 non-null  string 
 3   image           1103170 non-null  string 
 4   link            1103170 non-null  string 
 5   ratings         739116 non-null   float64
 6   no_of_ratings   739053 non-null   Int64  
 7   discount_price  980844 non-null   float64
 8   actual_price    1067544 non-null  float64
 9   current_price   1067544 non-null  float64
dtypes: Int64(1), float64(4), string(5)
memory usage: 85.2 MB


In [19]:
# Inspect the unique values of the 'main_category' column

df_products['main_category'].unique()

<StringArray>
[             'appliances',    'toys & baby products',
             "men's shoes",          'bags & luggage',
         'car & motorbike',                  'stores',
          "men's clothing",          'home & kitchen',
        "women's clothing",             'accessories',
     'tv, audio & cameras',         'beauty & health',
        'sports & fitness',           "kids' fashion",
                   'music', 'grocery & gourmet foods',
            'pet supplies',           "women's shoes",
     'home, kitchen, pets',     'industrial supplies']
Length: 20, dtype: string

In [20]:
# Inspect the unique values of the 'sub_category' column

df_products['sub_category'].unique()

<StringArray>
[        'Air Conditioners',        'Nursing & Feeding',
             'Casual Shoes',                  'Wallets',
          'Car Accessories',                  'Diapers',
           'Amazon Fashion',         'T-shirts & Polos',
            'Bedroom Linen',             'Toys & Games',
 ...
             'Sports Shoes',          'STEM Toys Store',
        'Strollers & Prams', 'Suitcases & Trolley Bags',
              'Televisions',  'Test, Measure & Inspect',
    'The Designer Boutique',             'Value Bazaar',
             'Western Wear',                     'Yoga']
Length: 112, dtype: string

In [21]:
# Get a general overview of the dataframe

df_products.describe()

Unnamed: 0,ratings,no_of_ratings,discount_price,actual_price,current_price
count,739116.0,739053.0,980844.0,1067544.0,1067544.0
mean,3.832311,840.847039,2623.161,23111.28,2872.724
std,0.756101,8651.895695,9458.191,13550810.0,9565.771
min,1.0,1.0,8.0,0.0,0.0
25%,3.5,4.0,389.0,990.0,399.0
50%,3.9,20.0,679.0,1599.0,699.0
75%,4.3,133.0,1399.0,2999.0,1549.0
max,5.0,589547.0,1249990.0,9900000000.0,1249990.0


### Ratings

In [22]:
# Aggregate the ratings values, grouped by the 'main_category' column

df_products.groupby('main_category')['ratings'].agg(['mean', 'min', 'max', 'count']).sort_values(by='mean', ascending=False)

main_category,mean,min,max,count
grocery & gourmet foods,4.072739,1.0,5.0,6038
toys & baby products,4.030417,1.0,5.0,10514
pet supplies,4.029325,1.0,5.0,2844
beauty & health,4.002085,1.0,5.0,15446
home & kitchen,3.977118,1.0,5.0,26554
music,3.964741,1.0,5.0,1696
stores,3.946108,1.0,5.0,56728
bags & luggage,3.939759,1.0,5.0,7314
accessories,3.904224,1.0,5.0,134838
kids' fashion,3.861064,1.0,5.0,12672


In [23]:
# Aggregate the ratings values, grouped by the 'main_category' and 'sub_category' columns

df_products.groupby(['main_category', 'sub_category'])['ratings'].agg(['mean', 'min', 'max', 'count']).sort_values(by='mean', ascending=False)

main_category,sub_category,mean,min,max,count
toys & baby products,International Toy Store,4.578947,4.0,4.9,38
beauty & health,Value Bazaar,4.304545,3.7,4.6,132
grocery & gourmet foods,All Grocery & Gourmet Foods,4.176296,2.0,5.0,1890
beauty & health,Health & Personal Care,4.174317,2.4,5.0,2196
toys & baby products,Strollers & Prams,4.127092,1.0,5.0,502
...,...,...,...,...,...
"tv, audio & cameras",Headphones,3.503177,1.0,5.0,13786
car & motorbike,Car Electronics,3.497718,1.0,5.0,1490
beauty & health,Personal Care Appliances,3.455056,1.0,5.0,712
"tv, audio & cameras",Security Cameras,3.319143,1.0,5.0,8118


From the two analyses above, 'grocery & gourmet foods' has the highest average `ratings` among the `main_category` groups, followed by 'toys & baby products', 'pet supplies', and 'beauty & health'.

Looking more closely at `sub_category`, the results stay consistent with the `main_category` ranking: the top three are 'International Toy Store', 'Value Bazaar', and 'All Grocery & Gourmet Foods', whose main categories are 'toys & baby products', 'beauty & health', and 'grocery & gourmet foods'.

The two levels also agree at the bottom of the ranking: the lowest-rated `main_category` is 'home, kitchen, pets', with its `sub_category` 'Refurbished & Open Box'.
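
Some of the top-ranked groups are small (for example, 'International Toy Store' has a `count` of 38), so their averages rest on few ratings. One way to make the ranking more robust is to keep only groups above a minimum count. A sketch on made-up data; the threshold `min_count` is a hypothetical value, not one chosen in the notebook:

```python
import pandas as pd

reviews = pd.DataFrame({
    'sub_category': ['A', 'A', 'A', 'B'],
    'ratings': [4.0, 4.2, 3.8, 5.0],
})

min_count = 3  # hypothetical threshold; tune to the data

stats = reviews.groupby('sub_category')['ratings'].agg(['mean', 'count'])
# Drop groups with too few ratings before ranking by mean.
reliable = stats[stats['count'] >= min_count].sort_values(by='mean', ascending=False)
print(reliable)
```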

### Prices

In [24]:
# Aggregate the current_price values, grouped by the 'main_category' column

df_products.groupby('main_category')['current_price'].agg(['mean', 'min', 'max', 'count']).sort_values(by='mean', ascending=False)

main_category,mean,min,max,count
accessories,5615.944755,4.0,499999.0,222342
appliances,5463.66877,20.0,230000.0,62652
music,4004.035453,50.0,76900.0,2076
stores,3922.920501,0.0,406009.0,64670
"home, kitchen, pets",3643.705882,1199.0,8541.0,34
"tv, audio & cameras",3427.44535,0.0,1249990.0,132714
men's shoes,2600.654225,25.0,66649.0,109726
bags & luggage,2317.375915,10.0,140000.0,19964
industrial supplies,2309.831312,35.94,144856.0,8020
sports & fitness,1750.67304,50.0,225250.0,24506


In [25]:
# Aggregate the 'current_price' values, additionally grouped by the 'sub_category' column

df_products.groupby(['main_category', 'sub_category'])['current_price'].agg(['mean', 'min', 'max', 'count']).sort_values(by='mean', ascending=False)

main_category,sub_category,mean,min,max,count
"tv, audio & cameras",Televisions,39849.728827,100.0,1249990.0,1466
appliances,Air Conditioners,38725.549240,199.0,128800.0,1000
accessories,Gold & Diamond Jewellery,29929.124631,110.0,468012.0,31486
sports & fitness,Cardio Equipment,28846.721122,235.0,225250.0,392
appliances,Refrigerators,19792.077584,99.0,189200.0,3278
...,...,...,...,...,...
grocery & gourmet foods,"Coffee, Tea & Beverages",498.725428,8.0,7900.0,2568
beauty & health,Beauty & Grooming,481.695149,19.0,9200.0,3830
home & kitchen,Sewing & Craft Supplies,438.800707,35.0,15999.0,2518
grocery & gourmet foods,Snack Foods,392.350987,9.0,3999.0,2088


From the two analyses above, 'accessories' and 'appliances' are the `main_category` groups with the highest average `current_price`. Somewhat surprisingly, the `sub_category` ranking is topped by 'Televisions', whose `main_category` ('tv, audio & cameras') ranks only sixth.

This may be partly explained by the very large maximum price in 'Televisions', 1,249,990.
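
To see how much a single extreme price pulls an average, the mean can be compared with the median for each group. A sketch on made-up prices (the numbers below are illustrative, not taken from the dataset):

```python
import pandas as pd

items = pd.DataFrame({
    'sub_category': ['Televisions'] * 3 + ['Snack Foods'] * 3,
    'current_price': [20000.0, 30000.0, 1000000.0, 100.0, 200.0, 300.0],
})

# A large gap between mean and median signals that outliers drive the mean.
price_summary = items.groupby('sub_category')['current_price'].agg(['mean', 'median', 'max'])
print(price_summary)
```

In this toy example the 'Televisions' mean is 350,000 while the median is 30,000, so the median gives a better picture of a typical price.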

### Discount

In [26]:
# Create the discount column

df_products['discount'] = df_products['actual_price'] - df_products['current_price']

In [27]:
df_products.groupby('main_category')['discount'].agg(['mean', 'max']).sort_values(by='mean', ascending=False)

main_category,mean,max
home & kitchen,684923.029268,9899999000.0
sports & fitness,6333.442432,61082300.0
appliances,2668.417563,370000.0
accessories,2520.361441,793991.0
"tv, audio & cameras",2298.101095,344910.0
bags & luggage,1730.531382,96000.0
music,1617.782216,52441.0
men's shoes,1252.831183,26956.0
industrial supplies,1136.832185,45050.0
women's clothing,1110.078051,27000.0


In [28]:
df_products.groupby(['main_category', 'sub_category'])['discount'].agg(['mean', 'sum', 'count']).sort_values(by='mean', ascending=False)

main_category,sub_category,mean,sum,count
home & kitchen,Garden & Outdoors,9.017007e+06,1.980135e+10,2196
sports & fitness,Running,6.834569e+04,1.237057e+08,1810
"tv, audio & cameras",Televisions,2.238134e+04,3.281104e+07,1466
sports & fitness,Cardio Equipment,1.857530e+04,7.281519e+06,392
appliances,Air Conditioners,1.608968e+04,1.608968e+07,1000
...,...,...,...,...
beauty & health,Health & Personal Care,2.381061e+02,5.228810e+05,2196
grocery & gourmet foods,Snack Foods,1.970310e+02,4.114006e+05,2088
beauty & health,Value Bazaar,1.579241e+02,2.084598e+04,132
grocery & gourmet foods,All Grocery & Gourmet Foods,1.519794e+02,2.899768e+05,1908


The results above are consistent across both levels: ranked by average discount, the top `main_category` groups are 'home & kitchen' and 'sports & fitness', and the `sub_category` ranking is led by 'Garden & Outdoors' and 'Running', which belong to those same main categories.

'Televisions' appears again, in third place in the `sub_category` ranking.

Besides the absolute discount, we also look at the discount as a percentage of each product's price.

In [29]:
# Create the 'discount_percentage' column

df_products['discount_percentage'] = (df_products['actual_price'] - df_products['current_price'])/df_products['actual_price']
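
The ratio above is undefined where `actual_price` is 0, and the `describe()` output shows a minimum `actual_price` of 0.0. One possible guard, sketched on made-up values, is to mask non-positive prices before dividing so those rows become `NaN` instead of producing infinities; `valid_actual` is an illustrative name:

```python
import pandas as pd

prices = pd.DataFrame({
    'actual_price': [100.0, 0.0, 200.0],
    'current_price': [80.0, 0.0, 200.0],
})

# Replace non-positive actual prices with NaN so the division cannot hit zero.
valid_actual = prices['actual_price'].where(prices['actual_price'] > 0)
prices['discount_percentage'] = (valid_actual - prices['current_price']) / valid_actual
print(prices)
```

Since `mean` skips `NaN` by default, the masked rows then simply drop out of the group averages.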

In [30]:
df_products.groupby('main_category')['discount_percentage'].agg(['mean', 'max']).sort_values(by='mean', ascending=False)

main_category,mean,max
women's clothing,0.565146,0.949703
accessories,0.505547,0.999
"tv, audio & cameras",0.470715,0.991559
home & kitchen,0.458333,1.0
bags & luggage,0.457284,0.999
sports & fitness,0.445736,0.999989
car & motorbike,0.433778,0.979098
kids' fashion,0.42209,0.959111
men's clothing,0.412937,0.945541
stores,0.406195,0.99


In [31]:
df_products.groupby(['main_category', 'sub_category'])['discount_percentage'].agg(['mean', 'sum']).sort_values(by='mean', ascending=False)

main_category,sub_category,mean,sum
accessories,Fashion & Silver Jewellery,0.643045,24289.093307
women's clothing,Ethnic Wear,0.627406,23551.569500
women's clothing,Clothing,0.595726,22676.908288
accessories,Jewellery,0.575426,21667.077817
women's clothing,Western Wear,0.558816,21251.765064
...,...,...,...
grocery & gourmet foods,All Grocery & Gourmet Foods,0.176166,336.124467
"home, kitchen, pets",Refurbished & Open Box,0.172337,5.859442
stores,Sportswear,0.148848,2118.100738
grocery & gourmet foods,"Coffee, Tea & Beverages",0.147204,378.018619


These results are somewhat surprising, because the `discount_percentage` ranking differs considerably from the `discount` ranking.

In terms of `discount_percentage`, `main_category` is led by 'women's clothing', 'accessories', and 'tv, audio & cameras'. The `sub_category` analysis is consistent with this: its top entries, 'Fashion & Silver Jewellery', 'Ethnic Wear', and 'Clothing', belong to the leading main categories.
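
The difference between the two rankings can be shown side by side by ranking each category on both measures. A sketch on made-up prices, where an expensive category has the larger absolute discount and a cheap one has the larger percentage discount:

```python
import pandas as pd

sales = pd.DataFrame({
    'main_category': ['appliances', 'appliances',
                      "women's clothing", "women's clothing"],
    'actual_price': [50000.0, 60000.0, 1000.0, 2000.0],
    'current_price': [40000.0, 50000.0, 400.0, 800.0],
})
sales['discount'] = sales['actual_price'] - sales['current_price']
sales['discount_percentage'] = sales['discount'] / sales['actual_price']

# Average both measures per category, then rank each column (1 = largest).
by_category = sales.groupby('main_category')[['discount', 'discount_percentage']].mean()
ranks = by_category.rank(ascending=False).astype(int)
print(ranks)
```

Here 'appliances' ranks first on absolute discount but second on percentage, which mirrors the kind of reversal seen between the `discount` and `discount_percentage` tables.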