# CS 4342 Final Project - Amazon Sales Marketing Data

Analyzing the ratings of Amazon products

## Using Kagglehub to import the dataset
You may need to run `pip install kagglehub` before running this section.
The dataset is relatively small, so the download should not take long.
We print the field names and a few example rows to see what each row looks like.

In [18]:
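import csv
import os

import kagglehub

# Download the latest version of the dataset (cached locally after the first run)
folder_path = kagglehub.dataset_download("karkavelrajaj/amazon-sales-dataset")
print(folder_path)
# Build the path with os.path.join so it works on any operating system
file_path = os.path.join(folder_path, "amazon.csv")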

fields = []
text_rows = []

# newline="" lets the csv module handle line breaks inside quoted fields itself
with open(file_path, "r", encoding="utf-8", newline="") as csvfile:
    csvreader = csv.reader(csvfile)

    fields = next(csvreader)
    for text_row in csvreader:
        text_rows.append(text_row)

print(fields)
print(text_rows[0])
print(text_rows[1])
print(text_rows[2])

C:\Users\asant\.cache\kagglehub\datasets\karkavelrajaj\amazon-sales-dataset\versions\1
['product_id', 'product_name', 'category', 'discounted_price', 'actual_price', 'discount_percentage', 'rating', 'rating_count', 'about_product', 'user_id', 'user_name', 'review_id', 'review_title', 'review_content', 'img_link', 'product_link']
['B07JW9H4J1', 'Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)', 'Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables', '₹399', '₹1,099', '64%', '4.2', '24,269', "High Compatibility : Compatible With iPhone 12, 11, X/XsMax/Xr ,iPhone 8/8 Plus,iPhone 7/7 Plus,iPhone 6s/6s Plus,iPhone 6/6 Plus,iPhone 5/5s/5c/se,iPad Pro,iPad Air 1/2,iPad mini 1/2/3,iPod nano7,iPod touch and more apple devices.|Fast Charge&Data Sync : It can charge and sync simultaneously at a rapid speed, Compatible with any charging adaptor, multi-port
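
The product name in the row above contains commas, which is why the file is read with `csv.reader` rather than split on commas by hand. The following is a minimal sketch of the difference on a made-up two-line CSV; the `sample` string and its values are illustrative and not taken from the dataset.

```python
import csv
import io

# Hypothetical CSV text: the quoted product name contains commas.
sample = 'product_id,product_name,rating\nB000000000,"Cable, 3 FT, Grey",4.2\n'

# csv.reader respects the quotes and keeps the name as one field.
reader = csv.reader(io.StringIO(sample))
header = next(reader)
first_row = next(reader)

# A naive split breaks the quoted name into several pieces.
naive = sample.splitlines()[1].split(",")

print(len(header), len(first_row), len(naive))
print(first_row)
```

With `csv.reader` the row has the same number of fields as the header, while the naive split produces extra fragments.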

## Preprocessing the data

Some fields hold numbers stored as strings, which makes them awkward to analyze. Specifically, fields 3-7 (counting from 0) are:
- Discounted price
- Actual price
- Percent discount
- Average rating
- Rating count

For our preprocessing we convert these to numbers (all floats except rating count, which becomes an integer).
The prices are listed in rupees, but we are more familiar with US dollars, so we convert them at a fixed rate of 0.011 USD per rupee.
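
As a worked example, the conversions described above can be applied to the five numeric strings from the first row printed earlier. This is only a sketch: the dictionary layout, the constant `RUPEE_TO_USD`, and the helper `rupees_to_usd` are names chosen here for illustration, and the 0.011 rate is the notebook's fixed approximation rather than a live exchange rate.

```python
RUPEE_TO_USD = 0.011  # fixed approximate rate assumed throughout this notebook

# Numeric fields of the first row, exactly as they appear in the CSV.
raw = {
    "discounted_price": "₹399",
    "actual_price": "₹1,099",
    "discount_percentage": "64%",
    "rating": "4.2",
    "rating_count": "24,269",
}

def rupees_to_usd(text):
    """Strip the rupee sign and thousands separators, then convert to USD."""
    return float(text.lstrip("₹").replace(",", "")) * RUPEE_TO_USD

parsed = {
    "discounted_price": rupees_to_usd(raw["discounted_price"]),
    "actual_price": rupees_to_usd(raw["actual_price"]),
    "discount_percentage": float(raw["discount_percentage"].rstrip("%")),
    "rating": float(raw["rating"]),
    "rating_count": int(raw["rating_count"].replace(",", "")),
}

for name, value in parsed.items():
    print(name, round(value, 2) if isinstance(value, float) else value)
```

The prices come out as floats with small rounding error, so rounding to two decimals is only applied when printing.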

A few rows fail to convert, so we print fields 3-7 of each one. Because the loop converts values in place, fields processed before the failure already appear as numbers in that printout. Only 3 of more than 1,000 rows are affected, so we leave them out.

We then print some of the converted rows to confirm that they now contain numbers, with prices in USD.

In [21]:
rows = []

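def rupee_str_to_usd_float(rupee_str):
    # Drop the leading rupee sign and the thousands separators before parsing
    rupee_float = float(rupee_str[1:].replace(",", ""))
    usd_float = rupee_float * 0.011  # fixed rupee-to-USD conversion rate
    return usd_float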

print('Problematic rows:')
for text_row in text_rows:
    try:
        row = text_row  # same list object, so text_row is also modified in place
        row[3] = rupee_str_to_usd_float(text_row[3])  # discounted price (string) into USD float
        row[4] = rupee_str_to_usd_float(text_row[4])  # full price (string) into USD float
        row[5] = float(text_row[5][:-1])  # discount percent into float, dropping the "%"
        row[6] = float(text_row[6])  # average rating into float
        row[7] = int(text_row[7].replace(",", ""))  # rating count into int
        rows.append(row)
    except ValueError:
        print(text_row[3:8])
        continue

print('Fixed rows:')
print(rows[0])
print(rows[1])
print(rows[2])

Problematic rows:
[2.189, 10.988999999999999, 80.0, 3.0, '']
[2.739, 10.988999999999999, 75.0, 5.0, '']
[23.089, 27.488999999999997, 16.0, '|', '992']
Fixed rows:
['B07JW9H4J1', 'Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)', 'Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables', 4.388999999999999, 12.088999999999999, 64.0, 4.2, 24269, "High Compatibility : Compatible With iPhone 12, 11, X/XsMax/Xr ,iPhone 8/8 Plus,iPhone 7/7 Plus,iPhone 6s/6s Plus,iPhone 6/6 Plus,iPhone 5/5s/5c/se,iPad Pro,iPad Air 1/2,iPad mini 1/2/3,iPod nano7,iPod touch and more apple devices.|Fast Charge&Data Sync : It can charge and sync simultaneously at a rapid speed, Compatible with any charging adaptor, multi-port charging station or power bank.|Durability : Durable nylon braided design with premium aluminum housing and toughened nylon fiber wound tightly around t
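
The loop above converts each row in place, so the original strings are overwritten even for rows that end up being skipped. One possible alternative, sketched below, converts a copy of each row and returns `None` on failure. The function name `parse_numeric_fields` and the three `sample_rows` are hypothetical: the rows are shortened to eight fields and only modeled on the kinds of failures printed above (an empty rating count and a `|` in the rating field).

```python
def parse_numeric_fields(text_row, rate=0.011):
    """Return a converted copy of text_row, or None if any numeric field fails."""
    row = list(text_row)  # copy, so the caller's list keeps its original strings
    try:
        row[3] = float(row[3][1:].replace(",", "")) * rate  # discounted price in USD
        row[4] = float(row[4][1:].replace(",", "")) * rate  # full price in USD
        row[5] = float(row[5][:-1])                          # discount percent
        row[6] = float(row[6])                               # average rating
        row[7] = int(row[7].replace(",", ""))                # rating count
    except ValueError:
        return None
    return row

# Illustrative rows only; real rows have 16 fields.
sample_rows = [
    ["id1", "name", "category", "₹399", "₹1,099", "64%", "4.2", "24,269"],
    ["id2", "name", "category", "₹199", "₹999", "80%", "3.0", ""],
    ["id3", "name", "category", "₹2,099", "₹2,499", "16%", "|", "992"],
]

kept, skipped = [], []
for text_row in sample_rows:
    parsed_row = parse_numeric_fields(text_row)
    if parsed_row is None:
        skipped.append(text_row)
    else:
        kept.append(parsed_row)

print(len(kept), "rows kept,", len(skipped), "rows skipped")
print(skipped[0][3:8])  # still the original strings
```

Keeping the skipped rows untouched makes it easier to inspect exactly which raw value caused the failure.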