# Data Exploration Notebook

This notebook performs a basic exploration of a dataset of products listed on Amazon. Each row records attributes such as product name, category, discounted and actual price, rating, and review details.

## Objective

The objective is to understand the dataset's structure, contents, and basic statistics. We'll check the column names, data types, summary statistics, missing values, unique values per column, and the overall shape.

## Dataset Description

The data is stored in a CSV file named "amazon.csv" and is loaded into a pandas DataFrame for analysis. It has 1,465 rows and 16 columns.

## Approach

1. **Column Names**: We'll start by examining the names of all columns in the dataset to understand its structure.
2. **Basic Information**: Next, we'll display basic information about the dataset, including the number of non-null values and data types of each column.
3. **Summary Statistics**: We'll calculate summary statistics with `describe()`. Because every column is loaded as `object` (text), these are counts, unique counts, and most frequent values rather than means and quartiles.
4. **Missing Values**: We'll check for missing values in each column to assess data completeness.
5. **Unique Values**: We'll count the number of unique values in each text column to see how varied each attribute is.
6. **Shape**: Finally, we'll confirm the number of rows and columns.

By performing these exploratory analyses, we aim to gain insights into the dataset's characteristics and identify potential areas for further investigation or preprocessing.

Let's get started with the exploration!


In [1]:
# Import necessary libraries
import pandas as pd

# Load the dataset
data = pd.read_csv(r"C:\Users\KONZA-VDI\Desktop\DS\G00dlife-datascience\dataset\amazon.csv")

# Display the first few rows of the dataset
data.head()


,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


- Display column names

In [2]:
# Check column names
column_names = data.columns
print("Column names:\n", column_names)




Column names:
 Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count',
       'about_product', 'user_id', 'user_name', 'review_id', 'review_title',
       'review_content', 'img_link', 'product_link'],
      dtype='object')


In [3]:
# Display basic information about the dataset
data.info()

# Display summary statistics (all columns are object dtype, so describe()
# reports count, unique, top, and freq instead of numeric statistics)
summary_stats = data.describe()
print("\nSummary statistics:\n", summary_stats)



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1465 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1465 non-null   object
 1   product_name         1465 non-null   object
 2   category             1465 non-null   object
 3   discounted_price     1465 non-null   object
 4   actual_price         1465 non-null   object
 5   discount_percentage  1465 non-null   object
 6   rating               1465 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1465 non-null   object
 9   user_id              1465 non-null   object
 10  user_name            1465 non-null   object
 11  review_id            1465 non-null   object
 12  review_title         1465 non-null   object
 13  review_content       1465 non-null   object
 14  img_link             1465 non-null   object
 15  product_link         1465 non-null   object
dtypes: object(16)
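
Because every column in the `info()` output is `object`, `describe()` cannot report means or quartiles until the price, percentage, and rating columns are converted to numbers. The following is a minimal sketch of one way to do that and is not part of the original notebook: `parse_number` and `to_numeric_columns` are hypothetical helpers, and the sample rows only mirror the string formats visible in `data.head()` (the missing `rating_count` value is placed arbitrarily for illustration).

```python
import math
import re

import pandas as pd


def parse_number(value):
    """Hypothetical helper: keep digits and the decimal point, return a float.

    Missing or unparseable values become NaN instead of raising.
    """
    if pd.isna(value):
        return math.nan
    digits = re.sub(r"[^\d.]", "", str(value))
    try:
        return float(digits)
    except ValueError:
        return math.nan


def to_numeric_columns(df, columns):
    """Hypothetical helper: return a copy of df with the given columns as floats."""
    cleaned = df.copy()
    for col in columns:
        cleaned[col] = cleaned[col].map(parse_number)
    return cleaned


# Sample rows in the same string formats as data.head();
# in the notebook you would pass the full `data` DataFrame instead.
sample = pd.DataFrame({
    "discounted_price": ["₹399", "₹199", "₹154"],
    "actual_price": ["₹1,099", "₹349", "₹399"],
    "discount_percentage": ["64%", "43%", "61%"],
    "rating": ["4.2", "4.0", "4.2"],
    "rating_count": ["24269", "43994", None],  # None is illustrative
})

numeric = to_numeric_columns(sample, list(sample.columns))

# With float columns, describe() now reports mean, std, min, quartiles, and max.
print(numeric.describe())
```

Mapping a plain Python function over each column is slower than vectorized string methods, but it makes the handling of missing and malformed values explicit, which may be preferable on a dataset of this size.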

In [4]:
# Check for missing values
missing_values = data.isnull().sum()
print("\nMissing values:\n", missing_values)




Missing values:
 product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           2
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64
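
The counts above show that only `rating_count` has missing entries (2 of 1,465 rows). A natural next step is to look at the affected rows and express the gaps as a percentage. The sketch below assumes a small illustrative frame; the position of the missing value in `sample` is made up, since the notebook does not show which rows are affected.

```python
import pandas as pd

# Illustrative sample; in the notebook you would use `data` instead.
sample = pd.DataFrame({
    "product_id": ["B07JW9H4J1", "B098NS6PVG", "B096MSW6CT"],
    "rating_count": ["24269", None, "7928"],  # None placed arbitrarily
})

# Rows that contain at least one missing value in any column
rows_with_missing = sample[sample.isnull().any(axis=1)]
print(rows_with_missing)

# Share of missing values per column, as a percentage
missing_pct = sample.isnull().mean() * 100
print(missing_pct.round(2))
```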


In [5]:
# Check for unique values in categorical columns
unique_values = {}
for column in data.select_dtypes(include=['object']):
    unique_values[column] = data[column].nunique()

print("\nUnique values in categorical columns:\n", unique_values)


Unique values in categorical columns:
 {'product_id': 1351, 'product_name': 1337, 'category': 211, 'discounted_price': 550, 'actual_price': 449, 'discount_percentage': 92, 'rating': 28, 'rating_count': 1143, 'about_product': 1293, 'user_id': 1194, 'user_name': 1194, 'review_id': 1194, 'review_title': 1194, 'review_content': 1212, 'img_link': 1412, 'product_link': 1465}
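
Raw unique counts are easier to read relative to the number of rows: a ratio near 1 suggests an identifier-like column, while a low ratio suggests a column with few distinct values. The sketch below reuses the counts printed above and the row count of 1,465 reported by `data.info()`; the 0.5 cutoff is an arbitrary assumption chosen for illustration, not a rule from the notebook.

```python
# Unique counts copied from the output above
unique_values = {
    "product_id": 1351, "product_name": 1337, "category": 211,
    "discounted_price": 550, "actual_price": 449, "discount_percentage": 92,
    "rating": 28, "rating_count": 1143, "about_product": 1293,
    "user_id": 1194, "user_name": 1194, "review_id": 1194,
    "review_title": 1194, "review_content": 1212, "img_link": 1412,
    "product_link": 1465,
}
num_rows = 1465  # row count from data.info()

# Fraction of rows that carry a distinct value, per column
ratios = {col: count / num_rows for col, count in unique_values.items()}

THRESHOLD = 0.5  # assumption: arbitrary cutoff for "low cardinality"
low_cardinality = [col for col, ratio in ratios.items() if ratio < THRESHOLD]
print("Low-cardinality columns:", low_cardinality)

# product_id has no missing values, so rows minus unique ids gives the
# number of rows whose product_id already appeared in an earlier row.
repeated_product_rows = num_rows - unique_values["product_id"]
print("Rows with a repeated product_id:", repeated_product_rows)
```

Note that the price, discount, and rating columns fall under the cutoff only because they are stored as text; they are numeric quantities rather than true categories.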


In [6]:
# Check the number of rows and columns
num_rows, num_columns = data.shape
print("Number of rows:", num_rows)
print("Number of columns:", num_columns)


Number of rows: 1465
Number of columns: 16
