# E-Commerce Sales Analysis
### Problem Statement
- Analyze and clean the e-commerce sales dataset: handle missing values in Product_Category and Payment_Type, impute Order_Amount, and analyze payment types across categories.
- Key pandas/NumPy concepts: `.isnull()`, `.fillna()`, `.groupby()`, `.value_counts()`, `.pivot_table()`, `.map()`, `np.where()`, `np.nan`

```python
import numpy as np
import pandas as pd
```

```python
# Load the raw data once and clean a working copy, so the original stays untouched
raw_df = pd.read_csv("ecommerce_sales.csv")
df = raw_df.copy()
df
```

| | Order_ID | Customer_ID | Order_Date | Product_Category | Payment_Type | Order_Amount | Delivery_Status |
|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2038 | 2023-01-01 | Clothing | NaN | NaN | Cancelled |
| 1 | 1002 | 2044 | 2023-01-02 | Clothing | Credit Card | 250.0 | Pending |
| 2 | 1003 | 2013 | 2023-01-03 | Beauty | Cash | 450.0 | Returned |
| 3 | 1004 | 2009 | 2023-01-04 | NaN | Credit Card | 600.0 | Delivered |
| 4 | 1005 | 2010 | 2023-01-05 | Electronics | NaN | 100.0 | Cancelled |
| 5 | 1006 | 2012 | 2023-01-06 | Clothing | NaN | NaN | Cancelled |
| 6 | 1007 | 2006 | 2023-01-07 | Beauty | Debit Card | NaN | Cancelled |
| 7 | 1008 | 2016 | 2023-01-08 | NaN | Credit Card | 250.0 | Delivered |
| 8 | 1009 | 2001 | 2023-01-09 | Home | Debit Card | 450.0 | Pending |
| 9 | 1010 | 2017 | 2023-01-10 | NaN | Credit Card | 250.0 | Returned |

(First 10 of 50 rows shown.)


### EDA and Data Cleaning

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Order_ID          50 non-null     int64  
 1   Customer_ID       50 non-null     int64  
 2   Order_Date        50 non-null     object 
 3   Product_Category  41 non-null     object 
 4   Payment_Type      41 non-null     object 
 5   Order_Amount      35 non-null     float64
 6   Delivery_Status   50 non-null     object 
dtypes: float64(1), int64(2), object(4)
memory usage: 2.9+ KB
```
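`df.info()` reports `Order_Date` as `object`, i.e. plain strings. If later steps need date arithmetic or grouping by month, a minimal sketch (assuming every value uses the `YYYY-MM-DD` format seen in the preview) converts the column with `pd.to_datetime`; the ten dates below are copied from the preview rows.

```python
import pandas as pd

# Order_Date values from the ten preview rows above (stored as strings in the CSV)
order_date = pd.Series(
    ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05",
     "2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09", "2023-01-10"],
    name="Order_Date",
)

# An explicit format raises on strings that do not match it;
# errors="coerce" would turn them into NaT instead
parsed = pd.to_datetime(order_date, format="%Y-%m-%d")

print(parsed.dtype)
print(parsed.dt.day_name().head(3))
```

On the full frame the same conversion is `df["Order_Date"] = pd.to_datetime(df["Order_Date"], format="%Y-%m-%d")`, or it can be done at load time with `pd.read_csv("ecommerce_sales.csv", parse_dates=["Order_Date"])`.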


```python
df.isnull().sum()
```

```
Order_ID             0
Customer_ID          0
Order_Date           0
Product_Category     9
Payment_Type         9
Order_Amount        15
Delivery_Status      0
dtype: int64
```
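The raw counts are easier to judge as shares of the 50 rows: 9 missing values is 18% of a column and 15 is 30%. A small helper such as the hypothetical `missing_report` below reports both. It is shown on four columns of the ten preview rows, where each affected column happens to have 3 gaps (30%).

```python
import numpy as np
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns that have gaps."""
    counts = frame.isnull().sum()
    # The mean of a boolean mask is the share of True values, i.e. the share missing
    pct = (frame.isnull().mean() * 100).round(1)
    report = pd.DataFrame({"missing": counts, "pct_missing": pct})
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Four columns from the ten preview rows above (NaN = missing)
sample = pd.DataFrame({
    "Product_Category": ["Clothing", "Clothing", "Beauty", np.nan, "Electronics",
                         "Clothing", "Beauty", np.nan, "Home", np.nan],
    "Payment_Type": [np.nan, "Credit Card", "Cash", "Credit Card", np.nan,
                     np.nan, "Debit Card", "Credit Card", "Debit Card", "Credit Card"],
    "Order_Amount": [np.nan, 250.0, 450.0, 600.0, 100.0, np.nan, np.nan, 250.0, 450.0, 250.0],
    "Delivery_Status": ["Cancelled", "Pending", "Returned", "Delivered", "Cancelled",
                        "Cancelled", "Cancelled", "Delivered", "Pending", "Returned"],
})

print(missing_report(sample))
```

On the full frame, `missing_report(df)` would list Order_Amount (15, 30.0%) ahead of Product_Category and Payment_Type (9 each, 18.0%).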

```python
df['Product_Category'].value_counts()
```

```
Product_Category
Clothing       11
Beauty         11
Electronics    11
Home            8
Name: count, dtype: int64
```

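```python
# Fill the 9 missing categories with "Home". This is a judgment call: Clothing,
# Beauty and Electronics tie as most frequent (11 each) and Home is the least frequent
df.fillna({"Product_Category": "Home"}, inplace=True)
```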
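The `value_counts()` output shows a three-way tie for the most frequent category, so there is no single mode to impute, and the cell above moves all 9 missing rows into Home, raising it from 8 to 17 rows and making it the largest group. The sketch below rebuilds the column from those counts alone (row order is arbitrary) to make the tie and the effect of the fill visible. The "Unknown" label in Option B is a common alternative, not something the notebook uses.

```python
import numpy as np
import pandas as pd

# Rebuild Product_Category from the value_counts() output plus the 9 NaNs
# reported by isnull(); only the counts are real, the row order is arbitrary.
category = pd.Series(
    ["Clothing"] * 11 + ["Beauty"] * 11 + ["Electronics"] * 11 + ["Home"] * 8 + [np.nan] * 9,
    name="Product_Category",
)

# mode() returns every value tied for the highest count (NaN is ignored)
modes = category.mode()
print(modes.tolist())

# Option A, as in the notebook: fold the missing rows into "Home" (8 -> 17 rows)
filled_home = category.fillna("Home")
print(filled_home.value_counts())

# Option B, an alternative: keep the missing rows visible as their own label
filled_unknown = category.fillna("Unknown")
print(filled_unknown.value_counts())
```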

```python
df.isnull().sum()
```

```
Order_ID             0
Customer_ID          0
Order_Date           0
Product_Category     0
Payment_Type         9
Order_Amount        15
Delivery_Status      0
dtype: int64
```

```python
df
```

| | Order_ID | Customer_ID | Order_Date | Product_Category | Payment_Type | Order_Amount | Delivery_Status |
|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2038 | 2023-01-01 | Clothing | NaN | NaN | Cancelled |
| 1 | 1002 | 2044 | 2023-01-02 | Clothing | Credit Card | 250.0 | Pending |
| 2 | 1003 | 2013 | 2023-01-03 | Beauty | Cash | 450.0 | Returned |
| 3 | 1004 | 2009 | 2023-01-04 | Home | Credit Card | 600.0 | Delivered |
| 4 | 1005 | 2010 | 2023-01-05 | Electronics | NaN | 100.0 | Cancelled |
| 5 | 1006 | 2012 | 2023-01-06 | Clothing | NaN | NaN | Cancelled |
| 6 | 1007 | 2006 | 2023-01-07 | Beauty | Debit Card | NaN | Cancelled |
| 7 | 1008 | 2016 | 2023-01-08 | Home | Credit Card | 250.0 | Delivered |
| 8 | 1009 | 2001 | 2023-01-09 | Home | Debit Card | 450.0 | Pending |
| 9 | 1010 | 2017 | 2023-01-10 | Home | Credit Card | 250.0 | Returned |


```python
df['Payment_Type'].value_counts()
```

```
Payment_Type
Credit Card    18
Cash           14
Debit Card      9
Name: count, dtype: int64
```

```python
# Fill the 9 missing payment types with the most frequent value, Credit Card (18 orders)
df.fillna({"Payment_Type": "Credit Card"}, inplace=True)
```
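The cell above hard-codes "Credit Card", which is the mode (18 of the 41 non-null values). Deriving the fill value with `.mode()` keeps it correct if the data changes. A minimal sketch, again rebuilding the column from the reported counts only:

```python
import numpy as np
import pandas as pd

# Rebuild Payment_Type from the value_counts() output plus the 9 NaNs
# reported by isnull(); only the counts are real, the row order is arbitrary.
payment = pd.Series(
    ["Credit Card"] * 18 + ["Cash"] * 14 + ["Debit Card"] * 9 + [np.nan] * 9,
    name="Payment_Type",
)

# Derive the fill value instead of hard-coding it; mode() returns values in
# sorted order, so [0] picks the first one if several values ever tie
most_common = payment.mode()[0]
payment_filled = payment.fillna(most_common)

print(most_common)
print(payment_filled.value_counts())
```

On the full frame this is `df.fillna({"Payment_Type": df["Payment_Type"].mode()[0]}, inplace=True)`. After the fill Credit Card covers 27 of 50 orders, a third of them imputed.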

```python
df
```

| | Order_ID | Customer_ID | Order_Date | Product_Category | Payment_Type | Order_Amount | Delivery_Status |
|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2038 | 2023-01-01 | Clothing | Credit Card | NaN | Cancelled |
| 1 | 1002 | 2044 | 2023-01-02 | Clothing | Credit Card | 250.0 | Pending |
| 2 | 1003 | 2013 | 2023-01-03 | Beauty | Cash | 450.0 | Returned |
| 3 | 1004 | 2009 | 2023-01-04 | Home | Credit Card | 600.0 | Delivered |
| 4 | 1005 | 2010 | 2023-01-05 | Electronics | Credit Card | 100.0 | Cancelled |
| 5 | 1006 | 2012 | 2023-01-06 | Clothing | Credit Card | NaN | Cancelled |
| 6 | 1007 | 2006 | 2023-01-07 | Beauty | Debit Card | NaN | Cancelled |
| 7 | 1008 | 2016 | 2023-01-08 | Home | Credit Card | 250.0 | Delivered |
| 8 | 1009 | 2001 | 2023-01-09 | Home | Debit Card | 450.0 | Pending |
| 9 | 1010 | 2017 | 2023-01-10 | Home | Credit Card | 250.0 | Returned |


```python
df.isnull().sum()
```

```
Order_ID             0
Customer_ID          0
Order_Date           0
Product_Category     0
Payment_Type         0
Order_Amount        15
Delivery_Status      0
dtype: int64
```

```python
df['Order_Amount'].mean()
```

```
np.float64(297.14285714285717)
```
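Mean imputation fills 15 of the 50 amounts (30%) with the same value, which pulls per-category totals and averages toward 297. One way to keep that visible is to record which rows were imputed before filling, sketched below with `np.where`. The column name `Amount_Imputed` is hypothetical, and the sample uses only the ten preview rows, so its rounded mean (336.0) differs from the notebook's 297.0 over all 50 rows.

```python
import numpy as np
import pandas as pd

# Order_Amount for the ten preview rows shown earlier (NaN = missing)
orders = pd.DataFrame({
    "Order_ID": range(1001, 1011),
    "Order_Amount": [np.nan, 250.0, 450.0, 600.0, 100.0, np.nan, np.nan, 250.0, 450.0, 250.0],
})

# Hypothetical flag column: 1 if the amount was missing, 0 otherwise.
# It must be computed BEFORE fillna, otherwise every row looks observed.
orders["Amount_Imputed"] = np.where(orders["Order_Amount"].isnull(), 1, 0)

# Same rule as the notebook: rounded mean of the observed amounts
fill_value = orders["Order_Amount"].mean().round()
orders["Order_Amount"] = orders["Order_Amount"].fillna(fill_value)

print(fill_value)
print(orders)
```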

```python
# Impute missing amounts with the rounded mean (297.0). Use the dict form on df
# rather than df["Order_Amount"].fillna(..., inplace=True): that chained in-place
# call triggers a pandas warning and stops modifying df in pandas 3.0
df.fillna({"Order_Amount": df["Order_Amount"].mean().round()}, inplace=True)
```
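A single global mean ignores that order values may differ by category. A common alternative, sketched below and not what the notebook does, fills each gap with the median of its own `Product_Category` via `groupby(...).transform("median")` and falls back to the overall median if an entire category is missing. The sample is the ten preview rows after the category fill; with only one observed amount in Clothing and in Beauty, its group medians are purely illustrative.

```python
import numpy as np
import pandas as pd

# Ten preview rows after the Product_Category fill (NaN = missing amount)
sample = pd.DataFrame({
    "Product_Category": ["Clothing", "Clothing", "Beauty", "Home", "Electronics",
                         "Clothing", "Beauty", "Home", "Home", "Home"],
    "Order_Amount": [np.nan, 250.0, 450.0, 600.0, 100.0, np.nan, np.nan, 250.0, 450.0, 250.0],
})

# Overall median of the observed amounts, used only if a whole category is missing
overall_median = sample["Order_Amount"].median()

# For every row, the median of its own category (NaN values are skipped)
category_median = sample.groupby("Product_Category")["Order_Amount"].transform("median")

sample["Order_Amount"] = (
    sample["Order_Amount"]
    .fillna(category_median)
    .fillna(overall_median)
)
print(sample)
```

Medians are used here because they are less sensitive than the mean to a few large orders; on the full `df` the same three statements apply unchanged.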

```python
df
```

| | Order_ID | Customer_ID | Order_Date | Product_Category | Payment_Type | Order_Amount | Delivery_Status |
|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2038 | 2023-01-01 | Clothing | Credit Card | 297.0 | Cancelled |
| 1 | 1002 | 2044 | 2023-01-02 | Clothing | Credit Card | 250.0 | Pending |
| 2 | 1003 | 2013 | 2023-01-03 | Beauty | Cash | 450.0 | Returned |
| 3 | 1004 | 2009 | 2023-01-04 | Home | Credit Card | 600.0 | Delivered |
| 4 | 1005 | 2010 | 2023-01-05 | Electronics | Credit Card | 100.0 | Cancelled |
| 5 | 1006 | 2012 | 2023-01-06 | Clothing | Credit Card | 297.0 | Cancelled |
| 6 | 1007 | 2006 | 2023-01-07 | Beauty | Debit Card | 297.0 | Cancelled |
| 7 | 1008 | 2016 | 2023-01-08 | Home | Credit Card | 250.0 | Delivered |
| 8 | 1009 | 2001 | 2023-01-09 | Home | Debit Card | 450.0 | Pending |
| 9 | 1010 | 2017 | 2023-01-10 | Home | Credit Card | 250.0 | Returned |


```python
df.isnull().sum()
```

```
Order_ID            0
Customer_ID         0
Order_Date          0
Product_Category    0
Payment_Type        0
Order_Amount        0
Delivery_Status     0
dtype: int64
```
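The problem statement also asks how payment types vary across categories. A minimal sketch using `pivot_table` and `pd.crosstab` on the ten cleaned preview rows follows; on the full frame, replace `sample` with `df`. Keep in mind that all 9 missing payment types were filled with Credit Card, so Credit Card counts and shares are inflated relative to the raw data.

```python
import pandas as pd

# Ten cleaned preview rows (missing amounts already filled with 297.0)
sample = pd.DataFrame({
    "Order_ID": range(1001, 1011),
    "Product_Category": ["Clothing", "Clothing", "Beauty", "Home", "Electronics",
                         "Clothing", "Beauty", "Home", "Home", "Home"],
    "Payment_Type": ["Credit Card", "Credit Card", "Cash", "Credit Card", "Credit Card",
                     "Credit Card", "Debit Card", "Credit Card", "Debit Card", "Credit Card"],
    "Order_Amount": [297.0, 250.0, 450.0, 600.0, 100.0, 297.0, 297.0, 250.0, 450.0, 250.0],
})

# Number of orders per category and payment type (0 where a combination never occurs)
order_counts = sample.pivot_table(
    index="Product_Category", columns="Payment_Type",
    values="Order_ID", aggfunc="count", fill_value=0,
)

# Share of each payment type within a category (each row sums to 1)
payment_share = pd.crosstab(
    sample["Product_Category"], sample["Payment_Type"], normalize="index"
)

# Total order value per category and payment type
revenue = sample.pivot_table(
    index="Product_Category", columns="Payment_Type",
    values="Order_Amount", aggfunc="sum", fill_value=0,
)

print(order_counts, payment_share.round(2), revenue, sep="\n\n")
```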

```python
# Save the cleaned data; index=False keeps the row index out of the CSV
df.to_csv("cleaned_ecommerce_data.csv", index=False)
```
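`.map()` from the concept list fits a simple follow-up: turning `Delivery_Status` into a 0/1 flag and averaging it per payment type gives a delivered rate. Treating only Delivered as a completed order (and Pending, Cancelled and Returned as not) is an assumption for illustration, and the column name `Is_Delivered` is hypothetical. The sample is the ten cleaned preview rows.

```python
import pandas as pd

# Payment type and delivery status for the ten cleaned preview rows
sample = pd.DataFrame({
    "Payment_Type": ["Credit Card", "Credit Card", "Cash", "Credit Card", "Credit Card",
                     "Credit Card", "Debit Card", "Credit Card", "Debit Card", "Credit Card"],
    "Delivery_Status": ["Cancelled", "Pending", "Returned", "Delivered", "Cancelled",
                        "Cancelled", "Cancelled", "Delivered", "Pending", "Returned"],
})

# Assumed mapping: only "Delivered" counts as a completed order
status_to_flag = {"Delivered": 1, "Pending": 0, "Cancelled": 0, "Returned": 0}
sample["Is_Delivered"] = sample["Delivery_Status"].map(status_to_flag)

# .map() turns statuses missing from the dict into NaN; fail early if any appear
assert sample["Is_Delivered"].notna().all(), "unexpected Delivery_Status value"

# Share of delivered orders and number of orders per payment type
delivery_by_payment = sample.groupby("Payment_Type")["Is_Delivered"].agg(["mean", "count"])
print(delivery_by_payment)
```

With ten rows these rates say little; on the full `df` the same `map` and `groupby` lines apply unchanged.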

