# **The Analysis of Sales Dataset**

## **About Data**

Title       : Sales Dataset

Dataset     : [link](https://www.kaggle.com/datasets/sahilislam007/sales-dataset)

## **Import Libraries**

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## **Data Exploration**

### **Download and Load CSV**

In [2]:
# Download csv

import kagglehub

path = kagglehub.dataset_download("sahilislam007/sales-dataset")



In [3]:
# Load csv
df = pd.read_csv(path + "/Sales Dataset.csv")
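
The load step above joins the download directory and the file name by hand. A minimal sketch of the same join with `pathlib`, assuming the CSV sits directly inside the directory returned by `kagglehub.dataset_download`; the helper name `csv_path` and the stand-in directory string are hypothetical.

```python
from pathlib import Path


def csv_path(download_dir: str, filename: str = "Sales Dataset.csv") -> Path:
    """Build the CSV path from the download directory (hypothetical helper)."""
    return Path(download_dir) / filename


# Stand-in for the `path` value returned by kagglehub.dataset_download(...)
example = csv_path("datasets/sales")
print(example.name)
```

The resulting `Path` can be passed to `pd.read_csv` directly.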

### **Sneak Peek at the Data**

In [4]:
# See the first 10 rows of the data
df.head(10)

,Unnamed: 0,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount
0,0,2023-11-24,Male,34,Beauty,3,50,150
1,1,2023-02-27,Female,26,Clothing,2,500,1000
2,2,2023-01-13,Male,50,Electronics,1,30,30
3,3,2023-05-21,Male,37,Clothing,1,500,500
4,4,2023-05-06,Male,30,Beauty,2,50,100
5,5,2023-04-25,Female,45,Beauty,1,30,30
6,6,2023-03-13,Male,46,Clothing,2,25,50
7,7,2023-02-22,Male,30,Electronics,4,25,100
8,8,2023-12-13,Male,63,Electronics,2,300,600
9,9,2023-10-07,Female,52,Clothing,4,50,200


In [5]:
# See the column names
df.columns

Index(['Unnamed: 0', 'Date', 'Gender', 'Age', 'Product Category', 'Quantity',
       'Price per Unit', 'Total Amount'],
      dtype='object')

In [6]:
# See the data's shape
print(f"There are {df.shape[0]} rows and {df.shape[1]} columns") 

There are 1000 rows and 8 columns


In [7]:
# See the column details
df.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Unnamed: 0        1000 non-null   int64 
 1   Date              1000 non-null   object
 2   Gender            1000 non-null   object
 3   Age               1000 non-null   int64 
 4   Product Category  1000 non-null   object
 5   Quantity          1000 non-null   int64 
 6   Price per Unit    1000 non-null   int64 
 7   Total Amount      1000 non-null   int64 
dtypes: int64(5), object(3)
memory usage: 62.6+ KB


In [8]:
# Count null values
df.isna().sum()

Unnamed: 0          0
Date                0
Gender              0
Age                 0
Product Category    0
Quantity            0
Price per Unit      0
Total Amount        0
dtype: int64

### **Findings**
1. There are 1000 rows and 8 columns
2. The columns of the dataset are: 
      
      (['Unnamed: 0', 'Date', 'Gender', 'Age', 'Product Category', 'Quantity',
       'Price per Unit', 'Total Amount'])
3. The 'Date' column has the wrong datatype (object instead of datetime)
4. There are no missing or null values
5. One column has an uninformative name ('Unnamed: 0')

### **Change Column Name**

In [9]:
# Copy the original table
df_sales = df.copy()

In [10]:
# Rename the unnamed column
df_sales.rename(columns={'Unnamed: 0' : 'Row Number'}, inplace=True)

In [11]:
# Check the change
df_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Row Number        1000 non-null   int64 
 1   Date              1000 non-null   object
 2   Gender            1000 non-null   object
 3   Age               1000 non-null   int64 
 4   Product Category  1000 non-null   object
 5   Quantity          1000 non-null   int64 
 6   Price per Unit    1000 non-null   int64 
 7   Total Amount      1000 non-null   int64 
dtypes: int64(5), object(3)
memory usage: 62.6+ KB


In [12]:
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount
0,0,2023-11-24,Male,34,Beauty,3,50,150
1,1,2023-02-27,Female,26,Clothing,2,500,1000
2,2,2023-01-13,Male,50,Electronics,1,30,30
3,3,2023-05-21,Male,37,Clothing,1,500,500
4,4,2023-05-06,Male,30,Beauty,2,50,100
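
An 'Unnamed: 0' column usually appears when a CSV was written with its index and the first header field is empty. The sketch below shows an alternative to renaming: reading that first column back as the index with `index_col=0`. It assumes the raw file starts with an empty header field; the CSV text is a two-row illustration built from the preview above.

```python
import io

import pandas as pd

# Illustrative CSV text: the first header field is empty, which is what
# produces a column named 'Unnamed: 0' on a plain read_csv call.
csv_text = """,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount
0,2023-11-24,Male,34,Beauty,3,50,150
1,2023-02-27,Female,26,Clothing,2,500,1000
"""

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns[0])       # name pandas assigns to the empty header
print(list(indexed.columns))  # the saved index is no longer a data column
```

Renaming, as done above, keeps the values as a regular column, which is what the later 'Row Number' adjustment relies on.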


### **Change Columns Datatype**

In [13]:
# Change Date column datatype from object to datetime
df_sales['Date'] = pd.to_datetime(df_sales['Date'])

In [14]:
# Check the change
df_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Row Number        1000 non-null   int64         
 1   Date              1000 non-null   datetime64[ns]
 2   Gender            1000 non-null   object        
 3   Age               1000 non-null   int64         
 4   Product Category  1000 non-null   object        
 5   Quantity          1000 non-null   int64         
 6   Price per Unit    1000 non-null   int64         
 7   Total Amount      1000 non-null   int64         
dtypes: datetime64[ns](1), int64(5), object(2)
memory usage: 62.6+ KB
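
The conversion above lets pandas infer the date layout. A minimal sketch of a stricter variant, assuming the dates follow the `YYYY-MM-DD` layout seen in the preview; the malformed third value is made up to show what `errors="coerce"` does.

```python
import pandas as pd

# Two dates from the preview above plus one made-up malformed entry
raw = pd.Series(["2023-11-24", "2023-02-27", "not a date"])

# An explicit format documents the expected layout; errors="coerce" turns
# unparseable entries into NaT instead of raising an exception.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed.isna().sum())  # number of entries that failed to parse
```

Counting the `NaT` values afterwards is a quick way to see whether any rows need manual attention.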


### **Change Column Values**
The row numbers start at 0, so I add 1 to every row number to make them start at 1.

In [15]:
# Change row number values
df_sales['Row Number'] = df_sales['Row Number'] + 1

In [16]:
# Check the row number change
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount
0,1,2023-11-24,Male,34,Beauty,3,50,150
1,2,2023-02-27,Female,26,Clothing,2,500,1000
2,3,2023-01-13,Male,50,Electronics,1,30,30
3,4,2023-05-21,Male,37,Clothing,1,500,500
4,5,2023-05-06,Male,30,Beauty,2,50,100


### **Check Other Columns For Possible Error**

In [17]:
# Check the unique values in gender column
df_sales['Gender'].unique()

array(['Male', 'Female'], dtype=object)

In [18]:
# Check the unique values in age column
df_sales['Age'].unique()

array([34, 26, 50, 37, 30, 45, 46, 63, 52, 23, 35, 22, 64, 42, 19, 27, 47,
       62, 18, 49, 28, 38, 43, 39, 44, 51, 58, 48, 55, 20, 40, 54, 36, 31,
       21, 57, 25, 56, 29, 61, 32, 41, 59, 60, 33, 53, 24])

In [19]:
# Check the unique values in product category column
df_sales['Product Category'].unique()

array(['Beauty', 'Clothing', 'Electronics'], dtype=object)
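
Beyond unique values, the numeric columns can be checked against each other. The sketch below assumes 'Total Amount' is meant to equal 'Quantity' times 'Price per Unit', which holds for the preview rows; the sample reuses the first five rows shown earlier.

```python
import pandas as pd

# First five rows of the preview shown earlier
sample = pd.DataFrame({
    "Quantity": [3, 2, 1, 1, 2],
    "Price per Unit": [50, 500, 30, 500, 50],
    "Total Amount": [150, 1000, 30, 500, 100],
})

# Rows where the stored total disagrees with quantity x unit price
expected = sample["Quantity"] * sample["Price per Unit"]
mismatch = sample[expected != sample["Total Amount"]]

print(len(mismatch))
```

The same comparison could be run on `df_sales` to cover all rows.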

### **Duplicates**

In [20]:
# Flag duplicated rows (True marks a repeat of an earlier row)
df_sales_duplicate = df_sales.duplicated()

In [21]:
# Show the duplicate flag for every row
print(df_sales_duplicate)

0      False
1      False
2      False
3      False
4      False
       ...  
995    False
996    False
997    False
998    False
999    False
Length: 1000, dtype: bool


In [22]:
# Count duplicates
df_sales_duplicate.value_counts()

False    1000
Name: count, dtype: int64
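
If 'Row Number' is unique for every row, as the preview suggests, a duplicate check over all columns cannot flag two otherwise identical transactions. A minimal sketch of a check that leaves the identifier out, using made-up rows for illustration:

```python
import pandas as pd

# Made-up rows for illustration: rows 1 and 3 describe the same transaction
df = pd.DataFrame({
    "Row Number": [1, 2, 3],
    "Date": ["2023-11-24", "2023-02-27", "2023-11-24"],
    "Gender": ["Male", "Female", "Male"],
    "Total Amount": [150, 1000, 150],
})

# All columns: the unique identifier hides the repeat
all_cols = int(df.duplicated().sum())

# Every column except the identifier
content_cols = [c for c in df.columns if c != "Row Number"]
by_content = int(df.duplicated(subset=content_cols).sum())

print(all_cols, by_content)
```

Whether identical content really means a duplicate entry is a judgment call, since two customers can make the same purchase on the same day.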

## **Data Transformation**

In [23]:
df_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Row Number        1000 non-null   int64         
 1   Date              1000 non-null   datetime64[ns]
 2   Gender            1000 non-null   object        
 3   Age               1000 non-null   int64         
 4   Product Category  1000 non-null   object        
 5   Quantity          1000 non-null   int64         
 6   Price per Unit    1000 non-null   int64         
 7   Total Amount      1000 non-null   int64         
dtypes: datetime64[ns](1), int64(5), object(2)
memory usage: 62.6+ KB


In [24]:
df_sales.head(10)

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount
0,1,2023-11-24,Male,34,Beauty,3,50,150
1,2,2023-02-27,Female,26,Clothing,2,500,1000
2,3,2023-01-13,Male,50,Electronics,1,30,30
3,4,2023-05-21,Male,37,Clothing,1,500,500
4,5,2023-05-06,Male,30,Beauty,2,50,100
5,6,2023-04-25,Female,45,Beauty,1,30,30
6,7,2023-03-13,Male,46,Clothing,2,25,50
7,8,2023-02-22,Male,30,Electronics,4,25,100
8,9,2023-12-13,Male,63,Electronics,2,300,600
9,10,2023-10-07,Female,52,Clothing,4,50,200


### **Add Day Name Column**

In [25]:
# Add day name
df_sales['Day'] = df_sales['Date'].dt.day_name()

In [26]:
# Check the new column
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Day
0,1,2023-11-24,Male,34,Beauty,3,50,150,Friday
1,2,2023-02-27,Female,26,Clothing,2,500,1000,Monday
2,3,2023-01-13,Male,50,Electronics,1,30,30,Friday
3,4,2023-05-21,Male,37,Clothing,1,500,500,Sunday
4,5,2023-05-06,Male,30,Beauty,2,50,100,Saturday


In [60]:
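# Total quantity and sales per day (this cell ran after 'Day' became an ordered
# categorical in In [50], hence the Monday-to-Sunday order in the output)
df_sales_days = df_sales.groupby('Day', as_index=False, observed=False)[['Quantity', 'Total Amount']].sum()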


In [61]:
df_sales_days.head(7)

,Day,Quantity,Total Amount
0,Monday,385,70250
1,Tuesday,397,69440
2,Wednesday,356,58770
3,Thursday,301,53835
4,Friday,373,66290
5,Saturday,373,78815
6,Sunday,329,58600
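
When 'Day' holds plain strings, `groupby` sorts the days alphabetically. The sketch below shows one way to get calendar order without a categorical column, by reindexing the grouped result; the sample reuses the first five preview rows, and days missing from it are filled with 0.

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Days and amounts from the first five preview rows
sample = pd.DataFrame({
    "Day": ["Friday", "Monday", "Friday", "Sunday", "Saturday"],
    "Quantity": [3, 2, 1, 1, 2],
    "Total Amount": [150, 1000, 30, 500, 100],
})

# Plain string keys are sorted alphabetically by groupby ...
totals = sample.groupby("Day")[["Quantity", "Total Amount"]].sum()

# ... so reindex to calendar order; days absent from the sample get 0
totals = totals.reindex(day_order, fill_value=0).rename_axis("Day").reset_index()

print(totals)
```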


### **Add Age Segment Column**

In [27]:
# Add age segment column
df_sales['Age Segment'] = np.where(df_sales['Age'] <= 17, 'Youth',
                            np.where(df_sales['Age'] <= 34, 'Young Adult',
                            np.where(df_sales['Age'] <= 54, 'Adult','Senior')))

In [28]:
# Check the new column
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Day,Age Segment
0,1,2023-11-24,Male,34,Beauty,3,50,150,Friday,Young Adult
1,2,2023-02-27,Female,26,Clothing,2,500,1000,Monday,Young Adult
2,3,2023-01-13,Male,50,Electronics,1,30,30,Friday,Adult
3,4,2023-05-21,Male,37,Clothing,1,500,500,Sunday,Adult
4,5,2023-05-06,Male,30,Beauty,2,50,100,Saturday,Young Adult
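
The nested `np.where` above can also be written with `pd.cut`, which states the age boundaries in one place. This is a sketch of an equivalent rule, not the notebook's own method; the ages are boundary values chosen for illustration. The unique ages listed earlier start at 18, so 'Youth' does not occur in this dataset.

```python
import numpy as np
import pandas as pd

ages = pd.Series([17, 18, 34, 35, 54, 55])  # boundary values for illustration

# Right-closed bins: (-inf, 17], (17, 34], (34, 54], (54, inf)
segments = pd.cut(
    ages,
    bins=[-np.inf, 17, 34, 54, np.inf],
    labels=["Youth", "Young Adult", "Adult", "Senior"],
)

# Same rule written with the nested np.where used above, for comparison
nested = np.where(ages <= 17, "Youth",
         np.where(ages <= 34, "Young Adult",
         np.where(ages <= 54, "Adult", "Senior")))

print(segments.tolist())
```

`pd.cut` returns an ordered categorical, so later tables sort the segments from youngest to oldest instead of alphabetically.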


### **Add Month and Monthly Sales Columns**

In [29]:
# Add month column
df_sales['Month'] = df_sales['Date'].dt.to_period('M')

In [30]:
# Check new column
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Day,Age Segment,Month
0,1,2023-11-24,Male,34,Beauty,3,50,150,Friday,Young Adult,2023-11
1,2,2023-02-27,Female,26,Clothing,2,500,1000,Monday,Young Adult,2023-02
2,3,2023-01-13,Male,50,Electronics,1,30,30,Friday,Adult,2023-01
3,4,2023-05-21,Male,37,Clothing,1,500,500,Sunday,Adult,2023-05
4,5,2023-05-06,Male,30,Beauty,2,50,100,Saturday,Young Adult,2023-05


In [None]:
# Add monthly sales column
df_sales['Monthly Sales'] = df_sales.groupby('Month')['Total Amount'].transform('sum')

In [32]:
# Check new column
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Day,Age Segment,Month,Monthly Sales
0,1,2023-11-24,Male,34,Beauty,3,50,150,Friday,Young Adult,2023-11,34920
1,2,2023-02-27,Female,26,Clothing,2,500,1000,Monday,Young Adult,2023-02,44060
2,3,2023-01-13,Male,50,Electronics,1,30,30,Friday,Adult,2023-01,35450
3,4,2023-05-21,Male,37,Clothing,1,500,500,Sunday,Adult,2023-05,53150
4,5,2023-05-06,Male,30,Beauty,2,50,100,Saturday,Young Adult,2023-05,53150
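
`transform('sum')` writes the monthly total onto every row of that month, which is why both May rows above show the same 'Monthly Sales'. The sketch below contrasts it with a plain `sum()`, which returns one row per month; the sample reuses the month and amount of the first five preview rows.

```python
import pandas as pd

# Month and amount from the first five preview rows
sample = pd.DataFrame({
    "Month": ["2023-11", "2023-02", "2023-01", "2023-05", "2023-05"],
    "Total Amount": [150, 1000, 30, 500, 100],
})

# transform("sum") returns one value per original row
per_row = sample.groupby("Month")["Total Amount"].transform("sum")

# sum() returns one value per month
per_month = sample.groupby("Month")["Total Amount"].sum()

print(len(per_row), len(per_month))
```

Because the broadcast column repeats the total on each row, summing it again would count every monthly total once per transaction.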


### **Add Quarter and Quarter Sales Columns**

In [33]:
# Add quarter column
df_sales['Quarter'] = 'Q' + df_sales['Date'].dt.quarter.astype(str)

In [34]:
# Check new column
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Day,Age Segment,Month,Monthly Sales,Quarter
0,1,2023-11-24,Male,34,Beauty,3,50,150,Friday,Young Adult,2023-11,34920,Q4
1,2,2023-02-27,Female,26,Clothing,2,500,1000,Monday,Young Adult,2023-02,44060,Q1
2,3,2023-01-13,Male,50,Electronics,1,30,30,Friday,Adult,2023-01,35450,Q1
3,4,2023-05-21,Male,37,Clothing,1,500,500,Sunday,Adult,2023-05,53150,Q2
4,5,2023-05-06,Male,30,Beauty,2,50,100,Saturday,Young Adult,2023-05,53150,Q2


In [35]:
# Add quarter sales column
df_sales['Quarter Sales'] = df_sales.groupby('Quarter')['Total Amount'].transform('sum')

In [36]:
# Check new column
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Day,Age Segment,Month,Monthly Sales,Quarter,Quarter Sales
0,1,2023-11-24,Male,34,Beauty,3,50,150,Friday,Young Adult,2023-11,34920,Q4,126190
1,2,2023-02-27,Female,26,Clothing,2,500,1000,Monday,Young Adult,2023-02,44060,Q1,110030
2,3,2023-01-13,Male,50,Electronics,1,30,30,Friday,Adult,2023-01,35450,Q1,110030
3,4,2023-05-21,Male,37,Clothing,1,500,500,Sunday,Adult,2023-05,53150,Q2,123735
4,5,2023-05-06,Male,30,Beauty,2,50,100,Saturday,Young Adult,2023-05,53150,Q2,123735
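
The 'Quarter' label above carries no year, so if the dates span more than one calendar year (an assumption, since the notebook does not print the date range), the same quarter of different years would be grouped together. A minimal sketch of a year-aware label using a quarterly period, with illustrative dates:

```python
import pandas as pd

# Illustrative dates, including one from a second year
dates = pd.Series(pd.to_datetime(["2023-01-13", "2023-11-24", "2024-01-01"]))

# Label used above: quarter number only
plain = "Q" + dates.dt.quarter.astype(str)

# Year-aware label built from a quarterly period
with_year = dates.dt.to_period("Q").astype(str)

print(plain.tolist())
print(with_year.tolist())
```

Checking `df_sales['Date'].min()` and `df_sales['Date'].max()` would show whether this matters for the data at hand.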


### **Add Monthly Growth Percentage**

#### **Create New Data Frame**

In [37]:
# New data frame with one row per month
# Take the first value per month: 'Monthly Sales' already holds the monthly total
# on every row, so summing it would multiply each total by the month's row count
df_month = df_sales.groupby('Month', as_index=False)['Monthly Sales'].first()

In [38]:
# Check the change
df_month.head()

,Month,Monthly Sales
0,2023-01,35450
1,2023-02,44060
2,2023-03,28990
3,2023-04,33870
4,2023-05,53150


#### **Calculate Monthly Growth Percentage**

In [39]:
# Add previous month column
df_month['Previous Monthly Sales'] = df_month['Monthly Sales'].shift(1)

In [40]:
# Check the change
df_month.head()

,Month,Monthly Sales,Previous Monthly Sales
0,2023-01,35450,
1,2023-02,44060,35450.0
2,2023-03,28990,44060.0
3,2023-04,33870,28990.0
4,2023-05,53150,33870.0


In [41]:
# Calculate growth percentage
# and round the value to 2 decimals
df_month['Growth Percentage'] = (
    (df_month['Monthly Sales'] - df_month['Previous Monthly Sales'])
    / df_month['Previous Monthly Sales'] * 100).round(2)

In [42]:
# Check the change
df_month.head()

,Month,Monthly Sales,Previous Monthly Sales,Growth Percentage
0,2023-01,35450,,
1,2023-02,44060,35450.0,24.29
2,2023-03,28990,44060.0,-34.2
3,2023-04,33870,28990.0,16.83
4,2023-05,53150,33870.0,56.92
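
The shift-based calculation above is the formula (this month - last month) / last month * 100. pandas also offers `pct_change()`, which computes the same ratio in one call. A minimal sketch with made-up monthly totals, used only to keep the arithmetic easy to follow:

```python
import pandas as pd

# Made-up monthly totals, for illustration only
monthly = pd.Series([100, 150, 120], index=["2023-01", "2023-02", "2023-03"])

# pct_change() computes (current - previous) / previous; the first entry has
# no previous month and is NaN
growth = (monthly.pct_change() * 100).round(2)

print(growth.tolist())
```

Here the second month grows by 50 percent (from 100 to 150) and the third shrinks by 20 percent (from 150 to 120).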


## **Pivot Table**

In [43]:
df_sales.head()

,Row Number,Date,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount,Day,Age Segment,Month,Monthly Sales,Quarter,Quarter Sales
0,1,2023-11-24,Male,34,Beauty,3,50,150,Friday,Young Adult,2023-11,34920,Q4,126190
1,2,2023-02-27,Female,26,Clothing,2,500,1000,Monday,Young Adult,2023-02,44060,Q1,110030
2,3,2023-01-13,Male,50,Electronics,1,30,30,Friday,Adult,2023-01,35450,Q1,110030
3,4,2023-05-21,Male,37,Clothing,1,500,500,Sunday,Adult,2023-05,53150,Q2,123735
4,5,2023-05-06,Male,30,Beauty,2,50,100,Saturday,Young Adult,2023-05,53150,Q2,123735


### **Age Segment**

#### **Age Segment - Total Spending**

In [44]:
# Total spending
pd.pivot_table(
    data=df_sales,
    index=['Age Segment'],
    values='Total Amount',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)

Age Segment,Total Amount (sum),Total Amount (count),Total Amount (mean)
Adult,194070,432,449.236111
Senior,90190,216,417.546296
Young Adult,171740,352,487.897727
Total,456000,1000,456.0
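
A pivot table with a list of aggregation functions produces multi-level column headers. When a flat table is easier to read or export, `groupby` with named aggregation gives the same per-segment figures. A sketch under that assumption; the output names `total`, `orders`, and `average` are illustrative, and the sample reuses the first five preview rows.

```python
import pandas as pd

# Segment and amount from the first five preview rows
sample = pd.DataFrame({
    "Age Segment": ["Young Adult", "Young Adult", "Adult", "Adult", "Young Adult"],
    "Total Amount": [150, 1000, 30, 500, 100],
})

# Named aggregation yields flat, readable column names
summary = sample.groupby("Age Segment")["Total Amount"].agg(
    total="sum", orders="count", average="mean"
)

print(summary)
```

Unlike `pivot_table` with `margins=True`, this does not add a 'Total' row; that would have to be computed separately.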


#### **Age Segment - Product Category**

In [45]:
# Product Category
pd.pivot_table(
    data=df_sales,
    index=['Age Segment'],
    values='Total Amount',
    columns='Product Category',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)

Age Segment,Beauty (sum),Clothing (sum),Electronics (sum),Total (sum),Beauty (count),Clothing (count),Electronics (count),Total (count),Beauty (mean),Clothing (mean),Electronics (mean),Total (mean)
Adult,65400,60470,68200,194070,124,153,155,432,527.419355,395.228758,440.0,449.236111
Senior,20670,31310,38210,90190,62,80,74,216,333.387097,391.375,516.351351,417.546296
Young Adult,57445,63800,50495,171740,121,118,113,352,474.752066,540.677966,446.858407,487.897727
Total,143515,155580,156905,456000,307,351,342,1000,467.47557,443.247863,458.78655,456.0


#### **Age Segment - Quarter**

In [46]:
# Quarter
pd.pivot_table(
    data=df_sales,
    index=['Age Segment'],
    values='Total Amount',
    columns='Quarter',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)

Age Segment,Q1 (sum),Q2 (sum),Q3 (sum),Q4 (sum),Total (sum),Q1 (count),Q2 (count),Q3 (count),Q4 (count),Total (count),Q1 (mean),Q2 (mean),Q3 (mean),Q4 (mean),Total (mean)
Adult,48350,52770,35330,57620,194070,96,124,96,116,432,503.645833,425.564516,368.020833,496.724138,449.236111
Senior,16990,24880,24490,23830,90190,48,61,56,51,216,353.958333,407.868852,437.321429,467.254902,417.546296
Young Adult,44690,46085,36225,44740,171740,92,83,79,98,352,485.76087,555.240964,458.544304,456.530612,487.897727
Total,110030,123735,96045,126190,456000,236,268,231,265,1000,466.228814,461.697761,415.779221,476.188679,456.0


### **Gender Segment**

#### **Gender Segment - Total Spending**

In [47]:
# Total spending
pd.pivot_table(
    data=df_sales,
    index=['Gender'],
    values='Total Amount',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)

Gender,Total Amount (sum),Total Amount (count),Total Amount (mean)
Female,232840,510,456.54902
Male,223160,490,455.428571
Total,456000,1000,456.0


#### **Gender Segment - Product Category**

In [48]:
# Product category
pd.pivot_table(
    data=df_sales,
    index=['Gender'],
    values=['Total Amount'],
    columns='Product Category',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)

Gender,Beauty (sum),Clothing (sum),Electronics (sum),Total (sum),Beauty (count),Clothing (count),Electronics (count),Total (count),Beauty (mean),Clothing (mean),Electronics (mean),Total (mean)
Female,74830,81275,76735,232840,166,174,170,510,450.783133,467.097701,451.382353,456.54902
Male,68685,74305,80170,223160,141,177,172,490,487.12766,419.80226,466.104651,455.428571
Total,143515,155580,156905,456000,307,351,342,1000,467.47557,443.247863,458.78655,456.0


#### **Gender Segment - Quarter**

In [49]:
# Quarter
pd.pivot_table(
    data=df_sales,
    index=['Gender'],
    values=['Total Amount'],
    columns='Quarter',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)

Gender,Q1 (sum),Q2 (sum),Q3 (sum),Q4 (sum),Total (sum),Q1 (count),Q2 (count),Q3 (count),Q4 (count),Total (count),Q1 (mean),Q2 (mean),Q3 (mean),Q4 (mean),Total (mean)
Female,52440,58105,55500,66795,232840,117,134,122,137,510,448.205128,433.619403,454.918033,487.554745,456.54902
Male,57590,65630,40545,59395,223160,119,134,109,128,490,483.94958,489.776119,371.972477,464.023438,455.428571
Total,110030,123735,96045,126190,456000,236,268,231,265,1000,466.228814,461.697761,415.779221,476.188679,456.0


### **Day Segment**

In [50]:
# Make days order
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Change days order
df_sales['Day'] = pd.Categorical(df_sales['Day'], categories=day_order, ordered=True)

#### **Day Segment - Total Spending**

In [51]:
# Total spending
pd.pivot_table(
    data=df_sales,
    index=['Day'],
    values='Total Amount',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)


Day,Total Amount (sum),Total Amount (count),Total Amount (mean)
Monday,70250,146,481.164384
Tuesday,69440,161,431.304348
Wednesday,58770,139,422.805755
Thursday,53835,123,437.682927
Friday,66290,143,463.566434
Saturday,78815,150,525.433333
Sunday,58600,138,424.637681
Total,456000,1000,456.0
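
Once 'Day' is an ordered categorical, the pivot table lists the days in calendar order. Some pandas versions warn when a categorical key is used without an explicit `observed` argument, so a sketch that states it may be useful. The sample reuses the first five preview rows, and `observed=True` is one possible choice, not the notebook's setting.

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Days and amounts from the first five preview rows
sample = pd.DataFrame({
    "Day": ["Friday", "Monday", "Friday", "Sunday", "Saturday"],
    "Total Amount": [150, 1000, 30, 500, 100],
})
sample["Day"] = pd.Categorical(sample["Day"], categories=day_order, ordered=True)

# Stating observed explicitly avoids relying on the version-dependent default;
# observed=True keeps only the days that actually occur in the data
by_day = pd.pivot_table(
    data=sample,
    index="Day",
    values="Total Amount",
    aggfunc="sum",
    observed=True,
)

print(by_day)
```

With the full dataset every day occurs, so either setting of `observed` should list all seven days.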


#### **Day Segment - Product Category**

In [52]:
# Product Category
pd.pivot_table(
    data=df_sales,
    index=['Day'],
    values='Total Amount',
    columns='Product Category',
    aggfunc=['sum', 'count', 'mean'],
    margins=True,
    margins_name='Total'
)


Day,Beauty (sum),Clothing (sum),Electronics (sum),Total (sum),Beauty (count),Clothing (count),Electronics (count),Total (count),Beauty (mean),Clothing (mean),Electronics (mean),Total (mean)
Monday,28685,18275,23290,70250,46,46,54,146,623.586957,397.282609,431.296296,481.164384
Tuesday,20355,23725,25360,69440,56,55,50,161,363.482143,431.363636,507.2,431.304348
Wednesday,15285,23260,20225,58770,42,50,47,139,363.928571,465.2,430.319149,422.805755
Thursday,18380,21190,14265,53835,39,49,35,123,471.282051,432.44898,407.571429,437.682927
Friday,25395,23455,17440,66290,44,50,49,143,577.159091,469.1,355.918367,463.566434
Saturday,23205,23480,32130,78815,49,44,57,150,473.571429,533.636364,563.684211,525.433333
Sunday,12210,22195,24195,58600,31,57,50,138,393.870968,389.385965,483.9,424.637681
Total,143515,155580,156905,456000,307,351,342,1000,467.47557,443.247863,458.78655,456.0


1. Rearrange rows based on date **DONE**
2. Calculate %growth = (thismonth - lastmonth)/lastmonth **DONE**
3. lastmonth using .shift() **DONE**
4. create pivot table
    - Age segment behavior (total spending, product category, when) **DONE**
    - days **DONE**, monthly, and quarter customer behavior
    - Gender spending behavior **DONE**

## **Download Final Dataframe**

In [61]:
# Save the final dataframe to a local CSV file
df_sales.to_csv("final_sales_dataframe.csv", index=False)

In [62]:
# Save df_sales_days to a local CSV file
df_sales_days.to_csv("df_sales_days.csv", index=False)
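
CSV files store text only, so the datetime and period columns created above come back as plain strings unless they are parsed again on load. A minimal sketch of the round trip, using an in-memory buffer instead of a file and two preview rows:

```python
import io

import pandas as pd

# Two preview rows with a datetime column and a monthly period column
out = pd.DataFrame({
    "Date": pd.to_datetime(["2023-11-24", "2023-02-27"]),
    "Total Amount": [150, 1000],
})
out["Month"] = out["Date"].dt.to_period("M")

# Write to an in-memory buffer instead of a file for this illustration
buffer = io.StringIO()
out.to_csv(buffer, index=False)
buffer.seek(0)

# Without parse_dates, 'Date' would come back as plain text
back = pd.read_csv(buffer, parse_dates=["Date"])

print(back.dtypes)
```

The 'Month' column is read back as text such as `2023-11` and would need its own conversion if period arithmetic is required later.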