<a href="https://colab.research.google.com/github/satyakala-teja/analytics-capstone-satyakala/blob/main/notebooks/03_trend_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Trend Analysis for Sales Data

This notebook analyzes trends in sales, revenue, profit, category performance, and customer/order behavior.


## 1. Month-wise Sales Trend
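Analyze how total revenue varies from month to month. The cleaned sample contains only January 2023 orders, so this view currently has a single data point.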


In [1]:
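import pandas as pd

# Load the cleaned sales dataset.
# The "Unnamed: 0" column is most likely an index saved by to_csv;
# passing index_col=0 to read_csv would load it as the index instead.
df = pd.read_csv('/content/data/sales_data_cleaned.csv')
df.head()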


| Unnamed: 0 | order_id | order_date | customer_id | category | sub_category | product | quantity | unit_price | sales | region | ... | month_name | quarter | order_year | order_month | order_day | order_weekday | is_weekend | category_full | revenue_per_unit | high_value_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2023-01-02 | C001 | Office Supplies | Binders | Elastic Binder | 2 | 5.0 | 10.0 | East | ... | January | 1 | 2023 | 1 | 2 | 0 | 0 | Office Supplies - Binders | 5.0 | No |
| 1 | 1002 | 2023-01-03 | C002 | Furniture | Chairs | Ergo Chair | 1 | 150.0 | 150.0 | West | ... | January | 1 | 2023 | 1 | 3 | 1 | 0 | Furniture - Chairs | 150.0 | No |
| 2 | 1003 | 2023-01-04 | C003 | Technology | Phones | SmartPhone X | 1 | 700.0 | 700.0 | North | ... | January | 1 | 2023 | 1 | 4 | 2 | 0 | Technology - Phones | 700.0 | Yes |
| 3 | 1004 | 2023-01-05 | C001 | Office Supplies | Paper | Copy Paper | 10 | 3.5 | 35.0 | East | ... | January | 1 | 2023 | 1 | 5 | 3 | 0 | Office Supplies - Paper | 3.5 | No |
| 4 | 1005 | 2023-01-06 | C004 | Technology | Laptops | UltraBook Pro | 1 | 1200.0 | 1200.0 | South | ... | January | 1 | 2023 | 1 | 6 | 4 | 0 | Technology - Laptops | 1200.0 | Yes |
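
The later sections aggregate derived columns, so a quick consistency check can catch cleaning errors early. The sketch below rebuilds the five rows shown above (only the columns it needs) and confirms that `sales` equals `quantity * unit_price`. It assumes that rule holds for the whole file, which the notebook does not state explicitly. The float-tolerant `numpy.isclose` comparison is our choice.

```python
import numpy as np
import pandas as pd

# The five rows from df.head() above, limited to the columns the check needs
sample = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004, 1005],
    "quantity": [2, 1, 1, 10, 1],
    "unit_price": [5.0, 150.0, 700.0, 3.5, 1200.0],
    "sales": [10.0, 150.0, 700.0, 35.0, 1200.0],
})

# Flag any row where sales differs from quantity * unit_price (float-tolerant)
expected = sample["quantity"] * sample["unit_price"]
mismatch = ~np.isclose(sample["sales"], expected)
print(sample.loc[mismatch, "order_id"].tolist())  # prints [] when every row is consistent
```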


In [None]:
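# Group by year and month number so the result stays in calendar order
# (sorting by revenue would scramble the timeline).
monthly_trend = df.groupby(['order_year', 'order_month', 'month_name'])['total_revenue'].sum()
monthly_trend


| order_year | order_month | month_name | total_revenue |
|---|---|---|---|
| 2023 | 1 | January | 2095.0 |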
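
Once the data spans several months, the usual next step is month-over-month growth. Because the cleaned sample covers only January 2023, the sketch below uses hypothetical orders across three months. It buckets dates with `dt.to_period("M")` and applies `pct_change` to the chronologically ordered totals.

```python
import pandas as pd

# Hypothetical orders spanning three months (the notebook's sample covers only January 2023)
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-03-14"]),
    "total_revenue": [1000.0, 500.0, 1800.0, 900.0],
})

# Bucket orders by calendar month; Period keys sort chronologically
monthly = orders.groupby(orders["order_date"].dt.to_period("M"))["total_revenue"].sum()

# Month-over-month growth; the first month has no predecessor, so it is NaN
mom_growth = monthly.pct_change()
print(pd.DataFrame({"revenue": monthly, "mom_growth": mom_growth}))
```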


## 2. Region-wise Revenue Analysis
Understand which regions contribute the most to total revenue.


In [None]:
region_trend = df.groupby('region')['total_revenue'].sum().sort_values(ascending=False)
region_trend


| region | total_revenue |
|---|---|
| South | 1200.0 |
| North | 700.0 |
| West | 150.0 |
| East | 45.0 |
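
Absolute totals are easier to compare when they are expressed as shares of the overall revenue. The sketch below hard-codes the regional totals from the output above, so it runs on its own. It then divides each total by the grand total.

```python
import pandas as pd

# Regional totals as reported in the output above
region_totals = pd.Series(
    {"South": 1200.0, "North": 700.0, "West": 150.0, "East": 45.0},
    name="total_revenue",
)

# Each region's percentage of overall revenue, rounded for display
region_share = (region_totals / region_totals.sum() * 100).round(1)
print(region_share)
```

With these figures, South accounts for about 57% of revenue and East for about 2%.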


## 3. Category-wise Revenue Analysis
Analyze which product categories generate the highest revenue.


In [None]:
category_trend = df.groupby('category')['total_revenue'].sum().sort_values(ascending=False)
category_trend


| category | total_revenue |
|---|---|
| Technology | 1900.0 |
| Furniture | 150.0 |
| Office Supplies | 45.0 |
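
Revenue totals alone do not show whether a category earns through many orders or a few large ones. A sketch using pandas named aggregation can report both at once. The frame below rebuilds the five sample orders. Each `order_id` is a single row there, so the row mean equals the revenue per order.

```python
import pandas as pd

# The five orders from the cleaned sample (only the columns used here)
sample = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004, 1005],
    "category": ["Office Supplies", "Furniture", "Technology", "Office Supplies", "Technology"],
    "total_revenue": [10.0, 150.0, 700.0, 35.0, 1200.0],
})

# Named aggregation: total revenue, distinct orders, and mean revenue per order row
category_summary = (
    sample.groupby("category")
    .agg(
        revenue=("total_revenue", "sum"),
        orders=("order_id", "nunique"),
        avg_order_value=("total_revenue", "mean"),
    )
    .sort_values("revenue", ascending=False)
)
print(category_summary)
```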


## 4. Sub-Category Revenue Analysis
Break revenue down by sub-category for a more granular view of product performance.


In [None]:
sub_category_trend = df.groupby('sub_category')['total_revenue'].sum().sort_values(ascending=False)
sub_category_trend


| sub_category | total_revenue |
|---|---|
| Laptops | 1200.0 |
| Phones | 700.0 |
| Chairs | 150.0 |
| Paper | 35.0 |
| Binders | 10.0 |
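
A cumulative share (Pareto view) shows how few sub-categories account for most of the revenue. The sketch below uses the totals from the output above. It keeps each sub-category whose preceding cumulative share is still below 80%. The 80% cut-off is an illustrative choice, not something the notebook defines.

```python
import pandas as pd

# Sub-category totals from the output above, already sorted descending
sub_totals = pd.Series(
    {"Laptops": 1200.0, "Phones": 700.0, "Chairs": 150.0, "Paper": 35.0, "Binders": 10.0},
    name="total_revenue",
)

# Running share of total revenue
cumulative_share = sub_totals.cumsum() / sub_totals.sum()

# Smallest leading set of sub-categories that together reach 80% of revenue
top_80 = cumulative_share[cumulative_share.shift(fill_value=0) < 0.8].index.tolist()
print(cumulative_share.round(3))
print(top_80)
```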


## 5. Top Products by Revenue
Identify the products generating the highest revenue.


In [None]:
top_products = df.groupby('product')['total_revenue'].sum().sort_values(ascending=False).head(5)
top_products


| product | total_revenue |
|---|---|
| UltraBook Pro | 1200.0 |
| SmartPhone X | 700.0 |
| Ergo Chair | 150.0 |
| Copy Paper | 35.0 |
| Elastic Binder | 10.0 |
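
`Series.nlargest(n)` expresses the same "top n" selection as `sort_values(ascending=False).head(n)` in a single call. The sketch below uses the product totals from the output above, listed in the alphabetical order `groupby` would return them. It uses n = 3 because the sample has only five products, so `head(5)` shows all of them.

```python
import pandas as pd

# Product totals from the output above, in the alphabetical order groupby returns
product_revenue = pd.Series(
    {"Copy Paper": 35.0, "Elastic Binder": 10.0, "Ergo Chair": 150.0,
     "SmartPhone X": 700.0, "UltraBook Pro": 1200.0},
    name="total_revenue",
)

# nlargest(n) picks the n largest values, already in descending order
top_3 = product_revenue.nlargest(3)

# Same result as the sort-then-head pattern used in the notebook
same_as_sort = top_3.equals(product_revenue.sort_values(ascending=False).head(3))
print(top_3)
print(same_as_sort)
```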


## 6. Top Customers by Revenue
Identify customers who contribute the highest revenue.


In [2]:
top_customers = df.groupby('customer_id')['total_revenue'].sum().sort_values(ascending=False).head(5)
top_customers


| customer_id | total_revenue |
|---|---|
| C004 | 1200.0 |
| C003 | 700.0 |
| C002 | 150.0 |
| C001 | 45.0 |
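
Ranking customers by revenue hides how often they buy. The sketch below adds an order count and revenue per order, using the five sample orders rebuilt from `df.head()`. It assumes `order_id` uniquely identifies an order.

```python
import pandas as pd

# The five orders from the cleaned sample (customer, order, revenue only)
sample = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004, 1005],
    "customer_id": ["C001", "C002", "C003", "C001", "C004"],
    "total_revenue": [10.0, 150.0, 700.0, 35.0, 1200.0],
})

# Count distinct orders alongside revenue, then derive revenue per order
customer_summary = (
    sample.groupby("customer_id")
    .agg(orders=("order_id", "nunique"), revenue=("total_revenue", "sum"))
    .assign(revenue_per_order=lambda t: t["revenue"] / t["orders"])
    .sort_values("revenue", ascending=False)
)
print(customer_summary)
```

In this sample, C001 is the only repeat customer, yet it contributes the least revenue. A revenue-only ranking does not show this.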


## 7. Weekend vs Weekday Performance
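Compare revenue on weekdays (`is_weekend` = 0) with revenue on weekends (`is_weekend` = 1). All five orders in the cleaned sample fall on weekdays (2 to 6 January 2023), so only the weekday group appears in the output.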


In [3]:
weekday_trend = df.groupby('is_weekend')['total_revenue'].sum()
weekday_trend


| is_weekend | total_revenue |
|---|---|
| 0 | 2095.0 |
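
The weekend flag can be derived from `order_date` with pandas' `dt.dayofweek` (Monday = 0, Sunday = 6), which matches the `order_weekday` values shown earlier. This is a sketch of one way to do it, not necessarily how the cleaning step built the column. Reindexing on `[0, 1]` keeps an explicit weekend row even when, as here, no weekend orders exist.

```python
import pandas as pd

# Order dates and revenue from the five sample rows
sample = pd.DataFrame({
    "order_date": ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06"],
    "total_revenue": [10.0, 150.0, 700.0, 35.0, 1200.0],
})

# pandas numbers weekdays Monday=0 ... Sunday=6; Saturday and Sunday count as weekend
weekday = pd.to_datetime(sample["order_date"]).dt.dayofweek
sample["is_weekend"] = (weekday >= 5).astype(int)

# reindex keeps a weekend row (filled with 0) even when no weekend orders exist
weekday_totals = (
    sample.groupby("is_weekend")["total_revenue"].sum().reindex([0, 1], fill_value=0)
)
print(weekday_totals)
```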


## 8. High-Value Orders Analysis
Analyze how high-value orders (rows flagged `Yes` in `high_value_order`) are distributed across regions and categories.


In [4]:
high_value_counts = df['high_value_order'].value_counts()
high_value_counts


| high_value_order | count |
|---|---|
| No | 3 |
| Yes | 2 |


In [5]:
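# Summing 'Yes'/'No' strings concatenates them ('NoNo'), so count the 'Yes' flags instead
high_value_region = df['high_value_order'].eq('Yes').groupby(df['region']).sum().sort_values(ascending=False, kind='stable')
high_value_region


| region | high_value_order |
|---|---|
| North | 1 |
| South | 1 |
| East | 0 |
| West | 0 |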


In [6]:
# Count 'Yes' flags per category rather than concatenating the strings
high_value_category = df['high_value_order'].eq('Yes').groupby(df['category']).sum().sort_values(ascending=False, kind='stable')
high_value_category


| category | high_value_order |
|---|---|
| Technology | 2 |
| Furniture | 0 |
| Office Supplies | 0 |
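
`pd.crosstab` shows both the `Yes` and `No` counts per group in one table. The mean of a boolean flag gives the share of high-value orders directly. The sketch below rebuilds the category and flag columns of the five sample orders. Passing `normalize="index"` to `crosstab` would turn the counts into row proportions.

```python
import pandas as pd

# Category and high-value flag for the five sample orders
sample = pd.DataFrame({
    "category": ["Office Supplies", "Furniture", "Technology", "Office Supplies", "Technology"],
    "high_value_order": ["No", "No", "Yes", "No", "Yes"],
})

# Yes/No counts per category; missing combinations are filled with 0
counts = pd.crosstab(sample["category"], sample["high_value_order"])

# Fraction of each category's orders flagged as high value
share_high = sample["high_value_order"].eq("Yes").groupby(sample["category"]).mean()
print(counts)
print(share_high)
```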


## 9. Profit Trend Analysis
Understand how profit varies across time, categories, and regions.


In [7]:
# Keep calendar order, as in the monthly revenue trend
monthly_profit = df.groupby(['order_year', 'order_month', 'month_name'])['profit'].sum()
monthly_profit


| order_year | order_month | month_name | profit |
|---|---|---|---|
| 2023 | 1 | January | 628.5 |


In [9]:
category_profit = df.groupby('category')['profit'].sum().sort_values(ascending=False)
category_profit


| category | profit |
|---|---|
| Technology | 570.0 |
| Furniture | 45.0 |
| Office Supplies | 13.5 |


In [10]:
region_profit = df.groupby('region')['profit'].sum().sort_values(ascending=False)
region_profit


| region | profit |
|---|---|
| South | 360.0 |
| North | 210.0 |
| West | 45.0 |
| East | 13.5 |
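
Profit totals mostly track revenue size. A profit margin (profit divided by revenue) compares how efficiently each group converts sales into profit. The sketch below combines the category revenue from section 3 with the category profit above.

```python
import pandas as pd

# Category revenue (section 3) and profit (this section) as reported in the outputs
summary = pd.DataFrame({
    "total_revenue": {"Technology": 1900.0, "Furniture": 150.0, "Office Supplies": 45.0},
    "profit": {"Technology": 570.0, "Furniture": 45.0, "Office Supplies": 13.5},
})

# Margin = profit / revenue, independent of each category's size
summary["margin"] = summary["profit"] / summary["total_revenue"]
print(summary)
```

With the notebook's figures, every category comes out at a 30% margin. The profit rankings therefore mirror the revenue rankings.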


## 10. Revenue Per Unit Analysis
Identify which products generate the highest revenue per unit sold.


In [11]:
rpu_analysis = df.groupby('product')['revenue_per_unit'].mean().sort_values(ascending=False)
rpu_analysis


| product | revenue_per_unit |
|---|---|
| UltraBook Pro | 1200.0 |
| SmartPhone X | 700.0 |
| Ergo Chair | 150.0 |
| Elastic Binder | 5.0 |
| Copy Paper | 3.5 |
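
Averaging `revenue_per_unit` weights every order equally. Dividing total revenue by total quantity weights every unit equally. In the notebook's sample, each product appears in exactly one order, so the two agree. The sketch below uses a hypothetical product ("Widget", not in the data) sold twice at different prices to show where they diverge.

```python
import pandas as pd

# Hypothetical product sold in two orders at different unit prices
orders = pd.DataFrame({
    "product": ["Widget", "Widget"],
    "quantity": [1, 9],
    "total_revenue": [10.0, 45.0],
})
orders["revenue_per_unit"] = orders["total_revenue"] / orders["quantity"]

# Mean of per-order ratios: each order counts once
mean_of_ratios = orders.groupby("product")["revenue_per_unit"].mean()

# Ratio of totals: each unit counts once
totals = orders.groupby("product")[["total_revenue", "quantity"]].sum()
weighted_rpu = totals["total_revenue"] / totals["quantity"]

print(mean_of_ratios["Widget"], weighted_rpu["Widget"])
```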


## 11. Region vs Category Revenue (Pivot Table)
Cross-tabulate revenue by region and category to see which combinations drive sales.


In [12]:
pivot_region_category = df.pivot_table(
    values='total_revenue',
    index='region',
    columns='category',
    aggfunc='sum',
    fill_value=0
)
pivot_region_category


| region | Furniture | Office Supplies | Technology |
|---|---|---|---|
| East | 0.0 | 45.0 | 0.0 |
| North | 0.0 | 0.0 | 700.0 |
| South | 0.0 | 0.0 | 1200.0 |
| West | 150.0 | 0.0 | 0.0 |
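
`pivot_table` can also append row and column totals via `margins=True`, with the label set by `margins_name`. The sketch below rebuilds the region, category, and revenue columns of the five sample orders. The "Total" label is our choice.

```python
import pandas as pd

# Region, category and revenue for the five sample orders
sample = pd.DataFrame({
    "region": ["East", "West", "North", "East", "South"],
    "category": ["Office Supplies", "Furniture", "Technology", "Office Supplies", "Technology"],
    "total_revenue": [10.0, 150.0, 700.0, 35.0, 1200.0],
})

# margins=True adds a totals row and column named by margins_name
pivot_with_totals = sample.pivot_table(
    values="total_revenue",
    index="region",
    columns="category",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",
)
print(pivot_with_totals)
```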
