# 📊 Sales Data Analysis (Beginner Example)

**Business Question:** üëâ _What are the monthly sales trends for our products?_

### Steps:
1️⃣ Define problem  
2️⃣ Get CSV from Kaggle  
3️⃣ Clean data  
4️⃣ Plot monthly sales  
5️⃣ Plot top products  
6️⃣ Correlation between discount % & order volume  
7️⃣ Visualization with matplotlib  
8️⃣ Recommendation  
9️⃣ Automation option  

## 1️⃣ Import Libraries & Load Data

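```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Make sure the output folder used by later cells exists
Path('../output').mkdir(parents=True, exist_ok=True)

# Load CSV, parsing ORDERDATE as a datetime column
df = pd.read_csv('../data/sample_sales_data.csv', parse_dates=['ORDERDATE'])

# Display first 5 rows
df.head()
```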
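To see what `parse_dates` does before pointing it at the real file, here is a minimal sketch on a hypothetical three-row CSV held in memory (`io.StringIO` stands in for the file, and the rows are made up). It assumes the real file uses the same column names as this notebook. The empty `ORDERDATE` field comes back as `NaT`, which is exactly what the cleaning step below drops.

```python
import io

import pandas as pd

# Hypothetical toy CSV standing in for ../data/sample_sales_data.csv,
# using only columns that appear later in this notebook
csv_text = """ORDERNUMBER,ORDERDATE,PRODUCTCODE,QUANTITYORDERED,PRICEEACH,MSRP,SALES
10100,2003-01-06,S18_1749,30,85.00,95,2550.00
10101,,S18_2248,50,55.09,60,2754.50
10102,2003-02-10,S18_1749,39,90.00,95,3510.00
"""

# parse_dates turns ORDERDATE into a datetime64 column; the blank field becomes NaT
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["ORDERDATE"])

print(toy.dtypes["ORDERDATE"])
print("Missing dates:", toy["ORDERDATE"].isna().sum())
```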

## 2️⃣ Clean Data
- Handle missing dates
- Drop duplicates
- Create YearMonth column

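```python
# Coerce ORDERDATE to datetime; unparseable values become NaT
df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'], errors='coerce')

# Drop rows with missing dates
df = df.dropna(subset=['ORDERDATE'])

# Drop duplicates
df = df.drop_duplicates()

# Create YearMonth (monthly Period) for trend analysis
df['YearMonth'] = df['ORDERDATE'].dt.to_period('M')

# Save cleaned data
df.to_csv('../output/cleaned_sales_data.csv', index=False)

df.head()
```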
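The three cleaning steps can be packaged as one function and checked on a small made-up frame. This is a sketch, assuming `ORDERDATE` may contain blanks or unparseable strings; `clean_sales` is a hypothetical helper name, not part of pandas, and the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical toy rows: one exact duplicate, one missing date, one unparseable date
raw = pd.DataFrame({
    "ORDERDATE": ["2003-01-06", "2003-01-06", None, "not a date", "2003-02-10"],
    "PRODUCTCODE": ["S18_1749", "S18_1749", "S18_2248", "S18_2248", "S18_1749"],
    "SALES": [2550.0, 2550.0, 2754.5, 1200.0, 3510.0],
})


def clean_sales(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy of `frame` and return it."""
    out = frame.copy()
    # Unparseable strings become NaT instead of raising an error
    out["ORDERDATE"] = pd.to_datetime(out["ORDERDATE"], errors="coerce")
    # Drop rows with missing (or unparseable) dates
    out = out.dropna(subset=["ORDERDATE"])
    # Drop exact duplicate rows
    out = out.drop_duplicates()
    # Monthly Period used for grouping, e.g. 2003-01
    out["YearMonth"] = out["ORDERDATE"].dt.to_period("M")
    return out


cleaned = clean_sales(raw)
print(cleaned)
```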

## 3️⃣ Monthly Sales Trend

```python
# Total sales per month (YearMonth is a monthly Period)
monthly_sales = df.groupby('YearMonth')['SALES'].sum().reset_index()
# Convert Period to Timestamp so the x-axis is plotted as dates
monthly_sales['YearMonth'] = monthly_sales['YearMonth'].dt.to_timestamp()

plt.figure(figsize=(12, 6))
sns.lineplot(data=monthly_sales, x='YearMonth', y='SALES', marker='o')
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('../output/monthly_sales.png')
plt.show()
```
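Grouping on `YearMonth` only returns months that actually have orders. A minimal sketch of an alternative, using hypothetical toy orders with a deliberate gap in March 2003: `resample("MS")` produces one row per calendar month (an empty month sums to 0), and `numpy.polyfit` with `deg=1` fits the kind of straight trend line the recommendations mention for forecasting. A linear fit is a deliberately naive choice here, not a proper forecasting model.

```python
import numpy as np
import pandas as pd

# Hypothetical toy orders; March 2003 has no orders at all
orders = pd.DataFrame({
    "ORDERDATE": pd.to_datetime(["2003-01-06", "2003-01-20", "2003-02-10", "2003-04-03"]),
    "SALES": [2550.0, 1450.0, 3510.0, 4200.0],
})

# One row per calendar month, labelled by month start; empty months sum to 0
monthly = orders.set_index("ORDERDATE")["SALES"].resample("MS").sum()

# Least-squares straight line: sales = slope * month_index + intercept
x = np.arange(len(monthly))
slope, intercept = np.polyfit(x, monthly.to_numpy(), deg=1)

# Extend the line one month ahead as a naive next-month estimate
next_month = slope * len(monthly) + intercept

print(monthly)
print(f"Trend: {slope:+.1f} per month, naive next-month estimate: {next_month:.1f}")
```

Whether an empty month should count as zero sales or be left out is a judgment call; in this toy data the zero for March pulls the fitted slope downward.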

## 4️⃣ Top 10 Products by Sales

```python
# Sum sales per product and keep the 10 largest
top_products = df.groupby('PRODUCTCODE')['SALES'].sum().sort_values(ascending=False).head(10)

plt.figure(figsize=(10, 6))
# Horizontal bars: product codes on the y-axis, largest first
sns.barplot(x=top_products.values, y=top_products.index)
plt.title('Top 10 Products by Sales')
plt.xlabel('Total Sales ($)')
plt.ylabel('Product Code')
plt.tight_layout()
plt.savefig('../output/top_products.png')
plt.show()
```
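`Series.nlargest` is a shorter way to express the sort-then-head ranking above, and dividing by the grand total shows how concentrated revenue is among the top products. A small sketch; the product codes and amounts are hypothetical.

```python
import pandas as pd

# Hypothetical toy order lines across four product codes
sales = pd.DataFrame({
    "PRODUCTCODE": ["S10_1678", "S18_1749", "S10_1678", "S24_2000", "S18_2248"],
    "SALES": [1000.0, 2400.0, 1500.0, 700.0, 1300.0],
})

totals = sales.groupby("PRODUCTCODE")["SALES"].sum()

# Same ranking as totals.sort_values(ascending=False).head(3)
top = totals.nlargest(3)

# Each top product's share of overall revenue, in percent
share = (top / totals.sum() * 100).round(1)

print(pd.DataFrame({"SALES": top, "SHARE_%": share}))
```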

## 5️⃣ Correlation: Discount % vs Quantity Ordered

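```python
# Approximate discount % as how far the unit price sits below list price (MSRP);
# values are negative when an item sold above MSRP
df['discount_percent'] = 100 - (df['PRICEEACH'] / df['MSRP'] * 100)

# Pearson correlation (the pandas default)
correlation = df['discount_percent'].corr(df['QUANTITYORDERED'])

print(f"Correlation between discount % and quantity ordered: {correlation:.2f}")
```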
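A quick way to sanity-check the discount formula and the choice of correlation measure, sketched on made-up order lines: `Series.corr` uses Pearson (linear) correlation by default, while `method="spearman"` works on ranks. A large gap between the two suggests the relationship is monotonic but not linear, or that a few outliers dominate the Pearson value.

```python
import pandas as pd

# Hypothetical toy order lines; the last row sold above MSRP
lines = pd.DataFrame({
    "PRICEEACH": [95.0, 90.0, 80.0, 70.0, 100.0],
    "MSRP": [100.0, 100.0, 100.0, 100.0, 95.0],
    "QUANTITYORDERED": [20, 25, 30, 45, 18],
})

# Same formula as the cell above, written as (1 - price / list price) * 100
lines["discount_percent"] = (1 - lines["PRICEEACH"] / lines["MSRP"]) * 100

# Pearson measures linear association; Spearman measures rank (monotonic) association
pearson = lines["discount_percent"].corr(lines["QUANTITYORDERED"])
spearman = lines["discount_percent"].corr(lines["QUANTITYORDERED"], method="spearman")

print(lines)
print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```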

## 6️⃣ Business Recommendation

- Increase inventory of the **top products** ahead of **Q4 (October to December)**
- Use the monthly sales trend line to forecast inventory needs
- Monitor the correlation between discount % and quantity ordered when adjusting promotional campaigns
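Step 9 of the outline (automation option) can take many forms. One possible shape, sketched under assumptions: the load, clean, and aggregate cells are wrapped in a hypothetical `run_monthly_report` function that writes a CSV of monthly totals, so it can run without opening the notebook. The demo at the bottom runs it on a temporary toy file rather than the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def run_monthly_report(csv_path: str, out_dir: str) -> pd.DataFrame:
    """Hypothetical entry point: load, clean, aggregate, and save monthly totals."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(csv_path)
    # Same cleaning steps as the notebook
    df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"], errors="coerce")
    df = df.dropna(subset=["ORDERDATE"]).drop_duplicates()
    df["YearMonth"] = df["ORDERDATE"].dt.to_period("M")

    # Monthly totals, saved next to the other outputs
    monthly = df.groupby("YearMonth")["SALES"].sum().reset_index()
    monthly.to_csv(out / "monthly_sales.csv", index=False)
    return monthly


# Demo on a throwaway toy CSV (hypothetical rows) inside a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "toy_sales.csv"
    src.write_text("ORDERDATE,SALES\n2003-01-06,100\n2003-01-20,50\n2003-02-10,75\n")
    result = run_monthly_report(str(src), str(Path(tmp) / "output"))
    saved = (Path(tmp) / "output" / "monthly_sales.csv").exists()

print(result)
```

Against the real data, the call would be `run_monthly_report('../data/sample_sales_data.csv', '../output')`, for example from a script that cron or Windows Task Scheduler runs once a month.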