# Final Capstone Project: Retail & Student Analytics Dashboard

---

## 📌 Project Objective
This project integrates multiple datasets to perform **end-to-end data analysis**.  
You will:

1. Analyze **sales data** (retail business insights)
2. Analyze **student marks** (education insights)
3. Use Python, NumPy, Pandas, Matplotlib, Seaborn
4. Apply statistical analysis and visualization
5. Generate a **combined dashboard of insights**

---


## 📂 Datasets Used
1️⃣ `sales_data.csv`: Retail sales dataset  
2️⃣ `student_marks.csv`: Student performance dataset

## 1️⃣ Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# set_theme is the current name for this call; sns.set is kept as an alias
sns.set_theme(style="whitegrid")


## 2️⃣ Load Datasets

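# Load both CSV files from the datasets/ folder
sales_df = pd.read_csv("datasets/sales_data.csv")
# File name matches the listing above; paths are case-sensitive on Linux/macOS
student_df = pd.read_csv("datasets/student_marks.csv")

# Preview the first rows of each dataset
# (only the last bare expression in a cell is displayed, so print both)
print(sales_df.head())
print(student_df.head())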
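
Before moving on, it can help to confirm that the loaded frames contain the columns the later cells rely on. The following is a minimal sketch, not part of the original notebook: the helper `missing_columns` and the stand-in frame `demo_students` are hypothetical, and the column lists are simply collected from the analysis code in this notebook.

```python
import pandas as pd

# Columns that the analysis cells in this notebook reference.
SALES_COLUMNS = ["ORDERNUMBER", "ORDERDATE", "SALES", "CUSTOMERNAME",
                 "PRODUCTLINE", "QUANTITYORDERED", "PRICEEACH"]
STUDENT_COLUMNS = ["number_courses", "time_study", "Marks"]


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Tiny stand-in frame (hypothetical values) so the sketch runs without the CSV files.
demo_students = pd.DataFrame({"number_courses": [3, 4], "time_study": [4.5, 0.1]})
print(missing_columns(demo_students, STUDENT_COLUMNS))  # ['Marks']
```

With the real data, the same check would be `missing_columns(sales_df, SALES_COLUMNS)` and `missing_columns(student_df, STUDENT_COLUMNS)`; an empty list means every expected column is present.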



## 3️⃣ Sales Data Analysis

# Convert ORDERDATE to datetime
sales_df['ORDERDATE'] = pd.to_datetime(sales_df['ORDERDATE'])

# Extract Year & Month
sales_df['Year'] = sales_df['ORDERDATE'].dt.year
sales_df['Month'] = sales_df['ORDERDATE'].dt.month

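# KPI calculation
total_sales = sales_df['SALES'].sum()
total_orders = sales_df['ORDERNUMBER'].nunique()
total_customers = sales_df['CUSTOMERNAME'].nunique()
# Average value per order: SALES.mean() alone would average per row,
# which differs whenever one ORDERNUMBER spans several rows
avg_order_value = total_sales / total_orders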

print("Total Sales:", total_sales)
print("Average Order Value:", avg_order_value)
print("Total Orders:", total_orders)
print("Total Customers:", total_customers)

# Monthly sales trend
monthly_sales = sales_df.groupby(['Year','Month'])['SALES'].sum().reset_index()

plt.figure(figsize=(10,5))
sns.lineplot(x='Month', y='SALES', hue='Year', data=monthly_sales)
plt.title("Monthly Sales Trend by Year")
plt.show()

# Product Line Sales
product_sales = sales_df.groupby('PRODUCTLINE')['SALES'].sum().sort_values(ascending=False)

plt.figure(figsize=(10,6))
sns.barplot(x=product_sales.values, y=product_sales.index)
plt.title("Total Sales by Product Line")
plt.show()
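
The "Average Order Value" KPI above depends on what a row represents. If one `ORDERNUMBER` can span several rows (one per line item), the mean over rows and the mean over orders are different numbers. A small worked example with hypothetical values, assuming that row layout:

```python
import pandas as pd

# Hypothetical line-item data: order 1 has two rows, order 2 has one.
orders = pd.DataFrame({
    "ORDERNUMBER": [1, 1, 2],
    "SALES": [100.0, 50.0, 30.0],
})

# Mean over rows: (100 + 50 + 30) / 3
avg_per_row = orders["SALES"].mean()

# Mean over orders: sum each order first, then average the order totals
order_totals = orders.groupby("ORDERNUMBER")["SALES"].sum()
avg_per_order = order_totals.mean()

print(avg_per_row)    # 60.0
print(avg_per_order)  # 90.0
```

The per-order figure equals total sales divided by the number of unique order numbers, so it can be derived directly from the other KPIs.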



## 4️⃣ Student Marks Analysis

# Distribution of Marks
plt.figure(figsize=(8,5))
sns.histplot(student_df['Marks'], bins=15, kde=True)
plt.title("Distribution of Student Marks")
plt.show()

# Study Time vs Marks
plt.figure(figsize=(8,5))
sns.scatterplot(x='time_study', y='Marks', data=student_df)
plt.title("Study Time vs Marks")
plt.show()

# Regression plot
plt.figure(figsize=(8,5))
sns.regplot(x='time_study', y='Marks', data=student_df)
plt.title("Regression: Study Time vs Marks")
plt.show()

# Number of Courses vs Marks
plt.figure(figsize=(8,5))
sns.boxplot(x='number_courses', y='Marks', data=student_df)
plt.title("Marks by Number of Courses")
plt.show()



## 5️⃣ Combined Insights

### Retail Dataset Insights:
- Monthly sales trends & seasonality
- Top performing product lines
- Deal size & customer segmentation insights
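
The deal size and customer segmentation insights listed above could be derived along these lines. This is a sketch under stated assumptions rather than the notebook's own method: the rows are hypothetical, the 3000 / 7000 cut points are illustrative, and the two-way median split is just one possible segmentation.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only CUSTOMERNAME and SALES are column names from the notebook.
sales = pd.DataFrame({
    "CUSTOMERNAME": ["A", "A", "B", "C", "D"],
    "SALES": [1000.0, 2000.0, 500.0, 7000.0, 3500.0],
})

# Deal size: bucket each row by its SALES value.
# The 3000 / 7000 cut points are illustrative assumptions, not values from the dataset.
deal_size = pd.cut(
    sales["SALES"],
    bins=[0, 3000, 7000, np.inf],
    labels=["Small", "Medium", "Large"],
    right=False,  # intervals are [0, 3000), [3000, 7000), [7000, inf)
)

# Customer segmentation: total spend per customer, split at the median.
customer_totals = sales.groupby("CUSTOMERNAME")["SALES"].sum()
segment = pd.qcut(customer_totals, q=2, labels=["Lower", "Upper"])

print(deal_size.value_counts())
print(segment)
```

`pd.cut` uses fixed value boundaries, while `pd.qcut` picks boundaries so that each group holds roughly the same number of customers; which one fits depends on whether absolute thresholds or relative ranking matters more.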

### Student Dataset Insights:
- Study time correlates strongly and positively with marks (see the scatter and regression plots)
- The number of courses appears less impactful than study time
- High-performing students identified
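
The student insights above can be backed with numbers. The sketch below uses hypothetical records with the notebook's column names; the 80th-percentile cut-off for "high-performing" is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical records using the notebook's column names.
students = pd.DataFrame({
    "number_courses": [3, 5, 4, 6, 3],
    "time_study": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Marks": [10.0, 20.0, 28.0, 41.0, 50.0],
})

# Pearson correlation of each feature with Marks (values range from -1 to 1).
corr_with_marks = students.corr()["Marks"].drop("Marks")

# High performers: marks at or above the 80th percentile (an assumed cut-off).
threshold = students["Marks"].quantile(0.8)
high_performers = students[students["Marks"] >= threshold]

print(corr_with_marks)
print(high_performers)
```

Running the same two steps on `student_df` would show how strong the study-time relationship is in the real data and how it compares with the number of courses.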

## 6️⃣ Optional Advanced Analysis

# Correlation heatmaps for both datasets
plt.figure(figsize=(5,4))
sns.heatmap(sales_df[['QUANTITYORDERED','PRICEEACH','SALES']].corr(), annot=True, cmap='coolwarm')
plt.title("Sales Correlation Heatmap")
plt.show()

plt.figure(figsize=(5,4))
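# numeric_only=True skips any non-numeric columns, which would otherwise raise in pandas 2.x
sns.heatmap(student_df.corr(numeric_only=True), annot=True, cmap='coolwarm')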
plt.title("Student Marks Correlation Heatmap")
plt.show()



## 7️⃣ Summary & Learnings

- Combined business and academic datasets for **end-to-end analytics**
- Used Python, NumPy, Pandas, Matplotlib, Seaborn
- Calculated KPIs, correlations, distributions, trends
- Built a mini-dashboard for insights
- Prepared for **real-world projects**
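
One way to assemble the mini-dashboard mentioned above is a single figure with one panel per chart. This is a minimal sketch, not the notebook's own implementation: `build_dashboard` and the demo frames are hypothetical, it uses plain Matplotlib calls instead of Seaborn to stay short, and it assumes the `Month` column created in the sales analysis step.

```python
import matplotlib.pyplot as plt
import pandas as pd


def build_dashboard(sales_df: pd.DataFrame, student_df: pd.DataFrame):
    """Draw four summary panels on a 2x2 grid and return the figure."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    # Top-left: total sales per calendar month
    monthly = sales_df.groupby("Month")["SALES"].sum()
    axes[0, 0].plot(monthly.index, monthly.values, marker="o")
    axes[0, 0].set_title("Monthly Sales")

    # Top-right: total sales per product line, smallest to largest
    product = sales_df.groupby("PRODUCTLINE")["SALES"].sum().sort_values()
    axes[0, 1].barh(product.index, product.values)
    axes[0, 1].set_title("Sales by Product Line")

    # Bottom-left: distribution of student marks
    axes[1, 0].hist(student_df["Marks"], bins=15)
    axes[1, 0].set_title("Distribution of Marks")

    # Bottom-right: study time against marks
    axes[1, 1].scatter(student_df["time_study"], student_df["Marks"])
    axes[1, 1].set_title("Study Time vs Marks")

    fig.tight_layout()
    return fig


# Hypothetical demo data so the sketch runs without the CSV files.
demo_sales = pd.DataFrame({
    "Month": [1, 1, 2, 3],
    "PRODUCTLINE": ["Line A", "Line B", "Line A", "Line B"],
    "SALES": [100.0, 50.0, 80.0, 120.0],
})
demo_students = pd.DataFrame({
    "time_study": [1.0, 2.0, 3.0, 4.0],
    "Marks": [12.0, 20.0, 31.0, 40.0],
})
fig = build_dashboard(demo_sales, demo_students)
```

In the notebook, `build_dashboard(sales_df, student_df)` followed by `plt.show()` would display the grid, and `fig.savefig(...)` could export it as an image.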

# ✅ Final Capstone Project Completed