# 🚀 Data Engineering Project: Sales Data with Profit Margin KPI
This notebook demonstrates how to build an **end-to-end data engineering project** using Python, SQL, and visualization techniques.

**Goals:**
- Load and clean sales data
- Create a new KPI column: `Profit Margin (%)`
- Simulate a data pipeline and warehouse
- Visualize insights

Author: **Kynm Kumalo**


## 🧱 Step 1: Import Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3
```

## 📥 Step 2: Load Dataset

```python
# Load sales data
# Replace with your own path if running locally
file_path = 'sales_data_sample.csv'

df = pd.read_csv(file_path, encoding='latin1')
# df.shape is a (rows, columns) tuple, so report each part separately
print(f"✅ Data loaded successfully: {df.shape[0]:,} rows, {df.shape[1]} columns")
df.head()
```
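The goals include cleaning the data, but the cells above load the file without a cleaning pass. One possible pass is sketched below under stated assumptions: `clean_sales` is a hypothetical helper, it assumes the CSV has `ORDERDATE` and `SALES` columns, and its rules (drop exact duplicate rows, coerce `ORDERDATE` to datetime, drop rows with missing or non-positive `SALES`) are illustrative choices rather than the author's method. The usage example runs on a tiny made-up frame instead of the real file.

```python
import pandas as pd


def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw sales DataFrame (hypothetical helper)."""
    df = raw.copy()

    # Drop rows that are exact duplicates across every column
    df = df.drop_duplicates()

    # Parse order dates; unparseable values become NaT instead of raising
    df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'], errors='coerce')

    # Drop rows where SALES is missing or non-positive, because the
    # profit margin KPI divides by SALES
    df = df[df['SALES'].notna() & (df['SALES'] > 0)]

    # Reset the index so row labels are contiguous after filtering
    return df.reset_index(drop=True)


# Usage on a tiny synthetic frame (values are made up for illustration)
raw = pd.DataFrame({
    'ORDERNUMBER': [10107, 10107, 10121, 10134],
    'SALES': [2871.00, 2871.00, 0.0, 3884.34],
    'ORDERDATE': ['2/24/2003 0:00', '2/24/2003 0:00', '5/7/2003 0:00', 'not a date'],
})
cleaned = clean_sales(raw)
print(cleaned)
```

Here the duplicate row and the zero-sales row are dropped, and the unparseable date is kept as `NaT`; whether such rows should be dropped as well depends on how dates are used downstream.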

## ⚙️ Step 3: Data Transformation & KPI Creation

```python
# Create the Profit Margin KPI
# Assumption: cost = 70% of (PRICEEACH * QUANTITYORDERED), a stand-in for real cost data
df['COST'] = df['PRICEEACH'] * df['QUANTITYORDERED'] * 0.7

# Treat zero sales as missing so the margin becomes NaN instead of +/-inf
sales = df['SALES'].replace(0, np.nan)
df['PROFIT_MARGIN'] = ((sales - df['COST']) / sales * 100).round(2)

# Save the enriched dataset (original columns plus COST and PROFIT_MARGIN)
df.to_csv('sales_data_cleaned.csv', index=False)
print('✅ Dataset with Profit Margin KPI saved!')
df[['ORDERNUMBER', 'SALES', 'COST', 'PROFIT_MARGIN']].head()
```
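Because the assumed cost is derived from the same price and quantity that normally make up the sale amount, it helps to see what the formula reduces to. Writing $P$ for `PRICEEACH`, $Q$ for `QUANTITYORDERED`, and $S$ for `SALES`, the Step 3 KPI is:

$$
\text{PROFIT\_MARGIN} = \frac{S - 0.7\,PQ}{S} \times 100
$$

For any row where the sale amount is exactly price times quantity ($S = PQ$), this is a constant:

$$
\frac{PQ - 0.7\,PQ}{PQ} \times 100 = (1 - 0.7) \times 100 = 30
$$

So under this assumption the margin departs from 30% only on rows where `SALES` differs from `PRICEEACH * QUANTITYORDERED`. Checking how many such rows exist is worthwhile before reading country-level differences into the chart in Step 5.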

## 🧊 Step 4: Simulate a Data Warehouse Using SQLite

```python
# Create a SQLite database file to simulate a data warehouse
conn = sqlite3.connect('sales_dw.db')
df.to_sql('sales_data', conn, if_exists='replace', index=False)

# Aggregate by country and year: mean per-order margin and total sales
query = '''
SELECT COUNTRY, YEAR_ID, ROUND(AVG(PROFIT_MARGIN), 2) AS AVG_MARGIN, SUM(SALES) AS TOTAL_SALES
FROM sales_data
GROUP BY COUNTRY, YEAR_ID
ORDER BY TOTAL_SALES DESC;
'''

summary = pd.read_sql(query, conn)
conn.close()  # release the database file once the query has run
summary.head()
```
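`AVG(PROFIT_MARGIN)` averages per-order margins, so a small order counts as much as a large one. If the report should show a revenue-weighted margin instead, it can be computed from totals as `SUM(SALES - COST) / SUM(SALES)`. The sketch below compares both on a made-up, in-memory table; the `WEIGHTED_MARGIN` column is an addition for illustration and not part of the original query.

```python
import sqlite3

import pandas as pd

# Tiny synthetic table (made-up numbers) using the same column names as the notebook
rows = pd.DataFrame({
    'COUNTRY': ['USA', 'USA', 'France'],
    'YEAR_ID': [2003, 2003, 2003],
    'SALES': [1000.0, 100.0, 500.0],
    'COST': [700.0, 50.0, 350.0],
})
rows['PROFIT_MARGIN'] = ((rows['SALES'] - rows['COST']) / rows['SALES'] * 100).round(2)

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
rows.to_sql('sales_data', conn, index=False)

query = '''
SELECT COUNTRY,
       YEAR_ID,
       ROUND(AVG(PROFIT_MARGIN), 2) AS AVG_MARGIN,                          -- each order counts equally
       ROUND(100.0 * SUM(SALES - COST) / SUM(SALES), 2) AS WEIGHTED_MARGIN,  -- larger orders count more
       SUM(SALES) AS TOTAL_SALES
FROM sales_data
GROUP BY COUNTRY, YEAR_ID
ORDER BY TOTAL_SALES DESC;
'''
summary = pd.read_sql(query, conn)
conn.close()
print(summary)
```

For the USA rows the simple average is 40.0 (orders at 30% and 50%), while the weighted margin is 31.82 because the 30% order carries ten times the revenue.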

## 📊 Step 5: Data Visualization

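```python
# Average profit margin by country, sorted from highest to lowest
country_margin = df.groupby('COUNTRY', as_index=False)['PROFIT_MARGIN'].mean()
country_margin = country_margin.sort_values('PROFIT_MARGIN', ascending=False)

plt.figure(figsize=(10, 5))
# hue= plus legend=False (seaborn >= 0.13) applies the palette without the
# "palette without hue" deprecation warning
sns.barplot(data=country_margin, x='COUNTRY', y='PROFIT_MARGIN',
            hue='COUNTRY', palette='Blues_d', legend=False)
plt.title('Average Profit Margin by Country')
plt.ylabel('Average Profit Margin (%)')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
```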

## ✅ Step 6: Summary
- Created a **Profit Margin KPI** from the sales data, assuming cost is 70% of `PRICEEACH × QUANTITYORDERED`
- Simulated ETL and data warehousing using SQLite
- Generated visual insights with Matplotlib and Seaborn
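As a sketch of how the simulated ETL flow could be made re-runnable, the cells can be wrapped into extract, transform, and load functions with a basic row-count check after loading. This assumes the same file paths, `latin1` encoding, and 70% cost assumption used above; the function names and the `COST_RATIO` constant are hypothetical and not part of the original notebook.

```python
import sqlite3
from contextlib import closing

import numpy as np
import pandas as pd

COST_RATIO = 0.7  # assumed cost share of PRICEEACH * QUANTITYORDERED, as in Step 3


def extract(path: str) -> pd.DataFrame:
    """Read the raw CSV with the same encoding as Step 2."""
    return pd.read_csv(path, encoding='latin1')


def transform(df: pd.DataFrame, cost_ratio: float = COST_RATIO) -> pd.DataFrame:
    """Add COST and PROFIT_MARGIN columns without mutating the input."""
    out = df.copy()
    out['COST'] = out['PRICEEACH'] * out['QUANTITYORDERED'] * cost_ratio
    # Treat zero sales as missing so the margin is NaN rather than +/-inf
    sales = out['SALES'].replace(0, np.nan)
    out['PROFIT_MARGIN'] = ((sales - out['COST']) / sales * 100).round(2)
    return out


def load(df: pd.DataFrame, db_path: str, table: str = 'sales_data') -> int:
    """Write the table to SQLite and return the row count read back."""
    with closing(sqlite3.connect(db_path)) as conn:
        df.to_sql(table, conn, if_exists='replace', index=False)
        (count,) = conn.execute(f'SELECT COUNT(*) FROM {table}').fetchone()
    return count


def run_pipeline(csv_path: str, db_path: str) -> pd.DataFrame:
    """Extract -> transform -> load, failing loudly if rows go missing."""
    df = transform(extract(csv_path))
    loaded = load(df, db_path)
    if loaded != len(df):
        raise RuntimeError(f'Expected {len(df)} rows in warehouse, found {loaded}')
    return df


# Usage with the notebook's paths:
# df = run_pipeline('sales_data_sample.csv', 'sales_dw.db')
```

Keeping `transform` free of file I/O means the KPI logic can be tested on a small in-memory frame, and `load` can target `':memory:'` for the same purpose.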

Next Steps:
- Deploy the pipeline on AWS or Snowflake
- Connect Power BI for dashboard reporting
- Upload the notebook and cleaned dataset to GitHub or Kaggle