<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img
 src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/alx-courses/aice/assets/Content_page_banner_blue_dots.png"
 alt="ALX Content Header"
 class="full-width-image"
/>
</div>

# **Regional Showdown: 2014 Business Performance Battle**

## **Background**

Superstore HQ's executive leadership is preparing for a strategic review of regional performance. As the company approaches the end of the fiscal year, key decisions must be made about future investments, regional support, and operational efficiency. However, the current performance data is fragmented across spreadsheets, and informal reporting has led to inconsistent interpretations of success across regions.
To move forward with clarity and confidence, leadership requires an evidence-based performance report that consolidates reliable metrics and provides objective comparisons between regions.





![Regional Managers Planning](https://raw.githubusercontent.com/DesmondMokhali/sql_assessment/main/manager_meeting.jpg)


## **Problem**

There is currently no standardized or centralized method to assess and compare regional performance. Without clear insights, the leadership team cannot make informed decisions about where to allocate resources or identify top-performing regions.
As a **Data Analyst**, you have been formally tasked with producing a detailed analysis that addresses this gap. You must extract meaningful trends from the raw data, identify performance indicators, and deliver a structured set of insights. Your analysis will be the foundation for critical decision-making at the executive level.

## **Learning objectives**
This exercise is designed to simulate the responsibilities of a data analyst working in a business environment where data integrity, strategic reporting, and analytical precision are essential.

You will learn how to:

- Write production-ready SQL to answer real business questions.
- Query and join tables effectively in notebooks.
- Use aggregations and numeric functions to compute KPIs.
- Build region-level summaries based on sales, profit, order volumes, and other performance indicators.
- Translate raw transactional data into actionable insights for stakeholders.

By the end of this project, you will have demonstrated how SQL can be used as a powerful tool to support strategic business decisions.

### 1. Install the required libraries (if not yet installed)
In a Jupyter notebook cell, run:

In [1]:
# !pip install pandas sqlalchemy ipython-sql

### 2. Load CSV and Set Up SQLite Database for SQL Queries in Jupyter

In [2]:
import pandas as pd
from sqlalchemy import create_engine

# Step 1: Load CSV
df = pd.read_csv("superstore.csv")  # Make sure this file is in your folder

# Step 2: Save to file-based SQLite DB
engine = create_engine("sqlite:///superstore.db")
df.to_sql("superstore", con=engine, index=False, if_exists="replace")

# Step 3: Use SQL magic
%load_ext sql
%sql sqlite:///superstore.db

### 3. Review the first 5 rows

In [3]:
%%sql
SELECT * FROM superstore LIMIT 5;

 * sqlite:///superstore.db
Done.


Category,City,Country,Customer.ID,Customer.Name,Discount,Market,记录数,Order.Date,Order.ID,Order.Priority,Product.ID,Product.Name,Profit,Quantity,Region,Row.ID,Sales,Segment,Ship.Date,Ship.Mode,Shipping.Cost,State,Sub.Category,Year,Market2,weeknum
Office Supplies,Los Angeles,United States,LS-172304,Lycoris Saunders,0.0,US,1,2011-01-07 00:00:00.000,CA-2011-130813,High,OFF-PA-10002005,Xerox 225,9.3312,3,West,36624,19,Consumer,2011-01-09 00:00:00.000,Second Class,4.37,California,Paper,2011,North America,2
Office Supplies,Los Angeles,United States,MV-174854,Mark Van Huff,0.0,US,1,2011-01-21 00:00:00.000,CA-2011-148614,Medium,OFF-PA-10002893,"Wirebound Service Call Books, 5 1/2"" x 4""",9.2928,2,West,37033,19,Consumer,2011-01-26 00:00:00.000,Standard Class,0.94,California,Paper,2011,North America,4
Office Supplies,Los Angeles,United States,CS-121304,Chad Sievert,0.0,US,1,2011-08-05 00:00:00.000,CA-2011-118962,Medium,OFF-PA-10000659,"Adams Phone Message Book, Professional, 400 Message Capacity, 5 3/6” x 11”",9.8418,3,West,31468,21,Consumer,2011-08-09 00:00:00.000,Standard Class,1.81,California,Paper,2011,North America,32
Office Supplies,Los Angeles,United States,CS-121304,Chad Sievert,0.0,US,1,2011-08-05 00:00:00.000,CA-2011-118962,Medium,OFF-PA-10001144,Xerox 1913,53.2608,2,West,31469,111,Consumer,2011-08-09 00:00:00.000,Standard Class,4.59,California,Paper,2011,North America,32
Office Supplies,Los Angeles,United States,AP-109154,Arthur Prichep,0.0,US,1,2011-09-29 00:00:00.000,CA-2011-146969,High,OFF-PA-10002105,Xerox 223,3.1104,1,West,32440,6,Consumer,2011-10-03 00:00:00.000,Standard Class,1.32,California,Paper,2011,North America,40
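
Several column names in the output above contain a dot (for example `Order.Date` and `Shipping.Cost`). In SQLite, such names generally need double quotes; otherwise `Shipping.Cost` is read as column `Cost` of a table named `Shipping`. The following is a minimal sketch of that behaviour on a tiny in-memory table. The table `demo` is hypothetical and only mimics the dataset's column names; it is not part of `superstore.db`.

```python
import sqlite3

# Hypothetical two-row table that mimics the dotted column names of the dataset.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE demo ("Order.Date" TEXT, "Shipping.Cost" REAL, Region TEXT)')
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        ("2011-01-07 00:00:00.000", 4.37, "West"),
        ("2011-01-21 00:00:00.000", 0.94, "West"),
    ],
)

# Double quotes mark the whole dotted name as a single identifier.
rows = conn.execute(
    'SELECT "Order.Date", "Shipping.Cost" FROM demo ORDER BY "Order.Date"'
).fetchall()
print(rows)

# Without quotes, SQLite looks for a table called Shipping and raises an error.
try:
    conn.execute("SELECT Shipping.Cost FROM demo")
    unquoted_ok = True
except sqlite3.OperationalError as exc:
    unquoted_ok = False
    print("Unquoted name failed:", exc)
```

The same quoting applies inside `%%sql` cells, for instance when filtering or grouping on `"Order.Date"`.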


# Integrated project notebook

## 1. SQL in Production 

### **Task 1.** 
### Which region generated the highest total sales in 2014?

In [None]:
%%sql

-- Add your code here


### **Task 2.**

### What is the average order quantity by region for the most recent quarter (Q4 2014)?

In [None]:
%%sql

-- Add your code here


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Hold on...</strong> Sales alone don't tell the full story. What if a region makes fewer sales but has better margins?
</div>

### **Task 3.**
### If one region consistently has lower sales but higher profit margins, how might that affect decisions about future investments in that region?

A region with lower sales but higher profit margins may indicate a smaller, wealthier customer base or efficient operations focused on premium products. Socio-economic factors like income levels and infrastructure (reliable transport, digital connectivity, and utilities) may influence operational efficiency and market reach. If the market is less mature, lacking modern retail channels, strong competition, or broad consumer awareness, these limitations can restrict sales despite high profitability. Future investment decisions might focus on sustainable growth that aligns with the region's economic context without eroding margins.
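
To make the "lower sales, higher margin" pattern concrete, profit margin can be computed as total profit divided by total sales. The sketch below is one possible way to do this, using made-up numbers in a hypothetical table `toy_orders`. The regions "North" and "South" and all values are illustrative assumptions, not results from the Superstore data.

```python
import sqlite3

# Hypothetical data: "North" sells more, "South" keeps a larger share as profit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_orders (Region TEXT, Sales REAL, Profit REAL)")
conn.executemany(
    "INSERT INTO toy_orders VALUES (?, ?, ?)",
    [
        ("North", 600.0, 60.0),
        ("North", 400.0, 40.0),
        ("South", 300.0, 90.0),
        ("South", 200.0, 60.0),
    ],
)

# Profit margin (%) = total profit / total sales * 100.
# NULLIF avoids a division by zero when a region has no sales.
margin_query = """
SELECT Region,
       SUM(Sales)  AS Total_Sales,
       SUM(Profit) AS Total_Profit,
       ROUND(SUM(Profit) * 100.0 / NULLIF(SUM(Sales), 0), 2) AS Profit_Margin_Pct
FROM toy_orders
GROUP BY Region
ORDER BY Profit_Margin_Pct DESC
"""
margins = conn.execute(margin_query).fetchall()
for row in margins:
    print(row)
```

In this toy example "South" has half the sales of "North" but three times the margin, which is the kind of contrast the question asks you to reason about.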

## 2. Querying in Notebooks

### **Task 4.** 
### List the top 3 regions by number of orders placed.


In [None]:
%%sql

-- Add your code here


### **Task 5.**
### Which product category contributes the least to overall profit?

In [None]:
%%sql

-- Add your code here


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Hmm... </strong> Some categories might be profitable, but are those profits sustainable?  
What if many customers are returning those products due to dissatisfaction or mismatch?  
Even if return data isn't available, it's important to consider how returns could impact long-term profitability and reputation.
</div>

### **Task 6.** 
### If you observed a high number of returns in a specific product category, what factors would you investigate, and what actions would you recommend as a regional manager?

If I observed a high number of returns in a specific product category, I would first investigate the reasons behind them, such as product defects, inaccurate descriptions, unmet expectations, or delivery issues. I would also review customer feedback, assess supplier quality, and check if certain regions or customer segments are more affected. Based on these findings, I'd recommend improving quality control, clarifying product listings, and offering staff training on product fit. If necessary, I'd remove or replace underperforming items to protect customer trust and long-term profitability.

## 3. Numeric Functions & Aggregations

### **Task 7.**
### Calculate the average delivery delay per region (Order Date vs Ship Date).

In [None]:
%%sql

-- Add your code here


### **Task 8.** 
### Which customer segment yields the highest average order value?

In [None]:
%%sql

-- Add your code here

<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Efficiency and value matter.</strong> While understanding delivery delays helps reveal operational efficiency by region, analyzing customer segments shows where the most valuable orders come from.  
  Even if two regions achieve similar sales, differences in delivery performance and customer mix can lead to very different costs and profitability.
</div>
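
As an illustration of the point above, two regions with the same sales can differ in delivery speed and in how much of their revenue goes to shipping. The sketch below uses a hypothetical table `toy_shipments` with made-up rows for "East" and "West". It reuses the `julianday` difference that appears later in this notebook and adds shipping cost as a percentage of sales, which is an assumed metric for illustration rather than one prescribed by the assessment.

```python
import sqlite3

# Hypothetical shipments: both regions sell 200.0 in total.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_shipments ("
    'Region TEXT, "Order.Date" TEXT, "Ship.Date" TEXT, Sales REAL, "Shipping.Cost" REAL)'
)
conn.executemany(
    "INSERT INTO toy_shipments VALUES (?, ?, ?, ?, ?)",
    [
        ("East", "2014-01-01 00:00:00.000", "2014-01-03 00:00:00.000", 100.0, 10.0),
        ("East", "2014-02-01 00:00:00.000", "2014-02-05 00:00:00.000", 100.0, 10.0),
        ("West", "2014-01-01 00:00:00.000", "2014-01-02 00:00:00.000", 150.0, 30.0),
        ("West", "2014-03-10 00:00:00.000", "2014-03-11 00:00:00.000", 50.0, 20.0),
    ],
)

# Average days between order and shipment, and shipping cost as a share of sales.
efficiency_query = """
SELECT Region,
       SUM(Sales) AS Total_Sales,
       ROUND(AVG(julianday("Ship.Date") - julianday("Order.Date")), 2) AS Avg_Delay_Days,
       ROUND(SUM("Shipping.Cost") * 100.0 / NULLIF(SUM(Sales), 0), 2) AS Shipping_Pct_Of_Sales
FROM toy_shipments
GROUP BY Region
ORDER BY Region
"""
efficiency = conn.execute(efficiency_query).fetchall()
for row in efficiency:
    print(row)
```

In this made-up data "West" ships faster but spends a larger share of its sales on shipping, a trade-off that could come from faster ship modes or longer delivery distances.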


### **Task 9.**
### Two regions have similar total sales, but one region has significantly higher shipping costs. What factors could explain this difference, and how might you address it as a regional manager?

## 4. Window Functions

### **Task 10.**
### Rank customers within each region by total sales and find the top customer per region.

In [None]:
%%sql

-- Add your code here

### **Task 11.**
### Show the year-over-year sales growth per region.

In [None]:
%%sql
WITH YearlySales AS (
  SELECT 
    Region,
    strftime('%Y', "Order.Date") AS Year,
    SUM(Sales) AS Total_Sales
  -- Add your code here
)

SELECT 
  cur.Region,
  cur.Year AS Current_Year,
  cur.Total_Sales AS Current_Year_Sales,
  prev.Total_Sales AS Previous_Year_Sales,
  ROUND(
    CASE 
      WHEN prev.Total_Sales IS NULL OR prev.Total_Sales = 0 THEN NULL
      ELSE ((cur.Total_Sales - prev.Total_Sales) * 100.0 / prev.Total_Sales)
    END, 2
  ) AS YoY_Growth_Percent
-- Add your code here
  ON cur.Region = prev.Region
  AND CAST(cur.Year AS INTEGER) = CAST(prev.Year AS INTEGER) + 1
ORDER BY cur.Region, cur.Year;



<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Trends tell a story.</strong> Ranking customers by sales highlights key contributors in each region, while year-over-year sales growth reveals how regions perform over time.  
  Sudden changes, like sharp growth in Q3 followed by a decline in Q4, could signal potential issues affecting future performance.
</div>
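
A quarter-level view can make a pattern like "up in Q3, down in Q4" visible. The sketch below shows one possible approach: it derives the quarter from the month of `"Order.Date"` and uses the `LAG` window function to compare each quarter with the preceding row for the same region. The table `toy_sales`, the region name "Region X", and all figures are hypothetical. Note that `LAG` simply looks at the preceding row, so it only equals the previous quarter when no quarter is missing from the data.

```python
import sqlite3

# Hypothetical 2014 sales for one made-up region.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE toy_sales (Region TEXT, "Order.Date" TEXT, Sales REAL)')
conn.executemany(
    "INSERT INTO toy_sales VALUES (?, ?, ?)",
    [
        ("Region X", "2014-05-10 00:00:00.000", 100.0),  # Q2
        ("Region X", "2014-08-15 00:00:00.000", 90.0),   # Q3
        ("Region X", "2014-09-01 00:00:00.000", 60.0),   # Q3
        ("Region X", "2014-11-20 00:00:00.000", 120.0),  # Q4
    ],
)

# (month + 2) / 3 with integer division maps months 1-12 to quarters 1-4.
quarterly_query = """
WITH QuarterlySales AS (
  SELECT
    Region,
    (CAST(strftime('%m', "Order.Date") AS INTEGER) + 2) / 3 AS Quarter,
    SUM(Sales) AS Total_Sales
  FROM toy_sales
  WHERE strftime('%Y', "Order.Date") = '2014'
  GROUP BY Region, Quarter
),
WithPrevious AS (
  SELECT
    *,
    LAG(Total_Sales) OVER (PARTITION BY Region ORDER BY Quarter) AS Previous_Sales
  FROM QuarterlySales
)
SELECT
  Region,
  Quarter,
  Total_Sales,
  ROUND((Total_Sales - Previous_Sales) * 100.0 / Previous_Sales, 2) AS QoQ_Growth_Percent
FROM WithPrevious
ORDER BY Region, Quarter
"""
quarterly = conn.execute(quarterly_query).fetchall()
for row in quarterly:
    print(row)
```

The first quarter in each region has no predecessor, so its growth value is `NULL` (shown as `None` in Python).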


### **Task 12.**
### If a region experienced strong sales growth in Q3 but saw a decline in Q4, how should this pattern affect its ranking on the final leaderboard? What factors would you consider in your evaluation?

<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Loyalty check! </strong> Let's find out which region's customers keep coming back for more.
Repeat business might just be the secret weapon of the winning region.
</div>

### **Task 13.** 
### Which region has the highest number of repeat customers (customers who ordered more than once)?

In [None]:
%%sql

 
-- Add your code here


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>And now... the moment of truth.</strong> You've uncovered key insights by analyzing repeat customers and other metrics.  
  It's time to combine Sales, Profit, Delivery, and Loyalty into one overall score for each region.  
  Who will be crowned the Ultimate Regional Manager of the year?
</div>


### **Task 14**

![Regional Winner Celebration](https://raw.githubusercontent.com/DesmondMokhali/sql_assessment/main/regional_winner.jpg)


### Which regions rank highest overall in 2014 based on total sales, profit, quantity, delivery delay, and shipping cost? 

In [None]:
%%sql
WITH SalesProfit AS (
    SELECT 
        Region,
        ROUND(SUM(Sales), 2) AS Total_Sales,
        ROUND(SUM(Profit), 2) AS Total_Profit,
        ROUND(AVG(Quantity), 2) AS Avg_Quantity,
        ROUND(AVG(julianday("Ship.Date") - julianday("Order.Date")), 2) AS Avg_Delivery_Delay,
        ROUND(SUM("Shipping.Cost"), 2) AS Total_Shipping_Cost
    -- Add your code here
),
RankedRegions AS (
    SELECT *,
        RANK() OVER (ORDER BY Total_Sales DESC) AS Sales_Rank,
        RANK() OVER (ORDER BY Total_Profit DESC) AS Profit_Rank,
        RANK() OVER (ORDER BY Avg_Delivery_Delay ASC) AS Delivery_Rank,
        RANK() OVER (ORDER BY Total_Shipping_Cost ASC) AS Shipping_Rank
    FROM SalesProfit
),
Final AS (
    SELECT *,
        (Sales_Rank + Profit_Rank + Delivery_Rank + Shipping_Rank) AS Overall_Rank_Score
    FROM RankedRegions
)
-- Add your code here
;


### **Task 15**

In our current ranking, we treated all performance factors equally:  
- **Total Sales**  
- **Total Profit**  
- **Average Delivery Delay**  
- **Total Shipping Cost**  

Each rank was summed to create an overall score, where a **lower score indicates better performance**.

But what if we had **used a weighted scoring matrix**, assigning importance (weights) to each metric?

![Regional Manager Score Matrix](https://raw.githubusercontent.com/DesmondMokhali/sql_assessment/main/Regional_Manager_Score_Matrix.png)
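
To see how weighting can change the outcome, the sketch below compares an equal-weight rank sum with a weighted rank sum. Everything in it is hypothetical: the three regions, their ranks, and the weights (40/40/10/10) are made up for illustration and are not taken from the score matrix above or from the Superstore data. As in the current ranking, a lower score is better.

```python
# Hypothetical per-metric ranks (1 = best) for three made-up regions.
ranks = {
    "Region A": {"sales": 1, "profit": 1, "delivery": 3, "shipping": 3},
    "Region B": {"sales": 2, "profit": 2, "delivery": 1, "shipping": 1},
    "Region C": {"sales": 3, "profit": 3, "delivery": 2, "shipping": 2},
}

# Hypothetical weights in percent; they sum to 100.
weights = {"sales": 40, "profit": 40, "delivery": 10, "shipping": 10}

# Equal weighting: simply add the four ranks.
equal_scores = {region: sum(r.values()) for region, r in ranks.items()}

# Weighted scoring: multiply each rank by its weight before adding.
weighted_scores = {
    region: sum(weights[metric] * rank for metric, rank in r.items())
    for region, r in ranks.items()
}

# Lower score wins in both schemes.
equal_winner = min(equal_scores, key=equal_scores.get)
weighted_winner = min(weighted_scores, key=weighted_scores.get)

print("Equal weights:   ", equal_scores, "->", equal_winner)
print("Weighted scoring:", weighted_scores, "->", weighted_winner)
```

With these made-up numbers the winner changes from "Region B" to "Region A" once sales and profit carry more weight. In SQL, the same idea would amount to multiplying each rank column by its weight inside the `Overall_Rank_Score` expression.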



### Would the Winner Still Be the Same?


## 🏆 Congratulations!

You've worked through all critical SQL concepts in a real business scenario. Not only have you sharpened your SQL skills, but you've also learned how to:

- Drive business decisions with data
- Use notebook-based SQL workflows
- Analyze performance across multiple KPIs



<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/refs/heads/master/ALX_banners/ALX_Navy.png" style="width:100px" />
</div>