<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img
 src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/alx-courses/aice/assets/Content_page_banner_blue_dots.png"
 alt="ALX Content Header"
 class="full-width-image"
/>
</div>

# **Regional Showdown: 2014 Business Performance Battle**

## **Background**
The executive board at **Superstore HQ** is gearing up for the **end-of-year awards**. Four regional managers are vying for the title of **Top Performer**, and you're the data analyst trusted to deliver the truth.





![Regional Managers Planning](https://raw.githubusercontent.com/DesmondMokhali/sql_assessment/main/manager_meeting.jpg)


## **Problem**
There's no clear way to compare performance across regions using just spreadsheets — and rumors are flying between regional offices about who's doing better. The board needs a **data-backed report** to settle the debate and guide next quarter's investment. 



You, the data analyst, have been asked to gather key insights across all regions. Your SQL queries will feed into the final showdown metrics.

## **Learning objectives**

- Write production-ready SQL to answer real business questions.

- Query tables effectively in notebooks.

- Use aggregations and numeric functions to compute KPIs.

- Apply window functions to analyze trends and rank data.

- Think critically to support data-driven decisions.

### 1. Install the required libraries (if not yet installed)
In a Jupyter notebook cell, run:

In [1]:
# !pip install pandas sqlalchemy ipython-sql

### 2. Load CSV and Set Up SQLite Database for SQL Queries in Jupyter

In [2]:
import pandas as pd
from sqlalchemy import create_engine

# Step 1: Load CSV
df = pd.read_csv("superstore.csv")  # Make sure this file is in your folder

# Step 2: Save to file-based SQLite DB
engine = create_engine("sqlite:///superstore.db")
df.to_sql("superstore", con=engine, index=False, if_exists="replace")

# Step 3: Use SQL magic
%load_ext sql
%sql sqlite:///superstore.db

### 3. Review the first 5 rows

In [3]:
%%sql
SELECT * FROM superstore LIMIT 5;

 * sqlite:///superstore.db
Done.


Category,City,Country,Customer.ID,Customer.Name,Discount,Market,记录数,Order.Date,Order.ID,Order.Priority,Product.ID,Product.Name,Profit,Quantity,Region,Row.ID,Sales,Segment,Ship.Date,Ship.Mode,Shipping.Cost,State,Sub.Category,Year,Market2,weeknum
Office Supplies,Los Angeles,United States,LS-172304,Lycoris Saunders,0.0,US,1,2011-01-07 00:00:00.000,CA-2011-130813,High,OFF-PA-10002005,Xerox 225,9.3312,3,West,36624,19,Consumer,2011-01-09 00:00:00.000,Second Class,4.37,California,Paper,2011,North America,2
Office Supplies,Los Angeles,United States,MV-174854,Mark Van Huff,0.0,US,1,2011-01-21 00:00:00.000,CA-2011-148614,Medium,OFF-PA-10002893,"Wirebound Service Call Books, 5 1/2"" x 4""",9.2928,2,West,37033,19,Consumer,2011-01-26 00:00:00.000,Standard Class,0.94,California,Paper,2011,North America,4
Office Supplies,Los Angeles,United States,CS-121304,Chad Sievert,0.0,US,1,2011-08-05 00:00:00.000,CA-2011-118962,Medium,OFF-PA-10000659,"Adams Phone Message Book, Professional, 400 Message Capacity, 5 3/6” x 11”",9.8418,3,West,31468,21,Consumer,2011-08-09 00:00:00.000,Standard Class,1.81,California,Paper,2011,North America,32
Office Supplies,Los Angeles,United States,CS-121304,Chad Sievert,0.0,US,1,2011-08-05 00:00:00.000,CA-2011-118962,Medium,OFF-PA-10001144,Xerox 1913,53.2608,2,West,31469,111,Consumer,2011-08-09 00:00:00.000,Standard Class,4.59,California,Paper,2011,North America,32
Office Supplies,Los Angeles,United States,AP-109154,Arthur Prichep,0.0,US,1,2011-09-29 00:00:00.000,CA-2011-146969,High,OFF-PA-10002105,Xerox 223,3.1104,1,West,32440,6,Consumer,2011-10-03 00:00:00.000,Standard Class,1.32,California,Paper,2011,North America,40


# Integrated project notebook

## 1. SQL in Production 

### **Task 1.** 
### Which region generated the highest total sales in 2014?

In [4]:
%%sql

SELECT Region, SUM(Sales) AS Total_Sales
FROM superstore
WHERE strftime('%Y', "Order.Date") = '2014'
GROUP BY Region
ORDER BY Total_Sales DESC
;


 * sqlite:///superstore.db
Done.


Region,Total_Sales
Central,939062
South,545272
North,429178
Oceania,362440
Southeast Asia,323068
EMEA,301702
Africa,283034
North Asia,264560
Central Asia,259142
West,250664


### **Task 2.**

### What is the average order quantity by region for the past quarter (Q4 2014)?

In [5]:
%%sql
SELECT Region, ROUND(AVG(Quantity)) AS Avg_Order_Quantity
FROM superstore
WHERE DATE("Order.Date") BETWEEN '2014-10-01' AND '2014-12-31'
GROUP BY Region
ORDER BY Avg_Order_Quantity DESC;


 * sqlite:///superstore.db
Done.


Region,Avg_Order_Quantity
West,4.0
Southeast Asia,4.0
South,4.0
Oceania,4.0
North Asia,4.0
North,4.0
East,4.0
Central Asia,4.0
Central,4.0
Caribbean,4.0
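
Every region above shows 4.0 because `ROUND(AVG(Quantity))` rounds to a whole number, which hides any difference between regions. The sketch below shows how keeping two decimals separates them. It uses a hypothetical in-memory table `toy_orders` with made-up quantities, not the real `superstore` data.

```python
import sqlite3

# Toy rows (hypothetical values, not taken from superstore.csv).
rows = [
    ("West", 3), ("West", 5), ("West", 4), ("West", 5),  # mean 4.25
    ("East", 4), ("East", 3), ("East", 4), ("East", 4),  # mean 3.75
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_orders (Region TEXT, Quantity INTEGER)")
conn.executemany("INSERT INTO toy_orders VALUES (?, ?)", rows)

query = """
SELECT Region,
       ROUND(AVG(Quantity))    AS Rounded_Whole,
       ROUND(AVG(Quantity), 2) AS Rounded_2dp
FROM toy_orders
GROUP BY Region
ORDER BY Rounded_2dp DESC;
"""
avg_rows = conn.execute(query).fetchall()
for row in avg_rows:
    print(row)
conn.close()
```

In the notebook, the same idea would mean writing `ROUND(AVG(Quantity), 2)` in the Task 2 query. The results on the real data are not shown here because they have not been re-run.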


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Hold on...</strong> Sales alone don’t tell the full story. What if a region makes fewer sales but has better margins?
</div>

### **Task 3.**
### If one region consistently has lower sales but higher profit margins, how might that affect decisions about future investments in that region?

A region with lower sales but higher profit margins may indicate a smaller, wealthier customer base or efficient operations focused on premium products. Socio-economic factors like income levels and infrastructure (reliable transport, digital connectivity, and utilities) may influence operational efficiency and market reach. If the market is less mature, lacking modern retail channels, strong competition, or broad consumer awareness, these limitations can restrict sales despite high profitability. Future investment decisions might focus on sustainable growth that aligns with the region’s economic context without eroding margins.
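
To back this reasoning with numbers, one option is to compute a profit margin per region as total profit divided by total sales. This is a minimal sketch on a hypothetical table `toy_sales` with invented region names and figures. The margin definition is an assumption, not something the assessment prescribes.

```python
import sqlite3

# Hypothetical figures: Region A sells more, Region B keeps more of each sale.
rows = [
    ("Region A", 1000.0, 100.0),
    ("Region A", 3000.0, 300.0),
    ("Region B", 500.0, 150.0),
    ("Region B", 500.0, 100.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_sales (Region TEXT, Sales REAL, Profit REAL)")
conn.executemany("INSERT INTO toy_sales VALUES (?, ?, ?)", rows)

query = """
SELECT Region,
       SUM(Sales) AS Total_Sales,
       ROUND(SUM(Profit) * 100.0 / SUM(Sales), 2) AS Profit_Margin_Pct
FROM toy_sales
GROUP BY Region
ORDER BY Profit_Margin_Pct DESC;
"""
margin_rows = conn.execute(query).fetchall()
for row in margin_rows:
    print(row)
conn.close()
```

In this toy data, Region B has a quarter of Region A's sales but a 25% margin against 10%, which is the pattern the question describes.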

## 2. Querying in Notebooks

### **Task 4.** 
### List the top 3 regions by number of orders placed.


In [5]:
%%sql
SELECT Region, COUNT("Order.ID") AS Number_Of_Orders
FROM superstore
GROUP BY Region
ORDER BY Number_Of_Orders DESC
LIMIT 3;

 * sqlite:///superstore.db
Done.


Region,Number_Of_Orders
Central,11117
South,6645
EMEA,5029
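
The preview in step 3 shows one `Order.ID` (CA-2011-118962) on two rows, so each row appears to be an order line rather than a whole order. If that holds across the dataset, `COUNT("Order.ID")` counts line items, and `COUNT(DISTINCT "Order.ID")` is closer to "number of orders". The sketch below compares the two on a hypothetical table `toy_lines`. The IDs are invented.

```python
import sqlite3

# Hypothetical order lines: order O-1 spans three rows.
rows = [
    ("Central", "O-1"), ("Central", "O-1"), ("Central", "O-1"),
    ("Central", "O-2"),
    ("South", "O-3"), ("South", "O-4"), ("South", "O-5"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE toy_lines (Region TEXT, "Order.ID" TEXT)')
conn.executemany("INSERT INTO toy_lines VALUES (?, ?)", rows)

query = """
SELECT Region,
       COUNT("Order.ID")          AS Line_Items,
       COUNT(DISTINCT "Order.ID") AS Distinct_Orders
FROM toy_lines
GROUP BY Region
ORDER BY Distinct_Orders DESC;
"""
count_rows = conn.execute(query).fetchall()
for row in count_rows:
    print(row)
conn.close()
```

Here Central leads on line items (4 vs 3) but South leads on distinct orders (3 vs 2), so the choice of count can change a ranking.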



### **Task 5.**
### Which product category has the least contribution to overall profit?

In [13]:
%%sql
SELECT
  Category,
  ROUND(SUM(Profit),2) AS total_profit
FROM
  superstore
GROUP BY
  Category
ORDER BY
  total_profit DESC


 * sqlite:///superstore.db
Done.


Category,total_profit
Technology,663778.73
Office Supplies,518473.83
Furniture,285204.72
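
To read "least contribution" directly, it can help to express each category's profit as a share of the total and sort ascending. This is a sketch on a hypothetical table `toy_profit` with invented figures. The column name `Profit_Share_Pct` is also an assumption.

```python
import sqlite3

# Hypothetical profit rows that total 1000.
rows = [
    ("Technology", 400.0), ("Technology", 200.0),
    ("Office Supplies", 300.0),
    ("Furniture", 60.0), ("Furniture", 40.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_profit (Category TEXT, Profit REAL)")
conn.executemany("INSERT INTO toy_profit VALUES (?, ?)", rows)

query = """
SELECT Category,
       SUM(Profit) AS Total_Profit,
       ROUND(SUM(Profit) * 100.0 / (SELECT SUM(Profit) FROM toy_profit), 1)
           AS Profit_Share_Pct
FROM toy_profit
GROUP BY Category
ORDER BY Total_Profit ASC;
"""
share_rows = conn.execute(query).fetchall()
for row in share_rows:
    print(row)
conn.close()
```

With ascending order, the first row is the category that contributes least.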


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Hmm... </strong> Some categories might be profitable, but are those profits sustainable?  
What if many customers are returning those products due to dissatisfaction or mismatch?  
Even if return data isn't available, it's important to consider how returns could impact long-term profitability and reputation.
</div>

### **Task 6.** 
### If you observed a high number of returns in a specific product category, what factors would you investigate, and what actions would you recommend as a regional manager?

If I observed a high number of returns in a specific product category, I would first investigate the reasons behind them, such as product defects, inaccurate descriptions, unmet expectations, or delivery issues. I would also review customer feedback, assess supplier quality, and check if certain regions or customer segments are more affected. Based on these findings, I’d recommend improving quality control, clarifying product listings, and offering staff training on product fit. If necessary, I’d remove or replace underperforming items to protect customer trust and long-term profitability.

## 3. Numeric Functions & Aggregations

### **Task 7.**
### Calculate the average delivery delay per region (Order Date vs Ship Date).

In [19]:
%%sql
SELECT Region,
       ROUND(AVG(JULIANDAY("Ship.Date") - JULIANDAY("Order.Date")), 2) AS Avg_Delivery_Delay_Days
FROM superstore
GROUP BY Region
ORDER BY Avg_Delivery_Delay_Days ASC;


 * sqlite:///superstore.db
Done.


Region,Avg_Delivery_Delay_Days
Canada,3.68
Africa,3.91
East,3.91
North Asia,3.91
EMEA,3.93
Oceania,3.93
West,3.93
South,3.94
Caribbean,3.97
Central Asia,4.01


### **Task 8.** 
### Which customer segment yields the highest average order value?

In [20]:
%%sql
SELECT Segment, ROUND(AVG(Sales)) AS Avg_Order_Value
FROM superstore
GROUP BY Segment
ORDER BY Avg_Order_Value DESC
;


 * sqlite:///superstore.db
Done.


Segment,Avg_Order_Value
Corporate,248.0
Home Office,247.0
Consumer,245.0
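
`AVG(Sales)` averages over rows. If each row is an order line, as the preview in step 3 suggests, that is an average line value rather than an average order value. One alternative, sketched below on a hypothetical table `toy_segments`, divides total sales by the number of distinct orders. It assumes each order belongs to a single segment.

```python
import sqlite3

# Hypothetical order lines: Consumer order O-1 has two lines.
rows = [
    ("Consumer", "O-1", 100.0), ("Consumer", "O-1", 300.0),
    ("Consumer", "O-2", 200.0),
    ("Corporate", "O-3", 250.0), ("Corporate", "O-4", 250.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE toy_segments (Segment TEXT, "Order.ID" TEXT, Sales REAL)'
)
conn.executemany("INSERT INTO toy_segments VALUES (?, ?, ?)", rows)

query = """
SELECT Segment,
       AVG(Sales) AS Avg_Line_Value,
       SUM(Sales) * 1.0 / COUNT(DISTINCT "Order.ID") AS Avg_Order_Value
FROM toy_segments
GROUP BY Segment
ORDER BY Avg_Order_Value DESC;
"""
aov_rows = conn.execute(query).fetchall()
for row in aov_rows:
    print(row)
conn.close()
```

In the toy data, Corporate has the higher average line value (250 vs 200), yet Consumer has the higher average order value (300 vs 250).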


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Efficiency and value matter.</strong> While understanding delivery delays helps reveal operational efficiency by region, analyzing customer segments shows where the most valuable orders come from.  
  Even if two regions achieve similar sales, differences in delivery performance and customer mix can lead to very different costs and profitability.
</div>


### **Task 9.**
### Two regions have similar total sales, but one region has significantly higher shipping costs. What factors could explain this difference, and how might you address it as a regional manager?

## 4. Window Functions

### **Task 10.**
### Rank customers within each region by total sales and find the top customer per region.

In [10]:
%%sql
SELECT Region, "Customer.Name", MAX(Total_Sales) AS Max_Sales 
FROM (
  SELECT Region, "Customer.Name", SUM(Sales) AS Total_Sales
  FROM superstore
  GROUP BY Region, "Customer.Name"
) GROUP BY Region;

 * sqlite:///superstore.db
Done.


Region,Customer.Name,Max_Sales
Africa,Barry Weirich,8957
Canada,Stuart Van,4009
Caribbean,Frank Merwin,4656
Central,Tamara Chand,27345
Central Asia,Cynthia Arntzen,10462
EMEA,Sally Hughsby,7537
East,Tom Ashbrook,13724
North,Fred Hopkins,11122
North Asia,Carol Adams,9055
Oceania,Dave Poirier,11865
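
The query above relies on SQLite's handling of bare columns alongside `MAX()`, which other databases may reject. Since this section is about window functions, here is a sketch of the same task with `RANK() OVER (PARTITION BY ...)`. It runs on a hypothetical table `toy_customers` with invented names, and assumes the bundled SQLite is version 3.25 or newer (required for window functions).

```python
import sqlite3

# Hypothetical sales lines for four customers in two regions.
rows = [
    ("East", "Ann", 100.0), ("East", "Ann", 50.0), ("East", "Bob", 120.0),
    ("West", "Cy", 70.0), ("West", "Dee", 90.0), ("West", "Cy", 10.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE toy_customers (Region TEXT, "Customer.Name" TEXT, Sales REAL)'
)
conn.executemany("INSERT INTO toy_customers VALUES (?, ?, ?)", rows)

query = """
WITH CustomerSales AS (
    SELECT Region, "Customer.Name" AS Customer_Name, SUM(Sales) AS Total_Sales
    FROM toy_customers
    GROUP BY Region, "Customer.Name"
),
Ranked AS (
    SELECT *,
           RANK() OVER (PARTITION BY Region ORDER BY Total_Sales DESC) AS Sales_Rank
    FROM CustomerSales
)
SELECT Region, Customer_Name, Total_Sales, Sales_Rank
FROM Ranked
WHERE Sales_Rank = 1
ORDER BY Region;
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
conn.close()
```

Dropping the `WHERE Sales_Rank = 1` filter returns the full ranking of customers within each region. With `RANK()`, tied customers share a rank, so a region can have more than one top customer.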


### **Task 11.**
### Show the year-over-year sales growth per region.

In [11]:
%%sql
WITH YearlySales AS (
  SELECT 
    Region,
    strftime('%Y', "Order.Date") AS Year,
    SUM(Sales) AS Total_Sales
  FROM superstore
  GROUP BY Region, Year
)

SELECT 
  cur.Region,
  cur.Year AS Current_Year,
  cur.Total_Sales AS Current_Year_Sales,
  prev.Total_Sales AS Previous_Year_Sales,
  ROUND(
    CASE 
      WHEN prev.Total_Sales IS NULL OR prev.Total_Sales = 0 THEN NULL
      ELSE ((cur.Total_Sales - prev.Total_Sales) * 100.0 / prev.Total_Sales)
    END, 2
  ) AS YoY_Growth_Percent
FROM YearlySales cur
LEFT JOIN YearlySales prev
  ON cur.Region = prev.Region
  AND CAST(cur.Year AS INTEGER) = CAST(prev.Year AS INTEGER) + 1
ORDER BY cur.Region, cur.Year;



 * sqlite:///superstore.db
Done.


Region,Current_Year,Current_Year_Sales,Previous_Year_Sales,YoY_Growth_Percent
Africa,2011,127186,,
Africa,2012,144487,127186.0,13.6
Africa,2013,229069,144487.0,58.54
Africa,2014,283034,229069.0,23.56
Canada,2011,8507,,
Canada,2012,16099,8507.0,89.24
Canada,2013,19162,16099.0,19.03
Canada,2014,23164,19162.0,20.89
Caribbean,2011,57043,,
Caribbean,2012,64149,57043.0,12.46
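
The self-join above can also be written with the `LAG()` window function. The sketch below uses a hypothetical, already aggregated table `toy_yearly` with invented figures, and assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical yearly totals per region.
rows = [
    ("East", 2011, 100.0), ("East", 2012, 150.0), ("East", 2013, 120.0),
    ("West", 2011, 200.0), ("West", 2012, 250.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_yearly (Region TEXT, Year INTEGER, Total_Sales REAL)")
conn.executemany("INSERT INTO toy_yearly VALUES (?, ?, ?)", rows)

query = """
WITH WithPrev AS (
    SELECT Region, Year, Total_Sales,
           LAG(Total_Sales) OVER (PARTITION BY Region ORDER BY Year) AS Prev_Sales
    FROM toy_yearly
)
SELECT Region, Year, Total_Sales, Prev_Sales,
       ROUND((Total_Sales - Prev_Sales) * 100.0 / NULLIF(Prev_Sales, 0), 2)
           AS YoY_Growth_Percent
FROM WithPrev
ORDER BY Region, Year;
"""
yoy_rows = conn.execute(query).fetchall()
for row in yoy_rows:
    print(row)
conn.close()
```

One difference to keep in mind: `LAG()` returns the previous row in the partition, which is the previous year only when no year is missing for that region. The self-join matches on year minus one explicitly.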


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Trends tell a story.</strong> Ranking customers by sales highlights key contributors in each region, while year-over-year sales growth reveals how regions perform over time.  
  Sudden changes, such as sharp growth in Q3 followed by a decline in Q4, could signal potential issues affecting future performance.
</div>


### **Task 12.**
### If a region experienced strong sales growth in Q3 but saw a decline in Q4, how should this pattern affect their ranking on the final leaderboard? What factors would you consider in your evaluation?

<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>Loyalty check! </strong> Let's find out which region's customers keep coming back for more.
Repeat business might just be the secret weapon of the winning region.
</div>

### **Task 13.** 
### Which region has the highest number of repeat customers (customers who ordered more than once)?

In [22]:
%%sql
SELECT Region, COUNT(DISTINCT "Customer.ID") AS Repeat_Customers
FROM superstore
WHERE "Customer.ID" IN (
    SELECT "Customer.ID"
    FROM superstore
    GROUP BY "Customer.ID"
    HAVING COUNT(DISTINCT "Order.ID") > 1
)
GROUP BY Region
ORDER BY Repeat_Customers DESC
LIMIT 1;


 * sqlite:///superstore.db
Done.


Region,Repeat_Customers
Central,2067
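
The query above treats a customer as "repeat" if they placed more than one order anywhere, then counts them in every region where they appear. A stricter reading counts a customer only in regions where they ordered more than once. Which definition the board intends is a judgment call. The sketch below compares both on a hypothetical table `toy_repeat` with invented IDs.

```python
import sqlite3

# Hypothetical order lines.
rows = [
    ("East", "C1", "O-1"), ("East", "C1", "O-2"),  # C1: two orders in East
    ("East", "C2", "O-3"), ("West", "C2", "O-4"),  # C2: one order per region
    ("West", "C3", "O-5"), ("West", "C3", "O-5"),  # C3: one order, two lines
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE toy_repeat (Region TEXT, "Customer.ID" TEXT, "Order.ID" TEXT)'
)
conn.executemany("INSERT INTO toy_repeat VALUES (?, ?, ?)", rows)

# Definition 1: repeat anywhere (same shape as the notebook query).
global_query = """
SELECT Region, COUNT(DISTINCT "Customer.ID") AS Repeat_Customers
FROM toy_repeat
WHERE "Customer.ID" IN (
    SELECT "Customer.ID" FROM toy_repeat
    GROUP BY "Customer.ID"
    HAVING COUNT(DISTINCT "Order.ID") > 1
)
GROUP BY Region
ORDER BY Repeat_Customers DESC;
"""

# Definition 2: repeat within the same region.
regional_query = """
SELECT Region, COUNT(*) AS Repeat_Customers
FROM (
    SELECT Region, "Customer.ID"
    FROM toy_repeat
    GROUP BY Region, "Customer.ID"
    HAVING COUNT(DISTINCT "Order.ID") > 1
)
GROUP BY Region
ORDER BY Repeat_Customers DESC;
"""
global_rows = conn.execute(global_query).fetchall()
regional_rows = conn.execute(regional_query).fetchall()
print(global_rows)
print(regional_rows)
conn.close()
```

Under the first definition, customer C2 counts as a repeat customer in both regions. Under the second, only C1 in East qualifies.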


<!-- Note -->
<div style="background-color:#f5f5f5; border-left:5px solid rgb(180, 180, 180); padding:10px; border-radius:4px;">
  <strong>And now… the moment of truth.</strong> You’ve uncovered key insights by analyzing repeat customers and other metrics.  
  It’s time to combine Sales, Profit, Delivery, and Loyalty into one overall score for each region.  
  Who will be crowned the Ultimate Regional Manager of the year?
</div>


### **Task 14**

![Regional Winner Celebration](https://raw.githubusercontent.com/DesmondMokhali/sql_assessment/main/regional_winner.jpg)


### Which regions rank highest overall in 2014 based on total sales, profit, quantity, delivery delay, and shipping cost? 

In [13]:
%%sql
WITH SalesProfit AS (
    SELECT 
        Region,
        ROUND(SUM(Sales), 2) AS Total_Sales,
        ROUND(SUM(Profit), 2) AS Total_Profit,
        ROUND(AVG(Quantity), 2) AS Avg_Quantity,
        ROUND(AVG(julianday("Ship.Date") - julianday("Order.Date")), 2) AS Avg_Delivery_Delay,
        ROUND(SUM("Shipping.Cost"), 2) AS Total_Shipping_Cost
    FROM superstore
    WHERE strftime('%Y', "Order.Date") = '2014'
    GROUP BY Region
),
RankedRegions AS (
    SELECT *,
        RANK() OVER (ORDER BY Total_Sales DESC) AS Sales_Rank,
        RANK() OVER (ORDER BY Total_Profit DESC) AS Profit_Rank,
        RANK() OVER (ORDER BY Avg_Delivery_Delay ASC) AS Delivery_Rank,
        RANK() OVER (ORDER BY Total_Shipping_Cost ASC) AS Shipping_Rank
    FROM SalesProfit
),
Final AS (
    SELECT *,
        (Sales_Rank + Profit_Rank + Delivery_Rank + Shipping_Rank) AS Overall_Rank_Score
    FROM RankedRegions
)
SELECT 
    Region, 
    Total_Sales, 
    Total_Profit, 
    Avg_Quantity, 
    Avg_Delivery_Delay, 
    Total_Shipping_Cost, 
    Overall_Rank_Score
FROM Final
ORDER BY Overall_Rank_Score
;


 * sqlite:///superstore.db
Done.


Region,Total_Sales,Total_Profit,Avg_Quantity,Avg_Delivery_Delay,Total_Shipping_Cost,Overall_Rank_Score
North Asia,264560.0,52770.44,3.8,3.94,28013.83,21
West,250664.0,43900.63,3.9,3.82,26320.47,22
North,429178.0,56658.35,3.8,3.96,46422.47,23
Central Asia,259142.0,47547.48,3.75,3.89,28147.1,23
South,545272.0,51776.16,3.74,3.95,56937.41,24
Africa,283034.0,39331.47,2.31,3.93,30082.93,25
Central,939062.0,97723.57,3.71,3.98,101607.15,27
Canada,23164.0,5993.01,2.17,3.75,2498.4,28
Oceania,362440.0,31431.58,3.63,3.96,38488.33,30
EMEA,301702.0,22600.32,2.29,3.96,33366.04,31


### **Task 15**

In our current ranking, we treated all performance factors equally:  
- **Total Sales**  
- **Total Profit**  
- **Average Delivery Delay**  
- **Total Shipping Cost**  

Each rank was summed to create an overall score, where a **lower score indicates better performance**.

But what if we had **used a weighted scoring matrix** — assigning importance (weights) to each metric?

![Regional Manager Score Matrix](https://raw.githubusercontent.com/DesmondMokhali/sql_assessment/main/Regional_Manager_Score_Matrix.png)
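
A weighted version of the score can be sketched as a weighted sum of the four ranks. The weights below (40/30/20/10) are hypothetical and are not read from the matrix image. The table `toy_ranks`, its regions, and its ranks are also invented for illustration.

```python
import sqlite3

# Hypothetical ranks: (Region, Sales, Profit, Delivery, Shipping); 1 is best.
rows = [
    ("A", 1, 1, 3, 3),
    ("B", 2, 2, 1, 1),
    ("C", 3, 3, 2, 2),
]

# Hypothetical weights in percent; they must sum to 100.
weights = {"w_sales": 40, "w_profit": 30, "w_delivery": 20, "w_shipping": 10}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_ranks (Region TEXT, Sales_Rank INTEGER, Profit_Rank INTEGER, "
    "Delivery_Rank INTEGER, Shipping_Rank INTEGER)"
)
conn.executemany("INSERT INTO toy_ranks VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT Region,
       Sales_Rank + Profit_Rank + Delivery_Rank + Shipping_Rank AS Equal_Score,
       (:w_sales * Sales_Rank + :w_profit * Profit_Rank
        + :w_delivery * Delivery_Rank + :w_shipping * Shipping_Rank) / 100.0
           AS Weighted_Score
FROM toy_ranks
ORDER BY Weighted_Score;
"""
score_rows = conn.execute(query, weights).fetchall()
for row in score_rows:
    print(row)
conn.close()
```

In this toy example, region B wins with equal weights (score 6 vs 8), while region A wins once sales and profit carry more weight (1.6 vs 1.7). Lower is better in both columns.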



### Would the Winner Still Be the Same?


## 🏆 Congratulations!

You've worked through all critical SQL concepts in a real business scenario. Not only have you sharpened your SQL skills, but you’ve also learned how to:

- Drive business decisions with data
- Use notebook-based SQL workflows
- Analyze performance across multiple KPIs



<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/refs/heads/master/ALX_banners/ALX_Navy.png"  style="width:100px" />
</div>