## **DSCI 504**

### **Week 8: Corporate Backbrief Supporting Queries**

**Student Name:** SEIF KUNGULIO

### **Assignment Overview:**

The Outdoor Performance Center (OPC) is evaluating the potential acquisition of **Ord Cycles**, a mountain bike manufacturer. As a data analyst on the OPC Analytics Team, I was tasked with determining whether this acquisition is a sound strategic move. This analysis uses data housed in **PostgreSQL** and queried via **Azure Data Studio** to assess business growth, operational capacity, fraud risks, and configuration preferences based on historical mountain bike sales data from 2001 to 2020.

### **Understanding PostgreSQL’s Role in Analytics:**

PostgreSQL played a critical role in enabling efficient data exploration and analysis. Its support for relational structures, advanced querying (e.g., window functions, CTEs), and data integrity mechanisms made it an ideal platform for evaluating historical trends and generating actionable insights. However, PostgreSQL's lack of built-in visualization tools and the need for query optimization skills are noteworthy constraints.

### **1\. Adequacy of Mountain Bike Sales for Growth**

To determine if OPC has sufficient momentum in mountain bike sales to justify growth, I analyzed yearly sales totals filtered by the "mountain" description. The SQL query below aggregates total order value by year:

In [1]:
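-- Aggregates total order value per year for orders containing 'mountain' products.
-- Caveat: order_tot is an order-level amount, so an order with several matching
-- line items is added once per line item; SUM(oi.line_total) would avoid that.
SELECT DATE_PART('year', o.ord_date) AS year,
       SUM(o.order_tot) AS total_sales
FROM dsci_504.orders o
JOIN dsci_504.order_items oi ON o.ord_id = oi.ord_id
JOIN dsci_504.products p ON oi.prod_id = p.prod_id
WHERE p.prod_description ILIKE '%mountain%'  -- ILIKE is already case-insensitive
GROUP BY year
ORDER BY year;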

| year | total_sales |
| --- | --- |
| 2000 | 728573.98 |
| 2001 | 696021.07 |
| 2002 | 565160.48 |
| 2003 | 523462.78 |
| 2004 | 577862.23 |
| 2005 | 635804.33 |
| 2006 | 754774.44 |
| 2007 | 598244.74 |
| 2008 | 607583.88 |
| 2009 | 651427.8 |


The query returns annual revenue from mountain bike sales. Revenue was stable from 2000 to 2016, at roughly $500K–$750K per year. From 2017 onward it surged to over $100 million per year, peaking at $251M in 2020. This sharp uptick likely reflects increased consumer interest or business expansion, and it makes a strong case for OPC's strategic growth and the acquisition of Ord Cycles. Because the query sums `order_tot` once per matching line item, these totals should be treated as an upper bound until they are checked against line-item revenue.
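As a rough cross-check on the "stable" description, the sketch below computes year-over-year percentage change. It uses only the ten yearly totals displayed in the output above; the dictionary and function names are illustrative and are not part of the original notebook.

```python
# Yearly totals copied from the rows displayed in the query output above.
yearly_sales = {
    2000: 728573.98,
    2001: 696021.07,
    2002: 565160.48,
    2003: 523462.78,
    2004: 577862.23,
    2005: 635804.33,
    2006: 754774.44,
    2007: 598244.74,
    2008: 607583.88,
    2009: 651427.80,
}


def year_over_year_growth(sales_by_year):
    """Return {year: percent change versus the previous year in the data}."""
    years = sorted(sales_by_year)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        change = sales_by_year[curr] - sales_by_year[prev]
        growth[curr] = change / sales_by_year[prev] * 100
    return growth


for year, pct in year_over_year_growth(yearly_sales).items():
    print(f"{year}: {pct:+.1f}%")
```

The first year has no prior value, so it is omitted from the result. Swings of this size inside a flat band are what the narrative above summarizes as stable performance.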

Additionally, I evaluated product descriptions to identify top-performing bike types:

In [2]:
-- Ranks mountain bike descriptions by total line-item revenue.
-- Note: COUNT(*) counts order line items; COUNT(DISTINCT oi.ord_id) would count distinct orders.
SELECT p.prod_description, COUNT(*) AS orders, SUM(oi.line_total) AS revenue
FROM dsci_504.order_items oi
JOIN dsci_504.products p ON oi.prod_id = p.prod_id
WHERE p.prod_description ILIKE '%mountain%'  -- ILIKE is already case-insensitive
GROUP BY p.prod_description
ORDER BY revenue DESC;

| prod_description | orders | revenue |
| --- | --- | --- |
| Full-Suspension Mountain Bike | 30857 | 293985124.14 |
| Hardtail Mountain Bike | 1894 | 8464768.19 |


Two mountain bike types make up these sales:

- Full-Suspension Mountain Bike: about $294M in revenue from 30,857 orders.
- Hardtail Mountain Bike: about $8.5M in revenue from 1,894 orders.

This suggests customers overwhelmingly prefer Full-Suspension builds, which should guide future inventory, marketing, and product alignment with Ord Cycles.
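To put a number on "overwhelmingly", the sketch below derives each product type's share of revenue and its average revenue per counted row. The figures come from the query output above; the helper name `summarize` is an assumption introduced for illustration.

```python
# (row_count, revenue) per product type, copied from the query output above.
product_sales = {
    "Full-Suspension Mountain Bike": (30857, 293985124.14),
    "Hardtail Mountain Bike": (1894, 8464768.19),
}


def summarize(sales):
    """Return revenue share (percent) and average revenue per row for each product."""
    total_revenue = sum(revenue for _, revenue in sales.values())
    summary = {}
    for name, (count, revenue) in sales.items():
        summary[name] = {
            "revenue_share_pct": revenue / total_revenue * 100,
            "avg_revenue_per_row": revenue / count,
        }
    return summary


for name, stats in summarize(product_sales).items():
    print(
        f"{name}: {stats['revenue_share_pct']:.1f}% of revenue, "
        f"${stats['avg_revenue_per_row']:,.2f} per row"
    )
```

Full-Suspension bikes account for roughly 97% of mountain bike revenue in this output, and each counted row is worth about twice as much as a Hardtail row.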

### **2\. Fraud Risk Analysis**

To ensure due diligence, I evaluated potential fraud by identifying customers with a high frequency of returns and high refund totals. Customers flagged by the following query could indicate abusive return behavior:

In [3]:
-- Flags customers with more than 3 returns and refund total above $500.
SELECT c.cus_id, c.cus_first_name, c.cus_last_name, COUNT(*) AS return_count,
       SUM(r.return_amount) AS total_returned
FROM dsci_504.return_items r
JOIN dsci_504.returns ret ON r.rac_id = ret.rac_id
JOIN dsci_504.customers c ON ret.cus_id = c.cus_id
GROUP BY c.cus_id, c.cus_first_name, c.cus_last_name
HAVING COUNT(*) > 3 AND SUM(r.return_amount) > 500
ORDER BY total_returned DESC;

| cus_id | cus_first_name | cus_last_name | return_count | total_returned |
| --- | --- | --- | --- | --- |
| 125 | Harper | Robinson | 7 | 51041.0 |
| 1393 | Erin | Manning | 7 | 48308.33 |
| 1821 | Tristan | Bangeson | 6 | 48255.84 |
| 1997 | Alister | Connery | 6 | 44394.0 |
| 2582 | Carl | Stevens | 7 | 43977.08 |
| 1675 | Lee | Reynolds | 7 | 43770.58 |
| 1648 | Mads | Thomason | 7 | 43770.58 |
| 1967 | Margo | Brooks | 5 | 42900.0 |
| 1163 | Brian | Hart | 5 | 42240.0 |
| 373 | Carl | Saylor | 7 | 42094.95 |


The query flagged over 600 customers with more than three returns and refund totals above $500. For example:

- Harper Robinson had 7 returns totaling $51,041.00.
- Erin Manning (7 returns, $48,308.33) and Tristan Bangeson (6 returns, $48,255.84) show similar patterns.

Such behavior could indicate abuse of the return policy, leading to financial loss and operational inefficiency. This finding supports stricter return controls and monitoring, which matter most during and after the acquisition.
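The thresholds in the `HAVING` clause are a judgment call, so it can help to see how the flagged list changes as they tighten. The sketch below mirrors the same strict "greater than" logic in Python over the ten rows displayed above; `flag_customers` and its parameter names are hypothetical helpers, not part of the notebook.

```python
# (cus_id, first_name, last_name, return_count, total_returned) from the output above.
returns_by_customer = [
    (125, "Harper", "Robinson", 7, 51041.00),
    (1393, "Erin", "Manning", 7, 48308.33),
    (1821, "Tristan", "Bangeson", 6, 48255.84),
    (1997, "Alister", "Connery", 6, 44394.00),
    (2582, "Carl", "Stevens", 7, 43977.08),
    (1675, "Lee", "Reynolds", 7, 43770.58),
    (1648, "Mads", "Thomason", 7, 43770.58),
    (1967, "Margo", "Brooks", 5, 42900.00),
    (1163, "Brian", "Hart", 5, 42240.00),
    (373, "Carl", "Saylor", 7, 42094.95),
]


def flag_customers(rows, min_returns=3, min_refund=500.0):
    """Keep rows with return_count > min_returns and total_returned > min_refund."""
    return [
        row for row in rows
        if row[3] > min_returns and row[4] > min_refund
    ]


default_flags = flag_customers(returns_by_customer)
strict_flags = flag_customers(returns_by_customer, min_returns=6)
print(len(default_flags), "flagged with the query's thresholds")
print(len(strict_flags), "flagged when more than 6 returns are required")
```

With the query's own thresholds every displayed customer is flagged; raising the return threshold narrows the list to the customers with seven returns, which may be a more practical starting point for manual review.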

### **3\. Warehouse Capacity to Meet Future Demand**

I explored warehouse performance and logistical readiness to determine if OPC must scale operations to accommodate Ord Cycles. The number of orders shipped per month offers insight into fulfillment volume:

In [4]:
-- Summarizes number of orders shipped per month for logistics trend analysis.
SELECT DATE_TRUNC('month', ord_ship_date) AS month,
       COUNT(ord_id) AS orders_shipped
FROM dsci_504.orders
WHERE ord_ship_date IS NOT NULL
GROUP BY month
ORDER BY month;

| month | orders_shipped |
| --- | --- |
| 2000-01-01 00:00:00-06 | 9 |
| 2000-02-01 00:00:00-06 | 8 |
| 2000-03-01 00:00:00-06 | 14 |
| 2000-04-01 00:00:00-06 | 12 |
| 2000-05-01 00:00:00-05 | 13 |
| 2000-06-01 00:00:00-05 | 16 |
| 2000-07-01 00:00:00-05 | 19 |
| 2000-08-01 00:00:00-05 | 14 |
| 2000-09-01 00:00:00-05 | 13 |
| 2000-10-01 00:00:00-05 | 14 |


Order fulfillment shows a clear growth trajectory:

- From 2000 to 2016: monthly shipping volumes hover between 5 and 20 orders.
- From 2017 onward: a surge to 90–220+ shipments per month, peaking in 2020.

This pattern suggests increasing strain on warehouse operations and reinforces the need to assess, and potentially expand, fulfillment infrastructure to meet growing demand, especially post-acquisition.

Warehouse order distribution was also assessed:

In [5]:
-- Counts total orders processed by each warehouse.
SELECT w.warehouse_name, COUNT(*) AS total_orders
FROM dsci_504.orders o
JOIN dsci_504.warehouses w ON o.warehouse_id = w.warehouse_id
GROUP BY w.warehouse_name
ORDER BY total_orders DESC;

| warehouse_name | total_orders |
| --- | --- |
| Sacramento | 4251 |
| Dallas | 4243 |
| Columbus | 4105 |


Order processing is evenly split across three main warehouses:

- Sacramento: 4,251 orders.
- Dallas: 4,243 orders.
- Columbus: 4,105 orders.

Although the distribution is relatively balanced, the high volumes indicate that any further load due to Ord Cycles could require logistical scaling or redistribution strategies to prevent bottlenecks.
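The balance claim can be checked with simple arithmetic. The sketch below converts the three order counts from the output above into percentage shares and reports the gap between the busiest and least busy warehouse; the function name is illustrative.

```python
# Order counts per warehouse, copied from the query output above.
warehouse_orders = {
    "Sacramento": 4251,
    "Dallas": 4243,
    "Columbus": 4105,
}


def order_shares(counts):
    """Return each warehouse's share of total orders as a percentage."""
    total = sum(counts.values())
    return {name: count / total * 100 for name, count in counts.items()}


shares = order_shares(warehouse_orders)
spread = max(shares.values()) - min(shares.values())

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of orders")
print(f"Spread between busiest and least busy: {spread:.2f} percentage points")
```

Each warehouse handles close to one third of the 12,599 orders, and the spread is a little over one percentage point, which supports describing the load as balanced.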

### **4\. Preferred Mountain Bike Configurations**

Understanding which bike configurations are most popular helps align Ord Cycles' product offerings with customer preferences. The following query ranks builds by the number of components they contain, used here as a proxy for the configurations OPC favors:

In [6]:
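-- Ranks builds by their number of build_components rows.
-- Note: popularity counts component rows, so it equals component_count when each component is listed once per build.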
SELECT b.build_name, COUNT(DISTINCT bc.comp_id) AS component_count, COUNT(*) AS popularity
FROM dsci_504.builds b
JOIN dsci_504.build_components bc ON b.build_id = bc.build_id
GROUP BY b.build_name
ORDER BY popularity DESC
LIMIT 5;

| build_name | component_count | popularity |
| --- | --- | --- |
| wild ride | 11 | 11 |
| pinnacle | 10 | 10 |
| bull | 10 | 10 |
| loam assault | 10 | 10 |
| max traction | 10 | 10 |


The builds with the most components are:

- Wild Ride (11 components).
- Pinnacle, Bull, Loam Assault, and Max Traction (10 components each).

Because `popularity` counts rows in `build_components`, it matches `component_count` for every build and measures build complexity rather than how often a build was sold. These results are best read as a guide to the configurations OPC offers; confirming customer preference would require joining builds to order data. For product alignment post-acquisition, Ord Cycles builds with 10–11 components would sit closest to these configurations.
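In the output above, `component_count` and `popularity` are identical for every build. A minimal sketch with hypothetical rows shows why: when each component appears once per build, counting rows (`COUNT(*)`) and counting distinct components (`COUNT(DISTINCT comp_id)`) give the same number. The sample rows and function name below are invented for illustration and do not come from the OPC database.

```python
from collections import defaultdict

# Hypothetical (build_name, comp_id) rows: one row per component in a build.
build_components = [
    ("wild ride", 1),
    ("wild ride", 2),
    ("wild ride", 3),
    ("pinnacle", 1),
    ("pinnacle", 4),
]


def count_rows_and_distinct(rows):
    """Return {build: (row_count, distinct_component_count)}."""
    row_count = defaultdict(int)
    distinct_components = defaultdict(set)
    for build, comp_id in rows:
        row_count[build] += 1                 # mirrors COUNT(*)
        distinct_components[build].add(comp_id)  # mirrors COUNT(DISTINCT comp_id)
    return {
        build: (row_count[build], len(distinct_components[build]))
        for build in row_count
    }


for build, (rows, distinct) in count_rows_and_distinct(build_components).items():
    print(f"{build}: rows={rows}, distinct components={distinct}")
```

The two counts would only diverge if a component were listed more than once in the same build, so neither column reflects sales volume on its own.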

### **5\. Projected Growth Over the Next Quarter**

To anticipate demand in the near term, I used the average of recent quarters' sales as a forecast for upcoming growth:

In [7]:
-- Retrieve total order values grouped by quarter for short-term sales forecasting.
SELECT DATE_TRUNC('quarter', ord_date) AS quarter,
       SUM(order_tot) AS total_sales
FROM dsci_504.orders
GROUP BY quarter
ORDER BY quarter DESC
LIMIT 4;

| quarter | total_sales |
| --- | --- |
| 2025-10-01 00:00:00-05 | 7223014.85 |
| 2025-07-01 00:00:00-05 | 7216169.54 |
| 2025-04-01 00:00:00-05 | 8633167.27 |
| 2025-01-01 00:00:00-06 | 6228300.81 |


Recent quarterly revenues:

- Q1 2025: **$6.2M**
- Q2 2025: **$8.6M**
- Q3 2025: **$7.2M**
- Q4 2025: **$7.2M**

Quarterly revenue stayed in the **$6.2M–$8.6M** range and averaged about **$7.3M**, which indicates stable short-term demand and gives a simple baseline forecast for the next quarter. This baseline supports investment decisions in operations, warehouse scaling, and marketing, ensuring readiness to integrate Ord Cycles.
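The averaging step behind the forecast is not shown in the query itself, so here is a minimal sketch of it. It takes the four quarterly totals from the output above and uses their mean as a naive next-quarter estimate; this is a simple baseline under the assumption that recent quarters are representative, not a statistical model, and the variable names are illustrative.

```python
# Quarterly totals copied from the query output above (quarter start date -> sales).
quarterly_sales = {
    "2025-01-01": 6228300.81,
    "2025-04-01": 8633167.27,
    "2025-07-01": 7216169.54,
    "2025-10-01": 7223014.85,
}


def naive_forecast(sales_by_quarter):
    """Return the mean of the supplied quarterly totals as a baseline forecast."""
    values = list(sales_by_quarter.values())
    return sum(values) / len(values)


forecast = naive_forecast(quarterly_sales)
low, high = min(quarterly_sales.values()), max(quarterly_sales.values())

print(f"Baseline next-quarter forecast: ${forecast:,.2f}")
print(f"Observed range: ${low:,.2f} to ${high:,.2f}")
```

A mean of four quarters ignores seasonality and trend, so the observed range is reported alongside it as a rough indication of how far a real quarter might land from the baseline.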

### **Strengths and Weaknesses of PostgreSQL for OPC Analytics:**

**Strengths of PostgreSQL for OPC Analytics**

1. **Robust Analytical Capabilities:** PostgreSQL supports advanced SQL constructs like Common Table Expressions (CTEs) and Window Functions, which allow complex analytical tasks (e.g., ranking builds, quarterly trend analysis, customer behavior segmentation) without requiring external tools.
2. **High Data Integrity and Structured Storage:** The relational schema (e.g., `orders`, `products`, `returns`) ensures consistency through primary/foreign key relationships, which is ideal for maintaining business-critical sales and customer data.
3. **Efficient Query Performance:** For this dataset (about 12,600 orders and tens of thousands of line items), PostgreSQL provided sub-second query execution, supporting near real-time data exploration in Azure Data Studio. With appropriate indexing it also scales to much larger tables.
4. **Customizable and Open Source:** Being open-source, PostgreSQL enables cost-effective deployment without licensing fees and is highly customizable to fit into existing infrastructure, which is especially relevant for a growing company like OPC.
5. **Compatible with External BI Tools:** PostgreSQL integrates well with Power BI, Tableau, and other dashboarding tools, allowing analysts to perform modeling in SQL and visualize trends through external applications.

**Weaknesses of PostgreSQL for OPC Analytics**

1. **Lacks Built-In Visualization:** PostgreSQL doesn’t offer built-in dashboard or reporting interfaces. As a result, analysts must export results to external BI platforms, adding friction to executive reporting and real-time decision-making.
2. **Steep Learning Curve for Non-Technical Users:** Crafting performant SQL queries (e.g., JOINs, aggregations, filtering logic) requires specialized SQL knowledge. This limits accessibility for business users or junior analysts unfamiliar with database operations.
3. **Manual ETL and Orchestration Needs:** PostgreSQL does not natively support automated data pipelines, scheduling, or transformation orchestration. This limits its usability for complex workflows unless paired with tools like Apache Airflow, dbt, or Python-based ETL scripts.
4. **Limited Native Support for Streaming and Large-Scale Time-Series Data:** PostgreSQL handles structured/tabular data very well and natively supports JSON/JSONB and array types, but streaming ingestion and high-volume time-series workloads typically need extensions (like TimescaleDB) or external tooling, making it less versatile for real-time analytics use cases.

### **Strategic Implication for OPC**

- PostgreSQL is a solid backbone for analytics involving structured sales, customer, and operational data.
- For maximum impact, OPC should pair PostgreSQL with BI tools (like Power BI) and adopt lightweight orchestration tools to automate workflows.
- As OPC scales post-acquisition, investing in training for analysts, query optimization practices, and a unified data strategy will unlock the full potential of PostgreSQL.

### **Convert the file to html:**

In [1]:
!jupyter nbconvert --to html DSCI504_Wk8_Backbrief_SKungulio.ipynb

[NbConvertApp] Converting notebook DSCI504_Wk8_Backbrief_SKungulio.ipynb to html


[NbConvertApp] Writing 381230 bytes to DSCI504_Wk8_Backbrief_SKungulio.html
