# Mystery 1: **The Discount Thief!**

> Someone has been stealing from the company by handing out huge discounts, and it is costing us big profits! Find out who it is: find the highest discount on record, get every order that used that exact maximum discount, and use that information to identify the culprit!

In [None]:
-- Stage 1: global max
SELECT MAX(sod.UnitPriceDiscount) AS MaxDiscount
FROM Sales.SalesOrderDetail AS sod;


1.  I started by scanning the Sales.SalesOrderDetail table to find the biggest discount ever given.

2. I used the MAX() function on UnitPriceDiscount to capture the single highest percentage.

3. The result (0.40, i.e. a 40% discount) becomes the benchmark for the rest of the mystery.

In [None]:
-- Stage 2: which orders hit the max discount?
SELECT
    sod.SalesOrderID,
    sod.SalesOrderDetailID,
    sod.ProductID,
    sod.UnitPriceDiscount
FROM Sales.SalesOrderDetail AS sod
WHERE sod.UnitPriceDiscount = (0.40)
ORDER BY sod.SalesOrderID, sod.SalesOrderDetailID;


1. I filtered the same table for orders using the exact 40% discount.

2. I listed each order’s IDs and product numbers so I could trace where the pattern occurs.

3. Sorting by SalesOrderID gave me a clean timeline of every over-discounted order.
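
The filter above relies on the 0.40 value copied by hand from Stage 1. A minimal sketch of the same idea with the maximum looked up by a subquery instead, run on a few made-up rows (the `Detail` CTE and its values are illustrative stand-ins, not real AdventureWorks data):

```sql
-- Toy stand-in for Sales.SalesOrderDetail (hypothetical rows)
WITH Detail AS (
    SELECT *
    FROM (VALUES
        (1, 101, 0.00),
        (1, 102, 0.40),
        (2, 103, 0.15),
        (3, 104, 0.40)
    ) AS v (SalesOrderID, SalesOrderDetailID, UnitPriceDiscount)
)
SELECT
    d.SalesOrderID,
    d.SalesOrderDetailID,
    d.UnitPriceDiscount
FROM Detail AS d
-- The subquery finds the maximum at run time, so nothing is hard-coded
WHERE d.UnitPriceDiscount = (SELECT MAX(UnitPriceDiscount) FROM Detail)
ORDER BY d.SalesOrderID, d.SalesOrderDetailID;
```

On these toy rows the query keeps detail lines 102 and 104. Against the real table, the same pattern would keep working if a higher discount were ever recorded.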

In [None]:
-- Stage 3: identify which salesperson gave the most 40% discounts
SELECT TOP (10) WITH TIES
    soh.SalesPersonID,
    p.FirstName + ' ' + p.LastName AS FullName,
    e.JobTitle,
    COUNT(DISTINCT soh.SalesOrderID) AS NumOrdersAt40
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
  ON soh.SalesOrderID = sod.SalesOrderID
JOIN Person.Person AS p
  ON p.BusinessEntityID = soh.SalesPersonID
JOIN HumanResources.Employee AS e
  ON e.BusinessEntityID = soh.SalesPersonID
WHERE soh.SalesPersonID IS NOT NULL
  AND sod.UnitPriceDiscount = 0.40      -- ← hard-coded from Stage 1
GROUP BY soh.SalesPersonID, p.FirstName, p.LastName, e.JobTitle
ORDER BY COUNT(DISTINCT soh.SalesOrderID) DESC;



1. I joined Sales.SalesOrderHeader with Sales.SalesOrderDetail to link each order with its salesperson.

  
2. I then joined with Person.Person and HumanResources.Employee to reveal the salesperson's full name and job title.

  
3. I grouped by Salesperson and counted how many distinct orders had the 40% discount, then sorted by that total to expose who gave out the most. 

  

There were a lot of culprits in here, but **Linda** is the biggest offender! She handed out more 40% discounts than anyone else in the department. I joined the order, person, and employee tables to aggregate the right information and conclude that Linda was the main culprit.

# Mystery 2: **The Ghost Orders!**

> Some orders from December 2013 have no name attached to them! Find out who is responsible: first find the December orders with no salesperson, then count those orders per customer, and finally reveal the culprits' names!

In [None]:
-- Stage 1: "ghost orders" = SalesPersonID is NULL
WITH Dec13 AS (
  SELECT SalesOrderID, CustomerID, SalesPersonID, OrderDate, TotalDue
  FROM Sales.SalesOrderHeader
  WHERE OrderDate >= '2013-12-01' AND OrderDate < '2014-01-01'
)
SELECT *
FROM Dec13
WHERE SalesPersonID IS NULL
ORDER BY SalesOrderID;


1. I filtered the Sales.SalesOrderHeader table for orders placed between December 1 and December 31, 2013.

2. I selected only those records where the SalesPersonID field was NULL, meaning no salesperson was attached.

3. I sorted the results by SalesOrderID to line up every unassigned order for inspection.
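
The December filter is written as a half-open range (`>= '2013-12-01'` and `< '2014-01-01'`) rather than a BETWEEN on the first and last day. A small sketch of why that can matter, assuming OrderDate is a datetime that may carry a time of day (the `Orders` rows below are made up for illustration):

```sql
-- Hypothetical orders around the edges of December 2013
WITH Orders AS (
    SELECT *
    FROM (VALUES
        (1, CAST('2013-11-30T23:59:59' AS datetime)),
        (2, CAST('2013-12-01T00:00:00' AS datetime)),
        (3, CAST('2013-12-31T18:30:00' AS datetime)),
        (4, CAST('2014-01-01T00:00:00' AS datetime))
    ) AS v (SalesOrderID, OrderDate)
)
SELECT
    o.SalesOrderID,
    o.OrderDate,
    -- Half-open range: includes every moment of December 31
    CASE WHEN o.OrderDate >= '20131201' AND o.OrderDate < '20140101'
         THEN 1 ELSE 0 END AS InHalfOpenRange,
    -- BETWEEN stops at midnight at the start of December 31
    CASE WHEN o.OrderDate BETWEEN '20131201' AND '20131231'
         THEN 1 ELSE 0 END AS InBetweenRange
FROM Orders AS o
ORDER BY o.SalesOrderID;
```

Order 3 (December 31 at 18:30) is caught by the half-open range but missed by BETWEEN. If every OrderDate is stored at midnight the two filters agree, so the half-open form is simply the safer habit.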

In [None]:
-- Stage 2: top customer(s) placing ghost orders
WITH Dec13 AS (
  SELECT SalesOrderID, CustomerID, SalesPersonID, OrderDate, TotalDue
  FROM Sales.SalesOrderHeader
  WHERE OrderDate >= '2013-12-01' AND OrderDate < '2014-01-01'
),
Ghost AS (
  SELECT CustomerID, TotalDue
  FROM Dec13
  WHERE SalesPersonID IS NULL
)
SELECT TOP (1) WITH TIES
  g.CustomerID,
  COUNT(*)        AS NumGhostOrders,
  SUM(g.TotalDue) AS GhostSpend
FROM Ghost g
GROUP BY g.CustomerID
ORDER BY COUNT(*) DESC;


1. I reused the December 2013 dataset and filtered again for rows where SalesPersonID IS NULL.

2. I grouped the suspicious orders by CustomerID to count how many each customer placed.

3. I used COUNT(*) to find the number of ghost orders per customer and SUM(TotalDue) to measure how much money they spent.
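
`TOP (1) WITH TIES` is what lets more than one customer come back when several share the highest count. A minimal sketch on made-up rows (the `Ghost` CTE here is a toy stand-in, not the real December data):

```sql
-- Hypothetical ghost orders: customers 11 and 12 both have two
WITH Ghost AS (
    SELECT *
    FROM (VALUES
        (11, 50.00),
        (11, 20.00),
        (12, 75.00),
        (12, 10.00),
        (13, 99.00)
    ) AS v (CustomerID, TotalDue)
)
SELECT TOP (1) WITH TIES
    g.CustomerID,
    COUNT(*)        AS NumGhostOrders,
    SUM(g.TotalDue) AS GhostSpend
FROM Ghost AS g
GROUP BY g.CustomerID
-- WITH TIES keeps every row that matches the top ORDER BY value
ORDER BY COUNT(*) DESC;
```

Here both customer 11 and customer 12 are returned, each with two orders. With a plain `TOP (1)` only one of them would come back, and which one is not guaranteed.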

In [None]:
-- Stage 3: identity reveal (handles Person or Store)
WITH Dec13 AS (
  SELECT SalesOrderID, CustomerID, SalesPersonID, OrderDate, TotalDue
  FROM Sales.SalesOrderHeader
  WHERE OrderDate >= '2013-12-01' AND OrderDate < '2014-01-01'
),
Ghost AS (
  SELECT CustomerID, TotalDue
  FROM Dec13
  WHERE SalesPersonID IS NULL
),
TopCustomer AS (
  SELECT TOP (1) WITH TIES CustomerID, COUNT(*) AS NumGhostOrders, SUM(TotalDue) AS GhostSpend
  FROM Ghost
  GROUP BY CustomerID
  ORDER BY COUNT(*) DESC
)
SELECT
  tc.CustomerID,
  COALESCE(pp.FirstName + ' ' + pp.LastName, s.Name) AS CustomerName,
  tc.NumGhostOrders,
  tc.GhostSpend
FROM TopCustomer tc
JOIN Sales.Customer c      ON c.CustomerID = tc.CustomerID
LEFT JOIN Person.Person pp ON pp.BusinessEntityID = c.PersonID
LEFT JOIN Sales.Store   s  ON s.BusinessEntityID  = c.StoreID;


1. I took the top customer(s) with the most ghost orders from the previous stage.

2. I joined Sales.Customer to link each CustomerID to either a Person or a Store.

3. I used LEFT JOIN Person.Person and LEFT JOIN Sales.Store together with COALESCE to reveal the actual name behind each ghost buyer.  
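
The name lookup works because a customer row points at a person, a store, or both, and COALESCE picks the first non-NULL name. A small sketch with toy tables (the `Customer`, `Person`, and `Store` CTEs and their rows are hypothetical):

```sql
-- Customer 1 is a person, customer 2 is a store (made-up rows)
WITH Customer AS (
    SELECT *
    FROM (VALUES
        (1, 100, CAST(NULL AS int)),
        (2, CAST(NULL AS int), 200)
    ) AS v (CustomerID, PersonID, StoreID)
),
Person AS (
    SELECT *
    FROM (VALUES (100, 'Ana', 'Lopez')) AS v (BusinessEntityID, FirstName, LastName)
),
Store AS (
    SELECT *
    FROM (VALUES (200, 'Bike Depot')) AS v (BusinessEntityID, Name)
)
SELECT
    c.CustomerID,
    -- With no matching person the concatenation is NULL,
    -- so COALESCE falls through to the store name
    COALESCE(p.FirstName + ' ' + p.LastName, s.Name) AS CustomerName
FROM Customer AS c
LEFT JOIN Person AS p ON p.BusinessEntityID = c.PersonID
LEFT JOIN Store  AS s ON s.BusinessEntityID = c.StoreID
ORDER BY c.CustomerID;
```

Customer 1 shows up as "Ana Lopez" and customer 2 as "Bike Depot". Inner joins would have dropped whichever customer lacked a match in one of the two tables.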

This mystery is solved. I traced the December trail of orders that had no assigned salesperson and matched them back to their rightful owners using Sales.Customer. The customers tied for the highest number of these phantom orders turned out to be the main culprits: Ashley, Hailey, Jose, Ryan, Henry, Fernando, and Nicholas.

# Mystery 3: **The Phantom Product!**

> There are items in the database that have never been sold! Dozens of products sit in the catalog without ever appearing in a single sale. Corporate wants them **GONE**. Find out which products are wasting away on the shelves by finding every active catalog product that has never been sold.

In [None]:
-- Stage 1: Products active in the catalog and sellable
WITH ActiveCatalog AS (
    SELECT 
        p.ProductID,
        p.Name,
        p.ListPrice,
        p.ProductSubcategoryID
    FROM Production.Product AS p
    WHERE p.SellStartDate IS NOT NULL
      AND (p.SellEndDate IS NULL OR p.SellEndDate > GETDATE())
      AND p.ProductSubcategoryID IS NOT NULL      -- exclude internal parts
      AND p.ListPrice > 0                         -- exclude unpriced items
)
SELECT * 
FROM ActiveCatalog
ORDER BY Name;


1. I scanned Production.Product and kept only sellable items: SellStartDate IS NOT NULL.
    
2. I excluded retired items where an end date is in the past by requiring SellEndDate IS NULL OR SellEndDate \> GETDATE().
    
3. I removed internal parts with ProductSubcategoryID IS NOT NULL.
    
4. I filtered out unpriced items using ListPrice \> 0, then sorted by Name for a clean roster.

In [None]:
-- Stage 2: Products that have ever appeared in a sale
WITH SoldProducts AS (
    SELECT DISTINCT d.ProductID
    FROM Sales.SalesOrderDetail AS d
)
SELECT p.ProductID, p.Name
FROM SoldProducts s
JOIN Production.Product p 
  ON p.ProductID = s.ProductID
ORDER BY p.Name;

1. I pulled distinct ProductIDs from Sales.SalesOrderDetail to capture anything that has shown up on an order.

2. I joined those IDs back to Production.Product to get the product names.

3. I ordered by Name so the list lines up neatly against the catalog from Stage 1.

In [None]:
-- Stage 3: Except clause to remove all sales
SELECT p.ProductID, p.Name, p.ListPrice
FROM Production.Product AS p
WHERE p.SellStartDate IS NOT NULL
  AND (p.SellEndDate IS NULL OR p.SellEndDate > GETDATE())
  AND p.ProductSubcategoryID IS NOT NULL
  AND p.ListPrice > 0
EXCEPT
SELECT d.ProductID, p.Name, p.ListPrice
FROM Sales.SalesOrderDetail AS d
JOIN Production.Product p 
  ON d.ProductID = p.ProductID
ORDER BY Name;

1. I rebuilt the active, sellable catalog (valid start date, not ended, not internal parts, price > 0) so I had the full list of what we should be selling.

2. I pulled every product that has ever appeared on an order from Sales.SalesOrderDetail and joined to Production.Product to get the matching names and prices, since EXCEPT compares whole rows and needs the same columns on both sides.

3. I used EXCEPT to subtract the “ever sold” list from the active catalog, then sorted by Name to reveal the leftovers—the products that have never sold.
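
The same "never sold" question can also be asked with NOT EXISTS, which avoids repeating the name and price columns on both sides. A minimal sketch on toy data (the `Product` and `Detail` CTEs and their rows are made up, and this is an alternative phrasing rather than the query used above):

```sql
-- Hypothetical catalog and order lines: product 2 never sells
WITH Product AS (
    SELECT *
    FROM (VALUES
        (1, 'Frame A'),
        (2, 'Frame B'),
        (3, 'Helmet C')
    ) AS v (ProductID, Name)
),
Detail AS (
    SELECT *
    FROM (VALUES
        (1001, 1),
        (1002, 1),
        (1003, 3)
    ) AS v (SalesOrderID, ProductID)
)
SELECT p.ProductID, p.Name
FROM Product AS p
-- Keep a product only when no order line references it
WHERE NOT EXISTS (
    SELECT 1
    FROM Detail AS d
    WHERE d.ProductID = p.ProductID
)
ORDER BY p.Name;
```

Only "Frame B" is returned here. On the real tables, the catalog filters from Stage 1 would go in the outer WHERE alongside the NOT EXISTS.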

To crack this case, I started by narrowing the scene to real catalog items, which by my definition were products with a valid start date, no past end date, a subcategory, and a price above zero. That gave me a full lineup of what I should be selling. Then I pulled up every product that had ever shown up in a sales order. I compared the two using the **EXCEPT** operator, subtracting all the sold items from the catalog lineup, and what was left were the true culprits. The 'HL Mountain Frame - Black, 46' is the main culprit, for being so expensive yet so hard to sell.

# Mystery 4: **The Overworked Salesperson!**

> The regional manager has been hearing complaints: phones ringing nonstop, emails piling up, and one sales representative who never leaves the office. Rumor has it, someone's juggling more customers than the rest of the team combined. The regional manager wants this very hardworking salesperson found, so he can be given mandatory vacation hours for his hard work! Find the roster of salespeople, count the number of distinct customers handled by each salesperson, then find his name!

In [None]:
-- Stage 1: Salesperson roster with full names
WITH Roster AS (
    SELECT 
        sp.BusinessEntityID              AS SalesPersonID,
        p.FirstName + ' ' + p.LastName   AS FullName
    FROM Sales.SalesPerson sp
    JOIN Person.Person p 
      ON p.BusinessEntityID = sp.BusinessEntityID
)
SELECT * 
FROM Roster
ORDER BY FullName;


1. I started from Sales.SalesPerson to get every salesperson’s BusinessEntityID.

2. I joined to Person.Person on BusinessEntityID to pull FirstName and LastName.

3. I built a clean display name (FirstName + ' ' + LastName) and aliased it as FullName.

4. I sorted the roster by FullName so the suspects line up neatly.

In [None]:
-- Stage 2: Count DISTINCT customers per salesperson
WITH CustomerCounts AS (
    SELECT 
        soh.SalesPersonID,
        COUNT(DISTINCT soh.CustomerID) AS NumCustomers
    FROM Sales.SalesOrderHeader AS soh
    WHERE soh.SalesPersonID IS NOT NULL
    GROUP BY soh.SalesPersonID
)
SELECT * 
FROM CustomerCounts
ORDER BY NumCustomers DESC;


1. I scanned Sales.SalesOrderHeader and kept only rows with a salesperson (SalesPersonID IS NOT NULL).

2. I grouped by SalesPersonID and counted DISTINCT CustomerID to get each rep’s unique client count.

3. I ordered the results by that count DESC to spotlight the reps juggling the biggest books.
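
Counting DISTINCT customers matters because a repeat customer would otherwise be counted once per order. A small sketch on made-up rows (the `Header` CTE is a toy stand-in for Sales.SalesOrderHeader):

```sql
-- Hypothetical orders: rep 275 sells to customer 501 twice
WITH Header AS (
    SELECT *
    FROM (VALUES
        (1, 275, 501),
        (2, 275, 501),
        (3, 275, 502),
        (4, 276, 503)
    ) AS v (SalesOrderID, SalesPersonID, CustomerID)
)
SELECT
    h.SalesPersonID,
    COUNT(*)                     AS NumOrders,     -- one per order row
    COUNT(DISTINCT h.CustomerID) AS NumCustomers   -- one per unique customer
FROM Header AS h
GROUP BY h.SalesPersonID
ORDER BY NumCustomers DESC;
```

Rep 275 has three orders but only two distinct customers, while rep 276 has one of each.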

In [None]:
-- Stage 3: Reveal the most overworked salesperson
WITH Roster AS (
    SELECT sp.BusinessEntityID AS SalesPersonID,
           p.FirstName + ' ' + p.LastName AS FullName
    FROM Sales.SalesPerson sp
    JOIN Person.Person p 
      ON p.BusinessEntityID = sp.BusinessEntityID
),
CustomerCounts AS (
    SELECT 
        soh.SalesPersonID,
        COUNT(DISTINCT soh.CustomerID) AS NumCustomers
    FROM Sales.SalesOrderHeader AS soh
    WHERE soh.SalesPersonID IS NOT NULL
    GROUP BY soh.SalesPersonID
)
SELECT TOP (1)
    r.SalesPersonID,
    r.FullName,
    cc.NumCustomers
FROM CustomerCounts cc
JOIN Roster r 
  ON r.SalesPersonID = cc.SalesPersonID
ORDER BY cc.NumCustomers DESC;

1. I kept the Roster from Stage 1 (SalesPerson → Person.Person) to have each rep’s SalesPersonID and FullName.

2. I used the CustomerCounts from Stage 2 (from Sales.SalesOrderHeader) where SalesPersonID IS NOT NULL, grouped by rep, and counted DISTINCT CustomerID.

3. I joined CustomerCounts back to Roster, sorted by NumCustomers DESC, and selected the TOP (1) to expose the heaviest workload.

The most overworked salesperson we see in the database is Jillian Carson, with 121 **DISTINCT** customers. I began by joining the SalesPerson and Person tables to build a full roster of every salesperson. Then I used the SalesOrderHeader table to count how many distinct (non-repeating) customers each sales representative had sold to, exposing who carried the heaviest workload. By joining these results together, I uncovered the one salesperson juggling the widest customer base.

# Mystery 5: **The Scrap Heap**

> The yard is filling up. Rumor says one line is bleeding parts into the bin. We have to find the products with the worst average scrap rate and expose the repeat offenders. Gather all work orders and tie them to product names, compute each order's ScrapRate as Scrapped Quantity / Order Quantity, group the rates by product, and find the top offenders.

In [None]:
-- Stage 1: Work orders joined to products
WITH WO AS (
    SELECT 
        wo.WorkOrderID,
        wo.ProductID,
        wo.OrderQty,
        wo.ScrappedQty
    FROM Production.WorkOrder AS wo
)
SELECT 
    wo.WorkOrderID,
    p.ProductID,
    p.Name,
    wo.OrderQty,
    wo.ScrappedQty
FROM WO AS wo   -- alias the CTE explicitly so the wo.* references resolve under any collation
JOIN Production.Product AS p
  ON p.ProductID = wo.ProductID
ORDER BY p.Name, wo.WorkOrderID;


1. I pulled every row from Production.WorkOrder to get WorkOrderID, ProductID, OrderQty, and ScrappedQty.

2. I joined to Production.Product on ProductID so each work order carries the product name involved.

3. I selected the key fields and sorted by p.Name, then wo.WorkOrderID to line up the trail neatly by part and time.

In [None]:
-- Stage 2: Per-order scrap rates
WITH Rates AS (
    SELECT
        wo.WorkOrderID,
        wo.ProductID,
        wo.ScrappedQty * 1.0 / NULLIF(wo.OrderQty, 0) AS ScrapRate
    FROM Production.WorkOrder wo
)
SELECT * 
FROM Rates
ORDER BY ScrapRate DESC;


1. I computed a ScrapRate per job as wo.ScrappedQty * 1.0 / NULLIF(wo.OrderQty, 0) to get a safe decimal (avoids divide-by-zero).

2. I kept the identifiers (WorkOrderID, ProductID) alongside that rate so I can link back to specific problem orders.

3. I ordered by ScrapRate DESC to surface the worst individual jobs first.
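
Both pieces of that expression are doing work: `* 1.0` forces decimal division, and NULLIF turns a zero quantity into NULL instead of raising an error. A minimal sketch on made-up work orders, assuming both quantities are integer columns (the `WorkOrder` CTE and its rows are hypothetical):

```sql
-- Hypothetical work orders, including one with a zero order quantity
WITH WorkOrder AS (
    SELECT *
    FROM (VALUES
        (1, 100, 3),
        (2, 8, 2),
        (3, 0, 0)
    ) AS v (WorkOrderID, OrderQty, ScrappedQty)
)
SELECT
    w.WorkOrderID,
    -- Integer / integer truncates, so small rates collapse to 0
    w.ScrappedQty / NULLIF(w.OrderQty, 0)       AS IntegerRate,
    -- Multiplying by 1.0 first switches to decimal arithmetic
    w.ScrappedQty * 1.0 / NULLIF(w.OrderQty, 0) AS DecimalRate
FROM WorkOrder AS w
ORDER BY w.WorkOrderID;
```

The integer version reports 0 for the first two work orders, while the decimal version gives 0.03 and 0.25. The third row returns NULL in both columns rather than a divide-by-zero error.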

In [None]:
-- Stage 3: average scrap rate per product (top 10 offenders)
SELECT TOP (10)
    p.ProductID,
    p.Name,
    AVG(wo.ScrappedQty * 1.0 / NULLIF(wo.OrderQty, 0)) AS AvgScrapRate,
    SUM(wo.ScrappedQty) AS TotalScrapped,
    COUNT(*) AS WorkOrderCount
FROM Production.WorkOrder AS wo
JOIN Production.Product AS p
  ON p.ProductID = wo.ProductID
GROUP BY p.ProductID, p.Name
HAVING COUNT(*) >= 3
ORDER BY AvgScrapRate DESC, TotalScrapped DESC;


1. I grouped all work orders by product (p.ProductID, p.Name) and calculated:

    - AVG(wo.ScrappedQty \* 1.0 / NULLIF(wo.OrderQty,0)) as AvgScrapRate
    - SUM(wo.ScrappedQty) as TotalScrapped
    - COUNT(\*) as WorkOrderCount

2. I filtered to products with at least 3 work orders (HAVING COUNT(\*) \>= 3) to avoid one-off noise.
    
3. I sorted by AvgScrapRate DESC (worst first), then by TotalScrapped DESC, and showed the TOP (10) to surface the chronic offenders.
    

After analyzing every work order, the data points to the Road-450 Red, 44 as the main culprit, as it has the highest average scrap rate in the entire yard. Each batch it runs wastes more material, order for order, than anything else on the floor. Interestingly, the **BB Ball Bearing** showed up with an enormous total scrap count but a very low scrap rate. I believe that is because it is produced constantly in huge volumes, so even a tiny percentage loss adds up to massive totals. I did this by going through the Production.WorkOrder table, which records how many units each work order built and how many it scrapped. From there, I linked each work order to its matching entry in Production.Product to reveal which parts were responsible for the biggest messes on the factory floor. I then computed the ScrapRate by dividing the scrapped quantity by the order quantity, returning NULL for any order that couldn't be measured. Then I grouped everything by product and took the average rate to find which items consistently performed the worst, ignoring products with fewer than three work orders.
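
An average of per-order rates and an overall rate (total scrapped over total ordered) can tell different stories, which is one way a high-volume part ends up with a big scrap total but a low rate. A small sketch on made-up numbers (the `WorkOrder` rows are hypothetical, not taken from the real table):

```sql
-- One product, one tiny bad batch and one huge clean batch
WITH WorkOrder AS (
    SELECT *
    FROM (VALUES
        (1, 10, 5),
        (1, 1000, 10)
    ) AS v (ProductID, OrderQty, ScrappedQty)
)
SELECT
    w.ProductID,
    -- Every work order counts equally, whatever its size
    AVG(w.ScrappedQty * 1.0 / NULLIF(w.OrderQty, 0))      AS AvgOfRates,
    -- Every unit counts equally, so big batches dominate
    SUM(w.ScrappedQty) * 1.0 / NULLIF(SUM(w.OrderQty), 0) AS OverallRate
FROM WorkOrder AS w
GROUP BY w.ProductID;
```

The average of the two rates is 0.255, while the overall rate is about 0.0149 (15 scrapped out of 1010 ordered). Which number is the "right" one depends on whether the question is about bad batches or about total waste.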

# Mystery 6: **The One-and-Done Fugitives!**

> Rumor from Finance: A wave of one-time buyers spiked revenue and then vanished. Who are these ghosts, when did they strike, and which order was the priciest hit? Track down every customer who placed exactly one order in 2013, uncover their real names and order details, and find out which one pulled off the biggest single sale before vanishing.

In [None]:
-- Stage 1: customers who placed exactly one order in 2013
SELECT
    soh.CustomerID,
    COUNT(*) AS OrdersIn2013
FROM Sales.SalesOrderHeader AS soh
WHERE YEAR(soh.OrderDate) = 2013
GROUP BY soh.CustomerID
HAVING COUNT(*) = 1
ORDER BY soh.CustomerID;


1. I walked the Sales.SalesOrderHeader scene and filtered it to the year 2013 with WHERE YEAR(OrderDate) = 2013.

2. I grouped the evidence by CustomerID to count how many orders each customer placed.

3. I kept only the one-and-done suspects using HAVING COUNT(*) = 1, then listed the CustomerIDs in order.

In [None]:
-- Stage 2: reveal names for those one-time customers and their lone order details
SELECT
    soh.CustomerID,
    ISNULL(p.FirstName + ' ' + p.LastName, s.Name) AS CustomerName,
    MIN(soh.SalesOrderID)       AS OnlyOrderID,
    MIN(soh.OrderDate)          AS OnlyOrderDate,
    MAX(soh.TotalDue)           AS OnlyOrderTotal
FROM Sales.SalesOrderHeader AS soh
LEFT JOIN Sales.Customer       AS c ON c.CustomerID = soh.CustomerID
LEFT JOIN Person.Person        AS p ON p.BusinessEntityID = c.PersonID
LEFT JOIN Sales.Store          AS s ON s.BusinessEntityID = c.StoreID
WHERE YEAR(soh.OrderDate) = 2013
GROUP BY
    soh.CustomerID,
    ISNULL(p.FirstName + ' ' + p.LastName, s.Name)
HAVING COUNT(*) = 1
ORDER BY OnlyOrderDate;


1. I joined the suspects back to their identities through Sales.Customer.
    
2. For people, I linked Customer.PersonID → Person.Person.BusinessEntityID; for stores, I linked Customer.StoreID → Sales.Store.BusinessEntityID.
    
3. I selected a single row per suspect by grouping on CustomerID and the display name, pulling the one order’s SalesOrderID, date, and TotalDue.
    
4. I used ISNULL(PersonName, StoreName) to show a clean CustomerName, then sorted by the order date to see when each hit happened.

In [None]:
-- Stage 3: the priciest one-and-done order in 2013 (top 10 shown)
SELECT TOP (10)
    ISNULL(p.FirstName + ' ' + p.LastName, s.Name) AS CustomerName,
    MAX(soh.TotalDue)  AS OrderValue,
    MIN(soh.OrderDate) AS OrderDate,
    MIN(soh.SalesOrderID) AS SalesOrderID
FROM Sales.SalesOrderHeader AS soh
LEFT JOIN Sales.Customer AS c ON c.CustomerID = soh.CustomerID
LEFT JOIN Person.Person  AS p ON p.BusinessEntityID = c.PersonID
LEFT JOIN Sales.Store    AS s ON s.BusinessEntityID = c.StoreID
WHERE YEAR(soh.OrderDate) = 2013
GROUP BY
    soh.CustomerID,   -- count orders per customer, not per display name
    ISNULL(p.FirstName + ' ' + p.LastName, s.Name)
HAVING COUNT(*) = 1
ORDER BY OrderValue DESC;


1. I returned to 2013’s orders and repeated the identity joins (Customer → Person/Store).

2. I grouped by CustomerID and the final CustomerName, and again enforced HAVING COUNT(*) = 1 to keep only single-order culprits.

3. I surfaced their haul by selecting the order’s TotalDue (as OrderValue) and timestamp details.

4. I sorted DESC by OrderValue and showed the TOP (10)—the very first line is our ringleader: the priciest one-and-done of 2013.
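
When counting orders per customer, the grouping key matters, because two different customers can share the same display name. A minimal sketch of the difference on made-up rows (the `Orders` CTE, the names, and the amounts are all hypothetical):

```sql
-- Two different customers happen to share the name 'Alex Kim'
WITH Orders AS (
    SELECT *
    FROM (VALUES
        (1, 701, 'Alex Kim', 120.00),
        (2, 702, 'Alex Kim', 900.00),
        (3, 703, 'Dana Roy', 300.00)
    ) AS v (SalesOrderID, CustomerID, CustomerName, TotalDue)
),
ByName AS (
    -- Grouping on the name alone merges both Alex Kims into one group of 2
    SELECT CustomerName
    FROM Orders
    GROUP BY CustomerName
    HAVING COUNT(*) = 1
),
ById AS (
    -- Grouping on CustomerID keeps each customer separate
    SELECT CustomerID, CustomerName
    FROM Orders
    GROUP BY CustomerID, CustomerName
    HAVING COUNT(*) = 1
)
SELECT 'name only' AS GroupedBy, COUNT(*) AS OneTimeBuyers FROM ByName
UNION ALL
SELECT 'CustomerID + name', COUNT(*) FROM ById;
```

Grouped by name only, a single one-time buyer is found. Grouped by CustomerID and name, all three customers qualify, including the one with the 900.00 order.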

I combed through 2013 orders for customers who appeared exactly once, then unmasked each using Sales.Customer joined to Person.Person and Sales.Store. Finally, I stacked those one-off hits by TotalDue to see whose single visit cost us the most. The top line is the culprit, Yuping Tian, the person who placed one order and vanished.

# Mystery 7: **The Scrap Heap 2!**

> The scrapyard is overflowing with "LL Bottom Brackets"! 
> 
> The scrap rate for this single part is reported to be 50x higher than any other. Let's investigate by finding the ProductID, finding all related work orders, and calculating and ranking the scrap rates.

In [None]:
-- Stage 1: Find the ProductID for 'LL Bottom Bracket'
SELECT ProductID, Name
FROM Production.Product
WHERE Name = 'LL Bottom Bracket';


1. I began by looking inside the Production.Product table to locate the ProductID that corresponds to ‘LL Bottom Bracket’.

2. The ProductID will act as a key to trace every work order related to this product stored in Production.WorkOrder.

3. Once found, this ID feeds Stage 2 of the investigation.

In [None]:
-- Stage 2: total ordered, total scrapped, and overall scrap rate for the LL Bottom Bracket
SELECT
    p.ProductID,
    p.Name AS ProductName,
    SUM(w.OrderQty)      AS TotalOrdered,
    SUM(w.ScrappedQty)   AS TotalScrapped,
    CAST(
        SUM(w.ScrappedQty) * 1.0 / NULLIF(SUM(w.OrderQty), 0)
        AS DECIMAL(10,4)
    ) AS ScrapRate
FROM Production.Product AS p
JOIN Production.WorkOrder AS w
    ON p.ProductID = w.ProductID
WHERE p.ProductID = 994        -- LL Bottom Bracket
GROUP BY p.ProductID, p.Name
HAVING SUM(w.OrderQty) > 0
ORDER BY ScrapRate DESC;


1. I joined Production.Product with Production.WorkOrder to gather all manufacturing data tied to ProductID 994.

2. Summing both OrderQty and ScrappedQty gives total production vs. waste.

3. I then computed a ScrapRate as SUM(ScrappedQty) / SUM(OrderQty), cast to four decimal places for readability.

This query exposed how many LL Bottom Brackets were produced and how many were scrapped, along with the overall scrap rate. Because it is filtered to a single product, the ORDER BY ScrapRate only starts ranking anything once the filter is widened to other parts. The verdict: the LL Bottom Bracket is not the part with the high scrap rate.
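
To actually rank one part against the others, the same overall rate can be computed per product and passed through RANK(). A minimal sketch on made-up numbers (the `WorkOrder` rows are hypothetical; only ProductID 994 comes from the query above, and its figures here are invented):

```sql
-- Hypothetical work orders for three products
WITH WorkOrder AS (
    SELECT *
    FROM (VALUES
        (994, 500, 2),
        (994, 300, 1),
        (995, 200, 10),
        (996, 400, 4)
    ) AS v (ProductID, OrderQty, ScrappedQty)
),
Rates AS (
    SELECT
        w.ProductID,
        SUM(w.ScrappedQty) * 1.0 / NULLIF(SUM(w.OrderQty), 0) AS ScrapRate
    FROM WorkOrder AS w
    GROUP BY w.ProductID
)
SELECT
    r.ProductID,
    r.ScrapRate,
    -- Rank 1 is the worst scrap rate; tied products share a rank
    RANK() OVER (ORDER BY r.ScrapRate DESC) AS ScrapRank
FROM Rates AS r
ORDER BY ScrapRank;
```

On these toy rows product 995 ranks first (0.05), 996 second (0.01), and 994 last (about 0.004). That is the kind of comparison needed to confirm or reject a "50x higher" claim.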