### **Tutorial 13: Advanced Subqueries in PostgreSQL: Working with Sales.Orders Table**

Subqueries in PostgreSQL allow for powerful and flexible queries by embedding one query inside another. In this tutorial, we'll explore advanced subquery techniques using the `Sales.Orders` table.

#### **1. Find Customers with More Than Five Orders**
This query identifies customers who have placed more than five orders:

```sql
SELECT CustomerID
FROM Sales.Orders
GROUP BY CustomerID
HAVING COUNT(OrderID) > 5;
```

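The same customers can be found with a subquery, a form that becomes useful when you also need other columns from the matching rows: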

```sql
SELECT DISTINCT CustomerID
FROM Sales.Orders
WHERE CustomerID IN (
    SELECT CustomerID
    FROM Sales.Orders
    GROUP BY CustomerID
    HAVING COUNT(OrderID) > 5
);
```
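
As a quick check that the two forms agree, the sketch below runs both against a small in-memory SQLite database standing in for PostgreSQL. The rows are hypothetical, and the table holds only the two columns the queries need.

```python
import sqlite3

# Toy stand-in for Sales.Orders (hypothetical rows, SQLite instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute("CREATE TABLE Sales.Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
# Customer 1 places six orders, customer 2 places five, customer 3 places one.
rows = [(i, 1) for i in range(1, 7)] + [(i, 2) for i in range(7, 12)] + [(12, 3)]
conn.executemany("INSERT INTO Sales.Orders VALUES (?, ?)", rows)

having_sql = """
    SELECT CustomerID
    FROM Sales.Orders
    GROUP BY CustomerID
    HAVING COUNT(OrderID) > 5
"""
subquery_sql = """
    SELECT DISTINCT CustomerID
    FROM Sales.Orders
    WHERE CustomerID IN (
        SELECT CustomerID
        FROM Sales.Orders
        GROUP BY CustomerID
        HAVING COUNT(OrderID) > 5
    )
"""
having_result = sorted(r[0] for r in conn.execute(having_sql))
subquery_result = sorted(r[0] for r in conn.execute(subquery_sql))
print(having_result, subquery_result)  # [1] [1]
```

Customer 2 has exactly five orders and is excluded, since the condition is strictly greater than five.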

#### **2. Retrieve Orders with the Longest Processing Time**
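To find the orders with the longest processing time, measured from `OrderDate` to `PickingCompletedWhen`, compare each order's duration with the maximum across the table. Orders that have not been picked yet (`PickingCompletedWhen` is NULL) never match, because `MAX` ignores NULLs and a NULL duration is not equal to anything: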

```sql
SELECT OrderID, (PickingCompletedWhen - OrderDate) AS ProcessingTime
FROM Sales.Orders
WHERE (PickingCompletedWhen - OrderDate) = (
    SELECT MAX(PickingCompletedWhen - OrderDate)
    FROM Sales.Orders
);
```

#### **3. Identify Backorders That Reference Nonexistent Orders**
To check if there are backorders referencing orders that do not exist in the table:

```sql
SELECT OrderID
FROM Sales.Orders
WHERE BackorderOrderID IS NOT NULL
AND BackorderOrderID NOT IN (
    SELECT OrderID FROM Sales.Orders
);
```
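
One caveat with `NOT IN`: if the subquery returns even a single NULL, the predicate is never true and no rows come back. The query above is safe only on the assumption that `OrderID` is a non-null primary key. A `NOT EXISTS` formulation avoids the issue. The sketch below compares both on hypothetical rows, using an in-memory SQLite database as a stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute(
    "CREATE TABLE Sales.Orders (OrderID INTEGER PRIMARY KEY, BackorderOrderID INTEGER)"
)
# Hypothetical rows: order 2 points at existing order 1, order 3 points at missing order 99.
conn.executemany(
    "INSERT INTO Sales.Orders VALUES (?, ?)",
    [(1, None), (2, 1), (3, 99)],
)

not_in_sql = """
    SELECT OrderID
    FROM Sales.Orders
    WHERE BackorderOrderID IS NOT NULL
      AND BackorderOrderID NOT IN (SELECT OrderID FROM Sales.Orders)
"""
not_exists_sql = """
    SELECT o.OrderID
    FROM Sales.Orders AS o
    WHERE o.BackorderOrderID IS NOT NULL
      AND NOT EXISTS (
          SELECT 1 FROM Sales.Orders AS b WHERE b.OrderID = o.BackorderOrderID
      )
"""
dangling_not_in = [r[0] for r in conn.execute(not_in_sql)]
dangling_not_exists = [r[0] for r in conn.execute(not_exists_sql)]
print(dangling_not_in, dangling_not_exists)  # [3] [3]
```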

#### **4. Fetch Orders with the Earliest Expected Delivery Date**
This query retrieves orders with the earliest expected delivery date:

```sql
SELECT *
FROM Sales.Orders
WHERE ExpectedDeliveryDate = (
    SELECT MIN(ExpectedDeliveryDate)
    FROM Sales.Orders
);
```

#### **5. Find Salespersons Who Have Processed Orders for the Most Customers**
To list salespersons who have handled orders for the highest number of distinct customers:

```sql
SELECT SalespersonPersonID, COUNT(DISTINCT CustomerID) AS UniqueCustomers
FROM Sales.Orders
GROUP BY SalespersonPersonID
HAVING COUNT(DISTINCT CustomerID) = (
    SELECT MAX(CustomerCount)
    FROM (
        SELECT SalespersonPersonID, COUNT(DISTINCT CustomerID) AS CustomerCount
        FROM Sales.Orders
        GROUP BY SalespersonPersonID
    ) AS Subquery
);
```
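
To see how the nested subquery handles ties, here is a minimal sketch on hypothetical rows, again using in-memory SQLite as a stand-in for PostgreSQL. The inner query counts distinct customers per salesperson, the middle one takes the maximum of those counts, and the outer `HAVING` keeps every salesperson who reaches it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute(
    "CREATE TABLE Sales.Orders ("
    "OrderID INTEGER PRIMARY KEY, SalespersonPersonID INTEGER, CustomerID INTEGER)"
)
# Hypothetical rows: salespersons 10 and 20 each serve two distinct customers, 30 serves one.
conn.executemany(
    "INSERT INTO Sales.Orders VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 101), (3, 10, 100),
     (4, 20, 100), (5, 20, 102),
     (6, 30, 103), (7, 30, 103)],
)

top_sql = """
    SELECT SalespersonPersonID, COUNT(DISTINCT CustomerID) AS UniqueCustomers
    FROM Sales.Orders
    GROUP BY SalespersonPersonID
    HAVING COUNT(DISTINCT CustomerID) = (
        SELECT MAX(CustomerCount)
        FROM (
            SELECT COUNT(DISTINCT CustomerID) AS CustomerCount
            FROM Sales.Orders
            GROUP BY SalespersonPersonID
        ) AS PerSalesperson
    )
    ORDER BY SalespersonPersonID
"""
top_salespersons = conn.execute(top_sql).fetchall()
print(top_salespersons)  # [(10, 2), (20, 2)]
```

Both tied salespersons are returned, which a plain `ORDER BY ... LIMIT 1` would not do.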

#### **6. Retrieve Orders with No Comments**
To find orders where no comments have been provided:

```sql
SELECT *
FROM Sales.Orders
WHERE OrderID NOT IN (
    SELECT OrderID FROM Sales.Orders
    WHERE Comments IS NOT NULL
);
```
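
Because `OrderID` identifies a single row, the `NOT IN` form above should be equivalent to a direct `Comments IS NULL` filter. The sketch below checks that on hypothetical rows in an in-memory SQLite database (a stand-in for PostgreSQL); it assumes `OrderID` is unique and non-null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute("CREATE TABLE Sales.Orders (OrderID INTEGER PRIMARY KEY, Comments TEXT)")
# Hypothetical rows: only order 2 carries a comment.
conn.executemany(
    "INSERT INTO Sales.Orders VALUES (?, ?)",
    [(1, None), (2, "Leave at the front desk"), (3, None)],
)

not_in_sql = """
    SELECT OrderID
    FROM Sales.Orders
    WHERE OrderID NOT IN (
        SELECT OrderID FROM Sales.Orders
        WHERE Comments IS NOT NULL
    )
    ORDER BY OrderID
"""
is_null_sql = """
    SELECT OrderID
    FROM Sales.Orders
    WHERE Comments IS NULL
    ORDER BY OrderID
"""
no_comment_not_in = [r[0] for r in conn.execute(not_in_sql)]
no_comment_is_null = [r[0] for r in conn.execute(is_null_sql)]
print(no_comment_not_in, no_comment_is_null)  # [1, 3] [1, 3]
```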

#### **Conclusion**
Subqueries are a powerful tool in PostgreSQL for retrieving specific insights from your data. They let you express multi-step questions, such as filtering rows by an aggregate computed over the same table, in a single statement. Experiment with these examples and adapt them to your business needs!



### **Business Problem - 1**

The sales team wants to understand customer ordering patterns to identify the most frequent buyers.  

The analytics team needs to determine which customers have placed **more than five orders** to help in targeted marketing and loyalty programs.  

#### **Task**  
Write a query that returns the data for the analytics team. Your output should include **`CustomerID` and `OrderCount`** (number of orders placed).  

##### **Hints:**  
- Use `COUNT(OrderID)` to determine the number of orders per customer.  
- Filter the results to include only customers with **more than five orders**.  
- Group by `CustomerID` to aggregate order counts properly.

---

```python
import os

import pyodbc
from dotenv import load_dotenv

load_dotenv()  # Load environment variables from the .env file

server = os.getenv("DB_SERVER")
database = os.getenv("DB_NAME")
username = os.getenv("DB_USER")
password = os.getenv("DB_PASSWORD")
driver = '{ODBC Driver 18 for SQL Server}'  # Ensure the driver matches your installation

try:
    # Add TrustServerCertificate and ENCRYPT options to the connection string
    conn = pyodbc.connect(
        f'DRIVER={driver};SERVER={server};DATABASE={database};UID={username};PWD={password};ENCRYPT=yes;TrustServerCertificate=yes'
    )
    cursor = conn.cursor()
    print("Connection successful!")
except pyodbc.Error as e:
    print(f"Error: {e}")
    raise  # Later cells need `conn` and `cursor`, so stop here on failure
```

```
Connection successful!
```


```python
import pandas as pd

query = """
    SELECT
        CustomerID,
        COUNT(OrderID) AS OrderCount
    FROM Sales.Orders
    GROUP BY CustomerID
    HAVING COUNT(OrderID) > 5
    ORDER BY COUNT(OrderID) DESC;
    """

cursor.execute(query)
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()
df = pd.DataFrame.from_records(rows, columns=column_names)
df
```


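|     | CustomerID | OrderCount |
|-----|------------|------------|
| 0   | 90         | 150        |
| 1   | 831        | 147        |
| 2   | 968        | 146        |
| 3   | 804        | 145        |
| 4   | 405        | 145        |
| ... | ...        | ...        |
| 657 | 1057       | 19         |
| 658 | 1058       | 14         |
| 659 | 1056       | 13         |
| 660 | 1061       | 9          |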


### **Tutorial 13: Advanced Window Functions in PostgreSQL**

Window functions in PostgreSQL allow calculations across a set of table rows related to the current row without collapsing the result set. They are useful for ranking, running totals, moving averages, and more.

#### **1. Understanding Window Functions**
A window function operates over a subset of rows (a "window") defined by the `OVER()` clause. Unlike aggregate functions, window functions do not group rows into a single result.

#### **2. Common Window Functions**
##### **2.1 Ranking Functions**
- `RANK()`: Assigns the same rank to rows with equal values and leaves gaps after ties (for example 1, 1, 3).
- `DENSE_RANK()`: Similar to `RANK()`, but without gaps in ranking.
- `ROW_NUMBER()`: Assigns a unique sequential number to each row.

**Example: Rank Orders by Sales Amount**
```sql
SELECT OrderID, SalespersonPersonID, TotalSales,
       RANK() OVER (PARTITION BY SalespersonPersonID ORDER BY TotalSales DESC) AS Rank
FROM Sales.Orders;
```
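
The difference between the three ranking functions only shows up when values tie. The sketch below runs them side by side on hypothetical rows in an in-memory SQLite database (a stand-in for PostgreSQL, and assuming SQLite 3.25 or later for window-function support). The `TotalSales` column is part of the toy table here. `ROW_NUMBER()` gets `OrderID` as a tie-breaker so its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute(
    "CREATE TABLE Sales.Orders ("
    "OrderID INTEGER PRIMARY KEY, SalespersonPersonID INTEGER, TotalSales INTEGER)"
)
# Hypothetical rows: orders 1 and 2 tie on TotalSales for the same salesperson.
conn.executemany(
    "INSERT INTO Sales.Orders VALUES (?, ?, ?)",
    [(1, 7, 500), (2, 7, 500), (3, 7, 300), (4, 7, 200)],
)

ranking_sql = """
    SELECT OrderID,
           RANK()       OVER (PARTITION BY SalespersonPersonID ORDER BY TotalSales DESC) AS Rnk,
           DENSE_RANK() OVER (PARTITION BY SalespersonPersonID ORDER BY TotalSales DESC) AS DenseRnk,
           ROW_NUMBER() OVER (PARTITION BY SalespersonPersonID
                              ORDER BY TotalSales DESC, OrderID) AS RowNum
    FROM Sales.Orders
    ORDER BY OrderID
"""
ranking_rows = conn.execute(ranking_sql).fetchall()
for row in ranking_rows:
    print(row)
# (1, 1, 1, 1)
# (2, 1, 1, 2)
# (3, 3, 2, 3)
# (4, 4, 3, 4)
```

After the tie, `RANK()` jumps to 3 while `DENSE_RANK()` continues with 2.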

##### **2.2 Running Total (Cumulative Sum)**
Combining `SUM()` with an `ORDER BY` inside `OVER()` produces a cumulative total: each row's value sums all rows from the start of its partition through the current row.

**Example: Cumulative Sales Per Salesperson**
```sql
SELECT OrderID, SalespersonPersonID, TotalSales,
       SUM(TotalSales) OVER (PARTITION BY SalespersonPersonID ORDER BY OrderDate) AS CumulativeSales
FROM Sales.Orders;
```
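
When several orders share an `OrderDate`, the default window frame (`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`) treats them as peers and gives them all the same running total. An explicit `ROWS` frame with a tie-breaking sort key accumulates strictly row by row instead. The sketch below contrasts the two on hypothetical rows, using in-memory SQLite (3.25 or later) as a stand-in for PostgreSQL; `TotalSales` is a column of the toy table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute(
    "CREATE TABLE Sales.Orders (OrderID INTEGER PRIMARY KEY, "
    "SalespersonPersonID INTEGER, OrderDate TEXT, TotalSales INTEGER)"
)
# Hypothetical rows: orders 2 and 3 fall on the same date.
conn.executemany(
    "INSERT INTO Sales.Orders VALUES (?, ?, ?, ?)",
    [(1, 7, "2024-01-01", 100), (2, 7, "2024-01-02", 50),
     (3, 7, "2024-01-02", 25), (4, 7, "2024-01-03", 10)],
)

running_sql = """
    SELECT OrderID,
           SUM(TotalSales) OVER (PARTITION BY SalespersonPersonID
                                 ORDER BY OrderDate) AS RangeTotal,
           SUM(TotalSales) OVER (PARTITION BY SalespersonPersonID
                                 ORDER BY OrderDate, OrderID
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RowsTotal
    FROM Sales.Orders
    ORDER BY OrderID
"""
running_rows = conn.execute(running_sql).fetchall()
for row in running_rows:
    print(row)
# (1, 100, 100)
# (2, 175, 150)
# (3, 175, 175)
# (4, 185, 185)
```

Which behavior is wanted depends on the report: per-date totals favor the default frame, per-order totals favor the `ROWS` frame.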

##### **2.3 Moving Averages**
A moving average helps smooth trends in sales or performance metrics.

**Example: 3-Order Moving Average of Sales**
```sql
SELECT OrderID, SalespersonPersonID, TotalSales,
       AVG(TotalSales) OVER (PARTITION BY SalespersonPersonID ORDER BY OrderDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg
FROM Sales.Orders;
```
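
A small worked example makes the frame concrete. With `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`, the first two rows of each partition average over fewer than three orders, because no earlier rows exist. The sketch uses hypothetical rows in in-memory SQLite (3.25 or later) as a stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute(
    "CREATE TABLE Sales.Orders (OrderID INTEGER PRIMARY KEY, "
    "SalespersonPersonID INTEGER, OrderDate TEXT, TotalSales INTEGER)"
)
# Hypothetical rows: one salesperson, four orders on consecutive days.
conn.executemany(
    "INSERT INTO Sales.Orders VALUES (?, ?, ?, ?)",
    [(1, 7, "2024-01-01", 10), (2, 7, "2024-01-02", 20),
     (3, 7, "2024-01-03", 30), (4, 7, "2024-01-04", 40)],
)

moving_avg_sql = """
    SELECT OrderID,
           AVG(TotalSales) OVER (PARTITION BY SalespersonPersonID
                                 ORDER BY OrderDate
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg
    FROM Sales.Orders
    ORDER BY OrderID
"""
moving_avg_rows = conn.execute(moving_avg_sql).fetchall()
print(moving_avg_rows)
# [(1, 10.0), (2, 15.0), (3, 20.0), (4, 30.0)]
```

Order 4 averages orders 2 through 4 (20, 30, 40); order 1 has dropped out of the window.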

##### **2.4 Lead and Lag Analysis**
- `LEAD()`: Accesses the next row’s value.
- `LAG()`: Accesses the previous row’s value.

**Example: Compare Each Order with the Previous and Next Order**
```sql
SELECT OrderID, SalespersonPersonID, TotalSales,
       LAG(TotalSales, 1, 0) OVER (PARTITION BY SalespersonPersonID ORDER BY OrderDate) AS PreviousOrderSales,
       LEAD(TotalSales, 1, 0) OVER (PARTITION BY SalespersonPersonID ORDER BY OrderDate) AS NextOrderSales
FROM Sales.Orders;
```
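
The third argument to `LAG()` and `LEAD()` is the value returned when no previous or next row exists in the partition. The sketch below shows where that default of `0` appears, using hypothetical rows in in-memory SQLite (3.25 or later) as a stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS Sales")
conn.execute(
    "CREATE TABLE Sales.Orders (OrderID INTEGER PRIMARY KEY, "
    "SalespersonPersonID INTEGER, OrderDate TEXT, TotalSales INTEGER)"
)
# Hypothetical rows: one salesperson, three orders on consecutive days.
conn.executemany(
    "INSERT INTO Sales.Orders VALUES (?, ?, ?, ?)",
    [(1, 7, "2024-01-01", 100), (2, 7, "2024-01-02", 150), (3, 7, "2024-01-03", 120)],
)

lag_lead_sql = """
    SELECT OrderID, TotalSales,
           LAG(TotalSales, 1, 0)  OVER (PARTITION BY SalespersonPersonID
                                        ORDER BY OrderDate) AS PreviousOrderSales,
           LEAD(TotalSales, 1, 0) OVER (PARTITION BY SalespersonPersonID
                                        ORDER BY OrderDate) AS NextOrderSales
    FROM Sales.Orders
    ORDER BY OrderID
"""
lag_lead_rows = conn.execute(lag_lead_sql).fetchall()
for row in lag_lead_rows:
    print(row)
# (1, 100, 0, 150)
# (2, 150, 100, 120)
# (3, 120, 150, 0)
```

A default of `0` can be mistaken for a real zero-value order. Omitting the third argument returns NULL at the partition edges instead.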

#### **3. Conclusion**
Window functions provide powerful ways to analyze data trends, rank results, and calculate moving metrics without losing row-level details. By mastering these functions, you can perform advanced analytics directly within PostgreSQL.



### **Business Problem - 2**

The finance team wants to analyze order trends over time to identify top-performing salespersons.

The analytics team needs to calculate **cumulative order count per salesperson** over time to track their performance and identify trends.

#### **Task**  
Write a query that returns the data for the analytics team. Your output should include **`SalespersonPersonID`**, **`OrderDate`**, and **`CumulativeOrders`** (running total of orders per salesperson).

##### **Hints:**  
- Use the **`COUNT(*)`** window function to calculate cumulative orders.
- Partition by **`SalespersonPersonID`** to track orders separately for each salesperson.
- Order the results by **`OrderDate`** to ensure the cumulative calculation is sequential.

---

```python
import pandas as pd

query = """
    SELECT
        SalespersonPersonID,
        OrderDate,
        COUNT(*) OVER (PARTITION BY SalespersonPersonID ORDER BY OrderDate) AS CumulativeOrders
    FROM
        Sales.Orders
    ORDER BY
        SalespersonPersonID,
        OrderDate;
    """

cursor.execute(query)
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()
df = pd.DataFrame.from_records(rows, columns=column_names)

# Release the database resources once the results are in the DataFrame
cursor.close()
conn.close()

df
```