# 🧠 Leetcode 183 — Customers Who Never Order (Databricks Edition)

---

## 📘 Problem Statement

### Table: Customers

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| name        | varchar |

- `id` is the primary key.
- Each row indicates the ID and name of a customer.

---

### Table: Orders

| Column Name | Type |
|-------------|------|
| id          | int  |
| customerId  | int  |

- `id` is the primary key.
- `customerId` is a foreign key referencing `Customers.id`.
- Each row indicates the ID of an order and the ID of the customer who placed it.

---

## 🎯 Objective

Write a query to find all customers who **never order anything**.

Return the result table in any order.

---

## 🧾 Example

### Input

**Customers Table**

| id | name  |
|----|-------|
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |

**Orders Table**

| id | customerId |
|----|------------|
| 1  | 3          |
| 2  | 1          |

### Output

| Customers |
|-----------|
| Henry     |
| Max       |

### Explanation

- Joe (id 1) and Sam (id 3) each appear as a `customerId` in the `Orders` table.
- Henry (id 2) and Max (id 4) have no matching rows in `Orders`, so they are returned as customers who never placed an order.
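
The matching logic can be sketched in plain Python before involving Spark. This is only an illustration of the anti-join idea on the example rows above; the variable names (`customers`, `orders`, `ordered_ids`, `never_ordered`) are assumptions made for this sketch and are not part of the notebook.

```python
# Example rows from the tables above (plain tuples, no Spark needed)
customers = [(1, "Joe"), (2, "Henry"), (3, "Sam"), (4, "Max")]
orders = [(1, 3), (2, 1)]  # (order id, customerId)

# IDs of customers that appear at least once in Orders
ordered_ids = {customer_id for _, customer_id in orders}

# Keep only customers whose id has no match in Orders
never_ordered = [name for cust_id, name in customers if cust_id not in ordered_ids]

print(never_ordered)  # ['Henry', 'Max']
```

Building a set of ordered IDs first keeps each membership check cheap, which mirrors what a join-based solution does at scale.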

---

## 🧱 PySpark DataFrame Creation

```python
from pyspark.sql import Row

# Sample data
customers_data = [
    Row(id=1, name="Joe"),
    Row(id=2, name="Henry"),
    Row(id=3, name="Sam"),
    Row(id=4, name="Max")
]

orders_data = [
    Row(id=1, customerId=3),
    Row(id=2, customerId=1)
]

# Create DataFrames
customers_df = spark.createDataFrame(customers_data)
orders_df = spark.createDataFrame(orders_data)

# Register temp views
customers_df.createOrReplaceTempView("Customers")
orders_df.createOrReplaceTempView("Orders")
```

---

## ✅ SQL Solution

```sql
SELECT name AS Customers
FROM Customers
WHERE id NOT IN (
    SELECT customerId FROM Orders
);
```
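
One caveat with `NOT IN`: under standard SQL three-valued logic, if the subquery returns any NULL, the predicate is never true and the query returns no rows. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL (an assumption made only so the example runs without a cluster) and adds a hypothetical third order with a NULL `customerId`, which is not part of the example data. It contrasts `NOT IN` with a `NOT EXISTS` rewrite, which is not affected by the NULL.

```python
import sqlite3

# In-memory database standing in for the Customers and Orders views
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, customerId INTEGER);
    INSERT INTO Customers VALUES (1, 'Joe'), (2, 'Henry'), (3, 'Sam'), (4, 'Max');
    -- Hypothetical third order with a NULL customerId to trigger the pitfall
    INSERT INTO Orders VALUES (1, 3), (2, 1), (3, NULL);
""")

# NOT IN: the NULL in the subquery makes the predicate unknown for every non-matching row
not_in_rows = conn.execute("""
    SELECT name AS Customers FROM Customers
    WHERE id NOT IN (SELECT customerId FROM Orders)
    ORDER BY id
""").fetchall()

# NOT EXISTS: the correlated equality never matches a NULL, so unmatched customers survive
not_exists_rows = conn.execute("""
    SELECT c.name AS Customers FROM Customers c
    WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE o.customerId = c.id)
    ORDER BY c.id
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [('Henry',), ('Max',)]
conn.close()
```

With the example data as given, `customerId` contains no NULLs, so the `NOT IN` query above returns Henry and Max as expected. The `NOT EXISTS` form is simply the more defensive choice when that guarantee is missing.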

---

## 🧪 PySpark Solution

```python
from pyspark.sql.functions import col

result_df = customers_df.join(
    orders_df,
    customers_df["id"] == orders_df["customerId"],
    how="left_anti"
).select(
    col("name").alias("Customers")
)

result_df.show()
```

---

📘 *This notebook is part of DataGym’s SQL-to-PySpark transition series.*


In [0]:
# Step 1: Sample data
customers_data = [
    (1, "Joe"),
    (2, "Henry"),
    (3, "Sam"),
    (4, "Max")
]

orders_data = [
    (1, 3),
    (2, 1)
]

# Step 2: Define schemas
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

customers_schema = StructType([
    StructField("id", IntegerType(), nullable=False),
    StructField("name", StringType(), nullable=False)
])

orders_schema = StructType([
    StructField("id", IntegerType(), nullable=False),
    StructField("customerId", IntegerType(), nullable=False)
])

# Step 3: Create DataFrames
customers_df = spark.createDataFrame(customers_data, customers_schema)
orders_df = spark.createDataFrame(orders_data, orders_schema)


In [0]:
customers_df.display()
orders_df.display()

In [0]:
# Left join keeps every customer; rows where the order id is NULL had no matching order
customers_df.join(orders_df, customers_df.id == orders_df.customerId, "left") \
    .filter(orders_df.id.isNull()) \
    .select(customers_df.name.alias("Customers")) \
    .display()

In [0]:

# Step 4: Create temp views
customers_df.createOrReplaceTempView("Customers")
orders_df.createOrReplaceTempView("Orders")

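# Step 5: SQL query to find customers who never ordered.
# NOT IN is safe here because customerId is declared non-nullable in the schema;
# a NULL in the subquery would make the query return no rows.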
result = spark.sql("""
    SELECT name AS Customers
    FROM Customers
    WHERE id NOT IN (
        SELECT customerId FROM Orders
    )
""")

# Step 6: Show result
result.display()

In [0]:
# Step 1: Collect the distinct IDs of customers who placed orders.
# Note: collect() pulls every ID to the driver, so this approach only suits small tables;
# prefer the left_anti join for large data.
ordered_customer_ids = [
    row.customerId for row in orders_df.select("customerId").distinct().collect()
]

# Step 2: Keep customers whose ID is NOT in the collected list
customers_without_orders = customers_df.filter(~customers_df.id.isin(ordered_customer_ids))

# Step 3: Rename column and display
customers_without_orders.selectExpr("name AS Customers").display()