# 🧠 Leetcode 584 — Find Customers Who Are Not Referred by Customer 2 (Databricks Edition)

---

## 📘 Problem Statement

### Table: Customer

| Column Name | Type    |
|-------------|---------|
| id          | int     |
| name        | varchar |
| referee_id  | int     |

- `id` is the primary key.
- Each row indicates the ID of a customer, their name, and the ID of the customer who referred them.

---

## 🎯 Objective

Find the names of customers who are either:

- Referred by a customer whose `id` is not 2 (`referee_id != 2`), or
- Not referred by anyone (`referee_id IS NULL`)

Return the result table in any order.

---

## 🧾 Example

### Input

**Customer Table**

| id | name | referee_id |
|----|------|------------|
| 1  | Will | null       |
| 2  | Jane | null       |
| 3  | Alex | 2          |
| 4  | Bill | null       |
| 5  | Zack | 1          |
| 6  | Mark | 2          |

### Output

| name |
|------|
| Will |
| Jane |
| Bill |
| Zack |
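
As a quick sanity check on the expected output, the selection rule can be sketched in plain Python before moving to Spark. This is only an illustration: the `customers` list and the helper `not_referred_by_2` are hypothetical names, and `None` stands in for SQL `NULL`.

```python
# Example rows from the Customer table: (id, name, referee_id).
# None stands in for SQL NULL.
customers = [
    (1, "Will", None),
    (2, "Jane", None),
    (3, "Alex", 2),
    (4, "Bill", None),
    (5, "Zack", 1),
    (6, "Mark", 2),
]


def not_referred_by_2(referee_id):
    """Keep a customer with no referrer, or with a referrer other than customer 2."""
    return referee_id is None or referee_id != 2


expected_names = [name for _id, name, referee_id in customers
                  if not_referred_by_2(referee_id)]
print(expected_names)  # ['Will', 'Jane', 'Bill', 'Zack']
```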

---

## 🧱 PySpark DataFrame Creation

```python
from pyspark.sql import Row

# Sample data
customer_data = [
    Row(id=1, name="Will", referee_id=None),
    Row(id=2, name="Jane", referee_id=None),
    Row(id=3, name="Alex", referee_id=2),
    Row(id=4, name="Bill", referee_id=None),
    Row(id=5, name="Zack", referee_id=1),
    Row(id=6, name="Mark", referee_id=2)
]

# Create DataFrame
customer_df = spark.createDataFrame(customer_data)

# Register temp view
customer_df.createOrReplaceTempView("Customer")
```

---

## ✅ SQL Solution

```sql
SELECT name
FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;
```
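
The `OR referee_id IS NULL` branch matters because comparing `NULL` with `!=` yields `NULL` rather than true, so those rows are dropped by `WHERE`. The sketch below shows the difference using Python's built-in `sqlite3` module as a stand-in for Spark SQL; this is an assumption made so the example runs without a cluster, and it relies on both engines following standard three-valued logic for this comparison. The helper `names` is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, referee_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [
        (1, "Will", None),
        (2, "Jane", None),
        (3, "Alex", 2),
        (4, "Bill", None),
        (5, "Zack", 1),
        (6, "Mark", 2),
    ],
)


def names(where_clause):
    """Return customer names matching the given WHERE clause, ordered by id."""
    query = f"SELECT name FROM Customer WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


# NULL != 2 evaluates to NULL, so customers without a referrer are filtered out.
without_null_check = names("referee_id != 2")
# Adding the explicit IS NULL branch brings them back.
with_null_check = names("referee_id != 2 OR referee_id IS NULL")

print(without_null_check)  # ['Zack']
print(with_null_check)     # ['Will', 'Jane', 'Bill', 'Zack']
```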

---

## 🧪 PySpark Solution

```python
from pyspark.sql.functions import col

result_df = customer_df.filter(
    (col("referee_id") != 2) | col("referee_id").isNull()
).select("name")

result_df.show()
```

---

📘 *This notebook is part of DataGym’s SQL-to-PySpark transition series.*


```python
# `spark` is predefined in Databricks notebooks, so no SparkSession import is needed.
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import col  # used by the DataFrame filter in the next cell

# 1️⃣ Sample Data
customer_data = [
    (1, "Will", None),
    (2, "Jane", None),
    (3, "Alex", 2),
    (4, "Bill", None),
    (5, "Zack", 1),
    (6, "Mark", 2)
]

# 2️⃣ Schema Definition
customer_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("referee_id", IntegerType(), True)
])

# 3️⃣ Create DataFrame
customer_df = spark.createDataFrame(customer_data, schema=customer_schema)

# 4️⃣ Register Temp View
customer_df.createOrReplaceTempView("Customer")

# 5️⃣ SQL Query: Filter by referral logic
result = spark.sql("""
    SELECT name
    FROM Customer
    WHERE referee_id != 2 OR referee_id IS NULL
""")

# 6️⃣ Show Result
result.show()
```

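```python
# Same filter through the DataFrame API.
# display() is the Databricks notebook renderer; use show() outside Databricks.
customer_df.filter(
    (col("referee_id") != 2) | col("referee_id").isNull()
).select("name").display()
```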