# 🧠 LeetCode 577: Employee Bonus (Databricks Edition)

---

## 📘 Problem Statement

### Table: Employee

| Column Name | Type    |
|-------------|---------|
| empId       | int     |
| name        | varchar |
| supervisor  | int     |
| salary      | int     |

- `empId` is the column with unique values.
- Each row indicates the name and ID of an employee, their salary, and the ID of their manager (the `supervisor` column).

---

### Table: Bonus

| Column Name | Type |
|-------------|------|
| empId       | int  |
| bonus       | int  |

- `empId` is the column of unique values.
- `empId` is a foreign key referencing `Employee.empId`.
- Each row contains the ID of an employee and their respective bonus.

---

## 🎯 Objective

Write a query to report the `name` and `bonus` amount of each employee with a bonus **less than 1000**.  
If an employee has no bonus record, return `null` for their bonus.

Return the result table in any order.

---

## 🧾 Example

### Input

**Employee Table**

| empId | name   | supervisor | salary |
|-------|--------|------------|--------|
| 3     | Brad   | null       | 4000   |
| 1     | John   | 3          | 1000   |
| 2     | Dan    | 3          | 2000   |
| 4     | Thomas | 3          | 4000   |

**Bonus Table**

| empId | bonus |
|-------|-------|
| 2     | 500   |
| 4     | 2000  |

### Output

| name  | bonus |
|-------|-------|
| Brad  | null  |
| John  | null  |
| Dan   | 500   |
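
As a sanity check on the expected output, the rule from the Objective can be sketched in plain Python before touching Spark. This is only an illustrative reference, not part of the original notebook: the helper name `employees_with_small_bonus` and its `threshold` parameter are assumptions introduced here, and `None` stands in for SQL `null`.

```python
# Sample rows copied from the example tables above.
employees = [
    {"empId": 3, "name": "Brad", "supervisor": None, "salary": 4000},
    {"empId": 1, "name": "John", "supervisor": 3, "salary": 1000},
    {"empId": 2, "name": "Dan", "supervisor": 3, "salary": 2000},
    {"empId": 4, "name": "Thomas", "supervisor": 3, "salary": 4000},
]
bonuses = [
    {"empId": 2, "bonus": 500},
    {"empId": 4, "bonus": 2000},
]


def employees_with_small_bonus(employees, bonuses, threshold=1000):
    """Hypothetical helper: return (name, bonus) pairs, with bonus=None when no Bonus row exists."""
    # empId is unique in Bonus, so a dict lookup behaves like the left join.
    bonus_by_emp = {b["empId"]: b["bonus"] for b in bonuses}
    result = []
    for e in employees:
        bonus = bonus_by_emp.get(e["empId"])  # None plays the role of SQL NULL
        if bonus is None or bonus < threshold:
            result.append((e["name"], bonus))
    return result


expected_rows = employees_with_small_bonus(employees, bonuses)
print(expected_rows)
```

Thomas is the only employee left out, because his bonus of 2000 is not below 1000; Brad and John are kept with a missing bonus.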

---

## 🧱 PySpark DataFrame Creation

```python
from pyspark.sql import Row

# Sample data
employee_data = [
    Row(empId=3, name="Brad", supervisor=None, salary=4000),
    Row(empId=1, name="John", supervisor=3, salary=1000),
    Row(empId=2, name="Dan", supervisor=3, salary=2000),
    Row(empId=4, name="Thomas", supervisor=3, salary=4000)
]

bonus_data = [
    Row(empId=2, bonus=500),
    Row(empId=4, bonus=2000)
]

# Create DataFrames (`spark` is the SparkSession that Databricks notebooks predefine)
employee_df = spark.createDataFrame(employee_data)
bonus_df = spark.createDataFrame(bonus_data)

# Register temp views
employee_df.createOrReplaceTempView("Employee")
bonus_df.createOrReplaceTempView("Bonus")
```

---

## ✅ SQL Solution

```sql
SELECT e.name, b.bonus
FROM Employee e
LEFT JOIN Bonus b ON e.empId = b.empId
WHERE b.bonus < 1000 OR b.bonus IS NULL;
```
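
The `OR b.bonus IS NULL` branch is what keeps employees without a bonus row: after the left join their `bonus` is `NULL`, and `NULL < 1000` evaluates to `NULL` rather than true, so the `WHERE` clause would drop them. The sketch below illustrates this with Python's built-in `sqlite3` module so it can run without a cluster. SQLite is a stand-in chosen for this illustration, not part of the original notebook; Spark SQL applies the same three-valued logic to `NULL` comparisons.

```python
import sqlite3

# In-memory database loaded with the sample rows from the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (empId INTEGER PRIMARY KEY, name TEXT, supervisor INTEGER, salary INTEGER);
    CREATE TABLE Bonus (empId INTEGER PRIMARY KEY, bonus INTEGER);
    INSERT INTO Employee VALUES
        (3, 'Brad', NULL, 4000), (1, 'John', 3, 1000),
        (2, 'Dan', 3, 2000), (4, 'Thomas', 3, 4000);
    INSERT INTO Bonus VALUES (2, 500), (4, 2000);
""")

base_query = """
    SELECT e.name, b.bonus
    FROM Employee e
    LEFT JOIN Bonus b ON e.empId = b.empId
    WHERE {predicate}
"""

# Without the NULL check, employees that have no Bonus row are filtered out.
without_null_check = conn.execute(
    base_query.format(predicate="b.bonus < 1000")
).fetchall()

# With the NULL check, they are kept and reported with a NULL bonus.
with_null_check = conn.execute(
    base_query.format(predicate="b.bonus < 1000 OR b.bonus IS NULL")
).fetchall()

print("without IS NULL:", without_null_check)
print("with IS NULL:   ", with_null_check)
conn.close()
```

Only Dan survives the first query, while the second also returns Brad and John with a missing bonus, matching the expected output.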

---

## 🧪 PySpark Solution

```python
from pyspark.sql.functions import col

result_df = employee_df.join(
    bonus_df,
    on="empId",
    how="left"
).filter(
    (col("bonus") < 1000) | col("bonus").isNull()
).select(
    col("name"), col("bonus")
)

result_df.show()
```

---

📘 *This notebook is part of DataGym’s SQL-to-PySpark transition series.*


```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import col

# Reuse the notebook's session on Databricks; create one when running elsewhere
spark = SparkSession.builder.getOrCreate()

# 1️⃣ Sample Data: Employee
employee_data = [
    (3, "Brad", None, 4000),
    (1, "John", 3, 1000),
    (2, "Dan", 3, 2000),
    (4, "Thomas", 3, 4000)
]

# Explicit schema so the columns are IntegerType instead of the inferred LongType
employee_schema = StructType([
    StructField("empId", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("supervisor", IntegerType(), True),
    StructField("salary", IntegerType(), True)
])

employee_df = spark.createDataFrame(employee_data, schema=employee_schema)
employee_df.createOrReplaceTempView("Employee")

# 2️⃣ Sample Data: Bonus
bonus_data = [
    (2, 500),
    (4, 2000)
]

bonus_schema = StructType([
    StructField("empId", IntegerType(), True),
    StructField("bonus", IntegerType(), True)
])

bonus_df = spark.createDataFrame(bonus_data, schema=bonus_schema)
bonus_df.createOrReplaceTempView("Bonus")

# 3️⃣ SQL Query: Employees with bonus < 1000 or no bonus
result = spark.sql("""
    SELECT e.name, b.bonus
    FROM Employee e
    LEFT JOIN Bonus b ON e.empId = b.empId
    WHERE b.bonus < 1000 OR b.bonus IS NULL
""")

# 4️⃣ Show Result
result.show()
```

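```python
from pyspark.sql.functions import col

# Rename the join keys so the joined DataFrame has no ambiguous `empId` column
employee_df_j = employee_df.selectExpr("empId AS e_empID", "name", "salary", "supervisor")
bonus_df_j = bonus_df.selectExpr("empId AS b_empID", "bonus")

(
    employee_df_j.join(bonus_df_j, col("e_empID") == col("b_empID"), "left")
    .filter((col("bonus") < 1000) | col("bonus").isNull())
    .select("name", "bonus")
    .display()  # Databricks notebooks only; use .show() elsewhere
)
```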