In [0]:
%sql
/*
1873. Calculate Special Bonus
Table: Employees

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| employee_id | int     |
| name        | varchar |
| salary      | int     |
+-------------+---------+
employee_id is the primary key (column with unique values) for this table.
Each row of this table indicates the employee ID, employee name, and salary.
 

Write a solution to calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee's name does not start with the character 'M'. The bonus of an employee is 0 otherwise.

Return the result table ordered by employee_id.

The result format is in the following example.

 

Example 1:

Input: 
Employees table:
+-------------+---------+--------+
| employee_id | name    | salary |
+-------------+---------+--------+
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |
+-------------+---------+--------+
Output: 
+-------------+-------+
| employee_id | bonus |
+-------------+-------+
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |
+-------------+-------+
Explanation: 
The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.
The employee with ID 3 gets 0 bonus because their name starts with 'M'.
The rest of the employees get a 100% bonus.*/

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Sample data
data = [
    (2, "Meir", 3000),
    (3, "Michael", 3800),
    (7, "Addilyn", 7400),
    (8, "Juan", 6100),
    (9, "Kannon", 7700)
]

# Define schema
schema = StructType([
    StructField("employee_id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True)
])

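# Create DataFrame. Databricks predefines `spark`; getOrCreate() reuses that
# session and also lets the cell run outside Databricks.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data, schema)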

# Register as SQL view
df.createOrReplaceTempView("Employees")

In [0]:
%sql
-- SQL: Calculate Special Bonus
SELECT 
    employee_id,
    CASE 
        WHEN employee_id % 2 = 1 AND LEFT(name, 1) != 'M' THEN salary
        ELSE 0
    END AS bonus
FROM Employees
ORDER BY employee_id;

In [0]:
from pyspark.sql.functions import col, when

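# Bonus equals salary when the ID is odd and the name does not start with 'M'; otherwise 0
is_eligible = (col("employee_id") % 2 == 1) & (col("name").substr(1, 1) != "M")

df_bonus = (
    df.withColumn("bonus", when(is_eligible, col("salary")).otherwise(0))
    .select("employee_id", "bonus")
    .orderBy("employee_id")
)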

df_bonus.show()

If you run this code **twice** in a Databricks notebook, here’s exactly what happens:

---

### ✅ **PySpark DataFrame Creation**
Each time you run the cell:
- A new DataFrame `df` is created with the same data and schema.
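- The name `df` is rebound to the new DataFrame. Nothing conflicts, because the previous object is simply no longer referenced by that name; DataFrames derived earlier (such as `df_bonus`) still point at the old one until their cells are rerun.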
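
The rebinding behaviour can be illustrated without Spark. The sketch below is only an analogy: `FakeFrame` and `select_ids` are hypothetical stand-ins written for this example, not Spark APIs, and real DataFrames are lazy plans rather than lists. It shows that rerunning the setup cell points `df` at a new object, while anything derived earlier keeps its link to the old one.

```python
# Plain-Python stand-in for a DataFrame (hypothetical class, not a Spark API)
class FakeFrame:
    def __init__(self, rows, parent=None):
        self.rows = list(rows)   # copy, so later changes to `data` do not leak in
        self.parent = parent     # the frame this one was derived from

    def select_ids(self):
        # Derive a new frame that remembers its source, loosely like a query plan
        return FakeFrame([row[0] for row in self.rows], parent=self)


data = [(2, "Meir", 3000), (3, "Michael", 3800)]

df = FakeFrame(data)             # first run of the setup cell
ids = df.select_ids()            # result derived in a later cell
first_df = df                    # keep a handle on the first object for comparison

data.append((7, "Addilyn", 7400))
df = FakeFrame(data)             # second run: the name `df` is rebound

print(df is first_df)            # False: `df` now names a different object
print(ids.parent is first_df)    # True: the derived result still uses the old one
print(len(ids.rows), len(df.rows))  # 2 3
```

In notebook terms, this is why cells that build on `df` should be rerun after the setup cell changes.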

---

### ✅ **SQL View Registration**
The line:
```python
df.createOrReplaceTempView("Employees")
```
means:
- If a temp view named `"Employees"` already exists, it will be **replaced** with the new one.
- So running it twice simply **refreshes** the view with the same data.
- No error, no duplication, and no side effects: the view is simply overwritten.

---

### 🧠 Why This Is Useful
- You can rerun setup cells anytime to reset your data state.
- It’s safe and predictable, especially when iterating on logic or debugging.

---

If you ever want to simulate changes (e.g., update salary, add rows), you can modify the `data` list before rerunning.
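
When experimenting with a modified `data` list, it can help to have the expected result computed independently of Spark. The following is a minimal sketch in plain Python that applies the rule from the problem statement; the function name `expected_bonus` is hypothetical and is not part of the notebook above.

```python
def expected_bonus(rows):
    """Return (employee_id, bonus) pairs ordered by employee_id.

    Rule from the problem statement: the bonus is the full salary when the
    ID is odd and the name does not start with 'M'; otherwise it is 0.
    """
    result = []
    for employee_id, name, salary in rows:
        qualifies = employee_id % 2 == 1 and not name.startswith("M")
        result.append((employee_id, salary if qualifies else 0))
    return sorted(result, key=lambda pair: pair[0])


# Same sample rows as the setup cell
data = [
    (2, "Meir", 3000),
    (3, "Michael", 3800),
    (7, "Addilyn", 7400),
    (8, "Juan", 6100),
    (9, "Kannon", 7700),
]

print(expected_bonus(data))
# [(2, 0), (3, 0), (7, 7400), (8, 0), (9, 7700)]
```

After editing `data` and rerunning the notebook cells, the list returned here can be compared against `[tuple(row) for row in df_bonus.collect()]` as a quick sanity check.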
