Great! Below are **10 structured PySpark practice questions**, each with:

* ✅ Question
* 📥 Input Data (Schema & Sample Data)
* 📤 Expected Output (Result explanation)

We'll continue with more once you complete these.

---

### **1. Create a DataFrame from a List of Tuples**

**✅ Question:** Create a DataFrame with columns `id`, `name`, and `age` from a list of tuples.

**📥 Input:**

```python
data = [(1, "Navin", 28), (2, "Priya", 26), (3, "Amit", 17)]
schema = ["id", "name", "age"]
```

**📤 Output:**

```
+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Navin|28 |
|2  |Priya|26 |
|3  |Amit |17 |
+---+-----+---+
```

---

### **2. Filter Data Where Age > 25**

**✅ Question:** Filter records with `age > 25`.

**📥 Input:** Same DataFrame as in Question 1.

**📤 Output:**

```
+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Navin|28 |
|2  |Priya|26 |
+---+-----+---+
```

---

### **3. Add Column `is_adult` (True if age >= 18)**

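**✅ Question:** Add a boolean column `is_adult` that is `true` when `age >= 18` and `false` otherwise.

**📥 Input:** Same DataFrame as in Question 1.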

**📤 Output:**

```
+---+-----+---+--------+
|id |name |age|is_adult|
+---+-----+---+--------+
|1  |Navin|28 |true    |
|2  |Priya|26 |true    |
|3  |Amit |17 |false   |
+---+-----+---+--------+
```

---

### **4. Group by Department and Calculate Avg Salary**

**✅ Question:** Group employees by `department` and get average salary.

**📥 Input:**

```python
data = [
  (1, "Navin", "IT", 60000),
  (2, "Priya", "HR", 50000),
  (3, "Amit", "IT", 80000),
  (4, "Sara", "HR", 55000)
]
schema = ["id", "name", "department", "salary"]
```

**📤 Output:**

```
+----------+-------------+
|department|avg(salary)  |
+----------+-------------+
|IT        |70000.0      |
|HR        |52500.0      |
+----------+-------------+
```

---

### **5. Rename Column `name` to `employee_name`**

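**✅ Question:** Rename the column `name` to `employee_name`.

**📥 Input:** Same DataFrame as in Question 4.

**📤 Output:**

```
+---+--------------+----------+------+
|id |employee_name |department|salary|
+---+--------------+----------+------+
|1  |Navin         |IT        |60000 |
|2  |Priya         |HR        |50000 |
|3  |Amit          |IT        |80000 |
|4  |Sara          |HR        |55000 |
+---+--------------+----------+------+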
```

---

### **6. Sort by Salary Descending**

**✅ Question:** Sort employees by salary in descending order.

**📥 Input:** Same DataFrame as in Question 4.

**📤 Output:**

```
+---+-----+----------+------+
|id |name |department|salary|
+---+-----+----------+------+
|3  |Amit |IT        |80000 |
|1  |Navin|IT        |60000 |
|4  |Sara |HR        |55000 |
|2  |Priya|HR        |50000 |
+---+-----+----------+------+
```

---

### **7. Add Row Number by Department using Window**

**✅ Question:** Use a window function to add a row number within each `department`, ordered by salary descending.

**📥 Input:** Same DataFrame as in Question 4.

**📤 Output:**

```
+---+-----+----------+------+----------+
|id |name |department|salary|row_number|
+---+-----+----------+------+----------+
|3  |Amit |IT        |80000 |1         |
|1  |Navin|IT        |60000 |2         |
|4  |Sara |HR        |55000 |1         |
|2  |Priya|HR        |50000 |2         |
+---+-----+----------+------+----------+
```

---

### **8. Join Employees with Department Table**

**✅ Question:** Join the `employees` DataFrame with the `departments` DataFrame on `dept_id`.

**📥 Input Employees:**

```python
# Employee rows: (id, name, dept_id)
employees_data = [(1, "Navin", 101), (2, "Priya", 102)]
employees_schema = ["id", "name", "dept_id"]
```

**📥 Input Departments:**

```python
# Department rows: (dept_id, dept_name)
departments_data = [(101, "IT"), (102, "HR")]
departments_schema = ["dept_id", "dept_name"]
```

**📤 Output:**

```
+---+-----+--------+---------+
|id |name |dept_id |dept_name|
+---+-----+--------+---------+
|1  |Navin|101     |IT       |
|2  |Priya|102     |HR       |
+---+-----+--------+---------+
```

---

### **9. Replace Null Bonus with 0**

**✅ Question:** Replace null values in `bonus` column with 0.

**📥 Input:**

```python
data = [(1, "Navin", 1000), (2, "Priya", None), (3, "Amit", 500)]
schema = ["id", "name", "bonus"]
```

**📤 Output:**

```
+---+-----+-----+
|id |name |bonus|
+---+-----+-----+
|1  |Navin|1000 |
|2  |Priya|0    |
|3  |Amit |500  |
+---+-----+-----+
```

---

### **10. Write DataFrame to Parquet**

**✅ Question:** Save a DataFrame as Parquet in `/tmp/output/`.

**📤 Output:** A directory `/tmp/output/` that usually contains one or more `part-*.parquet` files (roughly one per DataFrame partition) and a `_SUCCESS` marker file.

---

Would you like me to continue with more **Level 3+ questions** (UDFs, JSON, Explode, Aggregations)? I can also give you a ready-to-run notebook version or code examples for these.


In [None]:
# Q1: create a DataFrame from a list of tuples

