### Left join in PySpark

You are given two DataFrames:
- employees: contains employee details with the columns `emp_id`, `name`, and `dept_id`.
- departments: contains department details with the columns `dept_id` and `dept_name`.

Perform a left join of employees with departments so that every employee appears in the result together with their department name. If an employee has no department, or their `dept_id` has no match in departments, their department name should be null.
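To make the left-join rule concrete before turning to Spark, here is a minimal plain-Python sketch of the same logic on the sample data used in this exercise. It is only an illustration, not how Spark implements joins: the helper name `left_join` is hypothetical, and it assumes each `dept_id` appears at most once in departments.

```python
# Plain-Python illustration of left-join semantics (no Spark required).
employees = [
    (1, "Alice", 10),
    (2, "Bob", 20),
    (3, "Charlie", None),
    (4, "David", 30),
    (5, "Edward", 40),
]
departments = [(10, "HR"), (20, "Finance"), (30, "Marketing")]


def left_join(left_rows, right_rows):
    """Hypothetical helper: keep every left row and look up dept_name by dept_id."""
    # Assumption: dept_id is unique on the right side, so a dict lookup suffices.
    dept_names = dict(right_rows)  # dept_id -> dept_name
    joined = []
    for emp_id, name, dept_id in left_rows:
        # A missing (None) dept_id never matches, and an unknown dept_id has
        # no entry in the lookup; both cases yield None for dept_name.
        dept_name = dept_names.get(dept_id) if dept_id is not None else None
        joined.append((emp_id, name, dept_id, dept_name))
    return joined


result = left_join(employees, departments)
for row in result:
    print(row)
```

Every employee is kept exactly once. Charlie (no `dept_id`) and Edward (`dept_id` 40, which is absent from departments) both end up with `None` as the department name, which corresponds to null in the Spark result.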


**employee_df**

In [0]:
# employee_df
employee_data = [
    (1, "Alice", 10),
    (2, "Bob", 20),
    (3, "Charlie", None),
    (4, "David", 30),
    (5, "Edward", 40)
]

employee_schema = "emp_id int, name string, dept_id int"

employee_df = spark.createDataFrame(employee_data, employee_schema)
employee_df.show()

+------+-------+-------+
|emp_id|   name|dept_id|
+------+-------+-------+
|     1|  Alice|     10|
|     2|    Bob|     20|
|     3|Charlie|   null|
|     4|  David|     30|
|     5| Edward|     40|
+------+-------+-------+



**department_df**

In [0]:
# department_df
department_data = [
    (10, "HR"),
    (20, "Finance"),
    (30, "Marketing")
]

department_schema = "dept_id int, dept_name string"

department_df = spark.createDataFrame(department_data, department_schema)
department_df.show()

+-------+---------+
|dept_id|dept_name|
+-------+---------+
|     10|       HR|
|     20|  Finance|
|     30|Marketing|
+-------+---------+



**Expected output:**

| emp_id | name      | dept_id | dept_name |
|--------|-----------|---------|-----------|
| 1      | Alice     | 10      | HR        |
| 2      | Bob       | 20      | Finance   |
| 3      | Charlie   | null    | null      |
| 4      | David     | 30      | Marketing |
| 5      | Edward    | 40      | null      |

**PySpark code to perform left join:**

In [0]:
# join employee_df and department_df on dept_id, keeping every employee row
joined_df = employee_df.join(
    department_df, employee_df.dept_id == department_df.dept_id, "left"
)

# select only the required columns (dept_id is taken from the employee side)
final_df = joined_df.select(
    employee_df["emp_id"],
    employee_df["name"],
    employee_df["dept_id"],
    department_df["dept_name"],
)

# show the final output
final_df.show()

+------+-------+-------+---------+
|emp_id|   name|dept_id|dept_name|
+------+-------+-------+---------+
|     1|  Alice|     10|       HR|
|     2|    Bob|     20|  Finance|
|     3|Charlie|   null|     null|
|     4|  David|     30|Marketing|
|     5| Edward|     40|     null|
+------+-------+-------+---------+

