# Well Paid Employees  
**FAANG SQL Interview Question**

---

## Question

Companies often run salary analyses to check that compensation is fair.  
One such analysis looks for **employees who earn more than their direct managers**.

As an HR Analyst, you are asked to identify every employee whose salary is higher than their direct manager's.

The result should include:
- the **employee's ID**
- the **employee's name**

---

## Schema

### `employee` Table:
| Column Name     | Type     | Description                         |
|------------------|----------|-------------------------------------|
| employee_id      | integer  | The unique ID of the employee.      |
| name             | string   | The name of the employee.           |
| salary           | integer  | The salary of the employee.         |
| department_id    | integer  | The department ID of the employee.  |
| manager_id       | integer  | The `employee_id` of the employee's direct manager (NULL if none). |

---

## Example Input:
| employee_id | name             | salary | department_id | manager_id |
|-------------|------------------|--------|----------------|-------------|
| 1           | Emma Thompson    | 3800   | 1              | 6           |
| 2           | Daniel Rodriguez | 2230   | 1              | 7           |
| 3           | Olivia Smith     | 7000   | 1              | 8           |
| 4           | Noah Johnson     | 6800   | 2              | 9           |
| 5           | Sophia Martinez  | 1750   | 1              | 11          |
| 6           | Liam Brown       | 13000  | 3              | NULL        |
| 7           | Ava Garcia       | 12500  | 3              | NULL        |
| 8           | William Davis    | 6800   | 2              | NULL        |

---

## Example Output:
| employee_id | employee_name |
|-------------|----------------|
| 3           | Olivia Smith   |

---

## Explanation

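- **Olivia Smith** (employee ID 3) earns **$7,000**.
- Her manager, **William Davis** (employee ID 8), earns **$6,800**.
- Since her salary is greater than her manager's, she appears in the result.
- Employees 1 and 2 earn less than their managers (IDs 6 and 7), so they are excluded.
- Employees 4 and 5 point to manager IDs 9 and 11, which are not in the table, and employees 6, 7, and 8 have no manager (NULL), so none of them can be compared.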
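
The comparison described above can be sketched in plain Python before bringing in Spark. This is only an illustration of the logic, not part of the original solution: the function name `well_paid_employees` and the dictionary lookup are assumptions made for this example, and the rows are copied from the example input.

```python
# Rows copied from the example input:
# (employee_id, name, salary, department_id, manager_id)
employees = [
    (1, "Emma Thompson", 3800, 1, 6),
    (2, "Daniel Rodriguez", 2230, 1, 7),
    (3, "Olivia Smith", 7000, 1, 8),
    (4, "Noah Johnson", 6800, 2, 9),
    (5, "Sophia Martinez", 1750, 1, 11),
    (6, "Liam Brown", 13000, 3, None),
    (7, "Ava Garcia", 12500, 3, None),
    (8, "William Davis", 6800, 2, None),
]


def well_paid_employees(rows):
    """Hypothetical helper: return (employee_id, name) for employees
    who earn more than their direct manager."""
    # Lookup table: employee_id -> salary, used to find each manager's salary.
    salary_by_id = {emp_id: salary for emp_id, _, salary, _, _ in rows}

    result = []
    for emp_id, name, salary, _, manager_id in rows:
        # None when manager_id is NULL or the manager is not in the table.
        manager_salary = salary_by_id.get(manager_id)
        if manager_salary is not None and salary > manager_salary:
            result.append((emp_id, name))
    return result


print(well_paid_employees(employees))  # [(3, 'Olivia Smith')]
```

Skipping rows whose manager cannot be found mirrors what an inner self-join does: an employee with a NULL or unknown `manager_id` has no matching manager row and therefore drops out of the result.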

---


In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import col

# Initialize Spark session
spark = SparkSession.builder.master('local[1]').appName("WellPaidEmployees").getOrCreate()

# Define schema for employee table
employee_schema = StructType([
    StructField("employee_id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
    StructField("department_id", IntegerType(), True),
    StructField("manager_id", IntegerType(), True)
])

# Sample data based on the question
employee_data = [
    (1, "Emma Thompson", 3800, 1, 6),
    (2, "Daniel Rodriguez", 2230, 1, 7),
    (3, "Olivia Smith", 7000, 1, 8),
    (4, "Noah Johnson", 6800, 2, 9),
    (5, "Sophia Martinez", 1750, 1, 11),
    (6, "Liam Brown", 13000, 3, None),
    (7, "Ava Garcia", 12500, 3, None),
    (8, "William Davis", 6800, 2, None)
]

# Create the DataFrame
employee_df = spark.createDataFrame(employee_data, schema=employee_schema)

# Show the DataFrame
employee_df.show()


+-----------+----------------+------+-------------+----------+
|employee_id|            name|salary|department_id|manager_id|
+-----------+----------------+------+-------------+----------+
|          1|   Emma Thompson|  3800|            1|         6|
|          2|Daniel Rodriguez|  2230|            1|         7|
|          3|    Olivia Smith|  7000|            1|         8|
|          4|    Noah Johnson|  6800|            2|         9|
|          5| Sophia Martinez|  1750|            1|        11|
|          6|      Liam Brown| 13000|            3|      NULL|
|          7|      Ava Garcia| 12500|            3|      NULL|
|          8|   William Davis|  6800|            2|      NULL|
+-----------+----------------+------+-------------+----------+



In [2]:
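# Self-join: alias 'emp' is the employee, alias 'mgr' is that employee's direct manager.
employee_df.alias("emp") \
    .join(employee_df.alias("mgr"), col("emp.manager_id") == col("mgr.employee_id")) \
    .filter(col("emp.salary") > col("mgr.salary")) \
    .select(col("emp.employee_id"), col("emp.name").alias("employee_name")) \
    .show()

+-----------+-------------+
|employee_id|employee_name|
+-----------+-------------+
|          3| Olivia Smith|
+-----------+-------------+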



In [3]:
# Register the DataFrame as a temporary view so it can be queried with Spark SQL
employee_df.createOrReplaceTempView('employee')

spark.sql(
    '''
    SELECT emp.employee_id, emp.name AS employee_name
    FROM employee AS emp
    JOIN employee AS mgr
      ON emp.manager_id = mgr.employee_id
    WHERE emp.salary > mgr.salary
    '''
).show()

+-----------+-------------+
|employee_id|employee_name|
+-----------+-------------+
|          3| Olivia Smith|
+-----------+-------------+
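
Since this is framed as a SQL interview question, it can be handy to check the self-join without a Spark session. The sketch below is one way to do that, assuming Python's built-in `sqlite3` module and an in-memory database. The table definition, the aliases `e` and `m`, and the `employee_name` column alias are choices made for this example, and SQLite's dialect may differ from Spark SQL in other respects.

```python
import sqlite3

# Rows copied from the example input (None stands in for NULL).
rows = [
    (1, "Emma Thompson", 3800, 1, 6),
    (2, "Daniel Rodriguez", 2230, 1, 7),
    (3, "Olivia Smith", 7000, 1, 8),
    (4, "Noah Johnson", 6800, 2, 9),
    (5, "Sophia Martinez", 1750, 1, 11),
    (6, "Liam Brown", 13000, 3, None),
    (7, "Ava Garcia", 12500, 3, None),
    (8, "William Davis", 6800, 2, None),
]

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT,
        salary        INTEGER,
        department_id INTEGER,
        manager_id    INTEGER
    )
    """
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)

# Self-join: 'e' is the employee, 'm' is that employee's direct manager.
query = """
    SELECT e.employee_id, e.name AS employee_name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.employee_id
    WHERE e.salary > m.salary
    ORDER BY e.employee_id
"""
cursor = conn.execute(query)
columns = [description[0] for description in cursor.description]
result_rows = cursor.fetchall()
conn.close()

print(columns)      # ['employee_id', 'employee_name']
print(result_rows)  # [(3, 'Olivia Smith')]
```

Placing the salary comparison in `WHERE` rather than in the `ON` clause gives the same rows for an inner join; it only separates the join key from the filter, which some readers find easier to follow.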

