# Second Highest Salary

## FAANG SQL Interview Question

### Question  
Imagine you're an HR analyst at a tech company tasked with analyzing employee salaries.  
Your manager is keen on understanding the pay distribution and asks you to determine the **second highest salary** among all employees.

Multiple employees may share the same second highest salary.  
If so, display that salary **only once**.

---

### `employee` Schema:

| column_name   | type     | description                          |
|---------------|----------|--------------------------------------|
| employee_id   | integer  | The unique ID of the employee.       |
| name          | string   | The name of the employee.            |
| salary        | integer  | The salary of the employee.          |
| department_id | integer  | The department ID of the employee.   |
| manager_id    | integer  | The manager ID of the employee.      |

---

### Example Input:

| employee_id | name             | salary | department_id | manager_id |
|-------------|------------------|--------|----------------|-------------|
| 1           | Emma Thompson    | 3800   | 1              | 6           |
| 2           | Daniel Rodriguez | 2230   | 1              | 7           |
| 3           | Olivia Smith     | 2000   | 1              | 8           |

---

### Example Output:

| second_highest_salary |
|------------------------|
| 2230                   |

---

### Explanation:

The highest salary is **$3,800**, and the second highest is **$2,230**.  
Only one value is returned, even if multiple employees earned that same amount.
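
To make the duplicate rule concrete, here is a minimal plain-Python sketch of the logic, independent of any SQL engine. The helper name `second_highest` is hypothetical, and returning `None` when there is no second distinct salary is an assumption, since the question does not cover that case.

```python
def second_highest(salaries):
    """Return the second-largest distinct value, or None if there is none."""
    # set() collapses duplicates, so tied salaries count only once.
    distinct = sorted(set(salaries), reverse=True)
    return distinct[1] if len(distinct) > 1 else None


print(second_highest([3800, 2230, 2000]))        # 2230
print(second_highest([3800, 2230, 2230, 2000]))  # 2230
print(second_highest([3800, 3800]))              # None
```

The second call shows that two employees earning 2230 still yield a single value, and the third shows the assumed behavior when every employee earns the same salary.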


In [7]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
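# Import only what is used; note that this `max` is Spark's aggregate and shadows Python's built-in.
from pyspark.sql.functions import col, max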

# Create Spark session
spark = SparkSession.builder.master('local[1]').appName("SecondHighestSalary").getOrCreate()

# Define schema
schema = StructType([
    StructField("employee_id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
    StructField("department_id", IntegerType(), True),
    StructField("manager_id", IntegerType(), True),
])

# Sample data
data = [
    (1, "Emma Thompson", 3800, 1, 6),
    (2, "Daniel Rodriguez", 2230, 1, 7),
    (3, "Olivia Smith", 2000, 1, 8),
]

# Create DataFrame
employee_df = spark.createDataFrame(data, schema)

# Show DataFrame
employee_df.show(truncate=False)


+-----------+----------------+------+-------------+----------+
|employee_id|name            |salary|department_id|manager_id|
+-----------+----------------+------+-------------+----------+
|1          |Emma Thompson   |3800  |1            |6         |
|2          |Daniel Rodriguez|2230  |1            |7         |
|3          |Olivia Smith    |2000  |1            |8         |
+-----------+----------------+------+-------------+----------+



In [10]:
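from pyspark.sql import functions as F

# Step 1: fetch the overall maximum salary as a Python scalar.
max_salary = employee_df.agg(F.max("salary")).collect()[0][0]

# Step 2: drop rows at that maximum, then take the max of what remains.
# Ties collapse to one value; the result is null if no lower salary exists.
employee_df.where(F.col("salary") != max_salary)\
    .agg(F.max("salary").alias('second_highest_salary'))\
    .show()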

+---------------------+
|second_highest_salary|
+---------------------+
|                 2230|
+---------------------+



In [8]:
employee_df.createOrReplaceTempView('employee')
spark.sql('''
SELECT MAX(salary) AS second_highest_salary
FROM employee
WHERE salary NOT IN (SELECT MAX(salary) FROM employee)
''').show()

+---------------------+
|second_highest_salary|
+---------------------+
|                 2230|
+---------------------+
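
The sample data has no ties, so it does not exercise the duplicate rule from the question. As a rough check, the sketch below runs the same query against an in-memory SQLite table using Python's standard `sqlite3` module. SQLite is only a stand-in for Spark SQL here, the helper `run_query` is hypothetical, and the table keeps just the two columns the test needs.

```python
import sqlite3

QUERY = """
SELECT MAX(salary) AS second_highest_salary
FROM employee
WHERE salary NOT IN (SELECT MAX(salary) FROM employee)
"""


def run_query(salaries):
    """Load the salaries into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (employee_id INTEGER, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        list(enumerate(salaries, start=1)),
    )
    # An aggregate without GROUP BY always returns exactly one row.
    result = conn.execute(QUERY).fetchone()[0]
    conn.close()
    return result


print(run_query([3800, 2230, 2000]))  # 2230
print(run_query([3800, 2230, 2230]))  # 2230
print(run_query([3800, 3800]))        # None
```

Because the query aggregates with `MAX`, tied second-place salaries collapse to one row, and a table with only one distinct salary yields `NULL` (`None` in Python) rather than an empty result.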

