### Problem Statement:
Department-Wise Minimum and Maximum Salary Calculation Using Spark SQL

You are given a DataFrame containing employee information with the following columns:

Dept (Department name)

Employee (Employee name)

Salary (Employee's salary)

The dataset contains records for multiple employees across different departments. The objective is to calculate the minimum and maximum salaries for each department using Spark SQL. Additionally, you need to sort the result by the maximum salary in descending order.

In [0]:
# Sample data as a CSV-formatted string (header row, then one row per employee)
data ="""Dept, Employee,Salary
    HR, Alice, 50000,
    HR, Bob, 55000
    HR, Charlie, 52000
    Finance, David, 75000
    Finance, Eve, 70000
    Finance, Frank, 78000
    IT, Grace, 60000
    IT, Hank, 62000
    IT, Ivy, 61000
    """


In [0]:
dbutils.fs.put('/Filestore/tables/data.csv',data,True)

Wrote 226 bytes.
Out[23]: True

In [0]:
# Read the CSV file into a DataFrame, letting Spark infer the column types
df = (
    spark.read.format('csv')
    .option('header', True)
    .option('inferSchema', True)
    .load('/Filestore/tables/data.csv')
)

In [0]:
df.display()

Dept,Employee,Salary
HR,Alice,50000.0
HR,Bob,55000.0
HR,Charlie,52000.0
Finance,David,75000.0
Finance,Eve,70000.0
Finance,Frank,78000.0
IT,Grace,60000.0
IT,Hank,62000.0
IT,Ivy,61000.0


In [0]:
df.printSchema()

root
 |-- Dept: string (nullable = true)
 |--  Employee: string (nullable = true)
 |-- Salary: double (nullable = true)
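
The schema above shows a column named ` Employee` (with a leading space) and a `Salary` column inferred as `double`. Both are most likely side effects of the spaces after the commas in the CSV string, which the reader keeps by default. The sketch below reproduces the header effect with Python's standard `csv` module rather than Spark; `sample` is a hypothetical two-line excerpt of the data, not the full file. Spark's CSV reader has the options `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` for a similar purpose, though their effect on this notebook's output was not verified here.

```python
import csv
import io

# Hypothetical two-line excerpt of the notebook's CSV string (same spacing after commas).
sample = "Dept, Employee,Salary\nHR, Bob, 55000\n"

# Default parsing keeps the space that follows each comma.
raw_header, raw_row = list(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops whitespace immediately after a delimiter.
clean_header, clean_row = list(
    csv.reader(io.StringIO(sample), skipinitialspace=True)
)

print(raw_header)    # the Employee field keeps its leading space
print(clean_header)  # the leading space is gone
```

Until the whitespace is handled, the second column has to be referenced with its leading space, for example `df[' Employee']`.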



In [0]:
from pyspark.sql import functions as F  # avoids shadowing Python's built-in min/max

# Calculate department-wise minimum and maximum salary, ordered by Max_Salary (descending)
df_min_max_salary = (
    df.groupBy("Dept")
    .agg(F.min("Salary").alias("Min_Salary"), F.max("Salary").alias("Max_Salary"))
    .orderBy(F.col("Max_Salary").desc())
)

df_min_max_salary.display()


Dept,Min_Salary,Max_Salary
Finance,70000.0,78000.0
IT,60000.0,62000.0
HR,50000.0,55000.0


# Spark SQL

In [0]:
df.createOrReplaceTempView("MinMaxSalaryView")

In [0]:
result = spark.sql("""
    SELECT Dept, MIN(Salary) AS Min_Salary, MAX(Salary) AS Max_Salary
    FROM MinMaxSalaryView
    GROUP BY Dept
    ORDER BY Max_Salary DESC
""")
result.display()

Dept,Min_Salary,Max_Salary
Finance,70000.0,78000.0
IT,60000.0,62000.0
HR,50000.0,55000.0
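
As a quick cross-check of the result above, the same group-by, min/max, and descending sort can be sketched in plain Python without Spark. The `rows` list is copied by hand from the sample data, and the names `salaries_by_dept` and `summary` are illustrative only.

```python
from collections import defaultdict

# Rows copied from the notebook's sample data: (Dept, Employee, Salary).
rows = [
    ("HR", "Alice", 50000.0),
    ("HR", "Bob", 55000.0),
    ("HR", "Charlie", 52000.0),
    ("Finance", "David", 75000.0),
    ("Finance", "Eve", 70000.0),
    ("Finance", "Frank", 78000.0),
    ("IT", "Grace", 60000.0),
    ("IT", "Hank", 62000.0),
    ("IT", "Ivy", 61000.0),
]

# Collect every salary under its department (the GROUP BY step).
salaries_by_dept = defaultdict(list)
for dept, _employee, salary in rows:
    salaries_by_dept[dept].append(salary)

# Build (Dept, Min_Salary, Max_Salary) tuples, sorted by Max_Salary descending.
summary = sorted(
    ((dept, min(vals), max(vals)) for dept, vals in salaries_by_dept.items()),
    key=lambda r: r[2],
    reverse=True,
)

for row in summary:
    print(row)
```

This is only practical for small samples; for real datasets the Spark SQL query remains the appropriate tool.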
