### `rank()` in PySpark

How would you use the `rank()` function in PySpark to rank employees based on their salary within their department?

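=> To rank employees by salary within their department, define a window specification with `Window.partitionBy("Department")` so that each department is ranked separately, and order it with `orderBy(desc("Salary"))` so that the highest salary comes first. Then apply `rank().over(window_spec)` to add the rank as a new column. The highest-paid employee in each department gets rank 1; employees with equal salaries share the same rank, and the ranks that follow a tie are skipped.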


In [0]:
# sample data
data = [
    (1, 'Alice', 'HR', 5000),
    (2, 'Bob', 'HR', 6000),
    (3, 'Charlie', 'IT', 7000),
    (4, 'David', 'IT', 9000),
    (5, 'Eve', 'HR', 5500),
    (6, 'Frank', 'IT', 8000)
]

columns = ["EmployeeID", "Name", "Department", "Salary"]

In [0]:
# Start a SparkSession
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Rank").getOrCreate()
spark

In [0]:
# Create the DataFrame
df = spark.createDataFrame(data, columns)
df.show()

+----------+-------+----------+------+
|EmployeeID|   Name|Department|Salary|
+----------+-------+----------+------+
|         1|  Alice|        HR|  5000|
|         2|    Bob|        HR|  6000|
|         3|Charlie|        IT|  7000|
|         4|  David|        IT|  9000|
|         5|    Eve|        HR|  5500|
|         6|  Frank|        IT|  8000|
+----------+-------+----------+------+



In [0]:
# Define a window spec: one partition per department, highest salary first

from pyspark.sql.window import Window
from pyspark.sql.functions import rank, desc

window_spec = Window.partitionBy("Department").orderBy(desc("Salary"))

rank_df = df.withColumn("rank", rank().over(window_spec))
rank_df.show()


+----------+-------+----------+------+----+
|EmployeeID|   Name|Department|Salary|rank|
+----------+-------+----------+------+----+
|         2|    Bob|        HR|  6000|   1|
|         5|    Eve|        HR|  5500|   2|
|         1|  Alice|        HR|  5000|   3|
|         4|  David|        IT|  9000|   1|
|         6|  Frank|        IT|  8000|   2|
|         3|Charlie|        IT|  7000|   3|
+----------+-------+----------+------+----+
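
To see what `rank()` computes over this window, the same logic can be sketched in plain Python, without Spark. This is only an illustration of the semantics, not how Spark implements it. The helper `rank_within_partitions` and the tied salary for Eve in `tied_employees` are hypothetical and are not part of the original example.

```python
from collections import defaultdict


def rank_within_partitions(rows, partition_key, order_key):
    """Illustrative stand-in for rank() over
    partitionBy(partition_key).orderBy(desc(order_key))."""
    # Group rows by the partition column (one group per department).
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    ranked = []
    for key in sorted(partitions):
        # Order each group by the ordering column, highest value first.
        ordered = sorted(partitions[key], key=lambda r: r[order_key], reverse=True)
        previous_value, current_rank = None, 0
        for position, row in enumerate(ordered, start=1):
            # A new value takes its 1-based position as rank; a repeated
            # value keeps the previous rank, which leaves a gap afterwards.
            if row[order_key] != previous_value:
                current_rank = position
                previous_value = row[order_key]
            ranked.append({**row, "rank": current_rank})
    return ranked


# Same rows as the sample data above.
employees = [
    {"Name": "Alice", "Department": "HR", "Salary": 5000},
    {"Name": "Bob", "Department": "HR", "Salary": 6000},
    {"Name": "Charlie", "Department": "IT", "Salary": 7000},
    {"Name": "David", "Department": "IT", "Salary": 9000},
    {"Name": "Eve", "Department": "HR", "Salary": 5500},
    {"Name": "Frank", "Department": "IT", "Salary": 8000},
]

# Hypothetical variation: Eve now earns the same as Bob.
tied_employees = [
    {**e, "Salary": 6000} if e["Name"] == "Eve" else e for e in employees
]

result = rank_within_partitions(employees, "Department", "Salary")
tied_result = rank_within_partitions(tied_employees, "Department", "Salary")

for row in tied_result:
    print(row["Department"], row["Name"], row["Salary"], row["rank"])
```

With the original data, the ranks match the Spark output above. In the tied variation, Bob and Eve both get rank 1 in HR and Alice gets rank 3, because `rank()` skips rank 2 after the tie.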

