## Ranking Functions

Ranking functions assign a rank to each record within a partition, based on the ordering defined in the window specification.

* Sparse Rank - rank
* Dense Rank - dense_rank
* Assigning Row Numbers - row_number
* Percentage Rank - percent_rank
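
The difference between these four functions shows up when values tie. The following is a minimal plain-Scala sketch (no Spark required) that computes them by hand for a single partition ordered by delay in descending order. `RankingSketch`, `Ranked`, and `rankPartition` are hypothetical names introduced for illustration, and the delay values are made up; the sketch follows the standard SQL definitions and is not Spark's internal implementation.

```scala
// Illustration only: hand-rolled ranking for ONE partition, ordered descending.
object RankingSketch {
  final case class Ranked(
    delay: Int,
    rank: Int,           // sparse rank: ties share a rank, later ranks are skipped
    denseRank: Int,      // dense rank: ties share a rank, no gaps afterwards
    percentRank: Double, // (rank - 1) / (rows in partition - 1)
    rowNumber: Int       // 1, 2, 3, ... regardless of ties
  )

  def rankPartition(delays: Seq[Int]): Seq[Ranked] = {
    val sorted = delays.sorted(Ordering[Int].reverse)
    val n = sorted.size
    // Distinct values in descending order; position + 1 is the dense rank
    val distinctDesc = sorted.distinct
    sorted.zipWithIndex.map { case (d, i) =>
      // Position of the first row holding this value gives the sparse rank
      val rank = sorted.indexOf(d) + 1
      val denseRank = distinctDesc.indexOf(d) + 1
      // A single-row partition gets 0.0 to avoid dividing by zero
      val percentRank = if (n > 1) (rank - 1).toDouble / (n - 1) else 0.0
      Ranked(d, rank, denseRank, percentRank, i + 1)
    }
  }

  def main(args: Array[String]): Unit =
    rankPartition(Seq(120, 95, 120, 60)).foreach(println)
}
```

For the delays 120, 95, 120 and 60, both 120s get rank 1. After that, the sparse rank jumps to 3 while the dense rank continues with 2. The row numbers run from 1 to 4, and the percent rank goes from 0.0 for the first rows to 1.0 for the last.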

Let us start the Spark session for this notebook so that we can execute the code provided.

If you prefer to practice from a terminal, launch the Spark shell with the following command.

```
spark2-shell \
  --master yarn \
  --name "Windowing Functions" \
  --conf spark.ui.port=0
```

In [None]:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    appName("Windowing Functions").
    master("yarn").
    getOrCreate()

In [None]:
spark.conf.set("spark.sql.shuffle.partitions", "2")

In [None]:
import spark.implicits._

### Tasks

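Let us perform a few tasks related to ranking. Here we rank the delayed, non-cancelled flights of January 2008 within each flight date and origin airport, from the longest departure delay to the shortest.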

In [None]:
val airlines_path = "/public/airlines_all/airlines-part/flightmonth=200801"

In [None]:
val airlines = spark.
    read.
    parquet(airlines_path)

In [None]:
import org.apache.spark.sql.functions.{col, lit, lpad, concat}
import org.apache.spark.sql.functions.{rank, dense_rank}
import org.apache.spark.sql.functions.{percent_rank, row_number, round}
import org.apache.spark.sql.expressions.Window

In [None]:
// One partition per FlightDate and Origin, longest departure delay first
val spec = Window.
    partitionBy("FlightDate", "Origin").
    orderBy(col("DepDelay").desc)

In [None]:
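airlines.
    // Keep only flights that departed late and were not cancelled
    filter("IsDepDelayed = 'YES' and Cancelled = 0").
    // Build FlightDate as yyyyMMdd by zero-padding Month and DayOfMonth
    select(concat($"Year",
                  lpad($"Month", 2, "0"),
                  lpad($"DayOfMonth", 2, "0")
                 ).alias("FlightDate"),
           $"Origin",
           $"UniqueCarrier",
           $"FlightNum",
           $"CRSDepTime",
           $"IsDepDelayed",
           // Cast to int so that the delay is ordered numerically
           $"DepDelay".cast("int").alias("DepDelay")
          ).
    // Sparse rank: ties share a rank and the following ranks are skipped
    withColumn("srank", rank().over(spec)).
    // Dense rank: ties share a rank and no ranks are skipped
    withColumn("drank", dense_rank().over(spec)).
    // Percent rank: (rank - 1) / (rows in partition - 1), rounded to 2 decimals
    withColumn("prank", round(percent_rank().over(spec), 2)).
    // Row number: consecutive numbering, even among ties
    withColumn("rn", row_number().over(spec)).
    orderBy($"FlightDate", $"Origin", $"DepDelay".desc).
    show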
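
A common follow-up to ranking is keeping only the top rows of each partition. The sketch below assumes a local Spark session and a small hand-made DataFrame; the rows and the names `session`, `toy`, `toySpec`, `ranked`, and `top2` are hypothetical and not part of the airlines data. It filters on `drank` to keep the two highest distinct delays per flight date and origin.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank, rank, row_number}

// Assumption: local mode is enough for toy data. Inside the notebook,
// getOrCreate() returns the session that is already running.
val session = SparkSession.
    builder.
    master("local[*]").
    appName("Ranking Sketch").
    getOrCreate()

// Hypothetical rows with a tie on the highest delay for ATL
val toy = session.
    createDataFrame(Seq(
      ("20080101", "ATL", 120),
      ("20080101", "ATL", 120),
      ("20080101", "ATL", 95),
      ("20080101", "ATL", 60),
      ("20080101", "ORD", 45)
    )).
    toDF("FlightDate", "Origin", "DepDelay")

// Same shape as the window specification used above
val toySpec = Window.
    partitionBy("FlightDate", "Origin").
    orderBy(col("DepDelay").desc)

val ranked = toy.
    withColumn("srank", rank().over(toySpec)).
    withColumn("drank", dense_rank().over(toySpec)).
    withColumn("rn", row_number().over(toySpec))

// drank <= 2 keeps the two highest distinct delays, including ties
val top2 = ranked.filter(col("drank") <= 2)

top2.
    orderBy(col("FlightDate"), col("Origin"), col("rn")).
    show()
```

Which column to filter on depends on how ties should be treated. Filtering on `rn` returns exactly N rows per partition and picks among tied rows arbitrarily, filtering on `srank` may return more than N rows when ties occur at the cutoff, and filtering on `drank` returns every row that belongs to the N highest distinct values.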