## rowsBetween and rangeBetween

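We can compute cumulative and moving aggregations by restricting the window frame with `rowsBetween` or `rangeBetween`.

* We can use `rowsBetween` to define the frame by row position relative to the current row, for example the three preceding rows through the current row.
* We can use `rangeBetween` to define the frame by the values of the ordering column relative to the current row's value.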
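
The difference between the two frame types shows up when the ordering column has ties. The sketch below is a minimal illustration on hypothetical toy data (the `grp`, `ord`, and `amount` columns and the `local[1]` session are assumptions, not part of the airlines data set). It is best run in a scratch shell, since `getOrCreate` would otherwise hand the same session to any later setup code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// Assumption: a throwaway local session; getOrCreate reuses an existing one if present
val session = SparkSession.builder.master("local[1]").appName("frame-sketch").getOrCreate()
import session.implicits._

// Hypothetical toy data: two rows share the ordering value 20
val toy = Seq(("a", 10, 1), ("a", 20, 2), ("a", 20, 3), ("a", 30, 4)).
    toDF("grp", "ord", "amount")

// Row-based frame: counts physical rows; amount is added as a tie-breaker
// so that the row order (and therefore the result) is deterministic
val byRows = Window.
    partitionBy("grp").
    orderBy("ord", "amount").
    rowsBetween(Window.unboundedPreceding, Window.currentRow)

// Value-based frame: includes every row whose ord is <= the current row's ord
val byRange = Window.
    partitionBy("grp").
    orderBy("ord").
    rangeBetween(Window.unboundedPreceding, Window.currentRow)

val framed = toy.
    withColumn("rows_sum", sum("amount").over(byRows)).
    withColumn("range_sum", sum("amount").over(byRange))

framed.orderBy("ord", "amount").show
```

With this data, the two rows where `ord` is 20 get different `rows_sum` values (3 and 6) but the same `range_sum` (6), because `rangeBetween` treats rows with equal ordering values as part of the same frame.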

Let us start the Spark context for this notebook so that we can execute the code provided.

If you want to use the terminal for practice, here is the command to use.

```
spark2-shell \
  --master yarn \
  --name "Windowing Functions" \
  --conf spark.ui.port=0
```

In [None]:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    appName("Windowing Functions").
    master("yarn").
    getOrCreate()

In [None]:
spark.conf.set("spark.sql.shuffle.partitions", "2")

In [None]:
import spark.implicits._

In [None]:
val airlines_path = "/public/airlines_all/airlines-part/flightmonth=200801"

In [None]:
val airlines = spark.
  read.
  parquet(airlines_path)

In [None]:
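import org.apache.spark.sql.expressions.Window

// Frame: every row from the start of the partition through the current row
val spec = Window.
    partitionBy("FlightDate", "Origin").
    orderBy("CRSDepTime").
    rowsBetween(Window.unboundedPreceding, Window.currentRow)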

In [None]:
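import org.apache.spark.sql.functions.{concat, lpad, sum}

// Running total of departure delay per date and origin, in scheduled-departure order
airlines.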
    filter("IsDepDelayed = 'YES' and Cancelled = 0").
    select(concat($"Year", 
                  lpad($"Month", 2, "0"), 
                  lpad($"DayOfMonth", 2, "0")
                 ).alias("FlightDate"),
           $"Origin",
           $"UniqueCarrier",
           $"FlightNum",
           $"CRSDepTime",
           $"IsDepDelayed",
           $"DepDelay".cast("int").alias("DepDelay")
          ).
    withColumn("DepDelaySum", sum("DepDelay").over(spec)).
    orderBy("FlightDate", "Origin", "CRSDepTime").
    show
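
The same cumulative frame can also be written in Spark SQL with a `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` clause. The following is a minimal sketch on hypothetical toy rows (the `toy_delays` view, its column names, and the `local[1]` master are assumptions for illustration), comparing the DataFrame API with the SQL form.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// getOrCreate reuses an existing session; local[1] only applies when none exists
val session = SparkSession.builder.master("local[1]").appName("running-total-sketch").getOrCreate()
import session.implicits._

// Hypothetical toy rows: (origin, scheduled departure, delay in minutes)
val delays = Seq(("ATL", 600, 5), ("ATL", 630, 10), ("ATL", 700, 20), ("ORD", 615, 7)).
    toDF("origin", "dep_time", "dep_delay")
delays.createOrReplaceTempView("toy_delays")

// DataFrame API: running total per origin
val runningApi = delays.
    withColumn("delay_sum", sum("dep_delay").over(
        Window.
            partitionBy("origin").
            orderBy("dep_time").
            rowsBetween(Window.unboundedPreceding, Window.currentRow)))

// Spark SQL: the same frame expressed in the OVER clause
val runningSql = session.sql("""
  SELECT origin, dep_time, dep_delay,
         SUM(dep_delay) OVER (
           PARTITION BY origin ORDER BY dep_time
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         ) AS delay_sum
  FROM toy_delays
""")

runningSql.orderBy("origin", "dep_time").show
```

Both forms should produce the same running totals; which one to use is mostly a matter of whether the rest of the pipeline is written in SQL or in the DataFrame API.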

In [None]:
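// Frame: the three preceding rows plus the current row (a moving window of up to four rows)
val spec = Window.
    partitionBy("FlightDate", "Origin").
    orderBy("CRSDepTime").
    rowsBetween(-3, Window.currentRow)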

In [None]:
// Moving total of departure delay over the current flight and the three before it
airlines.
    filter("IsDepDelayed = 'YES' and Cancelled = 0").
    select(concat($"Year", 
                  lpad($"Month", 2, "0"), 
                  lpad($"DayOfMonth", 2, "0")
                 ).alias("FlightDate"),
           $"Origin",
           $"UniqueCarrier",
           $"FlightNum",
           $"CRSDepTime",
           $"IsDepDelayed",
           $"DepDelay".cast("int").alias("DepDelay")
          ).
    withColumn("DepDelaySum", sum("DepDelay").over(spec)).
    orderBy("FlightDate", "Origin", "CRSDepTime").
    show
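
For comparison, a moving window can also be defined with `rangeBetween`, where the offsets apply to the values of the ordering column instead of row positions. The sketch below uses hypothetical toy rows in which `dep_time` is an integer in `hhmm` form, so an offset of `-100` reaches back one hour; whether `CRSDepTime` in the airlines data is stored as such an integer is an assumption to verify before applying this to it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// getOrCreate reuses an existing session; local[1] only applies when none exists
val session = SparkSession.builder.master("local[1]").appName("range-frame-sketch").getOrCreate()
import session.implicits._

// Hypothetical toy rows: (origin, scheduled departure as hhmm integer, delay in minutes)
val delays = Seq(("ATL", 600, 5), ("ATL", 630, 10), ("ATL", 700, 20), ("ATL", 900, 40)).
    toDF("origin", "dep_time", "dep_delay")

// Row-based frame: the three preceding rows plus the current row
val lastFourRows = Window.
    partitionBy("origin").
    orderBy("dep_time").
    rowsBetween(-3, Window.currentRow)

// Value-based frame: rows whose dep_time lies in [current - 100, current]
val lastHour = Window.
    partitionBy("origin").
    orderBy("dep_time").
    rangeBetween(-100, Window.currentRow)

val moving = delays.
    withColumn("rows_sum", sum("dep_delay").over(lastFourRows)).
    withColumn("range_sum", sum("dep_delay").over(lastHour))

moving.orderBy("dep_time").show
```

On this data the two frames agree for the first three flights, but for the 900 departure `rows_sum` still adds all four delays (75) while `range_sum` only sees flights scheduled from 800 onward (40). Note that `rangeBetween` with numeric offsets requires a single numeric ordering column.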