## Grouped Aggregations

Let us go through the details of performing aggregations using Spark.

* We can perform total aggregations directly on a DataFrame, or we can perform aggregations after grouping by one or more keys.
* Here are the APIs we typically use to group the data by one or more keys.
  * `groupBy`
  * `rollup`
  * `cube`
* Here are the functions we typically use to perform aggregations.
  * `count`
  * `sum`, `avg`
  * `min`, `max`
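* To give the aggregated fields aliases, we need to use `agg` after `groupBy` and call `alias` on each aggregate expression. Shorthand methods such as `groupBy(...).count()` produce fixed column names such as `count`.
* Let us get the count of flights for each day in January 2008 (`flightmonth=200801`).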

```scala
// Location of the January 2008 partition of the airlines dataset
val airlines_path = "/public/airlines_all/airlines-part/flightmonth=200801"
```

```scala
// Read the Parquet files for January 2008 into a DataFrame
val airlines = spark.
    read.
    parquet(airlines_path)
```

```scala
import org.apache.spark.sql.functions.{concat, lpad, count, lit}
```

```scala
// Enables the $"column" syntax used below
import spark.implicits._
```

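```scala
// Count flights per day. FlightDate is built as a yyyyMMdd string by
// zero-padding Month and DayOfMonth to two digits.
airlines.
    groupBy(concat($"Year",
                   lpad($"Month", 2, "0"),
                   lpad($"DayOfMonth", 2, "0")
                  ).alias("FlightDate")
           ).
    agg(count(lit(1)).alias("FlightCount")).
    orderBy($"FlightDate").
    show()
```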
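The bullets above describe total versus grouped aggregations, the `groupBy`, `rollup` and `cube` APIs, and aliasing inside `agg`, but only `groupBy` is exercised on the airlines data. The following is a minimal sketch that runs each of them on a tiny DataFrame. The `Carrier`, `Day` and `Delay` columns, the sample rows, and the `AggregationSketch` object with its helper methods are all hypothetical and exist only for illustration. The sketch also assumes `spark-sql` is on the classpath and uses a local master.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, count, lit, max, min, sum}

object AggregationSketch {
  // Local session for experimenting outside spark-shell (assumes spark-sql on the classpath).
  def localSession(): SparkSession =
    SparkSession.builder().appName("aggregation-sketch").master("local[*]").getOrCreate()

  // Hypothetical sample data: one row per flight with made-up Carrier, Day and Delay columns.
  def sampleFlights(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("AA", 1, 10), ("AA", 1, 20), ("AA", 2, 0), ("UA", 1, 5))
      .toDF("Carrier", "Day", "Delay")
  }

  // Total aggregation: no grouping key, so the result has exactly one row.
  def totals(df: DataFrame): DataFrame =
    df.agg(count(lit(1)).alias("FlightCount"), sum(col("Delay")).alias("TotalDelay"))

  // groupBy + agg: one row per Carrier, each aggregate given its own alias.
  def perCarrier(df: DataFrame): DataFrame =
    df.groupBy(col("Carrier"))
      .agg(
        count(lit(1)).alias("FlightCount"),
        avg(col("Delay")).alias("AvgDelay"),
        min(col("Delay")).alias("MinDelay"),
        max(col("Delay")).alias("MaxDelay")
      )
      .orderBy(col("Carrier"))

  // Shorthand without agg: the output column is simply named "count".
  def countShorthand(df: DataFrame): DataFrame =
    df.groupBy(col("Carrier")).count()

  // rollup: (Carrier, Day) groups, per-Carrier subtotals and a grand total.
  // A null key marks a level that has been rolled up.
  def rolledUp(df: DataFrame): DataFrame =
    df.rollup(col("Carrier"), col("Day")).agg(count(lit(1)).alias("FlightCount"))

  // cube: every combination of the keys, which adds per-Day subtotals across carriers.
  def cubed(df: DataFrame): DataFrame =
    df.cube(col("Carrier"), col("Day")).agg(count(lit(1)).alias("FlightCount"))

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    val flights = sampleFlights(spark)
    totals(flights).show()
    perCarrier(flights).show()
    countShorthand(flights).show()
    rolledUp(flights).orderBy(col("Carrier").asc_nulls_last, col("Day").asc_nulls_last).show()
    cubed(flights).orderBy(col("Carrier").asc_nulls_last, col("Day").asc_nulls_last).show()
    spark.stop()
  }
}
```

With these four rows, `rollup` should return 6 rows (three `(Carrier, Day)` groups, two per-carrier subtotals and one grand total), while `cube` should return 8 because it also adds the two per-`Day` subtotals across carriers. Inside `spark-shell` the same `agg`, `rollup` and `cube` calls can be applied directly to the existing `spark` session and to `airlines`.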