# Aggregate functions
Aggregate functions operate on groups of tiles at once. An example of this type of function is finding the mean cell values of a column of tiles. Other statistical and diagnostic tasks can be performed on columns at a time.

In [97]:
import astraea.spark.rasterframes._
import geotrellis.raster.io.geotiff.SinglebandGeoTiff
import org.apache.spark.sql.functions._
import geotrellis.spark.SpatialKey._


implicit val spark = SparkSession.builder().
  master("local").appName("RasterFrames").
  config("spark.ui.enabled", "false").
  getOrCreate().
  withRasterFrames

val sc = spark.sparkContext

def readTiff(name: String): SinglebandGeoTiff = SinglebandGeoTiff(s"../samples/$name")
val filenamePattern = "L8-B%d-Elkton-VA.tiff"

import astraea.spark.rasterframes._
import geotrellis.raster.io.geotiff.SinglebandGeoTiff
import org.apache.spark.sql.functions._
import geotrellis.spark.SpatialKey._
spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@de25f71
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@7922d202
readTiff: (name: String)geotrellis.raster.io.geotiff.SinglebandGeoTiff
filenamePattern: String = L8-B%d-Elkton-VA.tiff


In [34]:
def bandsToJoin(bands: Seq[Int]) = bands.
  map { b ⇒ (b, filenamePattern.format(b)) }.
  map { case (b, f) ⇒ ((b - 1) % 3, readTiff(f)) }.
  map { case (b, t) ⇒ t.projectedRaster.toRF(s"col_$b") }.
  reduce(_ spatialJoin _)

bandsToJoin: (bands: Seq[Int])astraea.spark.rasterframes.RasterFrame


Create a contrived example of columns with multiple tiles. Bands 1,4,7 are in col_0, 2,5,8 are in col_1, and 3,6,9 are in col_2.

In [35]:
val bands1 = bandsToJoin(Seq(1,2,3))
val bands2 = bandsToJoin(Seq(4,5,6))
val bands3 = bandsToJoin(Seq(7,8,9))
val grouped = bands1.union(bands2).union(bands3)

bands1: astraea.spark.rasterframes.RasterFrame = [spatial_key: struct<col: int, row: int>, col_0: rf_tile ... 2 more fields]
bands2: astraea.spark.rasterframes.RasterFrame = [spatial_key: struct<col: int, row: int>, col_0: rf_tile ... 2 more fields]
bands3: astraea.spark.rasterframes.RasterFrame = [spatial_key: struct<col: int, row: int>, col_0: rf_tile ... 2 more fields]
grouped: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [spatial_key: struct<col: int, row: int>, col_0: rf_tile ... 2 more fields]


Compute statistics based on aggregates of tiles. These are all column-wise operations, so `aggMean` computes the average of all cells of all tiles in the first column.

In [37]:
grouped.select(aggMean($"col_0")).show()

+-----------------+
|  agg_mean(col_0)|
+-----------------+
|8549.990636465822|
+-----------------+



`aggDataCells` returns the number of cells in a column that are no NODATA. `aggNoDataCells` does the exact opposite, returning the numbers of NODATA cells.

In [39]:
grouped.select(aggDataCells($"col_1")).show()

+---------------------+
|agg_data_cells(col_1)|
+---------------------+
|               187895|
+---------------------+



## Local aggregate functions
Local functions operate on a cell-by-cell basis. For instance, `localAggMax` examines every tile in a column and the output cells are each the maximum of the number (in this case, three) of values that it sees. The output is a tile with every cell as the maximum of the corresponding cells 

In [121]:
import geotrellis.raster._
val array1: Array[Int] = (1 to 9).toArray
val tile1: Tile = IntArrayTile(array1, 3, 3)
val array2: Array[Int] = (9 to 1 by -1).toArray
val tile2: Tile = IntArrayTile(array1, 3, 3)
val array3: Array[Int] = Array.fill(9)(6)
val tile3: Tile = IntArrayTile(array1, 3, 3)

import geotrellis.raster._
array1: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
tile1: geotrellis.raster.Tile = IntConstantNoDataArrayTile([I@61b1ec22,3,3)
array2: Array[Int] = Array(9, 8, 7, 6, 5, 4, 3, 2, 1)
tile2: geotrellis.raster.Tile = IntConstantNoDataArrayTile([I@61b1ec22,3,3)
array3: Array[Int] = Array(6, 6, 6, 6, 6, 6, 6, 6, 6)
tile3: geotrellis.raster.Tile = IntConstantNoDataArrayTile([I@61b1ec22,3,3)


In [123]:
val df = sc.parallelize(Seq(((0,0), tile1), ((0,0), tile2), ((0,0), tile3))).toDF("spatial_key","tiles").show()

<console>: 139: error: value withColumns is not a member of org.apache.spark.sql.DataFrame