# Raster Statistics

In [2]:
from pyrasterframes import *
from pyrasterframes.rasterfunctions import *
import pyspark
from pyspark.sql import SparkSession
from pathlib import Path

resource_dir = Path('./samples').resolve()

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("RasterFrames")
    .config("spark.ui.enabled", "false")
    .getOrCreate()
    .withRasterFrames()
)
# spark.sparkContext.setLogLevel("ERROR")

rf = spark.read.geotiff(resource_dir.joinpath("L8-B8-Robinson-IL.tiff").as_uri())

RasterFrames has a number of extension methods and columnar functions for performing analysis on tiles.

## Tile Statistics 

### Tile Dimensions

Get the nominal tile dimensions. Depending on the tiling, tiles along the edges may have different sizes.

In [6]:
rf.select(rf.spatialKeyColumn(), tileDimensions("tile")).show()

+-----------+---------------+
|spatial_key|dimension(tile)|
+-----------+---------------+
|      [0,0]|      [250,250]|
|      [1,0]|      [250,250]|
|      [0,1]|      [250,250]|
|      [1,1]|      [250,250]|
+-----------+---------------+
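To make the edge-tile remark concrete, here is a minimal pure-Python sketch of how a regular tiling can produce smaller tiles on the edges. It does not use RasterFrames: the function `tile_dimensions` and the raster sizes are hypothetical, and it assumes edge tiles are cropped rather than padded.

```python
def tile_dimensions(raster_cols, raster_rows, tile_cols, tile_rows):
    """Map each (col, row) key to the (width, height) of its tile.

    Hypothetical helper: assumes a regular grid whose edge tiles are cropped.
    """
    layout_cols = -(-raster_cols // tile_cols)  # ceiling division
    layout_rows = -(-raster_rows // tile_rows)
    dims = {}
    for row in range(layout_rows):
        for col in range(layout_cols):
            width = min(tile_cols, raster_cols - col * tile_cols)
            height = min(tile_rows, raster_rows - row * tile_rows)
            dims[(col, row)] = (width, height)
    return dims


# A raster that divides evenly: every tile has the nominal size.
even = tile_dimensions(500, 500, 250, 250)

# A raster that does not: the last column of tiles is narrower.
ragged = tile_dimensions(600, 500, 250, 250)

print(even)
print(ragged)
```

In the second case the tiles with column index 2 are only 100 cells wide, which is the kind of difference the nominal dimensions do not reveal.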



### Descriptive Statistics

#### NoData Counts

Count the number of `NoData` and non-`NoData` cells in each tile.

In [14]:
rf.select(rf.spatialKeyColumn(), noDataCells("tile"), dataCells("tile")).show(3)
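As a rough illustration of what the two counts mean, the sketch below counts cells in a small hand-written tile. It is plain Python, not the RasterFrames implementation, and using `None` as the `NoData` marker is an assumption made only for this example.

```python
NODATA = None  # hypothetical NoData marker for this sketch

# A 3x3 tile written out as rows of cells.
tile = [
    [12, 15, NODATA],
    [NODATA, 9, 11],
    [14, NODATA, 10],
]

# Flatten the rows, then split the cells into the two groups.
cells = [cell for row in tile for cell in row]
no_data_count = sum(1 for cell in cells if cell is NODATA)
data_count = len(cells) - no_data_count

print(no_data_count, data_count)
```

The two counts always sum to the number of cells in the tile.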

#### Tile Mean

Compute the mean value in each tile. Use `tileMean` for integral cell types and `tileMeanDouble` for floating-point
cell types.

In [None]:
rf.select(rf.spatialKeyColumn(), tileMean("tile")).show(3)

#### Tile Summary Statistics

Compute a suite of summary statistics for each tile. Use `tileStats` for integral cell types and `tileStatsDouble`
for floating-point cell types.

In [None]:
rf.withColumn("stats", tileStats("tile")).select(rf.spatialKeyColumn(), "stats.*").show(3)
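For readers who want to see what such per-tile statistics amount to, here is a small pure-Python sketch. The function `tile_stats` and its field names are illustrative assumptions, not the RasterFrames schema, and the variance shown is the population variance over data cells only.

```python
def tile_stats(cells, nodata=None):
    """Summarize the data cells of one tile (hypothetical helper)."""
    data = [c for c in cells if c is not nodata]
    if not data:
        return None  # a tile with no data cells has no statistics
    n = len(data)
    mean = sum(data) / n
    variance = sum((c - mean) ** 2 for c in data) / n  # population variance
    return {
        "data_cells": n,
        "no_data_cells": len(cells) - n,
        "min": min(data),
        "max": max(data),
        "mean": mean,
        "variance": variance,
    }


stats = tile_stats([2, 4, None, 6, 8])
print(stats)
```

`NoData` cells are excluded before anything is computed, so they affect only the counts and never the mean or variance.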

### Histogram

The `tileHistogram` function computes a histogram over the data in each tile. See the GeoTrellis `Histogram`
scaladoc (`geotrellis.raster.histogram.Histogram`) for details on what is available in the resulting data
structure. Use this version for integral cell types, and `tileHistogramDouble` for floating-point cell types.

In this example we compute quantile breaks.

In [None]:
```tut
rf.select(tileHistogram($"tile")).map(_.quantileBreaks(5)).show(5, false)
```
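The idea behind quantile breaks can be sketched in plain Python. This is one simple definition (each break is the smallest value at which the cumulative count reaches the next equal share of the cells) and is not claimed to match the GeoTrellis algorithm; `quantile_breaks` is a hypothetical helper.

```python
from collections import Counter


def quantile_breaks(histogram, num_classes):
    """Return upper bounds of num_classes classes with roughly equal counts.

    histogram maps a cell value to the number of cells with that value.
    """
    total = sum(histogram.values())
    breaks = []
    cumulative = 0
    k = 1  # index of the next break to emit
    for value in sorted(histogram):
        cumulative += histogram[value]
        # Integer comparison of cumulative / total >= k / num_classes.
        while k <= num_classes and cumulative * num_classes >= k * total:
            breaks.append(value)
            k += 1
    return breaks


cells = [7, 2, 9, 4, 1, 10, 3, 8, 5, 6]
histogram = Counter(cells)
breaks = quantile_breaks(histogram, 5)
print(breaks)
```

With ten distinct values and five classes, each class covers two cells, so the breaks fall on every second value.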

## Aggregate Statistics

The `aggStats` function computes the same summary statistics as `tileStats`, but aggregates them over the whole 
RasterFrame.

In [None]:
rf.select(aggStats("tile")).show()
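One way to think about aggregating statistics over many tiles is to merge per-tile summaries instead of revisiting every cell. The sketch below uses the standard pairwise update for count, mean, and sum of squared deviations; it is an illustration under that assumption, not a description of how RasterFrames computes `aggStats`, and `summarize` and `merge` are hypothetical names.

```python
from functools import reduce


def summarize(cells):
    """Return (count, mean, sum of squared deviations) for one tile."""
    n = len(cells)
    mean = sum(cells) / n
    m2 = sum((c - mean) ** 2 for c in cells)
    return (n, mean, m2)


def merge(a, b):
    """Combine two summaries without access to the original cells."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


tiles = [[1, 2, 3], [4, 5, 6, 7]]
count, mean, m2 = reduce(merge, (summarize(t) for t in tiles))
variance = m2 / count  # population variance over all cells
print(count, mean, variance)
```

The merged result equals what a single pass over all seven cells would give, which is why per-tile summaries are enough to produce whole-RasterFrame statistics.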

A more involved example: extract bin counts from a computed `Histogram`.

In [None]:
```tut
rf.select(aggHistogram($"tile")).
  map(h => for(v <- h.labels) yield(v, h.itemCount(v))).
  select(explode($"value") as "counts").
  select("counts._1", "counts._2").
  toDF("value", "count").
  orderBy(desc("count")).
  show(10)
```
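The same "value, count, most frequent first" result can be sketched without Spark using `collections.Counter`. The tiles here are made-up lists of cell values, and the snippet only illustrates the shape of the output, not the RasterFrames API.

```python
from collections import Counter

# Hypothetical tiles, each flattened to a list of cell values.
tiles = [
    [3, 3, 5, 7],
    [3, 5, 5, 9],
    [3, 7],
]

# Accumulate one histogram over every tile.
histogram = Counter()
for tile in tiles:
    histogram.update(tile)

# Keep the ten most frequent values, highest count first.
top = histogram.most_common(10)
for value, count in top:
    print(value, count)
```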

In [None]:
spark.stop()