# Exporting RasterFrames

In [1]:
from pyrasterframes import *
from pyrasterframes.rasterfunctions import *
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

In [2]:
spark = SparkSession.builder. \
    master("local[*]"). \
    appName("RasterFrames"). \
    config("spark.ui.enabled", "false"). \
    getOrCreate(). \
    withRasterFrames()

samplePath = 'samples/L8-B8-Robinson-IL.tiff'
rf = spark.read.geotiff(samplePath)

While the goal of RasterFrames is to make geospatial analysis as easy as possible with a single construct,
it is often helpful to transform a RasterFrame into other representations for particular use cases.

## Converting to Array

The cell values within a `Tile` are encoded internally as an array. There may be use cases 
where the additional context provided by the `Tile` construct is no longer needed and one would
prefer to work with the underlying array data.

The `tileToIntArray` or `tileToDoubleArray` column functions can be used to create an array from tile cell values.

In [3]:
withArrays = rf.withColumn("tileData", tileToIntArray('tile')).drop('tile')
withArrays.select('spatial_key','tiledata').show(5, 40)

+-----------+----------------------------------------+
|spatial_key|                                tiledata|
+-----------+----------------------------------------+
|      [0,0]|[14294, 13939, 13604, 14851, 15584, 1...|
|      [1,0]|[7988, 7852, 7941, 7695, 7703, 7781, ...|
|      [0,1]|[9041, 9231, 9213, 9249, 9273, 9426, ...|
|      [1,1]|[9387, 9782, 9777, 10150, 10660, 1008...|
+-----------+----------------------------------------+



You can convert the data back to a tile, but you have to specify the target tile dimensions. 

In [4]:
tileBack = withArrays.withColumn("tileAgain", arrayToTile("tileData", 128, 128))
tileBack.drop("tileData").select('spatial_key', 'tileAgain').show(5, 40) 

+-----------+------------------------------------+
|spatial_key|                           tileAgain|
+-----------+------------------------------------+
|      [0,0]|IntRawArrayTile([I@21cbafcf,128,128)|
|      [1,0]|IntRawArrayTile([I@10b577a1,128,128)|
|      [0,1]|IntRawArrayTile([I@13948768,128,128)|
|      [1,1]|IntRawArrayTile([I@4c794b28,128,128)|
+-----------+------------------------------------+
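
The dimensions are required because a flat array does not record its own shape: the same values fit more than one layout. A minimal pure-Python sketch of the idea, assuming row-major cell order; the helper `reshape_cells` is hypothetical and not part of RasterFrames:

```python
def reshape_cells(cells, cols, rows):
    """Arrange a flat list of cell values into `rows` lists of `cols` values each."""
    if len(cells) != cols * rows:
        raise ValueError("expected %d cells, got %d" % (cols * rows, len(cells)))
    # Assumption: cells are stored row by row (row-major order).
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]

flat = [1, 2, 3, 4, 5, 6]
wide = reshape_cells(flat, cols=3, rows=2)   # [[1, 2, 3], [4, 5, 6]]
tall = reshape_cells(flat, cols=2, rows=3)   # [[1, 2], [3, 4], [5, 6]]
```

Both results hold the same six values, so only the caller can say which shape is intended.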



Note that the created tile will not have a `NoData` value associated with it. Here's how you can assign one:

In [5]:
tileBackAgain = withArrays.withColumn("tileAgain", withNoData(arrayToTile("tileData", 128, 128), 3.0))
tileBackAgain.drop("tileData").select('spatial_key', 'tileAgain').show(5, 50)

+-----------+--------------------------------------------------+
|spatial_key|                                         tileAgain|
+-----------+--------------------------------------------------+
|      [0,0]|IntUserDefinedNoDataArrayTile([I@55e7363c,128,1...|
|      [1,0]|IntUserDefinedNoDataArrayTile([I@2d5ff14d,128,1...|
|      [0,1]|IntUserDefinedNoDataArrayTile([I@37026b60,128,1...|
|      [1,1]|IntUserDefinedNoDataArrayTile([I@1d59abc8,128,1...|
+-----------+--------------------------------------------------+
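
To illustrate what a `NoData` value is for, here is a pure-Python sketch, independent of RasterFrames, in which cells equal to the designated value are skipped when computing a mean. The helper `mean_ignoring_nodata` is hypothetical.

```python
def mean_ignoring_nodata(cells, nodata):
    """Mean of the cells, skipping any cell equal to `nodata`; None if nothing is left."""
    valid = [c for c in cells if c != nodata]
    return sum(valid) / len(valid) if valid else None

cells = [3, 10, 20, 3, 30]
plain_mean = sum(cells) / len(cells)            # every cell counts
masked_mean = mean_ignoring_nodata(cells, 3)    # cells equal to 3 are ignored
```

Here the two cells holding `3` pull the plain mean down, while the masked mean reflects only the valid cells.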



## Writing to Parquet

It is often useful to write Spark results in a form that is easily reloaded for subsequent analysis. 
The [Parquet](https://parquet.apache.org/) columnar storage format, native to Spark, is ideal for this. RasterFrames
work just like any other DataFrame in this scenario, as long as [`rfInit`][rfInit] is called to register
the imagery types.


Let's assume we have a RasterFrame we've done some fancy processing on: 

In [6]:
equalizer = udf(lambda t: t.equalize())
spark.withRasterFrames()
equalized = rf.withColumn("equalized", equalizer("tile")).asRF()
equalized.printSchema()

root
 |-- spatial_key: struct (nullable = false)
 |    |-- col: integer (nullable = false)
 |    |-- row: integer (nullable = false)
 |-- bounds: polygon (nullable = true)
 |-- metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)
 |-- tile: rf_tile (nullable = false)
 |-- equalized: string (nullable = true)



In [7]:
equalized.select(aggStats("tile")).show(1, False)
equalized.select(aggStats("equalized")).show(1, False)

+-----------------------------------------------------+
|aggStats(tile)                                       |
+-----------------------------------------------------+
|[250000,7249.0,39217.0,10158.452748,3326057.95298326]|
+-----------------------------------------------------+



AnalysisException: "cannot resolve 'CellStatsAggregateFunction(equalized)' due to data type mismatch: argument 1 requires rf_tile type, however, '`equalized`' is of string type.;;\n'Aggregate [cellstatsaggregatefunction(equalized#157, CellStatsAggregateFunction(), 0, 0) AS aggStats(equalized)#333]\n+- Project [spatial_key#24, bounds#25, metadata#26, tile#27, <lambda>(tile#27) AS equalized#157]\n   +- Relation[spatial_key#24,bounds#25,metadata#26,tile#27] GeoTiffRelation(org.apache.spark.sql.SQLContext@44801e1c,samples/L8-B8-Robinson-IL.tiff)\n"

The second call fails because `udf` was given no return type, so Spark treats its result as a string (note `equalized: string` in the schema above) and `aggStats` rejects the column.

We write it out just like any other DataFrame, including the ability to specify partitioning:

In [None]:
filePath = "/tmp/equalized.parquet"
equalized.select("*", "spatial_key.*").write.partitionBy("col", "row").mode("overwrite").parquet(filePath)

Let's confirm partitioning happened as expected:

In [None]:
import os
# List the top-level partition directories, skipping bookkeeping files such as _SUCCESS
[f for f in os.listdir(filePath) if "_" not in f]
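
Spark's `partitionBy` writes one directory level per partition column, each named `column=value`. The sketch below is a stand-in only: it builds such a layout by hand in a temporary directory, rather than calling Spark, and applies the same underscore filter. The helper name `partition_dirs` is hypothetical.

```python
import os
import tempfile

def partition_dirs(path):
    """Entries under `path` that are not bookkeeping files such as `_SUCCESS`."""
    return sorted(f for f in os.listdir(path) if "_" not in f)

# Mimic the layout produced by partitionBy("col", "row") for a 2x2 grid of tiles.
root = tempfile.mkdtemp()
for col in (0, 1):
    for row in (0, 1):
        os.makedirs(os.path.join(root, "col=%d" % col, "row=%d" % row))
open(os.path.join(root, "_SUCCESS"), "w").close()

top_level = partition_dirs(root)                             # ['col=0', 'col=1']
second_level = partition_dirs(os.path.join(root, "col=0"))   # ['row=0', 'row=1']
```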

Now we can load the data back in and check it out:

In [None]:
rf2 = spark.read.parquet(filePath)

rf2.printSchema()
equalized.select(aggStats("tile")).show(truncate=False)
equalized.select(aggStats("equalized")).show(truncate=False)

## Exporting a Raster

For the purposes of debugging, the RasterFrame tiles can be reassembled back into a raster for viewing. However, 
keep in mind that this will download all the data to the driver, and reassemble it in-memory. So it's not appropriate 
for very large coverages.
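
To make the cost concrete, the following pure-Python sketch stitches keyed tiles into one raster in memory, much as a driver-side reassembly must. It assumes equally sized tiles keyed by `(col, row)`; `stitch` is a hypothetical helper, not the RasterFrames implementation.

```python
def stitch(tiles, tile_cols, tile_rows):
    """Assemble {(col, row): 2-D tile} into one 2-D raster held entirely in memory."""
    grid_cols = max(c for c, _ in tiles) + 1
    grid_rows = max(r for _, r in tiles) + 1
    raster = [[None] * (grid_cols * tile_cols) for _ in range(grid_rows * tile_rows)]
    for (col, row), tile in tiles.items():
        for y in range(tile_rows):
            for x in range(tile_cols):
                # Offset each cell by the position of its tile in the grid.
                raster[row * tile_rows + y][col * tile_cols + x] = tile[y][x]
    return raster

# Four 2x2 tiles laid out on a 2x2 grid, keyed like `spatial_key`.
tiles = {
    (0, 0): [[1, 1], [1, 1]],
    (1, 0): [[2, 2], [2, 2]],
    (0, 1): [[3, 3], [3, 3]],
    (1, 1): [[4, 4], [4, 4]],
}
raster = stitch(tiles, tile_cols=2, tile_rows=2)
```

Every cell ends up in a single structure on one machine, which is why the approach does not scale to large coverages.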

Here's how one might render the image to a georeferenced GeoTIFF file. This cell, like the remaining cells in this document, uses the Scala API:

In [None]:
import geotrellis.raster.io.geotiff.GeoTiff
val image = equalized.toRaster($"equalized", 774, 500)
GeoTiff(image).write("target/scala-2.11/tut/rf-raster.tiff")

[*Download GeoTIFF*](rf-raster.tiff)

Here's how one might render a raster frame to a false color PNG file:

In [None]:
val colors = ColorMap.fromQuantileBreaks(image.tile.histogram, ColorRamps.BlueToOrange)
image.tile.color(colors).renderPng().write("target/scala-2.11/tut/rf-raster.png")

![](rf-raster.png)
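
Quantile breaks split the sorted cell values into classes holding roughly equal numbers of cells, and each class is given one color from the ramp. A rough pure-Python sketch of the idea follows; it is not the GeoTrellis implementation, and `quantile_breaks` and `classify` are hypothetical helpers.

```python
import bisect

def quantile_breaks(values, n_classes):
    """Upper bounds of `n_classes` classes holding roughly equal numbers of values.

    Assumes there are at least `n_classes` values.
    """
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_classes - 1] for k in range(1, n_classes + 1)]

def classify(value, breaks):
    """Index of the first class whose upper bound is >= value, clamped to the last class."""
    return min(bisect.bisect_left(breaks, value), len(breaks) - 1)

# A skewed distribution: most values are small, a few are very large.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100, 200, 300, 400]
breaks = quantile_breaks(values, 4)
classes = [classify(v, breaks) for v in values]
```

Although the values are heavily skewed, each of the four classes receives three cells, so every color in the ramp is used.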

## Exporting to a GeoTrellis Layer

For future analysis it is helpful to persist a RasterFrame as a [GeoTrellis layer](http://geotrellis.readthedocs.io/en/latest/guide/tile-backends.html).

First, convert the RasterFrame into a `TileLayerRDD`. The return type is an `Either`:
the `left` side is for spatial-only keyed data, and the `right` side is for spatiotemporal keyed data.

In [None]:
val tlRDD = equalized.toTileLayerRDD($"equalized").left.get

Then create a GeoTrellis layer writer:

In [None]:
import java.nio.file.Files
import spray.json._
import DefaultJsonProtocol._
import geotrellis.spark.io._
val p = Files.createTempDirectory("gt-store")
val writer: LayerWriter[LayerId] = LayerWriter(p.toUri)

val layerId = LayerId("equalized", 0)
writer.write(layerId, tlRDD, index.ZCurveKeyIndexMethod)
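
A Z-order (Morton) curve maps a two-dimensional key to a single integer by interleaving the bits of its coordinates, so tiles that are close in space tend to be close in the index. The sketch below illustrates the general technique only; the bit order and width are assumptions and need not match what `ZCurveKeyIndexMethod` produces, and `z_index` is a hypothetical helper.

```python
def z_index(col, row, bits=16):
    """Interleave the low `bits` bits of col (even positions) and row (odd positions)."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index

# The four keys of a 2x2 layout, listed in the order the curve visits them.
keys = [(0, 0), (1, 0), (0, 1), (1, 1)]
indices = [z_index(c, r) for c, r in keys]
```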

Take a look at the metadata in JSON format:

In [None]:
AttributeStore(p.toUri).readMetadata[JsValue](layerId).prettyPrint

## Converting to `RDD` and `TileLayerRDD`

Since a `RasterFrame` is just a `DataFrame` with extra metadata, the method
[`DataFrame.rdd`][rdd] is available for simple conversion back to `RDD` space. The type returned
by `.rdd` depends on how you select it.

In [None]:
import scala.reflect.runtime.universe._
def showType[T: TypeTag](t: T) = println(implicitly[TypeTag[T]].tpe.toString)

showType(rf.rdd)

showType(rf.select(rf.spatialKeyColumn, $"tile".as[Tile]).rdd) 

showType(rf.select(rf.spatialKeyColumn, $"tile").as[(SpatialKey, Tile)].rdd) 

If your goal is to convert a single tile column with its spatial key back to a `TileLayerRDD[K]`, then there's an additional
extension method on `RasterFrame` called [`toTileLayerRDD`][toTileLayerRDD], which preserves the tile layer metadata,
enhancing interoperation with GeoTrellis RDD extension methods.

In [None]:
showType(rf.toTileLayerRDD($"tile".as[Tile]))

In [None]:
spark.stop()

[rfInit]: astraea.spark.rasterframes.package#rfInit%28SQLContext%29:Unit
[rdd]: org.apache.spark.sql.Dataset#frdd:org.apache.spark.rdd.RDD[T]
[toTileLayerRDD]: astraea.spark.rasterframes.RasterFrameMethods#toTileLayerRDD%28tileCol:RasterFrameMethods.this.TileColumn%29:Either[geotrellis.spark.TileLayerRDD[geotrellis.spark.SpatialKey],geotrellis.spark.TileLayerRDD[geotrellis.spark.SpaceTimeKey]]
[tileToArray]: astraea.spark.rasterframes.ColumnFunctions#tileToArray