# Exporting RasterFrames

We begin with the usual imports:

In [None]:
from pyrasterframes import *
from pyrasterframes.rasterfunctions import *
import pyspark
from pyspark.sql import SparkSession

Next, we declare our `SparkSession`:

In [None]:
spark = SparkSession.builder. \
    master("local[*]"). \
    appName("RasterFrames"). \
    config("spark.ui.enabled", "false"). \
    getOrCreate(). \
    withRasterFrames()
    

We read in our GeoTIFF using `spark.read.geotiff()`:

In [4]:
samplePath = 'samples/L8-B8-Robinson-IL.tiff'
rf = spark.read.geotiff(samplePath)

While the goal of RasterFrames is to make geospatial analysis as easy as possible with a single 
construct, it is helpful to be able to transform a RasterFrame into other representations for various use cases.

## Converting to Array

The cell values within a `Tile` are encoded internally as an array. There may be use cases 
where the additional context provided by the `Tile` construct is no longer needed and one would
prefer to work with the underlying array data.

The `tileToIntArray` or `tileToDoubleArray` column functions can be used to create an array from tile cell values.

In [5]:
withArrays = rf.withColumn("tileData", tileToIntArray('tile')).drop('tile')
withArrays.select('spatial_key','tiledata').show(5, 40)

+-----------+----------------------------------------+
|spatial_key|                                tiledata|
+-----------+----------------------------------------+
|      [2,1]|[9387, 10904, 9782, 9777, 10273, 1015...|
|      [0,0]|[14294, 14277, 13939, 13604, 14182, 1...|
|      [3,1]|[8498, 8423, 8550, 8603, 8561, 8685, ...|
|      [1,0]|[9827, 9926, 10055, 9953, 9817, 10055...|
|      [3,0]|[9651, 9600, 9442, 9179, 9181, 10513,...|
+-----------+----------------------------------------+
only showing top 5 rows
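
To make the idea of "tile as array" concrete, here is a minimal pure-Python sketch of flattening a small grid into a one-dimensional list. It assumes row-major ordering (left to right, then top to bottom), which is an assumption for illustration and is not confirmed by the output above; the 2x3 grid and the helper name `flatten_row_major` are made up.

```python
# Illustrative only: a tiny 2x3 "tile" stored as nested rows (values are made up).
tile_rows = [
    [1, 2, 3],
    [4, 5, 6],
]

def flatten_row_major(rows):
    """Concatenate the rows left to right, top to bottom (assumed ordering)."""
    return [cell for row in rows for cell in row]

flat = flatten_row_major(tile_rows)
print(flat)  # [1, 2, 3, 4, 5, 6]
```

Once flattened, the grid shape is no longer recoverable from the list alone, which is the "additional context" that a `Tile` carries.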



We can convert the data back to a tile, but we have to specify the target tile dimensions. 

In [6]:
tileBack = withArrays.withColumn("tileAgain", arrayToTile("tileData", 128, 128))
tileBack.drop("tileData").select('spatial_key', 'tileAgain').show(5, 40) 

+-----------+------------------------------------+
|spatial_key|                           tileAgain|
+-----------+------------------------------------+
|      [2,1]|IntRawArrayTile([I@1791e44f,128,128)|
|      [0,0]|IntRawArrayTile([I@14b637ba,128,128)|
|      [3,1]|IntRawArrayTile([I@4bf62e80,128,128)|
|      [1,0]|IntRawArrayTile([I@428eadc7,128,128)|
|      [3,0]|IntRawArrayTile([I@5a258a45,128,128)|
+-----------+------------------------------------+
only showing top 5 rows
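
The need to pass dimensions can be seen in a small pure-Python sketch: the same flat list can be cut into grids of different shapes, so the shape has to be supplied by the caller. The helper `reshape` and its `(cols, rows)` argument order are hypothetical and only mirror the call above; row-major ordering is again assumed.

```python
def reshape(flat, cols, rows):
    """Cut a flat, row-major list into `rows` lists of `cols` cells each."""
    if len(flat) != cols * rows:
        raise ValueError(
            "expected %d cells for a %dx%d grid, got %d"
            % (cols * rows, cols, rows, len(flat))
        )
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

cells = list(range(12))

wide = reshape(cells, 4, 3)  # 3 rows of 4 cells
tall = reshape(cells, 3, 4)  # 4 rows of 3 cells: same data, different grid

print(wide[0])  # [0, 1, 2, 3]
print(tall[0])  # [0, 1, 2]
```

A 128 x 128 tile holds 16384 cells, so an array of any other length could not be reshaped to those dimensions.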



Note that the created tile will not have a `NoData` value associated with it. Here's how you can assign one with `withNoData`:

In [7]:
tileBackAgain = withArrays.withColumn("tileAgain", withNoData(arrayToTile("tileData", 128, 128), 3.0))
tileBackAgain.drop("tileData").select('spatial_key', 'tileAgain').show(5, 50)

+-----------+--------------------------------------------------+
|spatial_key|                                         tileAgain|
+-----------+--------------------------------------------------+
|      [2,1]|IntUserDefinedNoDataArrayTile([I@41ddc27,128,12...|
|      [0,0]|IntUserDefinedNoDataArrayTile([I@2e7267d9,128,1...|
|      [3,1]|IntUserDefinedNoDataArrayTile([I@8141c57,128,12...|
|      [1,0]|IntUserDefinedNoDataArrayTile([I@52b4cfa3,128,1...|
|      [3,0]|IntUserDefinedNoDataArrayTile([I@3822cd36,128,1...|
+-----------+--------------------------------------------------+
only showing top 5 rows
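
As a rough illustration of why a `NoData` value matters, the pure-Python sketch below treats one sentinel value as "missing" and leaves it out of a mean. This is not how RasterFrames implements it; the helper name and the cell values are made up, and the sentinel `3` simply echoes the `3.0` used above.

```python
def mean_ignoring_nodata(cells, nodata):
    """Average only the cells that are not equal to the NoData sentinel."""
    valid = [c for c in cells if c != nodata]
    if not valid:
        return None  # every cell is missing, so there is no mean to report
    return sum(valid) / len(valid)

cells = [3, 10, 20, 3, 30]  # made-up values; 3 marks a missing cell

naive_mean = sum(cells) / len(cells)
masked_mean = mean_ignoring_nodata(cells, nodata=3)

print(naive_mean)   # 13.2
print(masked_mean)  # 20.0
```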



## Writing to Parquet

It is often useful to write Spark results in a form that is easily reloaded for subsequent analysis. 
The [Parquet](https://parquet.apache.org/) columnar storage format, natively supported by Spark, is ideal for this. RasterFrames
work just like any other DataFrame in this scenario, as long as `withRasterFrames()` has been called on the
`SparkSession` to register the imagery types.


Let's assume we have a RasterFrame we've done some basic processing on: 

In [8]:
added = rf.withColumn("plus100", localAddScalarInt("tile", 100)).asRF()
added.printSchema()

root
 |-- spatial_key: struct (nullable = false)
 |    |-- col: integer (nullable = false)
 |    |-- row: integer (nullable = false)
 |-- bounds: polygon (nullable = true)
 |-- metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)
 |-- tile: rf_tile (nullable = false)
 |-- plus100: rf_tile (nullable = true)



In [9]:
added.select(aggStats("tile")).show(1, False)
added.select(aggStats("plus100")).show(1, False)

+---------------------------------------------------------------+
|aggStats(tile)                                                 |
+---------------------------------------------------------------+
|[388000,0,7209.0,39217.0,10160.657951030928,3315112.9759380817]|
+---------------------------------------------------------------+

+--------------------------------------------------------------+
|aggStats(plus100)                                             |
+--------------------------------------------------------------+
|[388000,0,7309.0,39317.0,10260.657951030928,3315112.975938067]|
+--------------------------------------------------------------+
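
For intuition about how adding a scalar affects summary statistics, here is a small pure-Python sketch on made-up cell values (not the actual raster). It only uses the standard library and is independent of RasterFrames.

```python
import statistics

cells = [7209, 8000, 10160, 39217]   # made-up sample values
shifted = [c + 100 for c in cells]   # same operation as adding the scalar 100

# Location statistics move by exactly the added constant.
print(min(shifted) - min(cells))                            # 100
print(max(shifted) - max(cells))                            # 100
print(statistics.mean(shifted) - statistics.mean(cells))    # 100.0

# Spread is unaffected, because every deviation from the mean is unchanged.
print(statistics.pvariance(shifted) == statistics.pvariance(cells))  # True
```

In the two summaries above, the last field differs only in its trailing digits, which is consistent with floating-point rounding rather than a real change in spread.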



As we can see, the number of cells is the same but the min, max, and mean all rose by 100. We write it out just like any other DataFrame, including the ability to specify partitioning:

In [11]:
filePath = "/tmp/equalized.parquet"
# PySpark takes the save mode as a string; `SaveMode.Overwrite` is the Scala/Java enum
added.select("*", "spatial_key.*").write.partitionBy("col", "row").mode("overwrite").parquet(filePath)

Let's confirm partitioning happened as expected:

In [None]:
import os
# List the top-level partition directories, skipping bookkeeping files such as `_SUCCESS`
[f for f in os.listdir(filePath) if "_" not in f]
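
Spark writes partitioned output as nested `key=value` directories, so with `partitionBy("col", "row")` one would expect paths of the form `col=<c>/row=<r>/` under the output directory. The sketch below parses such a layout with the standard library. The helper `list_partitions` is hypothetical, and the demo runs against a throwaway directory built to mimic that layout, not against the real Parquet output.

```python
import os
import tempfile

def list_partitions(root):
    """Return sorted (col, row) pairs parsed from col=<c>/row=<r> directories."""
    found = []
    for col_dir in os.listdir(root):
        col_path = os.path.join(root, col_dir)
        if not (col_dir.startswith("col=") and os.path.isdir(col_path)):
            continue  # skips files such as _SUCCESS
        for row_dir in os.listdir(col_path):
            if row_dir.startswith("row="):
                found.append((int(col_dir[len("col="):]), int(row_dir[len("row="):])))
    return sorted(found)

# Build a fake layout for demonstration purposes only.
demo_root = tempfile.mkdtemp()
for c, r in [(0, 0), (1, 0), (2, 1)]:
    os.makedirs(os.path.join(demo_root, "col=%d" % c, "row=%d" % r))
open(os.path.join(demo_root, "_SUCCESS"), "w").close()

partitions = list_partitions(demo_root)
print(partitions)  # [(0, 0), (1, 0), (2, 1)]
```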

Now we can load the data back in and check it out:

In [None]:
rf2 = spark.read.parquet(filePath)

rf2.printSchema()
# The reloaded statistics should match those computed before writing
rf2.select(aggStats("tile")).show(1, False)
rf2.select(aggStats("plus100")).show(1, False)

## Exporting a Raster

For debugging purposes, the RasterFrame tiles can be reassembled into a single raster for viewing. However, 
keep in mind that this downloads all the data to the driver and reassembles it in memory, so it is not appropriate 
for very large coverages.
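
The reassembly step can be pictured with a pure-Python sketch that stitches tiles, keyed by `(col, row)`, into one grid. This is an illustration under stated assumptions (row-major cell order, equally sized tiles, a complete layout) and not the RasterFrames implementation; `assemble` and the 2x2 tiles are made up.

```python
def assemble(tiles, tile_cols, tile_rows):
    """Stitch {(col, row): flat row-major cells} into one 2D grid."""
    layout_cols = max(c for c, _ in tiles) + 1
    layout_rows = max(r for _, r in tiles) + 1
    raster = [[None] * (layout_cols * tile_cols)
              for _ in range(layout_rows * tile_rows)]
    for (c, r), cells in tiles.items():
        for i, value in enumerate(cells):
            y = r * tile_rows + i // tile_cols  # row in the full raster
            x = c * tile_cols + i % tile_cols   # column in the full raster
            raster[y][x] = value
    return raster

# Two made-up 2x2 tiles sitting side by side.
tiles = {
    (0, 0): [1, 2, 3, 4],
    (1, 0): [5, 6, 7, 8],
}

raster = assemble(tiles, tile_cols=2, tile_rows=2)
print(raster)  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

Every tile has to be present in one process before the grid is complete, which is why the memory cost grows with the size of the coverage.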

Here's how one might render the image to a georeferenced GeoTIFF file: 

In [None]:
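// Scala snippet (not Python); `equalized` is assumed to be a RasterFrame with an `equalized` tile column
import geotrellis.raster.io.geotiff.GeoTiff
val image = equalized.toRaster($"equalized", 774, 500)
GeoTiff(image).write("target/scala-2.11/tut/rf-raster.tiff")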

[*Download GeoTIFF*](rf-raster.tiff)

Here's how one might render a raster frame to a false color PNG file:

In [None]:
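// Scala snippet (not Python); continues from the `image` raster created above
import geotrellis.raster.render.{ColorMap, ColorRamps}
val colors = ColorMap.fromQuantileBreaks(image.tile.histogram, ColorRamps.BlueToOrange)
image.tile.color(colors).renderPng().write("target/scala-2.11/tut/rf-raster.png")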

![](rf-raster.png)
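
The idea behind coloring by quantile breaks can be sketched in pure Python: cut points are chosen so that each color covers roughly the same number of cells, which keeps a few extreme values from washing out the ramp. This is a conceptual sketch only, not the GeoTrellis algorithm; the three-color `RAMP`, the helper names, and the cell values are all made up.

```python
import bisect
import statistics

# Hypothetical three-step ramp from blue to orange, as RGB tuples.
RAMP = [(0, 0, 255), (200, 200, 200), (255, 165, 0)]

def quantile_breaks(values, n_classes):
    """Return n_classes - 1 cut points that split values into equal-count bins."""
    return statistics.quantiles(values, n=n_classes)

def color_for(value, breaks, ramp):
    """Pick the ramp color for the bin that `value` falls into."""
    return ramp[bisect.bisect_right(breaks, value)]

cells = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # made-up values with one outlier
breaks = quantile_breaks(cells, len(RAMP))
colored = [color_for(c, breaks, RAMP) for c in cells]

# Each color is used for the same number of cells despite the outlier.
print([colored.count(color) for color in RAMP])  # [3, 3, 3]
```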

## Exporting to a GeoTrellis Layer

For future analysis it is helpful to persist a RasterFrame as a [GeoTrellis layer](http://geotrellis.readthedocs.io/en/latest/guide/tile-backends.html).

First, convert the RasterFrame into a `TileLayerRDD`. The return type is an `Either`;
the `left` side is for spatial-only keyed data and the `right` side is for spatiotemporal keyed data:

In [None]:
// Scala snippet (not Python)
val tlRDD = equalized.toTileLayerRDD($"equalized").left.get

Then create a GeoTrellis layer writer:

In [None]:
// Scala snippet (not Python); writes the TileLayerRDD to a layer in a temporary directory
import java.nio.file.Files
import spray.json._
import DefaultJsonProtocol._
import geotrellis.spark._
import geotrellis.spark.io._
val p = Files.createTempDirectory("gt-store")
val writer: LayerWriter[LayerId] = LayerWriter(p.toUri)

val layerId = LayerId("equalized", 0)
writer.write(layerId, tlRDD, index.ZCurveKeyIndexMethod)
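
The name `ZCurveKeyIndexMethod` refers to a Z-order (Morton) curve, which maps a two-dimensional key to a single integer by interleaving the bits of its coordinates, so that keys close together in space tend to get nearby index values. The pure-Python sketch below shows the general technique only; the bit order and width used by GeoTrellis may differ, and `z_index` is a hypothetical helper.

```python
def z_index(col, row, bits=16):
    """Interleave the low `bits` bits of col (even positions) and row (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)
        z |= ((row >> i) & 1) << (2 * i + 1)
    return z

# The spatial keys shown earlier in this notebook, as (col, row) pairs.
keys = [(2, 1), (0, 0), (3, 1), (1, 0), (3, 0)]

ordered = sorted(keys, key=lambda k: z_index(*k))
print(ordered)  # [(0, 0), (1, 0), (3, 0), (2, 1), (3, 1)]
```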

Take a look at the metadata in JSON format:

In [None]:
AttributeStore(p.toUri).readMetadata[JsValue](layerId).prettyPrint