# Focal Operations with RasterFrames Notebook

## Setup Spark Environment

In [1]:
import pyrasterframes
from pyrasterframes.utils import create_rf_spark_session
import pyrasterframes.rf_ipython  # enables nicer visualizations of pandas DF
from pyrasterframes.rasterfunctions import *
import pyspark.sql.functions as F

In [2]:
spark = create_rf_spark_session()

bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by bash)
21/09/30 03:19:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


### Get a PySpark DataFrame from elevation raster

Read a single scene of elevation data into a DataFrame of raster tiles.
Each tile overlaps its neighbors by `buffer_size` pixels, which gives focal operations the neighboring values they need around tile edges.
You can set the size of these tiles by passing a tuple of desired columns and rows, as in `raster(uri, tile_dimensions=(96, 96))`. The default is `(256, 256)`.

In [3]:
uri = 'https://geotrellis-demo.s3.us-east-1.amazonaws.com/cogs/harrisburg-pa/elevation.tif'
df = spark.read.raster(uri, tile_dimensions=(512, 512), buffer_size=2)
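The effect of `buffer_size` can be illustrated without Spark. The sketch below is not the RasterFrames implementation: `buffered_windows` is a hypothetical helper, the 1024 x 1024 raster size is made up for illustration, and clipping the buffer at the raster boundary is an assumption (a reader could instead pad with NoData). It only shows how a buffered window extends past a tile's core window so that edge pixels still have neighbors.

```python
def buffered_windows(cols, rows, tile_cols, tile_rows, buffer_size):
    """Yield (core, buffered) pixel windows for each tile of a raster.

    Windows are (col_min, row_min, col_max, row_max) with exclusive maxima.
    Hypothetical helper for illustration; not part of pyrasterframes.
    """
    for row0 in range(0, rows, tile_rows):
        for col0 in range(0, cols, tile_cols):
            # Core window: the tile itself, truncated at the raster edge.
            col1 = min(col0 + tile_cols, cols)
            row1 = min(row0 + tile_rows, rows)
            core = (col0, row0, col1, row1)
            # Buffered window: the core grown by buffer_size pixels on each
            # side, clipped to the raster bounds (an assumption of this sketch).
            buffered = (
                max(col0 - buffer_size, 0),
                max(row0 - buffer_size, 0),
                min(col1 + buffer_size, cols),
                min(row1 + buffer_size, rows),
            )
            yield core, buffered


# Hypothetical 1024 x 1024 raster, 512 x 512 tiles, 2-pixel buffer.
windows = list(buffered_windows(1024, 1024, 512, 512, 2))
for core, buffered in windows:
    print("core", core, "buffered", buffered)
```

With a 2-pixel buffer, interior tile edges gain two extra rows or columns from the adjacent tile, which is enough for a 3 x 3 or 5 x 5 focal neighborhood.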

In [4]:
df.printSchema()

root
 |-- proj_raster_path: string (nullable = false)
 |-- proj_raster: struct (nullable = true)
 |    |-- tile: tile (nullable = true)
 |    |-- extent: struct (nullable = true)
 |    |    |-- xmin: double (nullable = false)
 |    |    |-- ymin: double (nullable = false)
 |    |    |-- xmax: double (nullable = false)
 |    |    |-- ymax: double (nullable = false)
 |    |-- crs: crs (nullable = true)



The extent struct tells us which area of the [CRS](https://spatialreference.org/ref/sr-org/6842/) the tile data covers. The granule is split into arbitrarily sized chunks, and each row is a different chunk. Let's see how many there are.

In [5]:
df.count()

81
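The row count follows from the tile grid: the number of tiles per axis is the raster size divided by the tile size, rounded up. The sketch below shows that arithmetic. The raster dimensions used (4392 x 4196 pixels) are hypothetical, chosen only because they produce a 9 x 9 grid with 512-pixel tiles; the notebook does not state the actual size of the scene, and `tile_grid` is an illustrative helper, not a RasterFrames function.

```python
import math


def tile_grid(cols, rows, tile_cols, tile_rows):
    """Return (tiles_across, tiles_down) for a raster split into fixed-size tiles.

    Partial tiles at the right and bottom edges still count as tiles,
    hence the ceiling division.
    """
    return math.ceil(cols / tile_cols), math.ceil(rows / tile_rows)


# Hypothetical raster size; the real scene dimensions are not given here.
grid = tile_grid(4392, 4196, 512, 512)
tile_count = grid[0] * grid[1]
print(grid, tile_count)
```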

## Focal Operations
Additional transformations are accomplished through the use of column functions.
The functions used here are mapped to their Scala implementation and applied per row.
For each row the source elevation data is fetched only once before it's used as input.

In [6]:
df.select(
    rf_crs(df.proj_raster), 
    rf_extent(df.proj_raster), 
    rf_aspect(df.proj_raster), 
    rf_slope(df.proj_raster, z_factor=1), 
    rf_hillshade(df.proj_raster, azimuth=315, altitude=45, z_factor=1))

| rf_crs(proj_raster) | rf_extent(proj_raster) | rf_aspect(proj_raster) | rf_slope(proj_raster, 1) | rf_hillshade(proj_raster, 315, 45, 1) |
|---|---|---|---|---|
| utm-CS | {240929.2154, 4398599.0319, 256289.2154, 4401599.0319} | | | |
| utm-CS | {210209.2154, 4432319.0319, 225569.2154, 4447679.0319} | | | |
| utm-CS | {256289.2154, 4416959.0319, 271649.2154, 4432319.0319} | | | |
| utm-CS | {271649.2154, 4509119.0319, 287009.2154, 4524479.0319} | | | |
| utm-CS | {333089.2154, 4398599.0319, 341969.2154, 4401599.0319} | | | |
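To give a sense of what a focal operation such as `rf_slope` computes per pixel, here is a minimal pure-Python sketch of slope over a single 3 x 3 neighborhood using Horn's method, a common formulation for raster slope. Whether the underlying Scala implementation uses exactly this formula is an assumption; `horn_slope_degrees`, `cell_size`, and the sample windows are hypothetical names and values chosen for illustration.

```python
import math


def horn_slope_degrees(window, cell_size=1.0, z_factor=1.0):
    """Slope in degrees at the center of a 3 x 3 elevation window.

    window is three rows of three elevations, top row first.
    Uses Horn's weighted finite differences; illustrative only.
    """
    (a, b, c), (d, _center, f), (g, h, i) = window
    # Rate of change in x: right column minus left column, center row weighted 2x.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    # Rate of change in y: bottom row minus top row, center column weighted 2x.
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    return math.degrees(math.atan(z_factor * math.hypot(dz_dx, dz_dy)))


# A surface rising one elevation unit per cell in x: a 45 degree slope.
ramp = [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]
# A flat surface has no slope.
flat = [[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]]

ramp_slope = horn_slope_degrees(ramp)
flat_slope = horn_slope_degrees(flat)
print(ramp_slope, flat_slope)
```

Because every output pixel needs all eight neighbors, pixels on a tile edge can only be computed correctly if the tile carries a buffer of neighboring pixels, which is what `buffer_size` provides when the raster is read.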
