# Polygon Filtering

This section gives an example of the geospatial querying that RasterFrames provides through GeoTrellis.

In [1]:
# Initial imports

import pyspark
from pyspark.sql import SparkSession
from pyrasterframes import *
from pyrasterframes.rasterfunctions import *

# Add other configuration options as needed

spark = SparkSession.builder. \
    master("local[*]"). \
    appName("RasterFrames"). \
    config("spark.ui.enabled", "false"). \
    getOrCreate(). \
    withRasterFrames()

We read in our GeoTIFF and examine its structure.

In [2]:
RF = spark.read.geotiff("samples/construction.tif").asRF()

RF.select("spatial_key").show()
RF.printSchema()

+-----------+
|spatial_key|
+-----------+
|      [1,0]|
|      [0,1]|
|      [2,1]|
|      [0,0]|
|      [1,1]|
|      [2,0]|
+-----------+

root
 |-- spatial_key: struct (nullable = false)
 |    |-- col: integer (nullable = false)
 |    |-- row: integer (nullable = false)
 |-- bounds: polygon (nullable = true)
 |-- metadata: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)
 |-- tile_1: rf_tile (nullable = false)
 |-- tile_2: rf_tile (nullable = false)
 |-- tile_3: rf_tile (nullable = false)
 |-- tile_4: rf_tile (nullable = false)



As we can see, this RasterFrame consists of six tiles, arranged in three columns by two rows, with four bands per tile. To save compute time, we will run computations only on the green band (`tile_2`).
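
As a quick cross-check of the layout described above, the grid shape can be recovered from the spatial keys alone. This is a minimal plain-Python sketch, not part of the RasterFrames API: `grid_shape` is a hypothetical helper, the keys are copied by hand from the `show()` output, and it assumes keys are zero-based `(col, row)` pairs as the schema suggests.

```python
# Spatial keys as (col, row) pairs, copied from the show() output above
spatial_keys = [(1, 0), (0, 1), (2, 1), (0, 0), (1, 1), (2, 0)]


def grid_shape(keys):
    """Return (n_cols, n_rows) for zero-based (col, row) keys (hypothetical helper)."""
    n_cols = max(col for col, _ in keys) + 1
    n_rows = max(row for _, row in keys) + 1
    return n_cols, n_rows


n_cols, n_rows = grid_shape(spatial_keys)
print(f"{len(spatial_keys)} tiles in {n_cols} columns x {n_rows} rows")
```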

In [3]:
RF = RF.select("spatial_key", "bounds", "tile_2")

In this example we have six tiles, but we only want to query the area around the houses in the middle of the scene. We will therefore filter the tiles according to whether they intersect the polygon shown in the picture.

![-](pics/with_geom.png)

Here we define our polygon, drawn with the help of QGIS, in WKT (well-known text) format, using the same CRS as the scene.

In [4]:
wkt = 'POLYGON((724341.153356255497783 4213434.954353030771017, 724447.811390113900416 4213410.254597820341587, 724409.639041154878214 4213259.810634279623628, 724322.067181774647906 4213281.14224105514586, 724341.153356255497783 4213434.954353030771017))'
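
For readers new to WKT, the string is just a closed ring of `x y` pairs separated by commas. The following is a minimal plain-Python sketch for inspecting it; `parse_polygon_ring` is a hypothetical helper written for illustration, handles only a single-ring `POLYGON`, and is not how RasterFrames parses WKT.

```python
# The same WKT string as in the cell above
wkt = 'POLYGON((724341.153356255497783 4213434.954353030771017, 724447.811390113900416 4213410.254597820341587, 724409.639041154878214 4213259.810634279623628, 724322.067181774647906 4213281.14224105514586, 724341.153356255497783 4213434.954353030771017))'


def parse_polygon_ring(text):
    """Return the ring of a single-ring WKT POLYGON as a list of (x, y) floats."""
    body = text.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("expected a POLYGON")
    # Keep what lies between the outermost parentheses, then drop the ring's own pair
    inner = body[body.index("(") + 1 : body.rindex(")")].strip().strip("()")
    if "(" in inner or ")" in inner:
        raise ValueError("polygons with holes are not supported in this sketch")
    return [tuple(float(v) for v in pair.split()) for pair in inner.split(",")]


ring = parse_polygon_ring(wkt)

# A valid polygon ring repeats its first vertex as its last
is_closed = ring[0] == ring[-1]

# Envelope (bounding box) of the polygon: (xmin, ymin, xmax, ymax)
xs = [x for x, _ in ring]
ys = [y for _, y in ring]
envelope = (min(xs), min(ys), max(xs), max(ys))

print(len(ring), "vertices, closed:", is_closed)
print("envelope:", envelope)
```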

We convert the WKT string to a geometry column.

In [5]:
from pyspark.sql.functions import lit

# Attach the WKT string to every row, then parse it into a geometry column
geomRF = RF.withColumn("wkt", lit(wkt)) \
    .withColumn("polygon", st_geomFromWKT("wkt"))

geomRF.printSchema()

root
 |-- spatial_key: struct (nullable = false)
 |    |-- col: integer (nullable = false)
 |    |-- row: integer (nullable = false)
 |-- bounds: polygon (nullable = true)
 |-- tile_2: rf_tile (nullable = false)
 |-- wkt: string (nullable = false)
 |-- polygon: geometry (nullable = true)



Now that the geometry is in a column, we are ready to call our spatial functions on it.

In [6]:
intersectRF = geomRF.filter(st_intersects("bounds", "polygon")).asRF()

intersectRF.show()

+-----------+--------------------+--------------------+--------------------+--------------------+
|spatial_key|              bounds|              tile_2|                 wkt|             polygon|
+-----------+--------------------+--------------------+--------------------+--------------------+
|      [1,0]|POLYGON ((724282 ...|UByteRawArrayTile...|POLYGON((724341.1...|POLYGON ((724341....|
|      [1,1]|POLYGON ((724282 ...|UByteRawArrayTile...|POLYGON((724341.1...|POLYGON ((724341....|
+-----------+--------------------+--------------------+--------------------+--------------------+



As we can see, this operation leaves two of the original six tiles: the two in the middle of the image. All other tiles were filtered out because their bounds do not intersect the polygon. Note that the filtered image is black and white because only one band was used in this example. Because `filter` operates on rows, the same syntax works unchanged with multiple bands.

![](pics/poly_filter.png)
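
To build intuition for why only the middle column survives, here is a minimal plain-Python sketch of a bounding-box overlap test. It is a coarser check than `st_intersects` (envelope overlap is necessary for intersection but not sufficient) and is not a description of what RasterFrames does internally. The tile layout is hypothetical: only the left edge of column 1 (724282) comes from the output above, while the tile size, the top edge, and the assumption that row 0 is the northern row are made up for illustration.

```python
# Polygon envelope (xmin, ymin, xmax, ymax), rounded outward from the WKT vertices
polygon_envelope = (724322.0, 4213259.0, 724448.0, 4213435.0)

# HYPOTHETICAL tile layout: these values are illustrative, not read from the file
ORIGIN_X = 724082.0  # chosen so that column 1 starts at x = 724282
TOP_Y = 4213550.0    # assumed top edge of row 0 (row 0 taken to be the northern row)
TILE_W = 200.0       # assumed tile width in CRS units
TILE_H = 200.0       # assumed tile height in CRS units


def tile_extent(col, row):
    """Extent (xmin, ymin, xmax, ymax) of a tile in the hypothetical layout."""
    xmin = ORIGIN_X + col * TILE_W
    ymax = TOP_Y - row * TILE_H
    return (xmin, ymax - TILE_H, xmin + TILE_W, ymax)


def envelopes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# All six (col, row) keys of the 3 x 2 grid
keys = [(col, row) for row in range(2) for col in range(3)]
candidates = [k for k in keys if envelopes_overlap(tile_extent(*k), polygon_envelope)]
print(candidates)
```

Under these assumed extents, the keys that pass the check are the same two that remain in `intersectRF`.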