# Working With RasterLayers

The purpose of `RasterLayer` is to store and format data in order to produce a `TiledRasterLayer`. For that reason, this class lacks the methods needed to perform any kind of spatial analysis. It can be thought of as an “organizer” that sorts and lays out the data so that `TiledRasterLayer` can perform operations on it.

This guide covers the creation methods and abilities unique to `RasterLayer`. For `TiledRasterLayer`, please see this [guide], and for a more general overview of layers, look [here].

## Setting up the Environment for This Guide

In [None]:
!curl -o /tmp/cropped.tif https://s3.amazonaws.com/geopyspark-test/example-files/cropped.tif

In [13]:
import numpy as np

from geopyspark import geopyspark_conf
from geopyspark.geotrellis import catalog, geotiff, Tile, Extent, ProjectedExtent, TileLayout
from geopyspark.geotrellis.constants import LayerType, ResampleMethod
from geopyspark.geotrellis.layer import RasterLayer

from pyspark import SparkContext

In [2]:
conf = geopyspark_conf(master="local[*]", appName="raster-layer-examples")
pysc = SparkContext(conf=conf)

## Creating RasterLayers

There are just two ways to create a `RasterLayer`: (1) through reading GeoTiffs from the local file system, S3, or HDFS; or (2) from an existing PySpark RDD.

### From GeoTiffs

The `get` method in `geopyspark.geotrellis.geotiff` creates an instance of `RasterLayer` from GeoTiffs. In this example, a GeoTiff with spatial data is read locally.

In [15]:
raster_layer = geotiff.get(pysc=pysc, layer_type=LayerType.SPATIAL, uri="file:///tmp/cropped.tif")
raster_layer

<geopyspark.geotrellis.layer.RasterLayer at 0x7f817a013318>

**Note**: If you have multiple GeoTiffs, you can just specify the directory where they’re all stored. If the GeoTiffs are spread across multiple locations, you can give `get` a list of the places to read them from.

### From PySpark RDDs

The second option is to create a new `RasterLayer` from a PySpark RDD via the `from_numpy_rdd` class method. This step is a bit more involved than the last, as it requires the data within the PySpark RDD to be formatted in a specific way.

The following example constructs an RDD containing a single tuple. The first element is a `ProjectedExtent` because we have decided to make the data spatial. If we were dealing with spatial-temporal data, then `TemporalProjectedExtent` would be the first element. `Tile` will always be the second element of the tuple.

In [16]:
# A single-band, 16x16 array of ones; the axis order is assumed to be (bands, rows, cols)
arr = np.ones((1, 16, 16), dtype=int)

tile = Tile(cells=np.array(arr), cell_type='INT', no_data_value=-500)
extent = Extent(0.0, 1.0, 2.0, 3.0)
projected_extent = ProjectedExtent(extent=extent, epsg=3857)

rdd = pysc.parallelize([(projected_extent, tile)])
RasterLayer.from_numpy_rdd(pysc=pysc, layer_type=LayerType.SPATIAL, numpy_rdd=rdd)

<geopyspark.geotrellis.layer.RasterLayer at 0x7f817a059318>

## Using RasterLayers

Once we've initialized our `RasterLayer` instance, it is now time to use it.

## Collecting Metadata

In order to convert a `RasterLayer` to a `TiledRasterLayer`, the layer's metadata must first be collected, as it contains the information on how the data should be formatted and laid out in the `TiledRasterLayer`. `collect_metadata` is used to obtain the metadata, and it accepts two different types of inputs depending on how one wishes to lay out the data.

The first option is to specify an `Extent` and a `TileLayout` for the metadata, where the `Extent` is the area that will be covered by the tiles and the `TileLayout` describes the tiles and the grid they’re arranged on.

In [17]:
extent = Extent(0.0, 0.0, 33.0, 33.0)
tile_layout = TileLayout(2, 2, 256, 256)

raster_layer.collect_metadata(extent=extent, layout=tile_layout)

Metadata(Bounds(minKey=SpatialKey(col=4, row=1), maxKey=SpatialKey(col=4, row=1)), int16, -32768, +proj=longlat +datum=WGS84 +no_defs , Extent(xmin=80.0, ymin=5.0, xmax=81.0, ymax=7.0), TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256), LayoutDefinition(extent=Extent(xmin=0.0, ymin=0.0, xmax=33.0, ymax=33.0), tileLayout=TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)))
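
To make the relationship between the `Extent`, the `TileLayout`, and the keys in the resulting `Metadata` more concrete, here is a minimal pure-Python sketch. It is not GeoPySpark code: the namedtuples stand in for the real classes, the helper names `tile_size_in_map_units` and `key_for_point` are hypothetical, and the indexing rule (columns counted east from `xmin`, rows counted south from `ymax`) is an assumption about how the grid is addressed. Under that assumption, the corners of the raster's extent land on `(4, 1)`, which matches the key bounds printed above.

```python
import math
from collections import namedtuple

# Hypothetical stand-ins for the GeoPySpark classes, so the sketch runs on its own.
Extent = namedtuple("Extent", "xmin ymin xmax ymax")
TileLayout = namedtuple("TileLayout", "layoutCols layoutRows tileCols tileRows")


def tile_size_in_map_units(extent, layout):
    """Width and height covered by a single tile, in the extent's units."""
    width = (extent.xmax - extent.xmin) / layout.layoutCols
    height = (extent.ymax - extent.ymin) / layout.layoutRows
    return width, height


def key_for_point(x, y, extent, layout):
    """Assumed indexing: columns grow east from xmin, rows grow south from ymax."""
    width, height = tile_size_in_map_units(extent, layout)
    col = math.floor((x - extent.xmin) / width)
    row = math.floor((extent.ymax - y) / height)
    return col, row


layout_extent = Extent(0.0, 0.0, 33.0, 33.0)
tile_layout = TileLayout(2, 2, 256, 256)

tile_width, tile_height = tile_size_in_map_units(layout_extent, tile_layout)

# Corners of the raster extent reported in the metadata above.
upper_left_key = key_for_point(80.0, 7.0, layout_extent, tile_layout)
lower_right_key = key_for_point(81.0, 5.0, layout_extent, tile_layout)
print(tile_width, tile_height, upper_left_key, lower_right_key)
```

Note that column 4 lies outside a grid with only two columns. This is because the raster's own extent does not fall inside the extent that was requested for the layout.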

The other option is to simply give `collect_metadata` a `tile_size`, the size in pixels that each tile should have in the resulting grid. The `Extent` and `TileLayout` are then calculated from this size. Using this method ensures that the native resolution of the rasters is kept.

In [18]:
metadata = raster_layer.collect_metadata(tile_size=512)
metadata

Metadata(Bounds(minKey=SpatialKey(col=0, row=0), maxKey=SpatialKey(col=2, row=4)), int16, -32768, +proj=longlat +datum=WGS84 +no_defs , Extent(xmin=80.0, ymin=5.0, xmax=81.0, ymax=7.0), TileLayout(layoutCols=3, layoutRows=5, tileCols=512, tileRows=512), LayoutDefinition(extent=Extent(xmin=80.0, ymin=4.866666666666667, xmax=81.28, ymax=7.0), tileLayout=TileLayout(layoutCols=3, layoutRows=5, tileCols=512, tileRows=512)))
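
The numbers in this output can be reproduced with a small pure-Python sketch of how a layout might be derived from a tile size while keeping the cell size fixed. This is an illustration under stated assumptions, not the library's implementation: `layout_from_tile_size` is a hypothetical helper, the grid is assumed to be anchored at the upper-left corner of the raster extent, and the cell size of 1/1200 of a degree is inferred from the printed layout (1.28 degrees across 3 x 512 pixels) rather than read from the file.

```python
import math


def layout_from_tile_size(xmin, ymin, xmax, ymax, cell_size, tile_size):
    """Sketch: cover an extent with square tiles while keeping the cell size fixed.

    The grid is anchored at the upper-left corner (xmin, ymax) and padded to
    the east and south until it spans a whole number of tiles.
    """
    tile_span = cell_size * tile_size  # map units covered by one tile side
    # round() guards against floating-point noise before taking the ceiling
    layout_cols = math.ceil(round((xmax - xmin) / tile_span, 9))
    layout_rows = math.ceil(round((ymax - ymin) / tile_span, 9))
    layout_extent = (xmin, ymax - layout_rows * tile_span,
                     xmin + layout_cols * tile_span, ymax)
    return layout_cols, layout_rows, layout_extent


# Raster extent taken from the metadata above; the cell size is an assumption
# inferred from the printed layout, not a value read from the GeoTiff.
cols, rows, layout_extent = layout_from_tile_size(
    80.0, 5.0, 81.0, 7.0, cell_size=1 / 1200, tile_size=512)
print(cols, rows, layout_extent)
```

Because the raster does not divide evenly into 512-pixel tiles, the layout extent is larger than the raster extent. The extra area is padding, not a change in resolution.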

## Tiling Data to a Layout

Once `Metadata` has been obtained, `RasterLayer` will be able to format the data, which will result in a new `TiledRasterLayer` instance. There are two methods to do this: `cut_tiles` and `tile_to_layout`.

Both of these methods have the same inputs and similar outputs, but there is one key difference between the two. `cut_tiles` will cut the rasters to the given layout but will not fix any overlap that may occur, whereas `tile_to_layout` will cut and then merge together areas that overlap. This matters because each tile is referenced by a key, and if there’s overlap then there could be duplicate keys.

Therefore, it is recommended to use `tile_to_layout` to ensure there is no duplication.
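
The duplicate-key problem can be illustrated with a toy model in plain Python. This is only a sketch: the tiles are flat lists with `None` standing in for NoData, `merge_by_key` is a hypothetical helper, and the merge rule used here (keep the first cell that has data) is an assumption chosen for simplicity, not a statement of how `tile_to_layout` resolves overlap.

```python
# Toy model: each "tile" is a flat list of cells, with None marking NoData.
# Two source rasters each contribute a partial tile under the same key (0, 0),
# which is what cutting without merging would leave behind.
cut_output = [
    ((0, 0), [1, 2, None, None]),   # piece from the first raster
    ((0, 0), [None, None, 3, 4]),   # piece from the second raster
    ((1, 0), [5, 6, 7, 8]),
]


def merge_by_key(pairs):
    """Combine tiles that share a key, keeping the first cell that has data."""
    merged = {}
    for key, cells in pairs:
        if key not in merged:
            merged[key] = list(cells)
        else:
            merged[key] = [old if old is not None else new
                           for old, new in zip(merged[key], cells)]
    return sorted(merged.items())


cut_keys = [key for key, _ in cut_output]
merged_output = merge_by_key(cut_output)

# After cutting alone, the number of records exceeds the number of distinct keys.
print(len(cut_keys), len(set(cut_keys)))
# After merging, every key appears exactly once.
print(merged_output)
```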

In [19]:
raster_layer.tile_to_layout(layer_metadata=metadata)

<geopyspark.geotrellis.layer.TiledRasterLayer at 0x7f817a0598b8>

It is also possible to select a `resample_method` when tiling the layer.

In [20]:
raster_layer.tile_to_layout(layer_metadata=metadata, resample_method=ResampleMethod.BILINEAR)

<geopyspark.geotrellis.layer.TiledRasterLayer at 0x7f817a059868>

### A Quicker Way to TiledRasterLayer

`to_tiled_layer` allows the user to lay out their data and produce a `TiledRasterLayer` in just one step. This method combines `collect_metadata` and `tile_to_layout`, and saves a little time when writing code.

In [21]:
raster_layer.to_tiled_layer(extent=extent, layout=tile_layout)

<geopyspark.geotrellis.layer.TiledRasterLayer at 0x7f817a059818>