# Working With RasterLayers

The purpose of `RasterLayer` is to store and format data to produce a `TiledRasterLayer`. Thus, this class lacks the methods needed to perform any kind of spatial analysis. It can be thought of as an “organizer” that sorts and lays out the data so that `TiledRasterLayer` can perform operations on it.

This guide covers the creation methods and abilities unique to `RasterLayer`. For `TiledRasterLayer`, please see this [guide], and for a more general overview of layers, look [here].

## Setting up the Environment for This Guide

In [1]:
!curl -o /tmp/cropped.tif https://s3.amazonaws.com/geopyspark-test/example-files/cropped.tif

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5631k  100 5631k    0     0  1610k      0  0:00:03  0:00:03 --:--:-- 1610k


In [2]:
import numpy as np

import geopyspark as gps

from pyspark import SparkContext

In [3]:
conf = gps.geopyspark_conf(master="local[*]", appName="raster-layer-examples")
pysc = SparkContext(conf=conf)

## Creating RasterLayers

There are just two ways to create a `RasterLayer`: (1) through reading GeoTiffs from the local file system, S3, or HDFS; or (2) from an existing PySpark RDD.

### From GeoTiffs

The `get` method in `geopyspark.geotrellis.geotiff` creates an instance of `RasterLayer` from GeoTiffs. In this example, a GeoTiff with spatial data is read locally.

In [5]:
raster_layer = gps.geotiff.get(pysc=pysc, layer_type=gps.LayerType.SPATIAL, uri="file:///tmp/cropped.tif")
raster_layer

RasterLayer(layer_type=LayerType.SPATIAL)

**Note**: If you have multiple GeoTiffs, you can specify the directory where they are all stored. If the GeoTiffs are spread across multiple locations, you can instead give `get` a list of the places to read them from.
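As a rough sketch of the note above, the helper below forwards either a single location or a list of locations to `get`. The helper name `read_geotiffs` and every path shown are hypothetical, and the keyword arguments simply mirror the call used earlier in this guide, so they may differ in other GeoPySpark versions.

```python
def read_geotiffs(pysc, uris):
    """Read one location, or a list of locations, into a spatial RasterLayer.

    Hypothetical helper: it only forwards its arguments to gps.geotiff.get,
    using the same keywords as the single-file example in this guide.
    """
    # Imported inside the function so this sketch can be defined without Spark.
    import geopyspark as gps
    return gps.geotiff.get(pysc=pysc, layer_type=gps.LayerType.SPATIAL, uri=uris)


# Hypothetical locations, for illustration only.
directory_uri = "file:///tmp/geotiffs/"
scattered_uris = [
    "file:///tmp/first/cropped.tif",
    "file:///tmp/second/cropped.tif",
]

# With a running SparkContext named pysc, either form could be passed:
#   read_geotiffs(pysc, directory_uri)
#   read_geotiffs(pysc, scattered_uris)
```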

### From PySpark RDDs

The second option is to create a new `RasterLayer` from a PySpark RDD via the `from_numpy_rdd` class method. This step is a bit more involved than the last, as it requires the data within the PySpark RDD to be formatted in a specific way.

The following example constructs an RDD containing a single tuple. The first element is a `ProjectedExtent` because we have decided to make the data spatial. If we were dealing with spatial-temporal data, then `TemporalProjectedExtent` would be the first element. `Tile` will always be the second element of the tuple.

In [7]:
arr = np.ones((1, 16, 16), dtype=int)

tile = gps.Tile(cells=np.array(arr), cell_type='INT', no_data_value=-500)
extent = gps.Extent(0.0, 1.0, 2.0, 3.0)
projected_extent = gps.ProjectedExtent(extent=extent, epsg=3857)

rdd = pysc.parallelize([(projected_extent, tile)])
gps.RasterLayer.from_numpy_rdd(pysc=pysc, layer_type=gps.LayerType.SPATIAL, numpy_rdd=rdd)

RasterLayer(layer_type=LayerType.SPATIAL)

## Using RasterLayers

Once we've initialized our `RasterLayer` instance, it is now time to use it.

## Collecting Metadata

`Metadata` describes how the data within a `RasterLayer` should be formatted and laid out. `collect_metadata` is used to obtain the metadata, and it accepts two different types of input depending on how one wishes to lay out the data.

The first option is to specify a `LayoutDefinition` that is created from an `Extent` and a `TileLayout`. The `Extent` is the area that will be covered by the tiles, and the `TileLayout` describes the tiles and the grid they’re arranged on.

In [8]:
extent = gps.Extent(0.0, 0.0, 33.0, 33.0)
tile_layout = gps.TileLayout(2, 2, 256, 256)
layout_definition = gps.LayoutDefinition(extent, tile_layout)

raster_layer.collect_metadata(layout=layout_definition)

Metadata(Bounds(minKey=SpatialKey(col=4, row=1), maxKey=SpatialKey(col=4, row=1)), int16, -32768, +proj=longlat +datum=WGS84 +no_defs , Extent(xmin=80.0, ymin=5.0, xmax=81.0, ymax=7.0), TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256), LayoutDefinition(extent=Extent(xmin=0.0, ymin=0.0, xmax=33.0, ymax=33.0), tileLayout=TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)))
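The `Bounds` in the output above can be reproduced by hand. The following is a minimal sketch of the map-to-grid arithmetic, assuming that columns are counted from the layout's `xmin`, rows are counted downward from its `ymax`, and keys are not clamped to the grid. The function name `key_for_point` is hypothetical and is not part of GeoPySpark.

```python
import math


def key_for_point(x, y, layout_extent, layout_cols, layout_rows):
    """Return the (col, row) of the grid cell that contains map point (x, y).

    layout_extent is (xmin, ymin, xmax, ymax). Rows are counted downward from
    ymax, and the result is not clamped to the grid (assumptions of this sketch).
    """
    xmin, ymin, xmax, ymax = layout_extent
    tile_width = (xmax - xmin) / layout_cols
    tile_height = (ymax - ymin) / layout_rows
    col = math.floor((x - xmin) / tile_width)
    row = math.floor((ymax - y) / tile_height)
    return col, row


layout_extent = (0.0, 0.0, 33.0, 33.0)  # extent of the LayoutDefinition above
data_extent = (80.0, 5.0, 81.0, 7.0)    # extent reported in the Metadata above

# The upper-left corner of the data gives the minimum key.
min_key = key_for_point(data_extent[0], data_extent[3], layout_extent, 2, 2)

# A point just inside the lower-right corner gives the maximum key.
eps = 1e-9
max_key = key_for_point(data_extent[2] - eps, data_extent[1] + eps, layout_extent, 2, 2)

print(min_key, max_key)  # prints: (4, 1) (4, 1)
```

Under these assumptions both keys come out as column 4, row 1, matching the `Bounds` shown. That key lies outside the 2 by 2 grid because the data extent (x from 80 to 81) does not overlap the layout extent (x from 0 to 33).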

The other option is to use either `LocalLayout` or `GlobalLayout` to specify the layout. The exact meaning behind these two layout types is discussed [link].

In [9]:
# Using LocalLayout
metadata = raster_layer.collect_metadata(layout=gps.LocalLayout())
metadata

Metadata(Bounds(minKey=SpatialKey(col=0, row=0), maxKey=SpatialKey(col=4, row=9)), int16, -32768, +proj=longlat +datum=WGS84 +no_defs , Extent(xmin=80.0, ymin=5.0, xmax=81.0, ymax=7.0), TileLayout(layoutCols=5, layoutRows=10, tileCols=256, tileRows=256), LayoutDefinition(extent=Extent(xmin=80.0, ymin=4.866666666666667, xmax=81.06666666666666, ymax=7.0), tileLayout=TileLayout(layoutCols=5, layoutRows=10, tileCols=256, tileRows=256)))
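The `LocalLayout` result above is consistent with padding the raster out to a whole number of 256 by 256 tiles, anchored at the upper-left corner. The sketch below reproduces that arithmetic. It assumes the source raster is 1200 columns by 2400 rows, which is inferred from the layout extent in the output rather than stated in this guide, and the function name `local_layout` is hypothetical.

```python
import math


def local_layout(extent, raster_cols, raster_rows, tile_size=256):
    """Pad a raster's extent so it is covered by whole tiles.

    extent is (xmin, ymin, xmax, ymax). The grid is anchored at the upper-left
    corner, so padding is added on the right and bottom (an assumption).
    """
    xmin, ymin, xmax, ymax = extent
    cell_width = (xmax - xmin) / raster_cols
    cell_height = (ymax - ymin) / raster_rows

    # Number of tiles needed to cover the raster in each direction.
    layout_cols = math.ceil(raster_cols / tile_size)
    layout_rows = math.ceil(raster_rows / tile_size)

    # Extent covered by the full tile grid.
    padded_xmax = xmin + layout_cols * tile_size * cell_width
    padded_ymin = ymax - layout_rows * tile_size * cell_height
    return layout_cols, layout_rows, (xmin, padded_ymin, padded_xmax, ymax)


# Raster size of 1200 x 2400 is an assumption inferred from the output above.
cols, rows, padded = local_layout((80.0, 5.0, 81.0, 7.0), 1200, 2400)
print(cols, rows)  # prints: 5 10
print(padded)
```

With those assumed dimensions the sketch yields 5 columns and 10 rows of tiles, and a padded extent that agrees with the `LayoutDefinition` extent above to within floating-point error.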

In [10]:
# Using GlobalLayout
metadata = raster_layer.collect_metadata(layout=gps.GlobalLayout())
metadata

Metadata(Bounds(minKey=SpatialKey(col=1479, row=944), maxKey=SpatialKey(col=1484, row=967)), int16, -32768, +proj=longlat +datum=WGS84 +no_defs , Extent(xmin=80.0, ymin=5.0, xmax=81.0, ymax=7.0), TileLayout(layoutCols=2048, layoutRows=2048, tileCols=256, tileRows=256), LayoutDefinition(extent=Extent(xmin=-180.0, ymin=-89.99999, xmax=179.99999, ymax=89.99999), tileLayout=TileLayout(layoutCols=2048, layoutRows=2048, tileCols=256, tileRows=256)))
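The 2048 by 2048 grid in the `GlobalLayout` output follows from a power-of-two tile pyramid, where each zoom level doubles the number of tiles per side. The sketch below shows that relationship under that assumption. Both function names are hypothetical and are not GeoPySpark API.

```python
def tiles_per_side(zoom):
    """Number of tiles along one side of the world grid at a zoom level."""
    return 2 ** zoom


def zoom_for_grid(layout_cols):
    """Recover the zoom level from a power-of-two grid size."""
    zoom = layout_cols.bit_length() - 1
    if 2 ** zoom != layout_cols:
        raise ValueError("grid size is not a power of two")
    return zoom


zoom = zoom_for_grid(2048)
print(zoom, tiles_per_side(zoom))  # prints: 11 2048

# Width of one tile in degrees, using the layout extent from the output above.
tile_width_deg = (179.99999 - (-180.0)) / tiles_per_side(zoom)
print(tile_width_deg)
```

A tile at this zoom level is therefore roughly 0.18 degrees wide, which is why a raster spanning one degree of longitude touches several columns of keys.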

## Tiling Data to a Layout

`tile_to_layout` will tile and format the rasters within a `RasterLayer` to a given layout. The layout to tile to can be derived from various sources.

### From Metadata

In [14]:
tiled_raster_layer = raster_layer.tile_to_layout(layout=metadata)
tiled_raster_layer

TiledRasterLayer(layer_type=LayerType.SPATIAL, zoom_level=None, is_floating_point_layer=False)

### From LayoutDefinition

In [13]:
raster_layer.tile_to_layout(layout=layout_definition)

TypeError: not all arguments converted during string formatting

### From A TiledRasterLayer

One can tile a `RasterLayer` to the same layout as a `TiledRasterLayer`.

In [15]:
raster_layer.tile_to_layout(layout=tiled_raster_layer)

TypeError: Metadata(Bounds(minKey=SpatialKey(col=1479, row=944), maxKey=SpatialKey(col=1484, row=967)), int16, -32768, +proj=longlat +datum=WGS84 +no_defs , Extent(xmin=80.0, ymin=5.0, xmax=81.0, ymax=7.0), TileLayout(layoutCols=2048, layoutRows=2048, tileCols=256, tileRows=256), LayoutDefinition(extent=Extent(xmin=-180.0, ymin=-89.99999, xmax=179.99999, ymax=89.99999), tileLayout=TileLayout(layoutCols=2048, layoutRows=2048, tileCols=256, tileRows=256))) is not JSON serializable

### From LocalLayout

In [16]:
raster_layer.tile_to_layout(gps.LocalLayout())

TiledRasterLayer(layer_type=LayerType.SPATIAL, zoom_level=None, is_floating_point_layer=False)

### From GlobalLayout

In [17]:
raster_layer.tile_to_layout(gps.GlobalLayout())

TiledRasterLayer(layer_type=LayerType.SPATIAL, zoom_level=11, is_floating_point_layer=False)

### Resampling During Tiling

It is also possible to select a `resample_method` when tiling the layer.

In [19]:
raster_layer.tile_to_layout(layout=metadata, resample_method=gps.ResampleMethod.BILINEAR)

TiledRasterLayer(layer_type=LayerType.SPATIAL, zoom_level=None, is_floating_point_layer=False)
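To illustrate what the choice of resample method changes, here is a small pure-Python comparison of nearest-neighbor and bilinear sampling on a 2 by 2 grid. It is only a conceptual sketch and not the implementation used by GeoPySpark. The function names and the sample grid are made up for this example, and coordinates are assumed to be non-negative pixel positions.

```python
def nearest(grid, x, y):
    """Return the value of the cell whose center is closest to (x, y)."""
    return grid[int(y + 0.5)][int(x + 0.5)]


def bilinear(grid, x, y):
    """Blend the four cells surrounding (x, y), weighted by distance."""
    x0, y0 = int(x), int(y)
    # Clamp the neighbors so points on the last row or column stay in range.
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Made-up cell values, indexed as grid[row][col].
grid = [[0.0, 10.0],
        [20.0, 30.0]]

print(nearest(grid, 0.25, 0.25))   # prints: 0.0
print(bilinear(grid, 0.25, 0.25))  # prints: 7.5
print(bilinear(grid, 0.5, 0.5))    # prints: 15.0
```

Nearest-neighbor keeps original cell values, which suits categorical data, while bilinear produces smoother values that may not appear anywhere in the source raster.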