In [9]:
import numpy as np
import geopyspark as gps

# Core Concepts

Because GeoPySpark is a binding of an existing project, GeoTrellis, some terminology and data representations have carried over. This section seeks to explain this jargon in addition to describing how GeoTrellis types are represented in GeoPySpark.

As you read through this section, you may notice that some values use camel case instead of Python's more traditional snake_case naming convention. This is because Scala uses camel case, and when it receives data from Python it expects the value names to be in camel case.

## Rasters

GeoPySpark represents rasters differently from other geospatial Python libraries such as rasterio. In GeoPySpark, a raster is represented by a namedtuple called `Tile`.

**Note**: All rasters in GeoPySpark are represented as having multiple bands, even if the original raster contained only one. As a result, the `cells` of a `Tile` is always a 3D array with the shape `(bands, rows, cols)`.

In [10]:
arr = np.array([[[0, 0, 0, 0],
                 [1, 1, 1, 1],
                 [2, 2, 2, 2]]], dtype=np.int16)

# The resulting Tile will have -10 as its no_data_value
gps.Tile.from_numpy_array(numpy_array=arr, no_data_value=-10)

Tile(cells=array([[[0, 0, 0, 0],
        [1, 1, 1, 1],
        [2, 2, 2, 2]]], dtype=int16), cell_type='SHORT', no_data_value=-10)

In [11]:
# The resulting Tile will have no no_data_value
gps.Tile.from_numpy_array(numpy_array=arr)

Tile(cells=array([[[0, 0, 0, 0],
        [1, 1, 1, 1],
        [2, 2, 2, 2]]], dtype=int16), cell_type='SHORT', no_data_value=None)
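Because `Tile` always expects a band axis, a single-band raster that starts out as a 2D array needs an extra leading dimension. The sketch below uses only NumPy and assumes you begin with 2D arrays; the resulting `cells` array is the kind of value you would pass as `numpy_array` to `gps.Tile.from_numpy_array`. The masking step only illustrates what a `no_data_value` means; it is not something `Tile` does for you.

```python
import numpy as np

# A single-band raster as a plain 2D array: 3 rows x 4 columns.
band = np.array([[0, 0, 0, 0],
                 [1, 1, 1, 1],
                 [2, 2, 2, 2]], dtype=np.int16)

# Add a leading band axis so the shape becomes (bands, rows, cols) = (1, 3, 4).
cells = band[np.newaxis, :, :]

# Several bands of the same size are stacked along that same first axis.
two_bands = np.stack([band, band * 10])

# Mark one cell with the no-data value, then mask it so it is ignored in statistics.
no_data_value = -10
with_no_data = cells.copy()
with_no_data[0, 0, 0] = no_data_value
masked = np.ma.masked_equal(with_no_data, no_data_value)
```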

## Extent

Describes the area on Earth a raster represents as a bounding box given by its minimum and maximum x and y coordinates. In GeoPySpark, this is represented by `Extent`.

**Note**: The values within the `Extent` must be given as `float`s (for example, `10.0` rather than `10`).

In [12]:
extent = gps.Extent(0.0, 0.0, 10.0, 10.0)
extent

Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0)
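An `Extent` is just a bounding box, so questions such as "how large is it?" or "does it contain this point?" reduce to simple arithmetic and comparisons. The sketch below uses a hypothetical stand-in namedtuple with the same field names as the `Extent` printed above, so it runs without Spark; the helper functions are illustrative and are not part of GeoPySpark.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field order as the printed gps.Extent.
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])

def extent_size(e):
    """Return (width, height) of the extent in CRS units."""
    return (e.xmax - e.xmin, e.ymax - e.ymin)

def extent_contains(e, x, y):
    """True if the point (x, y) lies inside or on the edge of the extent."""
    return e.xmin <= x <= e.xmax and e.ymin <= y <= e.ymax

extent = Extent(0.0, 0.0, 10.0, 10.0)
```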

## ProjectedExtent

Describes both the area on Earth a raster represents and its CRS. The CRS can be given either as an EPSG code or as a Proj4 string. In GeoPySpark, this is represented by `ProjectedExtent`.

In [13]:
# Using an EPSG code

gps.ProjectedExtent(extent=extent, epsg=3857)

ProjectedExtent(extent=Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0), epsg=3857, proj4=None)

In [14]:
# Using a Proj4 String

proj4 = "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs "
gps.ProjectedExtent(extent=extent, proj4=proj4)

ProjectedExtent(extent=Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0), epsg=None, proj4='+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ')
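The coordinates inside an `Extent` are in the units of the CRS attached to it, so with EPSG:3857 the extent above is a 10 m by 10 m square whose lower-left corner sits where the equator meets the prime meridian. As an illustration, the sketch below converts longitude/latitude in degrees to EPSG:3857 meters using the standard spherical Web Mercator formulas, with the sphere radius taken from `+a=6378137` in the proj4 string above. This helper is written by hand for illustration; it is not a GeoPySpark function.

```python
import math

# Sphere radius in meters, matching +a=6378137 +b=6378137 in the proj4 string.
R = 6378137.0

def lonlat_to_web_mercator(lon, lat):
    """Project a longitude/latitude pair (degrees) to EPSG:3857 x/y in meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The antimeridian on the equator maps to the eastern edge of the Web Mercator world.
east_edge_x, _ = lonlat_to_web_mercator(180.0, 0.0)
```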

## TemporalProjectedExtent

Describes the area on Earth the raster represents, its CRS, and the time the data was collected. In GeoPySpark, this is represented by `TemporalProjectedExtent`.

In [15]:
gps.TemporalProjectedExtent(extent=extent, instant=0.1, epsg=3857)

TemporalProjectedExtent(extent=Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0), instant=0.1, epsg=3857, proj4=None)

## TileLayout

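`TileLayout` describes the grid in which the rasters should be laid out. `layoutCols` and `layoutRows` give the number of tiles across and down the grid, while `tileCols` and `tileRows` give the number of cells in each tile.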

In [16]:
# Describes a layer where there are four rasters in a 2x2 grid. Each raster has 256 cols and rows.

tile_layout = gps.TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)
tile_layout

TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)

## LayoutDefinition

`LayoutDefinition` describes both how the rasters are organized in a layer and the area covered by the grid.

In [17]:
layout_definition = gps.LayoutDefinition(extent=extent, tileLayout=tile_layout)
layout_definition

LayoutDefinition(extent=Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0), tileLayout=TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256))
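Combining the extent with the tile layout gives the size of each tile and each cell in CRS units, and lets you work out which grid position a point falls in. The sketch below assumes, as GeoTrellis does, that columns count rightward from `xmin` and rows count downward from `ymax`, so `(0, 0)` is the upper-left tile. `Extent` and `TileLayout` here are stand-in namedtuples with the printed field names, and `point_to_key` is a hypothetical helper, not a GeoPySpark API.

```python
import math
from collections import namedtuple

# Hypothetical stand-ins mirroring the field names printed above.
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])
TileLayout = namedtuple("TileLayout", ["layoutCols", "layoutRows", "tileCols", "tileRows"])

extent = Extent(0.0, 0.0, 10.0, 10.0)
tile_layout = TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)

# Total size of the grid in cells: tiles across times cells per tile.
total_cols = tile_layout.layoutCols * tile_layout.tileCols
total_rows = tile_layout.layoutRows * tile_layout.tileRows

# Size of one cell and of one tile, in CRS units.
cell_width = (extent.xmax - extent.xmin) / total_cols
cell_height = (extent.ymax - extent.ymin) / total_rows
tile_width = (extent.xmax - extent.xmin) / tile_layout.layoutCols
tile_height = (extent.ymax - extent.ymin) / tile_layout.layoutRows

def point_to_key(x, y):
    """Return the (col, row) of the tile containing (x, y).

    Columns count from xmin to the right; rows count from ymax downward,
    so (0, 0) is the upper-left tile.
    """
    col = math.floor((x - extent.xmin) / tile_width)
    row = math.floor((extent.ymax - y) / tile_height)
    return col, row
```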

## LocalLayout

`LocalLayout` is a tiling strategy that produces a layout definition whose grid is built over all of the pixels in the layer, divided into tiles of a given size. The resulting layout keeps the original resolution of the cells within the rasters.

**Note**: This layout **cannot be used for creating display layers**. Rather, it is best suited to layers on which operations and analysis will be performed.

In [18]:
# Creates a LocalLayout where each tile within the grid will be 256x256 pixels.
gps.LocalLayout()

LocalLayout(tile_cols=256, tile_rows=256)

In [19]:
# Creates a LocalLayout where each tile within the grid will be 512x512 pixels.
gps.LocalLayout(tile_size=512)

LocalLayout(tile_cols=512, tile_rows=512)

In [20]:
# Creates a LocalLayout where each tile within the grid will be 256x512 pixels.
gps.LocalLayout(tile_cols=256, tile_rows=512)

LocalLayout(tile_cols=256, tile_rows=512)
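Since a `LocalLayout` keeps the original cell resolution, the number of tiles needed is just the raster's pixel dimensions divided by the tile size, rounded up. The sketch below illustrates that arithmetic with a hypothetical helper, `local_layout_grid`; it is not how GeoPySpark computes the layout internally.

```python
import math

def local_layout_grid(raster_cols, raster_rows, tile_cols=256, tile_rows=256):
    """Return (layout_cols, layout_rows): how many tiles cover the raster."""
    # Partial tiles at the right and bottom edges still occupy a full grid slot.
    return math.ceil(raster_cols / tile_cols), math.ceil(raster_rows / tile_rows)

# A 1000 x 700 pixel raster tiled at the default 256 x 256.
grid = local_layout_grid(1000, 700)
```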

## GlobalLayout

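`GlobalLayout` is a tiling strategy that produces a layout whose grid is constructed over the global extent of the CRS. The cell resolution of the resulting layer is determined by the zoom level: each zoom level splits the grid into twice as many tiles along each axis as the level before it, halving the cell size. Because the original cells rarely match one of these resolutions exactly, using this strategy will result in either upsampling or downsampling of the original raster.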

**Note**: This layout strategy **should be used when the resulting layer is to be displayed in a TMS server.**

In [21]:
# Creates a GlobalLayout instance with the default values
gps.GlobalLayout()

GlobalLayout(tile_size=256, zoom=None, threshold=0.1)

In [22]:
# Creates a GlobalLayout instance for a zoom of 12
gps.GlobalLayout(zoom=12)

GlobalLayout(tile_size=256, zoom=12, threshold=0.1)

You may have noticed from the two examples above that `GlobalLayout` does not create a layout for a specific zoom level by default. Instead, it determines the zoom level from the size of the cells within the rasters. If you want to create a layout for a specific zoom level, set the `zoom` parameter.
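For a sense of what a zoom level means, the sketch below computes the cell size at each zoom for a Web Mercator (EPSG:3857) layout with 256-pixel tiles, assuming that at zoom `z` the world is divided into `2**z` tiles along each axis. `nearest_zoom` is a hypothetical, simplified rule that picks the zoom whose cell size is closest to a given one on a log scale; GeoPySpark's own choice (including how `threshold` is applied) may differ.

```python
import math

# Width of the EPSG:3857 world in meters: circumference of a sphere of radius 6378137 m.
WORLD_WIDTH = 2 * math.pi * 6378137.0

def resolution_at_zoom(zoom, tile_size=256):
    """Cell size in meters at `zoom`, with 2**zoom tiles per side of the world."""
    return WORLD_WIDTH / (tile_size * 2 ** zoom)

def nearest_zoom(cell_size, tile_size=256):
    """Hypothetical rule: zoom whose cell size is closest (log scale) to `cell_size`."""
    return max(0, round(math.log2(resolution_at_zoom(0, tile_size) / cell_size)))

# A raster with 30 m cells would be resampled to the resolution of this zoom level.
zoom_for_30m = nearest_zoom(30.0)
```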

## SpatialKey

Represents the position of a raster within a grid. This grid is a 2D plane where raster positions are represented by a pair of coordinates. In GeoPySpark, this is represented by `SpatialKey`.

In [23]:
gps.SpatialKey(col=0, row=0)

SpatialKey(col=0, row=0)

## SpaceTimeKey

Represents the position of a raster within a grid. Unlike the grid used by `SpatialKey`, this grid has three dimensions: a raster's position is given by a pair of coordinates plus an `instant` value that represents time. In GeoPySpark, this is represented by `SpaceTimeKey`.

In [24]:
gps.SpaceTimeKey(col=0, row=0, instant=0.0)

SpaceTimeKey(col=0, row=0, instant=0.0)
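A `SpaceTimeKey` can be thought of as a `SpatialKey` plus a time coordinate: dropping `instant` gives the 2D position, and grouping keys by `instant` gives one 2D slice of the grid per point in time. The sketch below uses stand-in namedtuples with the same fields as the printed keys, so it runs without Spark; `to_spatial` is a hypothetical helper, not a GeoPySpark API.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins with the same fields as the printed keys.
SpatialKey = namedtuple("SpatialKey", ["col", "row"])
SpaceTimeKey = namedtuple("SpaceTimeKey", ["col", "row", "instant"])

def to_spatial(key):
    """Drop the time component of a SpaceTimeKey."""
    return SpatialKey(key.col, key.row)

keys = [SpaceTimeKey(0, 0, 0.0), SpaceTimeKey(1, 0, 0.0), SpaceTimeKey(0, 0, 1.0)]

# Group keys by instant: each group is one 2D slice of the 3D grid.
by_instant = defaultdict(list)
for k in keys:
    by_instant[k.instant].append(to_spatial(k))
```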

## Bounds

`Bounds` represents the extent of the layout grid in terms of keys. It has `minKey` and `maxKey` attributes, which can be either `SpatialKey`s or `SpaceTimeKey`s depending on the type of data within the layer. The `minKey` is the key of the upper-left tile in the grid, and the `maxKey` is the key of the lower-right tile.

In [25]:
# Creating a Bounds from SpatialKeys

min_spatial_key = gps.SpatialKey(0, 0)
max_spatial_key = gps.SpatialKey(10, 10)

bounds = gps.Bounds(min_spatial_key, max_spatial_key)
bounds

Bounds(minKey=SpatialKey(col=0, row=0), maxKey=SpatialKey(col=10, row=10))

In [26]:
# Creating a Bounds from SpaceTimeKeys

min_space_time_key = gps.SpaceTimeKey(0, 0, 1.0)
max_space_time_key = gps.SpaceTimeKey(10, 10, 1.0)

gps.Bounds(min_space_time_key, max_space_time_key)

Bounds(minKey=SpaceTimeKey(col=0, row=0, instant=1.0), maxKey=SpaceTimeKey(col=10, row=10, instant=1.0))
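Since both keys in a `Bounds` are positions in the grid, the number of keys it spans and whether a given key lies inside it follow directly from the `col` and `row` values. The sketch below assumes both ends are inclusive and uses stand-in namedtuples mirroring the printed `Bounds` and `SpatialKey`; `key_count` and `in_bounds` are hypothetical helpers.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the printed Bounds and SpatialKey.
SpatialKey = namedtuple("SpatialKey", ["col", "row"])
Bounds = namedtuple("Bounds", ["minKey", "maxKey"])

def key_count(bounds):
    """Number of grid positions covered by the bounds (both ends inclusive)."""
    cols = bounds.maxKey.col - bounds.minKey.col + 1
    rows = bounds.maxKey.row - bounds.minKey.row + 1
    return cols * rows

def in_bounds(bounds, key):
    """True if `key` lies within the bounds."""
    return (bounds.minKey.col <= key.col <= bounds.maxKey.col
            and bounds.minKey.row <= key.row <= bounds.maxKey.row)

bounds = Bounds(SpatialKey(0, 0), SpatialKey(10, 10))
```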

## Metadata

`Metadata` contains information about the values within a layer. This information pertains to the cell type, layout, projection, and extent of the data contained within the layer.

The example below shows how to construct `Metadata` by hand. However, this is almost never required, because `Metadata` can be obtained more easily: for a `RasterLayer`, call its `collect_metadata()` method, and for a `TiledRasterLayer`, use its `layer_metadata` attribute.

In [28]:
# Creates Metadata for a layer with rasters that have a cell type of int16 with the previously defined
# bounds, crs, extent, and layout definition.
gps.Metadata(bounds=bounds,
             crs=proj4,
             cell_type=gps.CellType.INT16.value,
             extent=extent,
             layout_definition=layout_definition)

Metadata(Bounds(minKey=SpatialKey(col=0, row=0), maxKey=SpatialKey(col=10, row=10)), int16, -32768, +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs , Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0), TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256), LayoutDefinition(extent=Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0), tileLayout=TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)))