Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raster data footprint determination support #307

Merged
merged 12 commits into from
Aug 1, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Allow MultiPolygons when fixing antimeridian issues ([#317](https://github.com/stac-utils/stactools/pull/317))
- Conda package, via [conda-forge](https://anaconda.org/conda-forge/stactools) ([#324](https://github.com/stac-utils/stactools/pull/324))
- Context manager to ignore rasterio's NotGeoreferencedWarning ([#331](https://github.com/stac-utils/stactools/pull/331))
- Added `raster_footprint` module to assist in populating the geometry of an Item from data coverage of its data assets ([#307](https://github.com/stac-utils/stactools/pull/307))

### Changed

Expand Down
Binary file added docs/_static/landsat-8-example-polygon.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/landsat-8-example.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/modis.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/sentinel-2-partial-data-only.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/_static/sentinel-2-partial.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 6 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,12 @@ Reprojection
.. automodule:: stactools.core.projection
:members:

Raster Footprint Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: stactools.core.utils.raster_footprint
:members:

Testing
-------

Expand Down
138 changes: 138 additions & 0 deletions docs/footprint.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
Determining Footprints from Raster Data
=======================================

When creating STAC Item records from raster images, one important action is
generating a spatial footprint of the image data to use for the Item geometry
field. We intentionally use the imprecise term "footprint" here,
as we are not trying to create a precise vector representation of the data of
an image (which could contain as many
polygons as pixels in the image!), but we are trying to do better than just a
bounding box of the entire image, both
data and no data. By footprint, we mean a vector that balances many competing
factors, and generally represents a shape
covering the data in an image. Figuring out the right parameter values to the
methods in the ``raster_footprint``
module will usually require some experimental validation, to find exactly the
right values for your data, CRS, and
use cases.

There are a few properties we'd like to have in a footprint:

#. The footprint only covers the area on the image only where there is data.
#. The footprint has enough points to accurately capture the shape of the data
in the EPSG:4326 CRS used by GeoJSON.
#. The footprint does not have too many points, such that it is becomes very
large.

Discussion of these properties follows.

Only Data Areas are Represented
-------------------------------

Typically, gridded imagery will have image files representing a single grid
square, with no data values
where there is no data. There are many reasons why an image may contain no
data values, and we will describe a few of them here.

In the Sentinel-2 L2A product, many images only have
a partial swath of data where an orbit of the sensor only captured data
within part of a single grid square.

.. image:: _static/sentinel-2-partial.png

Ideally, our footprint generated from this image would not cover the
entire image, but only the area where there is data, like this:

.. image:: _static/sentinel-2-partial-data-only.png

In the Landsat 8 product, the data area appears rotated, as the orbit
of the satellite and path-row gridding is not aligned with the CRS.

.. image:: _static/landsat-8-example.png

We'd like the footprint to only cover that data area, and not the
no data areas that square the image:

.. image:: _static/landsat-8-example-polygon.png

Another issue is products that use no data values for areas where
clouds, snow, or ice exist. A precise vector representation of
such an image would likely be a MultiPolygon with holes where the
no data values are. However, this is rarely useful for STAC use cases.

.. image:: _static/modis.png

Likewise, where an area of no data values coincide with one of
of the edges of the image, we could attempt to create a concavity to
carve out the no data pixels through the
calculation of the alpha shape. Instead, we
take an easier approach and create a convex hull around the data
areas of the image. In practice, the difference in
coverage between the convex hull and the alpha shape is relatively
small. This difference has the potential to
result in false positives where a search geometry intersects the
geometry of an Item, but the only pixels within the
geometry are no data, but in practice this is rarely a problem.

When using the raster_footprint functions, the ``no_data`` parameter
value can be used to pass in the value
used for no data if it not defined in the image metadata.

Reprojected Geometry Accurately Represents the Footprint
--------------------------------------------------------

In the native CRS, the geometry of the data area of an image can
typically be represented by a polygon containing a
small number of points. If the entire image has data pixels, it
maybe represented by a 5 point polygon (where the
first and last points are the same). When this polygon is reprojected
to EPSG:4326, it will still result in a 5 point
polygon, as the points have simply been reprojected into the new CRS.
This new polygon may not accurately represent
original polygon if there is significant curved distortion between
the CRSes. An example of this is most apparent in
reprojecting the sinusoidal geometry of MODIS to EPSG:4326. Simply
reprojecting the 5 point polygon would result in
a parallelogram, whereas a more accurate representation is two
parallel straight sides and two curved sides.

.. image:: _static/modis.png

The increase the accuracy of the reprojected shape, we can first
densify the polygon. Here we add additional, redundant
points between existing points in the polygon, so that the
reprojected shape will have points that more closely match
the actual form of the reprojection. In the above example for
MODIS, we have used a densification factor of 10, so that
each "side" of the polygon has 11 points representing it instead
of 2, and we get a "curved" side that more closely
matches the raster reprojection.

When using the raster_footprint functions, the
``densification_factor`` parameter can be used to densify the
geometry before reprojection. This will require some
experimentation to find the appropriate value for the CRS of your data.

Simplifying the Geometry
-------------------------------

A 4801 point polygon is required to precisely represent a MODIS tile
in EPSG:4326 to accurately represent itself, as
it is a 2400x2400 pixel image and two sides are curves after reprojection.
However, a polygon this dense both makes
the Item body very large and increases the computation required to perform
an intersects calculation during search,
with little benefit. These polygons can be simplified by defining a minimum
distance, in degrees, for which two points
may be collapsed into one. This is also useful when the polygon in the native
CRS has a large number of points, so
that the end result is a polygon of a more manageable size.

When using the raster_footprint functions, the ``simplify_tolerance`` parameter
defines the a minimum distance,
in degrees, for which points should be combined. This will require some
experimentation to find the appropriate
value for the CRS of your data.



1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,3 +52,4 @@ More information

cli
api
footprint
2 changes: 2 additions & 0 deletions src/stactools/cli/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ def register_plugin(registry: "Registry") -> None:
lint,
merge,
migrate,
update_geometry,
validate,
version,
)
Expand All @@ -34,6 +35,7 @@ def register_plugin(registry: "Registry") -> None:
registry.register_subcommand(merge.create_merge_command)
registry.register_subcommand(validate.create_validate_command)
registry.register_subcommand(version.create_version_command)
registry.register_subcommand(update_geometry.create_update_geometry_command)

# TODO
# registry.register_subcommand(migrate.create_migrate_command)
Expand Down
76 changes: 76 additions & 0 deletions src/stactools/cli/commands/update_geometry.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
from typing import List, Optional

import click
from click import Command, Group
from pystac import Item

from stactools.core.utils import raster_footprint


def create_update_geometry_command(cli: Group) -> Command:
@cli.command(
"update-geometry", short_help="Update an item geometry from an asset footprint"
)
@click.argument("item-path")
@click.option(
"-a",
"--asset-name",
"asset_names",
multiple=True,
help=(
"The names of the assets to try for footprint extraction. "
"The first successful footprint will be used. "
"If no assets are provided, all assets will be tried until one is successful."
),
)
@click.option(
"-p",
"--precision",
type=int,
help="The number of decimal places to include in the coordinates for the "
"reprojected geometry.",
default=raster_footprint.DEFAULT_PRECISION,
)
@click.option(
"-d",
"--densification-factor",
type=int,
help="The factor by which to increase point density within the polygon.",
)
@click.option(
"-s",
"--simplify-tolerance",
type=float,
help="All points in the simplified object will be within "
"the tolerance distance of the original geometry, in degrees.",
)
@click.option(
"-n",
"--no-data",
type=int,
help="explicitly set the no data value if not in image metadata",
)
def update_geometry_command_raster_command(
item_path: str,
asset_names: List[str],
precision: int,
densification_factor: Optional[int],
simplify_tolerance: Optional[float],
no_data: Optional[int],
) -> None:
item = Item.from_file(item_path)
success = raster_footprint.update_geometry_from_asset_footprint(
item,
asset_names=asset_names,
precision=precision,
densification_factor=densification_factor,
simplify_tolerance=simplify_tolerance,
no_data=no_data,
)
if success:
item.save_object()
click.echo(f"Item geometry updated, saved to {item.get_self_href()}")
else:
click.echo("Unable to update geometry")

return update_geometry_command_raster_command
Loading