Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect bounds and shape when stacking 3DEP items from Planetary Computer #147

Closed
aazuspan opened this issue Apr 27, 2022 · 12 comments
Closed

Comments

@aazuspan
Copy link
Contributor

aazuspan commented Apr 27, 2022

Hi @gjoseph92, I'm using stackstac.stack with two adjacent COGs from the Planetary Computer 3dep-seamless collection, and the output bounds and shape are incorrect. If I load them manually with xarray and concat them, the bounds are correct.

Reproducing

import planetary_computer
from pystac_client import Client
import stackstac
import xarray as xr

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
collection = "3dep-seamless"
# This bbox spans two items that are adjacent in x dimension
bbox = [-124.210979, 43.336502, -123.657496, 43.798309]

search = catalog.search(
    collections=[collection],
    bbox=bbox,
    # 3DEP contains both 10m and 30m items
    filter={"eq": [{"property": "gsd"}, 30]},
)

signed = planetary_computer.sign(search)

# Stack the two adjacent items with stackstac (doesn't work as expected)
da_stackstac = stackstac.stack(signed, assets=["data"])

# Stack the two items manually by opening and concatenating with xarray (works as expected)
da_list = [xr.open_rasterio(item.assets["data"].href, chunks=1024) for item in signed]
da_xarray = xr.concat(da_list, dim="time")

Issues

The bounds for the two arrays are much different and the stackstac version doesn't cover the the search bbox:

# Bounds don't cover the bbox: (-125.00167, 43.965540000000004, -123.96554, 44.001670000000004)
print(stackstac.array_bounds(da_stackstac))

# Bounds do cover the bbox: (-125.00152777777778, 42.99819444444364, -122.99819444444364, 44.001527777777774)
print(stackstac.array_bounds(da_xarray))

The array shapes (ignoring the time and band dims) are also much different:

# (3613, 103613)
print(da_stackstac.shape[2:])

# (3612, 7224)
print(da_xarray.shape[2:])

Checking the proj:shape property for the items indicates their shape should be 3612 x 3612, so it looks like the stackstac version is off by 1 in the y dimension and 100k in the x dimension.

Any thoughts on what's going on here? Thanks!

For reference, I'm running stackstac=0.4.1 and xarray=2022.3.0.

EDIT: This looked like it might be related to #132, but the proj:transform of the items seem to be in the correct order:

>> signed[0].properties

{'gsd': 30,
 'datetime': '2013-01-01T00:00:00Z',
 'proj:epsg': 5498,
 'proj:shape': [3612, 3612],
 'end_datetime': '2013-11-01T00:00:00Z',
 'proj:transform': [1e-05,
  0.0,
  -125.0016666667,
  0.0,
  -1e-05,
  44.00166666666,
  0.0,
  0.0,
  1.0],
 'start_datetime': '1999-02-01T00:00:00Z',
 'threedep:region': 'n40w130'}
@gjoseph92
Copy link
Owner

Thanks for reporting @aazuspan. I haven't looked into this yet, but I think there's enough information here to figure it out.

FYI, you might want to look at the thread linked in #112. It's reasonable to expect that stackstac's bounds may differ slightly from what you get if you open the GeoTIFFs directly, because we're working with STAC metadata, which could be inaccurate (like in #132). However, this is such a significant mismatch, I'd guess something else is going on here.

I probably won't be able to look into this for a couple weeks. Would you or anyone else (@TomAugspurger @scottyhq?) be interested in taking a look in the meantime?

@aazuspan
Copy link
Contributor Author

Gotcha, thanks for the reference @gjoseph92. So stackstac array bounds and shapes are estimates based on STAC metadata and won't necessarily match file bounds exactly--makes sense. The difference in y is negligible then, but the huge x error probably indicates a) incorrect item metadata or b) correct metadata somehow being misinterpreted.

I'll take a look and see if I can track down a problem, but any insights from folks who are more familiar with stackstac or STAC are definitely welcome.

@TomAugspurger
Copy link
Contributor

are estimates based on STAC metadata and won't necessarily match file bounds exactly

I think they should match exactly. But it's possible there's a mismatch. Are you able look at the gdalinfo of the asset and compare it to the STAC metadata?

@aazuspan
Copy link
Contributor Author

aazuspan commented May 1, 2022

Looks like there's a pixel size discrepancy causing the issue: 0.00001 according to STAC vs. 0.000277 according to GDAL.

STAC metatdata
{'type': 'Feature',
 'stac_version': '1.0.0',
 'id': 'n44w125-1',
 'properties': {'gsd': 30,
  'datetime': '2013-01-01T00:00:00Z',
  'proj:epsg': 5498,
  'proj:shape': [3612, 3612],
  'end_datetime': '2013-11-01T00:00:00Z',
  'proj:transform': [1e-05,
   0.0,
   -125.0016666667,
   0.0,
   -1e-05,
   44.00166666666,
   0.0,
   0.0,
   1.0],
  'start_datetime': '1999-02-01T00:00:00Z',
  'threedep:region': 'n40w130'},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-123.99833333, 42.99833333],
    [-123.99833333, 44.00166667],
    [-125.00166667, 44.00166667],
    [-125.00166667, 42.99833333],
    [-123.99833333, 42.99833333]]]},
 'links': [{'rel': 'collection',
   'href': 'https://planetarycomputer.microsoft.com/api/stac/v1/collections/3dep-seamless',
   'type': 'application/json'},
  {'rel': 'parent',
   'href': 'https://planetarycomputer.microsoft.com/api/stac/v1/collections/3dep-seamless',
   'type': 'application/json'},
  {'rel': <RelType.ROOT: 'root'>,
   'href': 'https://planetarycomputer.microsoft.com/api/stac/v1',
   'type': <MediaType.JSON: 'application/json'>,
   'title': 'Microsoft Planetary Computer STAC API'},
  {'rel': 'self',
   'href': 'https://planetarycomputer.microsoft.com/api/stac/v1/collections/3dep-seamless/items/n44w125-1',
   'type': 'application/geo+json'},
  {'rel': 'via',
   'href': 'https://ai4edataeuwest.blob.core.windows.net/3dep/Elevation/1/TIFF/n44w125/USGS_1_n44w125.xml'},
  {'rel': 'preview',
   'href': 'https://planetarycomputer.microsoft.com/api/data/v1/item/map?collection=3dep-seamless&item=n44w125-1',
   'type': 'text/html',
   'title': 'Map of item'}],
 'assets': {'data': {'href': 'https://ai4edataeuwest.blob.core.windows.net/3dep/Elevation/1/TIFF/n44w125/USGS_1_n44w125.tif?st=2022-04-29T23%3A13%3A27Z&se=2022-04-30T23%3A58%3A27Z&sp=rl&sv=2020-06-12&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2022-04-30T19%3A43%3A28Z&ske=2022-05-07T19%3A43%3A28Z&sks=b&skv=2020-06-12&sig=AgVxx/WNq/thlRHe9wO3ixVxXCWnk5ZxXtMwynWPaVY%3D',
   'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
   'title': 'USGS 1 arc-second n44w125 1 x 1 degree',
   'description': 'This tile of the 3D Elevation Program (3DEP) seamless products is 1 arc-second resolution.  3DEP data serve as the elevation layer of The National Map, and provide basic elevation information for Earth science studies and mapping applications in the United States. Scientists and resource managers use 3DEP data for global change research, hydrologic modeling, resource monitoring, mapping and visualization, and many other applications. 3DEP data compose an elevation dataset that consists of seamless layers and a high resolution layer.  Each of these layers consists of the best available raster elevation data of the conterminous United States, Alaska, Hawaii, territorial islands, Mexico and Canada.  3DEP data are updated continually as new data become available. Seamless 3DEP data are derived from diverse source data that are processed to a common coordinate system and unit of vertical measure. These data are distributed in geographic coordinates in units of decimal degrees, and in conformance with the North American Datum of 1983 (NAD 83). All elevation values are in meters and, over the conterminous United States, are referenced to the North American Vertical Datum of 1988 (NAVD 88). The vertical reference will vary in other areas. Seamless 3DEP data are available nationally (except for Alaska) at resolutions of 1 arc-second (approximately 30 meters) and 1/3 arc-second (approximately 10 meters). In most of Alaska, only lower resolution source data are available. As a result, most seamless 3DEP data for Alaska are at 2 arc-second (approximately 60 meters) grid spacing. Part of Alaska is available at the 1 and 1/3 arc-second resolutions from interferometric synthetic aperture radar (ifsar) collections starting in 2010.  Plans are in place for collection of statewide ifsar in Alaska. All 3DEP products are public domain.',
   'roles': ['data']},
  'gpkg': {'href': 'https://ai4edataeuwest.blob.core.windows.net/3dep/Elevation/1/TIFF/n44w125/n44w125.gpkg?st=2022-04-29T23%3A13%3A27Z&se=2022-04-30T23%3A58%3A27Z&sp=rl&sv=2020-06-12&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2022-04-30T19%3A43%3A28Z&ske=2022-05-07T19%3A43%3A28Z&sks=b&skv=2020-06-12&sig=AgVxx/WNq/thlRHe9wO3ixVxXCWnk5ZxXtMwynWPaVY%3D',
   'type': 'application/geopackage+sqlite3',
   'roles': ['metadata']},
  'metadata': {'href': 'https://ai4edataeuwest.blob.core.windows.net/3dep/Elevation/1/TIFF/n44w125/USGS_1_n44w125.xml?st=2022-04-29T23%3A13%3A27Z&se=2022-04-30T23%3A58%3A27Z&sp=rl&sv=2020-06-12&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2022-04-30T19%3A43%3A28Z&ske=2022-05-07T19%3A43%3A28Z&sks=b&skv=2020-06-12&sig=AgVxx/WNq/thlRHe9wO3ixVxXCWnk5ZxXtMwynWPaVY%3D',
   'type': 'application/xml',
   'roles': ['metadata']},
  'thumbnail': {'href': 'https://ai4edataeuwest.blob.core.windows.net/3dep/Elevation/1/TIFF/n44w125/USGS_1_n44w125.jpg?st=2022-04-29T23%3A13%3A27Z&se=2022-04-30T23%3A58%3A27Z&sp=rl&sv=2020-06-12&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2022-04-30T19%3A43%3A28Z&ske=2022-05-07T19%3A43%3A28Z&sks=b&skv=2020-06-12&sig=AgVxx/WNq/thlRHe9wO3ixVxXCWnk5ZxXtMwynWPaVY%3D',
   'type': 'image/jpeg',
   'roles': ['thumbnail']},
  'tilejson': {'href': 'https://planetarycomputer.microsoft.com/api/data/v1/item/tilejson.json?collection=3dep-seamless&item=n44w125-1&assets=data&colormap_name=terrain&rescale=-1000,4000',
   'type': 'application/json',
   'title': 'TileJSON with default rendering',
   'roles': ['tiles']},
  'rendered_preview': {'href': 'https://planetarycomputer.microsoft.com/api/data/v1/item/preview.png?collection=3dep-seamless&item=n44w125-1&assets=data&colormap_name=terrain&rescale=-1000,4000',
   'type': 'image/png',
   'title': 'Rendered preview',
   'rel': 'preview',
   'roles': ['overview']}},
 'bbox': [-125.0016666667, 42.99833333333, -123.9983333334, 44.00166666666],
 'stac_extensions': ['https://stac-extensions.github.io/projection/v1.0.0/schema.json'],
 'collection': '3dep-seamless'}
gdalinfo
Driver: GTiff/GeoTIFF
Files: USGS_1_n44w125.tif
Size is 3612, 3612
Coordinate System is:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101004,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
Data axis to CRS axis mapping: 2,1
Origin = (-125.001666666666665,44.001666666666665)
Pixel Size = (0.000277777777778,-0.000277777777778)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
  PREDICTOR=3
Corner Coordinates:
Upper Left  (-125.0016667,  44.0016667) (125d 0' 6.00"W, 44d 0' 6.00"N)
Lower Left  (-125.0016667,  42.9983333) (125d 0' 6.00"W, 42d59'54.00"N)
Upper Right (-123.9983333,  44.0016667) (123d59'54.00"W, 44d 0' 6.00"N)
Lower Right (-123.9983333,  42.9983333) (123d59'54.00"W, 42d59'54.00"N)
Center      (-124.5000000,  43.5000000) (124d30' 0.00"W, 43d30' 0.00"N)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  Description = Layer_1
  Min=-5.715 Max=517.677 
  Minimum=-5.715, Maximum=517.677, Mean=27.307, StdDev=64.316
  NoData Value=-999999
  Overviews: 1806x1806, 903x903, 452x452, 226x226, 113x113
  Metadata:
    LAYER_TYPE=athematic
    STATISTICS_MAXIMUM=517.67651367188
    STATISTICS_MEAN=27.306801648412
    STATISTICS_MINIMUM=-5.7150769233704
    STATISTICS_STDDEV=64.31571761273
    STATISTICS_VALID_PERCENT=99.83

Manually editing the proj:transform with the correct pixel size fixes the x bounds and shape being wildly off. The off-by-one discrepancy in the shape arises from geom_utils.snapped_bounds rounding min coords down and max coords up, which can be disabled with snap_bounds=False. Looks like that's as-designed, @gjoseph92?

@TomAugspurger, want me to open an issue on Planetary Computer for the 3dep STAC transform issue? It seems to affect 10m items in that collection as well.

@TomAugspurger
Copy link
Contributor

Thanks. I opened stactools-packages/threedep#9, which is what we used to generate the STAC metadata. We'll look into it and see what's up.

@gjoseph92
Copy link
Owner

The off-by-one discrepancy in the shape arises from geom_utils.snapped_bounds rounding min coords down and max coords up, which can be disabled with snap_bounds=False. Looks like that's as-designed, @gjoseph92?

Sorry I missed this—yes, that's exactly the point of snapped_bounds. I feel like snap_bounds=True is usually what you want, curious if you agree or if that's a bad default.

So sounds like the takeaway is that STAC metadata was off, leading to unexpected results? Nothing to do on the stackstac end?

@aazuspan I'd be curious to confirm—even though the stackstac array was a wildly different shape than you expected, was the data still reasonable? I assume that if you specified resolution= 0.000277 (and maybe bounds too?), you'd get what you wanted?

@aazuspan
Copy link
Contributor Author

Thanks for following up! Yeah, STAC metadata plus my confusion about snap_bounds was the problem, not stackstac. It looks like the STAC for those items hasn't been updated yet, but manually setting the correct proj:transform before stacking confirmed that it should work once that's up. I wasn't able to get usable data without setting the proj:transform. Specifying resolution and bounds gave the correct array shape, but the data was ~95% nans. I can post reproducible examples of what I was doing if you're interested.

I do think defaulting to snap_bounds=True can lead to surprising behavior with the array shape and bounds not matching STAC/GDAL. Is there a downside to leaving fractional offsets? My preference would be to return the data generally "as-is", without snapping by default, unless there was a reason that might cause problems.

@gjoseph92
Copy link
Owner

Specifying resolution and bounds gave the correct array shape, but the data was ~95% nans

Ah. I wouldn't be surprised if stackstac thought that certain chunks were entirely out of bounds for a certain asset, so it skipped opening and reading them entirely. I bet if you increased the spatial chunksize a ton (like maybe even to -1), you'd get the data you expect, but what's the point of that?

Is there a downside to leaving fractional offsets? My preference would be to return the data generally "as-is"

Especially for users newer to geospatial, not dealing with fractional offsets seemed more approachable and intuitive to me. And in many cases, data "is-is" goes out the window when you're stacking—you usually are warping most assets to a common grid and bounding box anyway, at least for sensors like landsat/sentinel/etc. So wherever there's been a tradeoff between usability and preservation of original data, I've gone for usability, since stackstac is less a tool for retrieving original data than for combining lots of pieces of original data together to get them into a format that's useful to you for downstream analysis.

@aazuspan
Copy link
Contributor Author

I bet if you increased the spatial chunksize a ton (like maybe even to -1), you'd get the data you expect

Yep, chunksize=-1 plus specifying bounds and resolution did the trick. Good to know there's a workaround for incorrect STAC metadata, without having to manually edit it.

I've gone for usability, since stackstac is less a tool for retrieving original data than for combining lots of pieces of original data together

Totally reasonable. I was thinking in the context of nice, evenly-spaced 3dep tiles, but snapping definitely makes more sense when you have irregular, overlapping scenes like Sentinel.

Thanks for your help, and for building stackstac! I'm ready to close this, unless you'd rather keep it open in case someone else runs into it before the STAC is fixed.

@gjoseph92
Copy link
Owner

Looks like stactools-packages/threedep#9 is closed, but @TomAugspurger are we still waiting for the data to be reprocessed? If that will happen in the next week or so, let's leave it open, otherwise we can close.

@TomAugspurger
Copy link
Contributor

Yeah, the items need to be regenerated with the patched stactools and re-ingested. No ETA on when that'll happen right now.

@gjoseph92
Copy link
Owner

I've opened and pinned #152 for these sorts of issues, so hopefully others running into the same thing, or similar things in the future, will find that. So I'm closing this specific one now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants