Label Extension #362
To add more ML use cases and sample data for consideration in this STAC extension proposal, here's a small sample set of ML training inputs (source imagery from UAV & building footprint labels) and outputs (raster and polygons of segmented buildings) for building segmentation in Zanzibar.
Data provided is for illustrative purposes only, and I manually defined much of the metadata, so any errors or inconsistencies with the official data sources or the schema are mine.
I organized the source imagery into its own collection with
Also still looking for good ideas to organize ML outputs. Currently I have
Awesome demo @daveluo! Cool to see it in action in stac browser.
It'll take some modifications to STAC Browser, but for training data we talked about referencing just a single 'master' COG and making the bounding box meaningful, so that a renderer would render only the portion of the source image that falls inside the bounding box.
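A rough sketch of what "render only the portion inside the bounding box" could mean on the renderer side: convert the item's bounding box into a pixel window on the master COG. This is plain arithmetic, not STAC Browser code. The function name and parameters are hypothetical, and it assumes a north-up raster whose bounding box has already been reprojected into the raster's CRS.

```python
import math


def bbox_to_pixel_window(bbox, origin_x, origin_y, pixel_width, pixel_height,
                         raster_cols, raster_rows):
    """Return (col_off, row_off, width, height) for the part of the raster
    covered by bbox, or None if the bbox misses the raster entirely.

    Assumptions (hypothetical helper, not part of any STAC tooling):
    - bbox is (min_x, min_y, max_x, max_y) in the raster's own CRS
    - the raster is north-up, with (origin_x, origin_y) at its top-left corner
    - pixel_width and pixel_height are positive sizes in CRS units
    """
    min_x, min_y, max_x, max_y = bbox

    # Columns grow with x; rows grow as y decreases from the top edge.
    col_start = math.floor((min_x - origin_x) / pixel_width)
    col_stop = math.ceil((max_x - origin_x) / pixel_width)
    row_start = math.floor((origin_y - max_y) / pixel_height)
    row_stop = math.ceil((origin_y - min_y) / pixel_height)

    # Clip the window to the raster extent.
    col_start = max(col_start, 0)
    row_start = max(row_start, 0)
    col_stop = min(col_stop, raster_cols)
    row_stop = min(row_stop, raster_rows)

    if col_stop <= col_start or row_stop <= row_start:
        return None
    return (col_start, row_start, col_stop - col_start, row_stop - row_start)
```

The resulting window could then be handed to whatever windowed or range-request reader the renderer uses, so only that part of the COG is fetched.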
The other cool change to make to STAC Browser is to render the geojson asset in addition to the COG. I'm tempted to try to make that PR for STAC Browser, but my time is way too tight these days... It's nice you've got the gist link now so that anyone can click and see it. For STAC Browser to render it, it'd probably have to be a link to the 'raw' gist https://gist.githubusercontent.com/daveluo/c743c6b0f99795336636a1b0084786b5/raw/2d28c8638a1018e5f581cb7e390f0859ac538810/znz-example-labels.json, or else we'd have to teach STAC Browser to get the raw from a gist geojson.
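For the second option, a minimal sketch of how a client might derive a raw URL from a gist page URL. The helper name is hypothetical, and it assumes the gist page URL has the form `https://gist.github.com/<user>/<gist_id>` and that appending `/raw` on `gist.githubusercontent.com` resolves to the gist's current file; that is an assumption about GitHub's behavior for single-file gists, not something confirmed in this thread.

```python
from urllib.parse import urlparse


def gist_raw_url(gist_url):
    """Best-effort conversion of a gist page URL to a raw-content URL.

    Hypothetical helper: assumes https://gist.github.com/<user>/<gist_id>
    and that <raw host>/<user>/<gist_id>/raw serves the gist's file.
    """
    parsed = urlparse(gist_url)

    # Already a raw URL: nothing to do.
    if parsed.netloc == "gist.githubusercontent.com":
        return gist_url

    if parsed.netloc != "gist.github.com":
        raise ValueError(f"not a gist URL: {gist_url}")

    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected /<user>/<gist_id> in: {gist_url}")

    user, gist_id = parts[0], parts[1]
    return f"https://gist.githubusercontent.com/{user}/{gist_id}/raw"
```

Pinning to a specific revision, as in the full raw link above, is safer when the asset must not change underneath the catalog.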
Thanks for the example @daveluo, great seeing it up and with a STAC Browser deployed.
I think the way that you have organized things is fine. If your source imagery were already represented in a STAC catalog somewhere, you could skip the "imagery" collection and just point to that Item directly.
Here's another example catalog where we simply point to a record in the Sentinel STAC catalog as our source rather than having a separate catalog:
Although note those links are currently incorrect :-(, I've got to fix that.
It's a good point about having the COGs as assets in multiple Items; this does make it easier to preview them.
What we are doing with the catalog above, and I recently just pushed some changes to this PR to reflect it, is you provide an optional "rendered image" asset in the
@lossyrob brought up a good point about supporting regression; it would be nice if the extension could handle both regression and classification tasks.
So perhaps the following changes:
Thanks for the thoughts!
Also agreed with the idea of generalizing labels beyond classification. I need to think more about how generic this should be w.r.t. different ML tasks. There could be more than one label type for a task, e.g., object detection with both regression (of bounding box coordinates) and classification (of the object within each bbox) labels. Maybe regression and classification are the two primary label types, and multiple types are allowed within an item to flexibly suit the particular ML task? I'll try out some examples to see what may work well.
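To make the "multiple types within an item" idea concrete, here is a small sketch of how an item's properties might declare both label types and how a client could check them. The property name `label:types` and the helper are purely hypothetical placeholders for discussion; nothing here is taken from the extension schema.

```python
# Hypothetical: the two primary label types floated in this thread.
ALLOWED_LABEL_TYPES = {"classification", "regression"}


def check_label_types(properties, key="label:types"):
    """Return the declared label types, or raise ValueError if they are
    missing or not among the allowed types.

    `label:types` is an illustrative property name, not a real schema field.
    """
    label_types = properties.get(key)
    if not isinstance(label_types, list) or not label_types:
        raise ValueError(f"{key} must be a non-empty list")

    unknown = sorted(set(label_types) - ALLOWED_LABEL_TYPES)
    if unknown:
        raise ValueError(f"unknown label types: {unknown}")
    return label_types


# Object detection: bbox coordinates (regression) plus object class
# (classification), both declared on the same item.
object_detection_item_properties = {
    "label:types": ["regression", "classification"],
}
```

A list-valued property keeps single-type items simple (a one-element list) while still covering tasks such as object detection.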
Updated stac4ml demo catalog & browser at https://zen-turing-2069dc.netlify.com with some new things for consideration:
I have reviewed this, but I am unable to officially add my review since I'm the one who opened the PR, so we'll need someone else to add theirs.
Just a few things:
m-mohr left a comment on Jun 19, 2019
Good thought, I think this should always be
Related: I'll update the zanzibar examples to add the