Basic statistics for dimensions #5

Open
wonder-sk opened this issue Mar 2, 2023 · 7 comments

@wonder-sk

It would be useful to have an optional attribute with basic stats defined by this extension:

  • min/max for all dimensions
  • enumeration of distinct values (with counts) for some dimensions - e.g. classification, number of returns, return number, point source ID, edge of flight line, scan direction, scanner channel

For software that visualizes point clouds, this is quite important for initializing renderer settings. Without stats, one has to sample the data to extract them upon load. This would be especially useful when working with a collection of items, to quickly get aggregate stats for the whole collection rather than having to touch every asset of the individual items.
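As a rough illustration of the two kinds of stats listed above, here is a minimal Python sketch. The helper `basic_stats`, the `counts` key, and the sample values are all hypothetical; nothing here is defined by the extension or by PDAL.

```python
from collections import Counter

# Hypothetical helper: not part of the pointcloud extension or PDAL.
def basic_stats(values, enumerate_values=False):
    """Return min/max for one dimension, plus distinct-value counts if requested."""
    values = list(values)
    stats = {"minimum": min(values), "maximum": max(values)}
    if enumerate_values:
        # Sort the keys so the output is stable between runs.
        stats["counts"] = dict(sorted(Counter(values).items()))
    return stats

# Made-up sample values, for illustration only.
intensity = [12, 250, 87, 1033]
classification = [2, 2, 6, 1, 2, 6]

intensity_stats = basic_stats(intensity)
classification_stats = basic_stats(classification, enumerate_values=True)
```

Enumeration only makes sense for dimensions with few distinct values, which is why it is opt-in in this sketch.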

@wonder-sk
Author

A relevant discussion from some time ago when COPC was being designed: copcio/copcio.github.io#19
TL;DR: we could also add more detailed stats (mean, variance, histogram), but in the end those may not be needed by the clients, or the clients may have more specific needs that make the additional stats useless (e.g. picking a good bucket size for GpsTime can be tricky).

To kick off some discussion, I would propose a new pc:stats attribute with this kind of content:

{
  "Intensity": {
    "minimum": 0,
    "maximum": 12345
  },
  "GpsTime": {
    "minimum": 123456.78,
    "maximum": 123999.99
  },
  "Classification": {
    "minimum": 0,
    "maximum": 7,
    "class-count": {
      "0": 1000,
      "1": 2000,
      "3": 4000,
      "7": 8000
    }
  },
  "ReturnNumber": {
    "minimum": 1,
    "maximum": 3,
    "class-count": {
      "1": 9000,
      "2": 4000,
      "3": 2000
    }
  },
  ...
}
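To illustrate why this shape would suit collection-level aggregation, here is a hedged Python sketch that merges per-item stats without touching any assets. The function `merge_stats` and the two sample items are hypothetical; only the key names (`minimum`, `maximum`, `class-count`) come from the proposal above.

```python
# Hypothetical helper: merges stats dicts shaped like the proposal above.
def merge_stats(per_item_stats):
    """Combine per-item stats into one aggregate dict for the whole collection."""
    merged = {}
    for item_stats in per_item_stats:
        for dim, stats in item_stats.items():
            out = merged.setdefault(
                dim, {"minimum": stats["minimum"], "maximum": stats["maximum"]}
            )
            # Min of minimums and max of maximums are exact for the union.
            out["minimum"] = min(out["minimum"], stats["minimum"])
            out["maximum"] = max(out["maximum"], stats["maximum"])
            # Raw counts can simply be summed per class.
            for cls, n in stats.get("class-count", {}).items():
                counts = out.setdefault("class-count", {})
                counts[cls] = counts.get(cls, 0) + n
    return merged

# Made-up items, for illustration only.
item_a = {
    "Classification": {"minimum": 0, "maximum": 7,
                       "class-count": {"0": 1000, "7": 8000}},
}
item_b = {
    "Classification": {"minimum": 1, "maximum": 3,
                       "class-count": {"1": 2000, "3": 4000}},
    "Intensity": {"minimum": 0, "maximum": 12345},
}

collection_stats = merge_stats([item_a, item_b])
```

Minimum, maximum, and raw counts all merge exactly, which is not true of a mean or variance unless the point count is also available.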

@wonder-sk
Author

cc @hobu

@wonder-sk
Author

Oops, only now have I realized that there is already a Stats object defined in the extension 🤦‍♂️
It just does not include support for classes and their counts.

Other notes on the existing Stats object:

  • there is stddev and variance which are essentially the same thing - worth dropping one of those
  • count I assume is the same for all dimensions and the same value as pc:count - probably not worth including it?
  • position does not seem relevant to statistics at all
  • average + stddev (or variance) - IMHO they are not that useful and could be dropped, but no problem to keep them either

@raelwaed

raelwaed commented Mar 5, 2023

Great post @wonder-sk - I was planning a similar post just this week. My concern is the stats object is just a dump of PDAL information without considering the value to STAC - i.e. What do people want to search for?

Many of the example stats objects provide little value, e.g. ScanDirectionFlag, EdgeOfFlightLine, Classification, UserData, etc. And within those stats objects fields like count and position are questionable.

stddev and variance are just one square root away from each other - but I think they can be left as optional.

I was planning to add a pc:classification field as a [string] of Classifications so you knew what was in a point cloud, but prefer your proposal so you can quantify how much of a particular classification exists.

  • Would you consider changing these raw counts to percentages?
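For reference, percentages can be derived from raw counts on the client side, as in this minimal Python sketch. The helper `counts_to_percentages` is hypothetical, and the counts are the ones from the Classification example earlier in the thread.

```python
# Hypothetical helper: derives percentages from a class-count mapping.
def counts_to_percentages(class_count, ndigits=2):
    """Convert raw per-class counts into percentages of the total."""
    total = sum(class_count.values())
    if total == 0:
        # Avoid division by zero for an empty point cloud.
        return {cls: 0.0 for cls in class_count}
    return {cls: round(100.0 * n / total, ndigits) for cls, n in class_count.items()}

class_count = {"0": 1000, "1": 2000, "3": 4000, "7": 8000}
percentages = counts_to_percentages(class_count)
```

One trade-off to keep in mind: raw counts can be summed across items, whereas percentages can only be combined if the per-item point count is also known.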

The number of returns is valuable, but we have a lot of metadata that gives more context to the returns themselves that I would like to capture - e.g. "First and Last" would mean we have just two returns and ignored all intermediate returns, or "4 Returns (1st, 2nd, 3rd, last)".

@m-mohr
Contributor

m-mohr commented Mar 6, 2023

Maybe you can align with or use the Classification extension? https://github.com/stac-extensions/classification

@hobu

hobu commented Mar 6, 2023

> My concern is the stats object is just a dump of PDAL information

Indeed this was the case, and the intention was to see if we could attract usage and attention to improve the extension. Maybe now is the time. I don't think we have found the stats particularly helpful for searching, but we haven't ditched the schema stuff. That said, I think the schema stuff would probably be better expressed in arrow or regular flatbuf for reusability in other contexts.

You can see an Item collection example we write for the USGS 3DEP lidar collection at https://usgs-lidar-stac.s3-us-west-2.amazonaws.com/ept/item_collection.json

If you visit https://viewer.copc.io you can also bring any of those in and view them by clicking on the USGS 3DEP LiDAR link and then double-clicking on any name that looks interesting or filtering by simple regex.

@mccarthyryanc

I like @wonder-sk's suggestions on updating the stats object. Since they are all optional, perhaps it is enough to add another optional class-count?

@m-mohr, I think that extension (correct me if I'm wrong, I just learned about it) describes all possible classes. In this case we just want to summarize the classes present in a single item. So if you were working with LAZ 1.4 data, you'd put the ASPRS Class definitions into the schema, not the statistics.

To simplify this for searching, I usually just want to know whether a pointcloud has any building-classified points (I don't really care how many points there are). So maybe modify @wonder-sk's suggestion into something like:

    "unique-classes": {
        "title": "unique list of classifications",
        "type": "array",
        "minItems": 1,
        "items": {
            "title": "point classifications present in pointcloud",
            "type": "integer"
        }
    }

And then add unique-classes as an optional field in the stats object?
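A minimal Python sketch of how such a field could be derived and used for filtering follows. The helpers `unique_classes` and `has_class` and the sample stats are hypothetical; class code 6 is the building class in the ASPRS LAS standard classification.

```python
BUILDING = 6  # ASPRS LAS standard class code for buildings

# Hypothetical helper: derives the proposed list from a class-count mapping.
def unique_classes(class_count):
    """Return the sorted integer class codes that have at least one point."""
    return sorted(int(cls) for cls, n in class_count.items() if n > 0)

# Hypothetical helper: the kind of check a search client might run.
def has_class(classification_stats, code):
    """Return True if the item's stats list the given class code."""
    return code in classification_stats.get("unique-classes", [])

# Made-up stats for one item, for illustration only.
class_count = {"1": 2000, "2": 5000, "6": 300}
classification_stats = {"unique-classes": unique_classes(class_count)}

found_buildings = has_class(classification_stats, BUILDING)
```

If class-count were also present, unique-classes would be redundant with its keys, so a producer could write either one depending on whether the counts matter to its users.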
