New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test needed for feature instance geometry within feature extents #443

Closed
ajanett opened this Issue May 22, 2018 · 13 comments

Comments

Projects
None yet
5 participants
@ajanett

ajanett commented May 22, 2018

The following test requirement is needed for feature GeoPackages:

When the geometry envelope is not empty and valid, the geometry envelope for each feature instance shall fall within the extents of the feature as expressed the feature entry in gpkg_geometry_columns.

Test Method:

  1. SELECT table_name AS tn, column_name AS cn, min_x, min_y, max_x, max_y, FROM
    gpkg_geometry_columns WHERE table_name IN (SELECT table_name FROM
    gpkg_contents WHERE data_type = 'features')
  2. Not testable if returns an empty result set
  3. For each row from step 1
    a. SELECT cn FROM tn
    b. Not testable if none found
    c For each cn value from step a
    i. If the geometry is not empty, and the envelope is not empty, and envelope values are not NaN: Fail if the envelope falls outside the extents of gpkg_geometry_columns extents.

Alternative: Add this test (3.c.i) to the end of 19.

FYI: This test is also known as NSG Requirement 19 B for features.

@jyutzler

This comment has been minimized.

Contributor

jyutzler commented Jun 13, 2018

This would be a new requirement, penciled in as R148. Do you have a reference for NSG R19b? I would prefer to align them if possible. The SWG will have to decide whether this ought to be added. This would be a minor change, so GPKG 1.3.0.

The bit at the bottom about empty geometries could also be its own requirement, penciled in as R149.

@ajanett

This comment has been minimized.

ajanett commented Jun 14, 2018

The NGA standard will be changed because it refers to the spatialite extension calls ST_MinX, etc. It also would be much better stated as a general requirement against feature data and separately against tile data.

The specific reference for this requirement is found here:

NGA.STND.0051_2.1_GEOPKG
NGA STANDARDIZATION DOCUMENT
National System for Geospatial-Intelligence (NSG) GeoPackage Encoding Standard 1.1
Interoperability Standard
(2017-08-10), Version 2.1

The text for the general requirement:
NSG Req 19: Data validity SHALL be assessed against data value constraints specified in Table 26 below using a test suite. Data validity MAY be enforced by SQL triggers.

Table 26: Data Validity Constraints, Rows 4-7,

ID# | Table Name | Column Name | Value Constraints
4 | “gpkg_contents” | “min_x” | NULL if no information available, Otherwise: If gpkg_contents.data_type = “features”, value SHALL be >= the minimum ST_MinX({geom_col}) value from gpkg_contents.table_name. If gpkg_contents data_type = “tiles”, value SHALL be >= gpkg_tile_matrix_set.min_x column value for gpkg_contents.table_name.
5 | “gpkg_contents” | “min_y” | NULL if no information available.  Otherwise: If gpkg_contents.data_type = “features”, value SHALL be >= the minimum ST_MinY({geom_col}) value from gpkg_contents.table_name. If gpkg_contents data_type = “tiles”, value SHALL be >= gpkg_tile_matrix_set.min_y column value for gpkg_contents.table_name.
6 | “gpkg_contents” | “max_x” | NULL if no information available.  Otherwise: If gpkg_contents.data_type = “features”, value SHALL be <=  the maximum ST_MaxX({geom_col}) value from gpkg_contents.table_name. If gpkg_contents data_type = “tiles”, value SHALL be  <= gpkg_tile_matrix_set.max_x column value for gpkg_contents.table_name.
7 | “gpkg_contents” | “max_y” | NULL if no information available.  Otherwise: If gpkg_contents.data_type = “features”, value SHALL be <=  the maximum ST_MaxY({geom_col}) value from gpkg_contents.table_name. If gpkg_contents data_type = “tiles”, value SHALL be  <= gpkg_tile_matrix_set.max_y column value for gpkg_contents.table_name.

@msorenson

This comment has been minimized.

msorenson commented Jun 14, 2018

Here is link to the entire NSG GeoPackage Profile we updated last year.
https://nsgreg.nga.mil/NSGDOC/files/doc/Document/NGA%20STND%200051_2.1_GEOPKG_10AUG2017_Final.docx

jyutzler added a commit to jyutzler/geopackage that referenced this issue Jun 19, 2018

@jyutzler

This comment has been minimized.

Contributor

jyutzler commented Jun 19, 2018

@ajanett How does #450 look? Does it address your issue?

@jyutzler

This comment has been minimized.

Contributor

jyutzler commented Jun 25, 2018

@ajanett The SWG asks for justification for this change. As there is some merit to keeping the extents "informative", for example to provide a default view, the SWG would like to have a clear understanding as to why this change would benefit GPKG stakeholders.

@ajanett

This comment has been minimized.

ajanett commented Jun 25, 2018

No where in the geopackage do we have an informative extents for the entire grouping of all data within the geopackage, or even all vector features within the geopackage.

The geopackage currently contains a min/max extent per feature 'type' within the table gpkg_contents. By using the term feature 'type' I am saying for example: the set of all Building Polygon features have a bounding box extent, and set of all Road Line features have a separate bounding box extent, and so on. We also know that the geometric data within each feature instance contains a bounding envelope for that particular feature instance. For example, each row within the Building Polygon feature table has a bounding envelope for that individual feature instance.

There must be some relationship between these extents. It would make no sense at all to allow the extents specified for Building Polygon to have no geometric relationship at all to the geometric extents held in the rows within Building Polygon or you could have them on opposite sides of the world and, well, what good would that do? The extents specified for the Building Polygon feature type should encompass (and that could be by a large amount, hence it would still be informative) the content held within the Building Point geometries.

In summary, the min/max extents specified within gpkg_contents for each table, must encompass the extents of the data content contained within that table. Notice I've not stated by how much. We could assign a tolerance, or just leave it out if we want to be more 'informative', and I think it is fine be less specific at this time. If we wish to constrain the requirement to the data type 'features', I can live with that, but my justification actually does apply to all geometric types that have this bounding box information, and I really believe this requirement should apply to all geometric data types in the geopackage.

@jyutzler

This comment has been minimized.

Contributor

jyutzler commented Jun 25, 2018

No where in the geopackage do we have an informative extents for the entire grouping of all data within the geopackage, or even all vector features within the geopackage.

The Metadata extension could be used for that purpose.

There must be some relationship between these extents. It would make no sense at all...

I agree that this is not logical, but it isn't our job as a SWG to officiate logic. How does it benefit the stakeholders to prohibit this practice? How is interoperability improved? How is the user experience improved? These are the kinds of questions we need to be able to answer before the SWG will agree to this requirement.

@ajanett

This comment has been minimized.

ajanett commented Jun 25, 2018

Requirement 13 states:
A GeoPackage file SHALL include a gpkg_contents table per table Contents Table Definition and gpkg_contents Table Definition SQL.
Table 4. Contents Table Definition
min_x DOUBLE Bounding box minimum easting or longitude for all content in table_name
min_y DOUBLE Bounding box minimum northing or latitude for all content in table_name
max_x DOUBLE Bounding box maximum easting or longitude for all content in table_name
max_y DOUBLE Bounding box maximum northing or latitude for all content in table_name

This request is asking for a test to ensure that when the min_x, min_y, max_x, max_y values are specified in gpkg_contents table, that they do actually represent "the bounding box " ... " for all content in table_name" as per the current geopackage specification.

@ajanett

This comment has been minimized.

ajanett commented Jun 26, 2018

I do not see this issue as a new requirement. It is already a requirement in the specification. It just does not have an associated test. However, given it is a requirement under #13 and implied by a table under 13, as we've seen in other issues it may be better to pull those requirements out explicitly. I thought that is what was being done with this particular issue.

As for how this aids stakeholders, interoperability, etc., here are some thoughts on that:
The Geopackage Contents table expresses at a high level what is in the geopackage in a form that is quick and easy to digest. The users of the geopackage don't have to perform a deep examination of the actual data in the geopackage but can easily examine the contents table, and see if the data in the geopackage might work for their need in their specific geometric area (goes to interoperability). If the geopackage content table misrepresents the actual geopackage information; for example, if the extents do not accurately reflect the data content (and yes, I've actually seen many geopackages that do NOT correctly state the extents in the geopackage content), then stakeholders will become discouraged by the lack of quality and they will assume the rest of the geopackage is of similar poor quality and accuracy. Consider how we feel when using a technical document, and then we find the table of contents has incorrect page numbers, or references simply do not exist.

Please also recall that vector feature geometry is represented using a BLOB, so if the geopackage contents do not accurately state the extents for the feature data, users (stakeholders) have to do a significant amount of work to try and figure out what the spatial extents actually are for a particular feature table. Currently, with no extensions enabled, the geopackage contents table is the only place where feature data geometric extents may be specified clearly in a form that is easily queried by stakeholders.

This takes us back to the issue – the geometric extents in the geopackage contents need to be accurate; and I've seen evidence in plenty of geopackages that they are often not accurate. Hence, the ETS needs to test for that.

My goal with this request is we add the test corresponding to the specification, and ensure that those geometric extents found in the geopackage contents accurately reflect the underlying data so that geodatabase consumers can make use of the geopackage contents knowing that the contents really do state the geospatial extents of the data within the geopackage.

@jratike80

This comment has been minimized.

jratike80 commented Jun 28, 2018

Have you considered that people edit data in GeoPackage and during edits at least momentarily aggregated extents of features are not always the same as in "gpkg_contents"?

  • If you insert new data outside the extent that is reported in gpkg_contents, should it trigger re-calculation of exact extents?
  • If you delete a feature should it trigger a test if the feature was at the edge of the extent envelope and start re-calculation of exact extent if needed?

At least it does not feel right to make a constraint that prevents inserting data beyond the limits set in "gpkg_contents". That is how the SDO_DIM_ELEMENTs work in Oracle Spatial and for practical reasons we initialize all new tables to have for sure big enough extents in SDO_GEOM_METADATA so that inserts that we are likely to do in the future will be accepted.

I think that bounding box in gpkg_contents can be informative and it should be OK to advertise bigger than real bounding box (this table will contain/contains data from Finland even at the moment it is empty/has only one point). Perhaps there is no advantage in allowing the bounding box to be too small. What to think about data accidentally inserted around Null Island?

Actually it does not require huge effort to read the exact extents of a table starting from the blobs because rtree_index extension is not uncommon and that defines ST_MinX, ST_MinY, ST_MaxX, and ST_MaxY. So they are not SpatiaLite functions. For exact bounds just four selects like "SELECT min(ST_MinX(geom))" are needed.

@jyutzler

This comment has been minimized.

Contributor

jyutzler commented Jun 28, 2018

Note that we have not committed to taking action here. This issue was raised and we are working through it.

Have you considered that people edit data in GeoPackage and during edits at least momentarily aggregated extents of features are not always the same as in "gpkg_contents"?

Yes. We don't really care if the GeoPackage is in an invalid state temporarily. It only matters when we are interfacing between disparate systems or subsystems where interoperability is needed.

If you insert new data outside the extent that is reported in gpkg_contents, should it trigger re-calculation of exact extents?

No. This is too expensive an operation if there are multiple inserts. It should be an on-demand thing. It wouldn't be a bad idea for us to provide a method to do this but we do not want to do it with a trigger.

If you delete a feature should it trigger a test if the feature was at the edge of the extent envelope and start re-calculation of exact extent if needed?

No, this is even more expensive because it requires a full-table scan even to update the index.

@davidstrategicaci

This comment has been minimized.

davidstrategicaci commented Nov 5, 2018

I agree with @jyutzler. The contents bounding box should informative only. There are cases where some GISs dont use this field set extents with in the viewer for feature layers. The Mandate of this field and its intended use should be left to profiles of the base standard, where their user communities have more say in the specific utility and interoperability in the systems that employ GPKG (i.e. The NSG profile). Subsequently, the responsibility to assert compliance relies on the profile owner's to provide acceptable guidance and test procedures.

@jyutzler

This comment has been minimized.

Contributor

jyutzler commented Dec 3, 2018

During today's SWG meeting, we decided that no action was required here and that the existing requirements cover what they need to cover.

@jyutzler jyutzler closed this Dec 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment