Geo "profile" for Data Package #86

rufuspollock · 2013-12-28T21:46:40Z

Similar to the Simple Data Format Data Package "profile" for tabular data: that is, we just leverage the base datapackage.json but constrain the types of data resources you can ship.

Options for data formats:

geojson (most likely)
sqlite
geocsv ...
shapefile (probably not)

Could allow a couple of options but prefer to fix on one.

pvgenuchten · 2014-02-04T15:03:17Z

Things to store in a geo-enabled datapackage would typically be:

format of the file geojson/topojson/geocsv/kml/gml/geotiff/geojpeg
field(s) of type geometry (or 2 fields having lat-lon)
format of the geometry (json/wkt/wkb/gml/...)
projection (epsg:4326...)
bounds of the dataset
type of geometry (point/polyline/polygon/multipolygon/multiple -> this can probably be deduced from the WKT; however, you can never be sure that all rows contain a point just because the first 5 do)
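
A minimal sketch of how the bounds and geometry types listed above could be derived from a GeoJSON file by scanning every feature rather than sampling the first few. The function names `iter_positions` and `summarize` and the sample data are hypothetical, and a GeometryCollection is not unpacked here.

```python
from typing import Any, Dict, Iterator, List, Set, Tuple


def iter_positions(coords: Any) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def summarize(feature_collection: Dict[str, Any]) -> Dict[str, Any]:
    """Collect geometry types and bounds over ALL features, not just the top 5."""
    types: Set[str] = set()
    xs: List[float] = []
    ys: List[float] = []
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry")
        if geometry is None:  # GeoJSON allows features without a geometry
            continue
        types.add(geometry["type"])
        # GeometryCollection has no "coordinates" key; it is skipped in this sketch.
        for x, y in iter_positions(geometry.get("coordinates", [])):
            xs.append(x)
            ys.append(y)
    bounds = [min(xs), min(ys), max(xs), max(ys)] if xs else None
    return {"geometry_types": sorted(types), "bounds": bounds}


# Hypothetical sample data with mixed geometry types.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [4.35, 50.85]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString", "coordinates": [[4.0, 50.0], [5.0, 51.0]]}},
    ],
}
summary = summarize(sample)
print(summary)
```

A result with more than one entry in `geometry_types` would correspond to the "multiple" case in the list.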

Note that sqlite potentially stores many tables, each table might require a datapackage.json
(maybe have a look at geopackage specification, that introduces a metadata-table within the database; https://github.com/opengis/geopackage)

Note that some datasets will not be flat tables; handling a complex schema is probably best covered in another issue.

rufuspollock · 2014-02-04T18:57:19Z

@pvgenuchten really useful suggestions. I think our aim here should be to try and be as minimalistic as possible compatible with being useful to a reasonable number of people. Kind of 80/20 but even stronger.

So my question would be: what essential metadata do you need to, e.g., import geojson usefully into something else? If the answer is none, that would be great, but I'm imagining the projection might be important. Cf. also #81 (geo csv).

rufuspollock · 2014-06-07T11:52:23Z

Strongly inclined to go with a recommendation of geojson, with format geojson in the resource.

See also in progress recommendation at http://data.okfn.org/doc/publish-geodata
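
A rough sketch of what that recommendation might look like in a datapackage.json, built as a Python dict. The package name, resource name and path are made up for illustration, and the helper `non_geojson_resources` is hypothetical; only the `format` value of `geojson` comes from the comment above.

```python
import json
from typing import Any, Dict, List

# Hypothetical descriptor: all names and the path are placeholders.
descriptor: Dict[str, Any] = {
    "name": "example-geo-package",
    "resources": [
        {
            "name": "regions",
            "path": "data/regions.geojson",
            "format": "geojson",
        }
    ],
}


def non_geojson_resources(package: Dict[str, Any]) -> List[str]:
    """Return the names of resources whose format is not 'geojson'."""
    return [
        resource.get("name", "<unnamed>")
        for resource in package.get("resources", [])
        if resource.get("format") != "geojson"
    ]


print(json.dumps(descriptor, indent=2))
print(non_geojson_resources(descriptor))
```

A geo "profile" that fixes on one format could be checked this way: an empty list means every resource declares geojson.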

/cc @peterdesmet @jalbertbowden

peterdesmet · 2014-06-08T11:44:25Z

Moving conversation regarding describing properties of a geo data package here. In reply to question by @rgrp regarding this:

could you advise what your use case is for describing the properties, for example will you be processing the data in some way that requires you to know the types of the property fields?

No, I don't have plans to process the data myself, I would just like to provide good metadata for the properties/fields, such as a description, or unit, or type. Example:

{
    "name": "code",
    "description": "Belgian traffic sign code.",
    "web": "http://wiki.openstreetmap.org/wiki/Road_signs_in_Belgium",
    "type": "string"
},

In the Tabular data format one can do this in "schema": { "fields": [] } of the datapackage.json, which I find very useful. Geojson is new to me: maybe it's possible to add metadata about the properties in the geojson file itself, but I quite like having it in the datapackage.json. I am trying to figure out the recommended way to do this.
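
As an illustration only, here is a sketch that compares field definitions like the one above against the properties actually present on GeoJSON features. It assumes the fields would be stored in a list shaped like the Tabular "schema": { "fields": [] }, which is not settled in this thread; `compare_properties`, the "height" property and the sample feature are hypothetical.

```python
from typing import Any, Dict, List


def compare_properties(
    feature_collection: Dict[str, Any], fields: List[Dict[str, Any]]
) -> Dict[str, List[str]]:
    """Report properties missing from the schema and fields never used."""
    declared = {field["name"] for field in fields}
    found = set()
    for feature in feature_collection["features"]:
        # "properties" may be null in GeoJSON, so fall back to an empty dict.
        found.update((feature.get("properties") or {}).keys())
    return {
        "undeclared": sorted(found - declared),
        "unused": sorted(declared - found),
    }


# Field metadata taken from the example in this comment.
fields = [
    {
        "name": "code",
        "description": "Belgian traffic sign code.",
        "web": "http://wiki.openstreetmap.org/wiki/Road_signs_in_Belgium",
        "type": "string",
    }
]

# Hypothetical feature data; "height" is deliberately not described.
signs = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"code": "B1", "height": 2.1},
            "geometry": {"type": "Point", "coordinates": [4.35, 50.85]},
        }
    ],
}

report = compare_properties(signs, fields)
print(report)
```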

rufuspollock · 2015-09-24T14:36:41Z

@dr-shorthair any thoughts here about schemas for properties on features?

dr-shorthair · 2015-09-25T07:14:36Z

Coming late to this conversation. @pvgenuchten seems to have good handle on the issues.

The principal dilemma is that, while 2 columns (x,y) looks like the obvious solution for points, it begs a lot of questions, particularly the key issue that coordinates are not independent of each other and shouldn't be managed or processed independently. So a micro-syntax is preferred which binds them together (as is already done for time in the 8601/xsd 7-component string). Then the options are essentially GeoJSON or WKT. The former has the advantage of software support, but a significant limitation regarding non-2D geometries, and essentially non-existent support for coordinate reference systems. WKT is better on those issues, but is a very niche product! Both support various geometry types, labelled in the data. WKT allows a CRS to be referenced in the data. However, the GeoJSON CRS limitation may not be such a problem in this context, since you would only be using the GeoJSON geometry object and so could carry the CRS reference separately, but then we could hit the coordinate-order issue*. We would also have to extend GeoJSON for solid geometries if required.

* Standard CRS definitions also prescribe the coordinate order. There is a historical convention, which is respected by the standard CRS definitions such as epsg:4326, that geographic coordinates are expressed in lat-lon order (i.e. y,x) while projected systems are generally (x,y). GeoJSON has a rule that, regardless of what the CRS says, the coordinate order is always (x,y). This may seem trivial, but there are many, many examples of how things can go wrong because of mistaken assumptions.
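
To make the coordinate-order point concrete, here is a small sketch that takes a latitude and a longitude as named arguments and emits a GeoJSON Point in (x,y), i.e. lon-lat, order. The function name `point_from_lat_lon` is hypothetical. The range check only catches swaps where a value falls outside the valid latitude range; it cannot detect a swap when both values lie within ±90.

```python
from typing import Any, Dict


def point_from_lat_lon(lat: float, lon: float) -> Dict[str, Any]:
    """Build a GeoJSON Point; GeoJSON coordinate order is always [x, y] = [lon, lat]."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {"type": "Point", "coordinates": [lon, lat]}


# Brussels, given in the lat-lon order that epsg:4326 prescribes.
point = point_from_lat_lon(lat=50.85, lon=4.35)
print(point)

# Passing a longitude of 150 as the latitude is rejected.
try:
    point_from_lat_lon(lat=150.0, lon=4.35)
    swapped_detected = False
except ValueError:
    swapped_detected = True
print(swapped_detected)
```

Using keyword arguments at the call site keeps the two values bound to their meaning until the moment they are serialized.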

pwalsh · 2017-05-29T11:30:42Z

from @danfowler

weecology/retriever#797 @henrykironde

Stephen-Gates · 2017-07-05T07:51:54Z

I've started a guide on point data in CSVs. Your feedback is very welcome. It touches on some of the issues raised above (CRS, axis-order). Other geometry in CSV's makes less sense to me but happy to write about that also.

Edit: Now published Point location data in CSV files

henrykironde · 2017-07-31T06:07:37Z

The Spatial Data Package specification:

This proposal provides specifications for the Spatial Data Package. The proposed specifications are an extension of the Data Package specification created by Frictionless Data. The Data Package specification currently covers tabular data (Tabular Data Package). The Tabular Data Package provides a platform to standardize and organize data, making sharing among tools and people effortless.

Relationship between a Tabular Data package and a spatial Data package

Unlike spatial data, tabular data is simply text separated by special delimiters (comma, tab, etc.) in a text file. Spatial data occurs in various forms of complex data structures, often associated with the file extension.

Spatial data Categories

Spatial data is categorized into two groups: raster data and vector data. In the vector data model, geographical elements are represented using points, lines and polygons. Vector data captures and represents discrete objects with boundaries (lakes, rivers, roads, etc.).

The raster data model stores data elements using pixels or cells. The value of each cell captures the type of object or entity that is observed. A good example is a digital photograph: the pixels in the photo store a color that corresponds to the real-world object at that point. Rasters can store discrete data, for example thematic information on land cover, and continuous data, for example chemical concentrations (carbon dioxide, nitrates).

Vector Data Specifications

The specifications inherit the following Data Package properties:

Recommended Properties

name
id
licenses
profile

Optional Properties

title
description
homepage
version
sources
contributors
keywords
image
created

{
  #required
  "name": "name of the data",
  "title": "human readable label or title for the dataset",
  "gis_class": "Raster data or vector data",
  "file_type": "extension of format of the dataset",
  "description": "A good description for the dataset",
  "license": "A license",
  #keywords separated by comma
  "keywords": ["rivers", "North America"],
  "citation": "citation for the dataset",
  "spatial_ref": "Coordinate Reference System",
  "[path or url]": "path to the file",
  "resources": [
    #For each layer, give a name and the properties
    #layer one
    {
      "name": "Name for the layer eg.river",
      "Geometry_type": "point, linestring,....",
      "geometry_notation": "...",
      "NoDataValue": "what represents missing values",
      #define attribute data and type for each vector feature
      "schema": {
        "fields": [
          {
            "name": "data name",
            "type": "data type"
          },
          {
            "name": "data name",
            "type": "data type"
          },
          {...}
        ]
      }
    },
    #layer two
    {...},
    #layer three
    {...}
  ]
}

Rasters

Like the vector data specifications, the raster data specifications inherit the core components of the Data Package specifications. Rasters can have multiple nested datasets within a file; however, the JSON schema takes on a structure similar to the vector data schema.

The data package

JSON schema example

{
  #required
  "name": "name of the data",
  "title": "human readable label or title for the dataset",
  "format": "extension of format of the dataset or driver required",
  "file_size": "size of file on disk",
  "group_count": "Number of groups in the dataset if applicable",
  "dataset_count": "The number of individual datasets",
  "description": "A good description for the dataset",
  "license": "A license",
  #keywords separated by comma
  "keywords": ["carbon map", "North America"],
  "citation": "citation for the dataset",
  "version": "The version of the dataset",
  "homepage": "The home page of the data",
  "datum": "Coordinate Reference System",
  "[url or path]": "link to where the data is stored",
  #each band is defined
  "resources": [
    {
      "Group": "Name for the group if applicable",
      "name": "Name for the band",
      "relative_path": "Location relative to root path/url above",
      "resolution": "The resolution",
      "resolution_units": "The units of resolution",
      "dimensions": "dimensions",
      "noDataValue": "pixels where data is missing or no data collected",
      "geoTransform": "The transformation of the dataset",
      "parameter": "The parameter or feature",
      "extent": ["the extent values of the band"]
    },
    {...}
  ]
}
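
The proposal does not say how "geoTransform" and "extent" relate. As a sketch, assuming "geoTransform" follows the six-coefficient affine convention used by GDAL (origin x, pixel width, row rotation, origin y, column rotation, pixel height), the extent of a band could be derived from it and the raster dimensions. The function names and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def pixel_to_world(gt: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Map a pixel/line position to georeferenced x, y (GDAL-style affine, assumed)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def extent_from_geotransform(
    gt: Sequence[float], width: int, height: int
) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) from the four raster corners."""
    corners = [pixel_to_world(gt, col, row) for col in (0, width) for row in (0, height)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical north-up raster: 4 x 2 pixels of 0.5 degrees, top-left at (4.0, 51.0).
geo_transform = (4.0, 0.5, 0.0, 51.0, 0.0, -0.5)
extent = extent_from_geotransform(geo_transform, width=4, height=2)
print(extent)
```

If both values were stored in a descriptor, a check like this could flag bands whose declared extent disagrees with their transform.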

henrykironde · 2017-07-31T06:10:59Z

Thanks @Stephen-Gates for the comments in #499. Could you transfer them to this issue?

Stephen-Gates · 2017-07-31T08:02:11Z

Thanks for this Henry.

I think a worked example using real data would help to clearly separate what's needed in a:

spatial data package - similar to tabular data package E.g.
- Each resource MUST be a Spatial Data Resource
- or could a mix of Spatial and Tabular data be in a package?
- should spatial and temporal extent be described at this level or for each resource?
spatial data resource - similar to tabular data resource. E.g.
- the spatial reference system must be included.
- the supported file types (GeoJSON, GML, etc)
- would a CSV with point data be a valid resource?
"layer schema" - similar to table schema

Thanks for starting the conversation.

henrykironde · 2017-08-01T16:28:50Z

@Stephen-Gates, Thanks for the suggestion, I will get some sample data to annotate as examples.

loleg · 2022-01-21T09:27:14Z

This is being further developed, and feedback is very welcome in the issues, at https://github.com/cividi/spatial-data-package

rufuspollock · 2022-01-24T10:31:17Z

@loleg that's great ... could you provide a brief summary of state and plans here?

n0rdlicht · 2022-02-01T21:27:35Z

Hi @rufuspollock, thanks for checking in.

Very happy to get some feedback on https://github.com/cividi/spatial-data-package#detailed-data-package-structure. A proof-of-concept viewer is implemented in dfour and deployed, for example, for simple web publication of client projects (gemeindescan.ch), for self-publishing at events (sandbox.dfour.space or campusbochum.de), and for public participation (https://beteiligung.campusbochum.de/de/SDY4F/0N2AQB/).

Pros

no dependency on a specific library or implementation -> independent of renderer, e.g. simple styles spec supported in many map libraries and tools (e.g. geojson.io, GitHub Previews, ...)
styles "baked in" -> curated snapshot, human readable, no interpretation needed

Cons

requires extra tooling to create styles: hard to update or change style, e.g. we wrote a special QGIS Plugin
style not declarative/rule based -> no support for complex style definitions (e.g. zoom based)
currently requires/only supports (inline) geojson -> no support for tabular data, e.g. CSV(T) or other frictionless compliant geo data

Potential options

Separate data and style definition, e.g. similar to Vega-Lite, but an abstraction of mapbox-gl styles
Vega-Lite geo

peterdesmet mentioned this issue Jun 8, 2014

GeoJSON data package example frictionlessdata/frictionlessdata.io#64

Closed

3 tasks

peterdesmet mentioned this issue Jun 8, 2014

Add 2 geojson geo data packages frictionlessdata/frictionlessdata.io#114

Merged

jpmckinney added the Data Package label Feb 3, 2015

lexman mentioned this issue May 17, 2016

Fields in descriptor don't match properties on features datasets/geo-countries#3

Closed

roll added the backlog label Aug 8, 2016

roll removed the backlog label Aug 29, 2016

pwalsh modified the milestone: Backlog Feb 5, 2017

pwalsh mentioned this issue Feb 5, 2017

[New] GeoCSV / Geo CSV #81

Closed

rufuspollock mentioned this issue Jul 31, 2017

Spatial Data Package specification proposal #499

Closed

henrykironde mentioned this issue Nov 10, 2017

Develop Geo data specification protocol frictionlessdata/frictionlessdata.io#359

Closed

serahkiburu mentioned this issue Nov 27, 2017

Develop Geo data specification protocol #545

Closed

roll added this to Specifications in Frictionless General Mar 19, 2019

This was referenced Feb 16, 2022

A proposal to add geographic specifications in TableSchema #771

Closed

Add wkt to field types #772

Open

roll removed this from the Backlog milestone Apr 14, 2023

frictionlessdata locked and limited conversation to collaborators Apr 12, 2024

roll converted this issue into discussion #906 Apr 12, 2024

This issue was moved to a discussion.
