This issue was moved to a discussion.

Geo "profile" for Data Package #86

Closed
rufuspollock opened this issue Dec 28, 2013 · 15 comments

Comments

@rufuspollock
Contributor

Similar to the Simple Data Format Data Package "profile" for tabular data: we just leverage the base datapackage.json but constrain the types of data resources you can ship.

Options for data formats:

  • geojson (most likely)
  • sqlite
  • geocsv ...
  • shapefile (probably not)

Could allow a couple of options but prefer to fix on one.

@pvgenuchten
Contributor

Things to store in a geo-enabled datapackage would typically be:

  • format of the file geojson/topojson/geocsv/kml/gml/geotiff/geojpeg
  • field(s) of type geometry (or 2 fields having lat-lon)
  • format of the geometry (json/wkt/wkb/gml/...)
  • projection (epsg:4326...)
  • bounds of the dataset
  • type of geometry (point/polyline/polygon/multipolygon/multiple -> this can probably be deduced from the wkt, but you can never be sure that all values contain a point just because the first 5 do)
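
A minimal sketch of the last point, assuming the geometries are stored as WKT strings: scan every value rather than sampling the first few. The function names and the sample rows are made up for illustration.

```python
import re
from collections import Counter

# Matches the leading geometry keyword of a WKT string,
# e.g. "POINT" in "POINT (4.35 50.85)".
_WKT_TYPE = re.compile(r"^\s*([A-Za-z]+)")


def geometry_types(wkt_values):
    """Count the geometry type of every WKT value, not just the first few."""
    counts = Counter()
    for value in wkt_values:
        match = _WKT_TYPE.match(value)
        counts[match.group(1).upper() if match else "UNKNOWN"] += 1
    return counts


def summarise(wkt_values):
    """Return one type name, or "multiple" when the column mixes types."""
    counts = geometry_types(wkt_values)
    if not counts:
        return "unknown"
    return next(iter(counts)).lower() if len(counts) == 1 else "multiple"


# Illustrative rows only: five points followed by a line.
rows = [
    "POINT (4.35 50.85)",
    "POINT (4.40 51.22)",
    "POINT (3.72 51.05)",
    "POINT (5.57 50.63)",
    "POINT (4.70 50.88)",
    "LINESTRING (4.35 50.85, 4.40 51.22)",
]
print(summarise(rows[:5]))  # point
print(summarise(rows))      # multiple
```

Looking only at the top 5 rows would report "point" here, which is exactly the trap described above.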

Note that sqlite potentially stores many tables, and each table might require its own datapackage.json
(maybe have a look at the geopackage specification, which introduces a metadata table within the database: https://github.com/opengis/geopackage)

Note that some datasets will not be flat tables; handling a complex schema is probably best left to another issue

@rufuspollock
Contributor Author

@pvgenuchten really useful suggestions. I think our aim here should be to be as minimalistic as possible while still being useful to a reasonable number of people. Kind of 80/20 but even stronger.

So my question would be: what metadata do you essentially need to, e.g., import geojson usefully into something else? If the answer is none that would be great, but I'm imagining the projection might be important. cf also here #81 (geo csv).

@rufuspollock
Contributor Author

Strongly inclined to go with a recommendation of geojson, with format geojson in the resource.
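
As a rough sketch of what that recommendation could look like in a datapackage.json, written here as a Python dict: the package name, resource name and path are hypothetical, and the helper is only an illustration of "constrain the resource types", not part of any published tooling.

```python
import json

# Hypothetical descriptor: one GeoJSON resource declared via "format".
descriptor = {
    "name": "example-geo-package",
    "resources": [
        {
            "name": "features",
            "path": "data/features.geojson",
            "format": "geojson",
        }
    ],
}


def non_geojson_resources(descriptor):
    """List resources that do not declare format "geojson"."""
    return [
        resource.get("name", resource.get("path"))
        for resource in descriptor.get("resources", [])
        if resource.get("format") != "geojson"
    ]


print(json.dumps(descriptor, indent=2))
print(non_geojson_resources(descriptor))  # []
```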

See also in progress recommendation at http://data.okfn.org/doc/publish-geodata

/cc @peterdesmet @jalbertbowden

@peterdesmet
Member

Moving conversation regarding describing properties of a geo data package here. In reply to question by @rgrp regarding this:

could you advise what your use case is for describing the properties, for example will you be processing the data in some way that requires you to know the types of the property fields?

No, I don't have plans to process the data myself, I would just like to provide good metadata for the properties/fields, such as a description, or unit, or type. Example:

{
    "name": "code",
    "description": "Belgian traffic sign code.",
    "web": "http://wiki.openstreetmap.org/wiki/Road_signs_in_Belgium",
    "type": "string"
},

In the Tabular data format one can do this in "schema": { "fields": [] } of the datapackage.json, which I find very useful. Geojson is new to me: maybe it's possible to add metadata about the properties in the geojson file itself, but I quite like having it in the datapackage.json. I am trying to figure out the recommended way to do this.
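
One way this could work, sketched under the assumption that a geojson resource may carry a Table Schema style "fields" list describing feature properties (the thread has not settled this). The function name and the sample features are invented for illustration.

```python
# Python checks for a few Table Schema type names; other types are skipped.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def check_properties(feature_collection, fields):
    """Return messages for feature properties that are missing or mistyped."""
    problems = []
    for index, feature in enumerate(feature_collection.get("features", [])):
        properties = feature.get("properties") or {}
        for field in fields:
            name = field["name"]
            if name not in properties:
                problems.append(f"feature {index}: missing property {name!r}")
                continue
            check = TYPE_CHECKS.get(field.get("type"))
            if check is not None and not check(properties[name]):
                problems.append(
                    f"feature {index}: property {name!r} is not of type {field['type']}"
                )
    return problems


# Field description taken from the example above.
fields = [
    {"name": "code", "description": "Belgian traffic sign code.", "type": "string"}
]

# Invented sample data: the second feature has a non-string code.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [4.35, 50.85]},
            "properties": {"code": "A1"},
        },
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [4.40, 51.22]},
            "properties": {"code": 7},
        },
    ],
}

print(check_properties(collection, fields))
```

Even without processing the data, a field list like this doubles as documentation for the properties.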

@rufuspollock
Contributor Author

@dr-shorthair any thoughts here about schemas for properties on features?

@dr-shorthair

Coming late to this conversation. @pvgenuchten seems to have a good handle on the issues.

The principal dilemma is that, while 2 columns (x,y) looks like the obvious solution for points, it raises a lot of questions, particularly the key issue that coordinates are not independent of each other and shouldn't be managed or processed independently. So a micro-syntax is preferred which binds them together (as is already done for time in the ISO 8601/xsd 7-component string). Then the options are essentially GeoJSON or WKT. The former has the advantage of software support, but a significant limitation regarding non-2D geometries, and essentially non-existent support for coordinate reference systems. WKT is better on those issues, but is a very niche product! Both support various geometry types, labelled in the data. WKT allows a CRS to be referenced in the data. However, the GeoJSON CRS limitation may not be such a problem in this context, since you would only be using the GeoJSON geometry object and so could carry the CRS reference separately, but then we could hit the coordinate-order issue*. We would also have to extend GeoJSON for solid geometries if required.

* Standard CRS definitions also prescribe the coordinate order. There is a historical convention, which is respected by the standard CRS definitions such as epsg:4326, that geographic coordinates are expressed in lat-lon order (i.e. y,x) while projected systems are generally (x,y). GeoJSON has a rule that, regardless of what the CRS says, the coordinate order is always (x,y). This may seem trivial, but there are many, many examples of how things can go wrong because of mistaken assumptions.
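
To make the coordinate-order issue concrete, here is a small sketch that takes a latitude/longitude pair (the order epsg:4326 prescribes) and emits a GeoJSON Point in (x,y) order. The function name is hypothetical and the coordinates are illustrative.

```python
def latlon_to_geojson_point(lat, lon):
    """Build a GeoJSON Point from a latitude/longitude pair.

    GeoJSON positions are always [x, y], i.e. [longitude, latitude],
    so the two values are swapped on the way out.
    """
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return {"type": "Point", "coordinates": [lon, lat]}


point = latlon_to_geojson_point(50.85, 4.35)
print(point["coordinates"])  # [4.35, 50.85]
```

The range check catches a swapped pair only when the longitude falls outside the valid latitude range; for many locations both orders are numerically plausible, which is why the order has to be stated explicitly in the metadata rather than guessed.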

@pwalsh
Member

pwalsh commented May 29, 2017

@Stephen-Gates
Contributor

Stephen-Gates commented Jul 5, 2017

I've started a guide on point data in CSVs. Your feedback is very welcome. It touches on some of the issues raised above (CRS, axis order). Other geometry in CSVs makes less sense to me, but I'm happy to write about that also.

Edit: Now published: Point location data in CSV files

@henrykironde

henrykironde commented Jul 31, 2017

The Spatial Data Package specification:

This proposal provides specifications for the Spatial Data Package. The proposed specifications are an extension of the Data Package specification created by Frictionless Data. The Data Package specification currently covers tabular data (Tabular Data Package). The Tabular Data Package provides a way to standardize and organize data, making sharing among tools and people effortless.

Relationship between a Tabular Data Package and a Spatial Data Package

Unlike spatial data, tabular data is simply text separated by special delimiters (comma, tab, etc.) in a text file. Spatial data occurs in various complex data structures, often associated with the file extension.

Spatial Data Categories

Spatial data is categorized into two groups: raster data and vector data. In the vector data model, geographical elements are represented using points, lines and polygons. Vector data captures and represents discrete objects with boundaries (lakes, rivers, roads, etc.).

The raster data model stores data elements as pixels or cells. The value of each cell captures the type of object or entity that is observed. A good example is a digital photograph: the pixels in the photo store a color that corresponds to the real-world object at that point. Rasters can store discrete data, for example thematic land cover information, and continuous data, for example chemical concentrations (carbon dioxide, nitrates).

Vector Data Specifications

The specifications inherit the Data Package properties, such as:

Recommended Properties

  • name
  • id
  • licenses
  • profile

Optional Properties

  • title
  • description
  • homepage
  • version
  • sources
  • contributors
  • keywords
  • image
  • created

{
  # required
  "name": "name of the data",
  "title": "human readable label or title for the dataset",
  "gis_class": "Raster data or vector data",
  "file_type": "extension of format of the dataset",
  "description": "A good description for the dataset",
  "license": "A license",
  # keywords separated by comma
  "keywords": ["rivers", "North America"],
  "citation": "citation for the dataset",
  "spatial_ref": "Coordinate Reference System",
  "[path or url]": "path to the file",
  "resources": [
    # For each layer, give a name and the properties
    # layer one
    {
      "name": "Name for the layer eg. river",
      "Geometry_type": "point, linestring,....",
      "geometry_notation": "",
      "NoDataValue": "what represents missing values",
      # define attribute data and type for each vector feature
      "schema": {
        "fields": [
          {
            "name": "data name",
            "type": "data type"
          },
          {
            "name": "data name",
            "type": "data type"
          },
          {...}
        ]
      }
    },
    # layer two
    {....},
    # layer three
    {..}
  ]
}

Rasters

Like the vector data specifications, the raster data specifications inherit the core components of the Data Package specifications. Rasters can have multiple nested datasets within a file; however, the JSON schema takes on a structure similar to the vector data schema.

The data package

JSON schema example

{
    # required
    "name": "name of the data",
    "title": "human readable label or title for the dataset",
    "format": "extension of format of the dataset or driver required",
    "file_size": "size of file on disk",
    "group_count": "Number of groups in the dataset if applicable",
    "dataset_count": "The number of individual datasets",
    "description": "A good description for the dataset",
    "license": "A license",
    # keywords separated by comma
    "keywords": ["carbon map", "North America"],
    "citation": "citation for the dataset",
    "version": "The version of the dataset",
    "homepage": "The home page of the data",
    "datum": "Coordinate Reference System",
    "[url or path]": "link to where the data is stored",
    # each band is defined
    "resources": [
      {
        "Group": "Name for the group if applicable",
        "name": "Name for the band",
        "relative_path": "Location relative to root path/url above",
        "resolution": "The resolution",
        "resolution_units": "The units of resolution",
        "dimensions": "dimensions",
        "noDataValue": "pixels where data is missing or no data collected",
        "geoTransform": "The transformation of the dataset",
        "parameter": "The parameter or feature",
        "extent": ["the extent values of the band"]
      },
      {...}
    ]
}
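
The proposal does not say how "geoTransform", "dimensions" and "extent" relate. As a sketch only, assuming "geoTransform" follows the six-coefficient affine convention used by GDAL (origin x, pixel width, row rotation, origin y, column rotation, pixel height), the extent of a band can be derived from the transform and the raster dimensions. The function names and the sample grid are illustrative.

```python
def pixel_to_coords(geotransform, col, row):
    """Map a pixel corner (col, row) to georeferenced (x, y).

    Assumes the GDAL-style affine geotransform:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
    """
    origin_x, pixel_w, row_rot, origin_y, col_rot, pixel_h = geotransform
    x = origin_x + col * pixel_w + row * row_rot
    y = origin_y + col * col_rot + row * pixel_h
    return x, y


def extent(geotransform, width, height):
    """Return [min_x, min_y, max_x, max_y] from the four raster corners."""
    corners = [
        pixel_to_coords(geotransform, col, row)
        for col in (0, width)
        for row in (0, height)
    ]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


# Illustrative global grid at 0.5 degree resolution, north-up
# (negative pixel height), 720 columns by 360 rows.
gt = (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5)
print(extent(gt, 720, 360))  # [-180.0, -90.0, 180.0, 90.0]
```

If the transform and dimensions are stored, "extent" and "resolution" become derivable and could be treated as optional conveniences rather than independent fields.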

@henrykironde

Thanks @Stephen-Gates for the comments in #499. Could you transfer them to this issue?

@Stephen-Gates
Contributor

Thanks for this Henry.

I think a worked example using real data would help to clearly separate what's needed in a:

  • spatial data package - similar to tabular data package E.g.
    • Each resource MUST be a Spatial Data Resource
    • or could a mix of Spatial and Tabular data be in a package?
    • should spatial and temporal extent be described at this level or for each resource?
  • spatial data resource - similar to tabular data resource. E.g.
    • the spatial reference system must be included.
    • the supported file types (GeoJSON, GML, etc)
    • would a CSV with point data be a valid resource?
  • "layer schema" - similar to table schema
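
These questions are still open, so the following is only a sketch of one possible answer: every resource must declare a spatial format and a spatial reference system, with CSV point data optionally allowed. The key name "spatial_ref" is borrowed from the draft above but placed per resource here, and the format list, function name and sample descriptor are all assumptions.

```python
# Assumption: the set of accepted spatial formats; extend as the profile decides.
SPATIAL_FORMATS = {"geojson", "gml"}


def validate_spatial_package(descriptor, allow_tabular=False):
    """Return a list of problems found in a draft spatial data package."""
    problems = []
    for index, resource in enumerate(descriptor.get("resources", [])):
        label = resource.get("name", f"resource {index}")
        fmt = str(resource.get("format", "")).lower()
        if fmt == "csv" and allow_tabular:
            # Mixed packages: tabular resources are accepted as-is.
            continue
        if fmt not in SPATIAL_FORMATS:
            problems.append(f"{label}: format {fmt!r} is not a spatial format")
        if not resource.get("spatial_ref"):
            problems.append(f"{label}: missing spatial_ref")
    return problems


# Hypothetical descriptor mixing a spatial and a tabular resource.
package = {
    "name": "example-spatial-package",
    "resources": [
        {"name": "rivers", "format": "geojson", "spatial_ref": "EPSG:4326"},
        {"name": "stations", "format": "csv"},
    ],
}

print(validate_spatial_package(package))
print(validate_spatial_package(package, allow_tabular=True))  # []
```

Running both calls on real data would show quickly whether a strict "spatial only" rule or a mixed package is the more practical choice.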

Thanks for starting the conversation.

@henrykironde

@Stephen-Gates, thanks for the suggestion. I will get some sample data to annotate as examples.

@loleg

loleg commented Jan 21, 2022

This is being further developed, and feedback is very welcome in the issues, at https://github.com/cividi/spatial-data-package

@rufuspollock
Contributor Author

@loleg that's great ... could you provide a brief summary of state and plans here?

@n0rdlicht

Hi @rufuspollock, thanks for checking in.

Very happy to get some feedback on https://github.com/cividi/spatial-data-package#detailed-data-package-structure. A proof-of-concept viewer is implemented in dfour and deployed for uses ranging from simple web publication of client projects with gemeindescan.ch, to self-publishing for events such as sandbox.dfour.space or campusbochum.de, to public participation such as https://beteiligung.campusbochum.de/de/SDY4F/0N2AQB/.

Pros

  • no dependency on a specific library or implementation -> independent of renderer, e.g. simple styles spec supported in many map libraries and tools (e.g. geojson.io, GitHub Previews, ...)
  • styles "baked in" -> curated snapshot, human readable, no interpretation needed

Cons

  • requires extra tooling to create styles: hard to update or change style, e.g. we wrote a special QGIS Plugin
  • style not declarative/rule based -> no support for complex style definitions (e.g. zoom based)
  • currently requires/only supports (inline) geojson -> no support for tabular data, e.g. CSV(T) or other frictionless compliant geo data

Potential options

  • Separate data and style definition, e.g. similar to Vega-Lite, but an abstraction of mapbox-gl styles
  • Vega-Lite geo

@roll roll removed this from the Backlog milestone Apr 14, 2023
@frictionlessdata frictionlessdata locked and limited conversation to collaborators Apr 12, 2024
@roll roll converted this issue into discussion #906 Apr 12, 2024
