sfomuseum/go-whosonfirst-data

go-whosonfirst-data

Tools for working with Who's On First (WOF) style documents.

Important

Work in progress. This package might be renamed.

Tools

To build binary versions of these tools run the cli Makefile target. For example:

> make cli
go build -mod vendor -o bin/emit cmd/emit/main.go

emit

A command-line tool for parsing and emitting individual WOF records from a WOF data source.

$> ./bin/emit -h
Usage of ./bin/emit:
  -data-source string
    	A valid whosonfirst/go-whosonfirst-index URI. (default "directory://")
  -format-json
    	Format JSON output for each record.
  -json
    	Emit a JSON list.
  -null
    	Emit to /dev/null
  -oembed
    	Emit results as OEmbed records
  -oembed-author-name string
    	A default value for the OEmbed 'author_name' property. (default "SFO Museum")
  -oembed-author-uri-template string
    	A valid RFC 6570 URI template to use for the OEmbed 'author_url' property. (default "https://millsfield.sfomuseum.org/id/{wof_id}")
  -oembed-media-label string
    	A valid (WOF) media:properties.sizes property label to identify image data. (default "z")
  -oembed-media-uri-template fmt
    	A valid Go language fmt template for constucting a RFC 6570 URI template to use for the OEmbed 'url' property. (default "https://millsfield.sfomuseum.org/media/%s/%d_{secret}_{label}.{extension}")
  -oembed-provider-name string
    	A default value for the OEmbed 'provider_name' property. (default "SFO Museum")
  -oembed-provider-url string
    	A default value for the OEmbed 'provider_url' property. (default "https://millsfield.sfomuseum.org/")
  -oembed-thumbnail-media-label string
    	A valid (WOF) media:properties.sizes property label to identify image data for thumbnails. (default "n")
  -query value
    	One or more {PATH}={REGEXP} parameters for filtering records.
  -query-mode string
    	Specify how query filtering should be evaluated. Valid modes are: ALL, ANY (default "ALL")
  -stdout
    	Emit to STDOUT (default true)

Data sources

The emit tool uses the go-whosonfirst-index package to read data from a variety of sources. The following data sources are supported by default:

Emit data from one or more directories containing GeoJSON (WOF) records.

For example:

$> bin/emit -data-source directory:// \
	/usr/local/data/sfomuseum-data-media/data/

Emit data from one or more files containing GeoJSON FeatureCollection (WOF) records.

$> bin/emit -data-source featurecollection:// \
	/path/to/featurecollection.geojson

Feature collection records may also be read from STDIN. For example:

$> cat /path/to/featurecollection.geojson \
   | bin/emit -data-source featurecollection:// STDIN

Emit data from one or more files.

For example:

$> bin/emit -data-source file:// \
	/path/to/feature1.geojson \
	/path/to/feature2.geojson	

Emit data from a list of files (to read and emit).

For example:

$> bin/emit -data-source filelist:// /path/to/files.txt

File lists may also be read from STDIN. For example:

$> cat /path/to/files.txt \
   | bin/emit -data-source filelist:// STDIN

Emit data encoded in one or more line-delimited GeoJSON files.

For example:

$> bin/emit -data-source geojsonls:// /path/to/features.jsonl

Line-delimited GeoJSON records may also be read from STDIN. For example:

$> cat /path/to/features.jsonl \
   | bin/emit -data-source geojsonls:// STDIN
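
As an illustration of what "line-delimited GeoJSON" means for a consumer, here is a minimal sketch that reads one record per line and collects each record's id. It uses only the Go standard library and is not the code used by go-whosonfirst-index; the function names (scanFeatureIDs, idsFromString) and the sample records are made up for this example.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// scanFeatureIDs reads line-delimited GeoJSON from r and returns the "id"
// of every record. Blank lines are skipped. (Hypothetical helper.)
func scanFeatureIDs(r io.Reader) ([]int64, error) {
	scanner := bufio.NewScanner(r)
	// WOF records can be long, so raise the scanner's default 64KB line limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)

	var ids []int64
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		var feature struct {
			ID int64 `json:"id"`
		}
		if err := json.Unmarshal([]byte(line), &feature); err != nil {
			return nil, fmt.Errorf("invalid record: %w", err)
		}
		ids = append(ids, feature.ID)
	}
	return ids, scanner.Err()
}

// idsFromString is a convenience wrapper for in-memory data.
func idsFromString(data string) ([]int64, error) {
	return scanFeatureIDs(strings.NewReader(data))
}

func main() {
	// Two minimal, made-up records standing in for a .jsonl file or STDIN.
	input := `{"type":"Feature","id":1159341477,"properties":{},"geometry":null}
{"type":"Feature","id":1377012109,"properties":{},"geometry":null}
`
	ids, err := idsFromString(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```

To read from STDIN instead, os.Stdin could be passed to scanFeatureIDs in place of the string reader.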

Emit data from one or more Git repositories.

For example:

$> bin/emit -data-source git:// \
	https://github.com/sfomuseum-data/sfomuseum-data-flights-2020-07.git \
	https://github.com/sfomuseum-data/sfomuseum-data-flights-2020-06.git	

Emit data from one or more directories where the relevant WOF data is expected to be found in a data subdirectory.

For example:

$> bin/emit -data-source repo:// \
	/usr/local/data/sfomuseum-data-media \
	/usr/local/data/sfomuseum-data-media-collection	

JSON

By default, all records are emitted as line-delimited JSON, one record per line. A side effect of this is that the default WOF formatting is lost. To preserve the original formatting, pass the -format-json flag.

$> bin/emit /usr/local/data/sfomuseum-data-media/data/

{"bbox":[-122.387197,37.619087,-122.387197,37.619087],"geometry":{"coordinates":[-122.387197,37.619087],"type":"Point"},"id":1159341477,"properties":{"edtf:cessation":"uuuu","edtf:inception":"uuuu","geom:area":0,"geom:area_square_m":0,"geom:bbox":"-122.387197,37.619087,-122.387197,37.619087","geom:latitude":37.619087,"geom:longitude":-122.387197,"iso:country":"US","media:created":1443524024,"media:fingerprint":"857752f82858b46502479f803da8a52f1e168d5e","media:imagehash_avg":"a:f8ffbe8070f0c080","media:imagehash_diff":"d:40b0383aa3870f3d","media:medium":"image","media:mimetype":"image/jpeg","media:properties":{"colours":[{"closest":[{"hex":"#9c2542","name":"Big Dip O' Ruby","reference":"crayola"},{"hex":"#a52a2a","name":"brown","reference":"css4"}],"hex":"#8e362e","name":"#8e362e","reference":"vibrant"},{"closest":[{"hex":"#a5694f","name":"Sepia","reference":"crayola"},{"hex":"#a0522d","name":"sienna","reference":"css4"}],"hex":"#9c5b59","name":"#9c5b59","reference":"vibrant"},{"closest":[{"hex":"#cdc5c2","name":"Silver","reference":"crayola"},{"hex":"#c0c0c0","name":"silver","reference":"css4"}],"hex":"#c1c4c4","name":"#c1c4c4","reference":"vibrant"},{"closest":[{"hex":"#414a4c","name":"Outer Space","reference":"crayola"},{"hex":"#2f4f4f","name":"darkslategrey","reference":"css4"}],"hex":"#4b4a4a","name":"#4b4a4a","reference":"vibrant"}],"depicts":["1159160617"],"medium":"image","mimetype":"image/jpeg","sizes":{"b":{"extension":"jpg","height":682,"mimetype":"image/jpeg","secret":"PH2VfvpO7NRUAyns7QxdvOO2YfwYhyib2QK5FleNQ34LmzliMWYN","width":1024},"c":{"extension":"jpg","height":533,"mimetype":"image/jpeg","secret":"PH2VfvpO7NRUAyns7QxdvOO2YfwYhyib2QK5FleNQ34LmzliMWYN","width":800},"d":{"extension":"jpg","height":320,"mimetype":"image/jpeg","secret":"PH2VfvpO7NRUAyns7QxdvOO2YfwYhyib2QK5FleNQ34LmzliMWYN","width":320},"dd":{"extension":"jpg","height":533,"mimetype":"image/jpeg","secret":"PH2VfvpO7NRUAyns7QxdvOO2YfwYhyib2QK5FleNQ34LmzliMWYN","width":800},"n":{"extension":"jpg","height":213,"mimetype":"image/jpeg","secret":"PH2VfvpO7NRUAyns7QxdvOO2YfwYhyib2QK5FleNQ34LmzliMWYN","width":320},"o":{"extension":"jpg","height":2400,"mimetype":"image/jpeg","secret":"ACqgPxrMRzHdzGRJfkBqGVMtP2L9gTrn7mgfhMJesqhjXWJpmRK7","width":3600},"sq":{"extension":"jpg","height":320,"mimetype":"image/jpeg","secret":"PH2VfvpO7NRUAyns7QxdvOO2YfwYhyib2QK5FleNQ34LmzliMWYN","width":320},"z":{"extension":"jpg","height":426,"mimetype":"image/jpeg","secret":"PH2VfvpO7NRUAyns7QxdvOO2YfwYhyib2QK5FleNQ34LmzliMWYN","width":640}},"source":"user","status_id":0},"media:source":"sfomuseum","media:status_id":1,"mz:hierarchy_label":1,"mz:is_approximate":1,"mz:is_current":-1,"sfomuseum:placetype":"image","src:geom":"unknown","wof:belongsto":[102527513,102087579,1159341477,85688637,1159396315,1159396157,1159396321,102191575,85633793,85922583],"wof:breaches":[],"wof:country":"US","wof:created":1528920235,"wof:depicts":[1159396315,102527513,1360516119,1159396321,1159396157,1159160617],"wof:geomhash":"86e9b7d5fe1f6f1a6479fa62588a1dea","wof:hierarchy":[{"building_id":1159396321,"campus_id":102527513,"concourse_id":1159396315,"continent_id":102191575,"country_id":85633793,"county_id":102087579,"locality_id":85922583,"media_id":1159341477,"neighbourhood_id":-1,"region_id":85688637,"wing_id":1159396157}],"wof:id":1159341477,"wof:lastmodified":1577131152,"wof:name":"Installation view of \"The Nation’s Game: A History of the National Football League\"","wof:parent_id":1159396315,"wof:placetype":"media","wof:repo":"sfomuseum-data-media","wof:superseded_by":[],"wof:supersedes":[],"wof:tags":[]},"type":"Feature"}
...and so on

Or:

$> bin/emit --format-json /usr/local/data/sfomuseum-data-media/data/

{
  "id": 1377012109,
  "type": "Feature",
  "properties": {
    "edtf:cessation": "2017-01-30",
    "edtf:inception": "2016-05-26",
    "geom:area": 0,
    ...
  },
  "geometry": "..."
}
and so on...

If you want to emit records as a valid JSON list then enable the -json flag.
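
The three output shapes described above (line-delimited, formatted, and a single JSON list) can be sketched with nothing but encoding/json. This is an illustration only: the function names are hypothetical, and json.Indent is a stand-in that does not reproduce the WOF-specific formatting the tool itself emits.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// lineDelimited writes each record as compact JSON on its own line.
func lineDelimited(records []string) (string, error) {
	var buf bytes.Buffer
	for _, rec := range records {
		if err := json.Compact(&buf, []byte(rec)); err != nil {
			return "", err
		}
		buf.WriteByte('\n')
	}
	return buf.String(), nil
}

// formatted writes each record as indented JSON, one after another.
func formatted(records []string) (string, error) {
	var buf bytes.Buffer
	for _, rec := range records {
		if err := json.Indent(&buf, []byte(rec), "", "  "); err != nil {
			return "", err
		}
		buf.WriteByte('\n')
	}
	return buf.String(), nil
}

// jsonList wraps all records in a single, valid JSON array.
func jsonList(records []string) (string, error) {
	raw := make([]json.RawMessage, len(records))
	for i, rec := range records {
		raw[i] = json.RawMessage(rec)
	}
	out, err := json.Marshal(raw)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Made-up, heavily abbreviated records.
	records := []string{
		`{ "id": 1159341477, "type": "Feature" }`,
		`{ "id": 1377012109, "type": "Feature" }`,
	}
	for _, render := range []func([]string) (string, error){lineDelimited, formatted, jsonList} {
		out, err := render(records)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Note that a stream of line-delimited or formatted records is not itself a valid JSON document, which is why a separate list mode is useful for tools such as jq that expect a single value.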

Inline queries

You can also specify inline queries by passing a -query parameter which is a string in the format of:

{PATH}={REGULAR EXPRESSION}

Paths follow the dot notation syntax used by the tidwall/gjson package, and regular expressions are any valid Go language regular expression. Successful path lookups are treated as a list of candidates, and each candidate's string value is tested with the regular expression's MatchString method.

For example:

$> bin/emit \
	-query 'properties.wof:belongsto=\b102087579\b' \
	/usr/local/data/sfomuseum-data-media/data/ \
   | wc -l

1122
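
To make the {PATH}={REGULAR EXPRESSION} mechanics concrete, here is a minimal sketch of the same idea using only the standard library. It is not the tool's implementation: the real path lookups are done with tidwall/gjson, whereas the candidates function below is a simplified, hypothetical stand-in that only understands plain dot-separated keys. All names in the block are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// query pairs a dot-notation path with a compiled regular expression.
type query struct {
	path string
	re   *regexp.Regexp
}

// parseQuery splits a "{PATH}={REGEXP}" string on the first "=".
func parseQuery(raw string) (*query, error) {
	parts := strings.SplitN(raw, "=", 2)
	if len(parts) != 2 || parts[0] == "" {
		return nil, fmt.Errorf("invalid query %q", raw)
	}
	re, err := regexp.Compile(parts[1])
	if err != nil {
		return nil, fmt.Errorf("invalid regular expression in %q: %w", raw, err)
	}
	return &query{path: parts[0], re: re}, nil
}

// candidates walks doc along a dot-separated path and returns the string
// form of whatever it finds. Arrays yield one candidate per element.
func candidates(doc interface{}, path string) []string {
	current := doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := current.(map[string]interface{})
		if !ok {
			return nil
		}
		current, ok = obj[key]
		if !ok {
			return nil
		}
	}
	if list, ok := current.([]interface{}); ok {
		out := make([]string, 0, len(list))
		for _, v := range list {
			out = append(out, fmt.Sprint(v))
		}
		return out
	}
	return []string{fmt.Sprint(current)}
}

// matches reports whether any candidate at the query's path matches.
func matches(record string, rawQuery string) (bool, error) {
	q, err := parseQuery(rawQuery)
	if err != nil {
		return false, err
	}
	dec := json.NewDecoder(strings.NewReader(record))
	dec.UseNumber() // keep 102087579 as "102087579", not 1.02087579e+08
	var doc interface{}
	if err := dec.Decode(&doc); err != nil {
		return false, err
	}
	for _, c := range candidates(doc, q.path) {
		if q.re.MatchString(c) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// An abbreviated, made-up record.
	record := `{"properties":{"wof:belongsto":[102527513,102087579,85688637]}}`
	ok, err := matches(record, `properties.wof:belongsto=\b102087579\b`)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok)
}
```

Splitting on the first "=" only means the regular expression itself may contain "=" characters.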

You can pass multiple -query parameters. The default query mode requires every query to match, but you can instead require that at least one query matches by passing the -query-mode ANY flag.
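
The difference between the two query modes can be sketched as follows. This is a simplified illustration, not the tool's code: properties are flattened into a map of strings, and the type and function names are hypothetical. How the tool behaves when no queries are given is not shown here; the sketch assumes that case matches everything.

```go
package main

import (
	"fmt"
	"regexp"
)

// query is a hypothetical path/pattern pair.
type query struct {
	Path    string
	Pattern string
}

// evaluate reports whether props satisfies the queries under the given
// mode: "ALL" requires every query to match, "ANY" requires at least one.
func evaluate(props map[string]string, queries []query, mode string) (bool, error) {
	if mode != "ALL" && mode != "ANY" {
		return false, fmt.Errorf("unsupported query mode %q", mode)
	}
	if len(queries) == 0 {
		// Assumption: with nothing to filter on, every record passes.
		return true, nil
	}
	matched := 0
	for _, q := range queries {
		re, err := regexp.Compile(q.Pattern)
		if err != nil {
			return false, err
		}
		value, ok := props[q.Path]
		if ok && re.MatchString(value) {
			matched++
		}
	}
	if mode == "ANY" {
		return matched > 0, nil
	}
	return matched == len(queries), nil
}

func main() {
	props := map[string]string{"icao:aircraft": "B738"}
	queries := []query{
		{Path: "icao:aircraft", Pattern: "B738"},
		{Path: "icao:aircraft", Pattern: "A321"},
	}
	for _, mode := range []string{"ALL", "ANY"} {
		ok, err := evaluate(props, queries, mode)
		if err != nil {
			panic(err)
		}
		fmt.Println(mode, ok)
	}
}
```

Two queries against the same path can never both match a single-valued property when the patterns are mutually exclusive, which is why ANY is the useful mode for "either this or that" filters.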

For example, this is how you would query the sfomuseum-data-flights-2020-07 Git repository, filtering for records involving either Boeing 737-800 (B738) or Airbus A321 aircraft. The results are emitted as a JSON list and piped to the jq tool, which prints their sfomuseum:flight_id property:

> bin/emit \
	-json \
	-query 'properties.icao:aircraft=B738' \
	-query 'properties.icao:aircraft=A321' \
	-query-mode ANY \
	-data-source git:// \
	https://github.com/sfomuseum-data/sfomuseum-data-flights-2020-07.git \
   | jq '.[]["properties"]["sfomuseum:flight_id"]'
   
"20200701-A-DAL-696"
"20200701-A-DAL-807"
"20200701-A-DAL-958"
"20200701-A-JBU-115"
"20200701-A-JBU-1415"
"20200701-A-JBU-1833"
"20200701-A-JBU-415"
"20200701-A-JBU-577"
"20200701-A-JBU-915"
"20200701-A-SCX-395"
"20200701-A-SWA-1654"
"20200701-A-SWA-1817"
"20200701-A-SWA-2065"
"20200701-A-SWA-300"
"20200701-A-SWA-930"
...and so on
"20200721-D-SCX-396"
"20200721-D-SWA-1655"
"20200721-D-SWA-1693"
"20200721-D-SWA-3244"
"20200721-D-SWA-946"
"20200721-D-UAL-1273"
"20200721-D-UAL-1578"
"20200721-D-UAL-352"
"20200721-D-UAL-355"
"20200721-D-UAL-367"
"20200721-D-UAL-673"

OEmbed

It is also possible to emit WOF records as OEmbed documents of type "photo".

For example:

$> bin/emit \
	-format-json \
	-oembed \
	/usr/local/data/sfomuseum-data-media/data/

{
  "version": "1.0",
  "type": "photo",
  "width": 640,
  "height": 453,
  "title": "Installation view of \"Before the 21st Century: An Ode to Boats, Cars, Motorcycles, Planes, and Trains\"",
  "url": "https://millsfield.sfomuseum.org/media/137/702/095/5/1377020955_GkHONnz4lqxYWQ9me6mBLNmZdthfrTKv_z.jpg",
  "author_name": "SFO Museum",
  "author_url": "https://millsfield.sfomuseum.org/id/1377020955",
  "provider_name": "SFO Museum",
  "provider_url": "https://millsfield.sfomuseum.org/",
  "object_uri": "wof://id/1377020955"
}
{
  "version": "1.0",
  "type": "photo",
  "width": 640,
  "height": 380,
  "title": "Installation view of \"Airline Identity: Marks, Brands and Logos\"",
  "url": "https://millsfield.sfomuseum.org/media/137/704/368/7/1377043687_agKvxo3EzdgRyNUahkBXadodNcV0Vvgx_z.jpg",
  "author_name": "SFO Museum",
  "author_url": "https://millsfield.sfomuseum.org/id/1377043687",
  "provider_name": "SFO Museum",
  "provider_url": "https://millsfield.sfomuseum.org/",
  "object_uri": "wof://id/1377043687"
}
... and so on
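
For readers who want to consume or produce documents of this shape, the fields in the output above map onto a small Go struct. The sketch below is an assumption-laden illustration: the Photo type, the newPhoto helper and the constants are hypothetical names, the defaults are copied from the flag defaults listed earlier, and the {wof_id} expansion is a plain string replacement rather than a full RFC 6570 implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Photo mirrors the fields of the OEmbed "photo" output shown above.
// The type itself is hypothetical.
type Photo struct {
	Version      string `json:"version"`
	Type         string `json:"type"`
	Width        int    `json:"width"`
	Height       int    `json:"height"`
	Title        string `json:"title"`
	URL          string `json:"url"`
	AuthorName   string `json:"author_name"`
	AuthorURL    string `json:"author_url"`
	ProviderName string `json:"provider_name"`
	ProviderURL  string `json:"provider_url"`
	ObjectURI    string `json:"object_uri"`
}

// Defaults taken from the -oembed-* flag defaults.
const (
	authorName        = "SFO Museum"
	authorURITemplate = "https://millsfield.sfomuseum.org/id/{wof_id}"
	providerName      = "SFO Museum"
	providerURL       = "https://millsfield.sfomuseum.org/"
)

// newPhoto fills in the fixed fields and derives the ID-based ones.
func newPhoto(id int64, title, url string, width, height int) Photo {
	strID := strconv.FormatInt(id, 10)
	return Photo{
		Version:      "1.0",
		Type:         "photo",
		Width:        width,
		Height:       height,
		Title:        title,
		URL:          url,
		AuthorName:   authorName,
		AuthorURL:    strings.ReplaceAll(authorURITemplate, "{wof_id}", strID),
		ProviderName: providerName,
		ProviderURL:  providerURL,
		ObjectURI:    "wof://id/" + strID,
	}
}

func main() {
	p := newPhoto(
		1377043687,
		`Installation view of "Airline Identity: Marks, Brands and Logos"`,
		"https://millsfield.sfomuseum.org/media/137/704/368/7/1377043687_agKvxo3EzdgRyNUahkBXadodNcV0Vvgx_z.jpg",
		640, 380,
	)
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```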

Image data for OEmbed records is expected to be found in one or more properties in the properties.media:properties dictionary of the WOF record. Specifically:

  • properties.media:properties.sizes.{STRING_LABEL} - A dictionary containing dimensions and other details for constructing a URL for an image identified by a string label.
  • properties.media:properties.uri_template - A valid RFC 6570 URI template for constructing an image URL using the details derived from properties.media:properties.sizes.{STRING_LABEL}.

Here is an abbreviated example of a WOF record with media: properties.

{
  "id": 1159339627,
  "type": "Feature",
  "properties": {
    ...
    "media:created": 1508957796,
    "media:fingerprint": "fd6e55e1ea940673e8dd7edfdacf0c2d546b8d6a",
    "media:imagehash_avg": "a:c2e7e7effbff0000",
    "media:imagehash_diff": "d:868e8e9e92c2eaba",
    "media:medium": "image",
    "media:mimetype": "image/jpeg",
    "media:properties": {
      "medium": "image",
      "mimetype": "image/jpeg",
      "sizes": {
        ...
        "z": {
          "extension": "jpg",
          "height": 480,
          "mimetype": "image/jpeg",
          "secret": "UaqY5CItyrimU82DjYTYxy6XfRZXO0tD1YfBHWYhLnxGK1id8sdf",
          "width": 320
        }
      },
      "uri_template": "https://millsfield.sfomuseum.org/media/115/933/962/7/1159339627_{secret}_{label}.{extension}"
    }
  }
}
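
As a rough sketch of how the sizes and uri_template properties combine into an image URL, the following decodes a media:properties dictionary and substitutes the {secret}, {label} and {extension} variables. It assumes uri_template sits directly under media:properties, as described in the list above. The type and function names are hypothetical, and the substitution is plain string replacement rather than a complete RFC 6570 expansion (no percent-encoding, no operators).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// size holds the per-label details needed to build an image URL.
type size struct {
	Extension string `json:"extension"`
	Height    int    `json:"height"`
	Secret    string `json:"secret"`
	Width     int    `json:"width"`
}

// mediaProperties is a partial view of properties.media:properties.
type mediaProperties struct {
	Sizes       map[string]size `json:"sizes"`
	URITemplate string          `json:"uri_template"`
}

// imageURL returns the URL for the image with the given size label.
func imageURL(rawMediaProperties string, label string) (string, error) {
	var mp mediaProperties
	if err := json.Unmarshal([]byte(rawMediaProperties), &mp); err != nil {
		return "", err
	}
	sz, ok := mp.Sizes[label]
	if !ok {
		return "", fmt.Errorf("no size labelled %q", label)
	}
	if mp.URITemplate == "" {
		return "", fmt.Errorf("record has no uri_template")
	}
	replacer := strings.NewReplacer(
		"{secret}", sz.Secret,
		"{label}", label,
		"{extension}", sz.Extension,
	)
	return replacer.Replace(mp.URITemplate), nil
}

func main() {
	// The media:properties dictionary from the abbreviated record above.
	raw := `{
	  "sizes": {
	    "z": {
	      "extension": "jpg",
	      "height": 480,
	      "secret": "UaqY5CItyrimU82DjYTYxy6XfRZXO0tD1YfBHWYhLnxGK1id8sdf",
	      "width": 320
	    }
	  },
	  "uri_template": "https://millsfield.sfomuseum.org/media/115/933/962/7/1159339627_{secret}_{label}.{extension}"
	}`
	url, err := imageURL(raw, "z")
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
}
```

The label passed in corresponds to the -oembed-media-label flag (default "z"); the same lookup with the -oembed-thumbnail-media-label value (default "n") would yield a thumbnail URL.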

If a WOF record does not contain a media:properties.uri_template property then the value of the -oembed-media-uri-template flag will be used to construct a URI template. For example:

	wof_id := 1159339627
	wof_tree := "115/933/962/7"
	// opts.MediaURITemplate holds the value of the -oembed-media-uri-template flag.
	media_template_uri := fmt.Sprintf(opts.MediaURITemplate, wof_tree, wof_id)
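
A self-contained version of that fallback might look like the sketch below. The template constant is the default value of the -oembed-media-uri-template flag. The idToTree helper is hypothetical: it assumes the tree is the WOF ID's decimal digits split into groups of up to three, which matches the 115/933/962/7 example but is not taken from the tool's source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Default value of the -oembed-media-uri-template flag.
const defaultMediaURITemplate = "https://millsfield.sfomuseum.org/media/%s/%d_{secret}_{label}.{extension}"

// idToTree splits the digits of a WOF ID into groups of up to three,
// joined by "/". (Assumed derivation, see the note above.)
func idToTree(id int64) string {
	digits := strconv.FormatInt(id, 10)
	var parts []string
	for len(digits) > 3 {
		parts = append(parts, digits[:3])
		digits = digits[3:]
	}
	parts = append(parts, digits)
	return strings.Join(parts, "/")
}

// mediaURITemplate fills the fmt verbs in the flag value, leaving the
// {secret}, {label} and {extension} variables for a later expansion step.
func mediaURITemplate(flagValue string, id int64) string {
	return fmt.Sprintf(flagValue, idToTree(id), id)
}

func main() {
	fmt.Println(mediaURITemplate(defaultMediaURITemplate, 1159339627))
}
```

The result is itself a URI template, identical in shape to a record's own media:properties.uri_template, so both cases can share the same expansion code afterwards.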

Important: Everything involving media in WOF documents remains a work in progress and is still subject to change.

If no relevant media information can be found in a WOF feature then the code will render the feature's geometry as an SVG image and assign a base-64 encoded data URL of the representation to the OEmbed record's url property. For example:

$> bin/emit \
	-format-json \
	-oembed \
	-query 'properties.wof:concordances.iata:code=SFO' \
	/usr/local/data/sfomuseum-data-whosonfirst/data/

{
  "version": "1.0",
  "type": "photo",
  "width": 800,
  "height": 640,
  "title": "San Francisco International Airport",
  "url": "data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iODAwLjAwMDAwMCIgaGVpZ2h0PSI2NDAuMDAwMDAwIiB2aWV3Qm94PSIwIDAgODAwIDY0MCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48cGF0aCBkPSJNNDguMDYzMTI4IDMwOC4yMDk3ODg ...(truncated for the sake of brevity)... IFoiLz48L3N2Zz4=",
  "author_name": "SFO Museum",
  "author_url": "https://millsfield.sfomuseum.org/id/102527513",
  "provider_name": "SFO Museum",
  "provider_url": "https://millsfield.sfomuseum.org/",
  "object_uri": "wof://id/102527513"
}
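
The data URL in the url property is an SVG document, base64-encoded and prefixed with its media type. A minimal sketch of that encoding step is shown below; the svgDataURL name and the SVG path are made up for illustration, and the step that turns a GeoJSON geometry into SVG is not shown.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// svgDataURL wraps an SVG document in a base64-encoded data URL.
func svgDataURL(svg string) string {
	encoded := base64.StdEncoding.EncodeToString([]byte(svg))
	return "data:image/svg+xml;base64," + encoded
}

func main() {
	// A placeholder SVG; the real output would contain the feature's geometry.
	svg := `<svg width="800.000000" height="640.000000" viewBox="0 0 800 640" xmlns="http://www.w3.org/2000/svg"><path d="M10 10 L790 10 L790 630 Z"/></svg>`
	fmt.Println(svgDataURL(svg))
}
```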

If you put the value of the url property into the src attribute of an HTML <img /> tag, the browser will render the feature's geometry as an image.

URI Templates

TBW

author_url

TBW

Variable Description
{wof_id} TBW

media_uri

TBW

Variable Description
{wof_id} TBW
{wof_tree} TBW

See also