PR #53 from pvgenuchten updates Setup and First chapters
review of the setup document & fix typos in ld article
justb4 committed Aug 8, 2022
2 parents f2e355c + 23e1714 commit 654bc23
Showing 3 changed files with 208 additions and 160 deletions.
37 changes: 26 additions & 11 deletions workshop/content/docs/advanced/json-ld.md
@@ -6,22 +6,29 @@ Title: pygeoapi and the semantic web

The work on pygeoapi touches on 3 aspects of semantic web:

- [Search engines](#search-engines)
- [Publish spatial data in the semantic web](#publish-spatial-data-in-the-semantic-web)
- [Proxy to semantic web](#proxy-to-semantic-web)


## Search engines

Search engines use technology similar to the semantic web to capture structured data (aka rich snippets) from web pages. pygeoapi supports this use case by embedding a `schema.org` JSON-LD snippet in the HTML encoding; read more at [Search Engine Optimisation](../seo/index). The `schema.org` ontology is not a formal semantic web ontology, so it is somewhat disconnected from the rest of the semantic web.
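A client can recover that snippet by scanning the HTML for the `application/ld+json` script element. A minimal sketch, using a trimmed, hypothetical page (the HTML below is illustrative, not actual pygeoapi output):

```python
import json
import re

# Hypothetical, trimmed HTML page with an embedded schema.org
# JSON-LD snippet, similar in spirit to what pygeoapi emits.
html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Free WIFI Florence"}
</script>
</head><body>...</body></html>
"""

# Pull the first JSON-LD script block out of the page and parse it
match = re.search(
    r'<script type="application/ld\+json">(.*?)</script>',
    html, re.DOTALL)
snippet = json.loads(match.group(1))
print(snippet["@type"])  # Dataset
```

A real crawler would of course fetch the page over HTTP first; the extraction step is the same.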

## Publish spatial data in the semantic web

OGC API Common adopted a number of W3C conventions, which bring these APIs closer to the standards of the semantic web than the OWS standards were.
At this moment pygeoapi does not aim to be a full implementation of the semantic web; however, it is possible to advertise some aspects of the semantic web,
so the data can be traversed by semantic-aware clients.

!!! question "Use a SPARQL client to query pygeoapi"

[SPARQL](https://en.wikipedia.org/wiki/SPARQL) is commonly known as the query language for triple stores.
However, you can also use SPARQL to query graphs of linked web resources. The SPARQL client traverses links between
the resources to locate the requested triples. [Jena ARQ](https://jena.apache.org/documentation/query/) is a command-line
SPARQL client which is able to run such queries. Jena is quite difficult to set up, although there is a
[Docker image](https://hub.docker.com/r/stain/jena) available. As an alternative we'll use a web-based implementation
of the ARQ engine. Navigate to [https://demos.isl.ics.forth.gr/sparql-ld-endpoint](https://demos.isl.ics.forth.gr/sparql-ld-endpoint/) and replace the query in the textbox with:


``` {.sql linenums="1"}
@@ -44,7 +51,7 @@
}
```

Notice that the SPARQL client fails if you hardcode the html format.

``` {.sql linenums="1"}
SELECT * WHERE {
@@ -54,9 +61,13 @@
}
```

JSON-LD as expected by search engines poses some challenges for semantic web tools. So why does it work if the format is not hardcoded?
The SPARQL engine negotiates with the endpoint to evaluate which (RDF) encodings are available, and based on that content negotiation
it requests the `jsonld` encoding.
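The negotiation itself is plain HTTP: the client sends an `Accept` header listing the media types it understands, with quality (`q`) values expressing preference, and the server returns the best match it can offer. A simplified sketch of that server-side choice (not pygeoapi's actual implementation; the q-value parsing is deliberately minimal):

```python
def negotiate(accept_header, offered):
    """Pick the offered media type the client prefers most,
    based on the q values in its Accept header."""
    prefs = []
    for part in accept_header.split(","):
        pieces = part.strip().split(";")
        mediatype = pieces[0].strip()
        q = 1.0  # per HTTP, a missing q parameter means q=1
        for param in pieces[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                q = float(value)
        prefs.append((q, mediatype))
    # Walk preferences from highest to lowest q; first match wins
    for q, mediatype in sorted(prefs, reverse=True):
        if mediatype in offered:
            return mediatype
    return None

# A semantic-aware client typically ranks RDF encodings above HTML:
accept = "application/ld+json;q=0.9, text/turtle;q=0.8, text/html;q=0.5"
print(negotiate(accept, {"application/ld+json", "text/html"}))
# application/ld+json
```

Hardcoding `f=html` in the URL bypasses exactly this step, which is why the SPARQL client then fails.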

pygeoapi adopted conventions of the [json-ld](https://json-ld.org) community to annotate JSON as RDF. Each property (column in a source table)
is annotated with a semantic concept. The configuration of how to apply the annotations is managed in the `context` element of the pygeoapi config file.
Read more in the [pygeoapi documentation](https://docs.pygeoapi.io/configuration#Linked_data).
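Conceptually, the context maps each plain JSON property name onto an IRI of a semantic concept. A minimal sketch of that expansion step, using a hypothetical mapping from the WIFI dataset's columns to `schema.org` terms (the mapping itself is illustrative, not taken from the workshop config):

```python
# Hypothetical context mapping CSV columns to schema.org terms,
# similar in spirit to the `context` block in the pygeoapi config
context = {
    "name-it": "https://schema.org/name",
    "lon": "https://schema.org/longitude",
    "lat": "https://schema.org/latitude",
}

record = {"name-it": "Piazza del Duomo", "lon": 11.256, "lat": 43.773}

# Expand plain property names into IRIs, as a JSON-LD processor would
expanded = {context.get(key, key): value for key, value in record.items()}
print(expanded["https://schema.org/name"])  # Piazza del Duomo
```

A semantic client consuming the expanded document no longer needs to know the dataset's local column names.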

``` {.yaml linenums="1"}
context:
@@ -72,6 +83,10 @@
```

## Proxy to semantic web

Spatial data engineers are generally challenged when importing and visualising fragments of the semantic web. The number of spatial
[clients supporting SPARQL](https://plugins.qgis.org/plugins/sparqlunicorn/) interaction is limited, and they require expert knowledge to use.
A group within the pygeoapi community aims to facilitate semantic web access for spatial data engineers by introducing pygeoapi as a proxy
between typical GIS clients and the semantic web.

A [new feature](https://github.com/geopython/pygeoapi/pull/615) is being prepared which introduces a SPARQL provider to pygeoapi.
The provider makes it possible to browse the results of a SPARQL query as an OGC API Features collection.
147 changes: 83 additions & 64 deletions workshop/content/docs/publish/first.md
@@ -7,13 +7,13 @@ title: Exercise 1 - Publish your first dataset with pygeoapi
In this section you are going to publish a vector dataset using `pygeoapi`.

We will use the CSV dataset [free-wifi-florence.csv](https://github.com/geopython/diving-into-pygeoapi/blob/main/workshop/docker/data/free-wifi-florence.csv), free WIFI
access points in Florence, kindly provided by [opendata.comune.fi.it](https://opendata.comune.fi.it).
You can find this dataset in the `workshop/docker/data` folder.

This exercise consists of two major steps:

* Adapt the `docker.config.yml` to define this dataset as an OGC API Features *Collection*
* Make sure that pygeoapi can find the data file

We will use the `docker-compose.yml` file provided.

@@ -27,56 +27,70 @@ setup provided to you is actually working. Two files are relevant:

To test:

!!! question "Test the workshop configuration"

1. In a terminal shell navigate to the workshop folder and type:
```console
docker-compose up
```
1. Open `http://localhost:5000` in your browser, verify some collections
1. Close by typing Control-C

NB you may also run the Docker container in the background (detached):

!!! question "Docker in the background"

1. Type `docker-compose up -d`
1. Type `docker ps`; verify that the `pygeoapi` container is running
1. Open `http://localhost:5000` in your browser, verify some collections
1. View the logs: `docker logs --follow pygeoapi`
1. Type `docker-compose stop`

## Publish first dataset

You are ready to publish your first dataset.

!!! question "Setting up the pygeoapi config file"

1. Open the file `workshop/docker/pygeoapi/docker.config.yml` in a text editor.
1. Look for the commented config section starting with `# START - EXERCISE 1 - Your First Collection`
1. Uncomment all lines until `# END - EXERCISE 1 - Your First Collection`

Make sure that the indentation aligns (hint: directly under `# START ...`)

The config section reads:

``` {.yml linenums="185"}
free_wifi_florence:
type: collection
title: Free WIFI Florence
description: The dataset shows the location of the places in the Municipality of Florence where a free wireless internet connection service (Wifi) is available.
keywords:
- wifi
- florence
links:
- type: text/csv
rel: canonical
title: data
href: https://opendata.comune.fi.it/?q=metarepo/datasetinfo&id=fb5b7bac-bcb0-4326-9388-7e3f3d671d71
hreflang: it-IT
extents:
spatial:
bbox: [11, 43.6, 11.4, 43.9]
crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
providers:
- type: feature
name: CSV
data: /data/free-wifi-florence.csv
id_field: name-it
geometry:
x_field: lon
y_field: lat
```

The most relevant part is the `providers` section. Here we define a `CSV Provider`,
pointing the file path to the `/data` directory we will mount (see next) from the local
dir into the Docker container above. Because a CSV is not a spatial format, we tell `pygeoapi`
that the longitude and latitude (x,y) are mapped from the columns `lon` and `lat`.
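In effect the provider turns each CSV row into a GeoJSON feature, building a point geometry from the configured `x_field`/`y_field` and taking the feature id from `id_field`. A rough sketch of that mapping (not the provider's actual code), using a tiny inline sample instead of the real file:

```python
import csv
import io

# A two-row inline sample standing in for free-wifi-florence.csv;
# the real file lives in workshop/docker/data
sample = io.StringIO(
    "name-it,lon,lat\n"
    "Piazza della Signoria,11.2556,43.7696\n"
    "Piazza del Duomo,11.2560,43.7731\n"
)

features = []
for row in csv.DictReader(sample):
    features.append({
        "type": "Feature",
        "id": row["name-it"],  # id_field: name-it
        "geometry": {
            "type": "Point",
            # x_field / y_field mapping from the provider config
            "coordinates": [float(row["lon"]), float(row["lat"])],
        },
        "properties": {"name-it": row["name-it"]},
    })

print(len(features))  # 2
```

This is also a quick way to sanity-check your own CSV before configuring it: if the id, x and y columns cannot be resolved here, pygeoapi will not be able to resolve them either.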

!!! Tip
@@ -89,43 +103,48 @@
pygeoapi includes a [number of data providers](https://docs.pygeoapi.io/en/latest/data-publishing/ogcapi-features.html#providers) which enable access to a variety of data formats. Via the OGR/GDAL plugin the number of supported formats is almost limitless.
Read on the [data provider page](https://docs.pygeoapi.io/en/latest/data-publishing/ogcapi-features.html#providers) how you can set up a connection to your dataset of choice. You can always copy a relevant example configuration and place it in the datasets section of the pygeoapi config file for your future project.
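For example, publishing a GeoJSON file instead of a CSV only changes the provider block. A sketch based on the provider examples in the pygeoapi documentation (the file path is hypothetical):

``` {.yml linenums="1"}
providers:
    - type: feature
      name: GeoJSON
      data: /data/my-dataset.geojson  # hypothetical path
      id_field: id
```

Note that a GeoJSON file carries its own geometry, so no `x_field`/`y_field` mapping is needed.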

## Making data available in the Docker container

As the Docker container, here named `pygeoapi`, cannot directly access files on your
local host system, we will use *Docker volume mounts*. These are defined
in the `docker-compose.yml` file:

!!! question "Configure access to the data"

1. Open the file `workshop/docker/docker-compose.yml`
1. Look for the commented section `# Exercise 1 - `
1. Uncomment that line `- ./data:/data`

The relevant lines read:

``` {.yml linenums="43"}
volumes:
- ./pygeoapi/docker.config.yml:/pygeoapi/local.config.yml
- ./data:/data # Exercise 1 - Ready to pull data from here
```

The local `./pygeoapi/docker.config.yml` file was already mounted. Now
we have also mounted (made available) the entire local directory `./data`.

## Test

Moment of truth!
!!! question "Start with updated configuration"

1. Start by typing `docker-compose up`
1. Observe the logging output
1. If there are no errors, open `http://localhost:5000`
1. Look for the *Free WIFI Florence* collection
1. Browse through the collection

## Debugging configuration errors

Occasionally you may run into errors; the common ones are briefly discussed here:

* A file cannot be found, due to a typo in the configuration.
* The format or structure of the spatial file is not fully supported.
* The port (5000) is already taken. Is a previous pygeoapi instance still running? If you change the port, remember to also update it in the pygeoapi config file.

There are 2 parameters in the config file which help to address these issues.
You can set the logging level to `DEBUG` and indicate a path to a log file.
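In the pygeoapi config this is the `logging` section; a sketch based on the configuration documentation (the log file path is just an example):

``` {.yml linenums="1"}
logging:
    level: DEBUG
    logfile: /tmp/pygeoapi.log  # example path
```

With `DEBUG` enabled, provider and configuration errors show up in the log with much more detail than on the console.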

!!! tip
