Adding a predicate for the location of the NetCDF file #34

Open
adamml opened this issue Mar 3, 2020 · 8 comments

Comments

@adamml
Contributor

adamml commented Mar 3, 2020

Rationale

In discussions on the group telecon, it came to light that the Binary Array LD (BALD) specification would describe the contents of the NetCDF file for NetCDF-LD, but not provide a link to the NetCDF file itself. It became apparent that an extra predicate would be needed in the RDF representation of a Binary Array file in order to support this.

The file location should be an optional, user-specified parameter supplied at runtime.

Approach

A number of options have been considered:

Due to the stability and maturity of the vocabularies, it was decided to focus on the Schema.org or DCAT options.

A further consideration was the grouping of NetCDF files into collections, which may be achieved in either Schema.org or DCAT if the contents of a NetCDF file are considered to be a Dataset and the collection of NetCDF files a DataCatalog. The ability to nest catalogues, or to create hierarchies of them, was also considered, for example a collection of NetCDF files being available alongside other files or collections through a THREDDS server. While we do not provide an implementation pathway for this, the consideration motivated us to focus on DCAT, which at the time of writing supports nesting catalogues, whereas Schema.org does not.

Boilerplate code

First, an addition to the BALD ontology will be required:

@prefix dcat: <http://www.w3.org/ns/dcat#>.

bald:Container a dcat:Dataset. 

Then the following boilerplate would allow a software agent to traverse the graph and find the file from which to download the NetCDF data:

@base <http://foo.bar/my-netcdf-file.nc>.

@prefix bald: <https://www.opengis.net/def/binary-array-ld/>.
@prefix dcat: <http://www.w3.org/ns/dcat#>.
@prefix dct: <http://purl.org/dc/terms/>.

<./> a bald:Container;
	dcat:distribution [
		a dcat:Distribution;
		dcat:downloadURL <>;
		dcat:mediaType [
			a dct:MediaType;
			dct:identifier "application/x-netcdf"
		];
		dct:format [
			a dct:MediaType;
			dct:identifier <http://vocab.nerc.ac.uk/collection/M01/current/NC/>
		]
	].

[Figure: graph of the above TTL]
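Since the file location is meant to be an optional parameter supplied at runtime, the boilerplate could be filled in by a small template function. The following is only a sketch under stated assumptions: `render_container_ttl` and its parameters are hypothetical names, not part of the BALD library, and it writes absolute IRIs rather than relying on `@base`.

```python
from typing import Optional

# Prefixes copied from the boilerplate above.
PREFIXES = (
    "@prefix bald: <https://www.opengis.net/def/binary-array-ld/>.\n"
    "@prefix dcat: <http://www.w3.org/ns/dcat#>.\n"
    "@prefix dct: <http://purl.org/dc/terms/>.\n"
)


def render_container_ttl(container_iri: str,
                         download_url: Optional[str] = None) -> str:
    """Hypothetical helper: render the boilerplate Turtle as a string.

    When download_url is None (the user supplied no file location),
    the dcat:distribution block is left out entirely.
    """
    for iri in (container_iri, download_url):
        # Reject characters that would break the <...> IRI syntax.
        if iri is not None and any(ch in iri for ch in '<>" \n'):
            raise ValueError(f"not usable as an IRI: {iri!r}")

    body = f"<{container_iri}> a bald:Container"
    if download_url is not None:
        body += (
            ";\n"
            "\tdcat:distribution [\n"
            "\t\ta dcat:Distribution;\n"
            f"\t\tdcat:downloadURL <{download_url}>;\n"
            "\t\tdcat:mediaType [\n"
            "\t\t\ta dct:MediaType;\n"
            "\t\t\tdct:identifier \"application/x-netcdf\"\n"
            "\t\t];\n"
            "\t\tdct:format [\n"
            "\t\t\ta dct:MediaType;\n"
            "\t\t\tdct:identifier "
            "<http://vocab.nerc.ac.uk/collection/M01/current/NC/>\n"
            "\t\t]\n"
            "\t]"
        )
    return PREFIXES + "\n" + body + ".\n"


if __name__ == "__main__":
    print(render_container_ttl("http://foo.bar/my-netcdf-file.nc/",
                               "http://foo.bar/my-netcdf-file.nc"))
```

Writing absolute IRIs sidesteps any dependence on how a given parser resolves relative references against `@base`, at the cost of a more verbose document.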

Further Considerations

  • If a supplied file name already ends in a '/', then no further '/' should be appended to the base URL when it is used as the subject of the graph.
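A minimal sketch of that rule, assuming the subject of the graph is formed by appending a single '/' to the supplied file location. `container_subject` is a hypothetical helper name used only for illustration.

```python
def container_subject(file_location: str) -> str:
    """Hypothetical helper: return the IRI used as the graph subject.

    A '/' is appended to the supplied location unless one is already
    present, so the subject never ends in '//'.
    """
    if file_location.endswith("/"):
        return file_location
    return file_location + "/"


if __name__ == "__main__":
    print(container_subject("http://foo.bar/my-netcdf-file.nc"))
    print(container_subject("http://foo.bar/my-netcdf-file.nc/"))
```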

Questions

@jyucsiro, @marqh a couple of questions/topics for discussion:

  1. Does this look like the approach we discussed on the call?
  2. I think there may be a subtlety I am missing in the way @base is parsed: at least one library I have used ignored the filename beyond the final slash when converting to RDF/XML. We may want to discuss using the full URI if we take this to production.
  3. Are we ok with the introduction of blank nodes here?
  4. Is there a better URI which defines NetCDF than the one I have used here?
  5. The MIME type I used is not actually registered with IANA, and there is also a suggestion that THREDDS has different MIME types for NetCDF 3 and NetCDF 4. Can we handle this?
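On question 2, Turtle resolves relative IRIs against `@base` using the reference resolution rules of RFC 3986, under which the last path segment of the base is dropped before a relative path is merged. Python's `urllib.parse.urljoin` applies the same rules for these simple cases, so it can be used to illustrate what `<>` and `<./>` resolve to. This is offered as a possible explanation of the library behaviour described, not a confirmed diagnosis.

```python
from urllib.parse import urljoin

BASE = "http://foo.bar/my-netcdf-file.nc"

# The empty reference <> resolves to the base itself.
same_document = urljoin(BASE, "")

# <./> drops the final segment of the base path, so the file name is lost.
dot_slash = urljoin(BASE, "./")

# With a trailing '/' on the base, <./> keeps the file name segment.
dot_slash_with_trailing = urljoin(BASE + "/", "./")

if __name__ == "__main__":
    print(same_document)            # http://foo.bar/my-netcdf-file.nc
    print(dot_slash)                # http://foo.bar/
    print(dot_slash_with_trailing)  # http://foo.bar/my-netcdf-file.nc/
```

If the intended subject is the file location followed by '/', writing it as a full IRI avoids depending on this resolution step.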
@marqh
Member

marqh commented Apr 2, 2020

@adamml
@jyucsiro

Rob has raised a query on the PR looking to update the vocabulary with respect to
opengeospatial/NamingAuthority#39

Should

@prefix dcat: <http://www.w3.org/ns/dcat#>.

bald:Container a dcat:Dataset. 

be

@prefix dcat: <http://www.w3.org/ns/dcat#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

bald:Container rdfs:subClassOf dcat:Dataset. 

?

Please may you consider this question?

thank you
mark

@marqh
Member

marqh commented Apr 2, 2020

fwiw, i think that the use of rdfs:subClassOf is valid here
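To make the difference between the two statements concrete, here is a small sketch of the RDFS subclass rule (rdfs9: if x is of type C and C is a subclass of D, then x is of type D) over plain Python tuples. The triple representation and the function `infer_types` are illustrative assumptions, not how any RDF library stores or reasons over data.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Apply the rdfs9 rule repeatedly until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                new = (s, RDF_TYPE, sup)
                if o == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


FILE = "<http://foo.bar/my-netcdf-file.nc/>"
INSTANCE = (FILE, RDF_TYPE, "bald:Container")
TARGET = (FILE, RDF_TYPE, "dcat:Dataset")

# Variant 1: the class bald:Container is itself declared to be a Dataset.
with_type = infer_types({("bald:Container", RDF_TYPE, "dcat:Dataset"), INSTANCE})

# Variant 2: every bald:Container is declared to be a Dataset.
with_subclass = infer_types(
    {("bald:Container", SUBCLASS_OF, "dcat:Dataset"), INSTANCE}
)

if __name__ == "__main__":
    print(TARGET in with_type)      # False
    print(TARGET in with_subclass)  # True
```

Only the `rdfs:subClassOf` variant lets a reasoner conclude that an individual container is a `dcat:Dataset`.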

@adamml
Contributor Author

adamml commented Apr 2, 2020

@marqh I've been trying to find headspace to think about this between meetings. I'd agree with you that it's valid here, yes.

@marqh
Member

marqh commented Apr 3, 2020

many thanks @adamml

I have updated the request for change with the OGC NA

@marqh
Member

marqh commented Apr 6, 2020

The update to the BALD vocabulary has now been adopted

opengeospatial/NamingAuthority#39

A bald:Container instance is now also a dcat:Dataset

https://www.opengis.net/def/binary-array-ld

@marqh
Member

marqh commented Apr 28, 2020

is there a more definitive definition of a netCDF file than

dct:format [
	a dct:MediaType;
	dct:identifier <http://vocab.nerc.ac.uk/collection/M01/current/NC/>
	]

@adamml
Contributor Author

adamml commented Aug 14, 2020

We should consider adding this information to the Schema.org representation of BALD as well, e.g.:

{
   "@context": "https://schema.org/",
   "@type": "Dataset",
   "distribution": {
     "@type": "DataDownload",
     "contentUrl": "http://",
     "encodingFormat": [
       "application/x-netcdf",
       "http://vocab.nerc.ac.uk/collection/M01/current/NC/"
     ]
   }
}
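As a rough illustration of how such a document might be produced and consumed, the sketch below builds the same JSON-LD structure from a user-supplied URL and reads the download location back out. The function names `schema_org_dataset` and `download_urls` are hypothetical, and the example URL is a placeholder.

```python
import json

# Format identifiers taken from the example above.
NETCDF_FORMATS = [
    "application/x-netcdf",
    "http://vocab.nerc.ac.uk/collection/M01/current/NC/",
]


def schema_org_dataset(content_url: str) -> dict:
    """Hypothetical helper: build the Schema.org Dataset description."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": content_url,
            "encodingFormat": list(NETCDF_FORMATS),
        },
    }


def download_urls(doc: dict) -> list:
    """Collect contentUrl values; distribution may be an object or a list."""
    distribution = doc.get("distribution", [])
    if isinstance(distribution, dict):
        distribution = [distribution]
    return [d["contentUrl"] for d in distribution if "contentUrl" in d]


if __name__ == "__main__":
    doc = schema_org_dataset("http://foo.bar/my-netcdf-file.nc")
    print(json.dumps(doc, indent=2))
    print(download_urls(doc))
```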

@simonoakesepimorphics

Sorry if this discussion is already closed, I can raise a new issue instead if appropriate.
The containment part of section 6 states that groups can be "contained by" files, which I interpret as:

<file.nc> a bald:Container ;
    bald:contains <file.nc/> .

Or, "the root group is contained by the file". Is this interpretation valid / should the wording of that section be changed to reflect the intentions discussed above?

marqh added a commit to binary-array-ld/bald that referenced this issue Feb 16, 2021
* Addressing Schema.org of opengeospatial/netcdf-ld#34

* Add distribution ENUM

* Resolving conflicts

* Trying to import Enum

* Added SchemaOrg class to __init__.py

With distribution method

* enabling running of nc2rdf with schemaOrg code

* remove print stmt

* test schemOrg class

* Updating Schema.org output

* Working on Schema.org output

* Working on Schema.org output

* Working on Schema.org output

* Working on Schema.org output

* Working on Schema.org output

* Working on Schema.org output

* Working on Schema.org output

* Edited Schema.org test TTL

* Changed Schema.org test TTL

* Editing Schema.org test TTL

* Working on Schema.org test

* Updating Schema.org output for tests

* update results

Co-authored-by: Jonathan Yu <jonathan.yu@csiro.au>
Co-authored-by: marqh <markh@metarelate.net>