Controlled vocabularies allow an accurate and controlled approach to describing physical and digital assets (e.g., data). One such controlled vocabulary is the Datacite Controlled Vocabulary, which is based on the description of Datacite Schema V4.4. The creation of this controlled vocabulary is part of the FAIRware project, which is funded by RoRi.
sheet2rdf and OntoStack are used to build and serve the Datacite Controlled Vocabulary, while PURL is used to persist identifiers for the vocabulary:
http://purl.org/datacite/v4.4/
This repository hosts an automated workflow, executed by means of Github Actions, and the underlying shell and Python scripts, which:
- Fetch the Google Sheet from Google Drive and store it as xlsx and csv files
- Convert the fetched sheet to a machine-actionable and FAIR RDF vocabulary using xls2rdf
- Test the resulting RDF vocabulary using qSKOS
- Commit the conversion results and test logs to this repository
- Deploy the RDF vocabulary to OntoStack to be served to humans and machines
This workflow is an extension of excel2rdf.
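To give a feel for the conversion step, here is a minimal sketch of turning sheet rows into SKOS Turtle using only the Python standard library. It is not xls2rdf and does not reproduce its behaviour; the column names `id`, `prefLabel`, and `definition`, the helper names, and the sample row are assumptions made for illustration only.

```python
import csv
import io

SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape_literal(text: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_turtle(csv_text: str, scheme_uri: str) -> str:
    """Turn rows with hypothetical id/prefLabel/definition columns into Turtle."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme .",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each concept URI is the scheme URI followed by the row id.
        concept = f"<{scheme_uri}{row['id']}>"
        lines += [
            "",
            f"{concept} a skos:Concept ;",
            f'    skos:prefLabel "{escape_literal(row["prefLabel"])}"@en ;',
            f'    skos:definition "{escape_literal(row["definition"])}"@en ;',
            f"    skos:inScheme <{scheme_uri}> .",
        ]
    return "\n".join(lines) + "\n"


# Made-up sample row; real terms live in the Google Sheet.
sample = "id,prefLabel,definition\nExampleTerm,Example term,A made-up term\n"
print(rows_to_turtle(sample, "http://purl.org/datacite/v4.4/"))
```

In the actual workflow this step is delegated to xls2rdf, which supports far more SKOS properties than this sketch.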
To use sheet2rdf in your own work, you need to:
- Follow the gsheets Quickstart and generate client_secrets.json and storage.json
- Create the following Github secrets:
Secret | Explanation | Datacite Controlled Vocabulary |
---|---|---|
DB_USER | user name of the Jena Fuseki account that has privileges to PUT the RDF vocabulary to the database | **** |
DB_PASS | password for the above Jena Fuseki account | **** |
FILE_NAME | file name that will be used when converting the Google sheet to .ttl (RDF), .xlsx, and .csv files | vocabulary |
GRAPH | graph in the database under which the above RDF vocabulary should be stored | http://purl.org/datacite/v4.4/ |
SHEET_ID | unique ID of the sheet that will be fetched from Google drive | 1vmsxnnCRKkKRcJoRRkoQ5499U-IZgKD6ZBtUu41zz1M |
SPARQL_ENDPOINT | endpoint to which RDF vocabulary is PUT | **** |
STORAGE | access token to the Google Drive hosting the Google sheet with controlled terms definitions, content of storage.json | **** |
CLIENT | configuration for the client (i.e., sheet2rdf) that is fetching the Google sheet, content of client_secrets.json | **** |
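As an illustration of how the database-related secrets fit together, the sketch below builds (but does not send) an HTTP PUT request that would upload a Turtle file to a named graph. It assumes that SPARQL_ENDPOINT is a SPARQL 1.1 Graph Store Protocol endpoint and that the Fuseki account uses HTTP Basic authentication; neither is confirmed by this repository. The function name and the placeholder values are hypothetical.

```python
import base64
import os
import urllib.parse
import urllib.request


def build_put_request(endpoint, graph, user, password, turtle_bytes):
    """Build a Graph Store Protocol PUT request without sending it."""
    # The target graph is passed as the 'graph' query parameter.
    query = urllib.parse.urlencode({"graph": graph})
    # Assumption: the account is checked with HTTP Basic authentication.
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"{endpoint}?{query}",
        data=turtle_bytes,
        method="PUT",
        headers={
            "Content-Type": "text/turtle; charset=utf-8",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder defaults stand in for the Github secrets when they are not set.
request = build_put_request(
    endpoint=os.environ.get("SPARQL_ENDPOINT", "http://localhost:3030/ds/data"),
    graph=os.environ.get("GRAPH", "http://purl.org/datacite/v4.4/"),
    user=os.environ.get("DB_USER", "user"),
    password=os.environ.get("DB_PASS", "pass"),
    turtle_bytes=b"# contents of vocabulary.ttl would go here\n",
)
print(request.get_method(), request.full_url)
# To actually upload, one would call urllib.request.urlopen(request).
```

Keeping the request construction separate from sending makes it possible to inspect the target URL and headers before anything is written to the database.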
If you use this workflow, the author kindly requests that you cite this repository in your publications, for example:
Nikola Vasiljevic. (2021, January 11). sheet2rdf: First release (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.4432136
For any other citation format visit http://doi.org/10.5281/zenodo.4432136
This work is licensed under the Apache 2.0 License.
OntoStack is a set of orchestrated micro-services configured and interfaced such that they can take in vocabularies and resolve their terms and RDF properties upon request by either humans or machines.
Some of the OntoStack micro-services are:
- Jena Fuseki, a graph database
- SKOSMOS, a web-based SKOS browser acting as a front-end for the vocabularies persisted by the graph database
- Træfik, an edge router responsible for proper serving of URL requests
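Serving the same identifier to both humans and machines is commonly done through HTTP content negotiation. The sketch below shows how a client might ask for either an HTML page or a Turtle document for the same URI. It assumes that the deployed stack honours the `Accept` header in this way, which this document does not confirm; the mapping and function name are hypothetical, and no request is actually sent.

```python
import urllib.request

# Hypothetical mapping from kind of client to the media type it asks for.
ACCEPT_BY_CLIENT = {
    "human": "text/html",
    "machine": "text/turtle",
}


def build_term_request(term_uri, client="machine"):
    """Build a GET request for a term URI with a client-specific Accept header."""
    return urllib.request.Request(
        term_uri,
        headers={"Accept": ACCEPT_BY_CLIENT[client]},
    )


request = build_term_request("http://purl.org/datacite/v4.4/", client="machine")
print(request.get_method(), request.full_url, request.get_header("Accept"))
# To resolve the URI, one would call urllib.request.urlopen(request).
```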
Currently, three instances of OntoStack are available:
- Departmental instance of DTU Wind Energy: http://data.windenergy.dtu.dk/ontologies
- National (Danish) instance run by DeiC: http://ontology.deic.dk/
- International instance run by FAIR Data Collective: http://vocab.fairdatacollective.org