metadatacenter/datacite-controlled-vocabulary

Controlled vocabularies enable an accurate and consistent description of physical and digital assets (e.g., data). One such controlled vocabulary is the Datacite Controlled Vocabulary, which is derived from the description of the DataCite Schema v4.4. The work of creating this controlled vocabulary is part of the FAIRware project, which is funded by RoRi.

sheet2rdf and OntoStack are used to build and serve the Datacite Controlled Vocabulary, while PURL is used to provide persistent identifiers for the vocabulary:

http://purl.org/datacite/v4.4/

Tooling

sheet2rdf

This repository hosts an automated workflow, executed by means of GitHub Actions, together with the underlying shell and Python scripts, which:

  • Fetch the Google Sheet from Google Drive and store it as .xlsx and .csv files
  • Convert the fetched sheet to a machine-actionable, FAIR RDF vocabulary using xls2rdf
  • Test the resulting RDF vocabulary using qSKOS
  • Commit the conversion results and test logs to this repository
  • Deploy the RDF vocabulary to OntoStack, where it is served to humans and machines
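
The conversion step can be pictured with the following minimal Python sketch. It does not reproduce xls2rdf, which reads its own spreadsheet template; instead it assumes a hypothetical three-column CSV export (`term`, `prefLabel`, `definition`) and a placeholder namespace `http://example.org/vocab/`, and shows how each sheet row can become a SKOS concept serialized as Turtle.

```python
import csv
import io

# Hypothetical export of the sheet: one row per term. xls2rdf uses its own
# template, so these column names are an assumption for illustration only.
SAMPLE_CSV = """term,prefLabel,definition
Dataset,Dataset,"Example definition of a dataset."
Software,Software,"Example definition of software."
"""

SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for a double-quoted Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_skos_turtle(csv_text: str, namespace: str) -> str:
    """Turn each CSV row into a skos:Concept within a single skos:ConceptScheme."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix voc: <{namespace}> .",
        "",
        "voc: a skos:ConceptScheme .",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The `term` column is used as the local name, so it must already be
        # a valid Turtle local name (letters, digits, underscores, etc.).
        label = escape_literal(row["prefLabel"])
        definition = escape_literal(row["definition"])
        lines += [
            "",
            f"voc:{row['term']} a skos:Concept ;",
            f'    skos:prefLabel "{label}"@en ;',
            f'    skos:definition "{definition}"@en ;',
            "    skos:inScheme voc: .",
        ]
    return "\n".join(lines) + "\n"


print(csv_to_skos_turtle(SAMPLE_CSV, "http://example.org/vocab/"))
```

In the actual workflow this step is performed by xls2rdf, and its output is then checked with qSKOS before being committed and deployed.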

This workflow is an extension of excel2rdf.

Configuring sheet2rdf

To use sheet2rdf in your own work, you need to:

  1. Follow the gsheets Quickstart to generate client_secrets.json and storage.json

  2. Create the following GitHub secrets:

| Secret | Explanation | Value for Datacite Controlled Vocabulary |
|---|---|---|
| DB_USER | User name of the Jena Fuseki account that has privileges to PUT the RDF vocabulary to the database | **** |
| DB_PASS | Password for the above Jena Fuseki account | **** |
| FILE_NAME | File name used when converting the Google Sheet to .ttl (RDF), .xlsx, and .csv files | vocabulary |
| GRAPH | Graph in the database under which the RDF vocabulary is stored | http://purl.org/datacite/v4.4/ |
| SHEET_ID | Unique ID of the Google Sheet fetched from Google Drive | 1vmsxnnCRKkKRcJoRRkoQ5499U-IZgKD6ZBtUu41zz1M |
| SPARQL_ENDPOINT | Endpoint to which the RDF vocabulary is PUT | **** |
| STORAGE | Access token for the Google Drive hosting the Google Sheet with the controlled term definitions; content of storage.json | **** |
| CLIENT | Configuration of the client (i.e., sheet2rdf) that fetches the Google Sheet; content of client_secrets.json | **** |
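
The deploy step combines several of these secrets. As a hedged sketch, assuming the secrets are exposed to the job as environment variables with the same names and that SPARQL_ENDPOINT points at a SPARQL 1.1 Graph Store HTTP Protocol service (for example, a Fuseki dataset's `/data` service), the upload request could be built in Python as follows. The helper names `read_secrets` and `build_graph_put` are hypothetical and not part of sheet2rdf.

```python
import base64
import os
import urllib.parse
import urllib.request

# Secret names from the table above; assumed (hypothetically) to be mapped to
# environment variables of the same name in the GitHub Actions job.
REQUIRED_SECRETS = ("DB_USER", "DB_PASS", "FILE_NAME", "GRAPH", "SPARQL_ENDPOINT")


def read_secrets(env=os.environ) -> dict:
    """Collect the deployment secrets, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise KeyError("missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}


def build_graph_put(secrets: dict, turtle: bytes) -> urllib.request.Request:
    """Build (but do not send) a Graph Store Protocol PUT for the named graph."""
    # ?graph=<GRAPH> selects the named graph; PUT replaces its current contents.
    query = urllib.parse.urlencode({"graph": secrets["GRAPH"]})
    url = secrets["SPARQL_ENDPOINT"] + "?" + query
    credentials = f"{secrets['DB_USER']}:{secrets['DB_PASS']}".encode("utf-8")
    return urllib.request.Request(
        url,
        data=turtle,
        method="PUT",
        headers={
            "Content-Type": "text/turtle",
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        },
    )
```

In a workflow step, the request body would come from reading `<FILE_NAME>.ttl` (assuming it sits in the working directory), and the request would be sent with `urllib.request.urlopen(req)`. PUT fits this use because, under the Graph Store Protocol, it replaces the contents of the named graph, so each run overwrites the previous version of the vocabulary rather than appending to it.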

Citation

If you use this workflow, the author kindly requests that you cite this repository in your publications, for example:

Nikola Vasiljevic. (2021, January 11). sheet2rdf: First release (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.4432136

For other citation formats, visit http://doi.org/10.5281/zenodo.4432136

License

This work is licensed under the Apache 2.0 License.

OntoStack

OntoStack is a set of orchestrated microservices, configured and interfaced so that they can ingest vocabularies and resolve their terms and RDF properties upon request by either humans or machines.

Some of the OntoStack microservices are:

  • Jena Fuseki, a graph database
  • SKOSMOS, a web-based SKOS browser acting as a front end for the vocabularies persisted in the graph database
  • Træfik, an edge router responsible for routing incoming URL requests to the appropriate service
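
Once a vocabulary is loaded, machines can query it through the graph database. The following minimal Python sketch shows a term lookup over the SPARQL 1.1 Protocol. The query endpoint URL and the concept IRI used with it are hypothetical placeholders, since this README does not list the public endpoints of the OntoStack instances, and the functions are illustrative rather than part of OntoStack.

```python
import json
import urllib.parse

SKOS = "http://www.w3.org/2004/02/skos/core#"


def build_label_query(concept_iri: str) -> str:
    """Return a SPARQL SELECT query for the skos:prefLabel values of one concept."""
    # Reject characters that are not allowed inside an IRIREF, so the IRI
    # cannot break out of the <...> delimiters.
    if any(ch in concept_iri for ch in '<>"{}|^`\\ \n'):
        raise ValueError(f"not a valid IRI: {concept_iri!r}")
    return (
        f"PREFIX skos: <{SKOS}>\n"
        f"SELECT ?label WHERE {{ <{concept_iri}> skos:prefLabel ?label }}"
    )


def build_query_url(query_endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL 1.1 Protocol GET request."""
    return f"{query_endpoint}?{urllib.parse.urlencode({'query': query})}"


def labels_from_results(results_json: str) -> list:
    """Extract ?label values from a SPARQL 1.1 Query Results JSON document."""
    data = json.loads(results_json)
    return [row["label"]["value"] for row in data["results"]["bindings"]]
```

Fetching the URL from `build_query_url` with an `Accept: application/sparql-results+json` header and passing the response body to `labels_from_results` would return the concept's preferred labels. Human visitors would typically browse the same data through SKOSMOS instead.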

Currently three instances of OntoStack are available: