Skohub support for OpenRefine #11

Closed
tombaker opened this issue Oct 1, 2022 · 9 comments

tombaker commented Oct 1, 2022

In a project at NAL with @woody544, "NALT for the Machine Age", we frequently need to reconcile domain vocabularies, documented in CSV files, with NALT in order to associate domain terms with NALT URIs.

We currently do this with reconcile-csv, which performs fuzzy matching between a column in the community vocabulary spreadsheet and a column in a spreadsheet of NALT labels.
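As a rough illustration of what label-based fuzzy matching means here, the following minimal sketch uses Python's standard `difflib` module as a stand-in. It is not the algorithm reconcile-csv uses, and the function name `best_match` and the sample labels are hypothetical.

```python
import difflib


def best_match(term, labels, cutoff=0.6):
    """Return the label closest to `term`, or None if nothing clears the cutoff."""
    # Compare case-insensitively, but return the label in its original spelling.
    by_lowercase = {label.lower(): label for label in labels}
    hits = difflib.get_close_matches(
        term.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[hits[0]] if hits else None


# Hypothetical labels, standing in for one column of vocabulary labels.
vocabulary_labels = ["Soil erosion", "Irrigation"]
print(best_match("soil erosoin", vocabulary_labels))
print(best_match("zzzz", vocabulary_labels))
```

The cutoff controls how aggressive the matching is; a real workflow would still need a review step for borderline matches.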

reconcile-csv works fine for fuzzy matching of labels, but we see no way to integrate the tool into our own workflow scripts or even to configure the process automatically. Running reconcile-csv involves clicking on menu choices in a specific order and manually filling in fields on forms - a fiddly process that we document, internally, with annotated screenshots. It also requires that we generate a CSV of NALT labels solely for the purpose of reconciliation. We note that reconcile-csv has remained at version 0.1.3-SNAPSHOT since 2015.

We wonder if a more sophisticated reconciliation process might take into account more than just labels - for example, definitions, scope notes, and related concepts.

Ideally, we'd also like to automate the reconciliation process as much as possible. For example, might a configuration file be used to specify which column in a given spreadsheet is to be reconciled against the concept scheme loaded in Skohub?

We have indicated our interest in support for OpenRefine reconciliation in issues for two other SKOS environments: Skosmos and Annif.

We note with interest the development of an improved Reconciliation Service API in a W3C community group.

@acka47 changed the title from "Skohub support for OpenRefine r" to "Skohub support for OpenRefine" Oct 10, 2022
@sroertgen sroertgen transferred this issue from skohub-io/skohub.io Feb 13, 2023
@sroertgen
Contributor

Hello @tombaker,

I want to start working on SkoHub Reconcile to facilitate reconciliation with SKOS vocabularies in OpenRefine, using the spec mentioned above. While trying to build a user story from what you've written, I wondered whether the vocabularies you would use are always the files you linked above, or whether the Skosmos API would also be available in this scenario. I found the links to the search on the NALT website, but not to the Skosmos API.

@tombaker
Author

tombaker commented Mar 8, 2023

@sroertgen We do use Skosmos for NALT - in fact, @osma is part of the NAL project team (though with his Annif hat). Implementing an OpenRefine API for Skosmos is, as far as I know, still an open issue - see NatLibFi/Skosmos#23 . Since we plan to use Skosmos for all vocabularies, large and small, having such an API would solve the problem for us.

@sroertgen
Contributor

@tombaker Thanks!

I think the SkoHub Reconcile module might be designed generically enough that it could use the already existing Skosmos API. I'm just looking for the "entrypoint", i.e. the Concept Scheme, to get all relevant data. I'm not too familiar with Skosmos, so bear with me if this is too obvious, but right now I have trouble retrieving the Concept Scheme data.

I can receive concept data with:

curl --request GET \
  --url https://lod.nal.usda.gov/nalt/38784.rdf \
  --header 'accept: application/json'

and in the response I find the link to the concept scheme, i.e. https://lod.nal.usda.gov/nalt

but a curl to the concept scheme directs me to the html search page, e.g.

curl --request GET \
  --url https://lod.nal.usda.gov/nalt \
  --header 'accept: application/json'

(I tried with .rdf and without, but no difference)

If I can get the concept scheme data it should be possible to use that for reconciliation.

CC @osma

@osma

osma commented Mar 8, 2023

@sroertgen What exactly do you mean by concept scheme data? Just the triples where the subject is the skos:ConceptScheme instance of NALT? Or do you mean the whole SKOS file / graph containing all concepts?

Skosmos has a REST API with methods to access individual concept data, download the whole vocabulary etc.

But the NAL installation you refer to is actually not a pure Skosmos instance, it is a Drupal site (iirc) with Skosmos running in the background, and they expose some, but not nearly all, the Skosmos functionality through the Drupal facade. I'm not sure if they make the Skosmos REST API available at all.

@sroertgen
Contributor

@osma

> What exactly do you mean by concept scheme data? Just the triples where the subject is the skos:ConceptScheme instance of NALT? Or do you mean the whole SKOS file / graph containing all concepts?

The first one. A JSON representation of the Concept Scheme with its top concepts etc.

> But the NAL installation you refer to is actually not a pure Skosmos instance, it is a Drupal site (iirc) with Skosmos running in the background, and they expose some, but not nearly all, the Skosmos functionality through the Drupal facade. I'm not sure if they make the Skosmos REST API available at all.

I guess that is why I don't get any results for the concept scheme with the curl request above: a Drupal site sits on top of Skosmos.

Thanks!

@osma

osma commented Mar 9, 2023

If you want the triples of the concept scheme as JSON (JSON-LD actually), you can get them from a Skosmos REST API endpoint using a URL like this:

https://api.finto.fi/rest/v1/yso/data?uri=http://www.yso.fi/onto/yso/&format=application/ld%2Bjson

This of course won't necessarily help with the NAL installation, because in my understanding, it doesn't expose the Skosmos REST API at all.

The Skosmos REST API (unlike SkoHub, I think) has a little bit of indirection between the API URLs and the URIs of concepts and concept schemes. Basically the URLs of the REST API are independent of the concept and concept scheme URIs. So each REST API method generally takes the URI as a parameter. It's maybe not pretty, but it's often necessary because the URI namespace of the vocabulary is often quite different from the URL where Skosmos has been installed.
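The indirection described above can be sketched in a few lines of Python. This is only an illustration modelled on the example URL in this comment: the helper name `skosmos_data_url` is hypothetical, and the path layout (`<api base>/<vocabulary id>/data`) is taken from that one URL rather than from the Skosmos documentation.

```python
from urllib.parse import urlencode


def skosmos_data_url(api_base, vocab_id, uri, fmt="application/ld+json"):
    """Build a data URL in which the concept (scheme) URI is passed as a parameter."""
    # The URI is percent-encoded as a query value, so its own namespace
    # never has to match the host where Skosmos is installed.
    query = urlencode({"uri": uri, "format": fmt})
    return f"{api_base.rstrip('/')}/{vocab_id}/data?{query}"


url = skosmos_data_url(
    "https://api.finto.fi/rest/v1", "yso", "http://www.yso.fi/onto/yso/"
)
print(url)
```

Note that `urlencode` also percent-encodes the colon and slashes inside the `uri` value, so the output looks different from the hand-written URL above while carrying the same parameters.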

@sroertgen
Contributor

Hello @tombaker,

a quick update on the current status.

We prototyped a reconciliation service for SKOS vocabularies that can be used in OpenRefine.
There will also be a small web form for uploading your vocabularies to the service. After uploading, you will be given a service URL that you can add as a reconciliation service in OpenRefine.

I hope that we will be able to deploy this to a public test system within the next 2 weeks. I will let you know, and if you are interested, you are kindly invited to try it out and provide feedback.

@sroertgen
Contributor

Hello @tombaker ,

just wanted to let you know I deployed the NALT core dataset to our prototype reconcile service.

https://reconcile.skohub.io/_reconcile?language=en&account=nalt&dataset=https://lod.nal.usda.gov/nalt

Using this URL you can add a reconciliation service in OpenRefine (or use it in any other tool implementing the reconciliation spec).
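For tools other than OpenRefine, a request could be prepared roughly as follows. This is a sketch under assumptions: it uses the form-encoded `queries` batch format from earlier versions of the Reconciliation Service API, which this service may or may not accept in exactly this shape, and the helper names and search terms are hypothetical. The code only builds the request body and does not send anything.

```python
import json
from urllib.parse import urlencode

# Service URL as posted above.
SERVICE_URL = (
    "https://reconcile.skohub.io/_reconcile"
    "?language=en&account=nalt&dataset=https://lod.nal.usda.gov/nalt"
)


def build_query_batch(terms, limit=3):
    """Map query ids (q0, q1, ...) to reconciliation query objects."""
    return {f"q{i}": {"query": term, "limit": limit} for i, term in enumerate(terms)}


def encode_form_body(batch):
    """Serialize the batch as the value of a form-encoded `queries` field."""
    return urlencode({"queries": json.dumps(batch)})


# Hypothetical terms, e.g. taken from one column of a community vocabulary CSV.
batch = build_query_batch(["maize", "soil erosion"])
body = encode_form_body(batch)
print(body)
```

The resulting `body` would be sent as an `application/x-www-form-urlencoded` POST to `SERVICE_URL`; the response format depends on the version of the spec the service implements.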

Matches might look like this:

image

And you can also search:

image

I would love to hear whether this is useful for you and whether it behaves as you would expect.

I had to make three small adjustments to the NALT dataset:

  1. I had to remove one skos:ConceptScheme. The Reconcile Publish Service (https://reconcile-publish.skohub.io/) can currently handle only one concept scheme per file. If two concept schemes per file are common, please let me know and I will think about how to implement this.

  2. I had to add a vann:preferredNamespaceUri statement: vann:preferredNamespaceUri <https://lod.nal.usda.gov/nalt/> ; The service needs this information to build URLs that point back to your vocabularies. I know that this is not optimal, but it is needed because the reconciliation spec requires an identifierSpace.

  3. I had to comment out the rdfs:label of the concept scheme. To be honest, I still have to figure out why, since I'm not using it (at least I think so), but that should be easy to fix.

If you have any questions, please ask!

Best
Steffen

@sroertgen
Contributor

I will close this now, since SkoHub Reconcile now basically supports OpenRefine (at least again with 68d2ac4).
