
projekt-opal/metadata-refinement

OPAL metadata refinement

  • Language detection based on Apache OpenNLP updates the language tags of title and description literals in 4 languages.
  • Geographic data based on LauNuts adds geo data for 8,495 places in Germany.

Usage with Apache Maven

Add the following lines to your pom.xml configuration file:

<dependencies>
	<dependency>
		<groupId>org.dice-research.opal</groupId>
		<artifactId>metadata-refinement</artifactId>
		<version>[1,2)</version>
	</dependency>
</dependencies>

<repositories>
	<repository>
		<id>maven.aksw.internal</id>
		<name>AKSW Repository</name>
		<url>http://maven.aksw.org/archiva/repository/internal</url>
	</repository>
	<repository>
		<id>maven.aksw.snapshots</id>
		<name>AKSW Snapshot Repository</name>
		<url>http://maven.aksw.org/archiva/repository/snapshots</url>
	</repository>
</repositories>

Available versions are listed at maven.aksw.org.
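
The version [1,2) in the dependency above is Maven's range notation: the square bracket includes version 1 and the round bracket excludes version 2, so any 1.x release can be resolved. The following is a minimal sketch of that rule, assuming purely numeric dotted versions. The class VersionRangeSketch and its methods are hypothetical names used for illustration only; Maven's own version comparison also handles qualifiers such as -SNAPSHOT, which this sketch does not.

```java
import java.util.Arrays;

public class VersionRangeSketch {

	// Parses a purely numeric dotted version such as "1.0.3" into its components.
	static int[] parse(String version) {
		return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
	}

	// Compares two versions component by component; missing components count as 0.
	static int compare(String a, String b) {
		int[] x = parse(a);
		int[] y = parse(b);
		for (int i = 0; i < Math.max(x.length, y.length); i++) {
			int xi = i < x.length ? x[i] : 0;
			int yi = i < y.length ? y[i] : 0;
			if (xi != yi) {
				return Integer.compare(xi, yi);
			}
		}
		return 0;
	}

	// True if lower <= version < upper, i.e. the half-open range [lower,upper).
	static boolean inHalfOpenRange(String version, String lower, String upper) {
		return compare(version, lower) >= 0 && compare(version, upper) < 0;
	}

	public static void main(String[] args) {
		// Hypothetical version strings, not a list of published releases.
		for (String v : new String[] { "0.9", "1", "1.0.3", "2.0" }) {
			System.out.println(v + " in [1,2): " + inHalfOpenRange(v, "1", "2"));
		}
	}
}
```

Pinning the range to one major version lets patch and minor updates arrive automatically while keeping out a 2.x release that might change the API.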

Examples

Language tags

import java.io.File;

import org.apache.jena.rdf.model.Model;
import org.dice_research.opal.common.utilities.FileHandler;
import org.dice_research.opal.metadata.LanguageDetection;

public class Example {

	/**
	 * Updates language tags of title and description literals.
	 * 
	 * @param turtleInputFile  A TURTLE file to read
	 * @param turtleOutputFile A TURTLE file to write results
	 * @param datasetUri       A URI of a dcat:Dataset inside the TURTLE data
	 * 
	 * @see https://www.w3.org/TR/turtle/
	 * @see https://www.w3.org/TR/vocab-dcat/
	 */
	public void updateLanguageTags(File turtleInputFile, File turtleOutputFile, String datasetUri) throws Exception {

		// Load TURTLE file into model
		Model model = FileHandler.importModel(turtleInputFile);

		// Calling initialize() is optional. It can be used to trigger the download
		// of the required language model (10 MB).
		LanguageDetection languageDetection = new LanguageDetection();
		languageDetection.initialize();

		// Update model
		languageDetection.processModel(model, datasetUri);

		// Write updated model into TURTLE file
		FileHandler.export(turtleOutputFile, model);
	}
}

Example input:

<http://example.org/>
        a       <http://www.w3.org/ns/dcat#Dataset> ;
        <http://purl.org/dc/terms/title>
                "Places in Berlin" .

Example output:

<http://example.org/>
        a       <http://www.w3.org/ns/dcat#Dataset> ;
        <http://purl.org/dc/terms/title>
                "Places in Berlin"@en .
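
The only difference between the input and the output above is the @en suffix on the title literal. As a rough illustration of that difference, the following sketch reads and sets the language tag of a simple Turtle literal using plain Java. It performs no language detection: the language code is passed in by the caller. The class LiteralTagSketch and its methods are hypothetical and not part of OPAL, and the pattern assumes a double-quoted literal without escape sequences.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralTagSketch {

	// Matches a double-quoted literal without escapes, optionally followed by @lang.
	private static final Pattern LITERAL = Pattern
			.compile("^\"([^\"\\\\]*)\"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*))?$");

	private static Matcher match(String turtleLiteral) {
		Matcher matcher = LITERAL.matcher(turtleLiteral.trim());
		if (!matcher.matches()) {
			throw new IllegalArgumentException("Unsupported literal: " + turtleLiteral);
		}
		return matcher;
	}

	// Returns the language tag of the literal, or empty if the literal is untagged.
	static Optional<String> languageTag(String turtleLiteral) {
		return Optional.ofNullable(match(turtleLiteral).group(2));
	}

	// Returns the literal with its language tag set to the given code,
	// replacing any tag that was there before.
	static String withLanguageTag(String turtleLiteral, String languageCode) {
		return "\"" + match(turtleLiteral).group(1) + "\"@" + languageCode;
	}

	public static void main(String[] args) {
		String input = "\"Places in Berlin\"";
		System.out.println(languageTag(input).isPresent());
		System.out.println(withLanguageTag(input, "en"));
	}
}
```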

Geographic data

import java.io.File;

import org.apache.jena.rdf.model.Model;
import org.dice_research.opal.common.utilities.FileHandler;
import org.dice_research.opal.metadata.GeoData;

public class Example {

	/**
	 * Creates geo data based on names of places that are found in the title and
	 * description of the specified dataset.
	 * 
	 * @param turtleInputFile  A TURTLE file to read
	 * @param turtleOutputFile A TURTLE file to write results
	 * @param datasetUri       A URI of a dcat:Dataset inside the TURTLE data
	 * 
	 * @see https://www.w3.org/TR/turtle/
	 * @see https://www.w3.org/TR/vocab-dcat/
	 */
	public void createGeoData(File turtleInputFile, File turtleOutputFile, String datasetUri) throws Exception {

		// Load TURTLE file into model
		Model model = FileHandler.importModel(turtleInputFile);

		// Update model
		new GeoData().processModel(model, datasetUri);

		// Write updated model into TURTLE file
		FileHandler.export(turtleOutputFile, model);
	}
}

Example input:

<http://example.org/>
        a       <http://www.w3.org/ns/dcat#Dataset> ;
        <http://purl.org/dc/terms/title>
                "Places in Berlin" .

Example output:

<http://example.org/>
        a       <http://www.w3.org/ns/dcat#Dataset> ;
        <http://purl.org/dc/terms/spatial>
                [ a       <http://projekt-opal.de/Location> , <http://purl.org/dc/terms/Location> ;
                  <http://www.w3.org/2000/01/rdf-schema#label>
                          "Berlin" ;
                  <http://www.w3.org/ns/dcat#centroid>
                          "POINT(52.5005 13.4022)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
                ] ;
        <http://purl.org/dc/terms/title>
                "Places in Berlin" .
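
The dcat:centroid value in the output above is a GeoSPARQL wktLiteral whose lexical form is a WKT point. The following is a minimal sketch of reading the two numbers from such a string with plain Java. The class WktPointSketch is hypothetical and not part of OPAL, and the pattern only covers two-dimensional points written exactly like the one in the example. In that example, the first number (52.5005) appears to be Berlin's latitude and the second (13.4022) its longitude; the sketch simply returns the values in the order in which they appear.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WktPointSketch {

	// Matches a two-dimensional WKT point such as "POINT(52.5005 13.4022)".
	private static final Pattern POINT = Pattern
			.compile("^\\s*POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)\\s*$");

	// Returns the two coordinates in the order in which they appear in the literal.
	static double[] parse(String wkt) {
		Matcher matcher = POINT.matcher(wkt);
		if (!matcher.matches()) {
			throw new IllegalArgumentException("Not a 2D WKT point: " + wkt);
		}
		return new double[] { Double.parseDouble(matcher.group(1)), Double.parseDouble(matcher.group(2)) };
	}

	public static void main(String[] args) {
		double[] coordinates = parse("POINT(52.5005 13.4022)");
		System.out.println(coordinates[0] + " " + coordinates[1]);
	}
}
```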

Note

The alpha version is available on the metadata-alpha branch. It includes language detection based on Apache OpenNLP, named entity recognition based on FOX, and a JavaScript word picker, as well as configurations for Docker usage and web services.

Credits

Data Science Group (DICE) at Paderborn University

This work has been supported by the German Federal Ministry of Transport and Digital Infrastructure (BMVI) in the project Open Data Portal Germany (OPAL) (funding code 19F2028A).

About

Language detection and geographic data (D3.3)
