LOD Cloud Quality Assessment

This repository contains a number of scripts that enable the quality assessment of the LOD Cloud.

There are two modules in the Git repository:

LODObserver Module

This module contains scripts that crawl the LOD Cloud snapshot and create observer metadata such as the following:

<http://purl.org/obs/resource#2001-spanish-census-to-rdf> a ns1:Dataset ;
    ns2:modified "2015-12-18T13:23:22.256298"^^xsd:dateTime ;
    ns2:source <http://datahub.io/dataset/2001-spanish-census-to-rdf> ;
    ns1:category "government"^^xsd:string ;
    ns1:namespace <> ;
    ns1:payLevelDomain <http://dataweb.infor.uva.es/census2001> ;
    ns3:dataDump <http://visualdataweb.infor.uva.es/censo/RDFData.html>,
        <http://visualdataweb.infor.uva.es/censo/census90M.n3.gz>,
        <http://visualdataweb.infor.uva.es/census/resource/edificios>,
        <http://visualdataweb.infor.uva.es/census/resource/hogares>,
        <http://visualdataweb.infor.uva.es/census/resource/nucleos>,
        <http://visualdataweb.infor.uva.es/census/resource/personas> ;
    ns3:sparqlEndpoint <http://visualdataweb.infor.uva.es/sparql> .

In this module there are three scripts:

  1. lodobserver.py - crawls the snapshot and creates the metadata (a minimal sketch of this step follows the list);
  2. lodobserver_withCategory.py - same as lodobserver.py, but adds categories (assigned from the LOD Cloud) to the metadata;
  3. lodExperiments.py - creates statistics from the observed data.
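
For illustration, here is a minimal sketch of how metadata like the example above might be built with rdflib. It assumes that ns2 and ns3 in the example correspond to Dublin Core Terms and VoID (a guess based on the property names) and reuses the obs namespace for the Dataset class; the exact prefixes and logic used by lodobserver.py may differ.

    # Hypothetical sketch: building observer metadata for one dataset with rdflib.
    # The dcterms/VoID mapping for ns2/ns3 is an assumption based on the property
    # names in the example above, not taken from lodobserver.py itself.
    from datetime import datetime
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    OBS = Namespace("http://purl.org/obs/resource#")   # observer namespace (from the example)
    DCT = Namespace("http://purl.org/dc/terms/")       # assumed to be ns2
    VOID = Namespace("http://rdfs.org/ns/void#")       # assumed to be ns3

    g = Graph()
    dataset = OBS["2001-spanish-census-to-rdf"]

    g.add((dataset, RDF.type, OBS.Dataset))
    g.add((dataset, DCT.modified,
           Literal(datetime.utcnow().isoformat(), datatype=XSD.dateTime)))
    g.add((dataset, DCT.source,
           URIRef("http://datahub.io/dataset/2001-spanish-census-to-rdf")))
    g.add((dataset, VOID.sparqlEndpoint,
           URIRef("http://visualdataweb.infor.uva.es/sparql")))

    print(g.serialize(format="turtle"))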

LODQA Module

This module deals with the quality assessment itself. The Luzzu quality assessment framework (https://github.com/eis-bonn/Luzzu/) must be installed and running beforehand.

This module contains the following files/scripts:

  1. main.py - the main script for running the quality assessment; the quality metrics to be computed must first be defined in config.ttl (see the first sketch after this list);
  2. generateCategoriesForLuzzu.py - generates a file with the category of each dataset, which is then used for the assessment of the Reuse Existing Terms metric;
  3. preprocess.sh - downloads the datasets' data dumps and pre-processes them prior to assessment; if a dataset's dump has already been downloaded, it is not redownloaded (see the second sketch after this list).
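
As a first sketch, the snippet below shows how a driver such as main.py might submit one dataset dump to a running Luzzu instance over HTTP. The endpoint URL and form-field names here are assumptions about Luzzu's web API rather than facts taken from this repository; consult the Luzzu documentation for the actual interface.

    # Hypothetical sketch: submitting one dataset dump to a running Luzzu
    # instance. The endpoint URL and field names below are assumptions, not
    # verified against this repository's main.py or the Luzzu codebase.
    import requests

    LUZZU_ENDPOINT = "http://localhost:8080/Luzzu/compute_quality"  # assumed

    def assess(dump_path: str, base_uri: str) -> str:
        """Ask Luzzu to assess one dataset and return the raw response."""
        response = requests.post(
            LUZZU_ENDPOINT,
            data={
                "Dataset": dump_path,            # local path or URL of the dump
                "Base-Uri": base_uri,            # dataset's base URI
                "Quality-Report-Required": "true",
            },
            timeout=3600,  # assessments over large dumps can take a long time
        )
        response.raise_for_status()
        return response.text

    if __name__ == "__main__":
        print(assess("dumps/census90M.n3",
                     "http://dataweb.infor.uva.es/census2001"))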
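
The second sketch mirrors the redownload check described for preprocess.sh, written in Python rather than shell; the dumps/ directory layout is an assumption made for illustration.

    # Minimal sketch (assumed layout): download a data dump only if it is not
    # already present locally, as described for preprocess.sh above.
    import os
    import urllib.request

    DUMP_DIR = "dumps"  # assumed local directory for downloaded dumps

    def fetch_dump(url: str) -> str:
        """Download url into DUMP_DIR unless a copy already exists."""
        os.makedirs(DUMP_DIR, exist_ok=True)
        local_path = os.path.join(DUMP_DIR, url.rsplit("/", 1)[-1])
        if os.path.exists(local_path):
            return local_path  # already downloaded, skip
        urllib.request.urlretrieve(url, local_path)
        return local_path

    print(fetch_dump("http://visualdataweb.infor.uva.es/censo/census90M.n3.gz"))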

Scripts folder

The scripts folder contains a number of installation scripts (for Ubuntu) that set up everything required to run these experiments. We suggest performing the installation and the experiments on a virtual machine or in a Docker instance.

Steps:

  1. $ sudo chmod +x preInstall.sh
  2. $ sudo chmod +x luzzu.sh
  3. $ sudo ./preInstall.sh
  4. $ sudo ./luzzu.sh

Once everything is installed, run Luzzu as per the instructions at https://github.com/eis-bonn/Luzzu/.

License

This work is licensed under the MIT License.

How to Cite

@article{debattistalod,
  title={Are LOD Datasets Well Represented? A Data Representation Quality Survey.},
  author={Debattista, Jeremy and Lange, Christoph and Auer, S{\"o}ren},
  url={https://www.researchgate.net/publication/301765676_Are_LOD_Datasets_Well_Represented_A_Data_Representation_Quality_Survey}
}

Publications

Are LOD Cloud Datasets Well Represented? A Data Representation Quality Survey (Under Review) - pdf

Acknowledgements

I would like to thank Sören Auer, Christoph Lange, and Aidan Hogan for their valuable contribution towards this work.