Datasets to explore use of RDF in biodiversity informatics
These datasets were described in my blog post TDWG Challenge - what is RDF good for? See also Reflections on the TDWG RDF “Challenge”.
The core files are:
- uniprot.rdf: UniProt RDF for frogs in GenBank
- ion.rdf: Index of Organism Names (ION) RDF for taxonomic names for frogs (filtered to just those names that are also in GenBank; the RDF comes from ION LSIDs)
- crossref.rdf: CrossRef RDF for the DOIs of publications that published new frog names (obtained using CrossRef’s support for Linked Data for DOIs)
- dbpedia.rdf: DBpedia RDF for frogs in GenBank (most fields removed to keep the file size manageable)
Because these files have little to link them, I also created some “glue” files:
- linkout.rdf: the links between NCBI and DBpedia, based on the mapping in iPhylo LinkOut (http://dx.doi.org/10.1371/currents.RRN1228)
- ion_doi.rdf: a subset of the publications listed in ION have DOIs; this file links the corresponding ION LSIDs to those DOIs (it came from a project that eventually became BioNames)
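The glue files only help if their URIs match the ones used in the core files, so it can be useful to peek at the links they contain. The Python sketch below pulls out (subject, rdfs:seeAlso target) pairs using only the standard library. It assumes the links are serialised as simple rdf:Description elements with rdf:about and rdfs:seeAlso/rdf:resource; that layout, the sample triple, and the see_also_links helper are illustrative assumptions, not the actual contents of linkout.rdf.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in RDF/XML element and attribute names.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Illustrative sample only (hypothetical); the real linkout.rdf may differ.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://purl.uniprot.org/taxonomy/8355">
    <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/African_clawed_frog"/>
  </rdf:Description>
</rdf:RDF>"""


def see_also_links(rdf_xml: str) -> list[tuple[str, str]]:
    """Return (subject, object) pairs for every rdfs:seeAlso in an RDF/XML string.

    Handles only the flat rdf:Description / rdf:about / rdf:resource layout;
    general RDF/XML needs a full parser.
    """
    root = ET.fromstring(rdf_xml)
    links = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for see_also in desc.findall(f"{{{RDFS}}}seeAlso"):
            obj = see_also.get(f"{{{RDF}}}resource")
            if subject and obj:
                links.append((subject, obj))
    return links


if __name__ == "__main__":
    for subject, obj in see_also_links(SAMPLE):
        print(subject, "->", obj)
```

For real use, an RDF library such as rdflib handles the full RDF/XML syntax (nested descriptions, typed nodes, abbreviated properties) that this sketch ignores.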
In the blog posts I explore what can be done with these files. For example, in Reflections on the TDWG RDF “Challenge” I created a table listing the name, conservation status, publication DOI and date, and (where available) image from Wikipedia for frog species with sequences in GenBank. The SPARQL for this is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpowl: <http://dbpedia.org/ontology/>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX tname: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcommon: <http://rs.tdwg.org/ontology/voc/Common#>
PREFIX dcterms: <http://purl.org/dc/terms/>
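SELECT ?name ?status ?doi ?date ?thumbnail WHERE {
  ?ncbi uniprot:scientificName ?name .          # GenBank (UniProt) taxon and its name
  ?ncbi rdfs:seeAlso ?dbpedia .                 # NCBI -> DBpedia link (linkout.rdf)
  ?dbpedia dbpowl:conservationStatus ?status .
  ?ion tname:nameComplete ?name .               # ION name joined on the same name string
  ?ion tcommon:publishedInCitation ?doi .       # publication DOI for that name
  ?doi dcterms:date ?date .                     # publication date for the DOI
  OPTIONAL { ?dbpedia dbpowl:thumbnail ?thumbnail }   # Wikipedia image, if any
}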
ORDER BY ASC(?status)
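To see how this query stitches the separate files together, it can help to evaluate its graph pattern by hand. The Python sketch below is a toy, not how a triple store works internally: it joins the triple patterns left to right on their shared variables (?ncbi, ?dbpedia, ?name, ?ion, ?doi), treats OPTIONAL as a left join, and sorts on ?status. All triples, species names, DOIs and image names are made up, and prefixed names stand in for full URIs.

```python
# Toy evaluation of the query's graph pattern in plain Python (no RDF library).
# All data below is hypothetical.
TRIPLES = [
    ("ncbi:1", "uniprot:scientificName", "Rana exempla"),
    ("ncbi:2", "uniprot:scientificName", "Hyla exempla"),
    ("ncbi:3", "uniprot:scientificName", "Bufo exempla"),
    ("ncbi:1", "rdfs:seeAlso", "dbpedia:Rana_exempla"),
    ("ncbi:2", "rdfs:seeAlso", "dbpedia:Hyla_exempla"),
    ("ncbi:3", "rdfs:seeAlso", "dbpedia:Bufo_exempla"),
    ("dbpedia:Rana_exempla", "dbpowl:conservationStatus", "LC"),
    ("dbpedia:Rana_exempla", "dbpowl:thumbnail", "img:rana.jpg"),
    ("dbpedia:Hyla_exempla", "dbpowl:conservationStatus", "EN"),
    ("dbpedia:Bufo_exempla", "dbpowl:conservationStatus", "VU"),
    ("ion:10", "tname:nameComplete", "Rana exempla"),
    ("ion:10", "tcommon:publishedInCitation", "doi:10.0000/a"),
    ("ion:20", "tname:nameComplete", "Hyla exempla"),
    ("ion:20", "tcommon:publishedInCitation", "doi:10.0000/b"),
    ("doi:10.0000/a", "dcterms:date", "1999"),
    ("doi:10.0000/b", "dcterms:date", "2005"),
]

# The required patterns of the WHERE clause, in the same order as the query.
WHERE = [
    ("?ncbi", "uniprot:scientificName", "?name"),
    ("?ncbi", "rdfs:seeAlso", "?dbpedia"),
    ("?dbpedia", "dbpowl:conservationStatus", "?status"),
    ("?ion", "tname:nameComplete", "?name"),
    ("?ion", "tcommon:publishedInCitation", "?doi"),
    ("?doi", "dcterms:date", "?date"),
]
OPTIONAL_PATTERNS = [("?dbpedia", "dbpowl:thumbnail", "?thumbnail")]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None on a clash."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):                # variable
            if new.get(p_term, t_term) != t_term:
                return None                       # already bound to another value
            new[p_term] = t_term
        elif p_term != t_term:                    # constant that does not match
            return None
    return new


def bgp(patterns, triples, bindings=None):
    """Inner-join the patterns one at a time; shared variables must agree."""
    results = [{}] if bindings is None else bindings
    for pattern in patterns:
        next_results = []
        for b in results:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


def left_join(patterns, triples, bindings):
    """OPTIONAL: keep every binding, extended wherever the patterns match."""
    out = []
    for b in bindings:
        extended = bgp(patterns, triples, [b])
        out.extend(extended if extended else [b])
    return out


def run_query(triples):
    rows = left_join(OPTIONAL_PATTERNS, triples, bgp(WHERE, triples))
    return sorted(rows, key=lambda r: r["?status"])   # ORDER BY ASC(?status)


if __name__ == "__main__":
    for row in run_query(TRIPLES):
        print(row["?name"], row["?status"], row["?doi"], row["?date"], row.get("?thumbnail", ""))
```

In this toy data "Bufo exempla" drops out because no ION record shares its name, which mirrors how the real query only returns species for which every non-optional link exists. Because GenBank and ION are joined on the literal name string, any difference in spelling or formatting between the two sources silently loses a match.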
There is a live version of this dataset at http://dydra.com/rdmpage/tdwg-challenge
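If the hosted copy is still reachable, the query can be sent over HTTP using the SPARQL 1.1 Protocol, which passes the query as the `query` parameter of a GET request. The sketch below uses only the Python standard library. The endpoint URL (the dataset URL with `/sparql` appended) is an assumption about how Dydra exposed queries, and the service may no longer be available; `build_request` and `run` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a SPARQL endpoint at <dataset URL>/sparql; may no longer exist.
ENDPOINT = "http://dydra.com/rdmpage/tdwg-challenge/sparql"


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run(endpoint: str, query: str) -> list[dict]:
    """Send the query (network access required) and return the result bindings."""
    with urlopen(build_request(endpoint, query), timeout=30) as resp:
        data = json.load(resp)
    # SPARQL JSON results: {"head": {...}, "results": {"bindings": [...]}}
    return data["results"]["bindings"]


if __name__ == "__main__":
    # Only builds and prints the request URL; nothing is sent here.
    req = build_request(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
    print(req.full_url)
```

Calling `run(ENDPOINT, query)` with the query above pasted in as a string would return one dict per row, mapping each variable name to a `{"type": ..., "value": ...}` term.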
Below is the output of this SPARQL query: