## Retrieving bibliographic records related to the Spanish Civil War from the Biblioteca Virtual Miguel de Cervantes LOD repository

This example uses a SPARQL query to retrieve up to 100 works with the subject heading "España -- Historia -- 1936-1939 (Guerra civil)" (Spain -- History -- 1936-1939 (Civil War)) from the Biblioteca Virtual Miguel de Cervantes LOD repository, available at https://data.cervantesvirtual.com/sparql.

It also shows how <a href="https://www.w3.org/TR/sparql11-query/">SPARQL</a> can be used as a query language over Linked Open Data repositories.
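Under the hood, a SPARQL client sends the query text to the endpoint over HTTP, as described by the SPARQL 1.1 Protocol; one option the protocol allows is a GET request carrying the query as a URL-encoded `query` parameter. The sketch below builds such a request URL with the standard library only, without sending it. The helper `build_sparql_get_url` is hypothetical, and whether a given endpoint prefers GET or POST is an assumption you should check.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Hypothetical helper: encode a SPARQL query as the `query`
    parameter of an HTTP GET request (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})


endpoint = "http://data.cervantesvirtual.com/openrdf-sesame/repositories/data"
# Small illustrative SELECT query, not taken from the notebook
query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
url = build_sparql_get_url(endpoint, query)
print(url)

# Decoding the URL recovers the original query text unchanged
assert parse_qs(urlparse(url).query)["query"] == [query]
```

To actually send it, the URL could be passed to `urllib.request.urlopen`, ideally with an `Accept` header such as `application/sparql-results+json` for SELECT queries; SPARQLWrapper takes care of these details for you.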

### First, we initialise the SPARQLWrapper client with the SPARQL endpoint

```python
from SPARQLWrapper import SPARQLWrapper

sparql = SPARQLWrapper("http://data.cervantesvirtual.com/openrdf-sesame/repositories/data")
```

### Then we define our CONSTRUCT query to extract the metadata

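```python
# Declare every prefix used in the query (rdf:, dc:, rdac:, rdaw:)
# so that it does not depend on prefixes predefined by the endpoint
sparql.setQuery("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdac: <http://rdaregistry.info/Elements/c/>
PREFIX rdaw: <http://rdaregistry.info/Elements/w/>

CONSTRUCT {
    ?s rdf:type rdac:Work .
    ?s rdaw:author ?author .
    ?s dc:subject "España -- Historia -- 1936-1939 (Guerra civil)" .
    ?s rdaw:titleOfTheWork ?title .
    ?s rdaw:formOfWork ?form
} WHERE {
    ?s rdf:type rdac:Work .
    ?s rdaw:author ?author .
    ?s dc:subject "España -- Historia -- 1936-1939 (Guerra civil)" .
    ?s rdaw:titleOfTheWork ?title .
    # formOfWork is optional: works without it are still returned
    OPTIONAL { ?s rdaw:formOfWork ?form }
}
LIMIT 100
""")
```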
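Because the subject heading appears twice in the query, it can be convenient to generate the query from a parameter, for example to reuse the notebook for another subject. The sketch below is one way to do this with plain string formatting; `build_construct_query` and `sparql_string_literal` are hypothetical helpers, and the explicit `rdf:` and `dc:` prefix declarations assume `dc:` means Dublin Core elements 1.1, as in the serialised output. The heading is escaped so that it is always a valid SPARQL string literal.

```python
PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdac: <http://rdaregistry.info/Elements/c/>
PREFIX rdaw: <http://rdaregistry.info/Elements/w/>
"""


def sparql_string_literal(value: str) -> str:
    """Wrap value in double quotes, escaping characters not allowed raw in a SPARQL string."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_construct_query(subject: str, limit: int = 100) -> str:
    """Hypothetical helper: build the CONSTRUCT query for any subject heading."""
    lit = sparql_string_literal(subject)
    # Doubled braces {{ }} produce literal { } in the f-string
    return f"""{PREFIXES}
CONSTRUCT {{
    ?s rdf:type rdac:Work .
    ?s rdaw:author ?author .
    ?s dc:subject {lit} .
    ?s rdaw:titleOfTheWork ?title .
    ?s rdaw:formOfWork ?form
}} WHERE {{
    ?s rdf:type rdac:Work .
    ?s rdaw:author ?author .
    ?s dc:subject {lit} .
    ?s rdaw:titleOfTheWork ?title .
    OPTIONAL {{ ?s rdaw:formOfWork ?form }}
}}
LIMIT {int(limit)}
"""


query = build_construct_query("España -- Historia -- 1936-1939 (Guerra civil)")
print(query)
```

The result can be passed to `sparql.setQuery(query)` in place of the hard-coded string.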

### Next, we run the query and serialise the result

```python
# For a CONSTRUCT query, queryAndConvert() returns an rdflib graph
results = sparql.queryAndConvert()
print(results.serialize())
```

```
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdaw: <http://rdaregistry.info/Elements/w/> .

<https://data.cervantesvirtual.com/work/1048971> a <http://rdaregistry.info/Elements/c/Work> ;
    dc:subject "España -- Historia -- 1936-1939 (Guerra civil)" ;
    rdaw:author <https://data.cervantesvirtual.com/person/108269> ;
    rdaw:formOfWork <http://id.loc.gov/authorities/genreForms/gf2014026168> ;
    rdaw:titleOfTheWork "El carlismo ante la reorganización de las derechas. De la Segunda Guerra Carlista a la Guerra Civil" .

<https://data.cervantesvirtual.com/work/1049318> a <http://rdaregistry.info/Elements/c/Work> ;
    dc:subject "España -- Historia -- 1936-1939 (Guerra civil)" ;
    rdaw:author <https://data.cervantesvirtual.com/person/9728> ;
    rdaw:formOfWork <http://id.loc.gov/authorities/genreForms/gf2014026168> ;
    rdaw:titleOfTheWork "Una encuesta política en la España de la preguerra" .

<https://data.cervantesvirtual.com/work/1055461> a <http://rdaregistry.info
```

### We can save the results to a file

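```python
import os

# Create the output folder if needed and write UTF-8, which preserves characters such as "ñ"
os.makedirs("output", exist_ok=True)
with open("output/bibliographic.ttl", "w", encoding="utf-8") as text_file:
    text_file.write(results.serialize())
```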

### We can also provide metadata about the extracted dataset using ontologies and controlled vocabularies

```python
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import FOAF, RDF, DCTERMS, VOID, DC, SKOS, OWL
import datetime
```

```python
domain = 'https://example.org/'

g = Graph()
g.bind("foaf", FOAF)
g.bind("rdf", RDF)
g.bind("dcterms", DCTERMS)
g.bind("dc", DC)
g.bind("void", VOID)
g.bind("skos", SKOS)
g.bind("owl", OWL)

schema = Namespace("https://schema.org/")
g.bind("schema", schema)

viaf = Namespace("https://viaf.org/viaf/")
g.bind("viaf", viaf)

wd = Namespace("http://www.wikidata.org/entity/")
g.bind("wd", wd)
```

```python
dataset = URIRef(domain + "dataset/bibliographic-metadata-bvmc")

g.add((dataset, RDF.type, schema.Dataset))
g.add((dataset, schema.url, URIRef("https://www.cervantesvirtual.com")))
g.add((dataset, schema.description, Literal("This example is based on bibliographic records related to the Spanish Civil War from the Biblioteca Virtual Miguel de Cervantes LOD repository.")))
g.add((dataset, schema.name, Literal("Bibliographic records related to the Spanish Civil War from the Biblioteca Virtual Miguel de Cervantes LOD repository")))
g.add((dataset, DC.title, Literal("Bibliographic records related to the Spanish Civil War from the Biblioteca Virtual Miguel de Cervantes LOD repository")))
g.add((dataset, schema.license, URIRef('https://creativecommons.org/publicdomain/zero/1.0/')))

# A datetime.date value is stored as a typed xsd:date literal
g.add((dataset, schema.dateCreated, Literal(datetime.date.today())))
```
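The same schema.org description can also be expressed as JSON-LD, a syntax often used to publish schema.org metadata on the web. The sketch below builds it with the standard `json` module only; `dataset_jsonld` is a hypothetical helper, and its values simply mirror the triples added to `g` above.

```python
import datetime
import json


def dataset_jsonld(identifier, name, description, url, license_url):
    """Hypothetical helper: describe a dataset with schema.org terms as JSON-LD."""
    return {
        "@context": "https://schema.org/",
        "@id": identifier,
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Links are written as {"@id": ...} so they are read as IRIs, not strings
        "url": {"@id": url},
        "license": {"@id": license_url},
        "dateCreated": datetime.date.today().isoformat(),
    }


metadata = dataset_jsonld(
    "https://example.org/dataset/bibliographic-metadata-bvmc",
    "Bibliographic records related to the Spanish Civil War from the "
    "Biblioteca Virtual Miguel de Cervantes LOD repository",
    "This example is based on bibliographic records related to the Spanish "
    "Civil War from the Biblioteca Virtual Miguel de Cervantes LOD repository.",
    "https://www.cervantesvirtual.com",
    "https://creativecommons.org/publicdomain/zero/1.0/",
)

text = json.dumps(metadata, indent=2)
print(text)
```

With rdflib 6 or later, `g.serialize(format="json-ld")` produces JSON-LD directly from the graph, so a hand-written dictionary like this is mainly useful when rdflib is not available.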

Let's store the generated metadata

```python
# Write the metadata graph as Turtle
g.serialize(destination="output/metadata-bibliographic-bvmc.ttl", format="turtle")
```

### Finally, we can analyse the extracted records

```python
input_file = "output/bibliographic.ttl"
g = Graph().parse(input_file, format="turtle")
```

Let's check the number of distinct properties

```python
print('##### Number of properties:')

# Query the data in g using SPARQL
q = """
    SELECT (COUNT(DISTINCT ?prop) AS ?properties)
    WHERE {
        ?s ?prop ?o .
    }
"""

# Apply the query to the graph and iterate through the results
for r in g.query(q):
    print(r["properties"])
```

```
##### Number of properties:
5
```


We can also check the total number of triples

```python
print('##### Number of triples:')

# Query the data in g using SPARQL
q = """
    SELECT (COUNT(*) AS ?triples)
    WHERE { ?s ?p ?o }
"""

# Apply the query to the graph and iterate through the results
for r in g.query(q):
    print(r["triples"])
```

```
##### Number of triples:
468
```
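To make the difference between the two counts concrete, the sketch below reproduces `COUNT(DISTINCT ?prop)` and `COUNT(*)` in plain Python on the first two works of the serialised output shown earlier, and adds a per-property breakdown with `collections.Counter`. Because only those two records are used, the numbers illustrate the semantics rather than reproduce the 468 triples of the full result. The `GROUP BY` query string is a suggested next step (an assumption, not part of the original notebook) that could be run with `g.query(PER_PROPERTY_QUERY)` on the rdflib graph loaded above.

```python
from collections import Counter

# The first two works from the serialised output as a set of
# (subject, predicate, object) tuples; a set mirrors an RDF graph,
# which never contains the same triple twice
SUBJECT = "España -- Historia -- 1936-1939 (Guerra civil)"
GENRE = "http://id.loc.gov/authorities/genreForms/gf2014026168"
WORK_1 = "https://data.cervantesvirtual.com/work/1048971"
WORK_2 = "https://data.cervantesvirtual.com/work/1049318"
graph = {
    (WORK_1, "rdf:type", "rdac:Work"),
    (WORK_1, "dc:subject", SUBJECT),
    (WORK_1, "rdaw:author", "https://data.cervantesvirtual.com/person/108269"),
    (WORK_1, "rdaw:formOfWork", GENRE),
    (WORK_1, "rdaw:titleOfTheWork", "El carlismo ante la reorganización de las derechas. "
     "De la Segunda Guerra Carlista a la Guerra Civil"),
    (WORK_2, "rdf:type", "rdac:Work"),
    (WORK_2, "dc:subject", SUBJECT),
    (WORK_2, "rdaw:author", "https://data.cervantesvirtual.com/person/9728"),
    (WORK_2, "rdaw:formOfWork", GENRE),
    (WORK_2, "rdaw:titleOfTheWork", "Una encuesta política en la España de la preguerra"),
}

# COUNT(DISTINCT ?prop): each predicate is counted once
distinct_properties = len({p for _, p, _ in graph})
# COUNT(*) over { ?s ?p ?o }: every triple is counted
total_triples = len(graph)

# Per-property breakdown; a SPARQL equivalent uses GROUP BY
PER_PROPERTY_QUERY = """
    SELECT ?prop (COUNT(*) AS ?n)
    WHERE { ?s ?prop ?o }
    GROUP BY ?prop
"""
per_property = Counter(p for _, p, _ in graph)

print(distinct_properties, total_triples)
for prop, n in per_property.most_common():
    print(prop, n)
```

On these two records every property occurs once per work, so each appears twice; on the full result, properties such as `rdaw:formOfWork` (optional in the query) may occur fewer times than `rdf:type`.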
