## The Europeana SPARQL endpoint

Europeana is an initiative of the European Union aimed at improving digital access to the collections of more than 3000 cultural heritage institutions across Europe. Over the past few years, more than 50 million objects have been accumulated from these institutions. On the platform, users can find detailed information about a wide variety of cultural heritage artefacts, including paintings, books, music, videos and audio files.

The data that have been collected can also be searched using SPARQL on Europeana’s SPARQL endpoint, available at http://sparql.europeana.eu/. 

The cell below defines the SPARQL endpoint and creates a function to run queries against this endpoint. 

In [None]:
!pip install SPARQLWrapper

In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON
from IPython.display import display, HTML
import pandas as pd
import re

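import ssl

# Workaround: disable HTTPS certificate verification for environments with
# missing or outdated root certificates. This weakens security, so use it
# only where necessary.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Older Python versions do not verify certificates by default.
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context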

endpoint = 'http://sparql.europeana.eu/'
sparql = SPARQLWrapper(endpoint)

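def run_query(sparql_query):
    """Run a SPARQL query against the endpoint and return the bindings as a DataFrame."""
    sparql.setQuery(sparql_query)
    sparql.setReturnFormat(JSON)
    data = sparql.query().convert()
    # Each variable yields columns such as 'title.type' and 'title.value'.
    df = pd.json_normalize(data['results']['bindings'])
    return df


def print_results(df):
    """Print the value of every variable for each row of a result DataFrame."""
    for i, row in df.iterrows():
        for column in df.columns:
            # Only the '<variable>.value' columns hold the actual values.
            if re.search(r'value$', column):
                print(f'{column} => {row[column].strip()}')
        print('-------')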
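
To see why later cells refer to columns such as `title.value`, it helps to look at the shape of a SPARQL JSON result. The sketch below uses a hand-written response (the URI and title are placeholders, not Europeana data) and a hypothetical helper, `flatten_binding`, which mimics in plain Python the dotted column names that `pd.json_normalize` produces.

```python
# Hand-written stand-in for a SPARQL JSON response; the values are placeholders.
sample_response = {
    "head": {"vars": ["object", "title"]},
    "results": {
        "bindings": [
            {
                "object": {"type": "uri", "value": "http://example.org/item/1"},
                "title": {"type": "literal", "value": " Example title "},
            }
        ]
    },
}


def flatten_binding(binding):
    """Flatten one binding into {'<variable>.<key>': value} pairs (hypothetical helper)."""
    flat = {}
    for variable, cell in binding.items():
        for key, value in cell.items():
            flat[f"{variable}.{key}"] = value
    return flat


rows = [flatten_binding(b) for b in sample_response["results"]["bindings"]]
print(sorted(rows[0]))
# ['object.type', 'object.value', 'title.type', 'title.value']
```

Each variable in the `SELECT` clause therefore appears as a `<variable>.value` column, which is why `print_results` only prints columns ending in `value`.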

Before you start to work with the Europeana data, it is useful to familiarise yourself with the Europeana data model.  

The model can be visualised as follows:

![](Europeana.svg)

A central resource in the data model is `edm:providedCHO`. 

This `edm:providedCHO` is part of two `ore:Aggregations`. There is a local `ore:Aggregation`, next to an `edm:EuropeanaAggregation`. Europeana makes a distinction between the metadata that have been supplied by the local data provider on the one hand, and the metadata that have been added by Europeana on the other. 

The term 'Aggregation' is taken from the [*Object Reuse and Exchange*](https://www.openarchives.org/ore/) vocabulary, which was developed to describe compound digital objects.  

Both aggregations contain `ore:Proxy` resources. These `ore:Proxy` resources contain most of the descriptive metadata about the heritage objects. The ‘local’ Proxy (i.e. the `ore:Proxy` resource connected to `ore:Aggregation`) contains most of the metadata that have been supplied by the contributing institution. The `ore:Proxy` resource connected to the `edm:EuropeanaAggregation` contains some additional metadata. 

More detailed information about the ORE vocabulary on which the Europeana Data Model builds can be found in the [ORE Primer](http://www.openarchives.org/ore/1.0/primer.html).  

This model may seem slightly complicated, but to work with the Europeana data you mainly need to know which predicates are available for each of these resources. 

The `ore:Proxy` in the `ore:Aggregation` can be described using the following properties:

* dc:title
* dc:creator
* dc:relation
* dc:type
* dcterms:created
* dcterms:extent
* dc:publisher
* dc:identifier
* dc:date
* dc:source
* edm:type
* dc:coverage
* dc:description
* dc:format
* dc:language
* dc:subject
* dcterms:isPartOf
* dcterms:spatial

You can use these predicates to search metadata harvested from the local data provider. 
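
As a sketch of how these predicates translate into a query, the hypothetical helper `build_proxy_query` below assembles a `SELECT` query from a mapping of variable names to predicates. The helper name, its parameters and the `dcterms` prefix declaration are assumptions added for illustration. The resulting string could be passed to `run_query`.

```python
PREFIXES = """\
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX edm:     <http://www.europeana.eu/schemas/edm/>
"""


def build_proxy_query(fields, limit=10):
    """Build a SELECT query with one triple pattern per requested field.

    `fields` maps a variable name (without '?') to a predicate, e.g.
    {'title': 'dc:title'}. Every pattern is required, so objects lacking
    any of the predicates are not returned.
    """
    variables = " ".join(f"?{name}" for name in fields)
    patterns = "\n".join(
        f"    ?object {predicate} ?{name} ." for name, predicate in fields.items()
    )
    return (
        f"{PREFIXES}\n"
        f"SELECT ?object {variables}\n"
        f"WHERE {{\n{patterns}\n}}\n"
        f"LIMIT {int(limit)}\n"
    )


query = build_proxy_query({"title": "dc:title", "creator": "dc:creator"}, limit=5)
print(query)
```

Because every pattern is mandatory, adding more predicates narrows the result set. Predicates that are often missing are better wrapped in `OPTIONAL { ... }` by hand.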


The query below searches all the data available at [Europeana.eu](https://www.europeana.eu/en) for objects of type 'IMAGE'; the type is indicated in the `dc:type` property. The query returns the title, the type and the date of the first 15 matching rows, ordered by title.

In [55]:
query = '''
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX edm:     <http://www.europeana.eu/schemas/edm/>
PREFIX ore:     <http://www.openarchives.org/ore/terms/>

SELECT ?object ?title ?type ?date
WHERE {

    ?object dc:title ?title .
    ?object dc:date ?date .
    ?object dc:type ?type .
    ?object dc:identifier ?id
    OPTIONAL{
    ?object dc:publisher ?pub .
    ?object dc:created ?cr . } .
    
    ?object dc:type 'IMAGE' . 

}
ORDER BY ?title
LIMIT 15
'''

df = run_query(query)
for i,row in df.drop_duplicates(subset='object.value', keep="last").iterrows():
    print(f"{row['title.value'].strip()}\n{row['date.value'].strip()}\nType: {row['type.value']}\n")

Brambovska
1881
Type: IMAGE

Gôrska cvetica
1881
Type: IMAGE

Lahko noč
1880
Type: IMAGE

Ljubezen do dóma
1881
Type: IMAGE

Mojemu narodu
1881
Type: IMAGE

Pri oknu sva molče slonela
1922
Type: IMAGE

Samo
1880
Type: IMAGE

Slovanska pesem
1897
Type: IMAGE

Sporočilo
1880
Type: IMAGE

Strunam
1880
Type: IMAGE

Έλληνες συλλαμβάνονται και απολύονται υπό Τούρκων τρομοκρατών Τούρκοι τραυματίζονται υπό των ομοφύλων των
29/3/1964
Type: IMAGE

Θερμοκρασίαι και Βροχόπτωσις
6/3/1963
Type: IMAGE



It must be noted, however, that the values of the `dc:type` predicate have not been entered consistently by all institutions. As a result, this query will not necessarily return all images. 
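
One way to gauge this inconsistency is to retrieve type values without the 'IMAGE' restriction and tally them after normalising case and whitespace. The sketch below does this for a short list of made-up values; the spellings are invented for illustration and are not taken from Europeana, and `normalise_type` is a hypothetical helper.

```python
from collections import Counter

# Made-up examples of how the same type might be spelled by different providers.
sample_types = ["IMAGE", "Image", " image", "TEXT", "text"]


def normalise_type(value):
    """Trim whitespace and upper-case a dc:type literal for comparison."""
    return value.strip().upper()


counts = Counter(normalise_type(v) for v in sample_types)
print(counts.most_common())
# [('IMAGE', 3), ('TEXT', 2)]
```

With real results, the same tally could be applied to the `type.value` column of the DataFrame returned by `run_query`. Note that this only catches differences in case and spacing, not synonyms or translations.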

The `ore:Proxy` resource in the local `ore:Aggregation` holds the descriptive metadata supplied by the data provider, as mentioned above. Data about the data provider itself and about the rights associated with the object can be found in the `ore:Aggregation` resource to which this local Proxy belongs. The `ore:Proxy` is connected to the `ore:Aggregation` via the `ore:proxyIn` predicate. 

The local `ore:Aggregation` is described using the following properties: 

* edm:dataProvider
* edm:rights
* edm:isShownAt
* edm:isShownBy
* edm:object
* edm:provider


The query below generates a list of all the institutions in the Netherlands that contribute to Europeana.

In [116]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>

SELECT ?inst , COUNT(?inst)
WHERE {

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .

?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider ?inst .
?eur_aggr edm:country 'netherlands' .
}
GROUP BY ?inst

"""

df = run_query(query)

print( f'{df.shape[0]} Dutch institutions contribute to Europeana.' )

159 Dutch institutions contribute to Europeana.


The query groups the results by institution and counts the number of items for each institution. Using this query, we can establish that there are 159 Dutch institutions which added objects to Europeana.
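
The count is returned in a column named `callret-1`, a name generated by the endpoint because the aggregate in the `SELECT` clause has no name of its own. Standard SPARQL 1.1 lets you name an aggregate with `(COUNT(...) AS ?var)`. The variant below is an untested sketch of the same query using the alias `?n` (an assumed name) and an added `ORDER BY`. With it, the count would appear in a column called `n.value`, so any code that reads `callret-1.value` would need to be adjusted.

```python
query_with_alias = """
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>

SELECT ?inst (COUNT(?inst) AS ?n)
WHERE {
    ?object ore:proxyIn ?local_aggr .
    ?object ore:proxyFor ?cho .
    ?eur_aggr edm:aggregatedCHO ?cho .
    ?eur_aggr a edm:EuropeanaAggregation .
    ?local_aggr edm:dataProvider ?inst .
    ?eur_aggr edm:country 'netherlands' .
}
GROUP BY ?inst
ORDER BY DESC(?n)
"""

# Running it requires network access and the run_query function defined earlier:
# df_alias = run_query(query_with_alias)
print(query_with_alias)
```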

The code below lists all the institutions which have contributed more than 10,000 objects.

In [57]:
print('\nThe following institutions have contributed more than 10000 objects:')

df = df.sort_values(by=['inst.value'])

count = 0
for i, row in df.iterrows():
    # 'callret-1' is the name the endpoint gives to the unnamed COUNT column
    if int(row['callret-1.value']) > 10000:
        count += 1
        print(f"{count}. {row['inst.value']}: {row['callret-1.value']}")


The following institutions have contributed more than 10000 objects:
1. Beeldbank Wageningen: 12792
2. Brabants Historisch Informatie Centrum: 163454
3. CODA Apeldoorn: 30699
4. CODA Museum: 10580
5. Circus Museum: 16580
6. Deventer Musea: 37394
7. Digitale bibliotheek voor de Nederlandse letteren, DBNL, Nederland: 17043
8. Erfgoed Rijssen-Holten: 17221
9. Euregionaal Historisch Centrum Sittard-Geleen (ehc.sittard-geleen.eu): 53054
10. EuroPhoto ANP provider: 155448
11. Gemeente Schouwen-Duiveland: 37755
12. Gemeentearchief Ede-Barneveld: 41483
13. Gemeentearchief Roosendaal: 63465
14. Gemeentearchief Schiedam: 38651
15. Gemeentearchief Tholen: 27160
16. Gemeentearchief Weert: 75024
17. Gemeentearchief Zaanstad: 85843
18. Gemeentemuseum Den Haag: 19558
19. Gooi en Vecht Historisch: 56505
20. Haags Gemeentearchief: 210084
21. Het Nationaal Glasmuseum: 72631
22. Historisch Centrum Leeuwarden: 44268
23. Historisch Centrum Limburg (rhcl.nl): 13759
24. Historisch Museum Ede: 15016
25. Hist

It is also possible to request links to the digital objects. Such a link is usually provided as the object of the `edm:isShownBy` predicate. 

`edm:isShownAt` mostly contains a link to the landing page describing the object. 

In [114]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>

SELECT ?title ?url
WHERE {

?object dc:title ?title .
?object ore:proxyIn ?local_aggr .
?local_aggr edm:dataProvider "Rijksdienst voor het Cultureel Erfgoed" .

?local_aggr edm:isShownBy ?url .

}
ORDER BY ?title
LIMIT 5
"""

df = run_query(query)

In [115]:
for i,row in df.drop_duplicates(subset='url.value',keep='last').iterrows():
    print(f"{row['title.value']}\n{row['url.value']}\n")

Huis Vecht en Dijk
http://images.memorix.nl/rce/thumb/800x800/83239dd6-758c-c053-45ce-83d6df52fe05.jpg

Huis Vecht en Dijk
http://images.memorix.nl/rce/thumb/800x800/a26ee4a9-4179-a9fc-348f-1a8b57766adf.jpg

Huis Vecht en Dijk
http://images.memorix.nl/rce/thumb/800x800/4fa94eb9-68c2-2f1c-1178-3435e31596b7.jpg

Huis Vecht en Dijk
http://images.memorix.nl/rce/thumb/800x800/2c223afb-3005-d0b3-0042-172566d2fbe7.jpg

Huis Vecht en Dijk
http://images.memorix.nl/rce/thumb/800x800/acd32e49-930c-c315-ac3b-79ada9ce82ef.jpg
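
Since `edm:isShownBy` points at the image files themselves, the results can also be shown as thumbnails in the notebook. The sketch below builds an HTML fragment from (title, URL) pairs. `thumbnails_html` and its `width` parameter are hypothetical names introduced for illustration; the example pair is taken from the output above.

```python
from html import escape


def thumbnails_html(items, width=200):
    """Return an HTML fragment with one captioned <img> per (title, url) pair."""
    figures = []
    for title, url in items:
        figures.append(
            f'<figure><img src="{escape(url, quote=True)}" width="{int(width)}" '
            f'alt="{escape(title, quote=True)}">'
            f'<figcaption>{escape(title)}</figcaption></figure>'
        )
    return "\n".join(figures)


items = [
    ("Huis Vecht en Dijk",
     "http://images.memorix.nl/rce/thumb/800x800/83239dd6-758c-c053-45ce-83d6df52fe05.jpg"),
]
snippet = thumbnails_html(items)
print(snippet)
```

In a notebook, the fragment could be rendered with `display(HTML(snippet))`, and the pairs could be taken from the query result with `zip(df['title.value'], df['url.value'])`. Whether the images load depends on the provider's server still being available.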



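The `ore:Proxy` resource that is connected to an `ore:Aggregation` is also a proxy for the cultural heritage object itself, to which it is linked via the `ore:proxyFor` predicate. This object is in turn part of an `edm:EuropeanaAggregation`, which refers to it via the `edm:aggregatedCHO` predicate. The `edm:EuropeanaAggregation` can be described using the following predicates: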
    
* edm:collectionName
* edm:country
* edm:landingPage
* edm:language

Once you have found the `edm:EuropeanaAggregation` associated with an object, you can search for objects managed by a cultural heritage institution in a specific country, or for objects in a specific language. 

The SPARQL query below focuses on 'TEXT' objects and counts the number of items per language. 

In [173]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>

SELECT ?lang COUNT(?lang)
WHERE {
?object edm:type "TEXT" .
?object dc:title ?title .
?object dc:creator ?creator .

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .

?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?eur_aggr edm:language ?lang .
}
GROUP BY ?lang
"""

df = run_query(query)


In [174]:
df['callret-1.value'] = df['callret-1.value'].astype(int)

for i,row in df.sort_values(by='callret-1.value',ascending=False).iterrows():
    print(f"{row['lang.value']} => {row['callret-1.value']}")

no => 2808227
de => 1579741
es => 1480041
fr => 1122632
nl => 942155
mul => 702935
it => 445845
pl => 369171
en => 355850
sl => 301694
sv => 280420
el => 245680
hu => 164495
sr => 114410
pt => 53486
da => 38870
uk => 32879
fi => 28344
bg => 26395
ru => 23735
hr => 23134
et => 22942
cs => 18811
lt => 12619
ro => 11921
is => 5106
ca => 4331
mk => 117
be => 100
lv => 11
sk => 7
ga => 5
bs => 5
eu => 4
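
The `edm:country` and `edm:language` predicates can also be used as filters instead of grouping keys. The sketch below builds such a query. `europeana_filter_query` and `_literal` are hypothetical helpers, and the values `'netherlands'` and `'nl'` are taken from the queries and output above. The query has not been run against the endpoint here.

```python
def _literal(value):
    """Wrap a plain string in double quotes, refusing characters that need escaping."""
    if any(ch in value for ch in '"\\\n'):
        raise ValueError(f"unsupported character in {value!r}")
    return f'"{value}"'


def europeana_filter_query(country=None, language=None, limit=10):
    """Build a query for object titles, optionally restricted by country and language."""
    lines = [
        "    ?proxy dc:title ?title .",
        "    ?proxy ore:proxyFor ?cho .",
        "    ?eur_aggr edm:aggregatedCHO ?cho .",
        "    ?eur_aggr a edm:EuropeanaAggregation .",
    ]
    # Both filters are attached to the Europeana aggregation, as in the queries above.
    if country is not None:
        lines.append(f"    ?eur_aggr edm:country {_literal(country)} .")
    if language is not None:
        lines.append(f"    ?eur_aggr edm:language {_literal(language)} .")
    body = "\n".join(lines)
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "PREFIX edm: <http://www.europeana.eu/schemas/edm/>\n"
        "PREFIX ore: <http://www.openarchives.org/ore/terms/>\n\n"
        "SELECT ?title\n"
        f"WHERE {{\n{body}\n}}\n"
        f"LIMIT {int(limit)}\n"
    )


filtered_query = europeana_filter_query(country="netherlands", language="nl", limit=5)
print(filtered_query)
```

The string could then be passed to `run_query(filtered_query)`. Keeping a `LIMIT` is advisable, since unrestricted queries over the full dataset may be slow.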
