## Rijksmuseum

This notebook explains how you can use Europeana's SPARQL endpoint to explore the collection of a specific cultural heritage institution: the Rijksmuseum.

In [None]:
!pip install SPARQLWrapper

import requests
from os.path import basename
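def download(url):
    """Save the text file at `url` in the working directory."""
    response = requests.get(url)
    if response:
        new_file_name = basename(url)
        # The context manager closes the file even if writing fails.
        with open(new_file_name, 'w', encoding='utf-8') as out:
            out.write(response.text)
        print(f"{new_file_name} is downloaded!")
    else:
        print(f"Download failed with status {response.status_code}: {url}")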
        
download('https://raw.githubusercontent.com/peterverhaar/europeana_research_webinar/refs/heads/main/europeana_sparql.py')

In [None]:
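from europeana_sparql import *

# Imported explicitly because later cells use them directly.
import re
from IPython.display import display, HTML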

As you can see, the query groups the results by institution and counts the number of items for each institution. Using this query, we can establish that there are 159 Dutch institutions which added objects to Europeana. 

The code below lists all the institutions which have contributed more than 10,000 objects.

Using COUNT() in the query below, we can verify the information about the number of items added by the Rijksmuseum. 

In [None]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>

SELECT COUNT( DISTINCT ?object )
WHERE {


?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .
}
"""

df = run_query(query)
print( f"There are {df['callret-0.value'].iloc[0]} items in this collection." )

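The column name `callret-0.value` may look surprising. The sketch below shows one way such names can arise. It assumes (the `europeana_sparql` helper is not shown in this notebook) that the endpoint labels an unnamed aggregate `callret-0` and that `run_query` flattens the SPARQL JSON response with `pandas.json_normalize`. The response dictionary is a hand-written placeholder, not real output.

```python
import pandas as pd

# Hypothetical response in the SPARQL JSON results format;
# the value below is a placeholder, not a real count.
sample_response = {
    "head": {"vars": ["callret-0"]},
    "results": {
        "bindings": [
            {
                "callret-0": {
                    "type": "typed-literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                    "value": "42",
                }
            }
        ]
    },
}

# json_normalize flattens each nested binding into dotted column names.
flat_df = pd.json_normalize(sample_response["results"]["bindings"])
print(sorted(flat_df.columns))

# Values arrive as strings, so counts need an explicit conversion.
count = int(flat_df["callret-0.value"].iloc[0])
print(count)
```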

Which kinds of objects can we find in this collection? We can explore the contents of this collection by examining the values supplied for `dc:type`.

For each type, we can also request a human-understandable label, using the `skos:prefLabel` predicate. 

In [None]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?label , COUNT(?label)
WHERE {

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .
?object dc:type ?type .
?type skos:prefLabel ?label . 
FILTER( lang(?label) = 'en' )
}
GROUP BY ?label
"""

types_df = run_query(query)
types_df['callret-1.value'] = types_df['callret-1.value'].astype(int)
types_df = types_df.sort_values(by=['callret-1.value'] , ascending = False )

for i,row in types_df.iloc[:20].iterrows():
    print( f"{row['label.value']} => {row['callret-1.value']}" )


In this overview, we can see that the type "easel paintings (paintings by form)", taken from the Getty Vocabularies, has been assigned very frequently: more than 3440 times.

Using the results of the following query, we can display a number of examples of works in this category. 

In [None]:

query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?title ?label ?url ?landing
WHERE {

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .

?object dc:title ?title .
?object dc:type ?type .
?type skos:prefLabel ?label .
FILTER( regex(lang(?title),'en') )
FILTER( regex(?label,'easel paintings') )
?local_aggr edm:isShownBy ?url .
?local_aggr edm:isShownAt ?landing .
}
LIMIT 30
"""

df = run_query(query)
print( f'{df.shape[0]} results.' )

for i,row in df.iterrows():
    print( row['title.value'] )
    print( row['landing.value'] )
    img = row['url.value']
    display(HTML(f'<a target="_new" href="{img}"><img src="{img}" style="width: 200px;"/></a><br/><br/>'))

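Titles and URLs from the endpoint are inserted directly into HTML in the cell above. A slightly more defensive variant escapes them first. The helper `thumbnail_html` below is hypothetical and not part of `europeana_sparql`; the sample values are placeholders. In the notebook, its return value could be passed to `display(HTML(...))`.

```python
from html import escape

def thumbnail_html(image_url, link_url, title, width=200):
    """Build an HTML snippet for one linked thumbnail (hypothetical helper)."""
    # quote=True also escapes quotation marks, so values are safe inside attributes.
    safe_img = escape(image_url, quote=True)
    safe_link = escape(link_url, quote=True)
    safe_title = escape(title, quote=True)
    return (
        f'<a target="_blank" href="{safe_link}">'
        f'<img src="{safe_img}" alt="{safe_title}" style="width: {int(width)}px;"/>'
        f'</a><br/><br/>'
    )

# Placeholder values, not real records.
snippet = thumbnail_html(
    "https://example.org/image.jpg?a=1&b=2",
    "https://example.org/object/1",
    'A "sample" title',
)
print(snippet)
```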

It can also be interesting to search for works created by a specific artist. In the records from the Rijksmuseum, the artist has been described using the `dc:contributor` predicate.

The code below lists up to ten works created by Johannes Vermeer.

In [None]:

query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?object ?title ?contributor WHERE {

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .

  ?object dc:contributor ?contributor .
  ?object dc:title ?title .
  FILTER( regex(?contributor,'Vermeer, Johannes') )
} 
LIMIT 10
"""

df = run_query(query)
for i,row in df.iterrows():
    print( f"{row['object.value']}")
    print( f"{row['title.value']}")
    print( f"{row['contributor.value']}\n" )

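To search for other artists without editing the query by hand, the query text can be generated from a template. This is a minimal sketch: `contributor_query` is a hypothetical helper, and it swaps `regex` for the SPARQL 1.1 function `contains`, so the name is matched as plain text rather than as a regular expression. The resulting string could then be passed to `run_query`.

```python
QUERY_TEMPLATE = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>

SELECT ?object ?title ?contributor WHERE {
  ?object ore:proxyIn ?local_aggr .
  ?object ore:proxyFor ?cho .
  ?eur_aggr edm:aggregatedCHO ?cho .
  ?eur_aggr a edm:EuropeanaAggregation .
  ?local_aggr edm:dataProvider 'Rijksmuseum' .
  ?object dc:contributor ?contributor .
  ?object dc:title ?title .
  FILTER( contains(str(?contributor), "%(name)s") )
}
LIMIT %(limit)d
"""

def contributor_query(name, limit=10):
    """Return the query text for one contributor name (hypothetical helper)."""
    # Escape backslashes and double quotes for a SPARQL string literal.
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE % {"name": safe_name, "limit": int(limit)}

query = contributor_query("Vermeer, Johannes", limit=5)
print(query)
```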
Many of the works of art contributed by the Rijksmuseum have been described using [Iconclass](https://iconclass.org).

The SPARQL query below finds works in the Rijksmuseum collection that have been assigned the Iconclass code '[49N](https://iconclass.org/49N)', describing the act of reading. The `VALUES` clause lists the Iconclass URIs to match, so further codes can be added there.

In [None]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?object ?title ?created ?url ?landing
WHERE {

VALUES ?iconclass { <http://iconclass.org/49N> <http://iconclass.org/25F36> } 

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .

?object dc:title ?title .
?object dcterms:created ?created . 
?object dc:subject ?iconclass .

?local_aggr edm:isShownBy ?url .
?local_aggr edm:isShownAt ?landing .

}
LIMIT 30
"""

df = run_query(query)
df = df.drop_duplicates(subset=['object.value'])
print( f'There are {df.shape[0]} objects with the selected Iconclass codes.' )


for i,row in df.iterrows():

    print(row['title.value'])
    img = row['url.value']
    url = row['landing.value']
    display(HTML(f'<a target="_new" href="{url}"><img src="{img}" style="width: 200px;"/></a>'))

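When several Iconclass codes are of interest, the `VALUES` clause can be generated from a list of notations. The helper `iconclass_values` below is a hypothetical sketch. It reuses the URI pattern from the queries in this notebook and assumes that each notation consists of letters and digits only, so no URL encoding is applied.

```python
def iconclass_values(notations, variable="iconclass"):
    """Return a SPARQL VALUES clause for Iconclass notations (hypothetical helper)."""
    # URI pattern copied from the queries in this notebook.
    uris = " ".join(f"<http://iconclass.org/{n}>" for n in notations)
    return f"VALUES ?{variable} {{ {uris} }}"

clause = iconclass_values(["49N", "49M32"])
print(clause)
```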


Using a similar method, we can find works of art that depict books. The Iconclass URI `http://iconclass.org/49M32` refers to books.

In [None]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?object ?title ?url ?landing
WHERE {

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .

?object dc:title ?title .
?object dc:subject <http://iconclass.org/49M32> .

?local_aggr edm:isShownBy ?url .
?local_aggr edm:isShownAt ?landing .

}
LIMIT 20

"""

subjects_df = run_query(query)
print( f'{subjects_df.shape[0]} results.' )


for i,row in subjects_df.iterrows():
    print( row['title.value'] )
    print( row['landing.value'] )
    img = row['url.value']
    
    display(HTML(f'<a target="_new" href="{img}"><img src="{img}" style="width: 200px;"/></a><br/><br/>'))


The query below requests information about the Iconclass subjects that have been assigned most frequently.

In [None]:

query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?label, COUNT( DISTINCT ?object )
WHERE {

?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .
?object dc:subject ?subject .
?subject skos:prefLabel ?label .
FILTER(regex(str(?subject), 'iconclass' ))
FILTER(lang(?label) = 'en')
}
GROUP BY ?label 

"""

subjects_df = run_query(query)

subjects_df['callret-1.value'] = subjects_df['callret-1.value'].astype(int)
subjects_df  = subjects_df.sort_values(by=['callret-1.value'] , ascending = False )

for i,row in subjects_df.iloc[:15].iterrows():
    print( f"{row['label.value']}\t{row['callret-1.value']}" )

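In this overview, we can see that the collection contains many panoramas or silhouettes of cities. These objects have been described with the Iconclass URI `http://iconclass.org/25I12`.

The same approach works for any subject. The query below returns to the act of reading: it selects all works that have been assigned the Iconclass code 49N or one of its subdivisions 49N1, 49N2 and 49N3. For each of these works, it also gathers the date of creation.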

In [None]:
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX html: <http://www.w3.org/1999/xhtml/vocab#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT *
WHERE {
?object ore:proxyIn ?local_aggr .
?object ore:proxyFor ?cho .
?eur_aggr edm:aggregatedCHO ?cho .
?eur_aggr a edm:EuropeanaAggregation .

?local_aggr edm:dataProvider 'Rijksmuseum' .

?object dc:subject ?value .

FILTER(?value IN (<http://iconclass.org/49N>,<http://iconclass.org/49N1>,<http://iconclass.org/49N2>,<http://iconclass.org/49N3>))
?object dcterms:created ?created . 
}
"""

df = run_query(query)
df = df.drop_duplicates(subset=['object.value'])
print( f"There are {df.shape[0]} works of art depicting the act of reading." )

When were these works of art produced?

In [None]:
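import re

dates = []

def standardise_date(date):
    """Strip all non-digit characters; return None if no digits remain."""
    digits = re.sub( r'\D' , '' , date )
    return int(digits) if digits else None

for i,row in df.iterrows():
    date = row['created.value']

    if '-' in date:
        # Date range: use the midpoint if both ends are four-digit years.
        parts = date.split('-')
        nr1 = standardise_date(parts[0])
        nr2 = standardise_date(parts[1])

        if nr1 is not None and nr2 is not None and 1000 <= nr1 <= 9999 and 1000 <= nr2 <= 9999:
            mean_date = round((nr1+nr2)/2)
            dates.append(mean_date)

    else:
        # Single date: skip values that contain no digits at all.
        year = standardise_date(date)
        if year is not None:
            dates.append(year)

# Count how many works were created in each year.
dates_freq = dict()
for year in dates:
    dates_freq[year] = dates_freq.get(year,0)+1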
        

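The normalisation step can be illustrated on a few sample strings. The sketch below uses a different, simpler technique than the cell above: it extracts every four-digit number and takes the midpoint of the first and the last one. The helper `year_from_created` and the sample values are hypothetical and not taken from the collection.

```python
import re

def year_from_created(value):
    """Return one representative year for a date string, or None (illustrative helper)."""
    years = [int(y) for y in re.findall(r"\b\d{4}\b", value)]
    if not years:
        return None
    # For a range, use the midpoint of the first and last year.
    return round((years[0] + years[-1]) / 2)

# Hypothetical sample values, not taken from the collection.
samples = ["1665", "1660-1670", "c. 1650 - c. 1656", "undated"]
results = [year_from_created(s) for s in samples]
print(results)
```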
In [None]:
import matplotlib.pyplot as plt
import seaborn as sns


plt.figure( figsize = ( 10,5 ) )

graph = sns.lineplot( x= list(dates_freq.keys()) , y= list(dates_freq.values()) , color = '#131875' )


graph.set_xlabel('Years' , size = 16 )
graph.set_ylabel('Number of objects' , size = 16 )
graph.set_title( 'Works of art depicting reading' , size = 22 )


plt.show()