# Creating Collections as Data Using Federated Queries / BNF Query Building (2)
## About the Notebook
This notebook demonstrates how to use [SPARQL](https://www.w3.org/TR/sparql11-query/) to query Linked Open Data repositories. Specifically, it showcases how to perform federated queries by combining data from [Wikidata](https://www.wikidata.org) and the [Bibliothèque nationale de France (BNF)](https://data.bnf.fr), which has published its catalog as Linked Open Data.

[![Launch](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/semanticnoodles/federated-cad/HEAD?filepath=notebooks/01-query-building/LOD-BNF-Federated-Query.ipynb)

<!--🔖 **How to Cite**: [![DOI](https://zenodo.org/badge/DOI_NUMBER.svg)](https://doi.org/DOI_NUMBER) fix once we get the CITATION.cff set up in the GitHub repo-->

---

## Getting Started
We start by importing the `SPARQLWrapper` library, which is used to interact with SPARQL endpoints.
We then set up the SPARQL endpoint for Wikidata and specify JSON as the return format for our query.

In [19]:
# Import necessary libraries
from SPARQLWrapper import SPARQLWrapper, JSON

# Set up the SPARQL endpoint for Wikidata
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setReturnFormat(JSON)

## Writing Our First SPARQL Query

In this section, we construct a federated SPARQL query that begins by retrieving the BNF identifier (`P268`) for a specific Wikidata entity, in this case, the author Jorge Juan y Santacilia (`wd:Q2085725`). What makes this query federated is the use of the `SERVICE` keyword, which allows us to access the BNF SPARQL endpoint and combine its data with Wikidata.
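
As a rough illustration of that first step, the sketch below mirrors in plain Python what the query's `BIND` expression does: it wraps the identifier stored in `P268` into a data.bnf.fr author URI. The function name `bnf_author_uri` is hypothetical, and the sample identifier is inferred from the author URI that appears in this notebook's results.

```python
def bnf_author_uri(bnf_id: str) -> str:
    """Build the data.bnf.fr author URI from a BnF identifier (Wikidata P268).

    Mirrors the query's BIND(uri(concat(...))) expression. The helper name is
    hypothetical and only meant for illustration.
    """
    bnf_id = bnf_id.strip()
    if not bnf_id:
        raise ValueError("empty BnF identifier")
    return f"http://data.bnf.fr/ark:/12148/cb{bnf_id}#about"


# Identifier inferred from the author URI shown in the notebook's results
author_uri = bnf_author_uri("119974653")
print(author_uri)  # http://data.bnf.fr/ark:/12148/cb119974653#about
```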

Unlike the previous example, we use [*Dublin Core*](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) metadata terms to retrieve the bibliographic information: we query the endpoint for the bibliographic expressions (`?expression`) associated with this author.

Following the [*Functional Requirements for Bibliographic Records (FRBR)*](https://repository.ifla.org/items/ffb50f46-46ab-4ec4-8970-b00e2b0d2811), an *expression* is understood as a specific realization of a work (e.g. a particular edition or translation), while a *manifestation* is the physical or digital embodiment of that expression. Although this model underlies [*RDA (Resource Description and Access)*](https://www.cilip.org.uk/members/group_content_view.asp?group=215283&id=776197#:~:text=RDA%3A%20Resource%20Description%20and%20Access%20is%20a%20content%20standard%20providing,and%20cultural%20heritage%20bibliographic%20metadata.), RDA defines its own vocabulary and applies descriptive elements at specific entity levels. Therefore, in this case descriptive metadata for a single work are associated with `?manifestation`, not with `?expression`.

As such, for each manifestation, the query collects additional metadata: title (`?title`), edition statement (`?edition`, read from `dcterms:publisher`), place of publication (`?placeOfPublication`), and year of publication (`?yearOfPublication`). The language (`?langCode`) is retrieved from the expression, though it may also exist at the manifestation level in some records. The language, edition statement, and place of publication are wrapped in `OPTIONAL` and are only included when present in the data; the title and date patterns are required, so a manifestation lacking either is left out of the results.

In [None]:
sparql.setQuery(
    """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdarelationships: <http://rdvocab.info/RDARelationshipsWEMI/>
PREFIX rdagroup1elements: <http://rdvocab.info/Elements/>

SELECT ?author ?expression ?title ?edition ?placeOfPublication ?yearOfPublication ?langCode WHERE {
wd:Q2085725 wdt:P268 ?id .
BIND(uri(concat(concat("http://data.bnf.fr/ark:/12148/cb", ?id),"#about")) as ?author)
SERVICE <http://data.bnf.fr/sparql> {
  ?expression <http://id.loc.gov/vocabulary/relators/aut> ?author .
  OPTIONAL {?expression dcterms:language ?langCode .}
  OPTIONAL {?manifestation dcterms:publisher ?edition .}
  ?manifestation rdarelationships:expressionManifested ?expression .
  ?manifestation dcterms:title ?title .
  ?manifestation dcterms:date ?yearOfPublication .
  OPTIONAL{ ?manifestation rdagroup1elements:placeOfPublication ?placeOfPublication .}
  }
}
LIMIT 1000 # Check if it's the limit that causes issues
"""
)
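
To reuse this query for other authors, one option is to generate the query string from a Wikidata QID. The sketch below is only one way to do it: `build_author_query` and `QUERY_TEMPLATE` are hypothetical names, the query text is copied from the cell above, and the QID check is a simple pattern test rather than a lookup against Wikidata.

```python
import re
from string import Template

# Template mirroring the notebook's query; $qid and $limit are the only placeholders
QUERY_TEMPLATE = Template("""
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdarelationships: <http://rdvocab.info/RDARelationshipsWEMI/>
PREFIX rdagroup1elements: <http://rdvocab.info/Elements/>

SELECT ?author ?expression ?title ?edition ?placeOfPublication ?yearOfPublication ?langCode WHERE {
wd:$qid wdt:P268 ?id .
BIND(uri(concat(concat("http://data.bnf.fr/ark:/12148/cb", ?id),"#about")) as ?author)
SERVICE <http://data.bnf.fr/sparql> {
  ?expression <http://id.loc.gov/vocabulary/relators/aut> ?author .
  OPTIONAL {?expression dcterms:language ?langCode .}
  OPTIONAL {?manifestation dcterms:publisher ?edition .}
  ?manifestation rdarelationships:expressionManifested ?expression .
  ?manifestation dcterms:title ?title .
  ?manifestation dcterms:date ?yearOfPublication .
  OPTIONAL{ ?manifestation rdagroup1elements:placeOfPublication ?placeOfPublication .}
  }
}
LIMIT $limit
""")


def build_author_query(qid: str, limit: int = 1000) -> str:
    """Return the federated query for the author identified by a Wikidata QID."""
    # Reject anything that is not of the form Q<number> before building the query
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.substitute(qid=qid, limit=limit)


# Same author as in the notebook (Jorge Juan y Santacilia)
query = build_author_query("Q2085725", limit=1000)
print(query)
```

The resulting string could then be passed to `sparql.setQuery(query)` in place of the literal query.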

## Previewing the Query Results
Before storing the results, we run the query and print a handful of results to check that it returns what we expect.

Fields that are absent from a given result are simply skipped in the printout.

In [43]:
# Run the query and preview a limited number of results
try:
    ret = sparql.queryAndConvert()
    bindings = ret["results"]["bindings"]

    # Limit the number of results to preview
    preview_limit = 7

    # Report an empty result set instead of stopping the kernel
    if not bindings:
        print("No results found.")

    # Iterate through the first few results and print the desired fields
    for r in bindings[:preview_limit]:
        # Title and author are always bound by the query
        print(f"Title: {r['title']['value']}")
        print(f"Author: {r['author']['value']}")
        # The remaining fields are printed only when present
        if "yearOfPublication" in r:
            print(f"Year: {r['yearOfPublication']['value']}")
        if "placeOfPublication" in r:
            print(f"Place: {r['placeOfPublication']['value']}")
        if "expression" in r:
            print(f"Expression: {r['expression']['value']}")
        if "edition" in r:
            print(f"Edition: {r['edition']['value']}")
        if "langCode" in r:
            print(f"Language: {r['langCode']['value']}")

        print("---")

except Exception as e:
    print("Exception:")
    print(e)

Title: Dissertation historique et géographique sur le méridien de démarcation entre les domaines d'Espagne et de Portugal, par don Georges Juan,... et don Antoine de Ulloa,... Traduit de l'espagnol
Author: http://data.bnf.fr/ark:/12148/cb119974653#about
Year: 1776
Place: Paris, A. Boudet
Expression: http://data.bnf.fr/ark:/12148/cb32292716n#Expression
Edition: Paris, A. Boudet , 1776. In-8°, 205 p., carte
Language: http://id.loc.gov/vocabulary/iso639-2/fre
---
Title: Voyage historique de l'Amérique méridionale fait par ordre du roi d'Espagne
Author: http://data.bnf.fr/ark:/12148/cb119974653#about
Year: 1752
Place: Paris
Expression: http://data.bnf.fr/ark:/12148/cb32292718b#Expression
Edition: Paris : C.-A. Jombert , 1752
Language: http://id.loc.gov/vocabulary/iso639-2/fre
---
Title: Relacion historica del viage a la America meridional hecho de orden de S. Mag. para medir algunos grados de meridiano terrestre, y venir por ellos en conocimiento de la verdadera figura, y magnitud de la ti

## Storing the Query Results and Creating a DataFrame
In this section, we will create a dataframe from the results of our SPARQL query. We will use the `pandas` library to create a dataframe and populate it with the data retrieved from the BNF SPARQL endpoint.

If there are missing values in the data, the code handles them by assigning empty strings (`""`) to the corresponding fields. This ensures that the `data` list contains consistent dictionaries, even when some fields are not available in the SPARQL query results.
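
For context, each item in `ret["results"]["bindings"]` is a dictionary keyed by variable name, and a variable that was not bound (for example an unmatched `OPTIONAL`) is simply absent from it. The sketch below uses a hand-written sample binding, with values borrowed from this notebook's results, to show why chaining two `.get()` calls yields an empty string instead of raising a `KeyError`.

```python
# Hand-written sample in the shape of one SPARQL JSON result binding.
# "placeOfPublication" is deliberately left out to mimic an unmatched OPTIONAL.
sample_binding = {
    "title": {"type": "literal", "value": "Noticias secretas de América"},
    "yearOfPublication": {"type": "literal", "value": "1991"},
}

# Present key: the inner dictionary exists, so its "value" is returned
title = sample_binding.get("title", {}).get("value", "")

# Missing key: the first .get() falls back to {}, the second to ""
place = sample_binding.get("placeOfPublication", {}).get("value", "")

print(repr(title))  # 'Noticias secretas de América'
print(repr(place))  # ''
```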

In [45]:
# Initialize an empty list to store the processed data
data = []

# Iterate through the results from the SPARQL query
for r in ret["results"]["bindings"]:
    data.append(
        {
            "author": r.get("author", {}).get("value", ""),
            "expression": r.get("expression", {}).get("value", ""),
            "title": r.get("title", {}).get("value", ""),
            "langCode": r.get("langCode", {}).get("value", ""),
            "edition": r.get("edition", {}).get("value", ""),
            "yearOfPublication": r.get("yearOfPublication", {}).get("value", ""),
            "placeOfPublication": r.get("placeOfPublication", {}).get("value", ""),
        }
    )

# Print the first 15 items
data[0:15]

[{'author': 'http://data.bnf.fr/ark:/12148/cb119974653#about',
  'expression': 'http://data.bnf.fr/ark:/12148/cb32292716n#Expression',
  'title': "Dissertation historique et géographique sur le méridien de démarcation entre les domaines d'Espagne et de Portugal, par don Georges Juan,... et don Antoine de Ulloa,... Traduit de l'espagnol",
  'langCode': 'http://id.loc.gov/vocabulary/iso639-2/fre',
  'edition': 'Paris, A. Boudet , 1776. In-8°, 205 p., carte',
  'yearOfPublication': '1776',
  'placeOfPublication': 'Paris, A. Boudet'},
 {'author': 'http://data.bnf.fr/ark:/12148/cb119974653#about',
  'expression': 'http://data.bnf.fr/ark:/12148/cb32292718b#Expression',
  'title': "Voyage historique de l'Amérique méridionale fait par ordre du roi d'Espagne",
  'langCode': 'http://id.loc.gov/vocabulary/iso639-2/fre',
  'edition': 'Paris : C.-A. Jombert , 1752',
  'yearOfPublication': '1752',
  'placeOfPublication': 'Paris'},
 {'author': 'http://data.bnf.fr/ark:/12148/cb119974653#about',
  'exp

To make missing values easier to spot, we can have `.get()` return descriptive defaults for the optional fields (e.g. `"Unknown Title"`) instead of empty strings. As before, this prevents errors and ensures that the data processing step works even when some fields are not available in the query results. Note that in this version the expression URI is stored under the key `work`.

In [48]:
# Initialize an empty list to store the processed data
data = []

# Iterate through the results from the SPARQL query
for r in ret["results"]["bindings"]:
    # Use .get() to provide default values for optional fields
    author = r.get("author", {}).get("value", "")
    work = r.get("expression", {}).get("value", "Unknown Work")
    title = r.get("title", {}).get("value", "Unknown Title")
    placeOfPublication = r.get("placeOfPublication", {}).get("value", "Unknown Place")
    yearOfPublication = r.get("yearOfPublication", {}).get("value", "Unknown Year")
    langCode = r.get("langCode", {}).get("value", "Unknown Language")
    edition = r.get("edition", {}).get("value", "Unknown Edition")

    # Append a dictionary containing the extracted data to the list
    data.append(
        {
            "author": author,
            "work": work,
            "title": title,
            "placeOfPublication": placeOfPublication,
            "yearOfPublication": yearOfPublication,
            "langCode": langCode,
            "edition": edition,
        }
    )

# Display the first 15 items of the list to check the results
data[:15]

[{'author': 'http://data.bnf.fr/ark:/12148/cb119974653#about',
  'work': 'http://data.bnf.fr/ark:/12148/cb32292716n#Expression',
  'title': "Dissertation historique et géographique sur le méridien de démarcation entre les domaines d'Espagne et de Portugal, par don Georges Juan,... et don Antoine de Ulloa,... Traduit de l'espagnol",
  'placeOfPublication': 'Paris, A. Boudet',
  'yearOfPublication': '1776',
  'langCode': 'http://id.loc.gov/vocabulary/iso639-2/fre',
  'edition': 'Paris, A. Boudet , 1776. In-8°, 205 p., carte'},
 {'author': 'http://data.bnf.fr/ark:/12148/cb119974653#about',
  'work': 'http://data.bnf.fr/ark:/12148/cb32292718b#Expression',
  'title': "Voyage historique de l'Amérique méridionale fait par ordre du roi d'Espagne",
  'placeOfPublication': 'Paris',
  'yearOfPublication': '1752',
  'langCode': 'http://id.loc.gov/vocabulary/iso639-2/fre',
  'edition': 'Paris : C.-A. Jombert , 1752'},
 {'author': 'http://data.bnf.fr/ark:/12148/cb119974653#about',
  'work': 'http://
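
The repeated `.get()` chains above can also be folded into a single loop over a mapping of defaults. This is a sketch rather than the notebook's own method: `DEFAULTS` and `flatten_binding` are hypothetical names, and unlike the cell above it keeps the query variable name `expression` instead of renaming it to `work`.

```python
# Default used for each query variable when it is unbound (same defaults as above)
DEFAULTS = {
    "author": "",
    "expression": "Unknown Work",
    "title": "Unknown Title",
    "placeOfPublication": "Unknown Place",
    "yearOfPublication": "Unknown Year",
    "langCode": "Unknown Language",
    "edition": "Unknown Edition",
}


def flatten_binding(binding, defaults=DEFAULTS):
    """Return a flat {variable: value} dict, filling unbound variables with defaults."""
    return {
        name: binding.get(name, {}).get("value", default)
        for name, default in defaults.items()
    }


# Hand-written sample binding with only the title bound
sample = {"title": {"type": "literal", "value": "Noticias secretas de América"}}
row = flatten_binding(sample)
print(row)
```

With the real results, the list could then be built as `data = [flatten_binding(r) for r in ret["results"]["bindings"]]`.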

Now let's convert our list of dictionaries into a pandas DataFrame. We use the `pd.DataFrame()` constructor: each dictionary in the list becomes a row in the DataFrame, and the dictionary keys become the column names.

In [53]:
# Load required libraries
import pandas as pd

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data)

# Preview the first 10 rows
df.head(10)

| | author | work | title | placeOfPublication | yearOfPublication | langCode | edition |
|---|---|---|---|---|---|---|---|
| 0 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb32292716n#Expr... | Dissertation historique et géographique sur le... | Paris, A. Boudet | 1776 | http://id.loc.gov/vocabulary/iso639-2/fre | Paris, A. Boudet , 1776. In-8°, 205 p., carte |
| 1 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb32292718b#Expr... | Voyage historique de l'Amérique méridionale fa... | Paris | 1752 | http://id.loc.gov/vocabulary/iso639-2/fre | Paris : C.-A. Jombert , 1752 |
| 2 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb31511027v#Expr... | Relacion historica del viage a la America meri... | Unknown Place | 1748 | http://id.loc.gov/vocabulary/iso639-2/spa | , 1748 |
| 3 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb47433920k#Expr... | [Illustrations de Voyage historique de l'Améri... | [S.l.] | 1752 | http://id.loc.gov/vocabulary/iso639-2/fre | [S.l.] : [s.n.] , [1752 ?] |
| 4 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb31511029j#Expr... | Voyage historique de l'Amérique méridionale fa... | Amsterdam. - Leipzig | 1752 | http://id.loc.gov/vocabulary/iso639-2/fre | Amsterdam ; Leipzig : Arkstée et Merkus , 1752 |
| 5 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb31511017j#Expr... | Dissertation historique et géographique sur le... | Paris | 1776 | http://id.loc.gov/vocabulary/iso639-2/fre | Paris : A. Boudet , 1776 |
| 6 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb315110255#Expr... | Observaciones astronómicas y phísicas, hechas ... | Madrid | 1748 | http://id.loc.gov/vocabulary/iso639-2/spa | Madrid : J. de Zúñiga , 1748 |
| 7 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb31511024t#Expr... | Jorge Juan y Antonio de Ulloa. Noticias secret... | Madrid | 1918 | http://id.loc.gov/vocabulary/iso639-2/spa | Madrid : Editorial-América , 1918 |
| 8 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb31511015v#Expr... | Description de l'Amérique méridionale d'après ... | Tours | 1845 | http://id.loc.gov/vocabulary/iso639-2/fre | Tours : A. Mame , 1845 |
| 9 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb36663304d#Expr... | Noticias secretas de América | Unknown Place | 1991 | http://id.loc.gov/vocabulary/iso639-2/spa | : Historia 16 , 1991 |
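
The `langCode` column holds full Library of Congress URIs rather than bare codes. If shorter values are more convenient, the last path segment can be extracted, as in the sketch below. The helper name `iso_code_from_uri` is hypothetical, and values that are not URIs (such as the `"Unknown Language"` default) are returned unchanged.

```python
def iso_code_from_uri(value: str) -> str:
    """Return the last path segment of a language URI.

    Values that do not look like a URI are returned unchanged.
    """
    if value.startswith(("http://", "https://")):
        return value.rstrip("/").rsplit("/", 1)[-1]
    return value


print(iso_code_from_uri("http://id.loc.gov/vocabulary/iso639-2/fre"))  # fre
print(iso_code_from_uri("Unknown Language"))  # Unknown Language
```

On the DataFrame, this could be applied with `df["langCode"].map(iso_code_from_uri)`.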


In [54]:
# Some basic statistics about the DataFrame
df.describe()

| | author | work | title | placeOfPublication | yearOfPublication | langCode | edition |
|---|---|---|---|---|---|---|---|
| count | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
| unique | 1 | 32 | 29 | 14 | 18 | 5 | 27 |
| top | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb32292716n#Expr... | Dissertation historique et géographique sur le... | Madrid | 1744 | http://id.loc.gov/vocabulary/iso639-2/spa | [S.l.] : [s.n.] , [1744] |
| freq | 32 | 1 | 2 | 9 | 6 | 18 | 4 |


In [56]:
# Checking extant data types
df.dtypes

author                object
work                  object
title                 object
placeOfPublication    object
yearOfPublication     object
langCode              object
edition               object
dtype: object
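
Because `yearOfPublication` is stored as `object` (strings), sorting on it is lexicographic and a default such as `"Unknown Year"` is sorted like any other string. One possible refinement, sketched below on a small stand-in DataFrame with illustrative rows, is to derive a numeric column first. The column name `yearNumeric` is an assumption made for this example.

```python
import pandas as pd

# Small stand-in for the notebook's DataFrame; the rows are illustrative
sample_df = pd.DataFrame(
    {
        "title": ["Dissertation historique", "Plan sénographique", "Untitled record"],
        "yearOfPublication": ["1776", "1744", "Unknown Year"],
    }
)

# Non-numeric values such as "Unknown Year" become missing values
sample_df["yearNumeric"] = pd.to_numeric(
    sample_df["yearOfPublication"], errors="coerce"
).astype("Int64")

# Sort chronologically, keeping rows without a usable year at the end
sorted_sample = sample_df.sort_values(by="yearNumeric", na_position="last")
print(sorted_sample)
```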

In [57]:
# Sort the DataFrame by yearOfPublication
sorted_df = df.sort_values(by="yearOfPublication")

# Preview the first 10 rows
sorted_df.head(10)

| | author | work | title | placeOfPublication | yearOfPublication | langCode | edition |
|---|---|---|---|---|---|---|---|
| 27 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb40758685h#Expr... | Plan sénographique (sic) de la cité des rois o... | [S.l.] | 1744 | http://id.loc.gov/vocabulary/iso639-2/fre | [S.l.] : [s.n.] , [1744] |
| 24 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb40714961j#Expr... | Carta de la meridiana medida en el reyno de Qu... | Matriti | 1744 | http://id.loc.gov/vocabulary/iso639-2/spa | Matriti : [s.n.] , [1744] |
| 25 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb40756914b#Expr... | Grundrifs von der Conceptions-bay... in Jahre ... | [S.l.] | 1744 | http://id.loc.gov/vocabulary/iso639-2/ger | [S.l.] : [s.n.] , [1744] |
| 26 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb40733276t#Expr... | Grundriss von dem Meerbusen und Hafen Valparay... | [S.l.] | 1744 | http://id.loc.gov/vocabulary/iso639-2/ger | [S.l.] : [s.n.] , [1744] |
| 28 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb407586845#Expr... | Plan sénographique (sic) de la cité des rois o... | [S.l.] | 1744 | http://id.loc.gov/vocabulary/iso639-2/fre | [S.l.] : [s.n.] , [1744] |
| 29 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb40755700v#Expr... | El puerto de El Callao, en el Mar Pacyfyco, od... | [S.l.] | 1744 | http://id.loc.gov/vocabulary/iso639-2/spa | [S.l.] : [s.n.] , 1744 |
| 31 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb459969115#Expr... | Relacion historica del viage a la América meri... | Madrid | 1748 | http://id.loc.gov/vocabulary/iso639-2/spa | Madrid : [s.n.] , 1748 |
| 2 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb31511027v#Expr... | Relacion historica del viage a la America meri... | Unknown Place | 1748 | http://id.loc.gov/vocabulary/iso639-2/spa | , 1748 |
| 6 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb315110255#Expr... | Observaciones astronómicas y phísicas, hechas ... | Madrid | 1748 | http://id.loc.gov/vocabulary/iso639-2/spa | Madrid : J. de Zúñiga , 1748 |
| 18 | http://data.bnf.fr/ark:/12148/cb119974653#about | http://data.bnf.fr/ark:/12148/cb30665853n#Expr... | Dissertación histórica y geográphica sobre el ... | En Madrid | 1749 | http://id.loc.gov/vocabulary/iso639-2/spa | En Madrid : en la impr. de A. Marín , 1749 |
