# SPARQL

The __Semantic Web__ is an extension of the World Wide Web that aims to make information on the web more meaningful and interconnected. It's based on the idea that web content should not only be readable by humans but also interpretable by machines. The goal is to enable machines to understand the context, meaning, and relationships between different pieces of information. Key components of the Semantic Web include:

1. __Resource Description Framework (RDF)__ is a standard model for representing and interchanging data on the Web. RDF's basic unit of information is the __triple__, composed of a subject (s), a predicate (p), and an object (o). In each triple:
    - the __subject__ represents the thing being described,
    - the __predicate__ represents a specific property of that thing, and
    - the __object__ represents the value of that property (Curé and Blin 2014).

    
2. __Web Ontology Language (OWL)__: A language for creating ontologies, which define relationships and categories of things in a specific domain. OWL allows for the formal description of concepts and their relationships.

3. __SPARQL__ is a recursive acronym for SPARQL Protocol and RDF Query Language. It retrieves and manipulates data stored in RDF format, and its queries are built from triple patterns. A __SPARQL Endpoint__ is a web service that accepts SPARQL queries over RDF data and returns the results.
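
As a rough illustration of the triple model, and of what a triple pattern does, the sketch below stores a few triples as plain Python tuples and matches a pattern against them. The data and the `match` helper are invented for this example; a real application would rely on an RDF library and a SPARQL engine.

```python
# Toy triples (subject, predicate, object); the values are made up for illustration.
triples = [
    ("ex:hoard1", "rdf:type", "nmo:Hoard"),
    ("ex:hoard1", "rdfs:label", "Example hoard"),
    ("ex:coin1", "rdf:type", "nmo:NumismaticObject"),
]

def match(pattern, data):
    """Return the triples that fit a pattern; None plays the role of a variable."""
    return [
        triple for triple in data
        if all(p is None or p == t for p, t in zip(pattern, triple))
    ]

# Roughly what the pattern `?s a nmo:Hoard` asks for:
# every subject whose type is nmo:Hoard.
hoards = [s for s, _, _ in match((None, "rdf:type", "nmo:Hoard"), triples)]
print(hoards)
```

A SPARQL query combines several such patterns and returns the variable bindings that satisfy all of them at once.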


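__Linked Open Data (LOD)__ applies Semantic Web principles in practice: datasets are published under open licences, identify things with URIs, describe them in RDF, and link to related things in other datasets.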

## Useful Sites

#### General
- [Introduction to SPARQL, RDF, and LOD](https://github.com/o-date/sparql-and-lod/blob/master/sparql-intro.ipynb), a Jupyter Notebook by Shawn Graham
- [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/)
- [SPARQL Query Validator](https://sparql.org/query-validator.html)

#### Nomisma.org
- [Nomisma.org SPARQL Endpoint](http://nomisma.org/sparql/)
- [SPARQL Examples from Nomisma.org](http://nomisma.org/documentation/sparql/)

#### Wikidata
- [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page)
- [Wikidata's SPARQL Tutorial](https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial)
- [Wikidata Query Service](https://query.wikidata.org/)
- [Tutorial on how to build SPARQL queries on Wikidata](https://www.youtube.com/watch?v=YC6jyl4hAxQ)
- [Petscan](https://petscan.wmflabs.org/)

#### Imports

In [8]:
#pip3 install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 3)

__SPARQLWrapper__ is a simple Python wrapper around a SPARQL service for executing queries remotely. It helps build the query invocation and, optionally, converts the result into a more manageable format. [Documentation](https://rdflib.dev/sparqlwrapper/doc/1.8.5/main.html)
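
The usual SPARQLWrapper workflow (set the query, ask for JSON, convert the response) can be wrapped in a single helper. This is only a sketch: `run_select` and `bindings_to_rows` are names invented here and are not part of SPARQLWrapper.

```python
def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]

def run_select(endpoint_url, query):
    """Send a SELECT query to an endpoint and return simplified rows."""
    # Imported here so bindings_to_rows can be used without the package installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return bindings_to_rows(sparql.query().convert())

# Offline check with a hand-written response in the standard JSON results shape.
sample = {
    "head": {"vars": ["identifier"]},
    "results": {"bindings": [
        {"identifier": {"type": "uri", "value": "http://coinhoards.org/id/igch1670"}},
    ]},
}
print(bindings_to_rows(sample))
```

The list of dictionaries returned by `run_select` can be passed directly to `pd.DataFrame` if a table is preferred.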

#### Make Clickable
The function `make_clickable` takes a string `link` as an argument and returns a formatted string containing an HTML anchor (`<a>`) element. 

In [None]:
def make_clickable(link):
    return f'<a href="{link}" target="_blank">{link}</a>'
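
A quick check of what the function returns. The link is interpolated as-is, so the variant `make_clickable_safe` is an addition of this sketch (not part of the notebook) that escapes the URL with the standard-library `html.escape`, in case a value contains quotes or angle brackets.

```python
from html import escape

def make_clickable(link):
    return f'<a href="{link}" target="_blank">{link}</a>'

def make_clickable_safe(link):
    """Same anchor element, but with HTML special characters escaped."""
    safe = escape(link, quote=True)
    return f'<a href="{safe}" target="_blank">{safe}</a>'

print(make_clickable("http://www.wikidata.org/entity/Q146"))
print(make_clickable_safe('http://example.org/?q="x"'))
```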

---

## Nomisma.org

#### Prefixes and a simple query

In [9]:
ns_dict = {"crm": "http://www.cidoc-crm.org/cidoc-crm/",
            "dcterms": "http://purl.org/dc/terms/",
            "dcmitype": "http://purl.org/dc/dcmitype/",
            "foaf": "http://xmlns.com/foaf/0.1/",
            "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
            "nm": "http://nomisma.org/id/",
            "nmo": "http://nomisma.org/ontology#",
            "org": "http://www.w3.org/ns/org#",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "skos": "http://www.w3.org/2004/02/skos/core#",
            "xsd": "http://www.w3.org/2001/XMLSchema#"}

Short names such as `xsd` in the dictionary above are called __prefixes__ in SPARQL; each one stands for the namespace URI it is mapped to.
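
One way to use such a dictionary is to generate the `PREFIX` declarations that go at the top of a query, so that `nmo:Hoard` can replace the full URI. This is a minimal sketch: `prefix_header` is a helper name invented here, and only two of the namespaces are repeated to keep the block self-contained.

```python
ns_dict = {
    "nmo": "http://nomisma.org/ontology#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def prefix_header(namespaces):
    """Build one 'PREFIX name: <uri>' line per namespace."""
    return "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in namespaces.items())

# The declarations come first, then the query can use the short names.
query = prefix_header(ns_dict) + """
SELECT ?identifier
WHERE {
  ?identifier a nmo:Hoard .
}
LIMIT 10
"""
print(query)
```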

In [10]:
sparql = SPARQLWrapper("http://nomisma.org/query")

In [13]:
sparql.setQuery("""
SELECT ?identifier

WHERE {
?identifier a  <http://nomisma.org/ontology#Hoard>

}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

In [14]:
results

{'head': {'vars': ['identifier']},
 'results': {'bindings': [{'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch1670'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch1664'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch1794'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch1795'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch1472'}},
   {'identifier': {'type': 'uri',
     'value': 'http://nomisma.org/id/lliria_hoard'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch0271'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch1667'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch0083'}},
   {'identifier': {'type': 'uri',
     'value': 'http://coinhoards.org/id/igch1395'}}]}}

In [16]:
result_df = pd.json_normalize(results['results']['bindings'])
result_df[['identifier.value']]  # double brackets return a DataFrame; single brackets would return a Series

Unnamed: 0,identifier.value
0,http://coinhoards.org/id/igch1670
1,http://coinhoards.org/id/igch1664
2,http://coinhoards.org/id/igch1794
3,http://coinhoards.org/id/igch1795
4,http://coinhoards.org/id/igch1472
5,http://nomisma.org/id/lliria_hoard
6,http://coinhoards.org/id/igch0271
7,http://coinhoards.org/id/igch1667
8,http://coinhoards.org/id/igch0083
9,http://coinhoards.org/id/igch1395


---

## Wikidata


#### [Wikidata identifiers](https://www.wikidata.org/wiki/Wikidata:Identifiers)
Each Wikidata entity is identified by an entity ID, which is a number prefixed by a letter.

- items, also known as Q-items, are prefixed with Q (e.g. Q12345),
- properties are prefixed by P (e.g. P569) and
- lexemes are prefixed by L (e.g. L1).

Entity IDs can also be used as globally unique URIs that follow the pattern http://www.wikidata.org/entity/ID where ID is an entity ID.

In SPARQL, prefixes are a way to define shorthand abbreviations for URIs in order to make queries more concise and readable. They are used to create a namespace for the URIs, allowing you to refer to them with a shorter alias.
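
The Wikidata Query Service predefines prefixes such as `wd:` and `wdt:`, which is why the queries in this notebook can omit `PREFIX` lines. The sketch below expands a prefixed name by hand to show what the shorthand stands for; `expand` is an illustrative helper, not a library function.

```python
# The two namespaces used most often in this notebook.
WIKIDATA_PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",        # items, e.g. wd:Q146
    "wdt": "http://www.wikidata.org/prop/direct/",  # direct property claims, e.g. wdt:P31
}

def expand(prefixed_name, prefixes=WIKIDATA_PREFIXES):
    """Turn a prefixed name such as 'wd:Q146' into its full URI."""
    prefix, local_name = prefixed_name.split(":", 1)
    return prefixes[prefix] + local_name

print(expand("wd:Q146"))
print(expand("wdt:P31"))
```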

In [20]:
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

In [22]:
# From https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Cats

sparql.setQuery("""
SELECT ?item ?itemLabel 

WHERE{

?item wdt:P31 wd:Q146 .

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
result_df = pd.json_normalize(results['results']['bindings'])
result_df

Unnamed: 0,item.type,item.value,itemLabel.xml:lang,itemLabel.type,itemLabel.value
0,uri,http://www.wikidata.org/entity/Q27745008,en,literal,Luca
1,uri,http://www.wikidata.org/entity/Q27745009,en,literal,Seri
2,uri,http://www.wikidata.org/entity/Q27745011,en,literal,Marble
3,uri,http://www.wikidata.org/entity/Q28114532,en,literal,Nala Cat
4,uri,http://www.wikidata.org/entity/Q28665865,en,literal,Myka
5,uri,http://www.wikidata.org/entity/Q28792126,en,literal,Gli
6,uri,http://www.wikidata.org/entity/Q30600575,en,literal,Orlando
7,uri,http://www.wikidata.org/entity/Q42442324,en,literal,Kiisu Miisu
8,uri,http://www.wikidata.org/entity/Q43260736,en,literal,Paddles
9,uri,http://www.wikidata.org/entity/Q48895080,en,literal,Hamilton


In [None]:
#horses below

In [27]:
# Adapted from https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Cats
# with horse (wd:Q726) in place of house cat (wd:Q146)

sparql.setQuery("""
SELECT ?item ?itemLabel 

WHERE{

?item wdt:P31 wd:Q726 .

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
result_df = pd.json_normalize(results['results']['bindings'])
result_df

Unnamed: 0,item.type,item.value,itemLabel.xml:lang,itemLabel.type,itemLabel.value
0,uri,http://www.wikidata.org/entity/Q26798,en,literal,Nijinsky
1,uri,http://www.wikidata.org/entity/Q48857,en,literal,Skowronek
2,uri,http://www.wikidata.org/entity/Q114218,en,literal,Man o' War
3,uri,http://www.wikidata.org/entity/Q166541,en,literal,Florizel
4,uri,http://www.wikidata.org/entity/Q176271,en,literal,Marengo
5,uri,http://www.wikidata.org/entity/Q180387,en,literal,Sunday Silence
6,uri,http://www.wikidata.org/entity/Q201598,en,literal,Bucephalus
7,uri,http://www.wikidata.org/entity/Q210956,en,literal,Hatsushiba O
8,uri,http://www.wikidata.org/entity/Q218422,en,literal,Allabaculia
9,uri,http://www.wikidata.org/entity/Q239442,en,literal,Huaso


---

#### Dogs

In [30]:
# Adapted from https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Cats
# with dog (wd:Q144) in place of house cat (wd:Q146)

sparql.setQuery("""
SELECT ?item ?itemLabel 

WHERE{

?item wdt:P31 wd:Q144 .

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
result_df = pd.json_normalize(results['results']['bindings'])
result_df['item.value'] = result_df['item.value'].apply(make_clickable)
result_df

Unnamed: 0,item.type,item.value,itemLabel.xml:lang,itemLabel.type,itemLabel.value
0,uri,"<a href=""http://www.wikidata.org/entity/Q42235...",en,literal,Abuwtiyuw
1,uri,"<a href=""http://www.wikidata.org/entity/Q15569...",en,literal,Blondi
2,uri,"<a href=""http://www.wikidata.org/entity/Q18648...",en,literal,Hachikō
3,uri,"<a href=""http://www.wikidata.org/entity/Q28057...",en,literal,Pickles
4,uri,"<a href=""http://www.wikidata.org/entity/Q38434...",en,literal,Mancs
5,uri,"<a href=""http://www.wikidata.org/entity/Q49239...",en,literal,Moose
6,uri,"<a href=""http://www.wikidata.org/entity/Q49526...",en,literal,Snuppy
7,uri,"<a href=""http://www.wikidata.org/entity/Q64156...",en,literal,Red Dog
8,uri,"<a href=""http://www.wikidata.org/entity/Q68772...",en,literal,Sergeant Stubby
9,uri,"<a href=""http://www.wikidata.org/entity/Q69709...",en,literal,Strongheart


---

#### American politicians who were born in Chicago

In [44]:
sparql.setQuery("""
SELECT ?politician ?politicianLabel

WHERE
{
    ?politician wdt:P31 wd:Q5 .  # instance of human
    ?politician wdt:P27 wd:Q30 . # citizenship USA
    ?politician wdt:P19 wd:Q1297 . # born in Chicago
    ?politician wdt:P106 wd:Q82955 . # occupation is politician
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10

""")

sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results_df = pd.json_normalize(results['results']['bindings'])
results_df = results_df[['politician.value','politicianLabel.value']]
# For clickable links, end the cell with this line instead of the bare `results_df`:
# results_df.style.format({'politician.value': make_clickable})
results_df


Unnamed: 0,politician.value,politicianLabel.value
0,http://www.wikidata.org/entity/Q365323,Adolph Dubs
1,http://www.wikidata.org/entity/Q376645,John Seymour
2,http://www.wikidata.org/entity/Q380306,Dan Boyle
3,http://www.wikidata.org/entity/Q436272,Medill McCormick
4,http://www.wikidata.org/entity/Q437599,Todd Stern
5,http://www.wikidata.org/entity/Q440445,Elmer L. Andersen
6,http://www.wikidata.org/entity/Q440885,Jan Schakowsky
7,http://www.wikidata.org/entity/Q311141,John Ashcroft
8,http://www.wikidata.org/entity/Q321457,Jim McDermott
9,http://www.wikidata.org/entity/Q325960,Phil Crane


In [None]:
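# In the pattern `?politician wdt:P31 wd:Q5 .`:
#   wdt: is the prefix for properties (direct claims); wd: is the prefix for items
#   the final `.` ends the triple pattern, much as a full stop ends a sentence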
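
Since every pattern in the Chicago query above shares the subject `?politician`, the same query can be written with `;`, which repeats the subject, and a final `.`, which closes the group. The sketch only builds the query string; the name `QUERY_CHICAGO` is introduced here.

```python
QUERY_CHICAGO = """
SELECT ?politician ?politicianLabel
WHERE {
    ?politician wdt:P31 wd:Q5 ;       # instance of human
                wdt:P27 wd:Q30 ;      # citizenship USA
                wdt:P19 wd:Q1297 ;    # born in Chicago
                wdt:P106 wd:Q82955 .  # occupation is politician
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
"""

# Pass it to the endpoint exactly like the earlier queries:
# sparql.setQuery(QUERY_CHICAGO)
print(QUERY_CHICAGO)
```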

---

#### American politicians who were not born in Chicago

In [None]:
sparql.setQuery("""

""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results_df = pd.json_normalize(results['results']['bindings'])
results_df = results_df[['item.value', 'itemLabel.value']]
results_df['item.value'] = results_df['item.value'].apply(make_clickable)
results_df.style.format()
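
One possible query for this exercise, offered as a sketch rather than the intended solution. It binds the place of birth to a variable and filters out Chicago, so it only returns people who have a recorded birthplace. The variables `?item` and `?itemLabel` are chosen to match the column names used in the cell above; `QUERY_NOT_CHICAGO` is a name introduced here.

```python
QUERY_NOT_CHICAGO = """
SELECT ?item ?itemLabel
WHERE {
    ?item wdt:P31 wd:Q5 ;            # instance of human
          wdt:P27 wd:Q30 ;           # citizenship USA
          wdt:P106 wd:Q82955 ;       # occupation is politician
          wdt:P19 ?birthplace .      # has some recorded place of birth
    FILTER(?birthplace != wd:Q1297)  # ... which is not Chicago
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
"""

# Usage with the cell above: sparql.setQuery(QUERY_NOT_CHICAGO)
print(QUERY_NOT_CHICAGO)
```

An alternative is `FILTER NOT EXISTS { ?item wdt:P19 wd:Q1297 . }`, which would also keep politicians with no birthplace recorded at all.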

---

#### American politicians who were born in Chicago after 1950 and are still alive

In [None]:
sparql.setQuery("""


""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results_df = pd.json_normalize(results['results']['bindings'])
results_df = results_df[['item.value', 'itemLabel.value']]
results_df['item.value'] = results_df['item.value'].apply(make_clickable)
results_df.style.format()
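
A possible query for this exercise, again as a hedged sketch. It assumes that `wdt:P569` is the date of birth (as in the identifier examples earlier) and that `wdt:P570` is the date of death; the second identifier does not appear elsewhere in this notebook and is worth confirming with the entity search further down. "Still alive" is approximated as "no date of death recorded".

```python
QUERY_CHICAGO_LIVING = """
SELECT ?item ?itemLabel ?birthdate
WHERE {
    ?item wdt:P31 wd:Q5 ;         # instance of human
          wdt:P27 wd:Q30 ;        # citizenship USA
          wdt:P19 wd:Q1297 ;      # born in Chicago
          wdt:P106 wd:Q82955 ;    # occupation is politician
          wdt:P569 ?birthdate .   # date of birth
    FILTER(YEAR(?birthdate) > 1950)
    # Assumption: P570 is "date of death"; keep only people without one.
    FILTER NOT EXISTS { ?item wdt:P570 ?deathdate . }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
"""

# Usage with the cell above: sparql.setQuery(QUERY_CHICAGO_LIVING)
print(QUERY_CHICAGO_LIVING)
```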

---

#### Greek politicians born after 1900 who are still alive, along with their place of birth

In [None]:
sparql.setQuery("""


""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results_df = pd.json_normalize(results['results']['bindings'])
results_df = results_df.loc[:, ~results_df.columns.str.contains('.type', regex=False)]  # drops the '<variable>.type' columns
results_df = results_df.loc[:, ~results_df.columns.str.contains('xml:lang', regex=False)]  # drops the '<variable>.xml:lang' columns
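
A possible query for this exercise, based on the query linked under "Advanced queries" in this notebook, which uses `wd:Q41` for Greece, `wdt:P19` for place of birth and `wdt:P625` for coordinates. The `FILTER NOT EXISTS` clause for living people is an addition of this sketch and assumes that `wdt:P570` is the date of death. `QUERY_GREEK_LIVING` is a name introduced here.

```python
QUERY_GREEK_LIVING = """
SELECT ?item ?itemLabel ?birthplace ?birthplaceLabel ?geo
WHERE {
    ?item wdt:P31 wd:Q5 ;          # instance of human
          wdt:P27 wd:Q41 ;         # citizenship Greece
          wdt:P106 wd:Q82955 ;     # occupation is politician
          wdt:P19 ?birthplace ;    # place of birth
          wdt:P569 ?birthdate .    # date of birth
    ?birthplace wdt:P625 ?geo .    # coordinates of the birthplace
    FILTER(YEAR(?birthdate) > 1900)
    # Assumption: P570 is "date of death"; keep only people without one.
    FILTER NOT EXISTS { ?item wdt:P570 ?deathdate . }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10
"""

# Usage with the cell above: sparql.setQuery(QUERY_GREEK_LIVING)
print(QUERY_GREEK_LIVING)
```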


---

#### Greek politicians whose father was a politician

In [None]:
sparql.setQuery("""


""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results_df = pd.json_normalize(results['results']['bindings'])
results_df = results_df.loc[:, ~results_df.columns.str.contains('.type', regex=False)]  # drops the '<variable>.type' columns
results_df = results_df.loc[:, ~results_df.columns.str.contains('xml:lang', regex=False)]  # drops the '<variable>.xml:lang' columns
results_df[['item.value', 'itemLabel.value', 'father.value', 'fatherLabel.value']]
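
A possible query for this exercise, as a sketch. It assumes that `wdt:P22` is the Wikidata property for "father", an identifier that does not appear elsewhere in this notebook and is worth confirming with the entity search further down. The selected variables match the four columns picked in the cell above; `QUERY_FATHER_POLITICIAN` is a name introduced here.

```python
QUERY_FATHER_POLITICIAN = """
SELECT ?item ?itemLabel ?father ?fatherLabel
WHERE {
    ?item wdt:P31 wd:Q5 ;           # instance of human
          wdt:P27 wd:Q41 ;          # citizenship Greece
          wdt:P106 wd:Q82955 ;      # occupation is politician
          wdt:P22 ?father .         # assumption: P22 is "father"
    ?father wdt:P106 wd:Q82955 .    # the father is also a politician
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
"""

# Usage with the cell above: sparql.setQuery(QUERY_FATHER_POLITICIAN)
print(QUERY_FATHER_POLITICIAN)
```

The second pattern shows how one variable (`?father`) can be the object of one triple and the subject of another, which is how SPARQL joins facts about different entities.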

---

Advanced queries:
- [Presidents and their causes of death](https://query.wikidata.org/#%23Presidents%20and%20their%20causes%20of%20death%20ranking%0A%23defaultView%3ABubbleChart%0ASELECT%20%3Fcid%20%3Fcause%20%28count%28%2a%29%20as%20%3Fcount%29%0AWHERE%0A%7B%0A%20%20%3Fpid%20wdt%3AP39%20wd%3AQ11696%20.%0A%20%20%3Fpid%20wdt%3AP509%20%3Fcid%20.%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Fcid%20rdfs%3Alabel%20%3Fcause%20FILTER%20%28lang%28%3Fcause%29%20%3D%20%22en%22%29%20.%0A%20%20%7D%0A%7D%0AGROUP%20BY%20%3Fcid%20%3Fcause%0AORDER%20BY%20DESC%28%3Fcount%29%20ASC%28%3Fcause%29)
- [Greek politicians born after 1900 who are still alive, along with their place of birth](https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fbirthplace%20%3FbirthplaceLabel%20%3Fgeo%0AWHERE%0A%7B%0A%20%20%20%20%3Fitem%20wdt%3AP31%20wd%3AQ5%20.%20%23%20instance%20of%20human%0A%20%20%20%20%3Fitem%20wdt%3AP27%20wd%3AQ41%20.%20%23%20citizenship%20Greece%0A%20%20%20%20%3Fitem%20wdt%3AP106%20wd%3AQ82955%20.%20%23%20occupation%20politician%0A%20%20%20%20%0A%20%20%20%20%3Fitem%20wdt%3AP19%20%3Fbirthplace%20.%0A%20%20%20%20%3Fbirthplace%20wdt%3AP625%20%3Fgeo%20.%0A%20%20%20%20%20%20%20%20%0A%20%20%20%20%0A%20%20%20%20%3Fitem%20wdt%3AP569%20%3Fbirthdate%20.%20%20%23%20date%20of%20birth%0A%20%20%20%20%0A%20%20%20%20FILTER%28YEAR%28%3Fbirthdate%29%20%3E%201900%29%0A%20%20%0A%20%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D%0A)

#### Athenian writers of fifth-century BCE

In [None]:
sparql.setQuery("""


""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results_df = pd.json_normalize(results['results']['bindings'])
results_df = results_df.loc[:, ~results_df.columns.str.contains('.type', regex=False)]  # drops the '<variable>.type' columns
results_df = results_df.loc[:, ~results_df.columns.str.contains('xml:lang', regex=False)]  # drops the '<variable>.xml:lang' columns
results_df

---

#### Use Wikidata API to search for entities (items or properties)

In [33]:
import requests

def search_entities(search_string, entity_type, language='en'):
    endpoint_url = "https://www.wikidata.org/w/api.php"

    # Define the parameters for the API call
    params = {
        'action': 'wbsearchentities',
        'format': 'json',
        'search': search_string,
        'type': entity_type,
        'language': language   
    }

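    # Make the API call; fail early on HTTP errors instead of parsing an error page
    response = requests.get(endpoint_url, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()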

    # Extract and return entity IDs and labels
    entities = data.get('search', [])
    return entities

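def display_entities(entities, entity_type):
    print(f"\n{entity_type.capitalize()}:")
    for entity in entities:
        # Not every entity has a label or description, so fall back to an empty string
        description = entity.get('display', {}).get('description', {}).get('value', '')
        print(f"Entity ID: {entity['id']} - Label: {entity.get('label', '')} - Description: {description} - {entity['concepturi']}")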

if __name__ == "__main__":
    search_string = input("Enter the string to search for: ")
    
    # Search for items
    items = search_entities(search_string, 'item')
    display_entities(items, 'items')

    # Search for properties
    properties = search_entities(search_string, 'property')
    display_entities(properties, 'properties')

Enter the string to search for: politician

Items:
Entity ID: Q82955 - Label: politician - Description: person involved in politics; person who holds or seeks positions in government - http://www.wikidata.org/entity/Q82955
Entity ID: Q64581291 - Label: Politician - Description: drawing in the National Gallery of Art (NGA 7042) - http://www.wikidata.org/entity/Q64581291
Entity ID: Q104854338 - Label: Politician - Description: 1985 novel by Piers Anthony - http://www.wikidata.org/entity/Q104854338
Entity ID: Q4699959 - Label: Rohit Pal - Description: public figure - http://www.wikidata.org/entity/Q4699959
Entity ID: Q51556674 - Label: Politician - Description: song by Cream - http://www.wikidata.org/entity/Q51556674
Entity ID: Q18658290 - Label: Politician - Description: a person whose job is in politics, especially one who is a member of parliament or of the government - http://www.wikidata.org/entity/Q18658290
Entity ID: Q65022317 - Label: Politician - Description: print in the Nationa
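
To see what request `search_entities` sends without touching the network, the same parameters can be encoded into a URL with the standard library. `build_search_url` is a name made up for this sketch; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_search_url(search_string, entity_type, language="en"):
    """Return the full wbsearchentities request URL for the given search."""
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "search": search_string,
        "type": entity_type,
        "language": language,
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

print(build_search_url("politician", "property"))
```

Opening the printed URL in a browser shows the raw JSON that `display_entities` formats.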

---