# **Knowledge Graph Tools in Python**

## **Introduction**
This notebook demonstrates how to use knowledge graph tools in Python, focusing on DBpedia and using SPARQL to query it. It covers the basics of setting up the environment and interacting with a knowledge graph.

## **Table of Contents**
1. [What are Knowledge Graphs?](#what-are-knowledge-graphs)
2. [DBpedia](#dbpedia)
3. [Conclusion and Further Reading](#conclusion-and-further-reading)

---


## **What are Knowledge Graphs?**

Knowledge graphs are structured representations of information, where entities (nodes) are connected by relationships (edges). They are often used to represent large-scale information in a way that machines can easily process. Popular examples of publicly available knowledge graphs include:
- **DBpedia**: A knowledge graph extracted from Wikipedia, containing millions of structured facts about various entities.
- **Yago**: A large knowledge base that combines information from Wikipedia and WordNet.

Both DBpedia and Yago can be queried using **SPARQL**, a query language for retrieving data from RDF (Resource Description Framework) datasets. We focus on DBpedia because it has more documentation and support than Yago.
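
To make the node-and-edge idea concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object triples, with a pattern matcher that loosely mirrors what a SPARQL triple pattern does. The predicate names (`type`, `genre`, `capitalOf`) and the helper `match` are hypothetical and chosen only for illustration; they are not DBpedia's actual vocabulary.

```python
# A toy in-memory knowledge graph: each fact is a (subject, predicate, object) triple.
# The predicate names are illustrative assumptions, not DBpedia properties.
TRIPLES = [
    ("Cafe_Moscow", "type", "Film"),
    ("Cafe_Moscow", "genre", "Adventure_film"),
    ("Call_of_the_Sea", "type", "Film"),
    ("Call_of_the_Sea", "genre", "Adventure_film"),
    ("Kabul", "capitalOf", "Afghanistan"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Roughly analogous to the pattern: ?film genre Adventure_film
adventure_films = [s for (s, _, _) in match(TRIPLES, predicate="genre", obj="Adventure_film")]
print(adventure_films)
```

A real RDF store uses full URIs for subjects and predicates and supports joins across several patterns, but the wildcard-matching idea is the same.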


## **DBpedia**

In this section, we will demonstrate how to query the DBpedia knowledge graph using SPARQL. Specifically, we will retrieve adventure films, identified here as resources whose Wikipedia pages link to the "Adventure film" article.

### Prerequisites
Make sure that you have the necessary libraries installed. If you haven't installed `SPARQLWrapper` yet, you can do so using the following command:

```bash
!pip install SPARQLWrapper
```

### Connecting to the DBpedia SPARQL Endpoint

First, we need to set up a connection to the DBpedia SPARQL endpoint using the SPARQLWrapper library.


In [82]:
from SPARQLWrapper import SPARQLWrapper, JSON

# Define the DBpedia SPARQL endpoint
sparql = SPARQLWrapper("https://dbpedia.org/sparql")

### Constructing and Executing a SPARQL Query

We will construct a SPARQL query that retrieves adventure films along with each film's Wikipedia page ID, short description, and name.


In [83]:
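# Define the SPARQL query.
# It selects resources whose Wikipedia page links to the "Adventure film" article.
# All prefixes are declared explicitly, and SELECT variables are separated by
# spaces (standard SPARQL does not use commas here).
query_all_adventure_films = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?film ?number ?abstract ?name
WHERE
{
    ?film dbo:wikiPageWikiLink dbr:Adventure_film .
    ?film dbo:wikiPageID ?number .
    ?film rdfs:comment ?abstract .
    ?film dbp:name ?name .
}
"""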

# Set the query and return format
sparql.setQuery(query_all_adventure_films)
sparql.setReturnFormat(JSON)
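
Because `rdfs:comment` values exist in several languages, the query above can return the same film more than once, with one row per language. One possible refinement, sketched below, restricts abstracts to a single language and caps the number of rows. The helper name `build_film_query` and its parameters are hypothetical; `LANG()`, `FILTER`, and `LIMIT` are standard SPARQL.

```python
def build_film_query(lang="en", limit=50):
    """Build a variant of the adventure-film query limited to one abstract language.

    `lang` and `limit` are illustrative parameters, not part of the original notebook.
    """
    if not lang.isalpha():
        raise ValueError("lang must be a plain language tag such as 'en'")
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?film ?number ?abstract ?name
WHERE
{{
    ?film dbo:wikiPageWikiLink dbr:Adventure_film .
    ?film dbo:wikiPageID ?number .
    ?film rdfs:comment ?abstract .
    ?film dbp:name ?name .
    FILTER (LANG(?abstract) = "{lang}")
}}
LIMIT {int(limit)}
"""


query_en = build_film_query()
print(query_en)
```

The resulting string could be passed to `sparql.setQuery(...)` in place of `query_all_adventure_films`.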


### Processing and Displaying the Results

After executing the query, we can process and display the results in a readable format.



In [84]:
try:
    # Execute the query
    results = sparql.query().convert()

    # Process and display the results
    for result in results["results"]["bindings"]:
        film = result["film"]["value"]
        number = result["number"]["value"]
        abstract = result["abstract"]["value"]
        name = result["name"]["value"]

        print(f"Film: {film}")
        print(f"WikiPageID: {number}")
        print(f"Abstract: {abstract}")
        print(f"Name: {name}")
        print("-----")

except Exception as e:
    print("An error occurred:", e)


Film: http://dbpedia.org/resource/Cafe_Moscow
WikiPageID: 43287691
Abstract: Cafe Moscow (Hungarian:Café Moszkva) is a 1936 Hungarian adventure film directed by Steve Sekely and starring Anna Tõkés, Gyula Csortos and Ferenc Kiss. Art direction was by . It is also known by the alternative title Only One Night. The film is set during the First World War on the Eastern Front between Russia and the Austro-Hungarian Empire. The film was intended to convey an anti-war message.
Name: Cafe Moscow
-----
Film: http://dbpedia.org/resource/Call_of_the_Sea
WikiPageID: 20637087
Abstract: Call of the Sea is a 1930 British adventure film directed by Leslie S Hiscott.
Name: Call of the Sea
-----
Film: http://dbpedia.org/resource/Call_of_the_Wild_(1935_film)
WikiPageID: 18505739
Abstract: The Call of the Wild és una pel·lícula estatunidenca dirigida per William A. Wellman i estrenada l'any 1935.
Name: Call of the Wild
-----
Film: http://dbpedia.org/resource/Call_of_the_Wild_(1935_film)
WikiPageID: 18505
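
The output above lists `Call_of_the_Wild_(1935_film)` more than once because each language variant of the abstract produces its own row. As a minimal sketch, the bindings can be flattened into plain dictionaries and reduced to one row per film URI on the client side. The helpers `flatten` and `first_per_film` are hypothetical names, and the sample bindings are a hand-written subset shaped like the SPARQL JSON results format.

```python
def flatten(binding):
    """Turn one SPARQL JSON binding into a plain {variable: value} dict."""
    return {var: cell["value"] for var, cell in binding.items()}


def first_per_film(bindings):
    """Keep the first row seen for each film URI, preserving result order."""
    seen = {}
    for binding in bindings:
        row = flatten(binding)
        seen.setdefault(row["film"], row)
    return list(seen.values())


# Hand-written sample shaped like results["results"]["bindings"]
sample_bindings = [
    {"film": {"type": "uri", "value": "http://dbpedia.org/resource/Call_of_the_Sea"},
     "name": {"type": "literal", "value": "Call of the Sea"}},
    {"film": {"type": "uri", "value": "http://dbpedia.org/resource/Call_of_the_Wild_(1935_film)"},
     "name": {"type": "literal", "value": "Call of the Wild"}},
    {"film": {"type": "uri", "value": "http://dbpedia.org/resource/Call_of_the_Wild_(1935_film)"},
     "name": {"type": "literal", "value": "Call of the Wild"}},
]

rows = first_per_film(sample_bindings)
for row in rows:
    print(row["name"], "->", row["film"])
```

Keeping the first row is an arbitrary choice; filtering by language in the query itself is usually the cleaner option when a specific language is wanted.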

### Advanced Queries

Now we'll retrieve information about countries. The query combines several SPARQL features: FILTER, OPTIONAL, ORDER BY, and LIMIT. Specifically, we'll query for each country's name, capital, population, GDP, and region, keeping only countries with a population greater than 10 million.

In [92]:
query_countries = ("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    
    SELECT DISTINCT ?country ?name ?capital ?population ?gdp ?region
    WHERE
    {
        # Get the country and its name
        ?country a dbo:Country .
        ?country foaf:name ?name .
                   
        FILTER (STR(?name) != "")
        
        # Get the population (mandatory)
        ?country dbo:populationTotal ?population .

        # Get the capital (mandatory)
        ?country dbo:capital ?capital .
        FILTER (STR(?capital) != "")

        # Get the GDP (optional)
        OPTIONAL { ?country dbo:grossDomesticProduct ?gdp } .

        # Get the region or continent (optional)
        OPTIONAL { ?country dbo:region ?region } .
        
        # Filter to include countries with a population greater than 10 million
        FILTER (?population > 10000000) .
    }
    ORDER BY ?name
    LIMIT 20
""")

sparql.setQuery(query_countries)

# Set the return format to JSON
sparql.setReturnFormat(JSON)

try:
    # Execute the query and convert results to JSON
    results = sparql.query().convert()

    # Print the raw results to check the data
    print("Raw Results:", results)

    # Iterate over the results and print them
    if results["results"]["bindings"]:
        for result in results["results"]["bindings"]:
            country = result["country"]["value"]
            name = result["name"]["value"]
            
            # Population and capital are required by the query
            population = result["population"]["value"]
            capital = result.get("capital", {}).get("value", "N/A")

            # Optional fields may be missing from a binding
            gdp = result.get("gdp", {}).get("value", "N/A")
            region = result.get("region", {}).get("value", "N/A")

            print(f"Country: {name}")
            print(f"DBpedia URI: {country}")
            print(f"Population: {population}")
            print(f"Capital: {capital}")
            print(f"GDP: {gdp}")
            print(f"Region: {region}")
            print("-----")
    else:
        print("No results found")

except Exception as e:
    print("An error occurred:", e)


Raw Results: {'head': {'link': [], 'vars': ['country', 'name', 'capital', 'population', 'gdp', 'region']}, 'results': {'distinct': False, 'ordered': True, 'bindings': [{'country': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Afghanistan'}, 'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Afghanistan'}, 'capital': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Kabul'}, 'population': {'type': 'typed-literal', 'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger', 'value': '38346720'}}, {'country': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Algeria'}, 'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Algeria'}, 'capital': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Algiers'}, 'population': {'type': 'typed-literal', 'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger', 'value': '44700000'}}, {'country': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Angola'}, 'name': {'type': 'literal', 'xml:lang': 'en', 'val
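
In the raw results, every value arrives as a string, numeric literals carry a `datatype` URI, and variables from unmatched OPTIONAL patterns (here `gdp` and `region`) are simply absent from the binding. The sketch below converts one binding into Python values under those assumptions. The helpers `to_python` and `parse_country` are hypothetical, and the sample binding is copied from the first row of the output above.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
# Integer datatypes handled by this sketch; extend as needed for other XSD types.
INTEGER_TYPES = {XSD + "integer", XSD + "nonNegativeInteger", XSD + "positiveInteger"}

COUNTRY_VARS = ("country", "name", "capital", "population", "gdp", "region")


def to_python(cell):
    """Convert one SPARQL JSON cell to a Python value; None if the variable is unbound."""
    if cell is None:
        return None
    if cell.get("datatype") in INTEGER_TYPES:
        return int(cell["value"])
    return cell["value"]


def parse_country(binding):
    """Map every expected variable to a value, using None for missing OPTIONAL fields."""
    return {var: to_python(binding.get(var)) for var in COUNTRY_VARS}


# First binding from the raw results shown above
sample = {
    "country": {"type": "uri", "value": "http://dbpedia.org/resource/Afghanistan"},
    "name": {"type": "literal", "xml:lang": "en", "value": "Afghanistan"},
    "capital": {"type": "uri", "value": "http://dbpedia.org/resource/Kabul"},
    "population": {
        "type": "typed-literal",
        "datatype": "http://www.w3.org/2001/XMLSchema#nonNegativeInteger",
        "value": "38346720",
    },
}

row = parse_country(sample)
print(row)
```

Returning `None` instead of the string `"N/A"` keeps missing values distinguishable from real data when the rows are later sorted or aggregated.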

## **Conclusion and Further Reading**

In this notebook, we covered the basics of working with knowledge graphs and running SPARQL queries against DBpedia from Python. For further reading, see:

* [DBpedia Documentation](http://dev.dbpedia.org/)
* [SPARQL Documentation](https://www.w3.org/TR/rdf-sparql-query/)