In [1]:
import os

os.chdir('..')

# Extend to DBpedia

[DBpedia](https://www.dbpedia.org/) extracts structured information from Wikipedia and makes it available on the web as a public knowledge base. It was launched in 2007 and is maintained by the DBpedia Association, a non-profit organization based in Leipzig, Germany.

In this notebook, we demonstrate how to extend SRTK to DBpedia.

## 1. Know the interfaces

SRTK defines several interfaces that abstract common interactions with knowledge graphs. `KnowledgeGraphBase` is the base class for all knowledge graphs.

To extend to a new knowledge graph, you need to implement the following interfaces:

- `deduce_leaves`: deduce leaf entities from a source entity by following the path.
- `get_label`: get the label of an entity.
- `get_neighbor_relations`: get n-hop neighbor relations of a node.
- `search_one_hop_relations`: search one-hop relations between two nodes.
- `search_two_hop_relations`: search two-hop relations between two nodes.

`get_entity_label` and `get_relation_label` are optional; both default to `get_label`.

In practice, implementing each interface mostly comes down to composing the right SPARQL query.
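
As an illustration of what composing a query means, here is a minimal sketch of the query string a DBpedia `get_label` could send. The helper name `build_label_query`, the use of `rdfs:label`, and the English-language filter are assumptions for illustration, not SRTK's implementation; the function only builds a string and does not contact any endpoint.

```python
# Hypothetical helper: build the SPARQL query a DBpedia `get_label` might use.
# It only composes a string; nothing is sent to an endpoint.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def build_label_query(entity: str, lang: str = "en") -> str:
    """Return a SPARQL query that fetches the rdfs:label of a DBpedia resource."""
    # Use a full IRI in angle brackets instead of a dbr: prefixed name,
    # so that names like Paris_(band) need no escaping.
    iri = f"<{DBPEDIA_RESOURCE}{entity}>"
    return f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?label WHERE {{
            {iri} rdfs:label ?label .
            FILTER (lang(?label) = "{lang}")
        }}
        LIMIT 1
    """


print(build_label_query("Berlin"))
```

Using a full IRI is a deliberate choice here: resource names containing parentheses, such as `Paris_(band)`, are not valid SPARQL prefixed names unless the parentheses are escaped.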

In [4]:
from srtk.knowledge_graph import KnowledgeGraphBase
[prop for prop in dir(KnowledgeGraphBase) if not prop.startswith('_')]

['deduce_leaves',
 'get_entity_label',
 'get_label',
 'get_neighbor_relations',
 'get_relation_label',
 'search_one_hop_relations',
 'search_two_hop_relations']

## 2. Implement the interfaces

Because a class definition cannot be split across notebook cells, we take a shortcut for demonstration purposes: we first define a minimal class, then assign member methods to it one at a time.
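
The shortcut works because Python looks up methods on the class at call time, so a plain function assigned as a class attribute behaves as a bound method, even for instances created before the assignment. A minimal sketch with a hypothetical `ToyGraph` class:

```python
class ToyGraph:
    """Hypothetical stand-in for DBPedia: starts with only an initializer."""

    def __init__(self, name: str):
        self.name = name


def describe(self) -> str:
    # Written as a plain function; `self` is bound once it is a class attribute.
    return f"knowledge graph: {self.name}"


kg = ToyGraph("dbpedia")   # instance created before the method exists
ToyGraph.describe = describe  # attach the function as a member method
print(kg.describe())       # knowledge graph: dbpedia
```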

### 2.1 Initialization

In this step, we add the [DBpedia prefixes](https://dbpedia.org/sparql/?help=nsdecl) to the class and define a SPARQL query helper.
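
Each `PREFIX` line tells the SPARQL endpoint how to expand a short name such as `dbr:Berlin` into a full IRI. The sketch below mimics that expansion in plain Python for the two prefixes the class declares; `expand` and `compact` are hypothetical helpers for illustration only. The public endpoint already predefines these prefixes (see the linked list), which is presumably why `prepend_prefixes` defaults to `False`.

```python
# The two prefixes declared in DBPedia.PREFIXES, as a mapping.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'dbr:Berlin' into a full IRI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def compact(iri: str) -> str:
    """Inverse of expand(): shorten an IRI that falls under a known prefix."""
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri  # unknown namespace: leave the IRI unchanged


print(expand("dbr:Berlin"))  # http://dbpedia.org/resource/Berlin
print(compact("http://dbpedia.org/ontology/birthPlace"))  # dbo:birthPlace
```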

In [14]:
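from SPARQLWrapper import SPARQLWrapper, JSON

class DBPedia(KnowledgeGraphBase):
    # Prefix declarations prepended to every query when prepend_prefixes=True
    PREFIXES: str = """PREFIX dbo: <http://dbpedia.org/ontology/>
                       PREFIX dbr: <http://dbpedia.org/resource/>
                       """
    def __init__(self, endpoint, prepend_prefixes=False):
        self.sparql = SPARQLWrapper(endpoint)
        self.sparql.setReturnFormat(JSON)
        self.prepend_prefixes = prepend_prefixes
        self.name = 'dbpedia'

    def queryDBPedia(self, query):
        """Run a SPARQL query and return its result bindings ([] on failure)."""
        if self.prepend_prefixes:
            query = self.PREFIXES + query

        self.sparql.setQuery(query)
        try:
            ret = self.sparql.queryAndConvert()
            result = ret['results']['bindings']
        except Exception as exception:
            print(f'Failed executing query: {query}')
            print(f'Exception: {exception}')
            result = []
        return result

    @staticmethod
    def get_pid_from_uri(uri):
        """Keep only the last URI segment, e.g. '.../ontology/birthPlace' -> 'birthPlace'."""
        # Used by the search methods below; not provided by KnowledgeGraphBase
        return uri.split('/')[-1]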
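
`queryDBPedia` returns the `bindings` list from the SPARQL JSON results format: one dict per result row, mapping each variable name to a dict with a `type` and a `value`. The sketch below uses a hand-written response (the relation URIs are illustrative, not output captured from DBpedia) and a hypothetical `binding_values` helper to pull out the plain values.

```python
# Hand-written example in the SPARQL 1.1 JSON results format;
# the values are illustrative, not captured from a live DBpedia query.
sample_response = {
    "head": {"vars": ["r"]},
    "results": {
        "bindings": [
            {"r": {"type": "uri", "value": "http://dbpedia.org/ontology/birthPlace"}},
            {"r": {"type": "uri", "value": "http://dbpedia.org/ontology/deathPlace"}},
        ]
    },
}


def binding_values(bindings, var):
    """Return the plain string value of `var` from every result row."""
    # A row may omit a variable (e.g. an unbound OPTIONAL), so skip such rows.
    return [row[var]["value"] for row in bindings if var in row]


bindings = sample_response["results"]["bindings"]  # the part queryDBPedia returns
print(binding_values(bindings, "r"))
```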

### 2.2 `search_one_hop_relations`

In [5]:
help(KnowledgeGraphBase.search_one_hop_relations)

Help on function search_one_hop_relations in module srtk.knowledge_graph.graph_base:

search_one_hop_relations(self, src: str, dst: str) -> List[List[str]]
    Search one hop relations between src and dst.
    
    Args:
        src (str): source entity
        dst (str): destination entity
    
    Returns:
        list[list[str]]: list of paths, each path is a list of PIDs



In [15]:
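def search_one_hop_relations(self, src, dst):
    # Find DBpedia ontology relations ?r such that (src, ?r, dst) is a triple
    query = f"""
            SELECT DISTINCT ?r WHERE {{
                dbr:{src} ?r dbr:{dst}.
                FILTER regex(str(?r), "^http://dbpedia.org/ontology/")
            }}
            """
    paths = self.queryDBPedia(query)
    # Keep only PIDs in the paths
    paths = [[self.get_pid_from_uri(path['r']['value'])] for path in paths]
    return paths

# Attach the function to the class as a member method
DBPedia.search_one_hop_relations = search_one_hop_relations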
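
Because `search_one_hop_relations` only builds a query and post-processes the bindings, its logic can be checked without a network connection by stubbing the query method. The sketch below uses a hypothetical `StubDBpedia` class that mirrors the method above, with a `get_pid_from_uri` that keeps the last URI segment (an assumption about what that helper does) and a canned binding standing in for a real endpoint response.

```python
class StubDBpedia:
    """Hypothetical offline stand-in that mirrors the DBPedia method above."""

    def __init__(self, canned_bindings):
        self.canned_bindings = canned_bindings
        self.last_query = None

    def queryDBPedia(self, query):
        # Record the query instead of sending it to an endpoint.
        self.last_query = query
        return self.canned_bindings

    @staticmethod
    def get_pid_from_uri(uri):
        # Assumed behavior: keep only the last URI segment.
        return uri.rsplit("/", 1)[-1]

    def search_one_hop_relations(self, src, dst):
        query = f"""
            SELECT DISTINCT ?r WHERE {{
                dbr:{src} ?r dbr:{dst}.
                FILTER regex(str(?r), "^http://dbpedia.org/ontology/")
            }}
            """
        paths = self.queryDBPedia(query)
        return [[self.get_pid_from_uri(path["r"]["value"])] for path in paths]


kg = StubDBpedia([{"r": {"type": "uri", "value": "http://dbpedia.org/ontology/birthPlace"}}])
print(kg.search_one_hop_relations("Albert_Einstein", "Ulm"))  # [['birthPlace']]
```

Each returned path is a one-element list, matching the `List[List[str]]` return type documented in the `help` output.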

### 2.3 `search_two_hop_relations`