In [None]:
import os

os.chdir('..')

# Extend to DBpedia

[DBpedia](https://www.dbpedia.org/) extracts structured information from Wikipedia and makes it available on the web as a public knowledge base. It was launched in 2007 and is maintained by the DBpedia Association, a non-profit organization based in Leipzig, Germany.

In this notebook, we demonstrate how to extend SRTK to DBpedia.

## 1. Know the interfaces

SRTK defines several interfaces that abstract common interactions with knowledge graphs. `KnowledgeGraphBase` serves as the base class for all knowledge graphs.

To extend SRTK to a new knowledge graph, you need to implement the following methods:

- `deduce_leaves`: deduce the leaf entities reached from a source entity by following a relation path.
- `get_label`: get the label of an entity.
- `get_neighbor_relations`: get the n-hop neighbor relations of a node.
- `search_one_hop_relations`: search one-hop relations between two nodes.
- `search_two_hop_relations`: search two-hop relations between two nodes.

`get_entity_label` and `get_relation_label` are optional; they default to `get_label`.

In practice, most of the work is composing the SPARQL query behind each method.
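To make the contract concrete before touching SPARQL, here is a toy stand-in (hypothetical, not part of SRTK) that implements the five methods over a hard-coded list of triples. It does not subclass `KnowledgeGraphBase`, so it runs without SRTK installed. The return shapes are assumptions taken from the `search_one_hop_relations` docstring and the DBpedia implementation later in this notebook: paths are lists of relation IDs, and two-hop neighbor relations are pairs.

```python
from typing import Dict, List, Optional, Tuple

# (subject, relation, object)
Triple = Tuple[str, str, str]


class ToyKnowledgeGraph:
    """Hypothetical in-memory stand-in mirroring the required methods."""

    def __init__(self, triples: List[Triple], labels: Dict[str, str]):
        self.triples = triples
        self.labels = labels

    def get_label(self, identifier: str) -> Optional[str]:
        return self.labels.get(identifier)

    def search_one_hop_relations(self, src: str, dst: str) -> List[List[str]]:
        # Every relation r with (src, r, dst) is a one-hop path [r]
        return [[r] for s, r, o in self.triples if s == src and o == dst]

    def search_two_hop_relations(self, src: str, dst: str) -> List[List[str]]:
        # Paths [r1, r2] with (src, r1, mid) and (mid, r2, dst)
        paths = []
        for s1, r1, mid in self.triples:
            if s1 != src:
                continue
            for s2, r2, o2 in self.triples:
                if s2 == mid and o2 == dst and [r1, r2] not in paths:
                    paths.append([r1, r2])
        return paths

    def deduce_leaves(self, src: str, path: List[str]) -> List[str]:
        # Follow the relations in `path` one hop at a time
        frontier = [src]
        for relation in path:
            frontier = sorted({o for s, r, o in self.triples
                               if s in frontier and r == relation})
        return frontier

    def get_neighbor_relations(self, src: str, hop: int = 1) -> List:
        if hop == 1:
            return sorted({r for s, r, _ in self.triples if s == src})
        # hop == 2: pairs (r1, r2) of consecutive relations leaving src
        pairs = set()
        for s1, r1, mid in self.triples:
            if s1 == src:
                pairs |= {(r1, r2) for s2, r2, _ in self.triples if s2 == mid}
        return sorted(pairs)


kg = ToyKnowledgeGraph(
    triples=[
        ("Elizabeth_II", "child", "Charles_III"),
        ("Elizabeth_II", "successor", "Charles_III"),
        ("Charles_III", "child", "William,_Prince_of_Wales"),
    ],
    labels={"Charles_III": "Charles III"},
)
print(kg.search_one_hop_relations("Elizabeth_II", "Charles_III"))
# [['child'], ['successor']]
print(kg.deduce_leaves("Elizabeth_II", ["child", "child"]))
# ['William,_Prince_of_Wales']
```

Replacing these comprehensions over `self.triples` with SPARQL queries against a remote endpoint is, in essence, what the DBpedia implementation does.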

In [None]:
from srtk.knowledge_graph import KnowledgeGraphBase
[prop for prop in dir(KnowledgeGraphBase) if not prop.startswith('_')]

['deduce_leaves',
 'get_entity_label',
 'get_label',
 'get_neighbor_relations',
 'get_relation_label',
 'search_one_hop_relations',
 'search_two_hop_relations']

## 2. Implement the KG interfaces

Because a Python class definition cannot be split across several notebook cells, for demonstration purposes we take a shortcut: we define a minimal class first, then assign member methods to it one at a time.
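The shortcut works because Python looks methods up on the class at call time, so a plain function assigned as a class attribute behaves like a method defined in the class body. A minimal illustration with a made-up class:

```python
class Greeter:
    """An empty class whose methods are attached later."""
    pass


def greet(self, name):
    # A plain function whose first parameter plays the role of `self`
    return f"Hello, {name}!"


# Instances created before the assignment also see the new method,
# because attribute lookup falls back to the class.
existing = Greeter()
Greeter.greet = greet

print(existing.greet("DBpedia"))  # Hello, DBpedia!
print(Greeter().greet("SRTK"))    # Hello, SRTK!
```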

### 2.1 Initialization

In this step, we attach the [DBpedia prefixes](https://dbpedia.org/sparql/?help=nsdecl) used in our queries to the class and define a SPARQL query handler.

In [None]:
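from SPARQLWrapper import SPARQLWrapper, JSON

class DBPediaInit(KnowledgeGraphBase):
    PREFIXES: str = """PREFIX dbo: <http://dbpedia.org/ontology/>
                       PREFIX dbr: <http://dbpedia.org/resource/>
                       """
    def __init__(self, endpoint, prepend_prefixes=False):
        self.sparql = SPARQLWrapper(endpoint)
        # Request JSON so that queryAndConvert() returns a Python dict
        self.sparql.setReturnFormat(JSON)
        # The public DBpedia endpoint predefines dbo: and dbr:; set this to
        # True for endpoints that do not
        self.prepend_prefixes = prepend_prefixes
        self.name = 'dbpedia'
    
    def queryDBPedia(self, query):
        if self.prepend_prefixes:
            query = self.PREFIXES + query

        self.sparql.setQuery(query)
        try:
            ret = self.sparql.queryAndConvert()
            result = ret['results']['bindings']
        except Exception as exception:
            print(f'Failed executing query: {query}')
            print(f'Exception: {exception}')
            result = []
        return result

    @staticmethod
    def get_id_from_uri(uri):
        """Strip the DBpedia namespace from a URI, e.g.
        'http://dbpedia.org/ontology/child' -> 'child'.

        Helper used by the methods below; URIs in other namespaces are
        returned unchanged."""
        for namespace in ('http://dbpedia.org/ontology/',
                          'http://dbpedia.org/resource/'):
            if uri.startswith(namespace):
                return uri[len(namespace):]
        return uri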
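With the JSON return format, `queryAndConvert()` decodes the response into a dict laid out as in the W3C SPARQL 1.1 Query Results JSON Format, and `queryDBPedia` keeps only `results.bindings`. The response below is hand-written for illustration (not captured from DBpedia); it shows what callers of `queryDBPedia` index into:

```python
# Hand-written response in the SPARQL 1.1 JSON results layout
response = {
    "head": {"vars": ["r"]},
    "results": {
        "bindings": [
            {"r": {"type": "uri", "value": "http://dbpedia.org/ontology/child"}},
            {"r": {"type": "uri", "value": "http://dbpedia.org/ontology/successor"}},
        ]
    },
}

# queryDBPedia returns this list: one dict per solution, keyed by the
# variable names in the SELECT clause
bindings = response["results"]["bindings"]

# Each bound variable maps to {"type": ..., "value": ...}; the URI is in "value"
uris = [binding["r"]["value"] for binding in bindings]
print(uris)
# ['http://dbpedia.org/ontology/child', 'http://dbpedia.org/ontology/successor']
```

This is why the methods below read `path['r']['value']` or `leaf['dst']['value']`: the key is the SPARQL variable name without the `?`.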

### 2.2 `search_one_hop_relations`

In [None]:
help(KnowledgeGraphBase.search_one_hop_relations)

Help on function search_one_hop_relations in module srtk.knowledge_graph.graph_base:

search_one_hop_relations(self, src: str, dst: str) -> List[List[str]]
    Search one hop relations between src and dst.
    
    Args:
        src (str): source entity
        dst (str): destination entity
    
    Returns:
        list[list[str]]: list of paths, each path is a list of PIDs



To work out the query, we can experiment in the [DBpedia SPARQL Query Editor](https://dbpedia.org/sparql). For example, the following query finds the relations from Elizabeth II to Charles III:


```sparql
SELECT ?relation
WHERE {
  dbr:Elizabeth_II ?relation dbr:Charles_III.
}
```

This is the result:
```
relation
http://dbpedia.org/ontology/wikiPageWikiLink
http://dbpedia.org/property/after
http://dbpedia.org/property/successor
http://dbpedia.org/ontology/child
http://dbpedia.org/ontology/successor
```

We only keep relations in the *http://dbpedia.org/ontology/* namespace, since the *http://dbpedia.org/property/* relations largely duplicate them. In addition, *http://dbpedia.org/ontology/wikiPageWikiLink* only records that one Wikipedia page links to another, which is not useful for our purpose. We therefore add two filters to the query:

```sparql
SELECT ?relation
WHERE {
  dbr:Elizabeth_II ?relation dbr:Charles_III.
  FILTER regex(str(?relation), "^http://dbpedia.org/ontology/")
  FILTER (?relation != dbo:wikiPageWikiLink)
}
```

The result becomes:
```
relation
http://dbpedia.org/ontology/child
http://dbpedia.org/ontology/successor
```
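As a sanity check, the two `FILTER` clauses can be replayed in plain Python on the unfiltered result list above. This is only an offline mirror of what the endpoint evaluates, not how SRTK runs the query; Python's `re` and SPARQL's `regex` agree on this simple anchored pattern.

```python
import re

# Relations returned by the unfiltered query
relations = [
    "http://dbpedia.org/ontology/wikiPageWikiLink",
    "http://dbpedia.org/property/after",
    "http://dbpedia.org/property/successor",
    "http://dbpedia.org/ontology/child",
    "http://dbpedia.org/ontology/successor",
]

WIKI_PAGE_WIKI_LINK = "http://dbpedia.org/ontology/wikiPageWikiLink"


def keep(relation):
    # FILTER regex(str(?relation), "^http://dbpedia.org/ontology/")
    in_ontology = re.search("^http://dbpedia.org/ontology/", relation) is not None
    # FILTER (?relation != dbo:wikiPageWikiLink)
    return in_ontology and relation != WIKI_PAGE_WIKI_LINK


kept = [r for r in relations if keep(r)]
print(kept)
# ['http://dbpedia.org/ontology/child', 'http://dbpedia.org/ontology/successor']
```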

It looks much better now. We can now implement the interface.

Note: this example was chosen because the notebook was written around the time of Charles III's coronation.

In [None]:
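def search_one_hop_relations(self, src, dst):
    query = f"""
            SELECT DISTINCT ?r WHERE {{
                dbr:{src} ?r dbr:{dst}.
                FILTER regex(str(?r), "^http://dbpedia.org/ontology/")
                FILTER (?r != dbo:wikiPageWikiLink)
            }}
            """
    paths = self.queryDBPedia(query)
    # Keep only the relation identifiers (e.g. 'child') in the paths
    paths = [[self.get_id_from_uri(path['r']['value'])] for path in paths]
    return paths

# Attach the function to the class as a method
DBPediaInit.search_one_hop_relations = search_one_hop_relations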
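The f-string places the raw entity ID right after `dbr:`. Under the SPARQL 1.1 grammar, a prefixed name cannot contain unescaped commas or parentheses or end with a dot, so an ID such as `Washington,_D.C.` would produce an invalid query. One possible workaround, sketched below with hypothetical helpers `resource_iri` and `one_hop_query`, is to write full IRIs instead. It assumes entity IDs are exactly the part of the resource IRI after `http://dbpedia.org/resource/` and simply rejects IDs containing characters that are not allowed inside `<...>`.

```python
RESOURCE = "http://dbpedia.org/resource/"
ONTOLOGY = "http://dbpedia.org/ontology/"

# Characters the SPARQL 1.1 grammar forbids inside an IRI reference <...>
_FORBIDDEN = '<>"{}|^`\\ '


def resource_iri(entity_id):
    """Wrap a DBpedia entity ID in a full IRI (hypothetical helper)."""
    if any(c in _FORBIDDEN for c in entity_id):
        raise ValueError(f"Cannot safely embed {entity_id!r} in an IRI")
    return f"<{RESOURCE}{entity_id}>"


def one_hop_query(src, dst):
    # Same query as search_one_hop_relations, but with full IRIs instead of
    # dbr: prefixed names, so IDs like "Washington,_D.C." stay valid SPARQL
    return f"""
        SELECT DISTINCT ?r WHERE {{
            {resource_iri(src)} ?r {resource_iri(dst)} .
            FILTER regex(str(?r), "^{ONTOLOGY}")
            FILTER (?r != <{ONTOLOGY}wikiPageWikiLink>)
        }}
        """


print(one_hop_query("Elizabeth_II", "Washington,_D.C."))
```

The same substitution would apply to the other queries in this notebook; whether it is needed depends on which IDs your dataset contains.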

### 2.3 The Rest

Following similar steps, we can implement the rest of the interfaces. To avoid repetition, we only show the final code here.

In [None]:
from typing import List

class DBPedia(DBPediaInit):
    
    def search_one_hop_relations(self, src, dst):
        query = f"""
                SELECT DISTINCT ?r WHERE {{
                    dbr:{src} ?r dbr:{dst}.
                    FILTER regex(str(?r), "^http://dbpedia.org/ontology/")
                    FILTER (?r != dbo:wikiPageWikiLink)
                }}
                """
        paths = self.queryDBPedia(query)
        paths = [[self.get_id_from_uri(path['r']['value'])] for path in paths]
        return paths
    
    def search_two_hop_relations(self, src: str, dst: str) -> List[List[str]]:

        query = f"""
                SELECT DISTINCT ?r1 ?r2 WHERE {{
                    dbr:{src} ?r1 ?mid.
                    ?mid ?r2 dbr:{dst}.
                    FILTER regex(str(?r1), "^http://dbpedia.org/ontology/")
                    FILTER regex(str(?r2), "^http://dbpedia.org/ontology/")
                    FILTER (?r1 != dbo:wikiPageWikiLink)
                    FILTER (?r2 != dbo:wikiPageWikiLink)
                }}
                """
        paths = self.queryDBPedia(query)
        # Keep only identifiers in the paths
        paths = [[self.get_id_from_uri(path['r1']['value']), self.get_id_from_uri(path['r2']['value'])] for path in paths]
        return paths

    def deduce_leaves(self, src, path):
        if len(path) > 2:
            raise NotImplementedError('Deduce leaves for paths longer than 2 is not implemented.')
        
        if len(path) == 0:
            return [src]
        
        if len(path) == 1:
            query = f"""
                SELECT DISTINCT ?dst WHERE {{
                    dbr:{src} dbo:{path[0]} ?dst.
                }}
            """
        else:
            query = f"""
                SELECT DISTINCT ?dst WHERE {{
                    dbr:{src} dbo:{path[0]} ?mid.
                    ?mid dbo:{path[1]} ?dst.
                }}
                """
        leaves = self.queryDBPedia(query)
        leaves = [self.get_id_from_uri(leaf['dst']['value']) for leaf in leaves]
        return leaves

    def get_neighbor_relations(self, src, hop=1, limit=100):
        if hop > 2:
            raise NotImplementedError('Get neighbor relations for hop larger than 2 is not implemented.')

        if hop == 1:
            query = f"""
                SELECT DISTINCT ?r
                WHERE {{
                    dbr:{src} ?r ?neighbor .
                    FILTER (STRSTARTS(STR(?r), "http://dbpedia.org/ontology/") && !STRSTARTS(STR(?r), "http://dbpedia.org/ontology/wiki"))
                }}
                LIMIT {limit}
                """
        else:
            query = f"""
                SELECT DISTINCT ?r1 ?r2 WHERE {{
                    dbr:{src} ?r1 ?mid.
                    ?mid ?r2 ?dst.
                    FILTER (STRSTARTS(STR(?r1), "http://dbpedia.org/ontology/") && !STRSTARTS(STR(?r1), "http://dbpedia.org/ontology/wiki"))
                    FILTER (STRSTARTS(STR(?r2), "http://dbpedia.org/ontology/") && !STRSTARTS(STR(?r2), "http://dbpedia.org/ontology/wiki"))
                }}
                LIMIT {limit}
                """

        relations = self.queryDBPedia(query)

        if hop == 1:
            relations = [self.get_id_from_uri(relation['r']['value'])
                         for relation in relations]
        else:
            relations = [(self.get_id_from_uri(relation['r1']['value']),
                          self.get_id_from_uri(relation['r2']['value']))
                         for relation in relations]
        return relations

    def get_label(self, identifier):
        query = f"""
                SELECT (str(?label) AS ?name)
                WHERE {{
                dbr:{identifier} rdfs:label ?label .
                FILTER (lang(?label) = "en")
                }}
                LIMIT 1
                """
        labels = self.queryDBPedia(query)
        if len(labels) == 0:
            print(f'No label found for {identifier}')
            return None
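        label = labels[0]['name']['value']
        return label

    def get_relation_label(self, identifier):
        # Relation IDs such as 'child' live under dbo:, whereas get_label
        # looks identifiers up under dbr:, so query the ontology namespace
        query = f"""
                SELECT (str(?label) AS ?name)
                WHERE {{
                dbo:{identifier} rdfs:label ?label .
                FILTER (lang(?label) = "en")
                }}
                LIMIT 1
                """
        labels = self.queryDBPedia(query)
        if len(labels) == 0:
            print(f'No label found for {identifier}')
            return None
        return labels[0]['name']['value']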
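`deduce_leaves` above only handles paths of up to two relations. SPARQL 1.1 supports sequence property paths, where `p1/p2` is equivalent to chaining two triple patterns through a fresh intermediate variable, so one way to lift that limit is to build the path from the relation list. The helper below, `deduce_leaves_query`, is hypothetical and only builds the query string; it assumes every relation in `path` is a `dbo:` local name, as produced by the search methods above.

```python
from typing import List


def deduce_leaves_query(src: str, path: List[str]) -> str:
    """Build a query for the leaves reached from dbr:{src} along `path`
    using a SPARQL 1.1 sequence property path (dbo:p1/dbo:p2/...)."""
    if not path:
        # deduce_leaves returns [src] directly for an empty path
        raise ValueError("An empty path needs no query; the leaf is src itself.")
    property_path = "/".join(f"dbo:{relation}" for relation in path)
    return f"""
        SELECT DISTINCT ?dst WHERE {{
            dbr:{src} {property_path} ?dst .
        }}
        """


print(deduce_leaves_query("Elizabeth_II", ["child", "child", "spouse"]))
```

The resulting string could be passed to `queryDBPedia` like the hand-written queries; how long paths perform on the public endpoint is not tested here.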

## 3. (Optional) Implement the Entity Linking Interface

If you work with raw text whose entities have not been linked yet, you can also implement the `LinkerBase` interface, which links mentions in raw text to entities in the knowledge graph.

We use the [DBpedia Spotlight](https://www.dbpedia-spotlight.org/) API to implement the interface.

You only have to implement one method, `annotate`, which takes raw text as input and returns a dictionary whose `question_entities` key holds the list of linked entities.

In [None]:
import requests

from srtk.entity_linking.linker_base import LinkerBase


class DBpediaLinker(LinkerBase):
    """Link entitiy mentions to DBpedia entities with the DBpedia Spotlight endpoint"""
    def __init__(self, endpoint):
        """Initialize the linker
        
        Args:
            endpoint (str): The endpoint of the DBpedia Spotlight service
                e.g. https://api.dbpedia-spotlight.org/en/annotate
        """
        self.endpoint = endpoint

    def annotate(self, text):
        """Annotate a text with the entities in the DBpedia knowledge graph

        Args:
            text (str): The text to annotate

        Returns:
            dict: A dictionary with the following keys:
                question: The input text
                question_entities: The DBpedia entities in the text
                spans: The spans of the entities in the text
                similarity_scores: The similarity scores of the entities in the text
        """
        params = {'text': text}
        headers = {'Accept': 'application/json'}
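        response = requests.get(self.endpoint, params=params, headers=headers,
                                timeout=60)
        response.raise_for_status()
        # Fall back to an empty list if the response has no 'Resources' key
        resources = response.json().get('Resources', [])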
        question_entities = []
        spans = []
        similarity_scores = []
        for resource in resources:
            uri = resource['@URI']
            if not uri.startswith('http://dbpedia.org/resource/'):
                continue
            entity = uri[len('http://dbpedia.org/resource/'):]
            offset = int(resource['@offset'])
            surface_form = resource['@surfaceForm']
            similarity_score = float(resource['@similarityScore'])
            span = (offset, offset + len(surface_form))
            question_entities.append(entity)
            spans.append(span)
            similarity_scores.append(similarity_score)
        linked = {
            "question": text,
            "question_entities": question_entities,
            "spans": spans,
            "similarity_scores": similarity_scores,
        }
        return linked
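To check the parsing logic without calling the service, the loop in `annotate` can be factored into a pure function and fed a canned response. `parse_spotlight_response` is a hypothetical helper, and the response below is made up: it contains only the fields `annotate` reads, with values stored as strings to match the `int()`/`float()` conversions in `annotate`. It uses `.get("Resources", [])` so that a response without a `Resources` key yields empty lists instead of a `KeyError`.

```python
def parse_spotlight_response(text, response):
    """Offline mirror of the loop in DBpediaLinker.annotate.

    `response` is the decoded JSON body of a Spotlight annotate request."""
    prefix = "http://dbpedia.org/resource/"
    linked = {"question": text, "question_entities": [], "spans": [],
              "similarity_scores": []}
    for resource in response.get("Resources", []):
        uri = resource["@URI"]
        if not uri.startswith(prefix):
            continue
        offset = int(resource["@offset"])
        linked["question_entities"].append(uri[len(prefix):])
        # Span as (start, end) character offsets into `text`
        linked["spans"].append((offset, offset + len(resource["@surfaceForm"])))
        linked["similarity_scores"].append(float(resource["@similarityScore"]))
    return linked


text = "Who succeeded Elizabeth II?"
canned = {
    "@text": text,
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Elizabeth_II",
         "@offset": "14", "@surfaceForm": "Elizabeth II",
         "@similarityScore": "0.99"},
    ],
}
linked = parse_spotlight_response(text, canned)
print(linked["question_entities"], linked["spans"])
# ['Elizabeth_II'] [(14, 26)]
```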