# Target Entity Type Identification Evaluation

In this exercise, you'll need to implement lenient evaluation measures for the target entity type identification task.

As a reminder, _target entity type identification_ is the task of finding the target types of a given input query from a type taxonomy, such that these types correspond to the most specific types of entities that are relevant to the query. Target types cannot lie on the same branch of the taxonomy.

Our final measure is normalized discounted cumulative gain (NDCG). To compute it, we first need the gain value of each answer type, which depends on its distance from the ground truth types in the type taxonomy.

In [None]:
import ipytest
import math
import pytest
from typing import Callable, List, Optional, Set

ipytest.autoconfig()

## Type taxonomy

We use the DBpedia Ontology as our type taxonomy. It is given to you in a preprocessed format in `data/dbpedia_types.tsv`, where each line corresponds to a type, and the tab-separated columns, respectively, are: type identifier, depth in the hierarchy, and parent type.

**TODO** Complete the class below such that you can compute the distance between two types in the taxonomy. You're free to choose how you want to internally represent the type hierarchy.
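
One possible internal representation, sketched here on a tiny hand-made taxonomy rather than the real DBpedia file, is a child-to-parent dictionary that is walked upward to find ancestors. The helper names (`load_parents`, `ancestors`, `branch_distance`), the `ROOT` label, and the toy rows are illustrative assumptions, and the sketch assumes the file has no header row. It is not the only way to complete the class.

```python
import io
import math
from typing import Dict, Iterable, List, Optional

# Toy rows in the three-column layout (type_id, depth, parent_id).
# These rows are made up for illustration; they are not from the data file.
TOY_TSV = "A\t1\tROOT\nB\t2\tA\nC\t2\tA\nD\t3\tB\n"


def load_parents(lines: Iterable[str], root: str = "ROOT") -> Dict[str, Optional[str]]:
    """Builds a child -> parent mapping from tab-separated lines."""
    parents: Dict[str, Optional[str]] = {root: None}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        type_id, _depth, parent_id = line.split("\t")
        parents[type_id] = parent_id
    return parents


def ancestors(parents: Dict[str, Optional[str]], type_id: str) -> List[str]:
    """Returns the path from type_id up to the root, type_id included."""
    path = []
    current: Optional[str] = type_id
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def branch_distance(parents: Dict[str, Optional[str]], type_id1: str, type_id2: str) -> float:
    """Number of steps between two types on the same branch, else math.inf."""
    path1 = ancestors(parents, type_id1)
    path2 = ancestors(parents, type_id2)
    if type_id2 in path1:  # type_id2 is type_id1 itself or one of its ancestors
        return path1.index(type_id2)
    if type_id1 in path2:  # type_id1 is an ancestor of type_id2
        return path2.index(type_id1)
    return math.inf


toy_parents = load_parents(io.StringIO(TOY_TSV))
```

Here `B` and `C` are siblings, so they are on different branches, while `D` is two steps below `A`.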

In [None]:
class TypeTaxonomy:
    
    ROOT = "owl:Thing"
    
    def __init__(self, tsv_filename: str) -> None:
        """Initializes the type taxonomy by loading it from a TSV file.
        
        Args:
            tsv_filename: Name of TSV file, with type_id, depth, and parent_id columns.
        """
        pass
    
    def max_depth(self) -> int:
        """Returns the maximum depth of the type taxonomy."""
        return 0
    
    def depth(self, type_id: str) -> int:
        """Returns the depth of a type in the taxonomy.
        
        Args:
            type_id: Type ID.
            
        Returns:
            The depth of the type in the hierarchy (0 for root).
        """
        return 0

    def parent(self, type_id: str) -> Optional[str]:
        """Returns the parent type of a type in the taxonomy.
        
        Args:
            type_id: Type ID.
            
        Returns:
            Parent type ID, or None if the input type is root.
        """
        return None
        
    
    def dist(self, type_id1: str, type_id2: str) -> float:
        """Computes the distance between two types in the taxonomy.
        
        Args:
            type_id1: ID of first type.
            type_id2: ID of second type.
            
        Returns:
            The distance between the two types in the type taxonomy, which is
            the number of steps between them if they lie on the same branch,
            and otherwise `math.inf`.
        """
        return 0

Tests.

In [None]:
%%run_pytest[clean]

@pytest.fixture
def dbpedia_types():
    return TypeTaxonomy("data/dbpedia_types.tsv")

def test_max_depth(dbpedia_types):
    assert dbpedia_types.max_depth() == 7

@pytest.mark.parametrize("type_id,depth", [
    ("owl:Thing", 0),
    ("dbo:Agent", 1),
    ("dbo:SportFacility", 4),
    ("dbo:RaceTrack", 5)
])
def test_depth(dbpedia_types, type_id, depth):
    assert dbpedia_types.depth(type_id) == depth
    
@pytest.mark.parametrize("type_id,parent", [
    ("owl:Thing", None),
    ("dbo:Agent", "owl:Thing"),
    ("dbo:SportFacility", "dbo:ArchitecturalStructure"),
    ("dbo:RaceTrack", "dbo:SportFacility")
])
def test_parent(dbpedia_types, type_id, parent):
    assert dbpedia_types.parent(type_id) == parent

@pytest.mark.parametrize("type_id1,type_id2,distance", [
    ("dbo:Agent", "dbo:Agent", 0),  # same type
    ("dbo:Agent", "dbo:Person", 1),  # type2 is more specific
    ("dbo:Artist", "dbo:Agent", 2),  # type2 is more generic
    ("dbo:Artist", "dbo:Broadcaster", math.inf)  # different branch
])
def test_distance(dbpedia_types, type_id1, type_id2, distance):
    assert dbpedia_types.dist(type_id1, type_id2) == distance

## Computing gain values

For simplicity, the gain functions below refer to the following global `type_taxonomy` instance.

In [None]:
type_taxonomy = TypeTaxonomy("data/dbpedia_types.tsv")

When defined in a _linear_ fashion, the gain of a type is computed as:

$$r(y) = \max_{\hat{y} \in \hat{\mathcal{T}}_q} \big( 1 - \frac{d(y,\hat{y})}{h} \big)$$

where $\hat{\mathcal{T}}_q$ is the set of ground truth types, $\hat{y}$ is a ground truth type, $y$ is an answer type, $d(y, \hat{y})$ is the distance between types in the taxonomy, and $h$ is the maximum depth of the type taxonomy.
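
To make the formula concrete, here is a minimal sketch on a hypothetical three-type branch. The function name `linear_gain_sketch`, the toy distance table, and the explicit `dist` and `h` parameters are assumptions made so that the example is self-contained. Note one further assumption: since the distance is `math.inf` for types on different branches, the raw formula would give a gain of negative infinity there, so the sketch clamps gains at 0.

```python
import math
from typing import Callable, Set


def linear_gain_sketch(
    gt_types: Set[str],
    answer_type_id: str,
    dist: Callable[[str, str], float],
    h: int,
) -> float:
    """Best linear gain of an answer type over all ground truth types.

    Assumption: gains are clamped at 0, so an answer on a different branch
    (distance math.inf) scores 0 instead of -inf.
    """
    return max(
        max(0.0, 1 - dist(answer_type_id, gt) / h) for gt in gt_types
    )


# Hypothetical distances on a single branch A -> B -> C.
TOY_DIST = {("A", "C"): 2, ("B", "C"): 1, ("A", "B"): 1}


def toy_dist(type_id1: str, type_id2: str) -> float:
    """Symmetric lookup; unknown pairs count as lying on different branches."""
    if type_id1 == type_id2:
        return 0
    return TOY_DIST.get(
        (type_id1, type_id2), TOY_DIST.get((type_id2, type_id1), math.inf)
    )
```

With a toy maximum depth of `h=4` and ground truth `{"C"}`, the answer `"A"` is two steps away and receives a gain of 0.5.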

In [None]:
def gain_linear(gt_types: Set[str], answer_type_id: str) -> float:
    """Computes the gain of an answer type in a linear fashion.
    
    Args:
        gt_types: Set of ground truth type IDs.
        answer_type_id: Answer type ID.
    
    Returns:
        Gain value.
    """
    return 0

Alternatively, the gain of an answer type can be defined using an _exponential_ decay function:

$$r(y) = \max_{\hat{y} \in \hat{\mathcal{T}}_q} \big ( b^{-d(y,\hat{y})} \big )$$

where $b$ is the base of the exponential function (here: $b=2$).
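
The exponential variant can be sketched in the same way. The function name `exponential_gain_sketch` and the toy distance function are illustrative assumptions. Unlike the linear formula, no clamping is needed here, because a float base raised to negative infinity evaluates to 0.0 in Python.

```python
import math
from typing import Callable, Set


def exponential_gain_sketch(
    gt_types: Set[str],
    answer_type_id: str,
    dist: Callable[[str, str], float],
    b: float = 2.0,
) -> float:
    """Best exponentially decayed gain over all ground truth types."""
    return max(b ** -dist(answer_type_id, gt) for gt in gt_types)


# Hypothetical distances on a single branch A -> B -> C.
TOY_BRANCH_DIST = {("A", "C"): 2, ("B", "C"): 1, ("A", "B"): 1}


def toy_branch_dist(type_id1: str, type_id2: str) -> float:
    """Symmetric lookup; unknown pairs count as lying on different branches."""
    if type_id1 == type_id2:
        return 0
    return TOY_BRANCH_DIST.get(
        (type_id1, type_id2),
        TOY_BRANCH_DIST.get((type_id2, type_id1), math.inf),
    )
```

For ground truth `{"C"}`, the gains halve with each step of distance: 1.0 for `"C"`, 0.5 for `"B"`, and 0.25 for `"A"`.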

In [None]:
def gain_exponential(gt_types: Set[str], answer_type_id: str) -> float:
    """Computes the gain of an answer type using exponential decay.
    
    Args:
        gt_types: Set of ground truth type IDs.
        answer_type_id: Answer type ID.
    
    Returns:
        Gain value.
    """
    return 0

Tests.
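
As a starting point, expected values can be worked out by hand from facts already asserted in the taxonomy tests: the maximum depth is 7, and `dbo:Artist` is 2 steps away from `dbo:Agent`. The variable names below are illustrative only.

```python
import math

h = 7  # maximum depth, as asserted in test_max_depth
d = 2  # distance between dbo:Artist and dbo:Agent, as asserted in test_distance

expected_linear = 1 - d / h     # 5/7, roughly 0.714
expected_exponential = 2 ** -d  # 0.25
```

A test for `gain_linear({"dbo:Agent"}, "dbo:Artist")` could then compare the result against `expected_linear` using `pytest.approx`, and likewise for `gain_exponential`.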

In [None]:
# TODO: It's your task to write some tests.

## Putting everything together

Plug the gain values computed with either the linear or the exponential gain function into the NDCG computation to get the final evaluation score.

The DCG and NDCG computation parts are given. The only part that needs completing is the construction of the ideal ranking.
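
In formula form, the given `dcg` and `ndcg` functions compute the following, where $rel_i$ denotes the gain of the type at rank $i$ (1-based), $n$ is the length of the ranking, and $\mathrm{IDCG@}k$ is the $\mathrm{DCG@}k$ of the ideal ranking (the symbol names are chosen here for illustration):

$$\mathrm{DCG@}k = rel_1 + \sum_{i=2}^{\min(k,n)} \frac{rel_i}{\log_2 i}, \qquad \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}$$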

**TODO** Complete the `get_ideal_ranking` function. Note that, unlike in document ranking, the ground truth types alone are insufficient: types on the same branch of the hierarchy yield partial matches (non-zero gain), so the ideal ranking should include them as well.
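
One way to approach this, sketched on a hypothetical single-branch hierarchy, is to collect the ground truth types together with their ancestors and sort the candidates by decreasing gain. The names `ideal_ranking_sketch`, `TOY_PARENTS`, `TOY_DEPTH`, and `toy_gain` are assumptions for illustration. The sketch adds ancestors only, whereas descendants of the ground truth types also lie on the same branch and could be added in a fuller solution.

```python
from typing import Callable, Dict, List, Optional, Set


def ideal_ranking_sketch(
    gt_types: Set[str],
    parents: Dict[str, Optional[str]],
    gain: Callable[[Set[str], str], float],
) -> List[str]:
    """Ranks ground truth types and their ancestors by decreasing gain.

    Assumption: only ancestors are added as partial matches.
    """
    candidates = set()
    for gt in gt_types:
        current: Optional[str] = gt
        while current is not None:
            candidates.add(current)
            current = parents.get(current)
    # Sort by gain (descending); break ties by type ID for a stable result.
    return sorted(candidates, key=lambda t: (-gain(gt_types, t), t))


# Hypothetical toy hierarchy: ROOT -> A -> B -> C (a single branch).
TOY_PARENTS = {"ROOT": None, "A": "ROOT", "B": "A", "C": "B"}
TOY_DEPTH = {"ROOT": 0, "A": 1, "B": 2, "C": 3}


def toy_gain(gt_types: Set[str], type_id: str) -> float:
    """Exponential gain; on a single branch the distance is the depth difference."""
    return max(2.0 ** -abs(TOY_DEPTH[type_id] - TOY_DEPTH[gt]) for gt in gt_types)
```

For ground truth `{"C"}`, the sketch ranks the exact match first and then each ancestor in order of increasing distance.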

In [None]:
def get_ideal_ranking(gt_types: Set[str]) -> List[str]:
    """Generates an ideal ranking corresponding to a set of ground truth types.
    
    Args:
        gt_types: Set of ground truth types.
    
    Returns:
        A ranked list of types that constitute an ideal ranking gain-wise.
    """
    # TODO
    return []

In [None]:
def dcg(relevances: List[float], k: int) -> float:
    """Computes DCG@k, given the corresponding relevance levels for a ranked list of types.
    
    Args:
        relevances: List with the relevance levels corresponding to a ranked list of types.
        k: Rank cut-off.
        
    Returns:
        DCG@k (float).
    """
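    # An empty ranking (or a non-positive cut-off) has no gain to accumulate.
    if not relevances or k < 1:
        return 0.0
    # Rank 1 is undiscounted; rank i >= 2 is discounted by log2(i).
    return relevances[0] + sum(
        relevances[i] / math.log2(i + 1)
        for i in range(1, min(k, len(relevances)))
    )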

In [None]:
def ndcg(system_ranking: List[str], gt_types: Set[str], gain_function: Callable, k: int = 10) -> float:
    """Computes NDCG@k for a given system ranking.
    
    Args:
        system_ranking: Ranked list of answer type IDs (from most to least relevant).
        gt_types: Set of ground truth types.
        gain_function: Function for computing the gain of an answer type.
        k: Rank cut-off.
    
    Returns:
        NDCG@k (float).
    """
    # Relevance (gain) levels for the ranked types.
    relevances = [gain_function(gt_types, type_id) for type_id in system_ranking]

    # Relevance levels (gains) of the idealized ranking.
    relevances_ideal = [gain_function(gt_types, type_id) 
                        for type_id in get_ideal_ranking(gt_types)]
    
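    # Guard against an all-zero ideal ranking to avoid division by zero.
    ideal_dcg = dcg(relevances_ideal, k)
    return dcg(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0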

Tests.

In [None]:
# TODO: It's your task to write some tests.