### Introduction

Conceptual similarity based on a specific concept taxonomy is useful for many applications. For example, in document analysis, a conceptual graph can be constructed from the concepts in a document, so that important concepts or the main topics of the document can be identified using graph-based analysis techniques. One type of conceptual graph is the concept similarity graph, which is constructed by computing the semantic similarity between concepts.
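As a rough illustration of the idea, the sketch below builds a small similarity graph and ranks concepts by weighted degree. The concept names, the scores, and the `THRESHOLD` cut-off are all made up for this example; in practice the scores would come from a similarity metric, and other graph-based measures could replace weighted degree.

```python
from collections import defaultdict

# Made-up pairwise similarity scores, for illustration only.
scores = {
    ("actor", "film"): 0.6,
    ("actor", "director"): 0.8,
    ("film", "director"): 0.7,
    ("film", "river"): 0.1,
}
THRESHOLD = 0.5  # hypothetical cut-off for keeping an edge

# Undirected weighted graph: keep only pairs that are similar enough.
graph = defaultdict(dict)
for (u, v), s in scores.items():
    if s >= THRESHOLD:
        graph[u][v] = s
        graph[v][u] = s

# Weighted degree as a simple importance score for each concept.
importance = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked)
```

Concepts whose only links fall below the threshold drop out of the graph entirely, so they never appear in the ranking.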

To facilitate the computation of semantic similarity, we have implemented a Taxonomy interface and several state-of-the-art semantic similarity metrics built on that interface. In this notebook, we show how to compute semantic similarity over a custom taxonomy using Sematch. Basically, you need to implement the DataTransform interface. We show implementations for the DBpedia ontology classes, the Wikipedia category taxonomy, and the Medical Subject Headings. A similar method can be applied to other hierarchical concepts such as the Open Directory Project, the ACM Term Classification, and many others.

#### DBpedia Ontology Class Taxonomy

Computing semantic similarity between concepts in the DBpedia ontology classes takes three steps. First, transform the ontology classes into tuples of nodes, labels, and edges. Second, use the Taxonomy module to parse the tuples. Finally, compute the conceptual similarity using the ConceptSimilarity module.
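To make the expected shape of the first step concrete, here is a minimal sketch for a made-up four-class hierarchy. The `ex:` URIs and the `children_of` mapping are hypothetical and not part of Sematch or DBpedia; the sketch only assumes that nodes and labels are parallel lists and that each edge is a `(parent_id, child_id)` pair of positions in the node list.

```python
# Hypothetical toy hierarchy (not real DBpedia data): parent URI -> child URIs.
children_of = {
    "ex:Thing": ["ex:Animal", "ex:Plant"],
    "ex:Animal": ["ex:Dog"],
}

# Node URIs; the position of each URI in this list is its integer id.
nodes = ["ex:Thing", "ex:Animal", "ex:Plant", "ex:Dog"]
node_id = {n: i for i, n in enumerate(nodes)}

# One lower-cased label per node, in the same order as nodes.
labels = [n.split(":")[1].lower() for n in nodes]

# Edges as (parent_id, child_id) pairs.
edges = [(node_id[p], node_id[c]) for p in nodes for c in children_of.get(p, [])]

print(labels)
print(edges)
```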

In [2]:
from sematch.semantic.graph import DataTransform, Taxonomy
from sematch.semantic.similarity import ConceptSimilarity
from sematch.ontology import DBpedia


class DBpediaDataTransform(DataTransform):

    def __init__(self):
        self._ontology = DBpedia()

    def transform(self):
        # Class URIs as plain Python strings; a real list so it can be iterated more than once.
        nodes = [x.toPython() for x in self._ontology.classes]
        # Map each class URI to its integer node id.
        node_id = {n: i for i, n in enumerate(nodes)}
        # One human-readable label per class, in the same order as nodes.
        labels = [self._ontology.token(value) for value in self._ontology.classes]
        edges = []
        for i, node in enumerate(nodes):
            for child in self._ontology.subClass(node):
                # Keep only subclasses that are themselves ontology classes.
                if child in node_id:
                    # Edge from parent id to child id.
                    edges.append((i, node_id[child]))
        return nodes, labels, edges

concept_sim = ConceptSimilarity(Taxonomy(DBpediaDataTransform()), 'models/dbpedia_type_ic.txt')

c1 = concept_sim.name2concept('species')
c2 = concept_sim.name2concept('organ')
# Similarity of a concept with itself, as a sanity check.
print concept_sim.path(c1, c1)
# Similarity between the two concepts under each metric.
print concept_sim.path(c1, c2)
print concept_sim.wup(c1, c2)
print concept_sim.li(c1, c2)
print concept_sim.res(c1, c2)
print concept_sim.jcn(c1, c2)
print concept_sim.wpath(c1, c2)

1.0
0.2
0.333333333333
0.241311925619
0.0
0.0
0.2
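
For intuition about the structure-based scores, the sketch below computes path, Wu-Palmer, and Li similarity on a hand-built toy tree using their usual textbook definitions. The tree, the helper names, the convention that the root has depth 1, and the Li parameters `alpha=0.2` and `beta=0.6` are assumptions of this sketch, not a description of Sematch internals. The toy tree was chosen so that the results happen to agree with the first four values printed above.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not the DBpedia hierarchy.
parent = {"root": None, "a": "root", "b": "root", "a1": "a", "b1": "b"}

def ancestors(c):
    """Return the chain from c up to the root, with c first."""
    chain = []
    while c is not None:
        chain.append(c)
        c = parent[c]
    return chain

def depth(c):
    # Assumption: the root has depth 1.
    return len(ancestors(c))

def lcs(c1, c2):
    """Least common subsumer: the deepest ancestor shared by c1 and c2."""
    seen = set(ancestors(c1))
    for c in ancestors(c2):
        if c in seen:
            return c

def path_length(c1, c2):
    # Number of edges on the path that goes through the least common subsumer.
    return depth(c1) + depth(c2) - 2 * depth(lcs(c1, c2))

def path_sim(c1, c2):
    return 1.0 / (1 + path_length(c1, c2))

def wup_sim(c1, c2):
    return 2.0 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))

def li_sim(c1, c2, alpha=0.2, beta=0.6):
    # exp(-alpha * length) scaled by tanh(beta * depth of the subsumer).
    return math.exp(-alpha * path_length(c1, c2)) * math.tanh(beta * depth(lcs(c1, c2)))

print(path_sim("a1", "a1"))
print(path_sim("a1", "b1"))
print(wup_sim("a1", "b1"))
print(li_sim("a1", "b1"))
```

In this tree the two leaves are four edges apart and share only the root, which is why the path score is 1/(1+4) and the Wu-Palmer score is 2·1/(3+3).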
