This is a project in the context of the Linked Open Data (LOD) Seminar at AIFB at the Karlsruhe Institute of Technology.
The goal was to integrate multiple LOD sources (in a first step only DBPedia and Yago) and to build a knowledge panel or fact box (as known from Google or Wikipedia) on that basis.
A major challenge was determining which properties of an entity, e.g. dbp:Karlsruhe, are relevant and meaningful enough to be displayed to the user and which are not. Accordingly, we had to develop a ranking of properties for specific entities or classes (rdf:type) of entities that is capable of ranking properties across multiple, distinct sources.
While [1] already presents a good solution based on supervised machine learning (although it only works for a single dataset, namely DBPedia), our approach is based on rather naive statistical metrics such as TF-IDF.
Our evaluation is based on rank biased overlap (RBO), as described in [2].
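To illustrate the metric, here is a minimal Python sketch of RBO truncated at a fixed depth, i.e. (1 - p) * sum over d = 1..k of p^(d-1) * A_d, where A_d is the fraction of items shared by the top-d prefixes of both rankings. This is not the project's evaluation code: the function name `rbo_truncated`, the default `p = 0.9` and the example property lists are assumptions made for illustration, and the extrapolated RBO variant from [2] is not covered.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated rank-biased overlap of two rankings (no duplicates assumed).

    p is the persistence parameter: a smaller p puts more weight on the
    top ranks. depth defaults to the length of the shorter ranking.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if depth is None:
        depth = min(len(ranking_a), len(ranking_b))

    weighted_sum = 0.0
    for d in range(1, depth + 1):
        # Agreement at depth d: share of items common to both top-d prefixes.
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        weighted_sum += p ** (d - 1) * (overlap / d)
    return (1 - p) * weighted_sum


# Illustrative rankings only, not results from the project.
reference = ["dbo:populationTotal", "dbo:country", "rdfs:label"]
computed = ["dbo:country", "dbo:populationTotal", "rdfs:label"]
print(rbo_truncated(reference, computed))
```

Because the sum is truncated, two identical rankings of length k score 1 - p^k rather than 1, so values are only comparable when they are computed at the same depth and with the same p.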
[1] Dessi, A., & Atzori, M. (2016). A machine-learning approach to ranking RDF properties. Future Generation Computer Systems, 54, 366–377. http://doi.org/10.1016/j.future.2015.04.018
[2] Webber, W., Moffat, A., & Zobel, J. (2010). A similarity measure for indefinite rankings. ACM Transactions on Information Systems, 28(4), 1–38. http://doi.org/10.1145/1852102.1852106
The project consists of four software components.
- Preprocessing scripts: Responsible for extracting statistics from LOD graphs and calculating TF and IDF from them.
- Backend: Responsible for computing the entity-specific, multi-source property ranking at runtime and for constructing a combined, JSON-LD-serialized RDF graph from DBPedia and Yago based on that ranking. Exposed as a RESTful web service.
- Frontend: Single-page app serving as the user interface. It queries the backend based on user input and renders a knowledge panel from the RDF graph in the response.
- Evaluation: Scripts facilitating "manual" computation of RBO metrics for specific entities.
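The TF and IDF statistics mentioned for the preprocessing scripts could be computed roughly as in the following Python sketch. The exact definitions used in the project are not stated here, so this rests on assumptions: TF is taken as the fraction of a class's entities that use a property, and IDF as the logarithm of the number of classes divided by the number of classes in which the property occurs. The function name `rank_properties`, the input layout and the sample data are hypothetical.

```python
import math
from collections import Counter


def rank_properties(entities_by_class):
    """Rank the properties of each class by a TF-IDF-style score.

    entities_by_class maps a class name to a list of entities, where each
    entity is given as the collection of properties it uses.
    Assumption: TF = share of the class's entities using the property,
    IDF = log(number of classes / number of classes using the property).
    """
    n_classes = len(entities_by_class)
    tf = {}
    classes_using = Counter()

    for cls, entities in entities_by_class.items():
        # Count each property at most once per entity.
        counts = Counter(prop for props in entities for prop in set(props))
        tf[cls] = {prop: c / len(entities) for prop, c in counts.items()}
        classes_using.update(counts.keys())

    rankings = {}
    for cls, freqs in tf.items():
        scores = {
            prop: freq * math.log(n_classes / classes_using[prop])
            for prop, freq in freqs.items()
        }
        # Highest score first; ties are broken alphabetically.
        rankings[cls] = sorted(scores, key=lambda prop: (-scores[prop], prop))
    return rankings


# Made-up sample data, not statistics extracted from DBPedia or Yago.
sample = {
    "City": [
        {"rdfs:label", "dbo:populationTotal", "dbo:country"},
        {"rdfs:label", "dbo:populationTotal"},
    ],
    "Person": [
        {"rdfs:label", "dbo:birthDate"},
        {"rdfs:label", "dbo:birthDate", "dbo:country"},
    ],
}
print(rank_properties(sample))
```

Under these assumptions, a property that occurs in every class, such as a label, receives an IDF of zero and drops to the bottom of the ranking, while properties that are frequent within one class but rare across classes rise to the top.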
- Han Che
- Benny Rolle
- Ferdinand Mütsch
MIT