Matchmaker - a tool for semi-supervised label matching

Matchmaker

A prototype of a tool for semi-automated (interactive) label matching.

The intended use case is matching labels extracted from text (noun phrases represented as bags of words) against the most relevant, semantically related labels of concepts in a given SKOS taxonomy. The goal is to annotate the text with SKOS concepts and/or to extend the SKOS-based knowledge graph with new concepts originating in unstructured data.

Key components involved:

Example data includes:

This work is HEAVILY UNDER CONSTRUCTION!

A sample workflow:

  1. (Supervised) training of a word-matching classifier that accounts for inflectional variants and minor typos occurring in the source and target labels, e.g.:

logic - logics (true)

logic - logica (true)

logic - logically (true)

logic - login (false)
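The pairs above suggest why a trained classifier is used rather than a single distance threshold. The sketch below computes two simple string features for each pair. The class name, the method names and the choice of features are assumptions made for illustration; the README does not say which features the actual classifier uses.

```java
/** Illustrative string features for a word pair; names are hypothetical, not from the Matchmaker sources. */
final class WordPairFeatures {

    /** Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b[0..j) from the empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Number of leading characters the two words have in common. */
    static int commonPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"logic", "logics"}, {"logic", "logica"}, {"logic", "logically"}, {"logic", "login"}
        };
        for (String[] pair : pairs) {
            System.out.printf("%s - %s: editDistance=%d, commonPrefix=%d%n",
                    pair[0], pair[1],
                    editDistance(pair[0], pair[1]),
                    commonPrefixLength(pair[0], pair[1]));
        }
    }
}
```

Note that `logic - logics` (true) and `logic - login` (false) both have an edit distance of 1, so edit distance alone cannot separate them; only in combination with other features, such as the common prefix length, do the two cases differ.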

  2. Applying the classifier to generate mappings from the words in the source and target labels to the WordNet vocabulary, e.g.:

intelligence - intelligentsia|intelligently|intelligent|intelligence
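A mapping like the one above can be thought of as filtering a vocabulary with a word-pair matcher. The following is a minimal sketch under stated assumptions: the vocabulary is a small hard-coded list standing in for WordNet, and `sharesLongPrefix` is a hypothetical rule standing in for the trained classifier. Neither is taken from the Matchmaker sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Maps a label word to candidate vocabulary entries; all names here are hypothetical. */
final class VocabularyMapper {

    /** Returns every vocabulary entry that the matcher accepts for the given word. */
    static List<String> candidates(String word, List<String> vocabulary,
                                   BiPredicate<String, String> matcher) {
        List<String> accepted = new ArrayList<>();
        for (String entry : vocabulary) {
            if (matcher.test(word, entry)) {
                accepted.add(entry);
            }
        }
        return accepted;
    }

    /**
     * Stand-in for the trained classifier (an assumption): accepts a pair when
     * the common prefix covers at least 80% of the shorter word.
     */
    static boolean sharesLongPrefix(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        int prefix = 0;
        while (prefix < shorter && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return shorter > 0 && prefix * 5 >= shorter * 4;
    }

    public static void main(String[] args) {
        // Tiny stand-in for the WordNet vocabulary.
        List<String> vocabulary = List.of(
                "integer", "intellect", "intelligentsia",
                "intelligently", "intelligent", "intelligence");
        List<String> matches =
                candidates("intelligence", vocabulary, VocabularyMapper::sharesLongPrefix);
        System.out.println("intelligence - " + String.join("|", matches));
    }
}
```

Passing the matcher as a `BiPredicate` keeps the mapping step independent of how word pairs are classified, so the stand-in rule could be swapped for a trained model without touching `candidates`.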

  3. Generating bags of words from noun phrases extracted from labels (here, conference names and SKOS labels), e.g.:

logic programming artificial intelligence reasoning

Logic for Programming, Artificial Intelligence, and Reasoning - 19th International Conference, LPAR-19, Stellenbosch, South Africa, December 14-19, 2013. Proceedings, http://dblp.l3s.de/d2r/page/publications/conf/lpar/2013

automated reasoning

Automated reasoning (ACM:10003794)
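The bags of words shown above can be approximated with plain tokenization and stop-word filtering. The sketch below is only that approximation: it does not perform noun-phrase extraction (which is what drops the venue, date and proceedings information from the full conference label), and the class name and stop-word list are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Turns a phrase into an ordered bag of lower-cased content words; names are hypothetical. */
final class BagOfWords {

    // Small illustrative stop-word list (an assumption, not the tool's actual list).
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "for", "in", "of", "on", "the");

    static Set<String> fromPhrase(String phrase) {
        Set<String> bag = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                bag.add(token);
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ",
                fromPhrase("Logic for Programming, Artificial Intelligence, and Reasoning")));
        System.out.println(String.join(" ", fromPhrase("Automated reasoning")));
    }
}
```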

  4. Generating similarity matrices between source and target bags of words, e.g.:

+------------+--------------------+--------------------+--------------------+
|(NULL)      |automated           |reasoning           |(NULL)              |
+------------+--------------------+--------------------+--------------------+
|logic       |0.013245033112582781|0.043010752688172046|0.043010752688172046|
+------------+--------------------+--------------------+--------------------+
|programming |0.19444444444444445 |0.04395604395604396 |0.19444444444444445 |
+------------+--------------------+--------------------+--------------------+
|artificial  |0.013513513513513514|0.015625            |0.015625            |
+------------+--------------------+--------------------+--------------------+
|intelligence|0.053763440860215055|0.5                 |0.5                 |
+------------+--------------------+--------------------+--------------------+
|reasoning   |0.013333333333333334|1.0                 |1.0                 |
+------------+--------------------+--------------------+--------------------+
|(NULL)      |0.19444444444444445 |1.0                 |null                |
+------------+--------------------+--------------------+--------------------+
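In the table above, the values in the `(NULL)` row and column are consistent with being the column and row maxima of the word-to-word scores. That reading is inferred from the numbers and is not documented. The sketch below builds a matrix from an arbitrary similarity function and computes those margins; the class and method names are hypothetical, and the word similarity measure itself is left as a parameter because the README does not specify it.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Similarity matrix between two bags of words, with row/column maxima; names are hypothetical. */
final class SimilarityMatrix {

    /** Word-to-word scores copied from the README table (rows: source words, columns: automated, reasoning). */
    static final double[][] README_SCORES = {
        {0.013245033112582781, 0.043010752688172046}, // logic
        {0.19444444444444445,  0.04395604395604396},  // programming
        {0.013513513513513514, 0.015625},             // artificial
        {0.053763440860215055, 0.5},                  // intelligence
        {0.013333333333333334, 1.0},                  // reasoning
    };

    /** Fills cell [i][j] with similarity(source word i, target word j). */
    static double[][] build(List<String> source, List<String> target,
                            ToDoubleBiFunction<String, String> similarity) {
        double[][] matrix = new double[source.size()][target.size()];
        for (int i = 0; i < source.size(); i++) {
            for (int j = 0; j < target.size(); j++) {
                matrix[i][j] = similarity.applyAsDouble(source.get(i), target.get(j));
            }
        }
        return matrix;
    }

    /** Best score per source word (the right-hand margin in the table). */
    static double[] rowMaxima(double[][] matrix) {
        double[] maxima = new double[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            maxima[i] = Double.NEGATIVE_INFINITY;
            for (double value : matrix[i]) {
                maxima[i] = Math.max(maxima[i], value);
            }
        }
        return maxima;
    }

    /** Best score per target word (the bottom margin in the table). */
    static double[] columnMaxima(double[][] matrix) {
        int columns = matrix.length == 0 ? 0 : matrix[0].length;
        double[] maxima = new double[columns];
        java.util.Arrays.fill(maxima, Double.NEGATIVE_INFINITY);
        for (double[] row : matrix) {
            for (int j = 0; j < columns; j++) {
                maxima[j] = Math.max(maxima[j], row[j]);
            }
        }
        return maxima;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(rowMaxima(README_SCORES)));
        System.out.println(java.util.Arrays.toString(columnMaxima(README_SCORES)));
    }
}
```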

  5. Propagating the matching score information to neighboring SKOS concepts.

  6. Training a label-matching classifier using users' accept/reject responses to successively proposed matches.

  7. Generating mappings by means of the classifier.
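The README does not describe how matching scores are propagated to neighboring SKOS concepts. As one possible reading, the sketch below performs a single damped propagation step: each concept passes a fraction of its score to its direct neighbors, and a neighbor keeps the larger of its own score and the score it receives. The class name, the damping factor and the concept identifiers are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One-step damped score propagation over a concept graph; a hypothetical scheme, not the tool's. */
final class ScorePropagation {

    /**
     * @param scores    matching score per concept identifier
     * @param neighbors adjacency list, e.g. built from skos:broader / skos:narrower links
     * @param damping   fraction of a concept's score passed on to each neighbor
     */
    static Map<String, Double> propagate(Map<String, Double> scores,
                                         Map<String, List<String>> neighbors,
                                         double damping) {
        Map<String, Double> result = new HashMap<>(scores);
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            double passedOn = entry.getValue() * damping;
            for (String neighbor : neighbors.getOrDefault(entry.getKey(), List.of())) {
                // Keep whichever is larger: the neighbor's current score or the propagated one.
                result.merge(neighbor, passedOn, Double::max);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "concept:A" and "concept:B" are placeholder identifiers.
        Map<String, Double> scores = Map.of("concept:A", 1.0);
        Map<String, List<String>> neighbors = Map.of("concept:A", List.of("concept:B"));
        System.out.println(propagate(scores, neighbors, 0.5));
    }
}
```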