T2K Match

T2K Match [1] is a matching algorithm optimised to match millions of web tables against a central knowledge base.

Many web sites provide data in the form of HTML tables. Millions of such data tables have been extracted from the CommonCrawl web corpus by the Web Data Commons project [3]. Data from these tables can be used to fill missing values in large cross-domain knowledge bases such as DBpedia [2]. This project is an example of how pre-defined building blocks from the WInte.r framework are combined into an advanced, use-case-specific integration method. The algorithm is optimised to match millions of web tables against a central knowledge base describing millions of instances belonging to hundreds of different classes (such as people or locations) [2].

How to run

To run T2K Match, use the run_t2k_match script in the scripts directory.

  1. Copy the compiled T2K Match JAR file to the lib/ directory in your home directory, or change the JAR path in the script file:
     JAR="$HOME/lib/t2kmatch-2.0-jar-with-dependencies.jar"
  2. Unzip the files in the data directory:
     gunzip data/dbpedia/*
     gunzip data/*.gz
  3. Run the script:
     ./scripts/run_t2k_match
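
Before launching the script, it can help to confirm that the first two steps are actually done. The following is a minimal sketch of such a pre-flight check, not part of the project itself: the helper names check_jar, count_gz and preflight are hypothetical, and the check assumes that compressed data files carry a .gz suffix. Only the JAR path and the data directory layout are taken from the steps above.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for the "How to run" steps.
# check_jar, count_gz and preflight are hypothetical helpers (not in the repo);
# the default JAR path is the one used in the steps above.

JAR="${JAR:-$HOME/lib/t2kmatch-2.0-jar-with-dependencies.jar}"

# Succeeds if a regular file exists at the given JAR path.
check_jar() {
  [ -f "$1" ]
}

# Prints the number of .gz files directly inside a directory.
# A missing directory counts as 0. Assumes compressed files end in .gz.
count_gz() {
  find "$1" -maxdepth 1 -type f -name '*.gz' 2>/dev/null | wc -l | tr -d ' '
}

# Prints "ready" if the JAR is in place and no .gz files remain in the
# data directory or its dbpedia subdirectory; otherwise prints the reason
# and returns a non-zero status.
preflight() {
  local jar="$1" data_dir="$2"
  if ! check_jar "$jar"; then
    echo "missing jar: $jar"
    return 1
  fi
  local remaining
  remaining=$(( $(count_gz "$data_dir") + $(count_gz "$data_dir/dbpedia") ))
  if [ "$remaining" -gt 0 ]; then
    echo "still compressed: $remaining file(s) under $data_dir"
    return 1
  fi
  echo "ready"
}
```

After sourcing these functions from the repository root, one possible invocation is `preflight "$JAR" data && ./scripts/run_t2k_match`, which only starts the run when both checks pass.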

Acknowledgements

This project re-implements, using the WInte.r framework, the original T2K Match algorithm developed at the Data and Web Science Group at the University of Mannheim.

License

T2K Match can be used under the Apache 2.0 License.

References

[1] Ritze, D., Lehmberg, O., & Bizer, C. (2015, July). Matching HTML tables to DBpedia. In Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics (p. 10). ACM.

[2] Ritze, D., Lehmberg, O., Oulabi, Y., & Bizer, C. (2016, April). Profiling the potential of web tables for augmenting cross-domain knowledge bases. In Proceedings of the 25th International Conference on World Wide Web (pp. 251-261). International World Wide Web Conferences Steering Committee.

[3] Lehmberg, O., Ritze, D., Meusel, R., & Bizer, C. (2016, April). A large public corpus of web tables containing time and context metadata. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 75-76). International World Wide Web Conferences Steering Committee.
