Here, we publish the source code of our entropy-based cross-collection topic model. This code is a reference implementation of our paper "My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections", presented at the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2018.
If you use our work, please cite our paper as follows:
@inproceedings{risch2018approach,
author = {Risch, Julian and Krestel, Ralf},
booktitle = {Proceedings of the Joint Conference on Digital Libraries (JCDL)},
pages = {283-292},
title = {My Approach = Your Apparatus? Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections},
year = {2018}
}
Please also note our earlier short paper "What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers" on a related topic, published at the International Conference on Theory and Practice of Digital Libraries (TPDL).
TopicModelCcLDA.java: implements entropy-based cross-collection latent Dirichlet allocation.
RunTopicModel.java: starts the training of the topic model and the subsequent evaluation.
CorpusBlogPostsFromFile.java: loads a corpus from a file.
CorpusToy.java: loads a small example corpus of a few documents defined in the code.
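This README does not spell out how entropy enters the model. As a rough, hypothetical illustration of the general notion only (this is not code from TopicModelCcLDA.java and not necessarily the formulation used in the paper), the sketch below computes the Shannon entropy of how one word's occurrences are spread over several text collections. A word spread evenly across collections gets high entropy; a word concentrated in one collection gets low entropy. The class name CollectionEntropySketch, its methods, and the example counts are all made up for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration, not taken from this repository:
 * Shannon entropy of a single word's distribution over text collections.
 */
public class CollectionEntropySketch {

    /** Entropy in bits of the distribution given by non-negative counts. */
    public static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        if (total == 0) {
            return 0.0; // no occurrences at all: define entropy as zero
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) { // zero counts contribute nothing (0 * log 0 := 0)
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    /** Entropy scaled to [0, 1] by dividing by log2(number of collections). */
    public static double normalizedEntropy(int[] counts) {
        if (counts.length < 2) {
            return 0.0; // a single collection cannot spread anything
        }
        return entropy(counts) / (Math.log(counts.length) / Math.log(2));
    }

    public static void main(String[] args) {
        // Hypothetical counts of two words in two collections (e.g. patents vs. papers).
        int[] shared = {50, 50};  // evenly spread over both collections
        int[] specific = {99, 1}; // almost only in the first collection
        System.out.printf("shared:   %.3f%n", normalizedEntropy(shared));
        System.out.printf("specific: %.3f%n", normalizedEntropy(specific));
    }
}
```

Running main prints 1.000 for the evenly spread word and a value near 0.08 for the concentrated one. Normalizing by log2 of the number of collections keeps the score comparable when the number of collections changes.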