KIDS-DMM uses an open-source Java package to implement the algorithm.
- Java (Version=1.8)
We provide three short-text datasets for evaluation: SearchSnippets, GoogleNews, and Biomedical. All corpus files and their corresponding label files are prepared under ./datasets, following the survey "Short Text Topic Modeling Techniques, Applications, and Performance: A Survey".
Taking SearchSnippets as an example, the dataset files are laid out as follows.
    datasets
        SearchSnippets
            word_wiki
            SearchSnippets.txt
            SearchSnippets_label.txt
            SearchSnippets_vocab.txt
            SearchSnippets_Word2VecSim.txt
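Before running the model, it can help to confirm that every file listed above is in place. The following is a minimal sketch, not part of KIDS-DMM itself: the class name `DatasetLayoutCheck` and its methods are hypothetical, and it assumes that all five entries sit directly under `./datasets/<DatasetName>/`. It uses only the Java 8 standard library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not shipped with KIDS-DMM): checks the assumed dataset layout.
public class DatasetLayoutCheck {

    // Builds the list of paths expected for one dataset, e.g. "SearchSnippets".
    // Assumption: every entry lives directly under <root>/<name>/.
    static List<Path> expectedPaths(String root, String name) {
        Path dir = Paths.get(root, name);
        List<Path> paths = new ArrayList<>();
        paths.add(dir.resolve("word_wiki"));
        paths.add(dir.resolve(name + ".txt"));
        paths.add(dir.resolve(name + "_label.txt"));
        paths.add(dir.resolve(name + "_vocab.txt"));
        paths.add(dir.resolve(name + "_Word2VecSim.txt"));
        return paths;
    }

    // Returns the expected paths that do not exist on disk.
    static List<Path> missing(String root, String name) {
        List<Path> absent = new ArrayList<>();
        for (Path p : expectedPaths(root, name)) {
            if (!Files.exists(p)) {
                absent.add(p);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // The dataset name may be passed as the first argument; defaults to SearchSnippets.
        String name = args.length > 0 ? args[0] : "SearchSnippets";
        List<Path> absent = missing("./datasets", name);
        if (absent.isEmpty()) {
            System.out.println("All expected files found for " + name);
        } else {
            for (Path p : absent) {
                System.out.println("Missing: " + p);
            }
        }
    }
}
```

Compile it with `javac DatasetLayoutCheck.java` and run `java DatasetLayoutCheck GoogleNews` from the repository root to check another dataset, assuming it follows the same naming pattern.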
The corresponding word_wiki and Word2VecSim files can be downloaded by following this paper.
bash run.sh