A Model-based Approach for Text Clustering with Outlier Detection

junyachen/GSDPMM

GSDPMM

Each dataset is a text file containing one JSON object per line, with a "text" field and a "cluster" field, as follows:
{"text": "centrepoint winter white gala london", "cluster": 65}
{"text": "mourinho seek killer instinct", "cluster": 96}
{"text": "roundup golden globe won seduced johansson voice", "cluster": 72}
{"text": "travel disruption mount storm cold air sweep south florida", "cluster": 140}
{"text": "wes welker blame costly turnover", "cluster": 89}
......
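
As a minimal sketch of reading this format (not part of the GSDPMM code itself), the records can be parsed line by line with Python's standard `json` module. The function name `parse_dataset` is hypothetical, and the sample lines are copied from the example above.

```python
import json
from typing import Iterable, List, Tuple


def parse_dataset(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Parse one-JSON-object-per-line records into texts and cluster labels.

    `parse_dataset` is a hypothetical helper name, not a function from the repository.
    """
    texts: List[str] = []
    labels: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        texts.append(record["text"])
        labels.append(record["cluster"])
    return texts, labels


# Two records taken from the example above.
sample = [
    '{"text": "centrepoint winter white gala london", "cluster": 65}',
    '{"text": "mourinho seek killer instinct", "cluster": 96}',
]
texts, labels = parse_dataset(sample)
print(len(texts), labels)
```

Because the function accepts any iterable of strings, an open file handle can be passed in place of `sample`.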

The output of GSDPMM consists of D lines, where D is the number of documents in the dataset. Each line contains the estimated cluster of the corresponding document.
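
A small sketch of how such an output could be consumed, assuming that line i holds the cluster label of document i (counting from 0) and that each line contains nothing but the label. The helper name `group_by_cluster` and the sample labels are made up for illustration; labels are kept as opaque strings so that no particular label type is assumed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_cluster(output_lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each estimated cluster label to the indices of its documents.

    Hypothetical helper; assumes one label per line, in document order.
    """
    clusters: Dict[str, List[int]] = defaultdict(list)
    doc_index = 0
    for line in output_lines:
        label = line.strip()
        if not label:
            continue  # ignore blank lines such as a trailing newline
        clusters[label].append(doc_index)
        doc_index += 1
    return dict(clusters)


# Made-up output for three documents (illustrative values only).
example_output = ["3\n", "7\n", "3\n"]
groups = group_by_cluster(example_output)
print(groups)
```

The resulting mapping makes it straightforward to list cluster sizes or to compare the estimated clusters against the "cluster" field of the dataset.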

Citation

If you use the data, please cite the following paper:

@article{chen2019nonparametric,
  title={A nonparametric model for online topic discovery with word embeddings},
  author={Chen, Junyang and Gong, Zhiguo and Liu, Weiwen},
  journal={Information Sciences},
  volume={504},
  pages={32--47},
  year={2019},
  publisher={Elsevier}
}
