kambizG/GDTM

GDTM - Graph-based Dynamic Topic Model

This repository contains the software implementing the algorithm presented in the following paper:

  • PDF: to be added

Description

GDTM is a single-pass Dynamic Topic Modeling (DTM) approach for short texts. It combines a context-rich, incremental feature representation model called Random Indexing (RI) with a novel online graph partitioning algorithm to address scalability and dynamicity in topic modeling. In addition, GDTM uses a rich language modeling approach based on the Skip-gram technique to account for the sparsity of short texts.
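As an illustration of the RI component, the sketch below implements textbook Random Indexing in Java. It is not GDTM's own code, and the class `RandomIndexing` and its methods are hypothetical. Each word receives a sparse ternary index vector of dimension `dim` with `noz` randomly placed +1/-1 entries, and a word's context vector accumulates the index vectors of its neighbours within `win` positions. Because vectors are only ever added to, the representation can be updated incrementally as new documents arrive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical minimal Random Indexing model (textbook RI, not GDTM's implementation). */
final class RandomIndexing {
    private final int dim;   // vector dimension (cf. -dim)
    private final int noz;   // non-zero entries per index vector (cf. -noz)
    private final int win;   // context window on each side of a word (cf. -win)
    private final Random rnd;
    private final Map<String, int[]> indexVectors = new HashMap<>();
    private final Map<String, double[]> contextVectors = new HashMap<>();

    RandomIndexing(int dim, int noz, int win, long seed) {
        if (dim < 2 || noz < 1 || noz > dim || win < 1) {
            throw new IllegalArgumentException("invalid dim/noz/win");
        }
        this.dim = dim;
        this.noz = noz;
        this.win = win;
        this.rnd = new Random(seed);
    }

    /** Returns the word's sparse ternary index vector, creating it on first use. */
    int[] indexVector(String word) {
        return indexVectors.computeIfAbsent(word, w -> {
            int[] v = new int[dim];
            int placed = 0;
            while (placed < noz) {               // choose noz distinct positions
                int pos = rnd.nextInt(dim);
                if (v[pos] == 0) {
                    v[pos] = rnd.nextBoolean() ? 1 : -1;
                    placed++;
                }
            }
            return v;
        });
    }

    /** Incrementally adds one tokenized document to the context vectors. */
    void update(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            double[] ctx = contextVectors.computeIfAbsent(tokens[i], w -> new double[dim]);
            int from = Math.max(0, i - win);
            int to = Math.min(tokens.length - 1, i + win);
            for (int j = from; j <= to; j++) {
                if (j == i) continue;            // a word is not its own context
                int[] idx = indexVector(tokens[j]);
                for (int d = 0; d < dim; d++) ctx[d] += idx[d];
            }
        }
    }

    /** Context vector of a word, or null if the word has not been seen yet. */
    double[] contextVector(String word) {
        return contextVectors.get(word);
    }

    /** Cosine similarity; returns 0 when either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int d = 0; d < a.length; d++) {
            dot += a[d] * b[d];
            na += a[d] * a[d];
            nb += b[d] * b[d];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }
}
```

For example, `new RandomIndexing(2000, 8, 2, 42L)` mirrors the documented defaults of `-dim`, `-noz`, and `-win`; calling `update` on each tokenized document and comparing context vectors with `cosine` yields a word similarity. Dense arrays are used here for readability; a practical implementation would more likely store index vectors sparsely.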

Usage

#Synopsis
java -jar gdtm δ α γ

#Params:
- δ # Function words adjustment parameter {value >= 1}
- α # Partition expansion threshold {value = [0...1]}
- γ # Function word elimination threshold {value = [0...1]}
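The following is a minimal sketch, assuming the three positional arguments arrive in the order shown in the synopsis, of how δ, α, and γ could be parsed and checked against their documented ranges. `GdtmParams` is a hypothetical class, not GDTM's actual argument handling.

```java
/** Hypothetical holder for the positional parameters delta, alpha, gamma (not GDTM's own class). */
final class GdtmParams {
    final double delta;  // function words adjustment parameter, must be >= 1
    final double alpha;  // partition expansion threshold, must be in [0, 1]
    final double gamma;  // function word elimination threshold, must be in [0, 1]

    private GdtmParams(double delta, double alpha, double gamma) {
        this.delta = delta;
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** Parses "delta alpha gamma" from the command line and enforces the documented ranges. */
    static GdtmParams parse(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException("usage: java -jar gdtm delta alpha gamma");
        }
        double delta = Double.parseDouble(args[0]);
        double alpha = Double.parseDouble(args[1]);
        double gamma = Double.parseDouble(args[2]);
        // The negated comparisons also reject NaN, which fails every ordered comparison.
        if (!(delta >= 1)) throw new IllegalArgumentException("delta must be >= 1");
        if (!(alpha >= 0 && alpha <= 1)) throw new IllegalArgumentException("alpha must be in [0, 1]");
        if (!(gamma >= 0 && gamma <= 1)) throw new IllegalArgumentException("gamma must be in [0, 1]");
        return new GdtmParams(delta, alpha, gamma);
    }
}
```

`GdtmParams.parse(new String[] {"2", "0.5", "0.1"})` succeeds, while an α of 1.5 is rejected. Non-numeric input surfaces as a `NumberFormatException`, which is itself an `IllegalArgumentException`, so a caller can handle every invalid argument in one place.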

The following optional parameters can be used to customize the algorithm or to tune its performance to the volume of the stream.

  • RI parameters
      • -dim: the dimension of the RI vectors. {value >= 2}. default = 2000
      • -noz: the number of non-zero elements in each RI vector. {value = [1...dim]}. default = 8
      • -win: the size of the moving window used to construct the context structures. default = 2
      • -mwt: RI vector pruning parameter. {value = [0...1]}. default = 0.3
  • Other parameters
      • -skip: Skip-gram value. {1 = bigram, 2 = 1-skip-bigram, 3 = 2-skip-bigram, ...}
      • -SN (snapshot): the algorithm takes a snapshot of the partitioned documents and clears the memory.
      • -input: sets a custom input.
      • -output: sets a custom output.
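The `-skip` values can be illustrated with a small generator. This sketch assumes the common definition in which a k-skip-bigram pairs two tokens separated by at most k intervening tokens, so `-skip s` covers every ordered pair at distance 1 to s. `SkipBigrams` is a hypothetical helper, not part of GDTM.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical skip-bigram generator illustrating the -skip values (not GDTM's code). */
final class SkipBigrams {
    /**
     * skip = 1 gives ordinary bigrams, skip = 2 gives 1-skip-bigrams, skip = 3 gives 2-skip-bigrams, ...
     * i.e. every ordered pair (tokens[i], tokens[j]) with 1 <= j - i <= skip.
     */
    static List<String[]> generate(String[] tokens, int skip) {
        if (skip < 1) throw new IllegalArgumentException("skip must be >= 1");
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            int last = Math.min(tokens.length - 1, i + skip);
            for (int j = i + 1; j <= last; j++) {
                pairs.add(new String[] {tokens[i], tokens[j]});
            }
        }
        return pairs;
    }
}
```

For the tokens `a b c d`, a value of 1 yields ab, bc, cd, and a value of 2 additionally yields ac and bd.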

Input Data Format

Document1
Document2
...
Documentn
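Since each line of the input appears to hold exactly one document, a document that contains line breaks would be split into several documents unless it is flattened first. The following is a minimal sketch, assuming a plain UTF-8 text file; `InputWriter` is a hypothetical helper, not part of GDTM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that prepares an input file with one document per line. */
final class InputWriter {
    /** Replaces internal line breaks (and surrounding whitespace) with a single space. */
    static List<String> toInputLines(List<String> documents) {
        List<String> lines = new ArrayList<>();
        for (String doc : documents) {
            // Order is preserved, so output row j still refers to document j.
            lines.add(doc.replaceAll("\\s*\\R\\s*", " ").trim());
        }
        return lines;
    }

    /** Writes the flattened documents to the given file, one per line, in UTF-8. */
    static void write(List<String> documents, Path file) throws IOException {
        Files.write(file, toInputLines(documents));
    }
}
```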

Output

T1:L11  T2:L12  ...  Tm:L1m
T1:L21  T2:L22  ...  Tm:L2m
...
T1:Ln1  T2:Ln2  ...  Tm:Lnm

where Ti denotes topic i and Lji denotes the likelihood of topic i for document j.
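A minimal sketch for reading the output back, assuming each row holds whitespace-separated `topic:likelihood` pairs with numeric likelihoods. `OutputLine` is a hypothetical helper, and the exact topic labels GDTM writes may differ from `T1`, `T2`, and so on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for one output row ("topic:likelihood" pairs separated by whitespace). */
final class OutputLine {
    /** Parses e.g. "T1:0.7  T2:0.3" into {T1=0.7, T2=0.3}, preserving column order. */
    static Map<String, Double> parse(String line) {
        Map<String, Double> likelihoods = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) continue;                 // a blank row yields an empty map
            int colon = token.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected topic:likelihood, got " + token);
            }
            likelihoods.put(token.substring(0, colon), Double.parseDouble(token.substring(colon + 1)));
        }
        return likelihoods;
    }

    /** Returns the topic with the highest likelihood, or null for an empty row. */
    static String mostLikelyTopic(Map<String, Double> likelihoods) {
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : likelihoods.entrySet()) {
            if (e.getValue() > bestValue) {
                best = e.getKey();
                bestValue = e.getValue();
            }
        }
        return best;
    }
}
```

Applied to every row in turn, `mostLikelyTopic` assigns each document j its highest-likelihood topic.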

Protocol

[Figure: GDTM protocol]

Contributors

  1. Kambiz Ghoorchian
  2. Magnus Sahlgren
  3. Magnus Boman

Acknowledgement
