stories

Clustering of textual documents with time window

How to install

  1. Install cargo (see the cargo documentation).

  2. Install stories:

cargo install --git https://github.com/medialab/stories.git
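
Before running the commands in the next section, it can help to confirm that the tools are reachable from your shell. The following is a minimal sketch, not part of stories itself: `missing_tools` is a hypothetical helper name, and the list of tools (cargo, stories, and xsv, which appears in the evaluation step) is taken from the commands in this README.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# cargo builds stories; xsv is used in the evaluation step.
missing=$(missing_tools cargo stories xsv)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing >&2
fi
```

If stories is reported as missing right after installation, check that Cargo's bin directory (by default `~/.cargo/bin`) is on your PATH.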

How to run

Extract vocabulary

stories vocab my_file.csv --ngrams 2 > my_vocab.csv

Determine time window

WINDOW=$(stories window my_file.csv --raw)

Apply clustering algorithm

stories nn my_vocab.csv my_file.csv -w "$WINDOW" --ngrams 2 --threshold 0.65 > nn.csv

Evaluate cluster quality

This step uses xsv to join the input file with the clustering output before evaluation.

xsv join --left id my_file.csv id nn.csv | xsv select id,created_at,nearest_neighbor,thread_id,distance > nn_dated.csv
stories eval my_labels.csv nn_dated.csv --datecol created_at
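
The four steps above can be chained in a single script. The sketch below uses only the commands and flags shown in this README; the function name `run_stories_pipeline` and its two arguments are hypothetical, and the output file names are the ones used in the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps shown above.
# The body runs in a subshell so that `set -eu` stops at the first
# failing step without changing the options of the calling shell.
run_stories_pipeline() (
    set -eu
    input=$1    # CSV of documents (the join below assumes an id column)
    labels=$2   # CSV of reference labels passed to `stories eval`

    # 1. Extract vocabulary
    stories vocab "$input" --ngrams 2 > my_vocab.csv

    # 2. Determine time window
    window=$(stories window "$input" --raw)

    # 3. Apply clustering algorithm
    stories nn my_vocab.csv "$input" -w "$window" --ngrams 2 --threshold 0.65 > nn.csv

    # 4. Evaluate cluster quality
    # Note: plain sh only reports the exit status of the last command in a pipe.
    xsv join --left id "$input" id nn.csv \
        | xsv select id,created_at,nearest_neighbor,thread_id,distance > nn_dated.csv
    stories eval "$labels" nn_dated.csv --datecol created_at
)
```

It would be called as `run_stories_pipeline my_file.csv my_labels.csv`, and it writes `my_vocab.csv`, `nn.csv`, and `nn_dated.csv` to the current directory.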
