Skip to content

rybesh/dth-topics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Requirements

make comes standard on Unix systems including MacOS; on Windows it will need to be installed. The other programs will need to be installed according to the instructions linked above.

Usage

Clone this repository:

git clone git@github.com:rybesh/dth-topics.git

All commands must be run from the dth-topics directory:

cd dth-topics

To download OCR data for newspaper pages from the Digital NC Daily Tar Heel archive:

make ocr

To install MALLET and use it to train topic models on the OCR data:

make models

To create visualizations of the topic models:

make viz

To view the visualizations for the n-topics model (e.g. 10-topics, 100-topics), open viz/n-topics/viz.html.

To create lists of the top (most closely associated) documents for each topic:

make top

To view the top documents per topic for the n-topics model (e.g. 10-topics, 100-topics), open viz/n-topics/topdocs.html.

About

topic modeling the Daily Tar Heel

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages