Block LDA (see http://www.cs.cmu.edu/~rbalasub/publications/sdm_2011.pdf for what this program tries to do)
It also includes some brand-new regularization code, which can be ignored for now.
Author: Ramnath Balasubramanyan (rbalasub@cs.cmu.edu)
INSTALLATION
-------------
1. Unzip the tarball (which you probably did already given that you are reading this README :)
2. Install dlib
3. Change path to dlib in the Makefile
4. mkdir lib bin
5. make
HOW TO RUN
-----------
To run Block LDA in its most basic mode, you need to supply a training file with the network edges and/or a document corpus.
A sample run of the program is:
bin/link_lda --link_train_file SampleData/yeast.ppi \
--train_file SampleData/yeast.pubs \
--topics 8 \
--output_prefix BlockLDA-Output/yeast \
--link_attrs 1,1 \
--num_entity_types 3
The first two flags specify the network and the corpus to train on.
The link_attrs flag indicates that links in the network are between proteins, which are field #1 (0-based) in the document corpus.
The last flag tells the program that there are three types of attributes in the corpus: words, proteins and authors.
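The flag values in the sample run can be derived rather than typed by hand. The following is a minimal sketch, not part of this package: build_link_lda_command and ENTITY_TYPES are hypothetical names, and the entity order (words, proteins, authors) is assumed from the description above. It uses only the flags shown in the sample run.

```python
import subprocess

# Entity types in corpus field order (0-based), as assumed from this README.
ENTITY_TYPES = ["words", "proteins", "authors"]


def build_link_lda_command(link_file, corpus_file, output_prefix, topics,
                           link_entity="proteins"):
    """Return the argument list for a basic bin/link_lda run (hypothetical helper)."""
    # Both endpoints of every link are the same entity type, e.g. proteins -> "1,1".
    field = ENTITY_TYPES.index(link_entity)
    return [
        "bin/link_lda",
        "--link_train_file", link_file,
        "--train_file", corpus_file,
        "--topics", str(topics),
        "--output_prefix", output_prefix,
        "--link_attrs", f"{field},{field}",
        "--num_entity_types", str(len(ENTITY_TYPES)),
    ]


def run_link_lda(cmd):
    """Run the command; raises CalledProcessError if link_lda exits non-zero."""
    subprocess.run(cmd, check=True)


cmd = build_link_lda_command("SampleData/yeast.ppi", "SampleData/yeast.pubs",
                             "BlockLDA-Output/yeast", topics=8)
print(" ".join(cmd))
```

Calling run_link_lda(cmd) from the top-level directory would then launch the run, assuming make has already produced bin/link_lda.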
A report of the run will be saved in BlockLDA-Output/yeast.report.
In general, BlockLDA-Output/yeast.* contains a whole bunch of output files, most of which should be easy to interpret from their names.
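To see what a run produced, the output files can be listed by prefix. This is a small sketch, assuming every output file is named after the --output_prefix value followed by a dot and a suffix (BlockLDA-Output/yeast in the sample run, which is where the visualization step looks for the report). list_outputs is a hypothetical helper, not something shipped with this code.

```python
import glob


def list_outputs(output_prefix):
    """Return the sorted paths of all files named <output_prefix>.<suffix>."""
    # glob.escape protects any wildcard characters that appear in the prefix itself.
    return sorted(glob.glob(glob.escape(output_prefix) + ".*"))


for path in list_outputs("BlockLDA-Output/yeast"):
    print(path)
```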
bin/link_lda --help will give you some info on how to specify other configuration values.
VISUALIZATION
--------------
python/view_model.py --report=BlockLDA-Output/yeast.report \
--entity_names='words,proteins,authors' \
--dict_prefix=SampleData/yeast.dict \
--output_prefix=BlockLDA-Output/HTML/yeast \
--index \
--topics \
--title "Yeast"
Navigate to BlockLDA-Output/HTML/yeast_index.html in a browser to see topics.
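If you prefer to open the index page from a script, something like the following should work. It is only a sketch: index_url and open_index are hypothetical helpers, and the file name is assumed to be the view_model.py --output_prefix value with "_index.html" appended, as in the path above.

```python
import pathlib
import webbrowser


def index_url(output_prefix):
    """Return a file:// URL for <output_prefix>_index.html."""
    # resolve() makes the path absolute, which as_uri() requires.
    return pathlib.Path(output_prefix + "_index.html").resolve().as_uri()


def open_index(output_prefix):
    """Open the index page in the default browser; returns False if none is available."""
    return webbrowser.open(index_url(output_prefix))


# Usage, run from the top-level directory:
# open_index("BlockLDA-Output/HTML/yeast")
```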
WARNING
---------
This code is very much a work in progress and is not very friendly to anyone other than the author.
Please email me if you have any problems running it. Also please don't distribute it to anyone else just yet. I plan to clean it up and release it publicly soon.