Refer to run.sh for sample usage.

-DataDir DIR
Use all txt files in DIR as data. The data files are organized such that
all instances corresponding to one class are listed in a single file
named with its label. Each line in the text file is treated as a single
data instance. If the test labels are unknown, all the test data may be
put together into a single file with the same formatting. If the labels
are known, then use the same filename as you would for training data,
but prefix it with "test_". For example, all training data for the class
"action_thriller" goes into "action_thriller.txt" and test data of type
"action_thriller" goes into "test_action_thriller.txt".

If test labels are known, jrae prints the final accuracy figures
when the test completes.
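
The file layout described for -DataDir can be produced with a few lines of Python. This is only a sketch: the helper name write_class_file and the UTF-8 encoding are assumptions made for illustration, not part of jrae.

```python
from pathlib import Path


def write_class_file(data_dir, label, instances, is_test=False):
    """Write one data instance per line into <label>.txt or test_<label>.txt.

    Hypothetical helper: jrae only requires that the files exist in this
    layout, it does not ship a function like this one.
    """
    prefix = "test_" if is_test else ""
    path = Path(data_dir) / f"{prefix}{label}.txt"
    # Each instance must be a single line, since every line in the file
    # is treated as one data instance.
    for text in instances:
        if "\n" in text:
            raise ValueError("an instance must not contain a newline")
    path.write_text("\n".join(instances) + "\n", encoding="utf-8")
    return path
```

Calling write_class_file(dir, "action_thriller", lines) yields "action_thriller.txt", and passing is_test=True yields "test_action_thriller.txt" in the same directory.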

-minCount MINCOUNT
Default value is 5. Only words that occur at least MINCOUNT times
are included in the vocabulary.
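
The effect of the MINCOUNT threshold can be illustrated as follows. This is a sketch under assumptions: whitespace tokenization and the function name build_vocabulary are illustrative and may differ from what jrae does internally.

```python
from collections import Counter


def build_vocabulary(lines, min_count=5):
    """Return the set of words that occur at least min_count times.

    Assumption: words are separated by whitespace. jrae's own
    tokenization may differ.
    """
    counts = Counter(token for line in lines for token in line.split())
    return {word for word, count in counts.items() if count >= min_count}
```

With min_count=2, the lines "a a b" and "a b c" keep "a" and "b" and drop "c", which occurs only once.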

-TrainModel TRUE|FALSE
Indicates whether to train an RAE on the data provided by the -DataDir
option.

-ModelFile FILE
This parameter is compulsory. FILE indicates the file into which the
trained RAE model is saved if -TrainModel is set to TRUE. If not, it
indicates the model file to be loaded to perform testing or feature
extraction.

-ClassifierFile FILE
This parameter is compulsory. FILE indicates the file into which the
trained classifier is saved if -TrainModel is set to TRUE. If not, it
indicates the classifier file to be loaded to perform testing.

-FeaturesOutputFile FILE
This parameter is used only if not training. When not training, one of
-FeaturesOutputFile or -ProbabilitiesOutputFile is compulsory. The system
extracts features for the test data using the model and saves them in FILE.

NOTE :: FILE should either not be a txt file or it should not point to the
same directory as the -DataDir option. Otherwise, the next time jrae
processes the directory, it will read in the features as training data.

-ProbabilitiesOutputFile FILE
This parameter is used only if not training. When not training, one of
-FeaturesOutputFile or -ProbabilitiesOutputFile is compulsory. The system
classifies the test instances and saves the probabilities of each item
belonging to each class into FILE. The classes are numbered 0 to K-1,
and each number indicates a class label. The ordering of the class labels
is the alphabetical ordering of files in the -DataDir option. This will be
fixed in the next commit to list the labels themselves instead of the
label index.

NOTE :: FILE should either not be a txt file or it should not point to the
same directory as the -DataDir option. Otherwise, the next time jrae
processes the directory, it will read in the probabilities as training data.
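
Until the labels themselves are written out, the class index can be mapped back to a label by sorting the file names in the data directory. The sketch below makes two assumptions that are not confirmed here: files prefixed with "test_" are not counted as separate classes, and Python's default string sort matches the alphabetical ordering jrae uses. The function name label_order is illustrative.

```python
from pathlib import Path


def label_order(data_dir):
    """Map class index (0 to K-1) to label name.

    Assumptions: "test_" files do not define classes, and sorted()
    reproduces jrae's alphabetical ordering of the files.
    """
    labels = sorted(
        path.stem
        for path in Path(data_dir).glob("*.txt")
        if not path.name.startswith("test_")
    )
    return dict(enumerate(labels))
```

If the result looks wrong for your data, the "labels.map" file described under -TreeDumpDir is the place to check the actual ordering.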

-TreeDumpDir DIR
Set DIR to point to a directory where you want to dump out all
the trees. You can use this parameter both during training and testing.
For each data item, jrae writes out three files as follows, with each line
containing information about a subtree of the RAE model
(# stands for the data item number, with index starting at 1):
* sent#_strings.txt
  Each line is of the form <n word1 word2 ... wordn>,
  where n is the number of words in the subtree.

* sent#_classifierOutput.txt
  The probabilities emitted by the classifier for each class the
  subtree may belong to, listed in increasing order of label.
  NOTE: The ordering of the labels is listed in "labels.map".

* sent#_nodeVecs.txt
  Each line contains the features calculated by the RAE model at
  a node. This is the feature of the entire subtree underneath
  that node.

There is also a treeStructures.txt file which lists the tree structure
built by the RAE, one data item per line. The first "n" values indicate
the index of the parent of the individual tokens in the data item. The
next "n-1" entries each correspond to the internal nodes built by the
RAE model.
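
The dumped files can be read back with a short script. The sketch below is based only on the description above and fills several gaps with assumptions: values are whitespace-separated, the angle brackets in the strings format may or may not be literal, node indices start at index_base, and each of the "n-1" internal-node entries holds that node's parent index (with some marker value for the root). Both function names are illustrative.

```python
from collections import defaultdict


def parse_strings_line(line):
    """Parse one line of sent#_strings.txt into (n, words)."""
    # Strip surrounding angle brackets in case they are written literally.
    tokens = line.strip().strip("<>").split()
    n, words = int(tokens[0]), tokens[1:]
    if len(words) != n:
        raise ValueError(f"expected {n} words, found {len(words)}")
    return n, words


def parse_tree_line(line, index_base=1):
    """Parse one line of treeStructures.txt into (n, children).

    Assumption: the line holds 2n-1 parent indices, n for the tokens
    followed by n-1 for the internal nodes. The returned dict maps a
    parent index to the list of its child node indices.
    """
    parents = [int(value) for value in line.split()]
    n = (len(parents) + 1) // 2
    if len(parents) != 2 * n - 1:
        raise ValueError("expected 2n-1 entries on the line")
    children = defaultdict(list)
    for offset, parent in enumerate(parents):
        children[parent].append(offset + index_base)
    return n, dict(children)
```

For a three-token item whose line reads "4 4 5 5 -1", this groups tokens 1 and 2 under node 4, and token 3 and node 4 under node 5. The -1 root marker in that example is hypothetical.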

-NumCores NUMCORES
Indicates how many parallel threads to use for feature processing.
Ideally it is the same as the number of cores on the machine, but never
more. If this field is not set, jrae automatically sets this value to
the number of cores on the processor.
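
When a wrapper script generates the NUMCORES value, the advice above amounts to capping the request at the machine's core count. A minimal sketch, assuming a hypothetical helper named effective_thread_count (jrae does its own defaulting when the option is omitted):

```python
import os


def effective_thread_count(requested=None):
    """Pick a NUMCORES value that never exceeds the available cores."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    if requested is None:
        return cores
    # At least one thread, and never more than the number of cores.
    return max(1, min(requested, cores))
```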

-NumFolds NUMFOLDS
While doing a full demo run, indicates the number of folds to split
the data into. Ignored by all other interfaces.
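
Splitting data into folds can be pictured as below. How jrae actually assigns items to folds (shuffling, stratification by class) is not stated here, so the round-robin assignment and both function names are assumptions for illustration only.

```python
def split_into_folds(items, num_folds):
    """Assign items to num_folds folds in round-robin order."""
    if num_folds < 1:
        raise ValueError("num_folds must be at least 1")
    return [items[i::num_folds] for i in range(num_folds)]


def train_test_splits(items, num_folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    folds = split_into_folds(items, num_folds)
    for held_out in range(num_folds):
        test = folds[held_out]
        train = [
            item
            for index, fold in enumerate(folds)
            if index != held_out
            for item in fold
        ]
        yield train, test
```

Each item appears in exactly one test fold, so every data instance is evaluated once across the run.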

-MaxIterations MAXITERATIONS
The number of iterations for training the RAE. The default is 80.

-embeddingSize EMBEDDINGSIZE
The RAE performs feature extraction by first embedding each word in
the vocabulary into a high dimensional real space. EMBEDDINGSIZE is the
dimensionality of that space. Defaults to 50. Higher values may severely
increase the training time.

-alphaCat ALPHA
ALPHA is in [0,1] and indicates the balance between optimizing for
classification loss and auto-encoder loss. In detail: feature
learning is performed by minimizing the reconstruction error. It is
also possible to minimize a classification loss, which indicates
how informative a single word is of the final class the sentence
belongs to. Defaults to 0.2.

For example, a word like "awesome" is highly indicative of a good
review. If your corpus is full of such words, increase the value of
this parameter.

-Beta BETA
BETA is in [0,1] and indicates the weight on the classification loss.
Defaults to 0.5.

-lambdaW LAMBDAW
All lambdas are weights on the regularization terms. LAMBDAW is the weight
on the matrices W1 - W4 (refer to the paper for more details).

-lambdaL LAMBDAL
All lambdas are weights on the regularization terms. LAMBDAL is the weight
on the embedding We (refer to the paper for more details).

-lambdaCat LAMBDACAT
All lambdas are weights on the regularization terms. LAMBDACAT is the
weight on the classifier weights.

-lambdaRAE LAMBDARAE
All lambdas are weights on the regularization terms. LAMBDARAE is the
weight on the classifier weights. This differs from LAMBDACAT in that
it is applied in the second phase, where the RAE is being fine-tuned
(refer to the paper for more details).

-CurriculumLearning
Set to False by default. Set to True to turn on curriculum
learning. Refer to Bengio, Y., ICML 09 for more details.

--help
Print this message and quit.