LatentDirichletAllocationPreprocessing

This tool preprocesses data for Latent Dirichlet Allocation (LDA) topic modeling. Its output is intended for use with the LDA C++ project available in my list of projects.

Usage

python LDA.py createKFold -h

usage: LDA.py createKFold [-h] [--test_frac TEST_FRAC] [--k K]
                          [--output OUTPUT] [--debug]
                          corpus vocabulary

positional arguments:
  corpus                path to the corpus file
  vocabulary            path to the vocabulary file

optional arguments:
  -h, --help            show this help message and exit
  --test_frac TEST_FRAC fraction of corpus documents to retain as test set
                        (default: 0)
  --k K                 number of folds (default: 1, no validation set)
  --output OUTPUT       output files directory (default: same directory of
                        input)
  --debug               debug flag (default: false)
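
The help text above does not say how the split is computed. As a rough illustration of what `--test_frac` and `--k` describe, the sketch below holds out a fraction of the documents as a test set and deals the remainder into k folds. The function name `split_corpus`, the `seed` parameter, and the shuffle-then-slice strategy are assumptions made for this example and are not taken from LDA.py.

```python
import random


def split_corpus(doc_ids, test_frac=0.0, k=1, seed=0):
    """Hypothetical helper (not part of LDA.py): hold out a test set,
    then distribute the remaining document IDs over k folds."""
    if not 0.0 <= test_frac < 1.0:
        raise ValueError("test_frac must be in [0, 1)")
    if k < 1:
        raise ValueError("k must be at least 1")

    ids = list(doc_ids)
    # Assumption: documents are shuffled before splitting.
    random.Random(seed).shuffle(ids)

    n_test = int(len(ids) * test_frac)
    test, rest = ids[:n_test], ids[n_test:]

    # Round-robin assignment keeps fold sizes within one document of each other.
    folds = [rest[i::k] for i in range(k)]
    return test, folds


test_ids, folds = split_corpus(range(1, 11), test_frac=0.2, k=4)
print(len(test_ids), [len(fold) for fold in folds])
```

With the defaults (`test_frac=0`, `k=1`) this sketch returns an empty test set and a single fold holding every document, which matches the "no validation set" default described above.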

Input corpus and vocabulary

The corpus and vocabulary files must follow the format of the Bag of Words datasets available at http://archive.ics.uci.edu/ml/datasets/Bag+of+Words.
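
For orientation, the sketch below reads files in that format. It assumes the layout documented for those datasets: the corpus file starts with three header lines (number of documents, vocabulary size, number of nonzero counts), followed by one `docID wordID count` triple per line with 1-based IDs, and the vocabulary file lists one word per line, where line n belongs to wordID n. The helper names `read_docword` and `read_vocab` and the sample data are made up for this example and do not come from LDA.py.

```python
def read_vocab(lines):
    """Return the vocabulary as a list; entry i corresponds to wordID i + 1."""
    return [line.strip() for line in lines if line.strip()]


def read_docword(lines):
    """Parse a corpus file into (header, {docID: {wordID: count}})."""
    stripped = (line.strip() for line in lines)
    it = (line for line in stripped if line)

    # Header: number of documents, vocabulary size, number of nonzero counts.
    n_docs, n_words, nnz = [int(next(it)) for _ in range(3)]

    docs = {}
    for line in it:
        doc_id, word_id, count = map(int, line.split())
        docs.setdefault(doc_id, {})[word_id] = count
    return (n_docs, n_words, nnz), docs


# Tiny made-up corpus: 3 documents, 4 vocabulary words, 5 nonzero counts.
sample_docword = """3
4
5
1 1 2
1 3 1
2 2 4
3 1 1
3 4 3
""".splitlines()
sample_vocab = ["apple", "banana", "cherry", "date"]

header, docs = read_docword(sample_docword)
vocab = read_vocab(sample_vocab)

# Print the words of document 1 with their counts.
for word_id, count in sorted(docs[1].items()):
    print(vocab[word_id - 1], count)
```

Both functions accept any iterable of lines, so an open file object can be passed in place of the sample lists.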

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
