Classifier

Automated document classifier using the Complementary Naive Bayes algorithm.

Trainer.java takes an unprocessed data set and produces a processed data set in a file format suitable for Mahout. It is responsible for training the Complementary Naive Bayes algorithm and building a statistical model.
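For orientation, the statistical model can be written in the commonly cited formulation of Complement Naive Bayes (Rennie et al., 2003). This is a sketch of that textbook form, not a description of Trainer.java or of Mahout's internals; mapping the Alpha setting onto the smoothing terms below is an assumption. Here $d_{ij}$ is the count of term $i$ in training document $j$, $y_j$ is that document's category, $\alpha_i$ is the per-term smoothing value, and $\alpha = \sum_i \alpha_i$.

$$
\hat{\theta}_{ci} = \frac{\alpha_i + \sum_{j : y_j \neq c} d_{ij}}{\alpha + \sum_{j : y_j \neq c} \sum_{k} d_{kj}},
\qquad
w_{ci} = \log \hat{\theta}_{ci}
$$

$$
\hat{y}(t) = \arg\min_{c} \sum_{i} t_i \, w_{ci}
$$

The estimate for category $c$ is built from the documents that do not belong to $c$, so a test document with term counts $t_i$ is assigned to the category whose complement it matches least.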

Classifier.java takes a directory of unclassified documents and classifies them. It creates a separate subdirectory for each category and writes each file into the subdirectory of its assigned category.
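The output step described above can be sketched as follows. This is a minimal illustration, not code from Classifier.java: the `CategoryRouter` class and its `route` method are hypothetical names, and the category label is passed in as a plain string in place of a real model prediction.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not taken from Classifier.java.
final class CategoryRouter {

    // Copies `document` into outputDir/<category>/, creating the subdirectory if needed.
    static Path route(Path document, String category, Path outputDir) throws IOException {
        Path categoryDir = outputDir.resolve(category);
        Files.createDirectories(categoryDir); // does nothing if the directory already exists
        Path target = categoryDir.resolve(document.getFileName());
        return Files.copy(document, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for IpDirPath and OpDirPath.
        Path in = Files.createTempDirectory("unclassified");
        Path out = Files.createTempDirectory("classified");
        Path doc = Files.write(in.resolve("review1.txt"),
                "great movie".getBytes(StandardCharsets.UTF_8));
        // "pos" stands in for whatever label the trained model would return.
        System.out.println(route(doc, "pos", out));
    }
}
```

Copying rather than moving leaves the input directory untouched, so a run can be repeated; whether the original tool copies or moves files is not stated here.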

Setting Up Parameters in the settings.properties File

Bayes parameters

Gramsize=2 // N-gram size
Algorithm=cbayes // Our classification algorithm
DefaultCategory=unknown // Default category
DataSource=hdfs // Hadoop file system
Encoding=UTF-8 // Unicode
Alpha=1.0 // Smoothing parameter
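A minimal sketch of reading these parameters with `java.util.Properties` follows. It is an illustration, not the project's own loader, and `SettingsLoader` is a hypothetical name. One detail worth knowing: the `.properties` format only treats lines starting with `#` or `!` as comments, so a trailing `// ...` annotation becomes part of the value. The sketch assumes the annotations are present in the real file and strips them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not taken from the project.
final class SettingsLoader {

    // Loads key=value pairs and drops a trailing "// ..." annotation from each value.
    // Whitespace is required before "//" so that values such as "hdfs://host" survive.
    static Properties load(Reader source) throws IOException {
        Properties raw = new Properties();
        raw.load(source);
        Properties cleaned = new Properties();
        for (String key : raw.stringPropertyNames()) {
            String value = raw.getProperty(key).replaceFirst("\\s+//.*$", "").trim();
            cleaned.setProperty(key, value);
        }
        return cleaned;
    }

    public static void main(String[] args) throws IOException {
        String text = "Gramsize=2 // N-gram size\n"
                + "DefaultCategory=unknown // Default category\n"
                + "Alpha=1.0 // Smoothing parameter\n";
        Properties settings = load(new StringReader(text));
        int gramSize = Integer.parseInt(settings.getProperty("Gramsize", "2"));
        double alpha = Double.parseDouble(settings.getProperty("Alpha", "1.0"));
        System.out.println(gramSize + " " + alpha + " " + settings.getProperty("DefaultCategory"));
    }
}
```

In a real run, the `StringReader` would be replaced by a reader opened on settings.properties with the encoding the file uses.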

For Trainer.java

TrainSet=/home/developer/dataset_rev/freshrevs/train/ // Training set location, containing one subdirectory per category
ProcessedSet=/home/developer/dataset_rev/freshrevs/processedTrain/ // Processed output directory
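The expected training layout, one subdirectory per category under TrainSet, can be checked before training with a short sketch like the one below. `TrainSetLayout` and `documentsPerCategory` are hypothetical names, and treating each subdirectory name as the category label is an assumption based on the description above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not taken from Trainer.java.
final class TrainSetLayout {

    // Maps each category (subdirectory name) to the number of regular files directly inside it.
    static Map<String, Long> documentsPerCategory(Path trainSet) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        try (DirectoryStream<Path> categories =
                     Files.newDirectoryStream(trainSet, Files::isDirectory)) {
            for (Path category : categories) {
                try (Stream<Path> files = Files.list(category)) {
                    long documents = files.filter(Files::isRegularFile).count();
                    counts.put(category.getFileName().toString(), documents);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: TrainSetLayout <TrainSet directory>");
            return;
        }
        // Prints one "category=count" entry per subdirectory, sorted by category name.
        System.out.println(documentsPerCategory(Paths.get(args[0])));
    }
}
```

A category with a count of zero, or a very uneven spread of counts, is worth noticing before building the model.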

For Classifier.java

ModelPath=/home/developer/dataset_rev/freshrevs/model/ // Path used to store and retrieve the model
IpDirPath=/home/developer/dataset_rev/freshrevs/test/pos/ // Unclassified data set
OpDirPath=/home/developer/dataset_rev/freshrevs/classified/ // Path where classified documents are stored

rsudharshan/Classifier
