rsudharshan/Classifier

Classifier

Automated document classifier using the Complementary Naive Bayes algorithm.

Trainer.java takes an unprocessed data set and converts it into the file format that Mahout expects. It then trains the Complementary Naive Bayes algorithm and builds a statistical model.
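As an illustration of the preprocessing step, the sketch below turns one raw document into a single training line. It assumes a "label, tab, space-separated tokens" layout, which is how Mahout's older Bayes input files are commonly described; the exact format Trainer.java writes is not stated here, and the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not taken from Trainer.java.
final class TrainingLineSketch {

    /**
     * Builds one training line: the category label, a tab, then the
     * lower-cased document text with punctuation and line breaks
     * collapsed into single spaces. (Assumed format.)
     */
    static String toTrainingLine(String label, String rawText) {
        String normalized = rawText
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ") // keep letters and digits only
                .trim();
        return label + "\t" + normalized;
    }
}
```

Under this assumption, each category subdirectory of the training set would contribute one such line per document, with the subdirectory name used as the label.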

Classifier.java takes a directory of unclassified documents and classifies them. It creates a separate subdirectory for each category and writes each document into the subdirectory of its assigned category.
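The output layout described above can be sketched with java.nio.file. This is a minimal illustration, not the code in Classifier.java: the class and method names are hypothetical, and falling back to an "unknown" folder when no label is available is an assumption modelled on the DefaultCategory setting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not taken from Classifier.java.
final class CategoryWriterSketch {

    /** Computes <outputDir>/<category>/<file name> for a classified document. */
    static Path targetFor(Path outputDir, String category, Path document) {
        // Assumption: documents without a usable label go to "unknown".
        String label = (category == null || category.trim().isEmpty())
                ? "unknown"
                : category.trim();
        return outputDir.resolve(label).resolve(document.getFileName());
    }

    /** Creates the category subdirectory if needed and copies the document into it. */
    static Path copyToCategory(Path outputDir, String category, Path document)
            throws IOException {
        Path target = targetFor(outputDir, category, document);
        Files.createDirectories(target.getParent());
        return Files.copy(document, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Copying rather than moving leaves the input directory untouched, which makes it easy to rerun the classifier with a different model.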

Setting up parameters in the settings.properties file

Bayes parameters

    Gramsize=2               // N-gram size
    Algorithm=cbayes         // Our classification algorithm
    DefaultCategory=unknown  // Default category
    DataSource=hdfs          // Hadoop File System
    Encoding=UTF-8           // Unicode
    Alpha=1.0                // Smoothing parameter
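To make the Gramsize setting concrete, here is a minimal sketch of word n-gram extraction. It is an illustration only: it emits grams of exactly the requested size from whitespace-separated tokens, whereas the tokenization Mahout actually applies may differ (for example, it may also keep shorter grams). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating what an n-gram size of 2 means.
final class NGramSketch {

    /** Returns all runs of gramSize consecutive whitespace-separated tokens. */
    static List<String> ngrams(String text, int gramSize) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || gramSize < 1) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        // Slide a window of gramSize tokens across the text.
        for (int i = 0; i + gramSize <= tokens.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + gramSize)));
        }
        return result;
    }
}
```

With a gram size of 2, the text "the quick brown fox" yields "the quick", "quick brown", and "brown fox".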

For Trainer.java

    TrainSet=/home/developer/dataset_rev/freshrevs/train/               // Training set location, containing one subdirectory per category
    ProcessedSet=/home/developer/dataset_rev/freshrevs/processedTrain/  // Processed output directory

For Classifier.java

    ModelPath=/home/developer/dataset_rev/freshrevs/model/       // Path to store and retrieve the model
    IpDirPath=/home/developer/dataset_rev/freshrevs/test/pos/    // Unclassified data set
    OpDirPath=/home/developer/dataset_rev/freshrevs/classified/  // Path to store classified documents
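One way to read these settings is with java.util.Properties. Note that Properties does not treat `//` as a comment marker, so if the annotations shown above are kept on the same line they become part of the value. The sketch below strips them, assuming they are always preceded by whitespace; the class and method names are hypothetical and not taken from the repository.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical loader, not taken from the repository.
final class SettingsSketch {

    /** Parses properties text and drops a trailing " // ..." annotation from each value. */
    static Map<String, String> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> settings = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Requiring whitespace before "//" keeps values such as
            // scheme://host URIs intact.
            String value = props.getProperty(key).replaceFirst("\\s+//.*$", "");
            settings.put(key, value.trim());
        }
        return settings;
    }
}
```

To load the real file, the same parsing can be applied to the contents of settings.properties read as a string, or the annotations can simply be moved onto their own lines starting with `#`, which Properties does recognize as comments.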

About

Semi-Supervised Text/Document Classification using Complementary NaiveBayes
