Text-Classifier

Background

This is a text-classification program that uses supervised learning to predict the author of each document in a corpus. It uses the Naive Bayes classifier, a simple machine learning algorithm, to make its predictions. More information about the classifier can be found here: http://en.wikipedia.org/wiki/Naive_Bayes_classifier. I have not yet tested the program on a large or diverse training/testing dataset, so I do not have reliable figures on the accuracy of the classifier as I have implemented it.
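For readers unfamiliar with the algorithm, here is a minimal sketch of a Naive Bayes author classifier. It is not the code in naivebayes.py: the function names `train` and `predict`, the add-one (Laplace) smoothing, and the choice to ignore words never seen in training are all assumptions made for illustration, and it uses only the standard library rather than NLTK or numpy.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count documents and words per author from (author, words) pairs."""
    doc_counts = Counter()              # author -> number of training documents
    word_counts = defaultdict(Counter)  # author -> word -> occurrences
    vocab = set()
    for author, words in labeled_docs:
        doc_counts[author] += 1
        word_counts[author].update(words)
        vocab.update(words)
    return doc_counts, word_counts, vocab


def predict(words, doc_counts, word_counts, vocab):
    """Return the author with the highest log-posterior score."""
    total_docs = sum(doc_counts.values())
    best_author, best_score = None, -math.inf
    for author in doc_counts:
        # Log prior: fraction of training documents written by this author.
        score = math.log(doc_counts[author] / total_docs)
        total_words = sum(word_counts[author].values())
        for word in words:
            if word not in vocab:
                continue  # assumption: skip words never seen in training
            # Add-one smoothing keeps unseen (author, word) pairs above zero.
            count = word_counts[author][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_author, best_score = author, score
    return best_author
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together. A call such as `predict(["whale", "ship"], *train(docs))` returns the author whose training text makes those words most likely.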

Use

The program uses the Python Natural Language Toolkit (NLTK) and the numpy module. Instructions for installing both on various platforms can be found here: http://nltk.org/install.html. Place the naivebayes.py file in a directory. In the same directory, create two subdirectories: traincorpus, containing the training set to be used by the program, and testcorpus, containing the documents you wish to classify. As written, the program requires every document in both the training and testing corpora to have its author as the first word of the document. The testing documents must include their authors so that the accuracy of the classifier can be calculated. Run ./naivebayes.py and the classifier will execute.
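The "author as the first word" convention can be sketched as follows. This is only an illustration of the expected input format, not the loader in naivebayes.py: the helper names `load_corpus` and `accuracy` are hypothetical, and the sketch assumes one document per file, UTF-8 text, and plain whitespace tokenization.

```python
from pathlib import Path


def load_corpus(directory):
    """Return (author, words) pairs, one per file in `directory`."""
    docs = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Assumption: UTF-8 text split on whitespace.
        tokens = path.read_text(encoding="utf-8").split()
        if not tokens:
            continue  # skip empty files
        # The first word is the author label; the rest is the document body.
        docs.append((tokens[0], tokens[1:]))
    return docs


def accuracy(true_authors, predicted_authors):
    """Fraction of documents whose predicted author matches the label."""
    if not true_authors:
        return 0.0
    correct = sum(t == p for t, p in zip(true_authors, predicted_authors))
    return correct / len(true_authors)
```

Under these assumptions, `load_corpus("traincorpus")` would supply the labeled training documents, and the labels returned by `load_corpus("testcorpus")` would be compared against the predictions with `accuracy`.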
