Skip to content

kevalmorabia97/Naive-Bayes-Text-Classifier-in-Java

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Naive Bayes Classifier in Java

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of independence between every pair of features. Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering.
Sentiment Analysis is the process of determining whether a piece of writing is positive, negative or neutral. It’s also known as opinion mining, deriving the opinion or attitude of a speaker. A common use case for this technology is to discover how people feel about a particular topic.
Your task is to classify whether a given review has a positive or negative tone using naive Bayes classifier.

Dataset used from - http://ai.stanford.edu/~amaas/data/sentiment/
Dataset has 12,500 positive and 12,500 negative reviews of movies for training and testing separately


Reference: Chapter6 - Machine Learning by Tom M. Mitchell Image


  • Steps to find the sentiment of any text:

The code I wrote only gives the sentiment as positive(1) or negative(0)
You cant know how much pos or neg

To get a sentiment, write each text on a line with first integer as the actual sentiment (for testing the accuracy but you can keep it anything if you want to predict) after that each word is of the form: <Index_in_vocab:freq> seperated by space

For example 10 0:2 1:1 3:1 4:1 Meaning the sentiment is 10 (positive) and contains
The 2 times
And 1 time
Of 1 time
To 1 time
(See Vocab.txt file) Put this content in test.data file Again, the value 10 is for test so if you know a sentiment you can verify that the classification should be 1 (pos)

To get the predicted value, you can add a print statement after line 80 in the code and comment out the code from line 81 to 95 which is just for finding the accuracy of prediction on test data

About

Naive Bayes Classification used to classify movie reviews as positive or negative

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages