Naive Bayesian Spam Filter

Basic info

OS: Windows.
Language: Java.
Third-party library: Google Guava. The library is included in the repository, but you have to add the JAR file to your project's libraries manually.
Java competency: Experienced.
Prerequisites: A machine that is able to run Java programs.

The filter takes a training dataset and a text, and produces a binary decision: whether the text is ham or spam. The accuracy depends on the dataset.

Word is the class that represents a word. It records how many occurrences of the word were classified as spam and how many as ham. It also calculates the probability of that word being spam or ham when the parameters totHam and totSpam are passed to one of its functions.
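As a rough illustration of the idea, here is a minimal sketch of such a per-word counter. It is not the code in Word.java: the class name WordSketch and its method names are hypothetical, totSpam and totHam are assumed to be the total numbers of spam and ham word occurrences seen in training, and the add-one smoothing and equal class priors are assumptions made for the example.

```java
/** Hypothetical sketch of a per-word counter; not the repository's Word.java. */
class WordSketch {
    private final String text;
    private int spamCount; // occurrences of this word in spam training text
    private int hamCount;  // occurrences of this word in ham training text

    WordSketch(String text) {
        this.text = text;
    }

    void countSpam() { spamCount++; }

    void countHam() { hamCount++; }

    /**
     * Estimates P(spam | word) from the word's rate in each class,
     * assuming both classes are equally likely a priori.
     */
    double spamProbability(int totSpam, int totHam) {
        // Add-one smoothing (an assumption) keeps unseen words away from 0 and 1.
        double spamRate = (spamCount + 1.0) / (totSpam + 2.0);
        double hamRate = (hamCount + 1.0) / (totHam + 2.0);
        return spamRate / (spamRate + hamRate);
    }

    /** Complement of spamProbability: the estimate of P(ham | word). */
    double hamProbability(int totSpam, int totHam) {
        return 1.0 - spamProbability(totSpam, totHam);
    }

    public static void main(String[] args) {
        WordSketch free = new WordSketch("free");
        free.countSpam();
        free.countSpam();
        free.countSpam();
        free.countHam();
        // Seen 3 times in spam and once in ham, out of 8 occurrences per class.
        System.out.println(free.text + ": " + free.spamProbability(8, 8));
    }
}
```

With these counts the word leans towards spam, with an estimate of roughly two thirds.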
Bayes is the class that reads in two arguments: the first is the training dataset and the second is the text to be classified. It also sums the probabilities of all words to decide whether the text is ham or spam.
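The decision step can be sketched as follows. This is an illustration under stated assumptions, not the code in Bayes.java: the class BayesSketch and its methods are hypothetical, training data is passed in as labelled strings because the format of the dataset file is not described here, and the sketch sums log-probabilities with add-one smoothing, which is a common way to combine per-word evidence in naive Bayes and may differ from how the repository combines them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a naive Bayes decision; not the repository's Bayes.java. */
class BayesSketch {
    private static final int SPAM = 0;
    private static final int HAM = 1;

    // word -> {occurrences in spam, occurrences in ham}
    private final Map<String, int[]> counts = new HashMap<>();
    private final int[] messages = new int[2]; // training messages per class
    private final int[] tokens = new int[2];   // word occurrences per class

    void train(String message, boolean isSpam) {
        int cls = isSpam ? SPAM : HAM;
        messages[cls]++;
        for (String token : message.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty string
            counts.computeIfAbsent(token, k -> new int[2])[cls]++;
            tokens[cls]++;
        }
    }

    /** Returns true when the spam score exceeds the ham score. */
    boolean isSpam(String text) {
        return score(text, SPAM) > score(text, HAM);
    }

    // Log prior plus the sum of log P(word | class) over the words in the text.
    private double score(String text, int cls) {
        double total = messages[SPAM] + messages[HAM];
        double sum = Math.log((messages[cls] + 1.0) / (total + 2.0));
        int vocabulary = counts.size();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            int[] c = counts.get(token);
            int seen = (c == null) ? 0 : c[cls];
            // Add-one smoothing, with one extra slot reserved for unseen words.
            sum += Math.log((seen + 1.0) / (tokens[cls] + vocabulary + 1.0));
        }
        return sum;
    }

    public static void main(String[] args) {
        BayesSketch bayes = new BayesSketch();
        bayes.train("win free money now", true);
        bayes.train("free prize click now", true);
        bayes.train("meeting at noon tomorrow", false);
        bayes.train("lunch tomorrow with the team", false);
        System.out.println(bayes.isSpam("free money"));       // prints true
        System.out.println(bayes.isSpam("meeting tomorrow")); // prints false
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are combined, which is why the sketch adds logarithms instead of multiplying raw probabilities.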
