Naive Bayesian Classification for Golang.
Naive Bayesian Classification with TF-IDF support

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings.

Copyright (c) 2011. Jake Brukhman. (jbrukh@gmail.com). All rights reserved. See the LICENSE file for BSD-style license.

Forked from github.com/jbrukh/bayesian

Added TF-IDF (term frequency–inverse document frequency) weighting, which yields a noticeable accuracy improvement.


Background

See code comments for a refresher on naive Bayesian classifiers.
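As a quick refresher alongside those comments, here is a hedged, self-contained sketch of the core computation (not this library's code): for each class, score a document by the log prior plus the sum of log word likelihoods, estimated from training counts with add-one smoothing.

```go
package main

import (
	"fmt"
	"math"
)

// logScore computes log P(class) + sum of log P(word|class),
// using add-one (Laplace) smoothing over the training counts.
func logScore(words []string, counts map[string]int, total, vocab int, prior float64) float64 {
	score := math.Log(prior)
	for _, w := range words {
		// smoothed estimate of P(word|class)
		p := float64(counts[w]+1) / float64(total+vocab)
		score += math.Log(p)
	}
	return score
}

func main() {
	good := map[string]int{"tall": 1, "rich": 1, "handsome": 1}
	bad := map[string]int{"poor": 1, "smelly": 1, "ugly": 1}
	vocab := 6 // distinct words across both classes

	doc := []string{"tall", "girl"}
	gs := logScore(doc, good, 3, vocab, 0.5)
	bs := logScore(doc, bad, 3, vocab, 0.5)
	fmt.Println(gs > bs) // true: "tall" appears in the Good training set
}
```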


Installation

Using the go command:

```shell
go get github.com/jbrukh/bayesian
go install github.com/jbrukh/bayesian
```

Documentation

See the GoPkgDoc documentation for this package.


Features

  • Conditional probability and "log-likelihood"-like scoring.
  • Underflow detection.
  • Simple persistence of classifiers.
  • Statistics.

Example 1 (plain, no TF-IDF)

To use the classifier, first you must create some classes and train it:

```go
import . "github.com/jbrukh/bayesian"

const (
    Good Class = "Good"
    Bad  Class = "Bad"
)

classifier := NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)
```

Then you can ascertain the scores of each class and the most likely class your data belongs to:

```go
scores, likely, _ := classifier.LogScores([]string{"tall", "girl"})
```

The magnitude of each score indicates likelihood: the class with the greatest score is the most likely. Alternatively (but with some risk of floating-point underflow), you can obtain actual probabilities:

```go
probs, likely, _ := classifier.ProbScores([]string{"tall", "girl"})
```

Example 2 (TF-IDF)

To use the TF-IDF classifier, first create some classes and train it. You must then call ConvertTermsFreqToTfIdf() after all training and before calling any of the classification methods (LogScores, ProbScores, SafeProbScores):

```go
import . "github.com/jbrukh/bayesian"

const (
    Good Class = "Good"
    Bad  Class = "Bad"
)

classifier := NewClassifierTfIdf(Good, Bad) // extra constructor
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)

classifier.ConvertTermsFreqToTfIdf() // IMPORTANT!
```
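The idea behind the conversion step, roughly, is to down-weight words that appear everywhere and up-weight distinctive ones. Below is a hedged, self-contained sketch of one common TF-IDF formulation, not necessarily the library's exact formula:

```go
package main

import (
	"fmt"
	"math"
)

// tfidf weights a term's frequency by how rare the term is across
// documents: tf * log(N / df). One common variant among several.
func tfidf(tf, df, numDocs int) float64 {
	return float64(tf) * math.Log(float64(numDocs)/float64(df))
}

func main() {
	// A word in every document (df = 10 of 10) gets weight zero.
	fmt.Println(tfidf(3, 10, 10)) // 0
	// A word in 1 of 10 documents is weighted strongly.
	fmt.Println(tfidf(3, 1, 10) > 0) // true
}
```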

Then you can ascertain the scores of each class and the most likely class your data belongs to:

```go
scores, likely, _ := classifier.LogScores([]string{"tall", "girl"})
```

The magnitude of each score indicates likelihood: the class with the greatest score is the most likely. Alternatively (but with some risk of floating-point underflow), you can obtain actual probabilities:

```go
probs, likely, _ := classifier.ProbScores([]string{"tall", "girl"})
```

Use wisely.