Naive Bayesian Classification for Golang.
Naive Bayesian Classification with TF-IDF support

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings.

Copyright (c) 2011. Jake Brukhman. (jbrukh@gmail.com). All rights reserved. See the LICENSE file for BSD-style license.

Forked from github.com/jbrukh/bayesian

Added TF-IDF (term frequency–inverse document frequency) weighting, which yields a noticeable accuracy improvement.


Background

See code comments for a refresher on naive Bayesian classifiers.
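As a quick refresher alongside those comments, here is a hedged, self-contained sketch of the core computation (not this library's code): for each class, score a document by the log prior plus the sum of log word likelihoods, estimated from training counts with add-one smoothing.

```go
package main

import (
	"fmt"
	"math"
)

// logScore computes log P(class) + sum of log P(word|class),
// using add-one (Laplace) smoothing over the training counts.
func logScore(words []string, counts map[string]int, total, vocab int, prior float64) float64 {
	score := math.Log(prior)
	for _, w := range words {
		// smoothed estimate of P(word|class)
		p := float64(counts[w]+1) / float64(total+vocab)
		score += math.Log(p)
	}
	return score
}

func main() {
	good := map[string]int{"tall": 1, "rich": 1, "handsome": 1}
	bad := map[string]int{"poor": 1, "smelly": 1, "ugly": 1}
	vocab := 6 // distinct words across both classes

	doc := []string{"tall", "girl"}
	gs := logScore(doc, good, 3, vocab, 0.5)
	bs := logScore(doc, bad, 3, vocab, 0.5)
	fmt.Println(gs > bs) // true: "tall" appears in the Good training set
}
```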


Installation

Using the go command:

```shell
go get github.com/jbrukh/bayesian
go install github.com/jbrukh/bayesian
```

Documentation

See the GoPkgDoc documentation for this package.


Features

  • Conditional probability and "log-likelihood"-like scoring.
  • Underflow detection.
  • Simple persistence of classifiers.
  • Statistics.

Example 1 (plain, no TF-IDF)

To use the classifier, first you must create some classes and train it:

```go
import . "github.com/jbrukh/bayesian"

const (
    Good Class = "Good"
    Bad  Class = "Bad"
)

classifier := NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)
```

Then you can ascertain the scores of each class and the most likely class your data belongs to:

```go
scores, likely, _ := classifier.LogScores([]string{"tall", "girl"})
```

The magnitude of each score indicates likelihood: the class with the greatest score is the most likely. Alternatively (but with some risk of floating-point underflow), you can obtain actual probabilities:

```go
probs, likely, _ := classifier.ProbScores([]string{"tall", "girl"})
```

Example 2 (TF-IDF)

To use the TF-IDF classifier, first create some classes and train it. You must then call ConvertTermsFreqToTfIdf() after all training and before calling any of the classification methods (LogScores, ProbScores, SafeProbScores):

```go
import . "github.com/jbrukh/bayesian"

const (
    Good Class = "Good"
    Bad  Class = "Bad"
)

classifier := NewClassifierTfIdf(Good, Bad) // extra constructor
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)

classifier.ConvertTermsFreqToTfIdf() // IMPORTANT!
```
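The idea behind the conversion step, roughly, is to down-weight words that appear everywhere and up-weight distinctive ones. Below is a hedged, self-contained sketch of one common TF-IDF formulation, not necessarily the library's exact formula:

```go
package main

import (
	"fmt"
	"math"
)

// tfidf weights a term's frequency by how rare the term is across
// documents: tf * log(N / df). One common variant among several.
func tfidf(tf, df, numDocs int) float64 {
	return float64(tf) * math.Log(float64(numDocs)/float64(df))
}

func main() {
	// A word in every document (df = 10 of 10) gets weight zero.
	fmt.Println(tfidf(3, 10, 10)) // 0
	// A word in 1 of 10 documents is weighted strongly.
	fmt.Println(tfidf(3, 1, 10) > 0) // true
}
```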

Then you can ascertain the scores of each class and the most likely class your data belongs to:

```go
scores, likely, _ := classifier.LogScores([]string{"tall", "girl"})
```

The magnitude of each score indicates likelihood: the class with the greatest score is the most likely. Alternatively (but with some risk of floating-point underflow), you can obtain actual probabilities:

```go
probs, likely, _ := classifier.ProbScores([]string{"tall", "girl"})
```

Use wisely.