This repository has been archived by the owner. It is now read-only.
semisupervised naive bayes
site
spec
README
article.rb
core_extensions.rb
main.rb
naive_bayes.rb
parser.rb
perez.bz2
semi_supervised_naive_bayes.rb
shorten_urls.rb
thereg.bz2

README

an experiment in semi-supervised naive bayes text classification

a walkthrough of the project is available at http://matpalm.com/semi_supervised_naive_bayes

in general, run things with
bash> bzcat perez.bz2 thereg.bz2 | shuf | head -300 | ./shorten_urls.rb | ./main.rb

git tag v1_diy_fractions
nominal naive bayes implementation with diy rational arithmetic
fails due to numerical overflow

git tag v2_fractions_using_rational
rewrite using ruby's built-in Rational class
fails for the same reason as v1
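
the README does not say exactly where v1 and v2 overflow. as a rough illustration only (not code from this repo), the sketch below shows why long products of small probabilities are awkward: an exact Rational product grows a huge denominator, the same product in floating point underflows to zero, and a sum of logs stays finite. the 500 words and the 1/100 probability are made-up toy values, and working in log space is a common workaround rather than something the tags above use.

```ruby
# hypothetical sketch, not code from this repo.
# toy assumption: an article of 500 words, each with probability 1/100 under some class.
probs = Array.new(500, Rational(1, 100))

# exact product: stays correct, but the denominator grows with every word
exact = probs.reduce(Rational(1, 1)) { |acc, p| acc * p }
puts exact.denominator.to_s.length   # => 1001 (digits in the denominator)

# the same product in floating point underflows to zero
as_float = probs.reduce(1.0) { |acc, p| acc * p.to_f }
puts as_float                        # => 0.0

# summing logs gives a finite score that can still be compared across classes
log_score = probs.sum { |p| Math.log(p.to_f) }
puts log_score
```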

git tag v3_multinominal_rewrite
rewrite using multinomial naive bayes, explicitly bucketing each article into a single class rather than
retaining a probability distribution per article (a quick test showed that the unlabelled articles almost ALWAYS
ended up with a distribution along the lines of 0.99/0.01 anyway (??))
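
the bucketing idea can be sketched roughly as follows. this is a minimal illustration under stated assumptions, not the code in semi_supervised_naive_bayes.rb: the method names (train, classify, semi_supervised), the laplace smoothing, the log-space scoring and the toy word lists are all made up for the example. each pass retrains on the labelled articles plus the unlabelled articles in their current buckets, and stops once the buckets stop changing.

```ruby
# hypothetical sketch, not the repo's implementation.

# docs_by_class maps a class label to an array of articles (arrays of words)
def train(docs_by_class)
  vocab = docs_by_class.values.flatten.uniq
  total_docs = docs_by_class.values.sum(&:size).to_f
  docs_by_class.map do |klass, docs|
    words = docs.flatten
    counts = Hash.new(0)
    words.each { |word| counts[word] += 1 }
    # laplace smoothing: add one to every count, so add vocab size to the total
    denom = (words.size + vocab.size).to_f
    [klass, { prior: Math.log(docs.size / total_docs), counts: counts, denom: denom }]
  end.to_h
end

# pick the class with the highest log prior plus summed log word likelihoods
def classify(model, words)
  model.max_by do |_klass, m|
    m[:prior] + words.sum { |w| Math.log((m[:counts][w] + 1) / m[:denom]) }
  end.first
end

# returns the final bucket for each unlabelled article and the number of passes run
def semi_supervised(labelled, unlabelled, max_iterations = 10)
  assignment = []
  1.upto(max_iterations) do |iteration|
    # the first pass sees labelled articles only; later passes add the bucketed ones
    training = labelled.map do |klass, docs|
      bucketed = unlabelled.each_index.select { |i| assignment[i] == klass }
                           .map { |i| unlabelled[i] }
      [klass, docs + bucketed]
    end.to_h
    model = train(training)
    new_assignment = unlabelled.map { |doc| classify(model, doc) }
    return [new_assignment, iteration] if new_assignment == assignment
    assignment = new_assignment
  end
  [assignment, max_iterations]
end

# made-up toy articles, labelled after the two data files in this repo
labelled = {
  thereg: [%w[server patch kernel], %w[kernel security patch]],
  perez:  [%w[celebrity gossip photo], %w[photo party gossip]]
}
unlabelled = [%w[kernel server bug], %w[party celebrity dress]]

assignment, iterations = semi_supervised(labelled, unlabelled)
p assignment
p iterations
```

on this toy data the first pass buckets both unlabelled articles and the second pass only confirms the buckets, so nothing changes after the initial assignment.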

TODO
the unlabelled set always converges in a single iteration; this rings some warning bells for me.
even though it works, something isn't right; perhaps the unlabelled articles should be introduced incrementally?
