Naive Bayes classifier #77

Open
UnkindPartition opened this issue May 5, 2016 · 2 comments

@UnkindPartition
Contributor

I wanted to do Bayesian classification with HLearn, but it is no longer here.

I found #59 which gives a bit of background, but it doesn't solve my problem :) So, where do things stand now? Anything we can do to resurrect NBC and other classifiers from hlearn-classification?

@mikeizbicki
Owner

Anything explicitly using probability distributions has still been removed because I still haven't had time to flesh out what I think the proper interface should be, and I don't want to spend time maintaining a less-than-perfect interface.

Is there a particular reason you want naive Bayes? Usually, logistic regression will outperform it, and that is currently implemented (although admittedly not super well documented).

Actually, if you're serious about wanting good machine learning results, you'll be better off with a non-Haskell library for now. I should have brought this up earlier, but I generally strongly warn people against using HLearn in its current state. The interface to most learning algorithms is pretty different from the interfaces provided by other libraries; it's still not super well documented; and I can't promise being able to spend much time helping you through the quirks. I'd really only recommend it at this point if you're interested in exploring alternative design spaces of what machine learning systems could look like.

@UnkindPartition
Contributor Author

At the moment I'm dealing with multi-class classification. I vaguely know that I can use logistic regression for predicting each class separately, but I don't know how well it works in practice (in terms of performance and accuracy). I am very new to machine learning.

I did a prototype in R, but on the real data (~10M observations), R runs out of memory. Streaming is a strength of Haskell (and HLearn), and the algorithm itself is straightforward, so I thought this would be a good fit.

Thanks for the warning though. I guess I may naively try to code a naive Bayes classifier myself and see where this leads me.
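
For illustration, here is a minimal sketch of what a hand-rolled, streaming naive Bayes classifier could look like. It is written in Python for brevity and is not HLearn's API or the removed hlearn-classification code. The class name `StreamingNaiveBayes`, the `alpha` smoothing parameter, and the toy data are all assumptions made for the example. The point it tries to show is that training only needs running counts, so memory grows with the number of distinct (label, feature position, value) combinations rather than with the number of observations.

```python
from collections import Counter, defaultdict
from math import log


class StreamingNaiveBayes:
    """Categorical naive Bayes trained one observation at a time (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing strength (assumed default)
        self.class_counts = Counter()               # label -> number of observations seen
        self.feature_counts = defaultdict(Counter)  # (label, position) -> value -> count
        self.values = defaultdict(set)              # position -> distinct values seen so far

    def update(self, features, label):
        """Fold a single observation into the running counts."""
        self.class_counts[label] += 1
        for i, v in enumerate(features):
            self.feature_counts[(label, i)][v] += 1
            self.values[i].add(v)

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior score."""
        total = sum(self.class_counts.values())

        def score(label):
            n = self.class_counts[label]
            s = log(n / total)  # log prior
            for i, v in enumerate(features):
                count = self.feature_counts.get((label, i), {}).get(v, 0)
                k = len(self.values.get(i, ()))
                s += log((count + self.alpha) / (n + self.alpha * k))
            return s

        return max(self.class_counts, key=score)


# Toy stream; in practice this would be an iterator over rows read from disk.
stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]

model = StreamingNaiveBayes()
for features, label in stream:
    model.update(features, label)

print(model.predict(("rainy", "cool")))
print(model.predict(("sunny", "hot")))
```

Because `update` touches only counters, the same loop works unchanged over a lazily read file of ~10M rows, and it handles multi-class labels directly without training one model per class.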
