naive bayes on a more realistic example? #65

randomgambit · 2018-05-21T18:59:05Z

Hello H2O team. Thanks for this wonderful package!

I was wondering if there is a tutorial somewhere that shows how to use h2o to build a full naive Bayes text classification model.

I see a small example here, but my dataset has many documents (think headlines of articles), so the bag-of-words representation would be (in a regular R session) a sparse matrix or a document-term matrix.

Can h2o manage or create that? Or can h2o only work with a data frame in which the user has already created a small number of dummy variables, one per selected word?

Thanks!

ABartzGit · 2018-05-25T18:09:55Z

Hello Olaf,

Thank you for the kind words about H2O. Unfortunately, we do not currently have the kind of example you’re looking for.

One optimization you can try is to save the representation as an svmlight (aka libsvm) file, which is a sparse format, and load it in. Another option is to use PCA to reduce the dimensionality, and then use NB or a better algorithm like RF or GBM.

You may also find this recent article about the PCA approach interesting/useful:
https://www.hvitfeldt.me/2018/05/using-pca-for-word-embedding-in-r/
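
The svmlight suggestion above can be sketched roughly as follows. This is only an illustration in plain Python, not an H2O API: the helper names (`tokenize`, `build_vocabulary`, `to_svmlight_lines`), the sample headlines, and the labels are all made up, and the 1-based feature indices follow a common svmlight convention that is worth checking against whatever parser reads the file.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a headline and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a 1-based column index, in first-seen order."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab

def to_svmlight_lines(documents, labels, vocab):
    """Yield one 'label index:count ...' line per document.

    Only non-zero counts are written, which is what keeps the format sparse.
    Indices are emitted in ascending order.
    """
    for doc, label in zip(documents, labels):
        counts = Counter(vocab[t] for t in tokenize(doc) if t in vocab)
        features = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
        yield f"{label} {features}".rstrip()

# Hypothetical headlines with made-up numeric class labels.
headlines = [
    "Stocks rally as markets rebound",
    "Team wins the final after extra time",
    "Markets fall as stocks slide",
]
labels = [0, 1, 0]

vocab = build_vocabulary(headlines)
lines = list(to_svmlight_lines(headlines, labels, vocab))
for line in lines:
    print(line)
```

Writing `lines` to a text file, one per row, gives a file that should be loadable with the usual file import. Since svmlight labels are numeric, the response column would likely need converting to a categorical (factor) column before training Naive Bayes.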

Please let me know if any of the above is helpful.

Sincerely,
Angela

MayukhSobo · 2018-09-20T16:40:57Z

Hello,

I feel more examples should be given, including variable importance for Naive Bayes. If you support an algorithm, the support should be complete.
