naive bayes on a more realistic example? #65

Open
randomgambit opened this issue May 21, 2018 · 2 comments

Comments

@randomgambit

Hello H2O team. Thanks for this wonderful package!

I was wondering whether there is a tutorial somewhere that shows how to use h2o to build a full naive Bayes text classification model.

I see a small example here, but my dataset has many documents (think headlines of articles), so the bag-of-words representation would be (in a regular R session) a sparse matrix or a document-term matrix.

Can h2o manage or create that? Or can h2o only work with a dataframe in which the user has already created a small number of dummy variables, one per selected word?

Thanks!

@ABartzGit

Hello Olaf,

Thank you for the kind words about H2O. Unfortunately, we do not currently have the kind of example you're looking for.

One optimization you can try is to save the representation as an svmlight (aka libsvm) file, which is a sparse format, and load it in. Another option is to use PCA to reduce the dimensionality, and then use naive Bayes (NB) or a better algorithm like random forest (RF) or GBM.
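To make the svmlight suggestion concrete, here is a minimal sketch in plain Python of how a handful of headlines could be turned into svmlight-formatted lines (`label index:value ...`, with 1-based, ascending feature indices). The helper names `build_vocabulary` and `to_svmlight_lines`, the whitespace tokenizer, and the sample headlines and labels are all hypothetical and chosen for illustration; how H2O parses the resulting file is not shown here.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a 1-based column index."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1  # svmlight indices start at 1
    return vocab


def to_svmlight_lines(docs, labels, vocab):
    """Return one 'label idx:count ...' line per document (sparse: zeros omitted)."""
    lines = []
    for doc, label in zip(docs, labels):
        counts = Counter(vocab[t] for t in doc.lower().split() if t in vocab)
        # Feature indices must appear in ascending order within a line.
        features = " ".join(f"{idx}:{cnt}" for idx, cnt in sorted(counts.items()))
        lines.append(f"{label} {features}".rstrip())
    return lines


# Hypothetical toy data: 1 = finance headline, 0 = sports headline.
headlines = [
    "stocks rally as markets open",
    "team wins the final",
    "markets fall as stocks slide",
]
labels = [1, 0, 1]

vocab = build_vocabulary(headlines)
svm_lines = to_svmlight_lines(headlines, labels, vocab)
for line in svm_lines:
    print(line)
```

Only the non-zero counts are written, so the file size grows with the number of words per headline rather than with the vocabulary size. Writing `svm_lines` to a text file, one line per document, gives the sparse file described above.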

You may also find this recent article about the PCA approach interesting/useful:
https://www.hvitfeldt.me/2018/05/using-pca-for-word-embedding-in-r/

Please let me know if any of the above is helpful.

Sincerely,
Angela

@MayukhSobo

Hello,

I feel that more examples should be given, including variable importance for Naive Bayes. If you support an algorithm, the support should be complete.
