NB documentation and cross validation #1010
Comments
Sorry, I have two other related questions:
Point taken on the documentation; I will fix it ASAP. On the Bernoulli: there are actually three distributions commonly used in Naive Bayes for text: multinomial, Bernoulli, and "binary multinomial". As discussed in the issues below, the conditional for Bernoulli was not doing anything, and the predict method was also wrong (now fixed; see the code starting at https://github.com/kbenoit/quanteda/blob/master/R/textmodel_NB.R#L202). When fitting the model, the code computes probabilities based on binary occurrence, but prediction on new data needed fixing. See the discussions here:
If this is not clear or you find an error, please start a new issue on Naive Bayes - Bernoulli with specifics; otherwise I will fix the documentation and close this issue.
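To make the distinction between the three variants concrete, here is a minimal, language-agnostic sketch in Python. It is not the quanteda implementation: the function names, the `variant` labels, and the add-one smoothing are all assumptions made for illustration, and the vocabulary is taken from a single class for brevity (a real model would use the vocabulary of the whole training set). It shows the textbook estimates only: multinomial uses raw term counts, "binary multinomial" uses the multinomial formula on per-document presence, and Bernoulli uses document frequencies and also scores absent terms at prediction time.

```python
import math
from collections import Counter


def term_likelihoods(docs, variant, smooth=1.0):
    """Estimate P(term | class) from the tokenised documents of ONE class.

    Hypothetical helper for illustration; `variant` is one of
    "multinomial", "binary_multinomial", "bernoulli".
    """
    vocab = sorted({t for d in docs for t in d})
    if variant == "multinomial":
        # Every occurrence of a term counts.
        counts = Counter(t for d in docs for t in d)
        denom = sum(counts.values()) + smooth * len(vocab)
    elif variant == "binary_multinomial":
        # Each term counts at most once per document, multinomial formula.
        counts = Counter(t for d in docs for t in set(d))
        denom = sum(counts.values()) + smooth * len(vocab)
    elif variant == "bernoulli":
        # Share of documents containing the term (two outcomes: present/absent).
        counts = Counter(t for d in docs for t in set(d))
        denom = len(docs) + 2 * smooth
    else:
        raise ValueError(f"unknown variant: {variant}")
    return {t: (counts[t] + smooth) / denom for t in vocab}


def bernoulli_log_likelihood(doc, probs):
    """Textbook Bernoulli score: absent terms contribute log(1 - p)."""
    present = set(doc)
    return sum(
        math.log(p) if t in present else math.log(1.0 - p)
        for t, p in probs.items()
    )


class_docs = [["a", "a", "b"], ["a", "c"]]
for variant in ("multinomial", "binary_multinomial", "bernoulli"):
    print(variant, term_likelihoods(class_docs, variant))
```

The three variants give different likelihoods for the same documents, which is why fitting on binary occurrence and predicting with a multinomial-style score would be inconsistent.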
I suggest that for cross-validation, you propose a desired behaviour and open it as a new issue.
Hi,
the documentation for textmodel_NB does not explain the different priors, although they are mentioned in the arguments:
And loosely related to this: do you have any recommendations for using cross-validation with quanteda textmodels? At the moment I manually split the data into training and test sets, but it would be very handy to have a quanteda function for CV.
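For reference, the manual split described above generalises to k folds with a few lines of index bookkeeping. The sketch below is in Python and is only an illustration of the idea: `k_fold_indices` is a hypothetical helper, not a quanteda function, and the fold count and seed are arbitrary assumptions. The same index logic could be applied to the rows of a dfm.

```python
import random


def k_fold_indices(n_docs, k=5, seed=42):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: shuffles document indices once, deals them
    into k folds, and uses each fold as the test set exactly once.
    """
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(10, k=5):
    # Fit the model on `train`, predict on `test`, and record the accuracy.
    print(len(train), len(test))
```

Averaging the per-fold accuracy then gives the cross-validated estimate.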
Edit: I also noticed that for distribution = 'Bernoulli', the underlying code seems to automatically convert the dfm to binary:
If so, the related suggestion in the documentation could be removed.