Predicted output coming wrong for textmodel_nb prediction #476

vibhutittu · 2017-01-18T17:34:03Z

I was working on naive Bayes text classification using your package. I was using textmodel_NB, for which you fixed an issue with wrong priors when using prior = "docfreq". The priors are now correct, but the predictions also need to be fixed. Please see the example below, taken from your package documentation:

library(quanteda)
trainingset <- as.dfm(matrix(c(1, 2, 0, 0, 0, 0,
                               0, 2, 0, 0, 1, 0,
                               0, 1, 0, 1, 0, 0,
                               0, 1, 1, 0, 0, 1,
                               0, 3, 1, 0, 0, 1), 
                             ncol=6, nrow=5, byrow=TRUE,
                             dimnames = list(docs = paste("d", 1:5, sep = ""),
                                             features = c("Beijing", "Chinese",  "Japan", "Macao", 
                                                          "Shanghai", "Tokyo"))))
trainingclass <- factor(c("Y", "Y", "Y", "N", NA), ordered = TRUE)
## replicate IIR p261 prediction for test set (document 5)
(nb.p261 <- textmodel_NB(trainingset, trainingclass, prior = "docfreq"))
predict(nb.p261, newdata = trainingset[5, ])

Output:

Fitted Naive Bayes model:
Call:
	textmodel_NB(x = trainingset, y = trainingclass, prior = "docfreq")


Training classes and priors:
   N    Y 
0.25 0.75 

		  Likelihoods:		Class Posteriors:
6 x 4 Matrix of class "dgeMatrix"
                  Y         N          Y         N
Beijing  0.14285714 0.1111111 0.30000000 0.7000000
Chinese  0.42857143 0.2222222 0.39130435 0.6086957
Japan    0.07142857 0.2222222 0.09677419 0.9032258
Macao    0.14285714 0.1111111 0.30000000 0.7000000
Shanghai 0.14285714 0.1111111 0.30000000 0.7000000
Tokyo    0.07142857 0.2222222 0.09677419 0.9032258

The priors are now correct.

Predicted textmodel of type: Naive Bayes

       lp(N)     lp(Y)     Pr(N)  Pr(Y) Predicted
d5 -9.206303 -7.808069    0.1981 0.8019         N

The prediction should be Y, since Pr(Y) > Pr(N), but it is giving N.

Please fix it to get the correct predictions.

kbenoit · 2017-01-18T23:21:32Z

Thanks! Should be ok now.

Note that there was a worse bug: when the Pc was wrongly ordered, it also affected the computation of the fitted likelihoods. That is all corrected now, and the output matches the textbook example.
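The numbers in the original report are consistent with class priors being paired with the wrong class. The base R sketch below is an inference from the posted values, not a description of quanteda's internals. The names `loglik`, `lp_ok` and `lp_bad` are hypothetical, and the likelihoods for the test document d5 are worked out by hand, assuming add-one smoothing.

```r
# Log likelihood of d5 (Chinese x3, Japan x1, Tokyo x1) under each class,
# using add-one smoothed term probabilities from the training documents.
loglik <- c(Y = 3 * log(6 / 14) + 2 * log(1 / 14),
            N = 5 * log(2 / 9))

# Document-frequency priors: three Y documents and one N document.
prior <- c(Y = 0.75, N = 0.25)

# Correct pairing: look the prior up by class name.
lp_ok <- loglik + log(prior[names(loglik)])

# Mis-paired (assumed failure mode): priors sorted alphabetically (N, Y)
# and then applied by position to likelihoods stored in (Y, N) order.
lp_bad <- loglik + log(unname(prior[c("N", "Y")]))

print(round(lp_ok, 6))
print(round(lp_bad, 6))
```

With matched names the two scores agree with the corrected lp(Y) and lp(N) for d5. With the mis-paired priors the two scores are the same pair of values that appeared in the original report, which is why matching by name rather than by position is the safer choice.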

> predict(textmodel_NB(trainingset, trainingclass, prior = "docfreq"))
Predicted textmodel of type: Naive Bayes

       lp(Y)     lp(N)     Pr(Y)  Pr(N) Predicted
d1 -3.928188 -6.591674    0.9348 0.0652         Y
d2 -3.928188 -6.591674    0.9348 0.0652         Y
d3 -3.080890 -5.087596    0.8815 0.1185         Y
d4 -6.413095 -5.898527    0.3741 0.6259         N
d5 -8.107690 -8.906681    0.6898 0.3102         Y
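As a cross-check that does not depend on quanteda, the table above can be reproduced by hand in base R. This is a minimal sketch assuming multinomial naive Bayes with add-one smoothing and document-frequency priors, as in the IIR example. All variable names are illustrative and none of this is the package's own implementation.

```r
# Term counts from the example: rows are documents, columns are features.
x <- matrix(c(1, 2, 0, 0, 0, 0,
              0, 2, 0, 0, 1, 0,
              0, 1, 0, 1, 0, 0,
              0, 1, 1, 0, 0, 1,
              0, 3, 1, 0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("d", 1:5),
                            c("Beijing", "Chinese", "Japan", "Macao",
                              "Shanghai", "Tokyo")))
y <- c("Y", "Y", "Y", "N", NA)   # d5 is the unlabelled test document
train <- !is.na(y)
classes <- c("Y", "N")

# Document-frequency priors, named by class.
prior <- sapply(classes, function(k) mean(y[train] == k))

# Per-class term counts (classes in rows), then add-one smoothing.
counts <- t(sapply(classes, function(k)
  colSums(x[train & y == k, , drop = FALSE])))
lik <- (counts + 1) / rowSums(counts + 1)

# Log score = sum of count * log likelihood, plus the log prior of the
# class in the same column. Priors are indexed by column name on purpose.
lp <- x %*% t(log(lik))
lp <- sweep(lp, 2, log(prior[colnames(lp)]), "+")

# Normalise to posterior probabilities and pick the best class per row.
post <- exp(lp - apply(lp, 1, max))
post <- post / rowSums(post)
predicted <- colnames(lp)[max.col(lp, ties.method = "first")]

print(data.frame(round(lp, 6), round(post, 4), Predicted = predicted))
```

Taking the predicted label from `colnames(lp)` keeps the label tied to the column that actually holds the maximum, so the class order used for display cannot drift from the order used for scoring.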

vibhutittu · 2017-01-19T20:23:11Z

Thanks kbenoit for your quick responses and fixes. The package is really good and fast.

Regards,
Vibhuti Gupta, Graduate Student, Computer Science, Research Assistant (IMMAP), Texas Tech University


kbenoit · 2017-01-19T21:47:36Z

Thanks! Would love to have you describe your experience through feedback in issue #461.

kbenoit closed this as completed in ca927af Jan 18, 2017

kbenoit reopened this Jan 19, 2017

kbenoit closed this as completed Jan 19, 2017

kbenoit mentioned this issue Feb 7, 2017

Error in Naive Bayes with non uniform priors #546

Closed
