Can naive Bayes in sklearn support Good-Turing smoothing? #12862
Comments
Are you talking about multinomial naive Bayes or smoothing the class
distribution? If the justification is clear I'm sure we would consider a PR
implementing alternative smoothing.
I'm sorry for stating my problem unclearly.
I don't see why not. I'm not sure if Good-Turing is the only reasonable pick.
I am new to the machine learning area, but I searched and found the smoothing methods below.
@jnothman How do you see this being implemented? Should we add a parameter to select the smoothing method?
I don't think users will benefit much from having a wide range of choices. But yes, it would be controlled by such a parameter.
Has someone started working on this? I'd like to follow the specific code that goes into it, so I can see how you structure things when adding new features.
Working on it now. I've only tested the code manually, but I'll open a PR so that you can see the structure. I'm open to any suggestions as to which smoothing methods should be implemented. Jelinek-Mercer, Absolute Discounting, and Dirichlet all seem like good choices (their relative performance depends on the size of the training data) and are very straightforward, but I'm not sure how many we want to support.
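For reference, these estimators usually take the following forms (following Zhai & Lafferty's smoothing survey, with notation adapted to class-conditional counts; this is background, not code from the PR):

$$p_{\mathrm{JM}}(w \mid y) = (1-\lambda)\,\frac{c(w,y)}{N_y} + \lambda\, p(w)$$
$$p_{\mathrm{Dir}}(w \mid y) = \frac{c(w,y) + \mu\, p(w)}{N_y + \mu}$$
$$p_{\mathrm{AD}}(w \mid y) = \frac{\max(c(w,y)-\delta,\,0) + \delta\, u_y\, p(w)}{N_y}$$

where $c(w,y)$ is the count of feature $w$ in class $y$, $N_y = \sum_w c(w,y)$, $p(w)$ is the corpus-wide maximum-likelihood estimate of $w$, and $u_y$ is the number of distinct features observed in class $y$.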
@@ -698,10 +700,15 @@ class MultinomialNB(BaseDiscreteNB):
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to
Information Retrieval. Cambridge University Press, pp. 234-265.
https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
+
+ Gale, William A., and Geoffrey Sampson. 1996. Good-Turing Frequency
+ Estimation without Tears. Brighton: University of Sussex.
+ https://www.grsampson.net/AGtf1.html
"""
- def __init__(self, alpha=1.0, fit_prior=True, class_prior=None):
+ def __init__(self, alpha=1.0, smoothing='additive', fit_prior=True, class_prior=None):
self.alpha = alpha
+ self.smoothing = smoothing
self.fit_prior = fit_prior
self.class_prior = class_prior
@@ -714,11 +721,11 @@ class MultinomialNB(BaseDiscreteNB):
def _update_feature_log_prob(self, alpha):
"""Apply smoothing to raw counts and recompute log probabilities"""
- smoothed_fc = self.feature_count_ + alpha
- smoothed_cc = smoothed_fc.sum(axis=1)
-
- self.feature_log_prob_ = (np.log(smoothed_fc) -
- np.log(smoothed_cc.reshape(-1, 1)))
+ if self.smoothing == 'additive':
+ self.feature_log_prob_ = self._additive_smoothing(alpha)
+ elif self.smoothing == 'good-turing':
+ self.feature_log_prob_ = np.apply_along_axis(
+ self._simple_good_turing, 1, self.feature_count_)
def _joint_log_likelihood(self, X):
"""Calculate the posterior log probability of the samples X"""
@@ -728,6 +735,77 @@ class MultinomialNB(BaseDiscreteNB):
return (safe_sparse_dot(X, self.feature_log_prob_.T) +
self.class_log_prior_)
+ def _additive_smoothing(self, alpha):
+ smoothed_fc = self.feature_count_ + alpha
+ smoothed_cc = smoothed_fc.sum(axis=1)
+ return np.log(smoothed_fc) - np.log(smoothed_cc.reshape(-1, 1))
+
+ def _simple_good_turing(self, fc):
+ N = fc.sum()
+ # Get the frequencies of frequencies
+ n = dict()
+ for r in fc:
+ if r == 0:
+ continue
+ n[r] = n.get(r, 0) + 1
+
+ # The probability of unseen samples
+ P_0 = n.get(1, 0)/N
+
+ # Compute the Z values
+ rn = np.array(sorted(n.items()), dtype='int')  # rows (r, n_r) sorted by r
+ r, n = rn[:, 0], rn[:, 1]
+ Z = dict()
+ Z[r[0]] = 2*n[0] / r[1]
+ Z[r[-1]] = n[-1] / (r[-1] - r[-2])
+ for (idx, j) in enumerate(r):
+ if idx == 0 or idx >= len(r) - 1:
+ continue
+ i = r[idx-1]
+ k = r[idx+1]
+ Z[j] = 2*n[idx]/(k-i)
+ Z = np.array(sorted(Z.items()))[:, 1]
+
+ # Compute S using linear regression
+ log_r = np.log10(r).reshape(-1, 1)
+ log_Z = np.log10(Z).reshape(-1, 1)
+ reg = LinearRegression().fit(log_r, log_Z)  # needs: from sklearn.linear_model import LinearRegression
+ a, b = reg.intercept_, reg.coef_[0]
+ S = np.power(10, a+b*np.log10(np.arange(1, np.max(r)+2)))  # S[k-1] is the smoothed estimate for count k
+
+ # Compute r*
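+ # Gale & Sampson's switch rule: use the Turing estimate x while it
+ # differs from the smoothed (LGT) estimate y by more than ~1.96
+ # standard deviations, then switch to y for all larger counts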
+ cease_x = False
+ r_star = np.zeros(len(r))
+ for i in range(len(r)):
+ y = (r[i]+1)*(S[r[i]]/S[r[i]-1])
+ if i >= len(r) - 1 or r[i+1] != r[i] + 1:
+ cease_x = True
+ if cease_x:
+ r_star[i] = y
+ continue
+ x = (r[i]+1)*(n[i+1]/n[i])
+ threshold = 1.96*np.sqrt(((r[i]+1)**2)*(n[i+1]/n[i]**2)*(1+(n[i+1]/n[i])))
+ if np.abs(x-y) <= threshold:
+ r_star[i] = y
+ cease_x = True
+ continue
+ r_star[i] = x
+
+ # Compute the probability of each frequency
+ N_prime = np.inner(n, r_star)
+ p_r = (1-P_0)*(r_star/N_prime)
+
+ # Calculate probability of each feature
+ total_unseen = np.count_nonzero(fc == 0)  # features never seen in this class
+ unseen_prob = P_0/total_unseen if total_unseen else 0.0
+ prob = np.zeros(fc.size)
+ for i in range(len(p_r)):
+ idx = np.where(fc == r[i])
+ prob[idx] = p_r[i]
+ prob[np.where(fc == 0)] = unseen_prob
+
+ return np.log(prob)
+
This is what I have so far. The notation is the same as in the paper linked in the references. Since Good-Turing uses raw counts, the code won't work if the input has been transformed using, for example, TfidfTransformer. Would raising an error be a reasonable solution when the input is not raw counts? Below are some other smoothing algorithms from "Enhancing Naive Bayes with Various Smoothing Methods for Short Text Classification" that might be worth considering.

def jelinek_mercer(fc, alpha):
    # Interpolate the per-class MLE with the corpus-wide MLE
    mle = fc.sum(axis=0)/fc.sum()
    return np.log((1-alpha)*(fc/fc.sum(axis=1).reshape(-1, 1)) + alpha*mle)

def dirichlet(fc, alpha):
    # Additive smoothing with pseudo-counts proportional to the corpus MLE
    mle = fc.sum(axis=0)/fc.sum()
    smoothed_fc = fc + alpha*mle
    smoothed_cc = smoothed_fc.sum(axis=1)
    return np.log(smoothed_fc) - np.log(smoothed_cc.reshape(-1, 1))

def absolute_discounting(fc, alpha):
    # Subtract a constant discount from seen counts and redistribute the mass
    mle = fc.sum(axis=0)/fc.sum()
    count_unique = np.count_nonzero(fc, axis=1).reshape(-1, 1)
    cc = fc.sum(axis=1).reshape(-1, 1)
    return (np.log(np.maximum(fc - alpha, 0) + alpha*count_unique*mle) -
            np.log(cc))

def two_stage(fc, lam, mu):
    # Dirichlet smoothing followed by Jelinek-Mercer interpolation
    mle = fc.sum(axis=0)/fc.sum()
    smoothed_fc = fc + mu*mle
    smoothed_cc = smoothed_fc.sum(axis=1).reshape(-1, 1)
    return np.log((1-lam)*(smoothed_fc/smoothed_cc) + lam*mle)
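On the raw-counts question above: one option is a small validation helper along these lines (a hypothetical sketch, not part of the PR; the name is illustrative):

import numpy as np
from scipy import sparse

def check_raw_counts(X):
    # Good-Turing operates on integer event counts, so reject
    # fractional values such as tf-idf weighted features
    data = X.data if sparse.issparse(X) else np.asarray(X)
    if np.any(data != np.rint(data)):
        raise ValueError("smoothing='good-turing' requires raw integer "
                         "counts; got non-integer feature values")

fit could run this on X before dispatching to the Good-Turing branch, leaving the additive path unchanged.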
Please offer changes as a pull request. They are much easier to comment on.
Description
Naive Bayes in sklearn seems to support only additive smoothing.
Can we make some changes so that it supports Good-Turing or other smoothing methods, and how?
In some experiments, Good-Turing smoothing performs better than additive smoothing.
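For context, the additive (Lidstone) estimate that MultinomialNB implements today is

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

where $N_{yi}$ is the count of feature $i$ in class $y$, $N_y$ is the total count in class $y$, and $n$ is the number of features: every unseen feature receives the same pseudo-count $\alpha$, whereas Good-Turing reallocates probability mass according to the observed frequencies of frequencies.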