Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods for classifying and returning the score as well #39

Merged
merged 3 commits into from
Sep 3, 2015

Conversation

kreynolds
Copy link
Contributor

I've been using a monkeypatch similar to this on the old classifier for years to help with various kinds of keyword research... hoping to get it (and eventually several others) included upstream with the classifier rebirth

@Ch4s3
Copy link
Member

Ch4s3 commented Aug 20, 2015

I' like the direction here, it relates to #22, which is awesome. But, I think we should maintain the current method and add classify_with_score() as a separate method.

@kreynolds
Copy link
Contributor Author

Ok .. in the interest of reducing code duplication, are on you on board with the idea of having a separate private method that contained the main logic that was then called by both classify() and classify_with_score()?

I looked at #22 and I think it makes more sense to leave the threshold cutoff in the calling application instead of embedded in classify() since threshold failure could be handled differently

@Ch4s3
Copy link
Member

Ch4s3 commented Aug 21, 2015

Yeah I think that's the way to go. What about you @parkr?

@kreynolds
Copy link
Contributor Author

@Ch4s3 should I go ahead and make the change or shall we wait for feedback from @parkr ?

@parkr
Copy link
Member

parkr commented Aug 28, 2015

Go ahead with the plan.

How would you propose handling threshold failure?

@kreynolds
Copy link
Contributor Author

Handling threshold failure is entirely up to the application .. could be something as simple as (note the scores are completely fabricated, this is just an example):

# Return the category for this item, if there is one
category, score = classify_with_score(string)
score > -0.5 ? category : nil

# Raise an exception
category, score = classify_with_score(email_body)
raise SpamError.new("Email marked as spam", score) if category == 'spam' and score > -0.5

Basically the threshold handling should be done application side, not library side imo.

@parkr
Copy link
Member

parkr commented Aug 31, 2015

@kreynolds Thanks for the example. I agree.

@Ch4s3
Copy link
Member

Ch4s3 commented Aug 31, 2015

I like it

@kreynolds
Copy link
Contributor Author

I left the scored_categories method public .. seems like it could be useful in some contexts. I also left the bayes code alone .. it already just uses another method.

@kreynolds
Copy link
Contributor Author

@Ch4s3 @parkr while I have your attention .. bump

lsi = ClassifierReborn::LSI.new
lsi.add_item @str1, "Dog"
lsi.add_item @str2, "Dog"
lsi.add_item @str3, "Cat"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something totally weird with the formatting in this file. could you match your code to whatever the rest of the code is doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is weird .. looks the same to me in textmate but obviously there are some actual tabs in there somewhere (whitespace is a bit weird through this whole repo). I'll take a look.

Ch4s3 added a commit that referenced this pull request Sep 3, 2015
Add methods for classifying and returning the score as well
@Ch4s3 Ch4s3 merged commit 3c3b1c5 into jekyll:master Sep 3, 2015
Ch4s3 added a commit that referenced this pull request Sep 3, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants