cache total word counts per category #4

Merged: 1 commit, Nov 9, 2014

parkr commented Aug 10, 2014

From @dabble:

The original code recalculated the total word count per category on every classification.
On large datasets, this is really expensive.

On my dataset, with 58k records in the training set and 11k records in the set being classified,
classification time improved from 1560 seconds to 107 seconds, while training time
only rose from 593 to 691 seconds.
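
A minimal Ruby sketch of the caching idea described above, not the code from this pull request: `TinyBayes`, `@category_word_count`, `total_uncached`, and `total_cached` are hypothetical names chosen for illustration. It keeps a running total per category that is updated during training, so classification reads one cached number instead of re-summing every word count.

```ruby
# Hypothetical, simplified classifier used only to illustrate the caching idea.
class TinyBayes
  def initialize(*categories)
    @categories = {}           # category => { word => count }
    @category_word_count = {}  # category => cached total word count
    categories.each do |category|
      @categories[category] = {}
      @category_word_count[category] = 0
    end
  end

  def train(category, text)
    text.downcase.scan(/\w+/).each do |word|
      @categories[category][word] = @categories[category].fetch(word, 0) + 1
      @category_word_count[category] += 1 # keep the cached total current
    end
  end

  # Uncached approach: walks every word count each time it is called.
  def total_uncached(category)
    @categories[category].values.inject(0) { |sum, n| sum + n }
  end

  # Cached approach: a single hash lookup.
  def total_cached(category)
    @category_word_count[category]
  end

  # Log-score per category; assumes every category has been trained at least once.
  def classifications(text)
    words = text.downcase.scan(/\w+/)
    scores = {}
    @categories.each do |category, category_words|
      total = total_cached(category).to_f
      scores[category] = words.inject(0.0) do |score, word|
        s = category_words.has_key?(word) ? category_words[word] : 0.1
        score + Math.log(s / total)
      end
    end
    scores
  end

  def classify(text)
    classifications(text).max_by { |_category, score| score }.first
  end
end
```

In this sketch the extra increment is paid once per trained word, while the summation it replaces would otherwise be paid for every category on every classification. That is consistent with the timings quoted above, where training got somewhat slower and classification got much faster.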

and

Note: I also tried pulling category.to_s out of the loop and changing

    s = category_words.has_key?(word) ? category_words[word] : 0.1

to

    s = category_words[word] || 0.1

Neither change made a significant difference in performance.
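
For reference, the two lookups compared above return the same value only under certain assumptions. The sketch below uses hypothetical helper names (`lookup_ternary`, `lookup_or`) and assumes the word counts live in a plain Hash; whether the classifier's internal hash has a default value is not stated in this thread.

```ruby
# Hypothetical helpers wrapping the two forms quoted above.
def lookup_ternary(category_words, word)
  category_words.has_key?(word) ? category_words[word] : 0.1
end

def lookup_or(category_words, word)
  category_words[word] || 0.1
end

# A plain Hash returns nil for a missing key, so both forms fall back to 0.1.
plain = { "cheap" => 3 }

# A Hash with a default of 0 returns 0 for a missing key. Because 0 is truthy
# in Ruby, the || form yields 0 here while the has_key? form still yields 0.1.
with_default = Hash.new(0)
with_default["cheap"] = 3
```

So the `||` form looks like a safe simplification only when the hash has no default value and never stores `nil` or `false` as a count.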

parkr commented Aug 12, 2014

Rebased on master.

Ch4s3 commented Oct 30, 2014

> Rebased on master.

Edit: Is there a plan for bringing this into master?

parkr commented Oct 31, 2014

> Is there a plan for bringing this into master?

I just ported it over. Is it particularly beneficial to you?

Ch4s3 commented Nov 1, 2014

> Is it particularly beneficial to you?

Not especially, but I can understand why it was originally done.

parkr force-pushed the cardmagic/classifier/pull/8 branch from 571dfc7 to c6af85a on November 8, 2014 18:57
parkr force-pushed the cardmagic/classifier/pull/8 branch from c6af85a to 13c1e58 on November 8, 2014 19:43
parkr merged commit 57e195f into master on Nov 9, 2014
parkr deleted the cardmagic/classifier/pull/8 branch on November 9, 2014 00:01
parkr added a commit that referenced this pull request Nov 9, 2014
macotouenaca left a comment

Thank you very much.
