Memory and Redis backend support #84

ibnesayeed · 2016-12-22T06:36:04Z

This PR is to support #81

TODO

parkr

Great start! Left a few comments to align with the Jekyll coding style and some helpful Ruby tips. 😄

parkr · 2016-12-22T19:33:50Z

lib/classifier-reborn/backends/bayes_memory_backend.rb

@@ -0,0 +1,72 @@
+# Author: Sawood Alam <@ibnesayeed>


We don't have any author comments in our code – your authorship is present in the Git history and here on GitHub. 😄

That will be gone. I usually don't add such things, but I have seen many other files here that have Author, License, and some other metadata associated in comments, so I thought it was the style of this code base.

parkr · 2016-12-22T19:35:03Z

lib/classifier-reborn/backends/bayes_memory_backend.rb

+    end
+
+    def total_words
+      @total_words


For these, you can use the built-in attr_reader.

Replace this method with

attr_reader :total_words

It's convenient short-hand for

def total_words
  @total_words
end

parkr · 2016-12-22T19:35:48Z

lib/classifier-reborn/backends/bayes_memory_backend.rb

+      @total_words += diff
+    end
+
+    def total_trainings


Use attr_reader here as well. And they can be combined:

attr_reader :total_words, :total_trainings

Just a comma-separated list of symbols, where the symbol is the name of the variable and the name of the method.

I am aware of attr_reader, but I thought that in other backends these won't be attributes. So, for the sake of uniformity, I used regular method definitions. However, I can certainly switch the Memory backend to the more concise form.

I think it would be unlikely that any future backend wouldn't have these available as attributes.

@Ch4s3, in the Redis backend they are not plain in-memory attributes. They are stored in the Redis instance. Keeping them in memory as plain attributes would require traversing all the records whenever the classifier is initialized from an existing dataset. The whole point of using another backend was to keep the counters and other state information in the backend store while keeping the configuration and similar things in memory.
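The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration, not the code from this PR: the class names `MemoryCounters` and `StoreBackedCounters` are hypothetical, and a plain Hash stands in for a Redis connection. Both classes expose the same reader, but only the in-memory one can use `attr_reader`.

```ruby
# Hypothetical sketch: two backends exposing the same reader interface.

# In-memory variant: the counter is an instance variable, so attr_reader works.
class MemoryCounters
  attr_reader :total_words

  def initialize
    @total_words = 0
  end

  def update_total_words(diff)
    @total_words += diff
  end
end

# Store-backed variant: the counter lives in an external store, so the reader
# is an ordinary method that queries it. A Hash stands in for the Redis client.
class StoreBackedCounters
  def initialize(store = {})
    @store = store
  end

  def total_words
    @store[:total_words].to_i # nil.to_i is 0 when nothing is stored yet
  end

  def update_total_words(diff)
    @store[:total_words] = total_words + diff
  end
end
```

A store-backed instance created over existing data reports the stored count immediately, without re-reading every record.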

parkr · 2016-12-22T19:37:44Z

lib/classifier-reborn/backends/bayes_memory_backend.rb

+      @categories[category] ||= Hash.new(0)
+    end
+
+    def category_keys


Should this be categories? The keys of the @categories Hash correspond to the categories that the user has input, so the user might like to write categories to get the list of categories that have data.

In the bayes.rb file, there is a method named categories that returns a more natural list of category names. However, in the data structure, those categories are converted to symbols (see CategoryNamer class: name.to_s.tr('_', ' ').capitalize.intern). I want to keep the transformations and logic away from the storage (backend) class. So, the category_keys method returns the form that was stored in the data structure.

That makes sense.
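The separation described above might look roughly like this. It is a sketch under assumptions: `CategoryNamerSketch`, `BackendSketch`, and `ClassifierSketch` are hypothetical names, and only the `prepare_name` transformation is taken from the comment itself.

```ruby
# Hypothetical sketch of keeping name transformations out of the backend.
module CategoryNamerSketch
  # Same transformation quoted in the comment above.
  def self.prepare_name(name)
    name.to_s.tr('_', ' ').capitalize.intern
  end
end

class BackendSketch
  def initialize
    @categories = {}
  end

  def add_category(category)
    @categories[category] ||= Hash.new(0)
  end

  # Returns the keys exactly as stored (symbols), with no presentation logic.
  def category_keys
    @categories.keys
  end
end

class ClassifierSketch
  def initialize(backend = BackendSketch.new)
    @backend = backend
  end

  def add_category(name)
    @backend.add_category(CategoryNamerSketch.prepare_name(name))
  end

  # User-facing list: converts the stored symbols back to strings.
  def categories
    @backend.category_keys.map(&:to_s)
  end
end
```

With this split, a different backend only has to return its stored keys; the naming rules stay in one place.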

parkr · 2016-12-22T19:40:17Z

lib/classifier-reborn/bayes.rb

      options = { language:         'en',
                  auto_categorize:  false,
                  enable_threshold: false,
                  threshold:        0.0,
-                  enable_stemmer:   true
+                  enable_stemmer:   true,
+                  redis_con_str:    ''


I think the best option here is to use dependency injection. Thus the user does:

memory_backed_classifier = ClassifierReborn::Bayes.new
redis_backed_classifier = ClassifierReborn::Bayes.new(:backend => ClassifierReborn::BayesRedisBackend.new("localhost", "6987"))

So they pass in a :backend key which is an instance of the relevant backend they wish to use. Then they can configure Redis and we don't have to worry about any backend configurations in this class.

I like this approach, thanks! Initially, I thought about letting users directly create an instance of the Redis class and pass that in as the connection, but that would be too much abstraction. I will go ahead and make the Memory backend the default and let the user pass an instance of a backend to override it.
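A minimal sketch of the agreed design, assuming an options hash with a :backend key. The classes `DefaultMemoryBackend`, `FakeRedisBackend`, and `InjectableClassifier` are hypothetical stand-ins, not the gem's classes, and no real Redis connection is made.

```ruby
# Hypothetical sketch of backend injection with an in-memory default.
class DefaultMemoryBackend
  def name
    'memory'
  end
end

# Stand-in for a Redis-backed store; it only records its connection settings.
class FakeRedisBackend
  attr_reader :host, :port

  def initialize(host, port)
    @host = host
    @port = port
  end

  def name
    "redis://#{host}:#{port}"
  end
end

class InjectableClassifier
  attr_reader :backend

  def initialize(options = {})
    @language = options.fetch(:language, 'en')
    # Build the default only when the caller did not inject a backend.
    @backend = options.fetch(:backend) { DefaultMemoryBackend.new }
  end
end

memory_backed = InjectableClassifier.new
redis_backed  = InjectableClassifier.new(backend: FakeRedisBackend.new('localhost', 6379))
```

The classifier never sees connection details; everything backend-specific is configured on the injected object.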

ibnesayeed · 2016-12-22T21:16:04Z

@parkr, I have taken care of all your comments. Please have a look at the new changes and suggest more changes if necessary. Meanwhile, I can work on the Redis backend.

ibnesayeed · 2016-12-23T03:49:20Z

All tests are passing in both Memory and Redis backends. If a Redis server is not running on the same machine with default configs then Redis backend tests are omitted.

Currently, I have just copied the BayesianTest class to BayesianRedisTest, updated the setup method, and added a cleanup method (because Redis is persistent). I was thinking of somehow not repeating the test cases and instead running all the test cases of BayesianTest twice with different configurations, but I can't see a clean way of doing this. I also tried inheriting BayesianRedisTest from BayesianTest and only overriding the setup and cleanup methods, expecting that all the test_* methods would be inherited and run, but it looks like that's not how tests run. Any other approach to DRY up the test cases?

ibnesayeed · 2016-12-23T15:43:22Z

Here is an article about DRYing up the test cases so that they can be used as modules, but it is for Minitest. We want similar capability in our test environment. Still looking for my options.

ibnesayeed · 2016-12-23T17:03:31Z

I was able to put all test cases in a module and include that in both the test classes. Yay!

Should the name BayesianTest be changed to BayesianMemoryTest in accordance with BayesianRedisTest (along with the corresponding file name)?
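The module approach mentioned above can be sketched like this. It assumes Minitest and uses hypothetical names throughout (`HashBackend`, `PersistentLikeBackend`, `SharedBackendTests`); the actual test classes in this PR are not reproduced here.

```ruby
require 'minitest/autorun'

# Hypothetical backends standing in for the Memory and Redis backends.
class HashBackend
  def initialize
    @words = Hash.new(0)
  end

  def add(word)
    @words[word] += 1
  end

  def count(word)
    @words[word]
  end

  def reset
    @words.clear
  end
end

class PersistentLikeBackend < HashBackend
end

# Shared test cases live in a module; each test_* method becomes an
# instance method of every test class that includes it.
module SharedBackendTests
  def test_counts_added_words
    @backend.add('spam')
    @backend.add('spam')
    assert_equal 2, @backend.count('spam')
  end

  def test_unknown_word_counts_zero
    assert_equal 0, @backend.count('ham')
  end
end

class HashBackendTest < Minitest::Test
  include SharedBackendTests

  def setup
    @backend = HashBackend.new
  end
end

class PersistentLikeBackendTest < Minitest::Test
  include SharedBackendTests

  def setup
    @backend = PersistentLikeBackend.new
  end

  # A persistent store needs cleanup so state does not leak between tests.
  def teardown
    @backend.reset
  end
end
```

Each test class supplies only its own setup and cleanup, so a new backend needs one small class rather than a copy of every test.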

Ch4s3 · 2016-12-28T15:10:19Z

I'd probably leave the file name the same to preserve history, unless you need to change the contents of the tests dramatically.

ibnesayeed · 2016-12-28T15:33:20Z

@Ch4s3, did you get a chance to review the code changes yet?

ibnesayeed · 2017-01-02T00:26:42Z

@parkr and @Ch4s3 if the code so far looks good then I can go ahead and write unit tests for the backend classes and update the README so that we can proceed with merging it before it gets stale.

ibnesayeed · 2017-01-02T05:11:36Z

Added necessary unit tests for the newly added backend classes.
Merged recent changes in the master branch and made all the necessary changes to work with the new testing library.
Updated the README to illustrate the usage of the Redis backend.

It is no longer a work in progress on my end. Please review the code and let me know if any changes are needed.

Ch4s3 · 2017-01-02T05:31:04Z

lib/classifier-reborn/bayes.rb

@@ -55,19 +59,20 @@ def train(category, text)
      category = CategoryNamer.prepare_name(category)

      # Add the category dynamically or raise an error
-      unless @categories.key?(category)
+      unless category_keys.include?(category)


I think we should keep .key? here. include? and key? are functionally the same, unless I'm missing something, and I think key? is clearer.

@categories was a nested hash, which has a key? method. However, category_keys is an array of just the keys of that hash (abstracted to support the Redis backend), and arrays don't have a key? method defined. Am I missing something?
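The difference raised in this exchange can be checked directly with core Ruby. The hash contents below are made up for illustration.

```ruby
# A nested Hash like the old @categories, and the Array of its keys.
categories = { Interesting: Hash.new(0), Uninteresting: Hash.new(0) }
category_keys = categories.keys

hash_has_key      = categories.key?(:Interesting)         # Hash#key?
hash_includes     = categories.include?(:Interesting)     # Hash#include? is an alias of Hash#key?
array_includes    = category_keys.include?(:Interesting)  # Array#include? is a membership test
array_has_key_api = category_keys.respond_to?(:key?)      # Array does not define key?

puts [hash_has_key, hash_includes, array_includes, array_has_key_api].inspect
# prints [true, true, true, false]
```

So the two methods are interchangeable on a Hash, but once the backend returns an Array of keys, only include? is available.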

Ch4s3 · 2017-01-02T05:42:40Z

This generally looks very good @ibnesayeed! It's a big change, so I need to put a bit more time into looking it over. I'd also like to make sure @parkr is on board because I don't want to cause any upstream problems for Jekyll.

ibnesayeed · 2017-01-05T14:08:14Z

@Ch4s3 and @parkr, do we have a real or pseudo-real (fairly large) training set that we can use to train and ask for scores of certain queries without this PR? We could then repeat the same with this PR in both the Memory and Redis backends. If the scores come out exactly the same, we will have more confidence that nothing changed while refactoring the inline memory operations into a separate backend. I think I am essentially talking about an integration test.

Ch4s3 · 2017-01-05T14:32:52Z

@ibnesayeed no, but we should. I usually pull down some Wikipedia articles to demo the gem, but that might not be enough.

ibnesayeed · 2017-01-05T17:50:21Z

@Ch4s3 I have tried to test it on a public data set about SMS spam classification. I am yet to match the results, but here is what I did.

Cases:

Pre-refactoring version
Post-refactoring Memory backend
Post-refactoring Redis backend

For each of these cases:

Trained 5474 instances of Ham and Spam classes
Generated scores of 100 untrained instances
Untrained 2000 instances from the already trained model
Generated scores of 100 untrained instances

Please find the test scripts, data, and results at https://gist.github.com/ibnesayeed/73680bb26c4ea2dc1c55651d126e57fb

Ch4s3 · 2017-01-05T20:27:16Z

I'll take a look as soon as possible.

ibnesayeed · 2017-01-05T22:50:07Z

Now that I am back to my desk and had a look at the produced output, they are exactly the same. The gist I posted before had some wrong data initially as I was not flushing the Redis database, so after successive runs, scores based on Redis backend were changing due to persistence of earlier tests. Now I have updated the gist.

I am now very confident that the implementation was safely refactored and extended without changing the existing behavior. Coding style and other things can be reviewed obviously.
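The comparison procedure and the stale-state pitfall described in these comments can be sketched with a toy model. Everything here is hypothetical: `ToyClassifier` is a simple word counter rather than the Bayes classifier, the data is invented, and a Hash stands in for a persistent store.

```ruby
# Hypothetical toy classifier: scores a text by summing per-category word counts.
class ToyClassifier
  CATEGORIES = %w[ham spam].freeze

  def initialize(store = Hash.new(0))
    @store = store
  end

  def train(category, text)
    text.split.each { |word| @store[[category, word]] += 1 }
  end

  def untrain(category, text)
    text.split.each { |word| @store[[category, word]] -= 1 }
  end

  def scores(text)
    CATEGORIES.map do |category|
      [category, text.split.sum { |word| @store[[category, word]] }]
    end.to_h
  end
end

# Runs the same train / score / untrain / score sequence and returns both score sets.
def run_scenario(classifier, training, removed, queries)
  training.each { |category, text| classifier.train(category, text) }
  before = queries.map { |query| classifier.scores(query) }
  removed.each { |category, text| classifier.untrain(category, text) }
  after = queries.map { |query| classifier.scores(query) }
  [before, after]
end

training = [['ham', 'see you at lunch'], ['spam', 'win a free prize'], ['spam', 'free entry win']]
removed  = [training.last]
queries  = ['free lunch', 'win win']

# Fresh state for each run: the results are comparable.
fresh_a = run_scenario(ToyClassifier.new, training, removed, queries)
fresh_b = run_scenario(ToyClassifier.new, training, removed, queries)

# Shared store that is never flushed: the second run sees leftover counts.
shared  = Hash.new(0)
stale_a = run_scenario(ToyClassifier.new(shared), training, removed, queries)
stale_b = run_scenario(ToyClassifier.new(shared), training, removed, queries)

puts fresh_a == fresh_b
puts stale_a == stale_b
```

The second pair differs only because the store kept data from the earlier run, which mirrors why the persistent backend has to be flushed before each comparison.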

Ch4s3 · 2017-01-05T23:08:07Z

Awesome! Maybe we should consider using that data for integration testing.

ibnesayeed · 2017-01-06T00:18:19Z

@Ch4s3: Awesome! Maybe we should consider using that data for integration testing.

That was going to be my next question: would you be interested in getting that test data added to the repo with some test cases, so that more tests can be written based on that data later?

How should we structure the test case for now? Should we just do what I did in that external script and compare the results of the two backends, or should we compare them against a static expected output?

ibnesayeed · 2017-01-07T21:41:03Z

@Ch4s3: I'd probably leave the file name the same to preserve history, unless you need to change the contents of the tests dramatically.

Actually, now that there are more files in the classifier-reborn/test/bayes directory with an emerging pattern, I think it now makes more sense to rename the file and the class to make it in sync with others. Otherwise it will be confusing for new contributors.

Ch4s3 · 2017-01-07T21:49:27Z

I agree

ibnesayeed · 2017-01-07T22:29:00Z

I agree

PR #102 should do.

ibnesayeed added 2 commits December 16, 2016 10:41

A misspelling correction

51146d4

Bayes classifier refactoring to enable Memory and Redis backends

737e744

ibnesayeed mentioned this pull request Dec 22, 2016

Using Redis for the data structure #81

Closed

parkr reviewed Dec 22, 2016

View reviewed changes

ibnesayeed added 2 commits December 22, 2016 15:46

Removed authon name, added attr_reader shorthand

49012d1

Allow backend dependency injection

4feae4f

ibnesayeed added 3 commits December 22, 2016 20:43

DRY an implementation of the Memory backend

d4a0203

Redis backend implemented

43bfc01

Bayesian Redis backend tests added

67e50ca

Adding Redis in Travis for testing

ceae3a2

DRY Baysian tests using modules

c1b89ca

ibnesayeed added 5 commits January 1, 2017 21:33

Added unit tests for Memory and Redis backends

d4c8ade

Merge remote-tracking branch 'upstream/master' into redis

02d9095

# Conflicts: # .travis.yml # test/bayes/bayesian_test.rb

Add backend-specific classifiers that are needed in some tests

36d085e

Replaced negative assersions with refute

3a1ba7d

Redis usage documented

9ed81eb

ibnesayeed changed the title ~~WIP: Memory and Redis backend support~~ Memory and Redis backend support Jan 2, 2017

Ch4s3 reviewed Jan 2, 2017

View reviewed changes

Ch4s3 merged commit 92112f5 into jekyll:master Jan 5, 2017

ibnesayeed deleted the redis branch January 6, 2017 00:19

ibnesayeed mentioned this pull request Jan 6, 2017

Bayes integration test of Memory and Redis backends with real data #92

Merged

ibnesayeed mentioned this pull request Jan 7, 2017

Rename Bayes memory test class #102

Merged

Ch4s3 pushed a commit that referenced this pull request Jan 7, 2017

Rename Bayes memory test class (#102)

e1580bb

* Disabled Redis disc persistence and refactored integration test, fixes #95 * Changed test class and file name as per #84
