
Classifier evaluation and validation #142

Merged
merged 31 commits into from
Jan 24, 2017
Conversation

ibnesayeed
Contributor

@ibnesayeed ibnesayeed commented Jan 19, 2017

It is still a work in progress as per #71...

@Ch4s3
Member

Ch4s3 commented Jan 19, 2017

looks pretty good so far

@ibnesayeed
Contributor Author

Now we are generating a k-fold validation accuracy report that looks like this:

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       486        14   0.97200
  2       500       500         0   1.00000
  3       500       499         1   0.99800
  4       500       495         5   0.99000
  5       500       500         0   1.00000
  6       500       499         1   0.99800
  7       500       498         2   0.99600
  8       500       499         1   0.99800
  9       500       498         2   0.99600
 10       500       498         2   0.99600
-------------------------------------------
All      5000      4972        28   0.99440
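
For illustration, the k-fold procedure behind such a report can be sketched like this (`k_fold_runs` and the toy majority "classifier" are hypothetical names, not this PR's actual API):

```ruby
# A toy sketch of k-fold cross validation: split the labeled samples into
# k folds, hold out one fold per run, train on the rest, score the held-out fold.
def k_fold_runs(samples, k)
  fold_size = samples.size / k
  folds = samples.each_slice(fold_size).first(k)
  folds.each_with_index.map do |test_fold, i|
    # Train on every fold except the held-out one.
    train = (folds[0...i] + folds[(i + 1)..]).flatten(1)
    yield(train, test_fold)
  end
end

# Toy data: 8 ham and 2 spam samples, split into 5 folds of 2.
samples = [[:ham, "a"]] * 8 + [[:spam, "b"]] * 2
runs = k_fold_runs(samples, 5) do |train, test|
  # Stand-in classifier: always predict the majority label of the training set.
  majority = train.group_by(&:first).max_by { |_, pairs| pairs.size }.first
  test.count { |label, _| label == majority }
end
# runs == [2, 2, 2, 2, 0]; overall accuracy = runs.sum / samples.size.to_f
```

Each run's correct count divided by the fold size gives the per-run accuracy, and the accumulated counts give the "All" row.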

@ibnesayeed
Contributor Author

Now printing the confusion matrix over the total sample set, along with the accuracy stats of the accumulated and individual runs of k-fold validation. The code is capable of printing a confusion matrix for each individual run as well, but that would produce overwhelming output. Many methods are defined in a way that they can produce meaningful output for one or more conf_mat objects passed to them.

The code is written with multi-class analysis in mind (not just binary). That's why we are only printing the confusion matrix and not the confusion table (the one with TP/TN/FP/FN stats), as the latter requires the classes to be binary and some way to tell which class is considered positive. Perhaps we can provide a parameter so that the user can name the positive class when there are two classes; then we can conditionally generate more statistics on the data. However, we can still calculate precision and recall for each class without any supplementary information (that will be my next task).

$ rake validate

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       486        14   0.97200
  2       500       499         1   0.99800
  3       500       499         1   0.99800
  4       500       498         2   0.99600
  5       500       496         4   0.99200
  6       500       499         1   0.99800
  7       500       500         0   1.00000
  8       500       499         1   0.99800
  9       500       497         3   0.99400
 10       500       499         1   0.99800
-------------------------------------------
All      5000      4972        28   0.99440

---------------- Confusion Matrix -----------------
Predicted ->          Ham         Spam        Total
---------------------------------------------------
Ham                  4307           20         4327
Spam                    8          665          673
---------------------------------------------------
Total                4315          685         5000
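
The per-class precision and recall mentioned above can be derived straight from such a matrix; a minimal sketch, assuming the counts are stored as a nested hash `conf_mat[actual][predicted]` (an assumed structure, not necessarily the PR's internal one):

```ruby
# Confusion counts from the matrix above: rows are actual, columns are predicted.
conf_mat = {
  ham:  { ham: 4307, spam: 20 },
  spam: { ham: 8,    spam: 665 },
}
classes = conf_mat.keys

# Recall for class c: correct predictions of c over all actual c (its row sum).
recall = classes.to_h { |c| [c, conf_mat[c][c].to_f / conf_mat[c].values.sum] }

# Precision for class c: correct predictions of c over all predicted c (its column sum).
precision = classes.to_h { |c| [c, conf_mat[c][c].to_f / classes.sum { |a| conf_mat[a][c] }] }
```

Nothing here assumes two classes, so the same computation works for any multi-class matrix.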

@ibnesayeed
Contributor Author

I think I have an idea: we can report stats for each class treated as the positive class. This is a one-versus-all setup repeated for every class.

@Ch4s3
Member

Ch4s3 commented Jan 20, 2017

I'll defer to your judgement here, as this is a bit out of my wheelhouse.

@ibnesayeed
Contributor Author

ibnesayeed commented Jan 21, 2017

Now reporting the confusion matrix with various derived stats for each class, treated as the positive class one at a time. The code is refactored so that it can be reused when one knows the positive class and wants to generate a report only for that class.

$ rake validate

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       485        15   0.97000
  2       500       497         3   0.99400
  3       500       497         3   0.99400
  4       500       497         3   0.99400
  5       500       499         1   0.99800
  6       500       497         3   0.99400
  7       500       500         0   1.00000
  8       500       499         1   0.99800
  9       500       499         1   0.99800
 10       500       498         2   0.99600
-------------------------------------------
All      5000      4968        32   0.99360

---------------- Confusion Matrix -----------------
Predicted ->          Ham         Spam        Total
---------------------------------------------------
Ham                  4305           22         4327
Spam                   10          663          673
---------------------------------------------------
Total                4315          685         5000

# Positive class: Ham
Total population   : 5000
Condition positive : 4327
Condition negative : 673
True positive      : 4305
True negative      : 663
False positive     : 10
False negative     : 22
Prevalence         : 0.8654
Specificity        : 0.9851411589895989
Recall             : 0.9949156459440721
Precision          : 0.9976825028968713
Accuracy           : 0.9936
F1 score           : 0.9962971534367044

# Positive class: Spam
Total population   : 5000
Condition positive : 673
Condition negative : 4327
True positive      : 663
True negative      : 4305
False positive     : 22
False negative     : 10
Prevalence         : 0.1346
Specificity        : 0.9949156459440721
Recall             : 0.9851411589895989
Precision          : 0.9678832116788321
Accuracy           : 0.9936
F1 score           : 0.9764359351988218
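
The one-vs-all derivation shown above can be sketched as follows (`binary_stats` is a hypothetical helper, not this PR's API; the counts are taken from the confusion matrix in this report):

```ruby
# Treat one class as positive and collapse the multi-class matrix into
# binary TP/TN/FP/FN, then derive the usual statistics.
def binary_stats(conf_mat, positive)
  classes = conf_mat.keys
  tp = conf_mat[positive][positive]
  fn = conf_mat[positive].values.sum - tp                        # positives predicted as other classes
  fp = classes.sum { |a| a == positive ? 0 : conf_mat[a][positive] }
  total = classes.sum { |a| conf_mat[a].values.sum }
  tn = total - tp - fn - fp
  precision = tp.to_f / (tp + fp)
  recall = tp.to_f / (tp + fn)
  {
    tp: tp, tn: tn, fp: fp, fn: fn,
    prevalence: (tp + fn).to_f / total,
    specificity: tn.to_f / (tn + fp),
    recall: recall,
    precision: precision,
    accuracy: (tp + tn).to_f / total,
    f1: 2 * precision * recall / (precision + recall),
  }
end

conf_mat = { ham: { ham: 4305, spam: 22 }, spam: { ham: 10, spam: 663 } }
stats = binary_stats(conf_mat, :ham)
```

Note that accuracy and prevalence are the same regardless of which class is positive, while recall and specificity simply swap when the two classes of a binary matrix swap roles.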

@ibnesayeed
Contributor Author

---------------- Confusion Matrix -----------------
Predicted ->          Ham         Spam        Total
---------------------------------------------------
Ham                  4307           20         4327
Spam                    6          667          673
---------------------------------------------------
Total                4313          687         5000

The confusion matrix now also reports class-wise precision and recall in the last row and last column respectively. Although not tested yet, all the functionality implemented so far should work equally well on multi-class datasets.

----------------------- Confusion Matrix -----------------------
Predicted ->          Ham         Spam        Total       Recall
----------------------------------------------------------------
Ham                  4307           20         4327      0.99538
Spam                    6          667          673      0.99108
----------------------------------------------------------------
Total                4313          687         5000
Precision         0.99861      0.97089
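
Rendering those margins can be sketched like this (layout only; the variable names are assumptions, not this PR's code, and the counts match the matrix above):

```ruby
# Print a confusion matrix with per-class recall as the last column and
# per-class precision as the last row, in fixed-width columns.
conf_mat = { ham: { ham: 4307, spam: 20 }, spam: { ham: 6, spam: 667 } }
classes = conf_mat.keys

header = ["Predicted ->"] + classes.map { |c| c.to_s.capitalize } + %w[Total Recall]
puts header.map { |h| h.rjust(12) }.join
classes.each do |actual|
  row = conf_mat[actual]
  total = row.values.sum
  cells = [actual.to_s.capitalize] + classes.map { |p| row[p] } +
          [total, format("%.5f", row[actual].to_f / total)]
  puts cells.map { |c| c.to_s.rjust(12) }.join
end
col_totals = classes.map { |p| classes.sum { |a| conf_mat[a][p] } }
puts((["Total"] + col_totals + [col_totals.sum]).map { |c| c.to_s.rjust(12) }.join)
precisions = classes.map { |p| format("%.5f", conf_mat[p][p].to_f / classes.sum { |a| conf_mat[a][p] }) }
puts((["Precision"] + precisions).map { |c| c.rjust(12) }.join)
```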

# Conflicts:
#	lib/classifier-reborn/bayes.rb
@ibnesayeed
Contributor Author

This is what a typical validation task result now looks like.

$ rake validate
/usr/local/bin/ruby -w -I"lib:lib" -I"/usr/local/bundle/gems/rake-12.0.0/lib" "/usr/local/bundle/gems/rake-12.0.0/lib/rake/rake_test_loader.rb" "test/validators/classifier_validation.rb" 

# ClassifierValidation

===================== lsi_classifier_5_fold_cross_validate =====================
TODO: LSI is not validatable until all of the [:train, :classify, :categories] methods are implemented!
--------------------------------------------------------------------------------

================ bayes_classifier_10_fold_cross_validate_memory ================

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1       500       484        16   0.96800
  2       500       489        11   0.97800
  3       500       490        10   0.98000
  4       500       484        16   0.96800
  5       500       487        13   0.97400
  6       500       489        11   0.97800
  7       500       489        11   0.97800
  8       500       488        12   0.97600
  9       500       488        12   0.97600
 10       500       491         9   0.98200
-------------------------------------------
All      5000      4879       121   0.97580

----------------------- Confusion Matrix -----------------------
Predicted ->          Ham         Spam        Total       Recall
----------------------------------------------------------------
Ham                  4230           97         4327      0.97758
Spam                   24          649          673      0.96434
----------------------------------------------------------------
Total                4254          746         5000
Precision         0.99436      0.86997

# Positive class: Ham
Total population   : 5000
Condition positive : 4327
Condition negative : 673
True positive      : 4230
True negative      : 649
False positive     : 24
False negative     : 97
Prevalence         : 0.8654
Specificity        : 0.9643387815750372
Recall             : 0.9775826207534088
Precision          : 0.9943582510578279
Accuracy           : 0.9758
F1 score           : 0.9858990793613798

# Positive class: Spam
Total population   : 5000
Condition positive : 673
Condition negative : 4327
True positive      : 649
True negative      : 4230
False positive     : 97
False negative     : 24
Prevalence         : 0.1346
Specificity        : 0.9775826207534088
Recall             : 0.9643387815750372
Precision          : 0.8699731903485255
Accuracy           : 0.9758
F1 score           : 0.9147286821705426

--------------------------------------------------------------------------------

================= bayes_classifier_3_fold_cross_validate_redis =================

------------------ Stats ------------------
Run     Total   Correct Incorrect  Accuracy
-------------------------------------------
  1      1666      1630        36   0.97839
  2      1666      1622        44   0.97359
  3      1666      1611        55   0.96699
-------------------------------------------
All      4998      4863       135   0.97299

----------------------- Confusion Matrix -----------------------
Predicted ->          Ham         Spam        Total       Recall
----------------------------------------------------------------
Ham                  4212          113         4325      0.97387
Spam                   22          651          673      0.96731
----------------------------------------------------------------
Total                4234          764         4998
Precision          0.9948      0.85209

# Positive class: Ham
Total population   : 4998
Condition positive : 4325
Condition negative : 673
True positive      : 4212
True negative      : 651
False positive     : 22
False negative     : 113
Prevalence         : 0.8653461384553821
Specificity        : 0.9673105497771174
Recall             : 0.9738728323699422
Precision          : 0.9948039678790742
Accuracy           : 0.9729891956782714
F1 score           : 0.9842271293375394

# Positive class: Spam
Total population   : 4998
Condition positive : 673
Condition negative : 4325
True positive      : 651
True negative      : 4212
False positive     : 113
False negative     : 22
Prevalence         : 0.13465386154461784
Specificity        : 0.9738728323699422
Recall             : 0.9673105497771174
Precision          : 0.8520942408376964
Accuracy           : 0.9729891956782714
F1 score           : 0.906054279749478

--------------------------------------------------------------------------------


Finished in 24.51805s

@ibnesayeed
Contributor Author

I feel it is quite full-featured now. We still need some unit tests for individual methods of the module, RDoc, and user documentation, but those can be handled in a separate PR. @Ch4s3 please feel free to merge it.

@marciovicente, Could you please have a look at the reports in the last message and see if anything important is missing or wrong?

@ibnesayeed ibnesayeed changed the title WIP: Classifier evaluation and validation Classifier evaluation and validation Jan 22, 2017
@ibnesayeed
Contributor Author

@Ch4s3 I consider this one done from my side. I have added exhaustive user documentation (#145), so RDoc is less important here, though we can add it in a separate PR. Unit tests will also be added separately, as this PR has already become a big pile of commits and file changes.

@marciovicente

@ibnesayeed It's a nice report! It looks like Weka output 👏
Looks awesome to me! ✅

@Ch4s3 Ch4s3 merged commit 15ec41a into jekyll:master Jan 24, 2017
@ibnesayeed ibnesayeed deleted the validation branch February 8, 2017 20:32