Sweep scoring #328

Merged: 15 commits into numenta:master, Jan 15, 2019 (5 participants)
balladeer commented Jan 6, 2019

Implements a method to score and optimize a detector's results on the NAB corpus by 'sweeping' through an anomaly-score-sorted list of data points, progressively scoring more and more points as the threshold for what constitutes an 'active' prediction is lowered.

Running locally in a virtual machine, I can score an entire detector (full corpus, all profiles) in about 20 seconds.

Some thoughts:

  • All of the unit tests pass locally. I've added some for the new code and updated existing unit tests.
  • I'm not attached to the 'Sweeper' name at all, but I needed something to use, given that the same core code is used in both the optimization and scoring steps.
  • I've compared this method vs. the results stored in the results/ folder. Any differences in the final score tend to be small and seem to be accompanied by slight updates to the 'best' threshold as well. I think that these deviations are expected, but I'd love some confirmation.
  • I have not added the ability to save the 'score vs threshold' data to files. Doing so may help folks evaluate how sensitive various detectors are to changes in threshold, but I wanted to keep the changes to a minimum. There's already a decent amount of code change here!
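As a toy illustration of the sweep idea described above (made-up scoring weights and function names, not the actual `nab/sweeper.py` code): sorting once by anomaly score means each step down the sorted list activates exactly one more prediction, so every candidate threshold is scored in a single pass.

```python
from operator import itemgetter

def sweep_best_threshold(points):
    """points: list of (anomaly_score, is_true_anomaly) pairs.

    Sort once by anomaly score, descending. Walking down the sorted list
    is equivalent to lowering the threshold: each step activates exactly
    one more prediction, so all candidate thresholds are scored in one
    O(n log n) pass instead of rescoring the corpus per threshold.
    """
    ordered = sorted(points, key=itemgetter(0), reverse=True)
    best_score = float("-inf")
    best_threshold = None  # None: no threshold beat scoring nothing at all
    running = 0.0
    for anomaly_score, is_true_anomaly in ordered:
        running += 1.0 if is_true_anomaly else -0.5  # toy TP reward / FP penalty
        if running > best_score:
            best_score, best_threshold = running, anomaly_score
    return best_score, best_threshold

score, threshold = sweep_best_threshold(
    [(0.99, True), (0.95, False), (0.90, True), (0.10, False)]
)
```

With these toy weights the sweep settles on 0.90 as the best threshold, activating both true anomalies at the cost of one false positive; NAB's real scorer uses windowed sigmoid weights rather than flat rewards.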

rhyolight commented Jan 7, 2019

I will review this and run NAB locally to ensure everything works.

rhyolight self-requested a review on Jan 7, 2019


earthgecko commented Jan 10, 2019

I have done some testing: it reduces the run times of my new detector tests from ~30 mins to ~7 mins, and it produces the same scores. It is almost unbelievable to see

  Running optimize step
  Optimizer found a max score of -68.4329257075 with anomaly threshold 1.0.
  Optimizer found a max score of -44.2918503 with anomaly threshold 1.0.
  Optimizer found a max score of -13.2918503 with anomaly threshold 1.0.

run in mere seconds! Fantastic @balladeer 🥇


rhyolight commented Jan 10, 2019

Running this now...


rhyolight commented Jan 10, 2019

Vanilla scoring took 10:59.29 on my 20-core iMac Pro. Part of the CPU load looked like this:

[screenshot: CPU load during vanilla scoring]

Sweep scoring took 3:48.92 and looked like this:

[screenshot: CPU load during sweep scoring]

Final sweep scores fall within the range of reported NuPIC scores on the current leaderboard, except for the standard profile:

'reward_low_FP_rate' profile = 61.71
'reward_low_FN_rate' profile = 74.32
'standard' profile = 69.67

Leaderboard currently looks like this:

[screenshot: current NAB leaderboard]

As you can see, 69.67 falls slightly below the 69.7 on the leaderboard. Is this a concern, @subutai or @lscheinkman?

If not, I will continue to review the code in the PR.


lscheinkman commented Jan 10, 2019

@rhyolight I would not worry about the rounding from 69.67 to 69.7; these are considered the same score as far as the leaderboard is concerned.


earthgecko commented Jan 10, 2019

@rhyolight From what I can tell, the scores on the leaderboard are always rounded from the scores reported by NAB, so 69.67 becomes 69.7 on the leaderboard. However, I am not sure that helps if you are trying to determine whether the actual NAB score changed; surely someone has a log :)


subutai commented Jan 10, 2019

Agreed, I wouldn't worry about the rounding.


rhyolight commented Jan 10, 2019

Ok great, I am reading the NAB white paper so I can do an informed review.

FYI if this change is merged it will require a Major version change (because it changes the scoring mechanism), so we’ll need to be extra diligent.


subutai commented Jan 10, 2019

FYI if this change is merged it will require a Major version change (because it changes the scoring mechanism), so we’ll need to be extra diligent.

Is this just a change to the optimizer, or the scoring algorithm itself? I'm ok with changing the former, but not the latter. The optimizer is in some sense a convenience function, and it would be awesome if it was a lot faster.


subutai commented Jan 10, 2019

Is this just a change to the optimizer, or the scoring algorithm itself?

A simple way to test this is to run NAB using the original thresholds file, without doing detection and without running the optimizer. (This should go fast since the algorithms won't be rerun). If we still get the same scores for all the algorithms, we should be good. But make sure you run on all the original files except for the changes in this PR.


balladeer commented Jan 10, 2019

Wow, thanks everyone for taking a look at this PR!

Is this just a change to the optimizer, or the scoring algorithm itself?

A simple way to test this is to run NAB using the original thresholds file, without doing detection and without running the optimizer.

While I did change the specific implementation of the scoring algorithm, I did my best to keep the underlying algorithm the same (according to both the whitepaper and the existing implementation). Running the sweep scoring on the original threshold files, I get the following differences:

  detector   profile             decimals_identical  original_score      sweep_score
  knncad     reward_low_FN_rate  11                  64.81232086001505   64.81232086001437
  knncad     reward_low_FP_rate  10                  43.40851703291233   43.40851703293104
  knncad     standard            8                   57.99434335898808   57.99434335900862
  numentaTM  reward_low_FN_rate  13                  69.185068229060349  69.18506822906035
  numentaTM  standard            13                  64.553464412543107  64.55346441254311
  random     reward_low_FN_rate  13                  25.876351314080456  25.87635131408046
  random     reward_low_FP_rate  14                  5.763682319827586   5.763682319827587

While the differences are few (7 out of 39 detector-profile pairs) and small (they only show up at the 9th decimal place and beyond), I certainly leave it to you all to decide whether this constitutes a change to the scoring algorithm.
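One plausible explanation for these tail-digit differences (my assumption, not confirmed against the PR): the new implementation accumulates the same per-record costs in a different order, and IEEE-754 addition is not associative, so regrouping the same terms can change the last few bits of a sum.

```python
# Floating-point addition is not associative: the same three terms,
# grouped differently, round to slightly different results.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)      # False: the two groupings differ in the final bits
print(abs(a - b))  # a discrepancy on the order of 1e-16
```

Differences of this size are exactly what a reordered accumulation over a few thousand records would produce, which is consistent with scores agreeing to 8+ decimal places.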

I would love to see someone else's confirmation / results.


Part of the CPU load looked like this

Out of curiosity, @rhyolight, what did you use to generate those CPU profiles?


rhyolight commented Jan 10, 2019

Out of curiosity, @rhyolight, what did you use to generate those CPU profiles?

The "Activity Monitor" app that comes standard on macOS.


rhyolight commented Jan 11, 2019

run NAB using the original thresholds file, without doing detection and without running the optimizer.

Like this, right?

python run.py --score --normalize -t config/thresholds.json

My config looks like this:

{'dataDir': 'data',
 'detect': False,
 'detectors': ['null',
               'numenta',
               'random',
               'skyline',
               'bayesChangePt',
               'windowedGaussian',
               'expose',
               'relativeEntropy'],
 'normalize': True,
 'numCPUs': None,
 'optimize': False,
 'profilesFile': 'config/profiles.json',
 'resultsDir': 'results',
 'score': True,
 'skipConfirmation': False,
 'thresholdsFile': 'config/thresholds.json',
 'windowsFile': 'labels/combined_windows.json'}

rhyolight commented Jan 11, 2019

@subutai I have run a comparison of the scoring / normalization processes on master and on this branch. The command I ran was python run.py --score --normalize. The default thresholds file was used for each run. Results of each run were written to results/final_results.json and had insignificant differences:

     "numentaTM": {
-        "reward_low_FN_rate": 69.185068229060349,
+        "reward_low_FN_rate": 69.18506822906035,
         "reward_low_FP_rate": 56.665308225043105,
-        "standard": 64.553464412543107
+        "standard": 64.55346441254311
     },
     "random": {
-        "reward_low_FN_rate": 25.876351314080456,
-        "reward_low_FP_rate": 5.763682319827586,
+        "reward_low_FN_rate": 25.87635131408046,
+        "reward_low_FP_rate": 5.763682319827587,
         "standard": 16.831768350517237
     },

While the scoring results are essentially the same, I must note that all the result CSV files are entirely different, which is a result of the scoring algorithm being updated.

As for optimization: with the previous scoring logic the command took 1m22s to run, while the new sweep scoring logic took 23s. I ran this command (and others) many times, and the speedup was consistently notable.

The scoring code is different, but it is achieving nearly the exact same numbers with drastic CPU savings. I am continuing to study the current twiddle algorithm and how this new one works before approving the PR. But from a functional standpoint, this does not need a major version change.
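For context, the cost difference between the two approaches can be sketched as toy code (the function names and scoring weights below are hypothetical, not NAB's actual twiddle or sweeper implementation): a twiddle-style search pays a full rescoring pass for every threshold it probes, while the sweep scores all candidate thresholds in one sorted pass.

```python
def score_at_threshold(scores, labels, threshold):
    """One full O(n) rescoring pass for a single threshold (toy weights)."""
    return sum(1.0 if hit else -0.5
               for s, hit in zip(scores, labels) if s >= threshold)

def twiddle_search(scores, labels, lo=0.0, hi=1.0, tol=1e-4):
    """Twiddle-style hill climb on the threshold: probe up and down,
    keep any improvement, shrink the step when neither direction helps.
    Every probe costs a full rescoring pass over the corpus."""
    threshold, step, best = (lo + hi) / 2.0, (hi - lo) / 2.0, float("-inf")
    while step > tol:
        for candidate in (threshold + step, threshold - step):
            s = score_at_threshold(scores, labels, candidate)
            if s > best:
                best, threshold = s, candidate
                break  # improvement found; keep the current step size
        else:
            step /= 2.0  # no improvement either way: refine the step
    return best, threshold

best, th = twiddle_search([0.99, 0.95, 0.90, 0.10],
                          [True, False, True, False])
```

On n data points, this pays O(n) per probe across many probes; a sweep replaces all of that with one O(n log n) sort and a single O(n) walk, which matches the order-of-magnitude wall-clock gains reported above.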


balladeer commented Jan 11, 2019

I am continuing to study the current twiddle algorithm and how this new one works before approving the PR.

If I can be of help, let me know.


rhyolight left a comment

@balladeer Overall this is great. I'm not done understanding the logic yet, but I have actionable feedback for you.

We try to use the same Python conventions across projects, so I ran pylint with NuPIC's pylintrc file and found some things I would like you to fix, please. I know it's a little nitpicky, but a little enforcement goes a long way on open source projects.

Using config file /Users/mtaylor/nta/nupic/pylintrc
************* Module nab.sweeper
C0301,  27: 0 - Line too long (100/80) (line-too-long)
C0301,  28: 0 - Line too long (102/80) (line-too-long)
C0301,  90: 0 - Line too long (89/80) (line-too-long)
C0301,  94: 0 - Line too long (83/80) (line-too-long)
C0301, 109: 0 - Line too long (81/80) (line-too-long)
C0301, 110: 0 - Line too long (113/80) (line-too-long)
C0301, 129: 0 - Line too long (83/80) (line-too-long)
C0301, 138: 0 - Line too long (94/80) (line-too-long)
C0301, 140: 0 - Line too long (88/80) (line-too-long)
C0301, 144: 0 - Line too long (94/80) (line-too-long)
C0301, 147: 0 - Line too long (87/80) (line-too-long)
C0301, 154: 0 - Line too long (89/80) (line-too-long)
C0301, 159: 0 - Line too long (84/80) (line-too-long)
C0301, 160: 0 - Line too long (84/80) (line-too-long)
C0301, 165: 0 - Line too long (85/80) (line-too-long)
C0301, 182: 0 - Line too long (91/80) (line-too-long)
C0301, 192: 0 - Line too long (94/80) (line-too-long)
C0301, 193: 0 - Line too long (92/80) (line-too-long)
C0301, 194: 0 - Line too long (90/80) (line-too-long)
C0301, 195: 0 - Line too long (98/80) (line-too-long)
C0301, 201: 0 - Line too long (85/80) (line-too-long)
C0301, 216: 0 - Line too long (102/80) (line-too-long)
C0301, 220: 0 - Line too long (83/80) (line-too-long)
C0301, 226: 0 - Line too long (90/80) (line-too-long)
C0301, 227: 0 - Line too long (91/80) (line-too-long)
C0301, 243: 0 - Line too long (103/80) (line-too-long)
R1705,  61: 2 - Unnecessary "else" after "return" (no-else-return)
R0201,  93: 2 - Method could be a function (no-self-use)
E1130, 105:37 - bad operand type for unary -: NoneType (invalid-unary-operand-type)
C1801, 134: 9 - Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
W1201, 140: 8 - Specify string format arguments as logging function parameters (logging-not-lazy)
W1201, 167: 8 - Specify string format arguments as logging function parameters (logging-not-lazy)
W0120, 218: 4 - Else clause on loop without a break statement (useless-else-on-loop)
W0611,  24: 0 - Unused Corpus imported from nab.corpus (unused-import)

Will continue reviewing but so far your code looks good.


subutai commented Jan 11, 2019

The scoring code is different, but it is achieving nearly the exact same numbers with drastic CPU savings. I am continuing to study the current twiddle algorithm and how this new one works before approving the PR. But from a functional standpoint, this does not need a major version change.

Sounds great, thank you for the time spent on it! I agree this wouldn't need a major version change.


subutai commented Jan 11, 2019

While I did change the specific implementation of the scoring algorithm, I did my best to keep the underlying algorithm the same (according to both the whitepaper and the existing implementation).

Looks great @balladeer - thanks!

  # If in a window, score as if true positive
  if curWindowLimits is not None:
    # Doesn't the `+ 1` in this equation mean we can _never_ have a positionInWindow == 0?
    positionInWindow = -(curWindowRightIndex - i + 1) / curWindowWidth

rhyolight Jan 11, 2019

Regarding the question in your code comment, see above:

curWindowRightIndex = timestamps.index(curWindowLimits[1])

Isn't that where the 1 is coming from?

rhyolight Jan 15, 2019

@balladeer Did my comment make sense? I hate leaving questions in code comments like this.

balladeer Jan 15, 2019

Yeah, makes sense, I think. I removed the question/comment.
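The removed question can also be settled with a quick numeric check. This is a standalone toy check with hypothetical window values, not code from the PR:

```python
# Hypothetical window values, purely for illustration.
curWindowWidth = 10.0       # window width (a float, so division is exact enough)
curWindowRightIndex = 99    # index of the window's rightmost timestamp

# Evaluate the quoted expression at every index i inside the window.
positions = [-(curWindowRightIndex - i + 1) / curWindowWidth
             for i in range(curWindowRightIndex - int(curWindowWidth) + 1,
                            curWindowRightIndex + 1)]

print(positions[0])    # -1.0 at the window's left edge
print(positions[-1])   # -0.1 at the right edge: the `+ 1` keeps it below 0
```

So the `+ 1` does mean positionInWindow never reaches 0; it ranges from -1.0 at the left edge to -1/width at the right edge, staying strictly negative.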


balladeer commented Jan 11, 2019

Thanks for all the review so far. I'm headed out for the weekend, but I'll respond to the feedback (linting, comments, etc) when I get back. Cheers!


balladeer commented Jan 15, 2019

Alright, I've pushed changes that I believe respond to all of the feedback:

  1. sweeper.py, scorer.py, and optimizer.py all score 10/10 using NuPIC's pylintrc
  2. Quite a few functions / classes now have new or expanded docstrings.

Let me know what else needs changing!


rhyolight left a comment

Great PR. @subutai I approve these changes. Essentially the same twiddle algorithm is being applied. The speedup is due to getting rid of some pandas data structures and their associated windowed-navigation functionality, replaced with a simpler and IMO cleaner implementation of the windowed sweep.


subutai commented Jan 15, 2019

👍 Thanks @balladeer - really nice work.

rhyolight merged commit d62afec into numenta:master on Jan 15, 2019
