optimization results #32

Merged
breznak merged 1 commit into htm-community:fixing_spatial_anomaly from psteinroe:fixing_spatial_anomaly
Apr 22, 2020
Conversation

@psteinroe psteinroe commented Apr 22, 2020

I optimized overnight (on my MacBook, so not much processing power) with two processes in parallel: one using localAreaDensity and one using numActiveCols, with a fixed seed of 5.

A few things that I find interesting:

  1. numActiveCols seems to be superior. The max standard score was 71, while localAreaDensity never reached 70.
  2. When removing the fixed seed, the standard score drops to 67.88. It seems like randomness plays a big role. I think adding the seed as one of the params to optimize (instead of fixing it to 5 during the optimization) might further increase the score, but in my opinion this would be bad practice...
  3. Maybe, instead of returning the standard score during the optimization, we could try returning the mean of all three scores?

Edit: Results are

"reward_low_FN_rate": 76.5626293570994,
"reward_low_FP_rate": 61.359926511549155,
"standard": 71.3094612770284
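Point 3 above could be sketched as a small change to the optimization objective. This is a hypothetical sketch (the function and dict names are illustrative, not the real optimization framework's API):

```python
# Hypothetical objective: return the mean of the three NAB profile scores
# instead of only the "standard" score.
def objective(scores):
    keys = ("standard", "reward_low_FN_rate", "reward_low_FP_rate")
    return sum(scores[k] for k in keys) / len(keys)

scores = {
    "reward_low_FN_rate": 76.5626293570994,
    "reward_low_FP_rate": 61.359926511549155,
    "standard": 71.3094612770284,
}
print(objective(scores))  # ≈ 69.744
```

For the results above, the mean objective would be roughly 69.74, i.e. lower than the standard score alone, since the low-FP profile drags it down.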


breznak commented Apr 22, 2020

These are very nice scores for HTMcore!!

Results are
"reward_low_FN_rate": 76.5626293570994,
"reward_low_FP_rate": 61.359926511549155,
"standard": 71.3094612770284

Compared to:

| Detector | Standard | Reward low FP | Reward low FN |
|---|---|---|---|
| Numenta HTM* | 70.5-69.7 | 62.6-61.7 | 75.2-74.2 |
| Numenta HTM using NuPIC v0.5.6* | 70.1 | 63.1 | 74.3 |
| NumentaTM HTM* (aka our type of TM) | 64.6 | 56.7 | 69.2 |
| Numenta HTM*, no likelihood | 53.62 | 34.15 | 61.89 |

So we could say we're the winners now! 💯 Best HTM model score on the NAB dataset. (*And we have some new features up our sleeve that were held back because of "how does it affect performance?" We can reliably answer that now!)

But...

numActiveCols seems to be superior. Max standard score was 71, while localAreaDensity never reached 70.

OK, but it's a close call; 1% shouldn't be that important. Also, as I understand it, this is just a 1-param optimization, right? I'll get your framework running, and then try running it on a cluster as well.

When removing the fixed seed, the standard score drops to 67.88. It seems like randomness plays a big role. I think adding the seed as one of the params to optimize (instead of fixing it to 5 during the optimization) might further increase the score, but in my opinion this would be bad practice...

This is a bad thing. It should never be so sensitive to the RNG seed! I'm wondering if the dataset is not good, being so sensitive to overfitting.
Or if there could be a bug in our algos that handles a fixed seed somehow differently. Curiosity: do you get the good results only if the RNG seed is 5? Or the same score for, say, 42? Or the same for 42 after re-tuning?
But to conclude, we want results with a random seed (i.e. not specified, or set to the special value that means "completely random"). The scores might be worse, but the results would correspond to general reality/performance on any dataset.

we could try to return the mean of all three scores?

TBH, I don't know exactly how the scores are computed, but the "standard" should be just that: some balance between low FP and low FN.

'synPermActiveInc': 0.003892649892638879,
'synPermConnected': 0.22110323252238637,
'synPermInactiveDec': 0.0006151856346474387,
'seed': 5,
breznak (Member) commented on the diff:
let's not use the fixed seed, keep it completely random. That way, results won't be overfitted and should generalize.

"standard": 57.22915150504096
"reward_low_FN_rate": 76.5626293570994,
"reward_low_FP_rate": 61.359926511549155,
"standard": 71.3094612770284
breznak (Member) commented on the diff:

Please rerun for the new (worse :( ) scores. And you can update the README with the new results! 👏


psteinroe commented Apr 22, 2020

I will rerun the optimization tonight without setting the seed parameter to see how it influences the results. Numenta did set the seed to a fixed value for their models (check here), but I fully agree that this is bad practice.

Or if there could be a bug in our algos that handles fixed seed somehow differently.

We should check that setting no seed means using a random one for the TM and SP, as well as the RDSE encoder, to be sure.
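The check described here could be sketched as a small determinism probe. This is a minimal stand-in (using Python's `random` in place of the real htm.core SP/TM/RDSE objects, which are not constructed here): build two components without an explicit seed and verify their output streams differ, then verify a fixed seed reproduces the same stream.

```python
import random

def make_component(seed=None):
    # Stand-in for SP/TM/RDSE construction; the real check would build
    # the actual htm.core objects and compare activeColumns / encodings.
    rng = random.Random(seed)  # seed=None -> seeded from OS entropy
    return [rng.random() for _ in range(10)]

a = make_component()
b = make_component()
assert a != b, "unseeded components should not be deterministic"

c = make_component(5)
d = make_component(5)
assert c == d, "same fixed seed must reproduce the same stream"
```

If the unseeded case ever produced identical streams, that would point at the kind of hidden fixed-seed default being discussed.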


breznak commented Apr 22, 2020

I will rerun the optimization tonight without setting the seed parameter to see how it influences the results.

I want to play around, so I suggest we just merge the results and correct/update them later.

We should check if setting no seed means using a random one for TM, SP as well as the RSDE Encoder to be sure.

I can do that, but I'm quite sure it defaults to a random seed.
What I'm wondering is why a fixed seed would have such a great effect. The sequence should still be pseudo-random.
My intuition would be:
Say we're walking all cells/columns in a layer in a for-loop:

  • Unless ordered (i.e. by overlap), we walk them "randomly".
  • The current "random (unseeded) random" generates a different sequence on each call. That is like "asynchronous processing"; imho the optimal case.
  • We could use "random-seeded random", a compromise between the current behavior and a fixed seed: seed=rng(). That way, each instance has a different seed, but within one instance of the object there's a fixed seed, which leads to the columns being walked in a fixed manner. I think this could theoretically lead to better/easier emergence of patterns among columns.
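The "random-seeded random" idea in the last bullet could look like this. A minimal sketch with hypothetical names (`Layer`, `column_order` are illustrative, not the real htm.core classes):

```python
import random

# Each instance draws its seed from the global RNG, so instances differ
# from each other, but each instance's own column order is reproducible
# from its stored seed.
class Layer:
    def __init__(self):
        self.seed = random.randrange(2**32)  # per-instance random seed

    def column_order(self, n):
        cols = list(range(n))
        # Re-seed from the stored seed: deterministic for this instance.
        random.Random(self.seed).shuffle(cols)
        return cols

a, b = Layer(), Layer()
assert a.column_order(16) == a.column_order(16)  # stable within an instance
# Different instances will (almost surely) walk the columns differently.
```

This keeps the fixed walking order within one object that the bullet argues could help patterns emerge, without overfitting the whole run to one global seed.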

@breznak breznak merged commit 7f73723 into htm-community:fixing_spatial_anomaly Apr 22, 2020

breznak commented Apr 22, 2020

We should check that setting no seed means using a random one for the TM and SP, as well as the RDSE encoder, to be sure.

Turns out the default params were set to a fixed seed. I have a PR that changes that; I just need to iron out all the determinism tests.

A quick workaround would be to force seed=0 everywhere, which means "random random".
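The seed=0 workaround could be applied mechanically over a model-parameter dict. A hedged sketch: the dict layout and parameter names below are illustrative only (not verified against this repo), relying on htm.core's convention that seed=0 means "pick a random seed":

```python
# Force seed=0 ("choose a random seed" in htm.core) on every component
# sub-dict of a hypothetical model-parameter structure.
def force_random_seed(params):
    for sub in params.values():
        if isinstance(sub, dict) and "seed" in sub:
            sub["seed"] = 0
    return params

model_params = {
    "sp":  {"columnCount": 2048, "seed": 5},
    "tm":  {"cellsPerColumn": 13, "seed": 5},
    "enc": {"size": 400, "seed": 5},
}
force_random_seed(model_params)
assert all(p["seed"] == 0 for p in model_params.values())
```

Running the overnight optimization with this applied would give the "random random" scores being asked for.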

psteinroe (Author) commented:

we could use "random-seeded random"

This sounds like the right way to do it. Can we implement that? Or does it behave like that already?

I have a PR that changes that, just need to iron out all the determinism tests.

Very nice, thanks!!

A quick workaround would be to force seed=0 everywhere, that means random random.

Alright, I will do that for tonight's run.
