Random seed and model stability #114

Open
Antoine-Cate opened this issue Jan 19, 2017 · 16 comments

@Antoine-Cate
Contributor

Hi everyone,

As everyone has seen, the random seed can have a significant effect on the prediction scores. This is because most of us are using algorithms with a random component (e.g., random forest, extra trees).
The effect is probably amplified by the fact that the dataset we are working on is small and non-stationary.

Matt has been handling this by testing a series of random seeds and taking the best. This avoids discarding a model just because of a "bad" random seed. However, it might favor the most unstable models: a very stable model will yield scores in a small range across several random seeds, while an unstable model will yield a wide range of scores. So an unstable model is likely to get a very high score if enough random seeds are tested, but that does not mean it will be good at predicting new test data.

A possible solution would be to test 10 (or another number of) random seeds and take the median score as the prediction score. It would require us to include this directly in our scripts to avoid further work for Matt: we could just make 10 predictions using 10 random seeds and export them in a single CSV file.
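
A minimal sketch of what that export could look like (assuming a scikit-learn style model and pandas; make_model, the column names, and X_train/y_train/X_test are hypothetical stand-ins for each team's own code):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical helper: build the submitted model with its own hyperparameters.
def make_model(seed):
    return RandomForestClassifier(n_estimators=100, random_state=seed, n_jobs=-1)

predictions = {}
for seed in range(10):
    np.random.seed(seed)
    clf = make_model(seed)
    clf.fit(X_train, y_train)                                   # training wells
    predictions['seed_{}'.format(seed)] = clf.predict(X_test)   # blind wells

# One CSV with a column of predicted facies per seed.
pd.DataFrame(predictions).to_csv('predictions_10_seeds.csv', index=False)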

What do you guys (and especially Matt) think about that?

@geckya

geckya commented Jan 19, 2017

Great suggestion!

@LukasMosser
Contributor

I think it's a great suggestion. I've been seeing this as well in our attempts.
Would 10 be enough to be significant? What's the cost of running 100 predictions? I assume not too much.

@kwinkunks
Member

I agree, a good suggestion.

For most models, it wouldn't take much for me to implement this, and usually the model instantiation is separate enough from the CV workflow that I can make it fast enough to do many realizations. But some models have been tricky to reproduce, either because of the workflow or because of how the seeds are set (e.g., we had trouble getting seeds to work properly with Keras/TensorFlow).

I'll take a look at the top 3 now, since I know I can work with those, and report back.

@kwinkunks
Member

kwinkunks commented Jan 20, 2017

OK, here's a description of 100 realizations for the current HouMath model:

[images: distribution of validation scores over the 100 realizations]

Median: 0.619

So it looks like, indeed, my lazy 'method' was rather favourable. In my defence, I used the same approach for everyone, so I hope there's been no unfairness. Either way, I'll take a look at some more models now.

To make sure they see this conversation, I'll cc @dalide @ar4 @bestagini @gccrowther @lperozzi @thanish @mycarta @alexcombessie

@Antoine-Cate
Contributor Author

I guess a standard deviation of 0.007 would not be a big deal in an industrial application (the number of misclassifications does not change dramatically). But given how close to each other we are in the contest, it is significant.
Thanks @kwinkunks!

@kwinkunks
Member

Result from @ar4's submission:

[image: distribution of validation scores over the realizations]

@bestagini
Contributor

Hi everybody!

I also agree that taking the average, median, or some other statistic over multiple random seeds could be a good option.

This could also solve another problem: working with Keras (TensorFlow or Theano backend), I am having issues fixing a given seed for reproducibility. Hopefully, averaged results will be more representative of the proposed method.
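
For reference, the kind of seed-setting usually attempted with a TensorFlow backend looks roughly like this (a sketch assuming TF 1.x, current at the time; in practice it does not guarantee fully reproducible Keras runs, which is part of the problem described above):

import random
import numpy as np
import tensorflow as tf

seed = 0
random.seed(seed)         # Python's built-in RNG
np.random.seed(seed)      # NumPy's global RNG
tf.set_random_seed(seed)  # TF 1.x graph-level random seed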

@kwinkunks
Member

@bestagini Ah yes, of course the 'unreproducible' results are dealt with here too... I was thinking they'd be a problem, but of course that's the whole point: this is exactly the problem we're fixing :)

Here's the same treatment of your own entry:

[image: distribution of validation scores over the realizations]

So implementing this will indeed change the order of the 2nd and 3rd entries, as things stand.

Side note to @alexcombessie — I can't reproduce your workflow, so I only have your submission to go on. I will have another crack at it. @Antoine-Cate I am working on yours now.

@kwinkunks
Member

kwinkunks commented Jan 20, 2017

Here's geoLEARN's result:

[image: distribution of validation scores over the realizations]

cc @Antoine-Cate @lperozzi @mablou

Rather than soaking up this thread, maybe I'll just start putting the validation scores (all realizations) into another folder, so everyone can see the data etc. Stay tuned.

Fearing for the rest of my day, I might adopt the following strategy:

  • Do more or less what I've been doing until 31 January, perhaps without searching too hard for a maximum.
  • If you want a 'stochastic score', please make the realizations explicitly and give me all the ones you want me to score, in a CSV or NumPy file or similar.
  • On 1 February, I will validate the final top (5? 10? I guess it depends how close people are at the end) in the way I've done here (see below).
  • This might mean that some scores will change after the contest closes.

For the record, here's how I'm getting the realizations (generic example):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

y_pred = []
for seed in range(100):
    np.random.seed(seed)
    # <hyperparams> stands for the entry's own hyperparameters, elided here
    clf = RandomForestClassifier(random_state=seed, n_jobs=-1)
    clf.fit(X, y)
    y_pred.append(clf.predict(X_test))
    print('.', end='')
np.save('100_realizations.npy', y_pred)
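
And a sketch of how a saved realizations file like that might then be summarized (assuming the blind labels are available as y_blind, and using micro-averaged F1 as a stand-in for the contest's scoring function):

import numpy as np
from sklearn.metrics import f1_score

y_pred = np.load('100_realizations.npy')   # shape: (n_realizations, n_samples)
scores = np.array([f1_score(y_blind, p, average='micro') for p in y_pred])

print('median: {:.3f}  mean: {:.3f}  std: {:.3f}'.format(
    np.median(scores), scores.mean(), scores.std()))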

@ar4
Contributor

ar4 commented Jan 20, 2017

Excellent idea. One thing to consider is how to handle cases (such as my own) where there are two fits (PE, and then Facies). Would two loops be a good approach: an outer loop of 10 iterations that picks the seed for the PE fit, and an inner loop of another 10 iterations around the Facies fit? Edit: Or one outer loop of 100 iterations that picks two random seeds each time.

@kwinkunks
Member

kwinkunks commented Jan 20, 2017

@ar4 Just to be clear: I'm just getting the results from 100 seeds, and averaging the scores those results achieve. So there's no optimization going on. You probably got this, just checking :)

I made a new workflow, bringing the PE part into the seed-setting loop in a super hacky way. I checked this in so you can see it HERE... Please check it!

The score is now like this:

[image: distribution of validation scores over the realizations]

Did I understand what you were asking??

@kwinkunks
Member

By the way everyone, the results from realizations are now in the Stochastic_validations directory.

@ar4
Contributor

ar4 commented Jan 20, 2017

Ah, I wondered for a moment why you wanted to clarify that there was no optimization going on, and finally realised that my choice of the phrase "picks the seed" was problematic. Now that would be an example of overfitting! ;-) (I just meant that a new seed is picked/set for each loop iteration.)

Your modification seems to be approximately my second proposal (one outer loop), but I see you use the same seed for both the PE and Facies steps. It's probably not a problem, but two random seeds - one for each - seems like it might be a bit safer.

@kwinkunks
Member

I ran it again keeping the same loop, but giving the PE generator seed+100. So it's not the same as the Latin square arrangement you were thinking of, but it should nonetheless be a better answer. I think. Right?
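
In loop form, that offset scheme is roughly the following (a sketch; fit_pe and fit_facies, and the train/test objects, are hypothetical stand-ins for the two steps of the actual workflow):

import numpy as np

y_pred = []
for seed in range(100):
    np.random.seed(seed)
    pe_model = fit_pe(train, random_state=seed + 100)               # hypothetical PE regression step
    facies_model = fit_facies(train, pe_model, random_state=seed)   # hypothetical facies classification step
    y_pred.append(facies_model.predict(test))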

[image: distribution of validation scores over the realizations]

@LukasMosser
Contributor

Since this was referenced here: what our team was trying to show is that using an "ensemble" of results would give an improved result. Full credit to the top four; our score will stay at 0.568 (until we submit our own work). We did expect this "meta-submission" to perform better than it did, though.
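
For anyone curious, a per-sample majority vote across submissions might look something like this (a sketch; the file names and the Facies column name are hypothetical):

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical submission files to combine.
files = ['team_a.csv', 'team_b.csv', 'team_c.csv', 'team_d.csv']
preds = np.column_stack([pd.read_csv(f)['Facies'].values for f in files])

# Majority vote, sample by sample, across the submissions.
vote, _ = stats.mode(preds, axis=1)
pd.DataFrame({'Facies': vote.ravel()}).to_csv('ensemble_submission.csv', index=False)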

@kwinkunks
Member

kwinkunks commented Jan 24, 2017

FYI all, this is the stochastic result of the new leader, SHandPR:

[image: distribution of validation scores over the realizations]
