
Predictions from RandomForestClassifier completely unstable between different machines #7366

Closed
iliaschalkidis opened this issue Sep 8, 2016 · 19 comments

@iliaschalkidis

iliaschalkidis commented Sep 8, 2016

Text classification - Entity Recognition

I am currently running some text classification experiments using windows of word embeddings plus some hand-crafted features. I have already used LinearSVC and LogisticRegression, and I want to finalize my experiments with RandomForestClassifier.

Use of VM - Dataset (train,validation,test) - Weird behaviour

Because my datasets are pretty big (around 15-35 GB), I do my training on a VM and then produce a classification report on my test set on my laptop.

So far so good with LinearSVC and LogisticRegression, but RandomForestClassifier shows pretty weird behaviour.

After training the classifier, I produce a classification report over a validation set (the last 20% of my initial samples). Of course these samples are not used during training, so we have 20% fresh samples to measure the classifier. With RandomForestClassifier I get the following report:

VALIDATION RESULTS

     precision    recall  f1-score   support

  0       0.99      0.99      0.99     78473
  1       0.89      0.81      0.85      5711

Then I downloaded the classifier, loaded it, and ran the classification report over the test set on my laptop, which gives:

TEST RESULTS (LAPTOP)

     precision    recall  f1-score   support

  0       0.93      0.99      0.96     33845
  1       0.52      0.07      0.13      2618

This doesn't look good at all. There is no way results should differ this much between validation and test from the same dataset (both unseen during training). I tried to think about what could possibly be going wrong from a theoretical point of view and couldn't find anything that would make such a great difference, so I ran the same test on the VM that I use for training experiments.

TEST RESULTS (VM)

     precision    recall  f1-score   support

  0       0.99      0.99      0.99     33845
  1       0.91      0.81      0.85      2618

So it seems that (in my case?) RandomForestClassifier behaves very differently from one machine to another. Every other classifier (LinearSVC, LogisticRegression) gives the same validation report on both machines.

I would be grateful for an explanation of this strange behaviour so I can fix the issue. Maybe I'm missing something...

Extra Information

  • Environments

VM :
OS: Ubuntu 14.04 LTS
Python 3.5
scikit-learn 0.17.1
numpy 1.11.1
scipy 0.17.1

LAPTOP :
OS: Ubuntu 16.04 LTS
Python 3.5
scikit-learn 0.17.1
numpy 1.11.1
scipy 0.17.1

  • Parameters

RandomForestClassifier(n_jobs=1, n_estimators=200, warm_start=True)

I was using n_jobs=-1, but then I fixed it to 1 to be sure that the problem is not caused by the difference in thread counts (4 threads vs 8 threads).

  • Save/Load

I'm saving and loading classifiers as suggested on scikit-learn.org (http://scikit-learn.org/stable/modules/model_persistence.html); a rough sketch of the round trip is shown after this list.

  • Metrics

Default scikit-learn classification_report (metrics.classification_report)

  • Notice (Size of .npy files)

LinearSVC and LogisticRegression are each saved as 1 .pkl file and 4 extra .npy files. RandomForestClassifier is saved as 1 .pkl file and 801(!!!) extra .npy files.
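
Roughly, the dump/load round trip looks like the following minimal sketch (the toy data and the file name rf_model.pkl are only placeholders; on scikit-learn 0.17 joblib ships as sklearn.externals.joblib):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.externals import joblib  # plain `import joblib` with newer versions

    # Toy data only to keep the sketch self-contained.
    X, y = make_classification(n_samples=100, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, n_jobs=1, random_state=1)
    clf.fit(X, y)

    # joblib stores the large numpy arrays inside the estimator as separate
    # .npy files next to the .pkl (presumably where the 801 files come from),
    # so all of them have to be copied along with the .pkl.
    joblib.dump(clf, 'rf_model.pkl')
    clf_loaded = joblib.load('rf_model.pkl')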

@amueller
Member

amueller commented Sep 8, 2016

You didn't set random_state, did you?

@iliaschalkidis
Author

iliaschalkidis commented Sep 8, 2016

@amueller No, I used the default option. My configuration is RandomForestClassifier(n_jobs=1, n_estimators=200, warm_start=True). Should I set random_state?

@amueller
Member

amueller commented Sep 8, 2016

yes, otherwise it's not deterministic, even on the same machine.

@iliaschalkidis
Author

@amueller OK, I will try it and report back. Should I use a specific value, e.g. random_state=1, or tune it, say with grid search, or is it irrelevant to the final result?

Also, LinearSVC and LogisticRegression have the same parameter; I used the default option there too, yet the results are the same across machines. Should I feel lucky, or is it less important for these classifiers?

@amueller
Member

amueller commented Sep 8, 2016

Just use a fixed value. On linear models it's less important, and not all solvers use it.

@iliaschalkidis
Author

Hi there,

@amueller I tried to solve the issue using the following configuration (with random_state) as you suggested:

RandomForestClassifier(n_jobs=1, n_estimators=200, warm_start=True, random_state=rs)

first setting the rs variable as

rs = 1 

and also using numpy RandomState as:

rs = numpy.random.RandomState(1234)
rs.seed(1234)

In both cases the predictions are still different between the two machines, but always the same on a single machine:

TEST RESULTS (LAPTOP)

    precision    recall  f1-score   support

0       0.93      0.99      0.96     33845
1       0.52      0.07      0.13      2618

TEST RESULTS (VM)

     precision    recall  f1-score   support

 0       0.98      0.99      0.99     33845
 1       0.90      0.80      0.85      2618

I'm just copying the saved files from the VM to my laptop and loading them with joblib.load().

@amueller
Member

amueller commented Sep 9, 2016

Oh sorry, I didn't read that correctly. So you pickle on one machine and unpickle on the other?
Is one of them by chance 32-bit and the other 64-bit?

@iliaschalkidis
Author

@amueller Yes, that's exactly what I do: I save with joblib on the VM and load the classifier on my laptop. Both machines seem to be 64-bit...

VM returns:

> kiddo@ml-experiments:~$ arch
> x86_64

My laptop returns:

> kiddo@kiddo-K56CB:~$ arch
> x86_64

@amueller
Member

amueller commented Sep 9, 2016

Wow, that is strange. Ping @jmschrei.
Can you check the values attribute of the first tree in the forest?

rf.estimators_[0].tree_.values_

on both machines?

Also, can you please provide the full code for training, predicting, storing and loading?

Thanks.

@iliaschalkidis
Author

I placed the following command in my validation code:

    classifier = joblib.load(classifier_file_name)
    print(classifier.estimators_[0].tree_.value)

Your suggestion, if I understood correctly,

    classifier = joblib.load(classifier_file_name)
    print(classifier.estimators_[0].tree_.values_)

gave me:

"AttributeError: 'sklearn.tree.tree.Tree' object has no attribute 'values'"

So with my code we get the following output:

VM

 kiddo@ml-experiments:~/cognitiv-app/test/AnalysisEngine$ python validate_model.py --cl_name 'RF' --window_size 11 --element 'counterparty'

 [[[  3.12816000e+05   2.39210000e+04]]

  [[  2.92080000e+05   1.79910000e+04]]

  [[  2.28962000e+05   8.68200000e+03]]

  ..., 
  [[  8.00000000e+00   2.00000000e+00]]

  [[  0.00000000e+00   2.00000000e+00]]

  [[  8.00000000e+00   0.00000000e+00]]]

LAPTOP

 kiddo@kiddo-K56CB:~/PycharmProjects/cognitiv-app/test/AnalysisEngine$ python validate_model.py --cl_name 'RF' --window_size 11 --element 'counterparty'

 [[[  3.12816000e+05   2.39210000e+04]]

  [[  2.92080000e+05   1.79910000e+04]]

  [[  2.28962000e+05   8.68200000e+03]]

  ..., 
  [[  8.00000000e+00   2.00000000e+00]]

  [[  0.00000000e+00   2.00000000e+00]]

  [[  8.00000000e+00   0.00000000e+00]]]

I'm really sorry but I would like to keep the code private :)

@amueller
Member

amueller commented Sep 9, 2016

Are the values identical?
You should be able to reproduce with simple code for which there is no reason to keep it private.
If you can't provide code, we can't help.

@jmschrei
Member

jmschrei commented Sep 9, 2016

It'd be really useful if you could look at the full tree, which will allow you to see the overall structure, thresholds, and values together easily. It seems like your values are the same though. Here is how to plot your trees: http://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html
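
A rough sketch of exporting the first tree (classifier_file_name and the .dot file name are placeholders):

    from sklearn.externals import joblib  # plain `import joblib` with newer versions
    from sklearn.tree import export_graphviz

    classifier_file_name = 'rf_model.pkl'  # placeholder path
    classifier = joblib.load(classifier_file_name)

    # Write the first tree of the forest to a .dot file; the files produced on
    # the two machines can be diffed directly or rendered with graphviz,
    # e.g. `dot -Tpng tree0.dot -o tree0.png`.
    export_graphviz(classifier.estimators_[0], out_file='tree0.dot')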

Hopefully it shouldn't make a huge difference, but can you make sure that your dataset is 64-bit floats in both cases by explicitly casting it before the prediction step?
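
For example, something along these lines (x_test and classifier stand for the objects already prepared in your script):

    import numpy as np

    # Explicit cast to 64-bit floats before predicting, to rule out dtype
    # differences between the two environments.
    x_test = np.asarray(x_test, dtype=np.float64)
    predictions = classifier.predict(x_test)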

@jmschrei
Member

jmschrei commented Sep 9, 2016

I suppose another possible issue is that joblib has a bug. Can you try training your model, evaluating it on the testing/validation data on the VM, saving the model, loading the model back up on your VM, and checking that the results are the same?
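
Something like the following, run entirely on the VM, would confirm that the dump/load round trip itself is lossless (variable names follow the snippets in this thread; the file name is a placeholder):

    import numpy as np
    from sklearn.externals import joblib  # plain `import joblib` with newer versions

    # Predictions from the model still in memory on the VM ...
    preds_before = classifier.predict(x_test)

    # ... then dump, reload on the *same* machine, and predict again.
    joblib.dump(classifier, 'rf_model.pkl')
    reloaded = joblib.load('rf_model.pkl')
    preds_after = reloaded.predict(x_test)

    # Should print True if persistence alone changes nothing.
    print(np.array_equal(preds_before, preds_after))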

@iliaschalkidis
Author

iliaschalkidis commented Sep 9, 2016

@jmschrei This is what I do actually.

First I run a python file (File 1) on VM, which has the following code:

        [PREPARE DATA]
        [SPLIT INTO TRAIN AND VALIDATION SET 0.8/0.2]
        [IF SVM, ELIF LR]
        elif self._parameters['classifier_name'] == 'RF':
            rs = numpy.random.RandomState(1234)
            rs.seed(1234)
            # ALSO EXPERIMENTED WITH rs=1
            self._classifier = RandomForestClassifier(n_jobs=self._parameters['classifier_params']['n_jobs'][0],
                                                      n_estimators=self._parameters['classifier_params']['n_estimators'][0],
                                                      warm_start=self._parameters['classifier_params']['warm_start'][0],
                                                      verbose=self._parameters['classifier_params']['verbose'][0],
                                                      random_state=rs)
        self._classifier.fit(x_train, y_train)
        [PREDICT AND PRINT CLASSIFICATION REPORT ON VALIDATION SET]
        self._save_classifier(element_name)

Then I run another python file (File 2), on both VM and laptop, which has the following code:

         [PREPARE TEST DATA]
         classifier = joblib.load(classifier_file_name)
         predictions = classifier.predict(x_test)
         [PRINT CLASSIFICATION REPORT ON TEST SET]

Both the VM and my laptop run the same piece of code (File 2) to print the evaluation report, and they produce the different results I already mentioned. The whole project is in a Git repository, so both machines run exactly the same code.

I follow the same process with LogisticRegression and LinearSVC and have not had any problem like this so far...

Thanks for your replies!

@notmatthancock
Contributor

Does the line classifier = joblib.load(classifier_file_name) use the same file on both machines?

@iliaschalkidis
Author

@notmatthancock Yes, I download the classifier files from the VM and place them in the models directory on my laptop.

@notmatthancock
Contributor

Try comparing the attributes of the classifier on both machines after loading. For example, take t = classifier.estimators_[0].tree_; there are various attributes to inspect, such as t.children_left, t.children_right, t.feature, t.impurity, etc. See if you spot any differences between the imported classifiers on the two machines.
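
A rough sketch of such a comparison, run on each machine right after loading (the attribute list is not exhaustive):

    import numpy as np

    t = classifier.estimators_[0].tree_

    # Print a compact fingerprint of the first tree and compare the output of
    # the two machines; a mismatch here would point at the import/persistence
    # step rather than at predict().
    for name in ('children_left', 'children_right', 'feature', 'threshold',
                 'impurity', 'value'):
        arr = np.asarray(getattr(t, name))
        print(name, arr.shape, arr.sum())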

@cbertolasio

cbertolasio commented Jun 30, 2020

@notmatthancock, I am having the same issue. What would you do if you saw differences in the tree attributes described above?

In my scenario I have trained on vm1, dumped the model onto the filesystem of vm1, then loaded the model back into a notebook on vm1. At this point I get the exact same predictions that were made right after the model was first fit/created in memory on vm1.

However, when I take the model from vm1 and put it into a Docker container that runs on vm1, or onto another set of VMs running on k8s nodes, I get vastly different predictions than when the model is loaded into a notebook.

@glemaitre
Member

We would need a reproducer to be able to answer here. I can think of a potential architecture issue, or maybe some tie-breaking that differs between environments.
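
For instance, one quick check for the tie-breaking idea would be to count test samples whose predicted probability sits right at the decision boundary, where tiny floating-point differences between environments could flip the class (binary case sketched here; variable names are placeholders):

    import numpy as np

    # Probability of the positive class for each test sample.
    proba = classifier.predict_proba(x_test)[:, 1]

    # Samples essentially tied at 0.5 are the ones a platform-dependent
    # rounding or tie-break difference could flip.
    print(np.sum(np.abs(proba - 0.5) < 1e-6), "near-tie samples")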
