
Predictions from RandomForestClassifier completely unstable between different machines #7366

Closed
iliaschalkidis opened this issue Sep 8, 2016 · 19 comments

@iliaschalkidis

iliaschalkidis commented Sep 8, 2016

Text classification - Entity Recognition

I am currently running some text classification experiments using windows of word embeddings plus some hand-crafted features. I have already used LinearSVC and LogisticRegression, and I want to finalize my experiments with RandomForestClassifier.

Use of VM - Dataset (train,validation,test) - Weird behaviour

Because my datasets are pretty big (around 15-35 GB), I do my training on a VM and then produce a classification report on my test set on my laptop.

So far so good with LinearSVC and LogisticRegression, but RandomForestClassifier shows pretty weird behaviour.

After training the classifier, I produce a classification report over a validation set (the last 20% of my initial samples). Of course these samples are not used during training, so we have 20% fresh samples to measure the classifier. With RandomForestClassifier I get the following report:

VALIDATION RESULTS

     precision    recall  f1-score   support

  0       0.99      0.99      0.99     78473
  1       0.89      0.81      0.85      5711

Then I downloaded the classifier, loaded it, and ran the classification report over the test set on my laptop, which gives:

TEST RESULTS (LAPTOP)

     precision    recall  f1-score   support

  0       0.93      0.99      0.96     33845
  1       0.52      0.07      0.13      2618

This doesn't look good at all. There is no way results should differ this much between validation and test from the same dataset (both unseen during training). I tried to think about what could possibly be going wrong from a theoretical point of view and couldn't find anything that would make such a great difference, so I ran the same test on the VM that I use for training experiments.

TEST RESULTS (VM)

     precision    recall  f1-score   support

  0       0.99      0.99      0.99     33845
  1       0.91      0.81      0.85      2618

So it seems that (in my case?) RandomForestClassifier behaves very differently from one machine to another. Every other classifier (LinearSVC, LogisticRegression) gives the same validation report on both machines.

I would be grateful for an explanation of this strange behaviour so I can fix the issue. Maybe I'm missing something...

Extra Information

  • Environments

VM :
OS: Ubuntu 14.04 LTS
Python 3.5
scikit-learn 0.17.1
numpy 1.11.1
scipy 0.17.1

LAPTOP :
OS: Ubuntu 16.04 LTS
Python 3.5
scikit-learn 0.17.1
numpy 1.11.1
scipy 0.17.1

  • Parameters

RandomForestClassifier(n_jobs=1, n_estimators=200, warm_start=True)

I was using n_jobs=-1, but then I fixed it to 1 to be sure that the problem is not caused by the difference in thread counts (4 threads vs 8 threads).

  • Save/Load

I'm saving and loading classifiers as suggested on scikit-learn.org (http://scikit-learn.org/stable/modules/model_persistence.html); a rough sketch of the round trip is shown after this list.

  • Metrics

Default scikit-learn classification_report (metrics.classification_report)

  • Notice (Size of .npy files)

LinearSVC and LogisticRegression are each saved as 1 .pkl file and 4 extra .npy files. RandomForestClassifier is saved as 1 .pkl file and 801(!!!) extra .npy files.
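
Roughly, the dump/load round trip looks like the following minimal sketch (the toy data and the file name rf_model.pkl are only placeholders; on scikit-learn 0.17 joblib ships as sklearn.externals.joblib):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.externals import joblib  # plain `import joblib` with newer versions

    # Toy data only to keep the sketch self-contained.
    X, y = make_classification(n_samples=100, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, n_jobs=1, random_state=1)
    clf.fit(X, y)

    # joblib stores the large numpy arrays inside the estimator as separate
    # .npy files next to the .pkl (presumably where the 801 files come from),
    # so all of them have to be copied along with the .pkl.
    joblib.dump(clf, 'rf_model.pkl')
    clf_loaded = joblib.load('rf_model.pkl')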

@amueller
Member

amueller commented Sep 8, 2016

You didn't set random_state, did you?

@iliaschalkidis
Author

iliaschalkidis commented Sep 8, 2016

@amueller No, I used the default option. My configuration is RandomForestClassifier(n_jobs=1, n_estimators=200, warm_start=True). Should I set random_state?

@amueller
Member

amueller commented Sep 8, 2016

yes, otherwise it's not deterministic, even on the same machine.

@iliaschalkidis
Author

@amueller OK, I will try it and report back. Should I use a specific value, e.g. random_state=1, or tune it, say with grid search, or is it irrelevant to the final result?

Also, LinearSVC and LogisticRegression have the same parameter; I used the default option there too, yet the results are the same across machines. Should I feel lucky, or is it less important for these classifiers?

@amueller
Member

amueller commented Sep 8, 2016

Just use a fixed value. On linear models it's less important, and not all solvers use it.

@iliaschalkidis
Author

Hi there,

@amueller I tried to solve the issue using the following configuration (with random_state) as you suggested:

RandomForestClassifier(n_jobs=1, n_estimators=200, warm_start=True, random_state=rs)

first setting the rs variable as

rs = 1 

and also using numpy RandomState as:

rs = numpy.random.RandomState(1234)
rs.seed(1234)

In both cases the predictions are still different between the two machines, but always the same on a single machine:

TEST RESULTS (LAPTOP)

    precision    recall  f1-score   support

0       0.93      0.99      0.96     33845
1       0.52      0.07      0.13      2618

TEST RESULTS (VM)

     precision    recall  f1-score   support

 0       0.98      0.99      0.99     33845
 1       0.90      0.80      0.85      2618

I'm just copying the saved files from the VM to my laptop and loading them with joblib.load().

@amueller
Member

amueller commented Sep 9, 2016

Oh sorry, I didn't read that correctly. So you pickle on one machine and unpickle on the other?
Is one of them by chance 32-bit and the other 64-bit?

@iliaschalkidis
Author

@amueller Yes, that's exactly what I do: I save with joblib on the VM and load the classifier on my laptop. Both machines seem to be 64-bit...

VM returns:

> kiddo@ml-experiments:~$ arch
> x86_64

My laptop returns:

> kiddo@kiddo-K56CB:~$ arch
> x86_64

@amueller
Member

amueller commented Sep 9, 2016

Wow, that is strange. Ping @jmschrei.
Can you check the values attribute of the first tree in the forest?

rf.estimators_[0].tree_.values_

on both machines?

Also, can you please provide the full code for training, predicting, storing and loading?

Thanks.

@iliaschalkidis
Author

I placed the following command in my validation code:

    classifier = joblib.load(classifier_file_name)
    print(classifier.estimators_[0].tree_.value)

Your suggestion, if I understood correctly,

    classifier = joblib.load(classifier_file_name)
    print(classifier.estimators_[0].tree_.values_)

gave me:

"AttributeError: 'sklearn.tree.tree.Tree' object has no attribute 'values'"

So with my code we get the following output:

VM

 kiddo@ml-experiments:~/cognitiv-app/test/AnalysisEngine$ python validate_model.py --cl_name 'RF' --window_size 11 --element 'counterparty'

 [[[  3.12816000e+05   2.39210000e+04]]

  [[  2.92080000e+05   1.79910000e+04]]

  [[  2.28962000e+05   8.68200000e+03]]

  ..., 
  [[  8.00000000e+00   2.00000000e+00]]

  [[  0.00000000e+00   2.00000000e+00]]

  [[  8.00000000e+00   0.00000000e+00]]]

LAPTOP

 kiddo@kiddo-K56CB:~/PycharmProjects/cognitiv-app/test/AnalysisEngine$ python validate_model.py --cl_name 'RF' --window_size 11 --element 'counterparty'

 [[[  3.12816000e+05   2.39210000e+04]]

  [[  2.92080000e+05   1.79910000e+04]]

  [[  2.28962000e+05   8.68200000e+03]]

  ..., 
  [[  8.00000000e+00   2.00000000e+00]]

  [[  0.00000000e+00   2.00000000e+00]]

  [[  8.00000000e+00   0.00000000e+00]]]

I'm really sorry but I would like to keep the code private :)

@amueller
Member

amueller commented Sep 9, 2016

Are the values identical?
You should be able to reproduce with simple code for which there is no reason to keep it private.
If you can't provide code, we can't help.

@jmschrei
Member

jmschrei commented Sep 9, 2016

It'd be really useful if you could look at the full tree, which will allow you to see the overall structure, thresholds, and values together easily. It seems like your values are the same though. Here is how to plot your trees: http://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html
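
A rough sketch of exporting the first tree (classifier_file_name and the .dot file name are placeholders):

    from sklearn.externals import joblib  # plain `import joblib` with newer versions
    from sklearn.tree import export_graphviz

    classifier_file_name = 'rf_model.pkl'  # placeholder path
    classifier = joblib.load(classifier_file_name)

    # Write the first tree of the forest to a .dot file; the files produced on
    # the two machines can be diffed directly or rendered with graphviz,
    # e.g. `dot -Tpng tree0.dot -o tree0.png`.
    export_graphviz(classifier.estimators_[0], out_file='tree0.dot')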

Hopefully it shouldn't make a huge difference, but can you make sure that your dataset is 64-bit floats in both cases by explicitly casting it before the prediction step?
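
For example, something along these lines (x_test and classifier stand for the objects already prepared in your script):

    import numpy as np

    # Explicit cast to 64-bit floats before predicting, to rule out dtype
    # differences between the two environments.
    x_test = np.asarray(x_test, dtype=np.float64)
    predictions = classifier.predict(x_test)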

@jmschrei
Member

jmschrei commented Sep 9, 2016

I suppose another possible issue is that joblib has a bug. Can you try training your model, evaluating it on the testing/validation data on the VM, saving the model, loading the model back up on your VM, and checking that the results are the same?
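
Something like the following, run entirely on the VM, would confirm that the dump/load round trip itself is lossless (variable names follow the snippets in this thread; the file name is a placeholder):

    import numpy as np
    from sklearn.externals import joblib  # plain `import joblib` with newer versions

    # Predictions from the model still in memory on the VM ...
    preds_before = classifier.predict(x_test)

    # ... then dump, reload on the *same* machine, and predict again.
    joblib.dump(classifier, 'rf_model.pkl')
    reloaded = joblib.load('rf_model.pkl')
    preds_after = reloaded.predict(x_test)

    # Should print True if persistence alone changes nothing.
    print(np.array_equal(preds_before, preds_after))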

@iliaschalkidis
Author

iliaschalkidis commented Sep 9, 2016

@jmschrei This is what I do actually.

First I run a python file (File 1) on VM, which has the following code:

        [PREPARE DATA]
        [SPLIT INTO TRAIN AND VALIDATION SET 0.8/0.2]
        [IF SVM, ELIF LR]
        elif self._parameters['classifier_name'] == 'RF':
            rs = numpy.random.RandomState(1234)
            rs.seed(1234)
            # ALSO EXPERIMENTED WITH rs=1
            self._classifier = RandomForestClassifier(n_jobs=self._parameters['classifier_params']['n_jobs'][0],
                                                      n_estimators=self._parameters['classifier_params']['n_estimators'][0],
                                                      warm_start=self._parameters['classifier_params']['warm_start'][0],
                                                      verbose=self._parameters['classifier_params']['verbose'][0],
                                                      random_state=rs)
        self._classifier.fit(x_train, y_train)
        [PREDICT AND PRINT CLASSIFICATION REPORT ON VALIDATION SET]
        self._save_classifier(element_name)

Then I run another python file (File 2), on both VM and laptop, which has the following code:

         [PREPARE TEST DATA]
         classifier = joblib.load(classifier_file_name)
         predictions = classifier.predict(x_test)
         [PRINT CLASSIFICATION REPORT ON TEST SET]

Both the VM and my laptop run the same piece of code (File 2) to print the evaluation report, and they produce the different results I already mentioned. The whole project is in a Git repository, so both machines run exactly the same code.

I follow the same process with LogisticRegression and LinearSVC and have not had any problem like this so far...

Thanks for your replies!

@notmatthancock
Contributor

Does the line classifier = joblib.load(classifier_file_name) use the same file on both machines?

@iliaschalkidis
Author

@notmatthancock Yes, I download the classifier files from the VM and place them in the models directory on my laptop.

@notmatthancock
Contributor

Try comparing the attributes of the classifier on both machines after loading. For example, take t = classifier.estimators_[0].tree_; there are various attributes to inspect, such as t.children_left, t.children_right, t.feature, t.impurity, etc. See if you spot any differences between the imported classifiers on the two machines.
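
A rough sketch of such a comparison, run on each machine right after loading (the attribute list is not exhaustive):

    import numpy as np

    t = classifier.estimators_[0].tree_

    # Print a compact fingerprint of the first tree and compare the output of
    # the two machines; a mismatch here would point at the import/persistence
    # step rather than at predict().
    for name in ('children_left', 'children_right', 'feature', 'threshold',
                 'impurity', 'value'):
        arr = np.asarray(getattr(t, name))
        print(name, arr.shape, arr.sum())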

@cbertolasio

cbertolasio commented Jun 30, 2020

@notmatthancock, I am having the same issue. What would you do if you saw differences in the tree attributes described above?

In my scenario I have trained on vm1, dumped the model onto the filesystem of vm1, then loaded the model back into a notebook on vm1. At this point I get the exact same predictions that were made right after the model was first fit/created in memory on vm1.

However, when I take the model from vm1 and put it into a Docker container that runs on vm1, or onto another set of VMs running on k8s nodes, I get vastly different predictions than when the model is loaded into a notebook.

@glemaitre
Member

We would need a reproducer to be able to answer here. I can think of a potential architecture issue, or maybe some tie-breaking that differs between environments.
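
For instance, one quick check for the tie-breaking idea would be to count test samples whose predicted probability sits right at the decision boundary, where tiny floating-point differences between environments could flip the class (binary case sketched here; variable names are placeholders):

    import numpy as np

    # Probability of the positive class for each test sample.
    proba = classifier.predict_proba(x_test)[:, 1]

    # Samples essentially tied at 0.5 are the ones a platform-dependent
    # rounding or tie-break difference could flip.
    print(np.sum(np.abs(proba - 0.5) < 1e-6), "near-tie samples")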
