
Inconsistent prediction results after dumping and loading lightgbm.LGBMClassifier via pickle #2449

Closed
slam3085 opened this issue Sep 25, 2019 · 10 comments

Comments

@slam3085

Basically, title.

  1. train an LGBMClassifier model
  2. get predictions for the test set
  3. dump the model via pickle (I also tried joblib - same issue)
  4. load the model
  5. get different predictions for the test set

Both models have the same parameters and metrics in their properties.
I'm using Python 3.7.3 and LightGBM 2.2.3.
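For illustration, a minimal self-contained sketch of these steps, using synthetic data as a stand-in for the original (non-shared) dataset and parameters that are assumptions rather than the reporter's actual setup:

import pickle

import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the original report uses private data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1. train the model
clf = lgb.LGBMClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# 2. get predictions for the test set
pred_before = clf.predict_proba(X_test)[:, 1]

# 3./4. dump and reload the model via pickle
with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)
with open("model.pkl", "rb") as f:
    clf_loaded = pickle.load(f)

# 5. compare predictions; in the same environment these should match exactly
pred_after = clf_loaded.predict_proba(X_test)[:, 1]
np.testing.assert_allclose(pred_before, pred_after)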

@StrikerRUS
Collaborator

@slam3085 Please post the whole snippet. Also, attach the data you use, if it's not sensitive. Or can you reproduce this issue with random data?

We do have a test that checks predictions stay the same after dumping and loading:

def test_joblib(self):
    X, y = load_boston(True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
    gbm = lgb.LGBMRegressor(n_estimators=10, objective=custom_asymmetric_obj,
                            silent=True, importance_type='split')
    gbm.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
            eval_metric=mse, early_stopping_rounds=5, verbose=False,
            callbacks=[lgb.reset_parameter(learning_rate=list(np.arange(1, 0, -0.1)))])
    joblib.dump(gbm, 'lgb.pkl')  # test model with custom functions
    gbm_pickle = joblib.load('lgb.pkl')
    self.assertIsInstance(gbm_pickle.booster_, lgb.Booster)
    self.assertDictEqual(gbm.get_params(), gbm_pickle.get_params())
    np.testing.assert_array_equal(gbm.feature_importances_, gbm_pickle.feature_importances_)
    self.assertAlmostEqual(gbm_pickle.learning_rate, 0.1)
    self.assertTrue(callable(gbm_pickle.objective))
    for eval_set in gbm.evals_result_:
        for metric in gbm.evals_result_[eval_set]:
            np.testing.assert_allclose(gbm.evals_result_[eval_set][metric],
                                       gbm_pickle.evals_result_[eval_set][metric])
    pred_origin = gbm.predict(X_test)
    pred_pickle = gbm_pickle.predict(X_test)
    np.testing.assert_allclose(pred_origin, pred_pickle)
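Note that this is an excerpt from LightGBM's test suite; to run it standalone you would also need roughly the following imports and helpers (the bodies of custom_asymmetric_obj and mse below are illustrative stand-ins, not copied from the test file):

import joblib
import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

def custom_asymmetric_obj(y_true, y_pred):
    # Illustrative asymmetric squared loss: returns gradient and hessian
    residual = (y_true - y_pred).astype(np.float64)
    grad = np.where(residual < 0, -2 * 10.0 * residual, -2 * residual)
    hess = np.where(residual < 0, 2 * 10.0, 2.0)
    return grad, hess

def mse(y_true, y_pred):
    # Illustrative custom eval metric: (name, value, is_higher_better)
    return 'custom_mse', np.mean((y_true - y_pred) ** 2), False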

@StrikerRUS StrikerRUS added the bug label Sep 26, 2019
@slam3085
Author

Hi!

I can't share the data and the snippet looks like this:

try:
    # Load a previously trained model from disk if it exists
    with open(path, "rb") as f:
        mdl = pickle.load(f)
    print(f'lightgbm_fitted_model_{today} loaded')
except:
    # Otherwise fit a new model and persist it via pickle
    print(f'Unable to open lightgbm_fitted_model_{today}; fitting...')
    X_actual_train, X_eval, y_actual_train, y_eval = train_test_split(X_train, y_train, test_size=0.1,
                                                                      random_state=42, stratify=y_train)

    hyperparams = {'n_estimators': 200, 'class_weight': 'balanced', 'random_state': 42}
    mdl = lightgbm.LGBMClassifier(**hyperparams)
    mdl.fit(X_actual_train, y_actual_train, eval_set=(X_eval, y_eval), eval_metric='roc_auc')
    with open(path, "wb") as f:
        pickle.dump(mdl, f)
# Score the test set with whichever model we ended up with
pred_proba = mdl.predict_proba(X_test)[:, 1]
n_pred_proba_95 = sum(pred_proba >= 0.95)
print(f'stats: {n_pred_proba_95} out of {len(pred_proba)} have predicted probability >= 0.95 for class 1')
return pred_proba

Basically, I either fit a new model and dump it via pickle, or I use an already trained one.

I only tried to reproduce this in my local environment, and I couldn't; now I understand why - my local environment runs Python 3.7.3 while production runs Python 3.7.0. So the actual steps to reproduce are:

  1. train the model on Python 3.7.3
  2. load it on Python 3.7.0
  3. prediction results differ

I guess it's not really a bug then, or a very minor one, and the issue can be closed.
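One way to catch this kind of silent environment mismatch early is to store the interpreter and library versions next to the pickled model and compare them at load time. A minimal sketch (the helper functions are hypothetical, not part of the snippet above):

import sys
import pickle
import lightgbm

def dump_with_env(model, path):
    # Store interpreter and library versions alongside the model
    payload = {
        "model": model,
        "python": sys.version,
        "lightgbm": lightgbm.__version__,
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)

def load_with_env(path):
    with open(path, "rb") as f:
        payload = pickle.load(f)
    # Warn if the loading environment differs from the training one
    if payload["python"] != sys.version or payload["lightgbm"] != lightgbm.__version__:
        print(f'Warning: trained on Python {payload["python"]} / LightGBM {payload["lightgbm"]}, '
              f'loading on Python {sys.version} / LightGBM {lightgbm.__version__}')
    return payload["model"]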

@guolinke guolinke removed the bug label Sep 26, 2019
@StrikerRUS
Collaborator

@slam3085

I only tried to reproduce this in my local environment, and I couldn't; now I understand why - my local environment runs Python 3.7.3 while production runs Python 3.7.0.

Makes sense! Thanks a lot for your investigation.

@dcanones

dcanones commented Dec 11, 2019

We are having the same problem. It happens when we try to deploy our models to production on a different system/environment than the one the model was trained on, and it is not easy to reproduce. We have observed this issue twice:

  1. Deploying a model trained on one system (a node of a Hadoop cluster running CentOS, with 512 GB RAM and a 32-core Xeon) and loading the pickle/text model on the rest of the nodes. Predictions are correct only on the system where the model was trained and inconsistent on the rest (for example, a continuous variable with values like [123.5, 143.2, 456.4] got predictions like [-4.0, 5.0, -9.0]). The data is of course the same.

  2. Today, while deploying with MLflow and setting up an endpoint (Ubuntu 16.04 LTS), exactly the same problem.

This happens with both the scikit-learn API and the native Python API. If we swap the model in our pipeline for a typical scikit-learn RandomForestRegressor, everything is fine.

Any idea what is happening? It feels like we are missing something important, and this is driving us crazy.
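One way to narrow this down (a suggested diagnostic, not something from the thread) is to confirm that the model loaded on the other nodes is byte-for-byte the same as on the training node; if it is, the divergence has to come from the prediction-time environment rather than from serialization. A sketch, with a placeholder model path:

import hashlib
import lightgbm as lgb

# Load the text dump produced on the training node ("model.txt" is a placeholder)
booster = lgb.Booster(model_file="model.txt")

# Structural checks: these should match what the training node reports
print("num trees:", booster.num_trees())
print("num features:", booster.num_feature())

# A hash of the serialized model makes cross-machine comparison easy
model_str = booster.model_to_string()
print("model hash:", hashlib.sha256(model_str.encode("utf-8")).hexdigest())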

@guolinke
Collaborator

What is the difference in the prediction results?

@dcanones

For example, predictions on the same machine and environment the model was trained in:

  • [123.5, 143.2, 456.4] (reasonable predictions, close to the original values we try to predict)

Predictions when loading the model (joblib or txt) on another machine/environment, with LightGBM always installed via conda or pip:

  • [-4.0, 5.0, -9.0] (weird predictions, negative and integer-like with the .0)
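For gaps like this it can help to quantify the difference rather than eyeballing it; a minimal sketch, assuming the predictions from each environment have been exported to .npy files (the file names are hypothetical):

import numpy as np

pred_train_env = np.load("pred_training_machine.npy")
pred_deploy_env = np.load("pred_deployment_machine.npy")

diff = np.abs(pred_train_env - pred_deploy_env)
print("max abs diff:", diff.max())
print("mean abs diff:", diff.mean())
print("fraction of rows differing by > 1e-6:", np.mean(diff > 1e-6))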

@guolinke
Collaborator

With this large a gap, I don't think it is the same problem as in this issue.
Did you try to save and then load on the same machine/env?

@dcanones

Yes, on the same machine and in the same environment there is no problem. On the same machine but in a different environment (for example, deploying the model using MLflow) the results are different, and the same happens when saving the model and loading it on other machines...

The thing is, the models are supposed to be portable, at least when saved as a .txt file.
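For reference, the text-file round trip would look roughly like this (paths and variable names such as mdl and X_test are placeholders); the .txt model is plain text and, in principle, should load identically anywhere a compatible LightGBM version is installed:

import lightgbm as lgb

# On the training machine: export the underlying booster of the sklearn wrapper
mdl.booster_.save_model("model.txt")

# On the deployment machine: reload the booster from the text file and predict
booster = lgb.Booster(model_file="model.txt")
pred = booster.predict(X_test)  # probabilities for binary models, raw values for regression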

@guolinke
Collaborator

@dcanones could you try the CLI version on the same machine? I think it could be a bug related to the environment.
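For anyone trying that: the CLI is driven by a small config file. A minimal sketch of a prediction run (file names are placeholders, and the lightgbm binary is assumed to be on PATH), written here as a Python wrapper for convenience:

import subprocess

# Hypothetical paths; the data file must use the same feature columns/order as training
config = '''task = predict
data = test_data.csv
input_model = model.txt
output_result = predictions.txt
'''
with open("predict.conf", "w") as f:
    f.write(config)

# Equivalent to running: lightgbm config=predict.conf
subprocess.run(["lightgbm", "config=predict.conf"], check=True)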

@lock

lock bot commented Feb 28, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 5, 2020