saving OptimalBinning & some other issues #77

Closed
skwskwskwskw opened this issue Feb 5, 2021 · 18 comments
Labels
enhancement New feature or request

@skwskwskwskw

Hi,

I am dealing with a big dataset, so the scorecard module can't be used on my PC. Instead, I resort to running OptimalBinning on each variable:

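Code
import joblib
from optbinning import OptimalBinning

# Fit the binning for a single variable
optb = OptimalBinning(name=variable, dtype="numerical")
optb.fit(x, y)

# Serialize the fitted object to disk
joblib.dump(optb, output + 'txt.pkl')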

Error
TypeError: can't pickle _thread.RLock objects

I would love to either:

  1. Obtain the bin criteria (to transform the dataset later), or
  2. Save the OptimalBinning object in pickle format.

Any thoughts on the above are much appreciated.

@guillermo-navas-palencia
Owner

Hi @similang,

I just tested this and did not have any problem:

image

Tested on Linux (Lubuntu) with Python 3.7.

@guillermo-navas-palencia
Owner

I searched for this issue with joblib, and it seems to be related to multiprocessing. Are you trying to run several OptimalBinning instances in parallel?

@skwskwskwskw
Author

Ah, no, it is just a for loop without any multiprocessing. The same problem persists even for a single variable.

I am running it on a Windows PC. By the way, is it possible to get the bin rules so they can be applied to new datasets?

@guillermo-navas-palencia
Owner

Not directly, but you can retrieve the split points for each variable and implement your own transform method.
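
A minimal sketch of such a hand-written transform, for illustration only. It assumes the split points of a numerical variable have already been read from the fitted object (for example from its `splits` attribute) and that bins are half-open intervals `[lower, upper)`. The function name `assign_bins`, the example split values, and the `-1` index for missing values are all hypothetical choices and not part of OptBinning.

```python
import bisect
import math


def assign_bins(values, splits):
    """Map each value to a bin index given the split points.

    Bins are treated as half-open intervals [lower, upper), so a value equal
    to a split point falls into the bin on its right. Missing values (None or
    NaN) get index -1; that convention is an assumption of this sketch.
    """
    splits = sorted(splits)
    indices = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            indices.append(-1)
        else:
            # Number of split points <= v is the index of the bin holding v.
            indices.append(bisect.bisect_right(splits, v))
    return indices


# Hypothetical split points, e.g. copied from a fitted binning object.
example_splits = [1.5, 3.0, 7.25]
example_bins = assign_bins([0.2, 1.5, 2.9, 3.0, 10.0, float("nan")], example_splits)
```

Because only the list of split points is needed, it can be stored in any plain format (a text or JSON file, for instance) and reloaded later to transform new data without pickling the fitted object.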

@guillermo-navas-palencia
Owner

I will add new functionality to create a binning_process/scorecard from a set of OptimalBinning objects. This way, it will be possible to transform and create a scorecard table for large datasets while only keeping the data x and target y of one variable in memory at a time. Does that sound reasonable?

@skwskwskwskw
Author

That would be perfect.

@guillermo-navas-palencia
Owner

Perfect, it might take a few days. I will keep you informed.

@naenumtou

@guillermo-navas-palencia I have the same issue when exporting the object and look forward to the new support. Why were you able to run it without any issue, though?

@guillermo-navas-palencia
Owner

Hi @naenumtou, are you using Windows, Linux or Mac? I only tested it on Linux.

guillermo-navas-palencia added a commit that referenced this issue Feb 8, 2021
@naenumtou

@guillermo-navas-palencia I am using Google Colab.

@guillermo-navas-palencia
Owner

guillermo-navas-palencia commented Feb 9, 2021

I can reproduce the error on Google Colab. However, it runs without any issue on Linux and Windows using Anaconda with Python 3.7 and 3.8. I would recommend running it locally to see if the error persists. I do not know why it does not work on Google Colab; I do not use it regularly.

@skwskwskwskw
Author

skwskwskwskw commented Feb 9, 2021

Hi, is there a .whl file for this package? I tried to install it from a .whl.

Edit: I think I found it, but is there an official link?

guillermo-navas-palencia added the enhancement (New feature or request) label Feb 11, 2021
@guillermo-navas-palencia
Owner

Hi,

I found the source of the error. The logger held by all OptBinning classes cannot be pickled, which is why saving fails on Google Colab. It will be fixed in the next release. In addition, all optimal binning classes will expose a save method to automatically save the object to a pickle file.
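
For readers hitting the same class of error elsewhere, here is a small sketch of why an attribute like this breaks pickling and one common workaround. It is not the fix that was applied in OptBinning: the `Estimator` and `PicklableEstimator` classes are hypothetical, a `threading.RLock` stands in for the logging machinery, and the workaround shown is the generic `__getstate__`/`__setstate__` pattern from the standard library.

```python
import pickle
import threading


class Estimator:
    """Toy stand-in for a fitted object that holds an unpicklable attribute."""

    def __init__(self):
        self.splits = [1.5, 3.0]
        # A lock stands in for the logger internals; locks cannot be pickled.
        self._lock = threading.RLock()


class PicklableEstimator(Estimator):
    """Same object, but the lock is dropped on save and rebuilt on load."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_lock", None)  # leave the unpicklable attribute out
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.RLock()  # recreate it after loading


def can_pickle(obj):
    """Return True if pickle.dumps succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


plain_ok = can_pickle(Estimator())
restored = pickle.loads(pickle.dumps(PicklableEstimator()))
```

In this sketch the plain object fails to serialize, while the subclass round-trips with its fitted state (`splits`) intact and a fresh lock attached.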

@guillermo-navas-palencia
Owner

Hi @similang @naenumtou,

Release 0.9.1 is ready. It has been tested on Google Colab installed with the wheel (https://pypi.org/project/optbinning/#files). Please update OptBinning and reopen this issue if you encounter any problem.

@naenumtou

@guillermo-navas-palencia I have tried to install it via the wheel in my Colab environment. It fails to build because the library requires Python >=3.7, but Colab runs 3.6.9.

Is there any way to fix this?

@guillermo-navas-palencia
Owner

Hi @naenumtou,

When I tested it on Colab, I built a wheel after manually changing the >= 3.7 requirement in setup.py. Python 3.7 is required due to some dependencies of CVXPY (the SCS solver), but apparently it might work with Python 3.6 in some environments. I would recommend updating Colab's Python to 3.7 if possible (Python 3.6 is from 2016). Otherwise, try to run it locally.

This might help: https://stackoverflow.com/questions/63867581/install-python-3-7-via-google-colab-as-default-python

@apatange-source

> Hi @similang @naenumtou,
>
> Release 0.9.1 is ready. It has been tested on Google Colab installed with the wheel (https://pypi.org/project/optbinning/#files). Please update OptBinning and reopen this issue if you encounter any problem.

Hey! I'm still getting the same issue. The error says:
TypeError: cannot pickle '_thread.lock' object

I'm running Python 3.8 with optbinning 0.12.0.
