Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy import of sklearn and xgboost #99

Closed
wants to merge 1 commit into from
Closed

Conversation

yoavram
Copy link

@yoavram yoavram commented Sep 6, 2020

Importing xgboost just before using it is needed is useful on systems where it is not installed.
Since installation of XGBoost can be tricky, I think this is a good idea.

@tlapusan
Copy link
Collaborator

tlapusan commented Sep 6, 2020

Hi @yoavram, let me see if I've understood your issue.
You want to say that in current release version you need to have installed both scikit-learn and xgboost, even if you want to use only one of them?

@yoavram
Copy link
Author

yoavram commented Sep 6, 2020

Yes it seems that when I import dtreeviz I get an error about bad xgboost install; when I moved the import (since I work with scikit learn) it works fine.
If you’d want to have it work without installing scikit learn, you’d probably need to try-except the import statement, but I haven’t done that.

@tlapusan
Copy link
Collaborator

tlapusan commented Sep 7, 2020

I've checkout your PR and uninstalled xgboost.

Screen Shot 2020-09-07 at 5 48 45 PM

If we cannot install xgboost or have problems importing it, we still will have issues.
Even if we have the error in pycharm (xgboost module cannot be found), we can still render scikit-learn visualizations... (because the way python interpreter works)

Screen Shot 2020-09-07 at 5 54 29 PM

If we switch the ShadowXGBDTree initialization block and we want to render scikit-learn visualizations, we will receive the "ModuleNotFoundError: No module named 'xgboost'" exception.

Since get_shadow_tree() is called from any visualization methods.... I don't have an idea right now how to implement the "lazy" import. Any suggestions ?

@parrt
Copy link
Owner

parrt commented Sep 7, 2020

I believe I did lazy import somewhere. hang on. Yep, i've done this kinda stuff. would this be helpful?

    if 'numpy' not in sys.modules:
        import numpy
    np = sys.modules['numpy']
    if not isinstance(data,np.ndarray):
        return " "

@yoavram
Copy link
Author

yoavram commented Sep 7, 2020

No I think that’s an overkill. What I mean by “Lazy import” Is that you only import when you need it, not that you import if it hasn’t been imported. The latter is taken care of by Python automatically.

@yoavram
Copy link
Author

yoavram commented Sep 7, 2020

@tlapusen I’ll take a closer look at what you wrote later.

@parrt
Copy link
Owner

parrt commented Sep 7, 2020

Meaning that the user has to do the import? isn't that the way it is now?

@tlapusan
Copy link
Collaborator

hi @yoavram

do you continue with this or should we close this PR ?

@parrt
Copy link
Owner

parrt commented Oct 12, 2020

@tlapusan I was thinking that we should do this conditional install that somebody fixed for me in https://github.com/parrt/tensor-sensor

That fixes what is required to be installed. Still, inside the code we have to avoid using types that don't exist. We tweaked the code also. See https://github.com/parrt/tensor-sensor/blob/master/tsensor/analysis.py#L459

@parrt
Copy link
Owner

parrt commented Oct 12, 2020

@yoavram my understanding is that those imports would slow things down but I'm not sure it matters, depending on how many times you call the function. I think it's easiest if we asked for the class names and module names to check for types.

@tlapusan
Copy link
Collaborator

@parrt I'm gonna look on those conditional imports in the next days, thanks.

@parrt
Copy link
Owner

parrt commented Oct 12, 2020

I'm laughing because I had no idea you could do conditional pip installs...

@tlapusan
Copy link
Collaborator

tlapusan commented Oct 20, 2020

@parrt I've looked over conditional imports and found also pip docs about it. Like you said, we still need to check into code if the libraries are installed or not.
I suppose that the default installation (pip install dtreeviz) should install all requirements and the custom pip installations should be very well documented in readme.md page, else the people will be confused.

I started to work on lightgbm implementation and I would like to work on this fix (#99) later. (happy if someone else could do this meanwhile)

@parrt
Copy link
Owner

parrt commented Oct 20, 2020

@tlapusan lightgbm is definitely more important :)

@parrt
Copy link
Owner

parrt commented Nov 25, 2020

I think all we need is something like the following in the set up (from tensor-sensor):

    install_requires=['graphviz>=0.14.1','numpy','IPython', 'matplotlib'],
    extras_require = {'all': all_requires,
                      'torch': torch_requires,
                      'tensorflow': tensorflow_requires
                     },

then check for the modules:

if v.shape.__class__.__module__ == "torch": ...

poking around the web, it looks like people do this for conditional imports:

# pylint: disable=import-not-at-top
try:
  from monty.python.au import bruce
else:
  bruce = None

But, we still need to use the class/module name instead of checking the type in the code.

@tlapusan
Copy link
Collaborator

tlapusan commented Jan 8, 2021

hi @yoavram, please follow this PR #115 if you are still interested in making xgboost or pyspark lazy imported.

@yoavram
Copy link
Author

yoavram commented Jan 18, 2021

Sorry I fell off this issue. Hopefully someone can pick it up.

@parrt
Copy link
Owner

parrt commented Jan 24, 2021

@tlapusan should we close this one in favor of #115? I'm trying to figure out which is the PR to focus on.

@parrt
Copy link
Owner

parrt commented Jan 24, 2021

Closing since I merged #115.

@parrt parrt closed this Jan 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants