!pip install dtreeviz fails on Kaggle notebook #108

Closed
georgezoto opened this issue Oct 2, 2020 · 20 comments

Comments

@georgezoto

Hello dtreeviz team,

I wanted to use your cool library on my kaggle notebook but it is failing installation:

WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fecd5b1f950>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /simple/dtreeviz/
ERROR: Could not find a version that satisfies the requirement dtreeviz (from versions: none)
ERROR: No matching distribution found for dtreeviz

image

Here is a link to my public notebook: https://www.kaggle.com/georgezoto/intro-to-ml-underfitting-and-overfitting

Do you have any ideas how to make this work in a Kaggle notebook?

Thank you,
George

@parrt
Owner

parrt commented Oct 2, 2020

That's weird. There must be something about the Kaggle environment that prevents it from pulling in new pip packages. That doesn't look like a dtreeviz issue. It looks like !pip itself is failing from the command line.

@tlapusan
Collaborator

tlapusan commented Oct 5, 2020

hi @georgezoto,

all you need to do is enable internet for your kernel (from your kernel -> Settings -> Internet ON).
I copied your notebook, edited it, and it worked.

I saw that you want to visualize very deep trees... they will be hard to interpret with any library, especially if you want to plot the whole tree structure.
To check whether the model is overfitting, I suggest looking at the visualizations viz_leaf_criterion() and viz_leaf_samples(). If the number of leaves is too large, you can set display_type="hist".

Let us know if it worked for you!
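
The `Temporary failure in name resolution` message in the original log is a DNS failure, which is what a kernel without internet access tends to produce. Below is a minimal sketch, using only the standard library, for checking name resolution before running `!pip install`. The function name `can_resolve` and its `resolver` parameter are hypothetical, introduced here only for illustration.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to an address, False on a DNS failure."""
    try:
        resolver(host, port)
    except socket.gaierror:
        # Same class of failure as pip's
        # "[Errno -3] Temporary failure in name resolution".
        return False
    return True
```

In a notebook cell, `can_resolve("pypi.org")` returning `False` suggests that internet access is still disabled for the kernel.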

@parrt added the "question" label (This was a question not a bug) Oct 8, 2020
@parrt
Owner

parrt commented Oct 8, 2020

@georgezoto did that work?

@georgezoto
Author

Thank you @tlapusan and @parrt. After enabling the internet connection on my Kaggle notebook, the installation progressed.
However, I see another package-related issue that is breaking the full installation. Any idea how to resolve this? !pip install dtreeviz is the only external library I am downloading and installing in this notebook.

Installing collected packages: graphviz, colour, py4j, pyspark, dtreeviz
  Attempting uninstall: graphviz
    Found existing installation: graphviz 0.8.4
    Uninstalling graphviz-0.8.4:
      Successfully uninstalled graphviz-0.8.4
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

mxnet 1.7.0.post1 requires graphviz<0.9.0,>=0.8.1, but you'll have graphviz 0.14.2 which is incompatible.
Successfully installed colour-0.1.5 dtreeviz-1.1.2 graphviz-0.14.2 py4j-0.10.9 pyspark-3.0.1

image

@parrt
Owner

parrt commented Oct 8, 2020

something is weird. mxnet requires an old version of graphviz. hmm.. maybe it will be okay without that. I tried the --use-feature=2020-resolver on pip and all is well.

The error you are seeing with run likely means it has a previous version of graphviz. I'm not sure why that would be. Maybe try asking for graphviz.__version__ at runtime? who knows which version actually got installed
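
One way to see which version the running kernel actually uses is to compare the imported module with what pip has recorded as installed. This is a rough sketch, not something from the dtreeviz docs: `describe_module` is a hypothetical helper, and it assumes the distribution name matches the module name (true for `graphviz`). `importlib.metadata` needs Python 3.8+, so on an older interpreter the installed version is simply reported as `None`.

```python
import importlib

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older interpreters, e.g. Python 3.7
    metadata = None


def describe_module(name):
    """Report what the running kernel imports vs. what pip has installed."""
    module = importlib.import_module(name)
    installed = None
    if metadata is not None:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
    return {
        # Version of the module object currently loaded in this process.
        "loaded_version": getattr(module, "__version__", None),
        # Path the module was imported from.
        "file": getattr(module, "__file__", None),
        # Version recorded in the installed package metadata, if any.
        "installed_version": installed,
    }
```

Calling `describe_module("graphviz")` and finding that `loaded_version` differs from `installed_version` would point to a module imported before the upgrade, in which case restarting the kernel is the likely fix.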

@georgezoto
Author

georgezoto commented Oct 8, 2020

!pip install dtreeviz --use-feature=2020-resolver seems to work with no issues @parrt

I am not sure if this is the right version

graphviz.__version__
'0.8.4' 

This command is still failing for me

from dtreeviz.trees import dtreeviz

ImportError                               Traceback (most recent call last)
<ipython-input-20-980878a242a2> in <module>
----> 1 from dtreeviz.trees import dtreeviz

/opt/conda/lib/python3.7/site-packages/dtreeviz/trees.py in <module>
     11 import matplotlib.pyplot as plt
     12 from colour import Color, rgb2hex
---> 13 from graphviz.backend import run, view
     14 from sklearn import tree
     15 from typing import Mapping, List, Tuple

ImportError: cannot import name 'run' from 'graphviz.backend' (/opt/conda/lib/python3.7/site-packages/graphviz/backend.py)

@parrt
Owner

parrt commented Oct 8, 2020

You definitely need 0.14.1 or above. pip says it is installing 0.14.2, so there is a disconnect between your install and your Python execution.
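
As a small illustration of the version check, here is a sketch that compares versions component by component. The helpers `version_tuple` and `meets_minimum` are hypothetical and only handle purely numeric versions; something like `1.7.0.post1` would need a real parser such as `packaging.version.Version`.

```python
def version_tuple(version):
    """Turn a purely numeric version such as '0.14.2' into (0, 14, 2)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum="0.14.1"):
    """True if `installed` is at least `minimum`, compared component-wise."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("0.8.4"))   # False: Kaggle's preinstalled graphviz is too old
print(meets_minimum("0.14.2"))  # True
print("0.8.4" >= "0.14.1")      # True: plain string comparison gives the wrong answer
```

The last line shows why the versions should not be compared as strings: `"8"` sorts after `"1"`, so 0.8.4 would wrongly appear newer than 0.14.1.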

@georgezoto
Author

The issue is that this is the default environment in Kaggle @parrt
I tried the following:

!pip uninstall dtreeviz
WARNING: Skipping dtreeviz as it is not installed.
!pip install dtreeviz --use-feature=2020-resolver

Collecting dtreeviz
  Downloading dtreeviz-1.1.2.tar.gz (48 kB)
     |████████████████████████████████| 48 kB 1.2 MB/s eta 0:00:011
Requirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.1.3)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.18.5)
Requirement already satisfied: scikit-learn in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (0.23.2)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (3.2.1)
Requirement already satisfied: xgboost in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.2.0)
Requirement already satisfied: pytest in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (5.4.1)
Collecting colour
  Downloading colour-0.1.5-py2.py3-none-any.whl (23 kB)
Collecting graphviz>=0.9
  Downloading graphviz-0.14.2-py2.py3-none-any.whl (18 kB)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.18.5)
Requirement already satisfied: scipy in /opt/conda/lib/python3.7/site-packages (from xgboost->dtreeviz) (1.4.1)
Collecting pyspark
  Downloading pyspark-3.0.1.tar.gz (204.2 MB)
     |████████████████████████████████| 204.2 MB 29 kB/s s eta 0:00:01
Collecting py4j==0.10.9
  Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB)
     |████████████████████████████████| 198 kB 42.0 MB/s eta 0:00:01
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.18.5)
Requirement already satisfied: scipy in /opt/conda/lib/python3.7/site-packages (from xgboost->dtreeviz) (1.4.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn->dtreeviz) (2.1.0)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn->dtreeviz) (0.14.1)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.18.5)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib->dtreeviz) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->dtreeviz) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->dtreeviz) (1.2.0)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.18.5)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->dtreeviz) (2.8.1)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from cycler>=0.10->matplotlib->dtreeviz) (1.14.0)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from cycler>=0.10->matplotlib->dtreeviz) (1.14.0)
Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.7/site-packages (from pandas->dtreeviz) (2019.3)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from dtreeviz) (1.18.5)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->dtreeviz) (2.8.1)
Requirement already satisfied: py>=1.5.0 in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (1.8.1)
Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (20.1)
Requirement already satisfied: attrs>=17.4.0 in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (19.3.0)
Requirement already satisfied: more-itertools>=4.0.0 in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (8.2.0)
Requirement already satisfied: pluggy<1.0,>=0.12 in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (0.13.0)
Requirement already satisfied: wcwidth in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (0.1.9)
Requirement already satisfied: importlib-metadata>=0.12 in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (2.0.0)
Requirement already satisfied: importlib-metadata>=0.12 in /opt/conda/lib/python3.7/site-packages (from pytest->dtreeviz) (2.0.0)
Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata>=0.12->pytest->dtreeviz) (3.1.0)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from cycler>=0.10->matplotlib->dtreeviz) (1.14.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->dtreeviz) (2.4.7)
Building wheels for collected packages: dtreeviz, pyspark
  Building wheel for dtreeviz (setup.py) ... done
  Created wheel for dtreeviz: filename=dtreeviz-1.1.2-py3-none-any.whl size=52361 sha256=4d78bdd4d1267b25de5048ae8514b1717c05d3c8bba366a1c0110eff937dc185
  Stored in directory: /root/.cache/pip/wheels/fd/48/9c/ba55c9d47180cd8948dccaa9189d1fe3f5ee4782199aa4c183
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.0.1-py2.py3-none-any.whl size=204612244 sha256=a699a3b01a0d76639ff5ac8e36bb2eb128dba4206c25807123a1919233dacf13
  Stored in directory: /root/.cache/pip/wheels/5e/34/fa/b37b5cef503fc5148b478b2495043ba61b079120b7ff379f9b
Successfully built dtreeviz pyspark
Installing collected packages: py4j, pyspark, graphviz, colour, dtreeviz
  Attempting uninstall: graphviz
    Found existing installation: graphviz 0.8.4
    Uninstalling graphviz-0.8.4:
      Successfully uninstalled graphviz-0.8.4
ERROR: mxnet 1.7.0.post1 requires graphviz<0.9.0,>=0.8.1, but you'll have graphviz 0.14.2 which is incompatible.
Successfully installed colour-0.1.5 dtreeviz-1.1.2 graphviz-0.14.2 py4j-0.10.9 pyspark-3.0.1
graphviz.__version__
NameError: name 'graphviz' is not defined

@georgezoto
Author

Does anyone have a working Kaggle notebook that installs dtreeviz, so I can take a look at it?

@parrt
Owner

parrt commented Oct 8, 2020

Oh dang. It's too bad that their environment doesn't have the newer stuff. Hmm... not sure how to get them to update.

@parrt added the "compatibility" label and removed the "question" label (This was a question not a bug) Oct 8, 2020
@tlapusan
Collaborator

@georgezoto I tried your notebook, installed the dtreeviz library, and ran the visualization.
I changed the tree depth to 3 and made no other modifications.
Screen Shot 2020-10-12 at 11 43 15 AM

If your issues still persist, I would suggest you create another notebook from scratch... maybe something is wrong with that notebook's infrastructure.

@georgezoto
Author

Thank you for giving this another try @tlapusan

I created a brand new notebook and limited the tree depth to 3 like you recommended here: https://www.kaggle.com/georgezoto/pip-install-dtreeviz-fails-on-kaggle-notebook

Unfortunately it is giving me the same error:

from dtreeviz.trees import dtreeviz

ImportError                               Traceback (most recent call last)
<ipython-input-13-980878a242a2> in <module>
----> 1 from dtreeviz.trees import dtreeviz

/opt/conda/lib/python3.7/site-packages/dtreeviz/trees.py in <module>
     11 import matplotlib.pyplot as plt
     12 from colour import Color, rgb2hex
---> 13 from graphviz.backend import run, view
     14 from sklearn import tree
     15 from typing import Mapping, List, Tuple

ImportError: cannot import name 'run' from 'graphviz.backend' (/opt/conda/lib/python3.7/site-packages/graphviz/backend.py)

Screen Shot 2020-10-12 at 12 16 09 PM

@parrt
Owner

parrt commented Oct 12, 2020

yes it definitely seems that Kaggle is behind. Is there a way to run a pip upgrade on a package in Kaggle's environment?

@georgezoto
Author

georgezoto commented Oct 12, 2020

We have a winner 🎉 great suggestion @parrt
The underlying issue is an outdated graphviz version. You need to upgrade graphviz BEFORE importing it; otherwise you will get the outdated version preinstalled in Kaggle (0.8.4) instead of the latest one (0.14.2).

After this step, dtreeviz is installed properly and is able to visualize your tree, as long as it's not too deep.

Solution: !pip install graphviz --upgrade before import graphviz

Thank you friends for helping me troubleshoot this,
George

Screen Shot 2020-10-12 at 12 59 21 PM
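
The ordering constraint above (upgrade first, import second) can be sketched as a small guard for the first notebook cell. This is only an illustration under stated assumptions: `upgrade_before_import` and its `modules` and `runner` parameters are hypothetical names, and raising an error when the package is already imported is a design choice of this sketch, not something pip or Kaggle enforces.

```python
import subprocess
import sys


def upgrade_before_import(package, modules=None, runner=subprocess.check_call):
    """Upgrade `package` with pip, refusing if it is already imported."""
    modules = sys.modules if modules is None else modules
    if package in modules:
        # The running kernel would keep using the old, already-loaded module.
        raise RuntimeError(
            f"{package} is already imported; restart the kernel, then upgrade first."
        )
    # Use the kernel's own interpreter so pip targets the same environment.
    command = [sys.executable, "-m", "pip", "install", "--upgrade", package]
    runner(command)
    return command
```

Used as `upgrade_before_import("graphviz")` at the top of the notebook, followed by `import graphviz` and `from dtreeviz.trees import dtreeviz` in later cells, this mirrors the `!pip install graphviz --upgrade` fix.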

@parrt closed this as completed Oct 12, 2020
@fridrichmrtn

I would just like to add that the install/update is best done before loading or using any library. I know this might be obvious, but it took me a while today.

@Arpit-DS

Arpit-DS commented Feb 15, 2021

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mxnet 1.7.0.post2 requires graphviz<0.9.0,>=0.8.1, but you have graphviz 0.16 which is incompatible.
autogluon-core 0.1.0b20210210 requires graphviz<0.9.0,>=0.8.1, but you have graphviz 0.16 which is incompatible.
autogluon-core 0.1.0b20210210 requires numpy==1.19, but you have numpy 1.19.5 which is incompatible.
Successfully installed graphviz-0.16
Note: you may need to restart the kernel to use updated packages.

!pip install graphviz==0.16

After installing the above version of graphviz, dtreeviz worked for me on kaggle.

@ratankj

ratankj commented Dec 29, 2022

Screenshot_1

@ratankj

ratankj commented Dec 29, 2022

Hello dtreeviz team,

I am facing this error with my notebook. I have already installed dtreeviz for my notebook, but even after that this error keeps showing up.

Can anyone look into this?

@tlapusan
Collaborator

Hi ratankj,

I assume your notebook is a Kaggle one since you posted here, right?

How did you install dtreeviz? Did you check the Python path? Did you import dtreeviz?
We need more details to help you; the details above are not sufficient.

@parrt
Owner

parrt commented Dec 29, 2022

Try this from release notes:

Using old functions with 2.0+:

For backward compatibility, to call the dtreeviz() function and the old API, you can change the import to:

from dtreeviz import *
dtreeviz(clf, X_train, ...)
