IndexError: list index out of range #745

Closed
ibuda opened this issue Aug 14, 2019 · 22 comments

@ibuda

ibuda commented Aug 14, 2019

I am running the following code:

import shap
from catboost import CatBoostClassifier
from catboost.datasets import amazon

train_df, _ = amazon()
ix = 100
X_train = train_df.drop('ACTION', axis=1)[:ix]
y_train = train_df.ACTION[:ix]
X_val = train_df.drop('ACTION', axis=1)[ix:ix+20]
y_val = train_df.ACTION[ix:ix+20]
model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
shap.TreeExplainer(model)

I get the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-6d52aef09dc8> in <module>
      8 model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
      9 model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
---> 10 shap.TreeExplainer(model)

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
     94         self.feature_dependence = feature_dependence
     95         self.expected_value = None
---> 96         self.model = TreeEnsemble(model, self.data, self.data_missing)
     97 
     98         assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    594             self.dtype = np.float32
    595             cb_loader = CatBoostTreeModelLoader(model)
--> 596             self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
    597             self.tree_output = "log_odds"
    598             self.objective = "binary_crossentropy"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in get_trees(self, data, data_missing)
   1120 
   1121             # load the per-tree params
-> 1122             depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
   1123 
   1124             # load the nodes

IndexError: list index out of range

I first hit this error with CatBoost 0.15.2. I upgraded to the latest version (0.16.4 as of today), but the error persists.
My shap version is 0.29.3.

@ibuda
Author

ibuda commented Aug 15, 2019

I managed to find a fix for the error. Apparently, num_trees no longer always equals the iterations parameter, so this line causes the problem:

self.num_trees = self.loaded_cb_model['model_info']['params']['boosting_options']['iterations']

For example, if you set iterations to 100 for training and training finishes with a tree_count_ of 20, the loop in the get_trees method raises the error when it tries to access the 21st tree in oblivious_trees.
Changing the line above to:
self.num_trees = len(self.loaded_cb_model['oblivious_trees'])
solved my issue.
I suppose this is not the "right" way to do it, but it works like a charm as a temporary fix.
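
The mismatch described above can be reproduced without CatBoost or shap. The following is only a minimal sketch: the dictionary is a made-up stand-in for the loaded model structure (only the key names are taken from the lines quoted above), and `tree_depths` is a hypothetical helper, not a function from either library.

```python
# Toy stand-in for the structure the loader reads. The key names follow the
# ones quoted in this comment; the values are invented for illustration.
loaded_cb_model = {
    "model_info": {"params": {"boosting_options": {"iterations": 100}}},
    # Assume only 20 trees were actually stored in the model.
    "oblivious_trees": [{"splits": [0, 1, 2]} for _ in range(20)],
}


def tree_depths(model_json, num_trees):
    """Hypothetical helper: depth (number of splits) of the first num_trees trees."""
    return [
        len(model_json["oblivious_trees"][tree_index]["splits"])
        for tree_index in range(num_trees)
    ]


# Tree count taken from the training configuration vs. from the stored trees.
configured = loaded_cb_model["model_info"]["params"]["boosting_options"]["iterations"]
stored = len(loaded_cb_model["oblivious_trees"])

try:
    tree_depths(loaded_cb_model, configured)  # loops up to index 99
    failed = False
except IndexError:
    failed = True  # index 20 does not exist, so the lookup fails

# Using the number of stored trees keeps the loop within bounds.
depths = tree_depths(loaded_cb_model, stored)
```

Counting the stored trees instead of trusting the configured value is the same idea as the one-line change above, applied to toy data.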

@ibuda ibuda closed this as completed Aug 15, 2019
@ibuda ibuda changed the title from "TypeError: object of type 'NoneType' has no len()" to "IndexError: list index out of range" Aug 18, 2019
@ibuda
Author

ibuda commented Aug 18, 2019

Reopening issue for the pull request.

@ibuda ibuda reopened this Aug 18, 2019
@ruslanmustafin

Hi! Could this be merged?

@ibuda
Author

ibuda commented Nov 13, 2019

> Hi! Could this be merged?

It's in my pull request #749.

@ruslanmustafin

Yes, I already cloned your fix, thanks for that!

I was wondering whether it could be merged and included in a release

@Garve

Garve commented Nov 14, 2019

It doesn't work for me with shap 0.31.0 and CatBoost 0.18.

@ibuda
Author

ibuda commented Nov 15, 2019

@Garve check out the code in my pull request #749, or just git clone the repo from my account.

@rightx2
Contributor

rightx2 commented Dec 27, 2019

@ibuda This doesn't work either. I cloned your repo, checked out ef593f5, and installed it, but it still doesn't work. Here is the code to reproduce:

import shap
import catboost
import pandas as pd
from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle
 
iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target


x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass", 
)

model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

This time the error is not "list index out of range" but a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-58e69af00e9b> in <module>
     38 )
     39
---> 40 explainer = shap.TreeExplainer(model)

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
    100         self.feature_dependence = feature_dependence
    101         self.expected_value = None
--> 102         self.model = TreeEnsemble(model, self.data, self.data_missing)
    103
    104         assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    663             for i in range(ntrees):
    664                 l = len(self.trees[i].features)
--> 665                 self.children_left[i,:l] = self.trees[i].children_left
    666                 self.children_right[i,:l] = self.trees[i].children_right
    667                 self.children_default[i,:l] = self.trees[i].children_default

ValueError: could not broadcast input array from shape (383) into shape (255)

My catboost version: 0.20.2
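
The ValueError in the traceback above comes from assigning an array into a slice that is shorter than the array. The sketch below reproduces that class of error with NumPy alone and shows one way to avoid it by sizing the destination from the longest tree. The array sizes and the helper names `pack_fixed` and `pack_ragged` are made up for illustration; this is not the code used in shap.

```python
import numpy as np

# Hypothetical per-tree child arrays of different lengths (sizes are made up).
children_left_per_tree = [np.arange(7), np.arange(15), np.arange(3)]


def pack_fixed(arrays, width):
    """Copy each array into one row of a preallocated (n_trees, width) matrix."""
    out = np.full((len(arrays), width), -1, dtype=np.int64)
    for i, arr in enumerate(arrays):
        # If len(arr) > width, the slice is clipped to `width` elements and
        # NumPy cannot broadcast the longer array into it.
        out[i, :len(arr)] = arr
    return out


def pack_ragged(arrays):
    """Size the matrix from the longest array, padding shorter rows with -1."""
    width = max(len(arr) for arr in arrays)
    return pack_fixed(arrays, width)


try:
    pack_fixed(children_left_per_tree, width=7)  # the second tree has 15 nodes
    raised = False
except ValueError:
    raised = True

packed = pack_ragged(children_left_per_tree)
```

Here the fixed-width version fails as soon as one tree is larger than the preallocated row, while the padded version stores every tree and marks unused slots with -1.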

@ibuda
Author

ibuda commented Dec 27, 2019

Hi @rightx2, you're right, the error you're getting has nothing to do with the problem presented in this issue. However, I've seen something similar.
You can bypass it by getting the SHAP values directly from the CatBoost model. Some tweaking is needed along the way:

shap_values = model.get_feature_importance(train_set, type="ShapValues")
shap_values_transposed = shap_values.transpose(1, 0, 2)
shap.summary_plot(list(shap_values_transposed[:,:,:-1]))

Screenshot from 2019-12-27 15-11-55
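
To make the reshaping in the snippet above easier to follow, here is a small sketch using a dummy NumPy array in place of the real CatBoost output. It assumes the multiclass result has shape (n_objects, n_classes, n_features + 1), with the last column holding the expected value; the sizes below are invented.

```python
import numpy as np

n_objects, n_classes, n_features = 100, 3, 4

# Dummy stand-in for model.get_feature_importance(train_set, type="ShapValues"),
# assuming the layout (n_objects, n_classes, n_features + 1).
shap_values = np.zeros((n_objects, n_classes, n_features + 1))

# Move the class axis to the front: (n_classes, n_objects, n_features + 1).
shap_values_transposed = shap_values.transpose(1, 0, 2)

# Drop the last column and split into one (n_objects, n_features) array per class,
# which is the list-of-arrays form passed to shap.summary_plot above.
per_class = list(shap_values_transposed[:, :, :-1])

# Under the stated assumption, the dropped column holds one expected value per class.
expected_values = shap_values_transposed[:, 0, -1]
```

Each element of `per_class` then has one row per object and one column per feature.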

@slundberg
Collaborator

@ibuda merged! (sorry for the unreasonable delay, this issue was in a batch I missed following up on)

@ibuda
Author

ibuda commented Dec 28, 2019

@slundberg I hope I speak for the entire community: we can only imagine how busy your schedule is, and we would like to thank you for the great product you've given us! Happy New Year!

@yoavweg

yoavweg commented Jan 8, 2020

@slundberg Thanks for your response, but I still get this "list index out of range" error with catboost-0.20.2 and shap-0.34.0.
Is another update planned to address this?

@ibuda
Author

ibuda commented Jan 8, 2020

Hi @yoavweg. I mentioned this in #979. Apparently the merged fix did not make it into the current release but will be included in the next one.

@ibuda
Author

ibuda commented Feb 1, 2020

To my knowledge, this issue is fixed. Closing.

@ibuda ibuda closed this as completed Feb 1, 2020
@okunahe

okunahe commented Feb 11, 2020

I still have a problem with "IndexError: list index out of range" when running this line with shap-0.34.0:
explainer = shap.DeepExplainer(model, padded_docs_train)

Here is the full error message:

IndexError                                Traceback (most recent call last)
in ()
      1 import shap
      2
----> 3 explainer = shap.DeepExplainer(model, padded_docs_train)
      4
      5 num_explanations = 25

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/__init__.py in __init__(self, model, data, session, learning_phase_flags)
     78
     79         if framework == 'tensorflow':
---> 80             self.explainer = TFDeepExplainer(model, data, session, learning_phase_flags)
     81         elif framework == 'pytorch':
     82             self.explainer = PyTorchDeepExplainer(model, data)

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/deep_tf.py in __init__(self, model, data, session, learning_phase_flags)
     79         if str(type(model)).endswith("keras.engine.sequential.Sequential'>"):
     80             self.model_inputs = model.inputs
---> 81             self.model_output = model.layers[-1].output
     82         elif str(type(model)).endswith("keras.models.Sequential'>"):
     83             self.model_inputs = model.inputs

IndexError: list index out of range

Do you have any idea why I'm getting this error, or how I could solve it?

@ibuda
Author

ibuda commented Feb 11, 2020

The error we were getting was related to the CatBoost tree explainer; yours is related to the deep explainer. It seems that layers[-1] causes the error, i.e. there are no layers in your NN.
Could you provide minimal code to reproduce the error? Thank you.
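
As a quick illustration of that diagnosis, indexing an empty Python list with [-1] raises exactly this message. The `EmptyModel` class and `last_layer` function below are hypothetical stand-ins, not Keras or shap objects.

```python
class EmptyModel:
    """Hypothetical stand-in for a model object whose layers list is empty."""

    layers = []


def last_layer(model):
    """Return the final layer, as the model.layers[-1] lookup in the traceback does."""
    return model.layers[-1]


try:
    last_layer(EmptyModel())
    message = None
except IndexError as err:
    message = str(err)  # the same text as in the reported traceback
```

Checking `len(model.layers)` before creating the explainer is one way to confirm whether this is what happens in a given setup.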

@okunahe

okunahe commented Feb 11, 2020

Hi @ibuda. Thank you for your fast reply.
Here is a minimal example. Let me know if you need more details.

e = Embedding(vocab_size, 300, weights=[embedding_matrix], input_length=max_words, trainable=False)

# define model
model = Sequential()
model.add(e)
model.add(Bidirectional(LSTM(32, dropout=0.5)))
model.add(Dense(5, activation='sigmoid'))

# compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summarize the model
print(model.summary())

# fit the model
model.fit(padded_docs_train, y_train, epochs=10, verbose=0)

# evaluate the model
loss, accuracy = model.evaluate(padded_docs_test, y_test, verbose=0)
print('Accuracy: %f' % (accuracy*100))

@ibuda
Author

ibuda commented Feb 11, 2020

@okunahe I would suggest you open a new issue, since this one refers to a different framework.
When you do, please provide reproducible code. I could not reproduce the error with the code you posted above. I will try to help once you do that. Thank you.

@ArpitSisodia

I am still getting the 'list index out of range' error when using the SHAP tree explainer with a CatBoost model. Is it really fixed?

1349 self.leaf_child_cnt = []
1350 for i in range(self.num_trees):
-> 1351
1352 # load the per-tree params
1353 self.num_roots[i] = self.read('i')

IndexError: list index out of range

@ibuda
Author

ibuda commented Feb 17, 2020

@ArpitSisodia something doesn't match in your report: in the source code, line 1352 from your traceback belongs to class XGBTreeModelLoader(object), not to the CatBoost loader.

Please provide some code which we could run to reproduce the error you are getting.

@wagnerjorge

@ibuda thanks for your dedication. I hit the same error with shap 0.34.0 and catboost 0.21.

Minimum example:

import shap
import catboost
import pandas as pd

from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target

x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass",
)

model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

Error message:


IndexError                                Traceback (most recent call last)
C:/Users/Oncase/mpd/data_analysis/metricas_classificacao.py in
----> 1 explainer = shap.TreeExplainer(classifier1)
      2 #shap_values = explainer.shap_values(train1)
      3 #shap.summary_plot(shap_values, X1_train)
      4
      5 #classifier1.get_feature_importance(train1, type='ShapValues')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
    110         self.feature_perturbation = feature_perturbation
    111         self.expected_value = None
--> 112         self.model = TreeEnsemble(model, self.data, self.data_missing)
    113
    114         if feature_perturbation not in feature_perturbation_codes:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, data_missing)
    738             self.input_dtype = np.float32
    739             cb_loader = CatBoostTreeModelLoader(model)
--> 740             self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
    741             self.tree_output = "log_odds"
    742             self.objective = "binary_crossentropy"

~\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in get_trees(self, data, data_missing)
   1354
   1355             # load the per-tree params
-> 1356             depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
   1357
   1358             # load the nodes

IndexError: list index out of range

@slundberg
Collaborator

Thanks @ibuda and @wagnerjorge , using the example I was able to find and fix the issue :)

santanaangel added a commit to santanaangel/shap that referenced this issue Feb 21, 2020
* Correct 1 usage of lodash/filter

* Update fairness explanations notebook

* Update fairness explanations notebook

* Clean up fairness notebook

* Fix XGBoost 1.0 issue

* Fix shap#1061 for CatBoost cat features

* New unit test for CatBoost categorical features

* Allow ragged arrays sizes for Catboost shap#745

Co-authored-by: Scott Lundberg <slundberg@users.noreply.github.com>