IndexError: list index out of range #745

ibuda · 2019-08-14T11:09:54Z

I am running the following code:

import shap
from catboost import CatBoostClassifier
from catboost.datasets import amazon

train_df, _ = amazon()
ix = 100
X_train = train_df.drop('ACTION', axis=1)[:ix]
y_train = train_df.ACTION[:ix]
X_val = train_df.drop('ACTION', axis=1)[ix:ix+20]
y_val = train_df.ACTION[ix:ix+20]
model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
shap.TreeExplainer(model)

I get the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-6d52aef09dc8> in <module>
      8 model = CatBoostClassifier(iterations=100, learning_rate=0.5, random_seed=12)
      9 model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False, plot=False)
---> 10 shap.TreeExplainer(model)

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
     94         self.feature_dependence = feature_dependence
     95         self.expected_value = None
---> 96         self.model = TreeEnsemble(model, self.data, self.data_missing)
     97 
     98         assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    594             self.dtype = np.float32
    595             cb_loader = CatBoostTreeModelLoader(model)
--> 596             self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
    597             self.tree_output = "log_odds"
    598             self.objective = "binary_crossentropy"

~/prb/anaconda3/lib/python3.6/site-packages/shap/explainers/tree.py in get_trees(self, data, data_missing)
   1120 
   1121             # load the per-tree params
-> 1122             depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
   1123 
   1124             # load the nodes

IndexError: list index out of range

I first ran into this error with CatBoost 0.15.2. I upgraded to the latest version (0.16.4 as of today), but the error persists.
My SHAP version is 0.29.3.


ibuda · 2019-08-15T09:02:48Z

I managed to find a solution to this error. Apparently, num_trees no longer equals the number of iterations, i.e. this line:

self.num_trees = self.loaded_cb_model['model_info']['params']['boosting_options']['iterations']

causes the problem. For example, if you set iterations to 100 during training and the trained model ends up with a tree_count_ of 20 (for instance because use_best_model, which CatBoost enables by default when an eval_set is passed, shrinks the model to its best iteration), then the loop in get_trees tries to access the 21st tree in oblivious_trees, which does not exist.
Changing the line above to:
self.num_trees = len(self.loaded_cb_model['oblivious_trees'])
solved my issue.
I suppose this is not the "right" way to do it, but it works like a charm as a temporary fix.
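To illustrate the mismatch without CatBoost installed, here is a minimal sketch. The dictionary `fake_cb_model` is a hypothetical stand-in for the loaded model JSON: it only mimics the two keys used in the lines above. The helpers `count_trees` and `tree_depths` are illustrative and are not SHAP's code.

```python
# Hypothetical stand-in for the loaded CatBoost model JSON: training was
# configured for 100 iterations, but only 20 trees were kept in the model.
fake_cb_model = {
    "model_info": {"params": {"boosting_options": {"iterations": 100}}},
    "oblivious_trees": [{"splits": [0, 1, 2]} for _ in range(20)],
}


def count_trees(cb_model, use_iterations):
    """Return the number of trees the loader would iterate over."""
    if use_iterations:
        # Original behaviour: trust the configured number of iterations.
        return cb_model["model_info"]["params"]["boosting_options"]["iterations"]
    # Patched behaviour: count the trees actually stored in the model.
    return len(cb_model["oblivious_trees"])


def tree_depths(cb_model, use_iterations):
    """Mimic the per-tree loop in get_trees and collect each tree's depth."""
    num_trees = count_trees(cb_model, use_iterations)
    return [len(cb_model["oblivious_trees"][i]["splits"]) for i in range(num_trees)]


try:
    tree_depths(fake_cb_model, use_iterations=True)
except IndexError as err:
    print("iterations-based count fails:", err)  # list index out of range

print(len(tree_depths(fake_cb_model, use_iterations=False)))  # 20
```

On a real fitted model, the analogous comparison would be between `model.tree_count_` and the `iterations` value passed to CatBoostClassifier.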

ibuda · 2019-08-18T10:36:27Z

Reopening issue for the pull request.

ruslanmustafin · 2019-11-13T13:53:03Z

Hi! Could this be merged?

ibuda · 2019-11-13T14:42:41Z

> Hi! Could this be merged?

It's in my pull request #749.

ruslanmustafin · 2019-11-13T14:49:25Z

Yes, I already cloned your fix, thanks for that!

I was wondering whether it could be merged and included in a release.

Garve · 2019-11-14T22:01:27Z

Doesn't work for me. SHAP 0.31.0, CatBoost 0.18.

ibuda · 2019-11-15T12:08:54Z

@Garve check out the code in my pull request #749, or just git clone the repo from my account.

rightx2 · 2019-12-27T01:44:35Z

@ibuda This doesn't work either. I cloned your repo, checked out ef593f5, and installed it, but it still doesn't work. Here is code to reproduce it:

import pandas as pd
import shap
import catboost

from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target

x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass",
)

model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

The error is not an IndexError, though, but this one:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-58e69af00e9b> in <module>
     38 )
     39
---> 40 explainer = shap.TreeExplainer(model)

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_dependence)
    100         self.feature_dependence = feature_dependence
    101         self.expected_value = None
--> 102         self.model = TreeEnsemble(model, self.data, self.data_missing)
    103
    104         assert feature_dependence in feature_dependence_codes, "Invalid feature_dependence option!"

~/miniconda3/lib/python3.7/site-packages/shap-0.29.3.dev0-py3.7-macosx-10.7-x86_64.egg/shap/explainers/tree.py in __init__(self, model, data, data_missing)
    663             for i in range(ntrees):
    664                 l = len(self.trees[i].features)
--> 665                 self.children_left[i,:l] = self.trees[i].children_left
    666                 self.children_right[i,:l] = self.trees[i].children_right
    667                 self.children_default[i,:l] = self.trees[i].children_default

ValueError: could not broadcast input array from shape (383) into shape (255)

My catboost version: 0.20.2
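The broadcast error above means a row with 383 entries was copied into a buffer row with room for only 255. Here is a minimal sketch, on synthetic NumPy arrays, of the general remedy: size a padded 2-D buffer by the longest per-tree array instead of a fixed width. `pack_fixed` and `pack_ragged` are hypothetical helpers, and the -1 padding value is an arbitrary choice; this only illustrates the idea and is not the actual SHAP fix.

```python
import numpy as np

# Synthetic per-tree arrays of different lengths, standing in for trees
# with different node counts.
trees = [np.arange(255), np.arange(383), np.arange(127)]


def pack_fixed(arrays, width):
    """Copy each array into a fixed-width buffer (fails if one is too long)."""
    out = np.full((len(arrays), width), -1, dtype=np.int64)
    for i, arr in enumerate(arrays):
        out[i, : len(arr)] = arr  # raises ValueError when len(arr) > width
    return out


def pack_ragged(arrays, fill=-1):
    """Pad ragged arrays into one rectangular array sized by the longest."""
    width = max(len(arr) for arr in arrays)
    out = np.full((len(arrays), width), fill, dtype=np.int64)
    for i, arr in enumerate(arrays):
        out[i, : len(arr)] = arr
    return out


try:
    pack_fixed(trees, width=255)
except ValueError as err:
    print("fixed width fails:", err)

packed = pack_ragged(trees)
print(packed.shape)  # (3, 383)
```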

ibuda · 2019-12-27T13:14:42Z

Hi @rightx2, you're right: the error you're getting has nothing to do with the problem reported in this issue. However, I've seen something similar before.
You can bypass it by getting the SHAP values directly from the CatBoost model, with some tweaking along the way:

shap_values = model.get_feature_importance(train_set, type="ShapValues")
shap_values_transposed = shap_values.transpose(1, 0, 2)
shap.summary_plot(list(shap_values_transposed[:,:,:-1]))
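To see what the reshaping above does, here is a minimal sketch on a synthetic array (random numbers, not real SHAP values) with the shape CatBoost returns for `type="ShapValues"` on a multiclass model: (n_objects, n_classes, n_features + 1), where the last column holds each class's expected value. Only NumPy is needed; the sizes are arbitrary.

```python
import numpy as np

n_objects, n_classes, n_features = 6, 3, 4
rng = np.random.default_rng(0)

# Synthetic stand-in for model.get_feature_importance(pool, type="ShapValues")
# on a multiclass model: (n_objects, n_classes, n_features + 1).
shap_values = rng.normal(size=(n_objects, n_classes, n_features + 1))
# Make the last column constant per class, like a real expected value.
shap_values[:, :, -1] = [0.1, 0.2, 0.3]

# Put the class axis first: (n_classes, n_objects, n_features + 1).
per_class = shap_values.transpose(1, 0, 2)

# Drop the expected-value column and split into one matrix per class,
# which is the list form passed to shap.summary_plot above.
shap_list = list(per_class[:, :, :-1])
expected_values = per_class[:, 0, -1]  # one expected value per class

print(len(shap_list), shap_list[0].shape)  # 3 (6, 4)
```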

slundberg · 2019-12-27T18:08:26Z

@ibuda merged! (sorry for the unreasonable delay, this issue was in a batch I missed following up on)

ibuda · 2019-12-28T08:05:25Z

@slundberg I hope I speak for the entire community: we can only imagine how busy your schedule is, and we would like to thank you for the great product you've given us! Happy New Year!

yoavweg · 2020-01-08T09:09:08Z

@slundberg Thanks for your response, but I still get this "list index out of range" error with catboost 0.20.2 and shap 0.34.0.
Are there any plans for another update to solve this?

ibuda · 2020-01-08T09:13:44Z

Hi @yoavweg. I mentioned this in #979. Apparently this merge did not make it into the current release, but it will be included in the next one.

ibuda · 2020-02-01T07:16:40Z

To my knowledge, this issue is fixed, so I'm closing it.

okunahe · 2020-02-11T13:34:46Z

I still get "IndexError: list index out of range" when running this line with shap 0.34.0:
explainer = shap.DeepExplainer(model, padded_docs_train)

Here is the full error message:

IndexError                                Traceback (most recent call last)
in ()
      1 import shap
      2 
----> 3 explainer = shap.DeepExplainer(model, padded_docs_train)
      4 
      5 num_explanations = 25

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/__init__.py in __init__(self, model, data, session, learning_phase_flags)
     78 
     79         if framework == 'tensorflow':
---> 80             self.explainer = TFDeepExplainer(model, data, session, learning_phase_flags)
     81         elif framework == 'pytorch':
     82             self.explainer = PyTorchDeepExplainer(model, data)

/home/olya/env/lib64/python3.7/site-packages/shap/explainers/deep/deep_tf.py in __init__(self, model, data, session, learning_phase_flags)
     79         if str(type(model)).endswith("keras.engine.sequential.Sequential'>"):
     80             self.model_inputs = model.inputs
---> 81             self.model_output = model.layers[-1].output
     82         elif str(type(model)).endswith("keras.models.Sequential'>"):
     83             self.model_inputs = model.inputs

IndexError: list index out of range

Do you have any idea why I'm getting this error, or how I could solve it?

ibuda · 2020-02-11T13:42:19Z

The error we were getting was in the CatBoost tree explainer; yours is in the deep explainer. It seems that layers[-1] is what fails, i.e. the list of layers in your network is empty.
Could you please provide minimal code that reproduces the error? Thank you.
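For illustration, here is a minimal sketch of why `model.layers[-1]` fails that way: indexing an empty list with -1 raises exactly this IndexError. `FakeModel`, `FakeLayer`, and the guard in `last_layer_output` are hypothetical stand-ins, not part of Keras or SHAP.

```python
class FakeLayer:
    """Hypothetical stand-in for a layer that exposes an `output` attribute."""

    def __init__(self, output):
        self.output = output


class FakeModel:
    """Hypothetical stand-in for a model whose `layers` list may be empty."""

    def __init__(self, layers):
        self.layers = layers


def last_layer_output(model):
    """Return the last layer's output, failing with a clearer message if empty."""
    if not model.layers:
        raise ValueError("model has no layers, so there is no last layer output")
    return model.layers[-1].output


empty = FakeModel([])
try:
    empty.layers[-1].output  # same expression shape as the failing SHAP line
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range

print(last_layer_output(FakeModel([FakeLayer("a"), FakeLayer("b")])))  # b
```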

okunahe · 2020-02-11T15:30:43Z

Hi @ibuda. Thank you for your fast reply.
Here is a minimal example. Let me know if you need more details.

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

e = Embedding(vocab_size, 300, weights=[embedding_matrix], input_length=max_words, trainable=False)

# define model
model = Sequential()
model.add(e)
model.add(Bidirectional(LSTM(32, dropout=0.5)))
model.add(Dense(5, activation='sigmoid'))

# compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# summarize the model (summary() prints by itself)
model.summary()

# fit the model
model.fit(padded_docs_train, y_train, epochs=10, verbose=0)

# evaluate the model
loss, accuracy = model.evaluate(padded_docs_test, y_test, verbose=0)
print('Accuracy: %f' % (accuracy*100))

ibuda · 2020-02-11T16:36:01Z

@okunahe I would suggest opening a new issue, since this one refers to a different framework.
Also, when you do, please provide reproducible code: I could not reproduce the error with the code above. I will try to help you once you do. Thank you.

ArpitSisodia · 2020-02-17T09:37:19Z

I am still getting the 'list index out of range' error when using the SHAP TreeExplainer on a CatBoost model. Is it really fixed?

   1349         self.leaf_child_cnt = []
   1350         for i in range(self.num_trees):
-> 1351 
   1352             # load the per-tree params
   1353             self.num_roots[i] = self.read('i')

IndexError: list index out of range

ibuda · 2020-02-17T09:43:52Z

@ArpitSisodia something seems off with the error you are reporting: judging by the source code, line 1352 in your traceback belongs to class XGBTreeModelLoader(object), not to the CatBoost loader.

Please provide some code which we could run to reproduce the error you are getting.

wagnerjorge · 2020-02-21T13:11:32Z

@ibuda thanks for your dedication. I get the same error with shap 0.34.0 and catboost 0.21.

Minimal example:

import shap
import catboost
import pandas as pd

from catboost import Pool
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris_dataset = load_iris()
x = pd.DataFrame(
    iris_dataset.data,
    columns=iris_dataset.feature_names
)
y = iris_dataset.target

x, y = shuffle(x, y)

train_set = Pool(
    data=x.iloc[:100],
    label=y[:100],
)
valid_set = Pool(
    data=x.iloc[100:100+50],
    label=y[100:100+50],
)

model = catboost.CatBoostClassifier(
    iterations=1000,
    eval_metric="MultiClass",
)

model = model.fit(
    train_set,
    eval_set=[valid_set],
    verbose=True,
    early_stopping_rounds=5,
    use_best_model=False,
)

explainer = shap.TreeExplainer(model)

Error message:

IndexError                                Traceback (most recent call last)
C:/Users/Oncase/mpd/data_analysis/metricas_classificacao.py in
----> 1 explainer = shap.TreeExplainer(classifier1)
      2 #shap_values = explainer.shap_values(train1)
      3 #shap.summary_plot(shap_values, X1_train)
      4 
      5 #classifier1.get_feature_importance(train1, type='ShapValues')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
    110         self.feature_perturbation = feature_perturbation
    111         self.expected_value = None
--> 112         self.model = TreeEnsemble(model, self.data, self.data_missing)
    113 
    114         if feature_perturbation not in feature_perturbation_codes:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in __init__(self, model, data, data_missing)
    738             self.input_dtype = np.float32
    739             cb_loader = CatBoostTreeModelLoader(model)
--> 740             self.trees = cb_loader.get_trees(data=data, data_missing=data_missing)
    741             self.tree_output = "log_odds"
    742             self.objective = "binary_crossentropy"

~\AppData\Local\Continuum\anaconda3\lib\site-packages\shap\explainers\tree.py in get_trees(self, data, data_missing)
   1354 
   1355             # load the per-tree params
-> 1356             depth = len(self.loaded_cb_model['oblivious_trees'][tree_index]['splits'])
   1357 
   1358             # load the nodes

IndexError: list index out of range

slundberg · 2020-02-21T18:48:30Z

Thanks @ibuda and @wagnerjorge , using the example I was able to find and fix the issue :)

* Correct 1 usage of lodash/filter
* Update fairness explanations notebook
* Update fairness explanations notebook
* Clean up fairness notebook
* Fix XGBoost 1.0 issue
* Fix shap#1061 for CatBoost cat features
* New unit test for CatBoost categorical features
* Allow ragged arrays sizes for Catboost shap#745

Co-authored-by: Scott Lundberg <slundberg@users.noreply.github.com>

ibuda closed this as completed Aug 15, 2019

ibuda mentioned this issue Aug 15, 2019 in "Fix for CatBoostClassifier TreeExplainer with evaluation and early stopping #749" (merged)

ibuda changed the title from "TypeError: object of type 'NoneType' has no len()" to "IndexError: list index out of range" Aug 18, 2019

ibuda reopened this Aug 18, 2019

ibuda mentioned this issue Jan 2, 2020 in "IndexError: list index out of range #979" (closed)

ibuda closed this as completed Feb 1, 2020

slundberg added a commit that referenced this issue Feb 21, 2020: "Allow ragged arrays sizes for Catboost #745" (66b1f98)
