BUG: Shap value cannot be calculated according to NLP tutorial on the original website #3522

Cnemoc · 2024-02-26T10:42:00Z

Issue Description

I tried to follow the tutorial, but shap_value.shape is (10, None, 4). Perhaps because of this, I get the error: "setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10,) + inhomogeneous part."
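The None in the reported shape is consistent with every sample having a different number of tokens. Here is a minimal sketch of how such a shape can be derived from nested lists. ragged_shape is a hypothetical helper written for illustration only, not shap's own implementation, and the toy values are made up.

```python
def ragged_shape(nested):
    """Return a shape tuple for nested lists, using None for axes whose length varies."""
    if not isinstance(nested, (list, tuple)):
        return ()  # a scalar contributes no further axes
    if len(nested) == 0:
        return (0,)
    child_shapes = [ragged_shape(item) for item in nested]
    depth = len(child_shapes[0])
    if any(len(shape) != depth for shape in child_shapes):
        # Children disagree on nesting depth: report only this axis.
        return (len(nested),)
    merged = tuple(
        dims[0] if all(d == dims[0] for d in dims) else None
        for dims in zip(*child_shapes)
    )
    return (len(nested),) + merged


# Toy data: two samples with 5 and 3 tokens, 2 class scores per token.
sample_a = [[0.0, 0.0] for _ in range(5)]
sample_b = [[0.0, 0.0] for _ in range(3)]
shape = ragged_shape([sample_a, sample_b])
print(shape)
```

The token axis comes out as None because 5 != 3, while the sample axis and the class axis stay fixed.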

Minimal Reproducible Example

import datasets
import numpy as np
import transformers

import shap
dataset = datasets.load_dataset("imdb", split="test")

# shorten the strings to fit into the pipeline model
short_data = [v[:500] for v in dataset["text"][:20]]
classifier = transformers.pipeline("sentiment-analysis", return_all_scores=True)
classifier(short_data[:2])
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=True)
explainer3 = shap.Explainer(pmodel, classifier.tokenizer)
shap_values3 = explainer3(short_data[:10])
shap.plots.text(shap_values3[:, :, 1])  # ===> error 1
shap.plots.bar(shap_values3[:, :, 1].mean(0))  # ===> error 2

Traceback

#===>error 1
ValueError                                Traceback (most recent call last)
<ipython-input-145-b849a902a16a> in <cell line: 1>()
----> 1 shap.plots.text(shap_values3[:, :, 1])

4 frames
/usr/local/lib/python3.10/dist-packages/slicer/slicer_internal.py in tail_slice(cls, o, tail_index, max_dim, flatten)
    441                 import numpy
    442 
--> 443                 return numpy.array(inner)
    444             elif _safe_isinstance(o, "torch", "Tensor"):
    445                 import torch

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

#===>error 2
ValueError                                Traceback (most recent call last)
<ipython-input-146-2f2b208f9c91> in <cell line: 1>()
----> 1 shap.plots.bar(shap_values3[:, :, 1].mean(0))

4 frames
/usr/local/lib/python3.10/dist-packages/slicer/slicer_internal.py in tail_slice(cls, o, tail_index, max_dim, flatten)
    441                 import numpy
    442 
--> 443                 return numpy.array(inner)
    444             elif _safe_isinstance(o, "torch", "Tensor"):
    445                 import torch

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Expected Behavior

No response

Bug report checklist

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest release of shap.
I have confirmed this bug exists on the master branch of shap.
I'd be interested in making a PR to fix this bug

Installed Versions

version: (0.44.1)

connortann · 2024-02-26T11:24:00Z

Possibly related:

Cnemoc · 2024-02-26T11:37:12Z

The same happens with the emotion classification tutorial. :(


import datasets
import pandas as pd
import transformers
import shap

dataset = datasets.load_dataset("emotion", split="train")
data = pd.DataFrame({"text": dataset["text"], "emotion": dataset["label"]})

# load the model and tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "nateraw/bert-base-uncased-emotion", use_fast=True
)
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "nateraw/bert-base-uncased-emotion"
).cuda()

# build a pipeline object to do predictions
pred = transformers.pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0,
    return_all_scores=True,
)
explainer = shap.Explainer(pred)
shap_values = explainer(data["text"][:3])
print(shap_values.shape)
shap.plots.text(shap_values[:, :, "anger"])
# gives the same error:
"""ValueError                                Traceback (most recent call last)
<ipython-input-7-1dae8a2e310b> in <cell line: 1>()
----> 1 shap.plots.text(shap_values[:, :, "anger"])

4 frames
/usr/local/lib/python3.10/dist-packages/slicer/slicer_internal.py in tail_slice(cls, o, tail_index, max_dim, flatten)
    441                 import numpy
    442 
--> 443                 return numpy.array(inner)
    444             elif _safe_isinstance(o, "torch", "Tensor"):
    445                 import torch

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part."""

Edit: I checked not only 0.44.1 but also 0.42.0. Unfortunately, both fail in the same way. Can you tell me which version is suitable for getting SHAP values, or which shap version the tutorial was prepared with?

Cnemoc · 2024-02-26T18:54:35Z

@connortann hi, could you give me some information on this: "Can you tell me which version is suitable for getting SHAP values, or which shap version the tutorial was prepared with?"

connortann · 2024-02-26T19:10:18Z

I'm afraid I'm not the original author. Unfortunately it looks like this tutorial has been broken for a while, and from a quick look I don't see an obvious fix.

The maintainers are working through updating everything in the docs that is broken (tracked in #3036), but there is lots more that remains to be done. Any PRs for debugging or fixing this tutorial notebook would be welcome.

Cnemoc · 2024-02-26T19:33:42Z

@connortann Thank you for responding. What should I understand from this? Does shap.Explainer work and the tutorial is wrong, or is shap.Explainer wrong and there is actually no problem in the tutorial?

CloseChoice · 2024-02-27T06:29:44Z

The closest thing to this is our notebooks CI job, which is already activated for a couple of notebooks. All notebooks that are NOT in the lists allow_to_fail and allow_to_timeout (see here) run with every CI job we trigger.

Cnemoc · 2024-02-27T09:53:48Z

Okay, I found some insight into the problem. I found this notebook link. This notebook uses your tutorial with version 0.41.0. I don't know what happened after that, but shap broke, and the library has not worked since 0.42.0. So the problem is not the tutorial; it is the shap version. Otherwise the code in the notebook would not have worked. However, if we want to use shap 0.41, numpy raises errors, because numpy versions after 1.20 deprecated some features. @CloseChoice, @connortann

CloseChoice · 2024-02-27T10:00:31Z

@connortann Do we backport fixes to previous shap versions?
@Cnemoc In my opinion we should get the notebooks to work on the latest shap version. Or do you have a specific need to run them on an old version?

connortann · 2024-02-27T10:15:13Z

Do we backport fixes to previous shap versions?

No - I don't think we have the bandwidth for that at the moment. That could change if the library attracts a larger group of maintainers though.

Cnemoc · 2024-02-27T10:28:42Z

@CloseChoice Dear CloseChoice, unfortunately, the reason it doesn't work in later versions seems to be in the source code. This is the question I have been asking from the beginning: is the problem in the tutorial or in the library? I've been trying for a while and there doesn't seem to be a way. If there is a way and I just haven't found it, let me know. It is hard to discover something without a compass, but it is worth the effort if there is a fixed point to navigate by.

Since there doesn't seem to be such a fixed point, the easiest thing to do now is to pin numpy and the related packages to the old versions, because this bug removes the usability of all NLP work, not just part of the code. It would be one thing if, for example, a summary could not be drawn but other explanations still worked; the problems here indicate that the NLP part of shap is unusable. This means major restrictions on the accessibility and use of the library for an indefinite period of time.

If there is no problem in the current code, it is more important to fix it, but if the problem is in the current source code, I think the NLP part of the library will be non-functional for a very long time, especially considering the date of issue #2634. This is my suggestion for keeping the library usable. Thank you for everything. I hope it will be resolved as soon as possible.

connortann · 2024-02-27T11:11:23Z

@Cnemoc we are all agreed that we want to ensure everything in the package and the tutorials works without bugs. As a very small team who have only rather recently joined the shap project, we're enthusiastic about fixing things up, but given the size of the issue tracker it is going to take time to work through the issues that accumulated when this package had no active maintainers.

Let's focus on debugging the issue above and getting the tutorial working again. Pull Requests are welcome as always.

Cnemoc · 2024-02-27T16:21:04Z

@CloseChoice @connortann I solved the problem, or at least found the best solution that doesn't take too much time :) The current version runs against a different numpy. However, if you change numpy like this ==> !pip3 install mxnet-mkl==1.6.0 numpy==1.23.1 everything is fine!! You could make this change in the source code so that this numpy version is used only when the user wants a text explanation; in that case shap and slicer will be compatible and can be used without any problems. Additionally, if the user goes this route, there will be no warnings etc. when producing the chart. So the main problem is an incompatibility between slicer and numpy: either slicer needs to be updated or numpy needs to be downgraded. Thank you for your effort and patience.
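A quick way to see whether an environment is on the affected side of this incompatibility is to look at the installed versions. The sketch below assumes the relevant cut-off is NumPy 1.24, the release in which creating a ragged array without dtype=object became a ValueError instead of a deprecation warning; that would be consistent with numpy==1.23.1 working above. The helper names are hypothetical and only the standard library is used.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.23.1' or '2.0.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def ragged_creation_is_error(numpy_version):
    """True for NumPy >= 1.24, where ragged array creation without dtype=object raises."""
    return parse_major_minor(numpy_version) >= (1, 24)


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("numpy", "slicer", "shap"):
    print(name, installed_version(name))

numpy_version = installed_version("numpy")
if numpy_version is not None and ragged_creation_is_error(numpy_version):
    print("NumPy >= 1.24: ragged arrays need dtype=object, so slicing may fail")
```

This only reports versions; it does not change the environment or patch slicer.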

linusnetze · 2024-03-05T13:48:04Z

The issue is fixed by this slicer PR: interpretml/slicer#3. Unfortunately it has not been merged yet. But manually applying this fix might also be an option for those not able to downgrade numpy.

connortann · 2024-03-05T17:38:52Z

@linusnetze thanks for the pointer. It would be great if we can get someone in the interpretml org to take a look at that library, and hopefully include that PR.

Do you happen to have a minimal reproducible example that we could use as a unit test for this case?

EDIT: Shortening the strings from 500 to 5 gets the example down to something that runs in a few seconds, so we can use that.

import datasets
import transformers

import shap
dataset = datasets.load_dataset("imdb", split="test")

# shorten the strings to fit into the pipeline model
short_data = [v[:5] for v in dataset["text"][:10]]

classifier = transformers.pipeline("sentiment-analysis", return_all_scores=True)
classifier(short_data[:2])
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=True)
explainer = shap.Explainer(pmodel, classifier.tokenizer)
explanation = explainer(short_data[:2])
explanation[:, 0]

connortann · 2024-03-07T14:08:39Z

Glad to say interpretml/slicer#3 has been merged. I've made a couple of further PRs to slicer to fix a scipy.sparse issue, and hopefully we'll have a release soon. We can then update our pinned version of slicer, which should hopefully fix this issue.

CloseChoice · 2024-03-08T17:40:50Z

Hmm, the problem seems to me to be the shape:

shap_values3.shape
# Output: (10, None, 2)

How confident are we that this is fixed in the slicer PR? @linusnetze do you have an example of how we can use interpretml/slicer to use __getitem__ correctly?

I found pretty weird shapes here:

(Pdb++) shap_values3.values.shape
(10,)
(Pdb++) shap_values3.values
array([array([[ 0.        ,  0.        ],
              [-1.53847909,  1.53847046],
              [-2.37939414,  2.37939718],
              [-2.34919546,  2.34919846],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-6.28528881,  6.28555003],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-0.9052118 ,  0.90521145],
              [-1.53625   ,  1.53625011],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-5.59007929,  5.58981216],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-3.4587516 ,  3.45875463],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-2.86465472,  2.86465945],
              [-1.13306585,  1.13307001],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-1.77842854,  1.77842877],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-4.93982813,  4.93982157],
              [-0.32781138,  0.32780293],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-1.7631128 ,  1.76311369],
              [-1.23541839,  1.23541925],
              [ 0.        ,  0.        ]]),
       array([[ 0.        ,  0.        ],
              [-4.43581163,  4.43581649],
              [-1.28670001,  1.28669448],
              [ 0.        ,  0.        ]])], dtype=object)
(Pdb++) shap_values3.shape
(10, None, 2)

I guess the root problem is that we have arrays within arrays and the explainer class can't handle that.
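The "arrays within arrays" situation can be reproduced with NumPy alone. The sketch below uses made-up toy arrays instead of real SHAP values and two hypothetical helpers, to_object_array and select_class. It assumes the failing step is equivalent to the numpy.array(inner) call shown in the traceback; it is not the slicer fix itself.

```python
import numpy as np

# Two "samples" with different token counts (5 and 3) and 2 output classes,
# mimicking the per-sample arrays in the pdb output above.
per_sample = [np.zeros((5, 2)), np.ones((3, 2))]


def to_object_array(items):
    """Pack arrays of differing shapes into a 1-D object array, one element at a time."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


def select_class(values, class_index):
    """Take one output column from every sample without building a dense array."""
    return to_object_array([v[:, class_index] for v in values])


values = to_object_array(per_sample)
class_1 = select_class(values, 1)
print(class_1.shape, [v.shape for v in class_1])

# Building a plain array from the ragged pieces raises a ValueError on
# NumPy >= 1.24; older releases only emit a deprecation warning.
try:
    np.array([v[:, 1] for v in values])
    print("dense construction succeeded (older NumPy)")
except ValueError as err:
    print("dense construction failed:", err)
```

Filling the object array element by element avoids asking NumPy to infer a common shape, which is the step that fails when the token counts differ.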

CloseChoice · 2024-03-09T02:19:25Z

Just checked with the slicer master branch and this seems to throw the same error.

Edit: This is wrong. Seems to work with master. We can wait for a release or install the latest commit of slicer directly.

Cnemoc added the bug Indicates an unexpected problem or unintended behaviour label Feb 26, 2024

connortann added the documentation Relating to readthedocs, notebooks, and exposition in docstrings label Feb 26, 2024

connortann added the help wanted Indicates that a maintainer wants help on an issue or pull request label Feb 26, 2024

Cnemoc changed the title ~~BUG: Tutorial of Positive vs. Negative Sentiment Classification in original web site do not work~~ BUG: Shap value cannot be calculated according to NLP tutorial on the original website Feb 26, 2024

CloseChoice mentioned this issue Mar 9, 2024

Update slicer pin to 0.0.8 and add test #3560

Merged

2 tasks

connortann closed this as completed in #3560 Apr 5, 2024
