How to interpret the SHAP force plot? #977

Open
SSMK-wq opened this issue Jan 2, 2020 · 15 comments


SSMK-wq commented Jan 2, 2020

Hello everyone,

I am trying to practice and learn the Shapley value approach to explain my predictions on a binary classification problem. However, I am having difficulty understanding the plot below.

[screenshot: SHAP force plot for a single sample]

  1. Does it indicate that day_2_balance pushes the prediction toward 1? Or do the blue values lead to a prediction of 1?

  2. What about the axis scale (-4.357 to 5.643)? How is it obtained?

  3. What does the base value mean?

  4. When I hover over the pink region, I see a few more column names with values. What do they indicate?

  5. Does the size of a feature's band represent its importance? That is, PEEP_min=5 has a larger band than the other features?

  6. What do the "higher" and "lower" labels indicate?

  7. Why is -2.92 alone in bold? If it is the predicted value, how can that be, given that I am working on a binary classification problem with labels 1 and 0?

Can someone help me with this?


ibuda commented Jan 2, 2020

Hi, I will try to respond in the order of the questions asked.

  1. As a matter of fact, all of the feature values (day_2_balance, PEEP_min, Fi02_100_max, etc.) lead to the prediction value of -2.92, which is then "transformed" to a value of 1. By "all" I mean even those that are not shown in the plot; SHAP displays only the most influential features for the sample under study. Features in red push the prediction value closer to 1; features in blue do the opposite.
  2. As you may already have understood, the model prediction values are not the discrete 0 and 1 but real (float) numbers: raw values. The scale here is a visualization of a small interval around the output and base values.
  3. The base value is the average of the model's output values over the training set.
  4. The pink (red) features in your example are many, each with a small (low-importance) value. The plot stacks them all together and shows their values on hover. The values you see are the raw values I mentioned above; they represent how much each of those features influences the final output of the model for the sample under study.
  5. Correct. In the case of PEEP_min, it has a negative importance, i.e. the prediction tends more toward 0 because of its value.
  6. "Higher" in pink (red) means that the pink values drag the prediction toward 1 (i.e. increase the raw output value), whereas the blue ones drag it toward 0 (i.e. decrease the model output value).
  7. You are right, -2.92 is the model output for your sample with index 4100. This is the "raw" value, which is then transformed into probability space to give you the final output of 0 or 1 (< 0.5 or > 0.5); see the sketch after this list.
    If you need more details on any of the above questions, or additional ones arise, please consult the tutorial notebooks; they are well explained and illustrate the entire span of use cases of this incredible package.
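To make the arithmetic in points 2, 3, and 7 concrete, here is a minimal sketch. The dataset and model are stand-ins (the adult census data that ships with shap plus an XGBoost classifier), not your pipeline; the additivity check at the end is exactly what places the bold number on the scale:

import shap
import xgboost

# Stand-in data and model; any tree-based binary classifier behaves the same way.
X, y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

i = 0  # sample to explain
# Additivity: base value + sum of this sample's SHAP values
# equals the model's raw (log-odds) output for that sample.
raw = explainer.expected_value + shap_values[i].sum()
print(raw, model.predict(X.iloc[[i]], output_margin=True))

# Single-sample force plot, analogous to the one discussed above.
shap.force_plot(explainer.expected_value, shap_values[i], X.iloc[i], matplotlib=True)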


SSMK-wq commented Jan 2, 2020

Hi @ibuda, just a quick question regarding point 1 of your response. You wrote "the prediction value of -2.92, which is then 'transformed' to a value of 1" - should this be "transformed to a value of 0"?

Because -2.92 is less than 0, the final predicted output for this observation (4100) should be class 0. Am I right?


ibuda commented Jan 2, 2020

> Hi @ibuda, just a quick question regarding point 1 of your response. You wrote "the prediction value of -2.92, which is then 'transformed' to a value of 1" - should this be "transformed to a value of 0"?
>
> Because -2.92 is less than 0, the final predicted output for this observation (4100) should be class 0. Am I right?

In general, yes and no. Even though it is lower than the base value of 0.6427, it might be 1 in some cases; I have seen this happen when the training dataset is highly imbalanced.
Coming back to your case, I think you are right: it is 0. I said it "transforms" to 1 (i.e. some probability higher than 0.5) because you mentioned "influences prediction to 1?" in your question. So I guess it was my typo from not paying attention to the context. :)
Let me know if you have any other questions.


vinrok commented Dec 21, 2020

Hello All,

I have just started learning about explainable AI and was implementing SHAP for it. But I am facing difficulty interpreting the results of the SHAP force_plot. I would be grateful if someone could help me with an intuitive way of interpreting it. 🙂 🙏

Consider the force_plot below for about 43 test samples of heart_data.

[screenshot: stacked SHAP force plot for 43 test samples]

  1. Does the numbering on the top x-axis represent samples in the dataset?
  2. How do I actually interpret the force_plot result, i.e. which feature contributes more to predicting whether a patient has heart disease or not?

Here is the force_plot for the 10th sample individually, and how we can relate it to the plot above:

[screenshot: SHAP force plot for the 10th sample]

Here is the link to my notebook - Explainable AI using SHAP


ibuda commented Dec 21, 2020

Hi @vinrok, great to hear that you are using shap. The answers to your questions are:

  1. Yes, the upper numbers on the x-axis are the indices of the sample data.
  2. The plot above is just a summary of 43 horizontally stacked individual plots, each rotated vertically. If you rotate one back (or, vice versa, rotate your second plot), you will see that they coincide for the 10th sample. You need to use summary_plot to see which features contribute more or less, and in what way, to the classification of your patients (see the sketch at the end of this comment).

I looked at your notebook; you did use summary_plot. There you can see that cp and thal are the top two features influencing the output of your model.

For force_plot I would add only the following: there is a dropdown in the upper-middle region of the graph. Use it to see what the plot shows for cp, the most important feature, and chol, the least important one. For cp the width of its band is large, whereas for chol it is not.

I also suggest you play with the left-side dropdown as well, and try using it in combination with the dropdown mentioned above.

Hope that answers your questions. If not, do let us know what other questions arise.
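For reference, a minimal sketch of both plot types. The data and model are stand-ins (scikit-learn's breast cancer dataset in place of the heart data, plus an XGBoost classifier), not the notebook's exact setup:

import shap
import xgboost
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in for the heart data: another binary medical classification dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgboost.XGBClassifier().fit(X_train, y_train)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Stacked force plot: each column is one sample's force plot rotated 90 degrees.
# (In a notebook, call shap.initjs() first so the interactive plot renders.)
shap.force_plot(explainer.expected_value, shap_values, X_test)

# Global summary: which features push predictions up or down across all samples.
shap.summary_plot(shap_values, X_test)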


vinrok commented Dec 21, 2020

Thank you so much for this intuitive explanation @ibuda 😊🙏.


vinrok commented Dec 21, 2020

But @ibuda, I am still confused about the pink and blue bands. Consider "sample order by similarity" and f(x). Can we say that for this plot we are putting the patients with the most similar features together?

In that case, how do I interpret the prediction in terms of pink and blue, and by hovering over it?

For the samples in range 8-17 from the test dataset, most of them have the label 1 (heart disease), but in this plot the SHAP value goes below the base value.


ibuda commented Dec 21, 2020

I think you are confusing predictions with y_test. SHAP gives you information about what your model predicted, not what the real value is supposed to be.

That is why you get that discrepancy.

Anyway, the blue bands show which features drag the final output value down (toward class 0) and by how much, and the pink bands show those that increase it (toward class 1).

Try to look at your notebook with this in mind, and let me know if that helps.
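Continuing the sketch above (same hypothetical explainer, shap_values, and y_test), a quick way to see the distinction in code: reconstruct the model's raw outputs from the SHAP decomposition, threshold them into class labels, and compare against y_test; the two need not agree.

import numpy as np
from scipy.special import expit  # logistic sigmoid

# Raw output per sample, reconstructed from the SHAP decomposition:
# base value + row-wise sum of SHAP values.
raw = explainer.expected_value + shap_values.sum(axis=1)

# The force plot displays `raw`; predicted labels come from thresholding it.
pred = (expit(raw) > 0.5).astype(int)
print("predicted:", pred[8:17])
print("actual:   ", np.asarray(y_test)[8:17])  # may disagree with predictions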


vinrok commented Dec 21, 2020

Got it, @ibuda. The idea is somewhat clearer to me now. 😊


ibuda commented Dec 21, 2020

Hi @SSMK-wq, for the sake of consistency, if your questions were answered, please consider closing this issue. Thank you.

@herewego321

> You are right, -2.92 is the model output for your sample with index 4100. This is the "raw" value, which is then transformed into probability space to give you the final output of 0 or 1 (< 0.5 or > 0.5).

Hi, you mentioned that -2.92 is the raw output and that it can be transformed into probability space. But how can I do such a transformation? I have run into the same question. My prediction model is LightGBM and I am using shap.TreeExplainer(). Maybe you could give me a hint on where I can find the transformation equation?


hjanh commented Jan 18, 2022

> But how can I do such a transformation? My prediction model is LightGBM and I am using shap.TreeExplainer().

These values are log-odds because this example is a (binary) classification task. You can easily convert them back:

import math

def logodds_to_prob(logit):
    # Invert the log-odds: probability = odds / (1 + odds).
    odds = math.exp(logit)
    return odds / (1 + odds)
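Equivalently, scipy ships this function as scipy.special.expit. Applying it to the number from this thread, -2.92 maps to roughly 0.05, well below the 0.5 threshold:

from scipy.special import expit

print(logodds_to_prob(-2.92))  # ~0.051
print(expit(-2.92))            # same value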

@prashanthin

How to interpret the SHAP force plot below?
Hello everyone,

I am trying to plot a force plot with all points in my data, but I am having difficulty interpreting and understanding the plot below. Here is the code line: shap.force_plot(explainer.expected_value, shap_values, X_test)

In my case, Demand value is my dependent variable and Adobe Visits is my independent variable.

What does the dropdown on the x-axis mean?
What does the dropdown on the y-axis mean?
What is meant by "effects" on the y-axis?
What does it mean when I select "Adobe Visits effects" in the y-axis dropdown?
What does the "average sample" that pops up when hovering over a region of the graph mean?
What does the f(x) in the y-axis dropdown mean? Does it mean a function of all my independent variables?
How do I interpret from this graph how effective Adobe Visits is in driving Demand?
How are the interaction/cross effects of the other variables calculated when we change variables in the dropdown (see the second image)?

[screenshot: interactive SHAP force plot over all samples]

[screenshot: interactive SHAP force plot with a variable selected in the dropdown]

Can someone please help with this?
Thanks.

@thomaschateau


Hello @prashanthin,

Did you find any answer?

Regards

@MarioIuliano87

> These values are log-odds ... you can easily convert them back (see logodds_to_prob above).

This answer was the game changer in my case. Thanks a lot!
