Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaced Boston dataset with California in Catboost tutorial #3214

Merged

Conversation

owenlamont
Copy link
Contributor

Overview

Description of the changes proposed in this pull request:

This is another step towards completing issue 2322.

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

@thatlittleboy
Copy link
Collaborator

Thanks @owenlamont for the contribution.

It's a little hard to review notebooks on GH, I'll use images to get my point across:

image

  • red box: Typo, should be AveRooms.
  • pink box: The coloring is by MedInc, not RAD.

image

  • there isn't an LSTAT feature in California, you can change it to MedInc.

Copy link
Collaborator

@thatlittleboy thatlittleboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above.

@owenlamont
Copy link
Contributor Author

Cheers, yeah I should've been aware the features would be different, I'll get onto those updates.

@owenlamont
Copy link
Contributor Author

I've tweaked the text, I'm struggling a little to give a reasonable interpretation of the dependence plot description, would welcome feedback on that (maybe there's a better feature I should have chosen for the dependence plot other than AveRooms that would have given an easier to interpret plot).

image

@thatlittleboy
Copy link
Collaborator

@owenlamont I went round all the variables and didn't really find any variables' dependence plot to be particularly illustrative / easy to interpret.

What do you think about this instead? It's the best I could come up with.

image

I'm seeing that:

  • at lower MedInc (let's say <=4), the HouseAge doesn't have a strong impact on the house prices (or at least, the dependence relationship isn't that clear tome)
  • at higher MedInc, older houses tend to have higher prices (if we squint a little, we can generally see that the red dots are higher than the blue dots, for the same MedInc).

@owenlamont
Copy link
Contributor Author

owenlamont commented Sep 9, 2023

Thanks for the feedback @thatlittleboy - I've used your example and tweaked your phrasing a little for the description. I hope that looks okay:

image

I also noticed the old Boston House Price example is used on the main README too... guess at some point that should be changed. I think we might want an entirely different dataset we can tell a nice story with using SHAP for that though.

@dsgibbons
Copy link
Collaborator

I also noticed the old Boston House Price example is used on the main README too... guess at some point that should be changed. I think we might want an entirely different dataset we can tell a nice story with using SHAP for that though.

@owenlamont this has been fixed by #3200. Happy for another dataset to be used if it tells a more compelling story.

@owenlamont
Copy link
Contributor Author

Yeah... I'm sure there must be a better dataset for this example. I'm afraid I'm a bit time poor and have some other priorities so I don't think I'll go looking for it myself but happy to try and keep contributing in small ways.

Copy link
Collaborator

@thatlittleboy thatlittleboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @owenlamont for the contribution, Sorry it took so long to get back to you. I took the liberty to clean up a couple more things, like using the new plots api (e.g. shap.plots.force instead of shap.force_plot) where possible.

clockwork-thumbsup

@thatlittleboy thatlittleboy added the documentation Relating to readthedocs, notebooks, and exposition in docstrings label Oct 1, 2023
@thatlittleboy thatlittleboy added this to the 0.43.0 milestone Oct 1, 2023
@thatlittleboy thatlittleboy merged commit 3f60bf0 into shap:master Oct 1, 2023
2 checks passed
@owenlamont
Copy link
Contributor Author

Thanks for finishing it off @thatlittleboy !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Relating to readthedocs, notebooks, and exposition in docstrings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants