Applying Shapley Value Analysis: Visualization Plots and Number of Features to Consider #7

Open
HellenNamulinda opened this issue Mar 13, 2024 · 1 comment

@HellenNamulinda
Collaborator

We are considering Shapley Value Analysis for the interpretability of our models. One crucial aspect is determining the optimal visualization plots for Shapley values and deciding the number of features to analyze.

Besides the waterfall plot, summary plot, and beeswarm plot (shared: 902c5a9), exploring additional visualization plots such as force plots and heatmaps may provide further insights into the models.

Also, we initially started with 22 features, which is few enough to analyze in full. This will become more challenging as more features are integrated. Should we consider analyzing only the top 10 or 15?
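One way to pick the top 10 or 15 could be to rank features by their mean absolute SHAP value. The sketch below assumes the SHAP values are available as an (n_samples, n_features) matrix; the helper name `top_k_features` and the toy data are hypothetical.

```python
import numpy as np


def top_k_features(shap_matrix, feature_names, k=15):
    """Rank features by mean absolute SHAP value and keep the top k."""
    shap_matrix = np.asarray(shap_matrix, dtype=float)
    if shap_matrix.ndim != 2 or shap_matrix.shape[1] != len(feature_names):
        raise ValueError("expected an (n_samples, n_features) matrix matching feature_names")
    importance = np.abs(shap_matrix).mean(axis=0)
    order = np.argsort(importance)[::-1][:k]  # indices sorted by descending importance
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy stand-in for a real SHAP matrix; the values are made up for illustration.
rng = np.random.default_rng(0)
n_features = 22
scales = np.linspace(0.1, 2.2, n_features)  # later features get larger attributions
toy_shap = rng.normal(size=(200, n_features)) * scales
names = [f"feature_{i}" for i in range(n_features)]

top10 = top_k_features(toy_shap, names, k=10)
for name, score in top10:
    print(f"{name}: {score:.3f}")
```

If only the plots need trimming, the SHAP plotting functions such as `shap.plots.beeswarm` and `shap.plots.bar` also take a `max_display` argument, which limits how many features are drawn without changing the underlying values.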

@miquelduranfrigola, your input and expertise would be greatly appreciated.

@miquelduranfrigola
Member

Thanks @HellenNamulinda.

I am quite happy with the default visualization capabilities of the SHAP library. However, I've done some extra work in this direction (for example, using tiles for a 2D visualization of Shapley values) that we can recycle, but this is just "cosmetics". I think the default Shapley plots are already quite informative. Another question is whether we can find ways to map this information onto chemicals; this could be useful. Let's discuss in the meeting.

As for the increasing number of features, I completely agree. We need to limit the number of features for interpretation. In my opinion, using feature selection to restrict the Shapley analysis to, say, up to 100 features would make sense. I've never tried this package (https://github.com/AutoViML/featurewiz), but it looks good. In any case, as always, let's first start with a good old k-best feature selector (e.g. 100-best) and then take it from there.
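A minimal sketch of the 100-best idea, assuming scikit-learn's `SelectKBest` with the `f_classif` score on a classification task. The synthetic data and the 500-feature starting point are assumptions for illustration only; for a regression task `f_regression` would be the analogous score function.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: synthetic stand-in for a wide feature matrix, not the project data.
X, y = make_classification(
    n_samples=300, n_features=500, n_informative=20, random_state=0
)

# Keep the 100 features with the highest univariate ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=100)
X_selected = selector.fit_transform(X, y)

# Column indices of the retained features, useful for labelling Shapley plots later.
selected_idx = selector.get_support(indices=True)
print(X_selected.shape, selected_idx[:10])
```

The selector should be fitted on training data only, and the model trained on `X_selected`, so that the Shapley analysis is run on the same reduced feature set the model actually sees.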

Projects
Status: In Progress