
ESoC 2026 Project Ideas

Tobias Pitters edited this page Apr 1, 2026 · 5 revisions

About SHAP

SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. SHAP serves millions of users, and your contributions could significantly improve their experience.

Move performance-critical functions from numba to C++

Currently, 15 functions use numba's njit decorator to be compiled just-in-time. This works reasonably well, but we have recently had problems with numba, namely failures in our macOS CI jobs and delays in support for new Python versions. We therefore want to move these functions to C++ via nanobind while matching numba's performance, so that SHAP can drop its numba/llvmlite dependency. The port needs thorough testing to make sure no memory errors are introduced. In addition, our one Cython function should be moved to C++ as well.

The goal is to move all of the following functions to C++:

Optionally, the existing code in _cext (which uses the C API with PyObject overhead) could be moved to nanobind as well, providing a more modern approach to C++ extensions.
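The two requirements above, matching numba's speed and testing the port properly, could be checked with a small harness along these lines. This is a minimal sketch, assuming each ported kernel can be called from Python on the same inputs as its numba counterpart: it compares outputs on seeded random inputs and reports the best time per call. The names check_parity and benchmark, and the two toy kernels, are hypothetical stand-ins; in practice the reference would be the numba-jitted function and the candidate the nanobind-bound C++ function. It only covers numerical parity and speed, not memory errors.

```python
import math
import random
import timeit
from typing import Callable, List

Kernel = Callable[[List[float]], float]


def check_parity(reference: Kernel, candidate: Kernel, n_trials: int = 100,
                 size: int = 1_000, rel_tol: float = 1e-9, seed: int = 0) -> None:
    """Raise AssertionError if `candidate` disagrees with `reference` on random inputs."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for trial in range(n_trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        expected, actual = reference(xs), candidate(xs)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=1e-12):
            raise AssertionError(f"trial {trial}: reference={expected!r}, candidate={actual!r}")


def benchmark(func: Kernel, xs: List[float], repeat: int = 5, number: int = 100) -> float:
    """Return the best observed time per call, in seconds."""
    # min() over repeats: slower runs mostly reflect noise from the rest of the system
    timings = timeit.repeat(lambda: func(xs), repeat=repeat, number=number)
    return min(timings) / number


# Usage: two implementations of the same reduction stand in for the
# numba version (reference) and the C++ port (candidate).
def sum_of_squares_loop(xs: List[float]) -> float:
    total = 0.0
    for x in xs:
        total += x * x
    return total


def sum_of_squares_fsum(xs: List[float]) -> float:
    return math.fsum(x * x for x in xs)


check_parity(sum_of_squares_loop, sum_of_squares_fsum)
data = [random.random() for _ in range(1_000)]
print(f"reference: {benchmark(sum_of_squares_loop, data):.2e} s/call")
print(f"candidate: {benchmark(sum_of_squares_fsum, data):.2e} s/call")
```

When the reference is a lazily compiled njit function, it should be called once before timing so that JIT compilation is not counted against numba.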

Expected Time: 200-250 hours
Difficulty Rating: Hard
Required Skills: C++, Python
Potential Mentors: Tobias Pitters

Unify SHAP value calculation for tree-based models

SHAP currently has explicit support for many small tree libraries (see here). Some of these libraries have caused us significant problems in the past, effectively blocking our CI (see this issue for further details), and there is interest in adding even more models (see here). Currently, SHAP converts each of these models into a TreeEnsemble, which is then used to calculate SHAP values efficiently.

Rather than supporting more and more libraries, we plan to provide a TreeEnsemble.from_trees classmethod with a well-defined and clearly communicated interface, so that users or package maintainers can construct the TreeEnsemble object themselves and compute SHAP values from it. The initial plan is therefore to prototype a TreeEnsemble.from_trees classmethod that generates the object SHAP needs to explain a model. Once this works, we want an exhaustive test suite that exercises this new behaviour for all currently supported models. The interface will target scikit-learn's tree structure, see https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html. Finally, adding DeprecationWarnings for a list of models (to be defined) will round out the feature.
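To make the idea more concrete, here is a hedged sketch of what a from_trees input could look like if it mirrors scikit-learn's parallel-array tree layout (children_left, children_right, feature, threshold, value, with -1 marking a missing child and "go left when x[feature] <= threshold"). The class TreeEnsembleSketch, its from_trees signature, and the dict-per-tree input format are all hypothetical; defining the real interface is the point of this project. For simplicity, value holds one scalar per node, whereas scikit-learn stores an array of shape (n_nodes, n_outputs, max_n_classes), and the validation assumes children are stored after their parent, as scikit-learn's builders do.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

TREE_LEAF = -1  # scikit-learn marks a missing child with -1


@dataclass
class Tree:
    children_left: List[int]
    children_right: List[int]
    feature: List[int]
    threshold: List[float]
    value: List[float]  # simplified: one scalar output per node

    def predict_row(self, x: Sequence[float]) -> float:
        node = 0
        # scikit-learn convention: go left when x[feature] <= threshold
        while self.children_left[node] != TREE_LEAF:
            if x[self.feature[node]] <= self.threshold[node]:
                node = self.children_left[node]
            else:
                node = self.children_right[node]
        return self.value[node]


class TreeEnsembleSketch:
    """Hypothetical stand-in for a TreeEnsemble built via from_trees."""

    def __init__(self, trees: List[Tree], base_offset: float = 0.0):
        self.trees = trees
        self.base_offset = base_offset

    @classmethod
    def from_trees(cls, trees: List[Dict[str, list]],
                   base_offset: float = 0.0) -> "TreeEnsembleSketch":
        parsed = []
        for i, t in enumerate(trees):
            tree = Tree(list(t["children_left"]), list(t["children_right"]),
                        list(t["feature"]), list(t["threshold"]), list(t["value"]))
            n = len(tree.children_left)
            lengths = {len(tree.children_right), len(tree.feature),
                       len(tree.threshold), len(tree.value)}
            if lengths != {n}:
                raise ValueError(f"tree {i}: node arrays must all have length {n}")
            for node in range(n):
                left, right = tree.children_left[node], tree.children_right[node]
                # each node is either a leaf (both children -1) or has two children
                if (left == TREE_LEAF) != (right == TREE_LEAF):
                    raise ValueError(f"tree {i}, node {node}: exactly one child is missing")
                # children stored after their parent, which also rules out cycles
                if left != TREE_LEAF and not (node < left < n and node < right < n):
                    raise ValueError(f"tree {i}, node {node}: invalid child index")
            parsed.append(tree)
        return cls(parsed, base_offset)

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        # additive ensemble: constant offset plus the sum of all tree outputs
        return [self.base_offset + sum(t.predict_row(x) for t in self.trees) for x in X]


# Usage: two stumps, one splitting on feature 0 and one on feature 1
ensemble = TreeEnsembleSketch.from_trees(
    [
        {"children_left": [1, -1, -1], "children_right": [2, -1, -1],
         "feature": [0, -2, -2], "threshold": [0.5, -2.0, -2.0], "value": [0.0, -1.0, 1.0]},
        {"children_left": [1, -1, -1], "children_right": [2, -1, -1],
         "feature": [1, -2, -2], "threshold": [0.0, -2.0, -2.0], "value": [0.0, 10.0, 20.0]},
    ],
    base_offset=0.5,
)
print(ensemble.predict([[0.2, -1.0], [0.9, 3.0]]))  # [9.5, 21.5]
```

Keeping the input as plain arrays means a library can export to it without SHAP importing that library. A real interface would also need per-node sample weights (cover), which scikit-learn exposes as weighted_n_node_samples and which path-dependent Tree SHAP relies on.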

Goals:

  • prototype a TreeEnsemble.from_trees classmethod
  • thoroughly test the method against all currently supported tree models
  • build a notebook showing how to calculate SHAP values for currently supported models using this new functionality
  • deprecate a list of currently supported models (smaller libraries such as gpboost, ngboost, etc.)
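For the deprecation goal, one option is a small helper called wherever a model is dispatched to library-specific conversion code. This is a minimal sketch that identifies the library from the top-level package of the model's class; the helper name, the DEPRECATED_TREE_LIBRARIES tuple, and the warning text are hypothetical, since the list of models is still to be defined and TreeEnsemble.from_trees does not exist yet.

```python
import warnings

# Hypothetical list: the actual set of models to deprecate is still to be defined.
DEPRECATED_TREE_LIBRARIES = ("gpboost", "ngboost")


def warn_if_deprecated_model(model: object) -> bool:
    """Emit a DeprecationWarning if `model` comes from a library slated for removal."""
    top_level_package = type(model).__module__.split(".")[0]
    if top_level_package in DEPRECATED_TREE_LIBRARIES:
        warnings.warn(
            f"Direct support for {top_level_package} models is deprecated and will be "
            "removed in a future release; build a TreeEnsemble via "
            "TreeEnsemble.from_trees instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )
        return True
    return False


# Usage with a stand-in class whose __module__ is set to look like ngboost
class FakeNGBoostModel:
    pass


FakeNGBoostModel.__module__ = "ngboost.api"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, including DeprecationWarning
    warn_if_deprecated_model(FakeNGBoostModel())
print(caught[0].category.__name__, caught[0].message)
```

DeprecationWarning is ignored by default unless triggered by code in __main__, so users calling SHAP from their own scripts may never see it. Python's warnings documentation recommends FutureWarning for deprecations aimed at end users, which is worth weighing when choosing the category.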

Expected Time: 200 hours
Difficulty Rating: Medium
Required Skills: Python, familiarity with tree-based ML models and scikit-learn
Potential Mentors: Tobias Pitters