Benchmark proposed in the paper "Interpretability in Symbolic Regression: a benchmark of Explanatory Methods using the Feynman data set", submitted to Genetic Programming and Evolvable Machines.
In some situations, the interpretability of machine learning models plays a role as important as model accuracy. This comes from the need to trust the prediction model, verify some of its properties, or even enforce such properties to improve fairness. To satisfy this need, many model-agnostic explainers were proposed with the goal of working with black-box models. Most of these works focus on classification models, even though an adaptation to regression models is usually straightforward. Regression tasks can be explored with techniques considered white-boxes (e.g., linear regression) or gray-boxes (e.g., symbolic regression), which can deliver interpretable results. The use of explanation methods in the context of regression - and, in particular, symbolic regression - is studied in this paper, coupled with different explanation methods from the literature. Experiments were performed using a set of 100 physics equations together with different interpretable and non-interpretable regression methods and popular explanation methods, wrapped in a module and tested through an intensive benchmark. We adapted explanation quality metrics to inspect the performance of explainers for the regression task. The results showed that, for this specific problem domain, the Symbolic Regression models outperformed all the other regression models in every quality measure. Among the tested methods, Partial Effects and SHAP presented more stable results, while Integrated Gradients was unstable with tree-based models. As a byproduct of this work, we released a Python library for benchmarking explanation methods with regression models. This library will be maintained and expanded with more explainers and regressors.
First, you need to clone the repository.
Inside the repository root, execute the following commands (on Linux).
    # Make sure you have the build tools
    sudo apt-get install build-essential

    # Create the conda environment
    conda env create -f environment.yml
    conda activate iirs-env

    # Install Operon first; it is the only dependency that is not on PyPI
    # Use gcc-9 (or later); CXX must point to the C++ compiler
    export CC=gcc-9
    export CXX=g++-9

    # Clone Operon
    cd iirsBenchmark/regressors
    git clone https://github.com/heal-research/operon
    cd operon

    # Run cmake with options
    mkdir build; cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_PYBIND=ON -DUSE_OPENLIBM=ON -DUSE_SINGLE_PRECISION=ON -DCERES_TINY_SOLVER=ON

    # Build
    make VERBOSE=1 -j pyoperon

    # Install the Python package
    make install

    # Go back to the root and run the local installation
    cd ../../../..
    make
The following regressors are available in iirsBenchmark:
| Regressor | Class name | Type | Original implementation |
|---|---|---|---|
| XGB | XGB_regressor | Tree boosting | Scikit-learn XGB |
| RF | RF_regressor | Tree bagging | Scikit-learn RF |
| MLP | MLP_regressor | Neural network | Scikit-learn MLP |
| SVM | SVM_regressor | Vector machine | Scikit-learn SVM |
| k-NN | KNN_regressor | Instance method | Scikit-learn KNN |
| SR with coefficient optimization | Operon_regressor | Symbolic method | Operon framework |
| SR with IT representation | ITEA_regressor | Symbolic method | ITEA |
| Linear regression | Linear_regressor | Regression analysis | Scikit-learn Linear regression |
| LASSO regression | Lasso_regressor | Regression analysis | Scikit-learn Lasso |
| Single decision tree | DecisionTree_regressor | Decision tree | Scikit-learn Decision tree |
The naming convention is <name of the regressor in PascalCase>_regressor.
All implemented regressors provide a constructor with default values for all parameters, as well as fit and predict methods. If you are familiar with scikit-learn, their usage should be straightforward.
    from sklearn import datasets

    from iirsBenchmark.regressors import ITEA_regressor, Linear_regressor

    housing_data = datasets.fetch_california_housing()
    X, y = housing_data['data'], housing_data['target']

    linear = Linear_regressor().fit(X, y)

    # If you want to specify a parameter, pass it as a named argument.
    # There are few exceptions in iirsBenchmark where arguments are positional.
    itea = ITEA_regressor(popsize=75).fit(X, y)

    print(itea.stochastic_executions)    # True
    print(linear.stochastic_executions)  # False

    print(itea.to_str())    # will print a symbolic equation
    print(linear.to_str())  # will print a linear regression equation
The regressors are used just like any scikit-learn regressor, but our implementations extend those classes by adding a few more attributes and methods for the interpretability context:
- stochastic_executions: attribute indicating whether the regressor has stochastic behavior;
- interpretability_spectrum: attribute with a string indicating whether the regressor is considered a white-box, gray-box, or black-box;
- grid_params: attribute with a dictionary where each key is one parameter of the regressor and each value is the list of values considered for it in the experiments;
- to_str(): method that returns a string representation of a fitted regressor (if applicable).
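The attributes and method listed above can be illustrated with a small stand-in class. This is only a sketch: `ToyLinear_regressor` is a hypothetical name that does not exist in iirsBenchmark, and the least-squares fit is written with NumPy just to keep the example self-contained.

```python
import numpy as np


class ToyLinear_regressor:
    """Hypothetical regressor exposing the extra interpretability attributes."""

    # Ordinary least squares is deterministic: refitting gives the same model.
    stochastic_executions = False
    interpretability_spectrum = "white-box"
    # This toy model has no hyperparameters to search over.
    grid_params = {}

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Append a column of ones so the last coefficient is the intercept.
        A = np.column_stack([X, np.ones(len(X))])
        coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
        self.coef_, self.intercept_ = coefs[:-1], coefs[-1]
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

    def to_str(self):
        terms = [f"{c:.3f}*x_{i}" for i, c in enumerate(self.coef_)]
        return " + ".join(terms + [f"{self.intercept_:.3f}"])


rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
y_toy = 2.0 * X_toy[:, 0] - 1.0 * X_toy[:, 1] + 0.5  # noise-free target

toy = ToyLinear_regressor().fit(X_toy, y_toy)
print(toy.interpretability_spectrum, toy.stochastic_executions)
print(toy.to_str())
```

Because the target is noise-free, the printed expression recovers the generating coefficients, which is the kind of readable output to_str() is meant to provide for white-box and gray-box models.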
Several feature attribution explanatory methods were unified in this package. The available methods are displayed below:
| Explainer | Class name | Agnostic | Local | Global | Original implementation |
|---|---|---|---|---|---|
| Random explainer | RandomImportance_explainer | Y | Y | Y | Our implementation |
| SHapley Additive exPlanations (SHAP) | SHAP_explainer | Y | Y | Y | shap |
| Shapley Additive Global importancE (SAGE) | SAGE_explainer | Y | N | Y | sage |
| Local Interpretable Model-agnostic Explanations (LIME) | LIME_explainer | Y | Y | N | lime |
| Integrated Gradients | IntegratedGradients_explainer | Y | Y | N | Our implementation |
| Partial Effects (PE) | PartialEffects_explainer | N | Y | Y | Our implementation |
| Explain by Local Approximation (ELA) | ELA_explainer | Y | Y | N | Our implementation |
The naming convention is the same as for the regressors, but with a different suffix: <name of the explainer in PascalCase>_explainer.
To explain a fitted regressor (not only the ones provided in this benchmark, but any regressor that implements a predict method), you need to instantiate the explainer, fit it to the same training data used to train the regressor, and then use the methods explain_local and explain_global to obtain feature importance explanations. If the explainer is not model-agnostic and does not support the given regressor, fit will raise an exception; and if the explainer does not support local or global explanations, an exception will be raised when the corresponding explain method is called.
    # Import path assumed by analogy with iirsBenchmark.regressors
    from iirsBenchmark.explainers import SHAP_explainer

    # You must pass the regressor as a named argument for every explainer constructor
    shap = SHAP_explainer(predictor=itea).fit(X, y)

    # Local explanation takes a matrix where each line is an observation, and
    # returns a matrix where each line is the feature importance for the respective input.
    # Single observations should be reshaped into a 2D array with x.reshape(1, -1).
    local_exps = shap.explain_local(X[5:10, :])
    local_exp = shap.explain_local(X[0].reshape(1, -1))  # row 0 chosen for illustration

    # Global explanation takes more than one sample (ideally the whole train/test data)
    # and returns a single global feature importance for each variable.
    global_exp = shap.explain_global(X)
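To make the fit / explain_local / explain_global contract more concrete, here is a minimal sketch of an explainer that follows the same call pattern. `FiniteDifference_explainer` and `QuadraticModel` are hypothetical names introduced for illustration; the explainer uses plain numerical derivatives as importances and is not one of the methods implemented in iirsBenchmark. Returning a 1D array from explain_global is also an assumption of this sketch.

```python
import numpy as np


class FiniteDifference_explainer:
    """Hypothetical explainer: importance = numerical partial derivative."""

    def __init__(self, *, predictor, eps=1e-4):
        self.predictor = predictor
        self.eps = eps

    def fit(self, X, y):
        # Nothing is learned here; we only record the number of features.
        self.n_features_ = np.asarray(X).shape[1]
        return self

    def explain_local(self, X):
        X = np.asarray(X, dtype=float)
        importances = np.zeros_like(X)
        for j in range(X.shape[1]):
            step = np.zeros(X.shape[1])
            step[j] = self.eps
            # Central difference approximates d(prediction)/d(x_j) for each row.
            upper = self.predictor.predict(X + step)
            lower = self.predictor.predict(X - step)
            importances[:, j] = (upper - lower) / (2 * self.eps)
        return importances

    def explain_global(self, X):
        # One value per feature: mean absolute local importance.
        return np.abs(self.explain_local(X)).mean(axis=0)


class QuadraticModel:
    """Stand-in for any fitted regressor: only predict is required."""

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return 3.0 * X[:, 0] + X[:, 1] ** 2


X_demo = np.array([[1.0, 2.0], [0.0, -1.0], [2.0, 0.5]])
y_demo = QuadraticModel().predict(X_demo)

explainer = FiniteDifference_explainer(predictor=QuadraticModel()).fit(X_demo, y_demo)
local_demo = explainer.explain_local(X_demo)    # one row of importances per observation
global_demo = explainer.explain_global(X_demo)  # one importance per feature
print(local_demo)
print(global_demo)
```

For this toy model the local importances approximate the gradient (3, 2*x_1) at each observation, so the second column changes from row to row while the first stays constant.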
As mentioned, this benchmark uses the Feynman equations compiled and provided by [ref].
The Feynman equations can be used just like any regressor in the module, but they take the data set name as a required argument. The created instance can then be used to predict new values using the physics equation associated with that data set.
A table of all equations can be found here, where the Filename column lists the possible values for the data set name argument.
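The idea of treating a known equation as a regressor can be sketched as follows. Everything here is an assumption made for illustration: `ToyFeynman_regressor`, the `data_set` parameter name, and the two-entry `TOY_EQUATIONS` table are hypothetical, and the variable order in each equation should be checked against the equations table rather than taken from this sketch.

```python
import numpy as np

# Illustrative subset only; the keys follow the Feynman data set naming style,
# but the benchmark ships its own table of equations.
TOY_EQUATIONS = {
    "I.12.1": lambda X: X[:, 0] * X[:, 1],            # F = mu * N_n
    "I.14.3": lambda X: X[:, 0] * X[:, 1] * X[:, 2],  # U = m * g * z
}


class ToyFeynman_regressor:
    """Hypothetical regressor that predicts with a fixed physics equation."""

    stochastic_executions = False

    def __init__(self, *, data_set):
        if data_set not in TOY_EQUATIONS:
            raise ValueError(f"unknown data set: {data_set!r}")
        self.data_set = data_set

    def fit(self, X, y):
        # The equation is already known, so there is nothing to learn.
        return self

    def predict(self, X):
        return TOY_EQUATIONS[self.data_set](np.asarray(X, dtype=float))


potential = ToyFeynman_regressor(data_set="I.14.3")
energy = potential.predict([[2.0, 9.8, 10.0]])  # m = 2, g = 9.8, z = 10
print(energy)
```

Since such an object exposes predict, it can be handed to an explainer in the same way as a fitted model, which gives explanations of the ground-truth equation to compare against.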
Explanation robustness measures
We strongly advise reading Section 3.1 of our paper to fully understand how these measures work, and also checking their implementation in iirsBenchmark.expl_measures. Although we did not propose any of these measures, we adapted them when implementing them in iirsBenchmark.
Three different explanation measures were implemented:
Stability: the intuition is to measure the degree to which the local explanation changes for a given point compared to its neighbors.
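A minimal sketch of this stability idea, assuming it is computed as the mean squared Euclidean distance between the explanation of a point and the explanations of its neighbors. The function `stability_sketch` and the two toy explanation functions are hypothetical; the adapted version in iirsBenchmark.expl_measures may use a different norm or aggregation.

```python
import numpy as np


def stability_sketch(explain_local, x, neighbors):
    """Mean squared L2 distance between the explanation of x and of its neighbors."""
    base = np.asarray(explain_local(x))            # shape (1, n_features)
    around = np.asarray(explain_local(neighbors))  # shape (n_neighbors, n_features)
    return float(np.mean(np.sum((around - base) ** 2, axis=1)))


def gradient_of_square(X):
    # Exact gradient explanation for f(x) = sum(x_i ** 2); it varies with x.
    return 2.0 * np.asarray(X, dtype=float)


def constant_explanation(X):
    # An explanation that ignores the input entirely.
    return np.ones_like(np.asarray(X, dtype=float))


x0 = np.array([[1.0, 2.0]])
near = x0 + np.array([[0.1, 0.0], [0.0, -0.1], [-0.1, 0.1]])

varying = stability_sketch(gradient_of_square, x0, near)
constant = stability_sketch(constant_explanation, x0, near)
print(varying, constant)
```

Under this formulation lower values mean a more stable explainer: the constant explanation scores exactly zero, while the gradient explanation scores a small positive value because it changes across the neighborhood.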
The idea of infidelity is to measure the difference between two terms:
- The dot product between a significant perturbation to a given input $X$ we are trying to explain and its explanation, and
- The output observed for the perturbed point.
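The two terms above can be sketched in code. This example assumes the formulation from the work cited as [Infidelity] in the references, where the perturbation is the difference between the explained point and a perturbed point, and the second term is the resulting change in the model output. `infidelity_sketch` and the toy model are hypothetical, and the adapted measure in iirsBenchmark may differ in its details.

```python
import numpy as np


def infidelity_sketch(explain_local, predict, x, neighbors):
    x = np.asarray(x, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    explanation = np.asarray(explain_local(x)).ravel()  # shape (n_features,)
    perturbations = x - neighbors                       # one perturbation per neighbor
    # First term: dot product between each perturbation and the explanation.
    expected_change = perturbations @ explanation
    # Second term: change in the model output observed for the perturbed point.
    observed_change = predict(x) - predict(neighbors)
    return float(np.mean((expected_change - observed_change) ** 2))


def linear_model(X):
    X = np.asarray(X, dtype=float)
    return 3.0 * X[:, 0] - 2.0 * X[:, 1]


def faithful_explanation(X):
    # The true gradient of the linear model, repeated for every row.
    return np.tile([3.0, -2.0], (len(X), 1))


def uninformative_explanation(X):
    return np.zeros_like(np.asarray(X, dtype=float))


x0 = np.array([[1.0, 2.0]])
near = np.array([[1.1, 2.0], [1.0, 1.9], [0.9, 2.1]])

faithful = infidelity_sketch(faithful_explanation, linear_model, x0, near)
uninformative = infidelity_sketch(uninformative_explanation, linear_model, x0, near)
print(faithful, uninformative)
```

For a linear model the gradient explanation predicts the output change exactly, so its infidelity is zero up to floating-point error, while the all-zero explanation is penalized by the full squared output change.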
The Jaccard Index measures how similar two sets are by calculating the ratio of their intersection size to their union size.
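As a small worked example of the ratio just described, the sketch below compares two explanations through the sets of their most important features. Building the sets from the top-k features by absolute importance is an assumption made here for illustration; `top_k_features` and `jaccard_index` are hypothetical helpers, not functions from iirsBenchmark.

```python
import numpy as np


def top_k_features(importances, k):
    """Indices of the k features with the largest absolute importance."""
    order = np.argsort(-np.abs(np.asarray(importances, dtype=float)))
    return set(order[:k].tolist())


def jaccard_index(set_a, set_b):
    union = set_a | set_b
    # Two empty sets are treated as identical by convention in this sketch.
    return len(set_a & set_b) / len(union) if union else 1.0


exp_a = [0.9, -0.1, 0.5, 0.05]
exp_b = [0.8, 0.6, -0.02, 0.4]

top_a = top_k_features(exp_a, k=2)  # features 0 and 2
top_b = top_k_features(exp_b, k=2)  # features 0 and 1
similarity = jaccard_index(top_a, top_b)
print(similarity)  # one shared feature out of three distinct ones
```

A value of 1 means both explanations select the same features and 0 means they share none, regardless of the exact importance values.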
To use these measures you need a fitted regressor and a fitted explainer. They only work for local explanations:
    from iirsBenchmark import expl_measures

    # You need to provide a neighborhood of the observation being evaluated
    # with these measures
    obs_to_explain = X[0].reshape(1, -1)  # a single observation (row 0 chosen for illustration)

    neighbors = expl_measures.neighborhood(
        obs_to_explain,  # The observation
        X,               # Training data to calculate the multivariate normal distribution
        factor=0.001,    # spread of the neighbors
        size=30          # number of neighbors to sample
    )

    expl_measures.stability(
        shap.explain_local,  # the explainer we want to evaluate
        obs_to_explain,      # the observation to explain
        neighbors            # sampled neighbors to evaluate the metric
    )
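The comments in the call above suggest that neighbors are drawn from a multivariate normal distribution estimated from the training data. A minimal sketch of one way to do that follows, assuming the distribution is centered on the observation and that its covariance is the training covariance scaled by `factor`. `neighborhood_sketch` and its `seed` parameter are hypothetical; the actual sampling in expl_measures.neighborhood may differ.

```python
import numpy as np


def neighborhood_sketch(x, X_train, factor=0.001, size=30, seed=None):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).ravel()
    # Feature covariance of the training data, shrunk so samples stay close to x.
    cov = factor * np.atleast_2d(np.cov(np.asarray(X_train, dtype=float), rowvar=False))
    return rng.multivariate_normal(mean=x, cov=cov, size=size)


rng_demo = np.random.default_rng(42)
X_train_demo = rng_demo.normal(size=(200, 3))
obs_demo = X_train_demo[0].reshape(1, -1)

neighbors_demo = neighborhood_sketch(obs_demo, X_train_demo, factor=0.001, size=30, seed=0)
print(neighbors_demo.shape)  # (30, 3)
```

Scaling the covariance keeps the correlations between features while controlling how far the sampled points spread from the observation, so a smaller factor probes a tighter neighborhood.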
The package implements everything we need to create experiments to evaluate interpretability quality and robustness in the regression context.
The experiments used in the paper are in
Feel free to contact the developers with suggestions, criticism, or questions. You can always raise an issue on GitHub!
This package was built upon the contributions of many researchers in the XAI field, as well as the scikit-learn and Operon frameworks for creating and fitting regressors.
We would like to recognize the importance of their work. To get to know each dependency better, we suggest that you read the original works mentioned below.
- [SHAP] A Unified Approach to Interpreting Model Predictions;
- [SAGE] Understanding Global Feature Contributions With Additive Importance Measures;
- [LIME] "Why Should I Trust You?": Explaining the Predictions of Any Classifier;
- [Permutation Importance] Random Forests;
- [Morris Sensitivity] Factorial Sampling Plans for Preliminary Computational Experiments;
- [Integrated Gradients] Axiomatic Attribution for Deep Networks;
- [ELA] Explaining Symbolic Regression Predictions;
- [Partial Effects (for symbolic regression)] Measuring feature importance of symbolic regression models using partial effects;
- [Scikit-learn module] Scikit-learn: Machine Learning in Python;
- [ITEA] Interaction–Transformation Evolutionary Algorithm for Symbolic Regression;
- [Operon] Operon C++: an efficient genetic programming framework for symbolic regression;
- [Stability] Regularizing Black-box Models for Improved Interpretability;
- [Infidelity] On the (In)fidelity and Sensitivity of Explanations;
- [Jaccard Index] S-LIME: Stabilized-LIME for Model Explanation.
The development of this research is still active. We plan to extend our study by including more symbolic regression methods. As for the GitHub repository, we plan to build a documentation page and keep maintaining it.