Benchmark proposed in the paper "Interpretability in Symbolic Regression: a benchmark of Explanatory Methods using the Feynman data set", submitted to Genetic Programming and Evolvable Machines.

Paper abstract

In some situations, the interpretability of machine learning models plays a role as important as the model accuracy. This comes from the need to trust the prediction model, verify some of its properties, or even enforce such properties to improve fairness. To satisfy this need, many model-agnostic explainers were proposed with the goal of working with black-box models. Most of these works focus on classification models, even though an adaptation to regression models is usually straightforward. Regression tasks can be explored with techniques considered white-boxes (e.g., linear regression) or gray-boxes (e.g., symbolic regression), which can deliver interpretable results. The use of explanation methods in the context of regression - and, in particular, symbolic regression - is studied in this paper, coupled with different explanation methods from the literature. Experiments were performed using a set of 100 physics equations together with different interpretable and non-interpretable regression methods and popular explanation methods, wrapped in a module and tested through an intensive benchmark. We adapted explanation quality metrics to inspect the performance of explainers for the regression task. The results showed that, for this specific problem domain, the Symbolic Regression models outperformed all the regression models in every quality measure. Among the tested methods, Partial Effects and SHAP presented more stable results while Integrated Gradients was unstable with tree-based models. As a byproduct of this work, we released a Python library for benchmarking explanation methods with regression models. This library will be maintained by expanding it with more explainers and regressors.


First, you need to clone the repository.

Inside the repository root, execute the following commands (on Linux).

# make sure you have this
sudo apt-get install build-essential

# Creating a conda environment
conda env create -f environment.yml
conda activate iirs-env

# Installing Operon first, it is the only dependency that is not on PyPI

# Use gcc-9 (or later); CXX must point to the C++ compiler driver
export CC=gcc-9
export CXX=g++-9

# clone operon
cd iirsBenchmark/regressors
git clone
cd operon

# run cmake with options
mkdir build; cd build;

# build
make VERBOSE=1 -j pyoperon

# install python package
make install

# (going back to root and) Executing the local installation 
cd ../../../..

Implemented regressors

The following regressors are available in iirsBenchmark:

| Regressor | Class name | Type | Original implementation |
| --- | --- | --- | --- |
| XGB | XGB_regressor | Tree boosting | Scikit-learn XGB |
| RF | RF_regressor | Tree bagging | Scikit-learn RF |
| MLP | MLP_regressor | Neural network | Scikit-learn MLP |
| SVM | SVM_regressor | Vector machine | Scikit-learn SVM |
| k-NN | KNN_regressor | Instance method | Scikit-learn KNN |
| SR with coefficient optimization | Operon_regressor | Symbolic method | Operon framework |
| SR with IT representation | ITEA_regressor | Symbolic method | ITEA |
| Linear regression | Linear_regressor | Regression analysis | Scikit-learn Linear regression |
| LASSO regression | Lasso_regressor | Regression analysis | Scikit-learn Lasso |
| Single decision tree | DecisionTree_regressor | Decision tree | Scikit-learn Decision tree |

Class names follow the pattern <name of the regressor in PascalCase>_regressor.

All implemented regressors provide a constructor with default values for all parameters, a fit method, and a predict method. If you are familiar with scikit-learn, their usage should be straightforward.

from iirsBenchmark.regressors import ITEA_regressor, Linear_regressor

from sklearn import datasets

housing_data = datasets.fetch_california_housing()
X, y = housing_data['data'], housing_data['target']

linear = Linear_regressor().fit(X, y)

# if you want to specify a parameter, pass it as a named (keyword) argument.
# only in a few places in iirsBenchmark are arguments positional.
itea = ITEA_regressor(popsize=75).fit(X, y)

print(itea.stochastic_executions)   # True
print(linear.stochastic_executions) # False

print(itea.to_str())   # will print a symbolic equation
print(linear.to_str()) # will print a linear regression equation

The regressors are used just like any scikit-learn regressor, but our implementations extend those classes with a few more attributes and methods that are useful in the interpretability context:

  • stochastic_executions: attribute indicating whether the regressor has stochastic behavior;
  • interpretability_spectrum: attribute with a string indicating if the regressor is considered a white-box, gray-box or black-box;
  • grid_params: attribute with a dictionary where each key is one parameter of the regressor and the values are lists of possible values considered in the experiments;
  • to_str(): method that returns a string representation of a fitted regressor (if applicable).
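The interface listed above can be illustrated with a minimal sketch. The `MeanRegressor` class, its parameters, and the `expand_grid` helper are hypothetical names invented for this example; they are not part of iirsBenchmark, and the sketch only assumes that `grid_params` is meant to be expanded into one configuration per combination of values.

```python
import itertools


class MeanRegressor:
    """Toy regressor (hypothetical, not part of iirsBenchmark)."""

    # Attributes mirroring the ones described in the list above
    stochastic_executions = False            # fitting is deterministic
    interpretability_spectrum = "white-box"
    grid_params = {"shrinkage": [0.0, 0.5], "round_digits": [2, 4]}

    def __init__(self, shrinkage=0.0, round_digits=4):
        self.shrinkage = shrinkage
        self.round_digits = round_digits

    def fit(self, X, y):
        # Shrink the sample mean towards zero; X is ignored by this toy model
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def to_str(self):
        return f"y = {round(self.mean_, self.round_digits)}"


def expand_grid(grid):
    """Yield one keyword-argument dict per combination of values in the grid."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]

# One fitted model per configuration described by grid_params
configs = list(expand_grid(MeanRegressor.grid_params))
models = [MeanRegressor(**cfg).fit(X, y) for cfg in configs]
```

Keeping the grid on the class lets an experiment script iterate over configurations without knowing anything else about the regressor.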

Implemented explainers

Several feature attribution explanatory methods were unified in this package. The available methods are displayed below:

| Explainer | Class name | Agnostic | Local | Global | Original implementation |
| --- | --- | --- | --- | --- | --- |
| Random explainer | RandomImportance_explainer | Y | Y | Y | Our implementation |
| Permutation Importance | PermutationImportance_explainer | Y | N | Y | scikit.inspection |
| Morris Sensitivity | MorrisSensitivity_explainer | Y | N | Y | interpretml |
| SHapley Additive exPlanations (SHAP) | SHAP_explainer | Y | Y | Y | shap |
| Shapley Additive Global importancE (SAGE) | SAGE_explainer | Y | N | Y | sage |
| Local Interpretable Model-agnostic Explanations (LIME) | LIME_explainer | Y | Y | N | lime |
| Integrated Gradients | IntegratedGradients_explainer | Y | Y | N | Our implementation |
| Partial Effects (PE) | PartialEffects_explainer | N | Y | Y | Our implementation |
| Explain by Local Approximation (ELA) | ELA_explainer | Y | Y | N | Our implementation |

The naming convention is the same as for the regressors, but with the pattern <name of the explainer in PascalCase>_explainer.

To explain a fitted regressor (not only the ones provided in this benchmark, but any regressor that implements a predict method), you need to instantiate the explainer, fit it to the same training data used to train the regressor, and then call explain_local and explain_global to obtain feature importance explanations. If the explainer is not model-agnostic and does not support the given regressor, fit will raise an exception; if it does not support local or global explanations, the corresponding explain method will raise an exception when called.

from iirsBenchmark.explainers import SHAP_explainer, PermutationImportance_explainer

# you must pass the regressor as a named argument for every explainer constructor
shap = SHAP_explainer(predictor=itea).fit(X, y)

# Local explanation takes a matrix where each line is an observation, and
# returns a matrix where each line is the feature importance for the respective input.
# Single observations should be reshaped into a 2D array with x.reshape(1, -1).
local_exps  = shap.explain_local(X[5:10, :])
local_exp   = shap.explain_local(X[3].reshape(1, -1))

# Global explanation takes more than one sample (ideally the whole train/test data)
# and returns a single global feature importance for each variable.
pe = PermutationImportance_explainer(predictor=itea).fit(X, y)

global_exp = pe.explain_global(X, y)
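The fit / explain_local / explain_global contract described above can be sketched in a few lines. Everything in this block is hypothetical and written only for illustration: `ToyGlobalExplainer`, `NotApplicableError`, and `DoubleFirstFeature` are not iirsBenchmark names, and the package may raise a different exception type.

```python
class NotApplicableError(Exception):
    """Hypothetical exception type; the package may raise a different one."""


class ToyGlobalExplainer:
    """Global-only, model-agnostic toy explainer (not part of iirsBenchmark).

    Importance of feature j is the mean absolute change in the prediction
    when feature j is replaced by its training mean.
    """

    def __init__(self, *, predictor):
        # keyword-only, mirroring the named `predictor` argument used above
        self.predictor = predictor

    def fit(self, X, y=None):
        if not hasattr(self.predictor, "predict"):
            raise NotApplicableError("predictor must implement predict()")
        n = len(X)
        self.means_ = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
        return self

    def explain_local(self, X):
        # This explainer has no local scope, so the call fails loudly
        raise NotApplicableError("local explanations are not supported")

    def explain_global(self, X, y=None):
        base = self.predictor.predict(X)
        importances = []
        for j, mean_j in enumerate(self.means_):
            X_masked = [row[:j] + [mean_j] + row[j + 1:] for row in X]
            masked = self.predictor.predict(X_masked)
            diffs = [abs(b - m) for b, m in zip(base, masked)]
            importances.append(sum(diffs) / len(diffs))
        return importances


class DoubleFirstFeature:
    """Stand-in regressor: predicts 2 * x0 and ignores x1."""

    def predict(self, X):
        return [2.0 * row[0] for row in X]


X_toy = [[0.0, 5.0], [1.0, 7.0], [2.0, 9.0]]
explainer = ToyGlobalExplainer(predictor=DoubleFirstFeature()).fit(X_toy)
global_importance = explainer.explain_global(X_toy)  # x1 gets zero importance
```

Raising on the unsupported scope, instead of returning an empty result, makes a misconfigured experiment fail early.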


Feynman regressors

As mentioned, this benchmark uses the Feynman equations compiled and provided by [ref].

The Feynman equations can be used just like any regressor in the module, but they take the data set name as a required argument. The created instance can then be used to predict new values using the physics equation associated with the given data set.

A table of all equations can be found here, where the Filename column lists the valid values for the data set name argument.
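The idea of a regressor backed by a known equation can be sketched as follows. This is only an illustration under stated assumptions: the class name `GroundTruthRegressor`, the registry, and the key `"toy-kinetic-energy"` are made up for this example and are neither the package's class nor an actual Filename from the Feynman table.

```python
# Hypothetical registry: the key is a placeholder, not a Feynman Filename
EQUATIONS = {
    "toy-kinetic-energy": (lambda m, v: 0.5 * m * v ** 2, "0.5 * m * v**2"),
}


class GroundTruthRegressor:
    """Regressor whose predictions come from a fixed, known equation."""

    stochastic_executions = False

    def __init__(self, data_set_name):
        if data_set_name not in EQUATIONS:
            raise KeyError(f"unknown data set name: {data_set_name!r}")
        self.data_set_name = data_set_name
        self.equation_, self.expression_ = EQUATIONS[data_set_name]

    def fit(self, X, y):
        # Nothing to learn: the equation is already known
        return self

    def predict(self, X):
        # Each row holds the equation's variables in a fixed order
        return [self.equation_(*row) for row in X]

    def to_str(self):
        return self.expression_


ground_truth = GroundTruthRegressor("toy-kinetic-energy").fit([], [])
predictions = ground_truth.predict([[2.0, 3.0], [1.0, 4.0]])
```

Because such an object exposes predict, it can be passed to an explainer like any fitted regressor, which gives a reference explanation to compare learned models against.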

Explanation robustness measures

We strongly advise reading Section 3.1 of our paper to fully understand how these measures work, and also checking their implementation in iirsBenchmark.expl_measures. Although we did not propose any of these measures, we adapted them when implementing them in iirsBenchmark.

Three different explanation measures were implemented:


Stability

The intuition is to measure the degree to which the local explanation changes for a given point compared to its neighbors.
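The intuition above can be sketched as the mean squared Euclidean distance between the explanation of a point and the explanations of its neighbors. This is one possible formulation, assumed here for illustration; the exact definition used in the benchmark is the one in Section 3.1 of the paper and in iirsBenchmark.expl_measures. The function names below are hypothetical.

```python
def explanation_stability(explain_local, x, neighbors):
    """Mean squared Euclidean distance between the explanation of x and the
    explanations of its neighbors (lower values mean more stable)."""
    reference = explain_local(x)
    total = 0.0
    for neighbor in neighbors:
        other = explain_local(neighbor)
        total += sum((a - b) ** 2 for a, b in zip(reference, other))
    return total / len(neighbors)


def gradient_explanation(x):
    """Toy local explanation: the gradient of f(x) = x0**2 + 3*x1."""
    return [2.0 * x[0], 3.0]


point = [1.0, 0.0]
close_neighbors = [[1.1, 0.0], [0.9, 0.1]]
score = explanation_stability(gradient_explanation, point, close_neighbors)
```

In this toy case only the first attribution changes with the input, so the score reflects how quickly the gradient with respect to x0 varies around the point.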


Infidelity

The idea of infidelity is to measure the difference between two terms:

  • The dot product between a significant perturbation to a given input $X$ we are trying to explain and its explanation, and
  • The output observed for the perturbed point.
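The two terms above can be combined into a score as in the sketch below. It assumes a common formulation from the literature in which the second term is the change in model output caused by the perturbation, and the squared difference is averaged over several perturbations; the adapted version used by the benchmark may differ in its details. All names are hypothetical.

```python
def infidelity(predict, explanation, x, perturbations):
    """Mean squared difference between (perturbation . explanation) and the
    change in model output when the perturbation is subtracted from x."""
    fx = predict(x)
    total = 0.0
    for pert in perturbations:
        dot = sum(p * e for p, e in zip(pert, explanation))
        x_perturbed = [xi - pi for xi, pi in zip(x, pert)]
        output_change = fx - predict(x_perturbed)
        total += (dot - output_change) ** 2
    return total / len(perturbations)


def linear_model(x):
    """Toy model: f(x) = 2*x0 + 3*x1."""
    return 2.0 * x[0] + 3.0 * x[1]


x = [1.0, 2.0]
perturbations = [[1.0, 0.0], [0.0, 1.0]]

faithful = infidelity(linear_model, [2.0, 3.0], x, perturbations)
unfaithful = infidelity(linear_model, [0.0, 3.0], x, perturbations)
```

For a linear model, an explanation equal to the coefficients reproduces every output change exactly, so its infidelity is zero; dropping the attribution of x0 is penalized.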

Jaccard Index

The Jaccard Index measures how similar two sets are by calculating the ratio of their intersection size to their union size.
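A minimal sketch of this ratio is shown below. Building the sets from the k most important features of each explanation is an assumption made for the example, and both function names are hypothetical.

```python
def top_k_features(importances, k):
    """Indices of the k features with the largest absolute importance."""
    order = sorted(
        range(len(importances)),
        key=lambda j: abs(importances[j]),
        reverse=True,
    )
    return set(order[:k])


def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


explanation_at_x = [0.9, -0.5, 0.1, 0.0]
explanation_at_neighbor = [0.8, 0.05, -0.6, 0.0]

similarity = jaccard_index(
    top_k_features(explanation_at_x, 2),         # {0, 1}
    top_k_features(explanation_at_neighbor, 2),  # {0, 2}
)
```

Here the two explanations agree on one of three distinct top features, so the index is 1/3.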


To use these measures you need a fitted regressor and a fitted explainer. They only work for local explanations:

from iirsBenchmark import expl_measures

# you need to provide a neighborhood to the observation being evaluated
# with those measures

obs_to_explain = X[3].reshape(1, -1)

neighbors = expl_measures.neighborhood(
    obs_to_explain, # The observation
    X,              # Training data to calculate the multivariate normal distribution
    factor=0.001,   # spread of the neighbors
    size=30         # number of neighbors to sample
)

# compute one of the measures (the function name `stability` is assumed here;
# see iirsBenchmark.expl_measures for the exact names)
expl_measures.stability(
    shap,           # the explainer we want to evaluate
    obs_to_explain, # the observation to explain
    neighbors       # sampled neighbors to evaluate the metric
)
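The neighborhood sampling used above can be approximated with a short standalone sketch. It is a simplification under stated assumptions, not the package's implementation: it treats features as independent (a diagonal covariance instead of a full multivariate normal), assumes `factor` scales the per-feature standard deviation of the training data, and takes the observation as a flat list rather than a 2D array. The function name is hypothetical.

```python
import random
import statistics


def sample_neighborhood(obs, X_train, factor=0.001, size=30, seed=None):
    """Sample `size` points around `obs` from independent normal
    distributions whose spread is `factor` times each feature's
    standard deviation in the training data."""
    rng = random.Random(seed)
    n_features = len(obs)
    stds = [
        statistics.pstdev([row[j] for row in X_train])
        for j in range(n_features)
    ]
    return [
        [rng.gauss(obs[j], factor * stds[j]) for j in range(n_features)]
        for _ in range(size)
    ]


X_train = [[0.0, 10.0], [1.0, 10.0], [2.0, 10.0]]
observation = [1.0, 10.0]
toy_neighbors = sample_neighborhood(observation, X_train, size=30, seed=0)
```

A small factor keeps the neighbors close enough that the model should behave similarly on them, which is what makes differences between their explanations informative.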


The package implements everything we need to create experiments to evaluate interpretability quality and robustness in the regression context.

The experiments used in the paper are in ./experiments.


Feel free to contact the developers with suggestions, criticism, or questions. You can always raise an issue on GitHub!


This package was built upon the contributions of many researchers in the XAI field, as well as the scikit-learn and Operon frameworks for creating and fitting regressors.

We would like to recognize the importance of their work. To get to know each dependency better, we suggest that you read the original works mentioned below.





The development of this research is still active. We plan to extend our study by including more symbolic regression methods. As for the GitHub repository, we plan to build a documentation page and keep the repository maintained.

