Ranking for Tabular Counterfactual Explanation Generators

This repository presents a benchmark of counterfactual (CF) generation algorithms in terms of the following metrics (a sketch of how the pairwise metrics can be computed follows the list):

- Coverage: how many factuals are converted into counterfactuals?
- Sparsity: how many features are left unchanged?
- L2 distance: how far are the counterfactuals from the factual data?
- Mean Absolute Deviation (MAD): how different are the counterfactuals from the factual data, considering feature variations?
- Mahalanobis distance (MD): how different are the counterfactuals from the factual data, considering the data distribution?
- Time: how long does it take to generate a counterfactual?
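As a rough guide to how the pairwise metrics can be computed, here is a minimal NumPy sketch over numerical features. The function names are illustrative, not the repository's API, and the exact definitions used in the benchmark may differ; coverage and time are properties of a whole run rather than of a single factual/counterfactual pair.

```python
import numpy as np

def sparsity(factual, cf):
    """Fraction of features the counterfactual leaves unchanged."""
    return float(np.mean(np.isclose(factual, cf)))

def l2_distance(factual, cf):
    """Euclidean distance between factual and counterfactual."""
    return float(np.linalg.norm(factual - cf))

def mad_distance(factual, cf, X):
    """L1 distance with each feature scaled by its mean absolute
    deviation over the dataset X, so naturally variable features
    weigh less."""
    mad = np.mean(np.abs(X - np.mean(X, axis=0)), axis=0)
    mad = np.where(mad == 0, 1.0, mad)  # guard against constant features
    return float(np.sum(np.abs(factual - cf) / mad))

def mahalanobis_distance(factual, cf, X):
    """Distance between factual and counterfactual that accounts for
    the covariance structure of the dataset X."""
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d = factual - cf
    return float(np.sqrt(d @ cov_inv @ d))

# Toy usage: X stands in for the training data.
X = np.random.default_rng(0).normal(size=(500, 4))
factual = X[0]
cf = factual + np.array([0.0, 0.8, 0.0, -0.3])  # two features changed
print(sparsity(factual, cf))  # -> 0.5
```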

How to include your CF generation algorithm

Follow the instructions in the CounterfactualBenchmark repository.

RESULTS

All experiments consider a confidence level of 95%.

Ranking Table

Why ranking instead of the raw metrics?

Most metrics cannot be compared directly because each algorithm has a different coverage. For example, if one algorithm creates only a single counterfactual with a sparsity of 90%, we cannot say it is better than another algorithm that creates 1,000 counterfactuals with a sparsity of 88%. The ranking accounts for these cases and gives a better picture of each algorithm's performance.

The rankings below were created with Friedman's test, which evaluates the null hypothesis that all algorithms perform equally, followed by Nemenyi's post-hoc test, which evaluates the significance of the pairwise differences between algorithms. In the tables, each cell is an algorithm's mean rank for that metric (lower is better), and 🥇 marks the best mean rank in each row; the highlighted results are statistically significant.
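For illustration only (this is not the repository's evaluation code), the procedure can be reproduced with SciPy and the scikit-posthocs package on a matrix of per-instance scores; the scores below are synthetic.

```python
import numpy as np
import pandas as pd
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # pip install scikit-posthocs

# Toy score matrix: one row per factual instance, one column per generator.
rng = np.random.default_rng(42)
scores = pd.DataFrame(
    rng.random((100, 3)),
    columns=["cfnow_greedy", "dice", "growingspheres"],
)

# Friedman test of the null hypothesis that all generators perform equally.
stat, p = friedmanchisquare(*(scores[c] for c in scores))
print(f"Friedman chi2 = {stat:.2f}, p = {p:.4f}")

# Mean rank per generator (for distance-like metrics, lower score is
# better, so ascending ranks put the best generator at rank 1).
print(scores.rank(axis=1).mean())

# Nemenyi post-hoc test: a matrix of pairwise p-values between generators.
print(sp.posthoc_nemenyi_friedman(scores))
```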

Ranking for all datasets

| metric | alibi_nograd | alibi | cadex | cfnow_random | cfnow_greedy | dice | growingspheres | synas | lore | sedc | cfnow_random_simple | cfnow_greedy_simple | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| validity | 7.55 | 7.56 | 6.10 | 🥇4.45 | 🥇4.45 | 6.24 | 8.42 | 7.43 | 8.39 | 8.54 | 🥇4.45 | 🥇4.45 | 3925 |
| sparsity | 7.65 | 7.82 | 8.55 | 4.20 | 🥇3.78 | 5.90 | 9.17 | 6.10 | 7.99 | 7.76 | 5.51 | 🥇3.58 | 3925 |
| L2 | 6.68 | 6.89 | 6.81 | 🥇3.34 | 3.81 | 8.22 | 7.07 | 7.75 | 8.70 | 9.24 | 4.92 | 4.56 | 3925 |
| MAD | 7.39 | 7.63 | 7.63 | 3.42 | 🥇3.05 | 7.52 | 8.58 | 7.86 | 8.65 | 8.49 | 4.30 | 3.47 | 3925 |
| MD | 6.91 | 7.00 | 6.79 | 🥇3.56 | 🥇3.54 | 8.16 | 7.37 | 7.76 | 8.70 | 9.40 | 4.62 | 4.18 | 3925 |

Ranking for categorical datasets

| metric | alibi_nograd | alibi | cadex | cfnow_random | cfnow_greedy | dice | growingspheres | synas | lore | sedc | cfnow_random_simple | cfnow_greedy_simple | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| validity | 8.39 | 8.79 | 5.93 | 🥇4.19 | 🥇4.19 | 🥇4.19 | 10.19 | 7.26 | 6.27 | 10.19 | 🥇4.19 | 🥇4.19 | 1327 |
| sparsity | 8.20 | 8.74 | 8.19 | 🥇3.31 | 🥇3.37 | 5.97 | 10.19 | 6.84 | 5.91 | 10.19 | 🥇3.69 | 🥇3.38 | 1327 |
| L2 | 8.20 | 8.74 | 8.19 | 🥇3.31 | 🥇3.37 | 5.97 | 10.19 | 6.84 | 5.91 | 10.19 | 🥇3.69 | 🥇3.38 | 1327 |
| MAD | 8.49 | 9.29 | 7.85 | 🥇3.10 | 🥇3.16 | 5.56 | 9.96 | 7.81 | 6.18 | 9.96 | 🥇3.46 | 🥇3.16 | 1327 |
| MD | 8.10 | 8.71 | 8.20 | 🥇3.43 | 🥇3.34 | 5.88 | 10.19 | 6.92 | 5.93 | 10.19 | 🥇3.77 | 🥇3.34 | 1327 |

Ranking for numerical datasets

| metric | alibi_nograd | alibi | cadex | cfnow_random | cfnow_greedy | dice | growingspheres | synas | lore | sedc | cfnow_random_simple | cfnow_greedy_simple | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| validity | 6.64 | 6.64 | 6.93 | 🥇5.09 | 🥇5.09 | 6.78 | 6.12 | 8.14 | 9.42 | 6.97 | 🥇5.09 | 🥇5.09 | 1598 |
| sparsity | 7.10 | 7.14 | 9.61 | 5.20 | 4.49 | 4.57 | 7.96 | 6.07 | 8.77 | 5.25 | 7.87 | 🥇3.97 | 1598 |
| L2 | 4.83 | 4.84 | 5.54 | 4.28 | 4.72 | 9.75 | 🥇2.80 | 9.19 | 10.49 | 8.64 | 6.59 | 6.34 | 1598 |
| MAD | 6.02 | 6.06 | 8.41 | 🥇3.38 | 🥇3.44 | 8.17 | 6.83 | 8.02 | 10.15 | 7.03 | 6.02 | 4.47 | 1598 |
| MD | 5.32 | 5.34 | 5.70 | 4.02 | 4.21 | 9.68 | 🥇3.54 | 8.89 | 10.45 | 8.86 | 6.33 | 5.66 | 1598 |

Ranking for mixed datasets

| metric | alibi_nograd | alibi | cadex | cfnow_random | cfnow_greedy | dice | growingspheres | synas | lore | sedc | cfnow_random_simple | cfnow_greedy_simple | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| validity | 7.89 | 7.38 | 5.00 | 🥇3.75 | 🥇3.75 | 8.07 | 9.75 | 6.51 | 9.56 | 8.84 | 🥇3.75 | 🥇3.75 | 1000 |
| sparsity | 7.81 | 7.67 | 7.31 | 3.76 | 🥇3.19 | 7.93 | 9.75 | 5.16 | 9.50 | 8.55 | 4.15 | 🥇3.22 | 1000 |
| L2 | 7.64 | 7.70 | 7.01 | 🥇1.89 | 2.95 | 8.75 | 9.75 | 6.67 | 9.56 | 8.95 | 3.88 | 3.26 | 1000 |
| MAD | 8.11 | 7.96 | 6.11 | 3.92 | 🥇2.27 | 9.06 | 9.54 | 7.67 | 9.53 | 8.88 | 🥇2.67 | 🥇2.27 | 1000 |
| MD | 7.86 | 7.39 | 6.68 | 🥇3.01 | 🥇2.75 | 8.74 | 9.75 | 7.07 | 9.58 | 9.23 | 🥇3.02 | 🥇2.93 | 1000 |

Coverage analysis

The results below consider only valid counterfactuals, i.e., counterfactuals that (1) have a prediction class different from the factual's and (2) respect binary and one-hot encoding rules. A sketch of such a validity check follows.
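For clarity, here is a minimal sketch of the two validity conditions, assuming a scikit-learn-style `model.predict` and a hypothetical `onehot_groups` description of the encoding; the benchmark's actual implementation lives in the CounterfactualBenchmark repository and may differ.

```python
import numpy as np

def is_valid_cf(model, factual, cf, onehot_groups):
    """Illustrative check of the two validity conditions above.

    `onehot_groups` is assumed to be a list of column-index lists,
    one per one-hot-encoded categorical feature (not the repo's API).
    """
    # (1) The counterfactual must flip the model's predicted class.
    flipped = (model.predict(cf.reshape(1, -1))[0]
               != model.predict(factual.reshape(1, -1))[0])

    # (2) Encoded columns must stay binary, and every one-hot group
    #     must activate exactly one category.
    encoded_cols = [i for group in onehot_groups for i in group]
    binary_ok = np.isin(cf[encoded_cols], (0, 1)).all()
    onehot_ok = all(np.sum(cf[group]) == 1 for group in onehot_groups)

    return bool(flipped and binary_ok and onehot_ok)
```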

Coverage (%) for all datasets

Coverage (%) for categorical datasets

Coverage (%) for numerical continuous datasets

Coverage (%) for mixed datasets

Time analysis

Time spent (in seconds) to generate a counterfactual explanation

Generation time (seconds) for all datasets

Generation time (seconds) for categorical datasets

Generation time (seconds) for numerical continuous datasets

Generation time (seconds) for mixed datasets

Reference

If you use this package in your experiments, please cite the reference paper:

```bibtex
@Article{app11167274,
  AUTHOR = {de Oliveira, Raphael Mazzine Barbosa and Martens, David},
  TITLE = {A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data},
  JOURNAL = {Applied Sciences},
  VOLUME = {11},
  YEAR = {2021},
  NUMBER = {16},
  ARTICLE-NUMBER = {7274},
  URL = {https://www.mdpi.com/2076-3417/11/16/7274},
  ISSN = {2076-3417},
  DOI = {10.3390/app11167274}
}
```
