DiCE with option "kd-tree" always generates the same counterfactuals for categorical data #303

Closed
lange-martin opened this issue Jun 1, 2022 · 0 comments · Fixed by #304

Comments

@lange-martin
Contributor

If one uses the option kd-tree to generate counterfactuals for a dataset with purely categorical columns, DiCE always returns the same set of counterfactuals, regardless of the query instance. The screenshot shows an excerpt of the DiCE_model_agnostic_CFs notebook. You can see that the indices of the counterfactuals are the same even though the two query instances are quite different. The only change I made to the notebook was removing the two numerical columns age and hours_per_week from the dataset.

[image: screenshot of the notebook excerpt]

I believe the issue originates from these lines:

    query_instance_df_dummies = pd.get_dummies(query_instance_orig)
    for col in pd.get_dummies(data_df_copy[self.data_interface.feature_names]).columns:
        if col not in query_instance_df_dummies.columns:
            query_instance_df_dummies[col] = 0

This generates a one-hot-encoded version of the query instance. However, the order of its columns does not match the order of the columns in the data used to build the KD-tree. Sklearn treats the data for the KD-tree as a plain array, not as a dataframe, so column names are ignored and only positions matter. Therefore, the changed column order goes unnoticed when the query instance is passed to the KD-tree here:

KD_tree_output = self.KD_tree.query(KD_query_instance, num_queries)

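Sklearn only sees the array form of the dataframe, and for purely categorical data that array is always the same: pd.get_dummies on a single query row creates only the columns for the categories that are present (each with value 1), and the loop then appends all missing columns with value 0. The resulting KD_query_instance is therefore always a run of ones followed by zeros, no matter which categories the query actually contains, so the KD-tree returns the same neighbors every time. I will probably open a pull request for this issue soon that should fix the problem.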
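To make the effect concrete, here is a minimal sketch that reproduces the collapse on a toy dataset and shows one possible way to align the columns. The dataset, the helper names `encode_as_in_issue` and `encode_aligned`, and the use of `DataFrame.reindex` are my own assumptions for illustration; they are not taken from the DiCE code base and are not necessarily what the linked pull request does.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KDTree

# Toy, purely categorical dataset (hypothetical, not the notebook's data).
data = pd.DataFrame({
    "workclass": ["Private", "Government", "Self-Employed", "Private"],
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad"],
})
data_dummies = pd.get_dummies(data).astype(float)
tree = KDTree(data_dummies.to_numpy())


def encode_as_in_issue(query: pd.DataFrame) -> pd.DataFrame:
    """Mimic the quoted snippet: missing dummy columns are appended at the end."""
    encoded = pd.get_dummies(query).astype(float)
    for col in data_dummies.columns:
        if col not in encoded.columns:
            encoded[col] = 0.0
    return encoded


def encode_aligned(query: pd.DataFrame) -> pd.DataFrame:
    """Possible fix (assumption): reorder to the training column order."""
    encoded = pd.get_dummies(query).astype(float)
    return encoded.reindex(columns=data_dummies.columns, fill_value=0.0)


query_a = pd.DataFrame({"workclass": ["Private"], "education": ["HS-grad"]})
query_b = pd.DataFrame({"workclass": ["Self-Employed"], "education": ["Masters"]})

buggy_a = encode_as_in_issue(query_a).to_numpy()
buggy_b = encode_as_in_issue(query_b).to_numpy()
fixed_a = encode_aligned(query_a).to_numpy()
fixed_b = encode_aligned(query_b).to_numpy()

# With the original encoding, two different queries become the same array.
print("buggy encodings identical:", np.array_equal(buggy_a, buggy_b))
# With aligned columns, the arrays differ as expected.
print("fixed encodings identical:", np.array_equal(fixed_a, fixed_b))

# Nearest neighbor in the KD-tree for each aligned query.
_, idx_a = tree.query(fixed_a, k=1)
_, idx_b = tree.query(fixed_b, k=1)
print("nearest rows:", idx_a[0][0], idx_b[0][0])
```

Reindexing against the training columns keeps the positional layout identical to what the KD-tree was built on, which is the only thing the array-based query can rely on.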

lange-martin added a commit to lange-martin/DiCE that referenced this issue Jun 1, 2022
lange-martin added a commit to lange-martin/DiCE that referenced this issue Jun 3, 2022
Signed-off-by: Martin Lange <ml_ks@web.de>
amit-sharma pushed a commit that referenced this issue Jun 27, 2022
Signed-off-by: Martin Lange <ml_ks@web.de>