
Customised model #32

Closed
Yanjiayork opened this issue Aug 13, 2020 · 4 comments

Comments

@Yanjiayork

Hi, I notice that the model you have trained uses one-hot-encoded training data for the categorical features, e.g. for workclass, you have three features to represent it -- workclass_Government, workclass_Other/Unknown, workclass_Private.

However, we often train models using just one single feature to represent it, with 0 (Government), 1 (Other/Unknown), 2 (Private). Is there a simple way that I can use my model directly when I am trying to generate the counterfactual examples for a certain query instance, or do I have to convert all of the categorical features into one-hot-encoded format when I am training the model in order to use "exp = dice_ml.Dice"?

Thanks!

Yan

@raam93
Collaborator

raam93 commented Aug 15, 2020

Hey Yan, thanks for this question! Yes, DiCE one-hot-encodes categorical features internally and we assume the model is also trained on one-hot-encoded data. Although this means the optimization has to run over more columns, we didn't want to assume any particular ordering of the values of a categorical feature.

However, if your model is trained with label-encoded data, you can simply present all features as continuous features to DiCE and it should work. For instance, for the adult data, you could do something like
d = dice_ml.Data(dataframe=dataset, continuous_features=list(dataset.columns), outcome_name='income')

Note that when outputting the CFs, DiCE will then show an integer (0 for Government, 2 for Private, etc.) instead of a string ("Government" or "Private") for each categorical feature.
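
For reference, a minimal sketch of that setup (not from this thread): the file name, the sklearn classifier, and the backend argument are assumptions for illustration, and the exact call signatures may differ across dice_ml versions.

# Sketch only: assumes a label-encoded adult dataset and an sklearn-style model.
import dice_ml
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical CSV in which categorical columns are already mapped to integers.
dataset = pd.read_csv('adult_label_encoded.csv')

X = dataset.drop(columns=['income'])
y = dataset['income']
model = RandomForestClassifier().fit(X, y)  # any classifier trained on label-encoded features

# Declare every feature as continuous so DiCE skips its internal one-hot encoding
# (the outcome column is excluded here for clarity).
d = dice_ml.Data(dataframe=dataset,
                 continuous_features=[c for c in dataset.columns if c != 'income'],
                 outcome_name='income')
m = dice_ml.Model(model=model, backend='sklearn')
exp = dice_ml.Dice(d, m)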

@Yanjiayork
Author

Yanjiayork commented Aug 16, 2020

Thanks, that seems like a very good idea! So if I treat workclass as a continuous feature, does that mean I should enter the feature in the following format, or just give it an interval, i.e. 'workclass': [0, 3]?

d = dice_ml.Data(features={
        'age': [17, 90],
        'workclass': [0, 1, 2, 3],
        'education': [0, 1, 2, 3, 4, 5, 6, 7],
        'marital_status': [0, 1, 2, 3, 4],
        'occupation': [0, 1, 2, 3, 4, 5],
        'race': [0, 1],
        'gender': [0, 1],
        'hours_per_week': [1, 99]},
    outcome_name='income')

One more thing I am concerned about: if we enter workclass as a continuous feature, is it possible that the generated counterfactual examples output 1.5 for workclass rather than just 0, 1, 2?

Also, I tried to use a 3D dataset, and it seems the library is not compatible with a 3-dimensional input dataset and a 3D model, e.g. a convolutional neural network.

Look forward to your advice! Thank you very much.

Yan

@raam93
Collaborator

raam93 commented Aug 17, 2020

You need not provide the range since it is inferred from the data. Just provide all feature names in a list to the continuous_features parameter like this: d = dice_ml.Data(dataframe=dataset, continuous_features=list(dataset.columns), outcome_name='income'). Make sure that the categorical features are of type int or float in your case, i.e., 'workclass' takes the integer values 0, 1, 2, and 3.

The precision of a variable is also inferred from the data, so if a variable takes only 0, 1, and 2, then the resulting CF will take only one of these 3 values and nothing else. Try it out and let me know if you get something else.
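
As a hedged follow-up sketch (continuing the assumed setup above; the query instance and parameter values are illustrative, and method signatures may vary by dice_ml version), generating CFs for one query instance should therefore keep 'workclass' integer-valued:

# Sketch only: pick one row as the query instance and request a few CFs.
query_instance = X.iloc[[0]]  # a single row, kept as a DataFrame
dice_exp = exp.generate_counterfactuals(query_instance,
                                        total_CFs=4,
                                        desired_class='opposite')
dice_exp.visualize_as_dataframe()
# Since 'workclass' only takes the integer values 0-3 in the training data,
# the generated CFs should show values like 0, 1, 2, or 3, not 1.5.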

@raam93
Collaborator

raam93 commented Aug 17, 2020

Yes, currently DiCE supports only 2D tabular data, with rows as observations and columns as features. We have not tested on image and related datasets yet, but it will be an interesting experiment (both conceptually and empirically), and we are looking forward to trying it at some point in the future!

raam93 closed this as completed Aug 17, 2020