Customised model #32
Comments
Hey Yan, thanks for this question! Yes, DiCE one-hot-encodes categorical features internally, and we assume the model is also trained on one-hot-encoded data. Although the optimization then has to run over more columns, we didn't want to assume any particular order of values for categorical features. However, if your model is trained on label-encoded data, you can simply present all features as continuous features to DiCE and it should work; for the adult data, for instance, that means listing workclass among the continuous features. When outputting the CFs, DiCE will then output an integer (0 for Government, 2 for Private, etc.) instead of a string ("Government" or "Private") for categorical features.
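Since the counterfactuals then come back as integer codes, it can help to map them back to category names afterwards. The following is a minimal sketch, assuming the 0/1/2 ordering for workclass mentioned in this thread; the helper names `encode_workclass` and `decode_workclass` are hypothetical and not part of DiCE:

```python
# Assumed label encoding for workclass (ordering taken from this thread).
WORKCLASS_CODES = {"Government": 0, "Other/Unknown": 1, "Private": 2}
WORKCLASS_NAMES = {code: name for name, code in WORKCLASS_CODES.items()}


def encode_workclass(values):
    """Map category strings to the integer codes the model was trained on."""
    return [WORKCLASS_CODES[v] for v in values]


def decode_workclass(codes):
    """Map integer codes (e.g. from a counterfactual) back to category strings."""
    # int() also accepts whole-number floats such as 2.0.
    return [WORKCLASS_NAMES[int(c)] for c in codes]


encoded = encode_workclass(["Private", "Government"])
decoded = decode_workclass(encoded)
print(encoded, decoded)
```

Keeping the mapping in one dictionary means the same encoding is used for training, for the query instance, and for reading the counterfactuals.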
Thanks, that seems like a very good idea! So if I treat workclass as a continuous feature, should I enter the feature in the following format, or just give it an interval, i.e. 'workclass': [0, 3]?
d = dice_ml.Data(features={
One more concern: if we enter workclass as a continuous feature, is it possible that the generated counterfactual examples contain 1.5 for workclass rather than just 0, 1, or 2?
I also tried to use a 3D dataset, and it seems the library is not compatible with a 3-dimensional input dataset and a 3D model, e.g. a convolutional neural network.
I look forward to your advice! Thank you very much.
Yan
You need not provide the range, since it is inferred from the data. Just provide all feature names in a list when constructing dice_ml.Data. The precision of a variable is also inferred from the data, so if a variable takes only the values 0, 1, and 2, then the resulting CF will take only one of these three values and nothing else. Try it out and let me know if you get something else.
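To illustrate what inferring the range and precision from the data can mean, here is a minimal sketch. It is not DiCE's actual implementation; the function names `infer_range_and_precision` and `snap_to_data` are hypothetical. It reads the minimum, maximum, and number of decimal places from the observed values, then clips and rounds a candidate value accordingly:

```python
from decimal import Decimal


def infer_range_and_precision(values):
    """Return (min, max, decimals) observed in a column of finite numbers."""
    decimals = max(
        # Count digits after the decimal point; whole numbers give 0.
        max(0, -Decimal(str(v)).normalize().as_tuple().exponent)
        for v in values
    )
    return min(values), max(values), decimals


def snap_to_data(candidate, values):
    """Clip a candidate to the observed range and round to the observed precision."""
    lo, hi, decimals = infer_range_and_precision(values)
    clipped = min(max(candidate, lo), hi)
    return round(clipped, decimals)


workclass_codes = [0, 1, 2, 2, 1, 0]
print(infer_range_and_precision(workclass_codes))
print(snap_to_data(1.4, workclass_codes))
```

Under these assumptions, a column that only ever holds 0, 1, and 2 has a range of [0, 2] and zero decimal places, so a fractional candidate is mapped to a whole number inside that range.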
Yes, currently DiCE supports only 2D tabular data, with rows as observations and columns as features. We have not yet tested it on image and related datasets, but that would be an interesting experiment (both conceptually and empirically), and we look forward to trying it at some point in the future!
Hi, I notice that the model you have trained uses one-hot-encoded training data for categorical features; e.g. for workclass, you have three features to represent it: workclass_Government, workclass_Other/Unknown, and workclass_Private.
However, we often train models using one single feature to represent it, with 0 (Government), 1 (Other/Unknown), 2 (Private). Is there a simple way to use my model directly when generating a counterfactual example for a query instance, or do I have to convert all of the categorical features into one-hot-encoded format when training the model in order to use "exp = dice_ml.Dice"?
Thanks!
Yan