
Customised model #32

Closed
Yanjiayork opened this issue Aug 13, 2020 · 4 comments

Comments

@Yanjiayork

Hi, I notice that the model you have trained uses one-hot-encoded training data for the categorical features, e.g. for workclass, you have three features to represent it -- workclass_Government, workclass_Other/Unknown, workclass_Private.

However, we often train models using just one single feature to represent it, with 0 (Government), 1 (Other/Unknown), 2 (Private). Is there a simple way that I can use my model directly when I am trying to generate the counterfactual examples for a certain query instance, or do I have to convert all of the categorical features into one-hot-encoded format when I am training the model in order to use "exp = dice_ml.Dice"?

Thanks!

Yan

@raam93
Collaborator

raam93 commented Aug 15, 2020

Hey Yan, thanks for this question! Yes, DiCE one-hot-encodes categorical features internally and we assume the model is also trained on one-hot-encoded data. Although this means the optimization has to run over more columns, we didn't want to assume any particular ordering of the values of a categorical feature.

However, if your model is trained with label-encoded data, you can simply present all features as continuous features to DiCE and it should work. For instance, for the adult data, you could do something like
d = dice_ml.Data(dataframe=dataset, continuous_features=list(dataset.columns), outcome_name='income')

Note that when outputting the CFs, DiCE will then show an integer (0 for Government, 2 for Private, etc.) instead of a string ("Government" or "Private") for each categorical feature.
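
For reference, a minimal sketch of that setup (not from this thread): the file name, the sklearn classifier, and the backend argument are assumptions for illustration, and the exact call signatures may differ across dice_ml versions.

# Sketch only: assumes a label-encoded adult dataset and an sklearn-style model.
import dice_ml
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical CSV in which categorical columns are already mapped to integers.
dataset = pd.read_csv('adult_label_encoded.csv')

X = dataset.drop(columns=['income'])
y = dataset['income']
model = RandomForestClassifier().fit(X, y)  # any classifier trained on label-encoded features

# Declare every feature as continuous so DiCE skips its internal one-hot encoding
# (the outcome column is excluded here for clarity).
d = dice_ml.Data(dataframe=dataset,
                 continuous_features=[c for c in dataset.columns if c != 'income'],
                 outcome_name='income')
m = dice_ml.Model(model=model, backend='sklearn')
exp = dice_ml.Dice(d, m)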

@Yanjiayork
Author

Yanjiayork commented Aug 16, 2020

Thanks, that seems like a very good idea! So if I treat workclass as a continuous feature, does that mean I should enter the feature in the following format, or just give it an interval, i.e. 'workclass': [0, 3]?

d = dice_ml.Data(features={
        'age': [17, 90],
        'workclass': [0, 1, 2, 3],
        'education': [0, 1, 2, 3, 4, 5, 6, 7],
        'marital_status': [0, 1, 2, 3, 4],
        'occupation': [0, 1, 2, 3, 4, 5],
        'race': [0, 1],
        'gender': [0, 1],
        'hours_per_week': [1, 99]},
    outcome_name='income')

One more thing I am concerned about: if we enter workclass as a continuous feature, is it possible that the generated counterfactual examples output 1.5 for workclass rather than just 0, 1, 2?

Also, I tried to use a 3D dataset, and it seems the library is not compatible with a 3-dimensional input dataset and a 3D model, e.g. a convolutional neural network.

Look forward to your advice! Thank you very much.

Yan

@raam93
Collaborator

raam93 commented Aug 17, 2020

You need not provide the range since it is inferred from the data. Just provide all feature names in a list to the continuous_features parameter like this: d = dice_ml.Data(dataframe=dataset, continuous_features=list(dataset.columns), outcome_name='income'). Make sure that the categorical features are of type int or float in your case, i.e., 'workclass' takes the integer values 0, 1, 2, and 3.

The precision of a variable is also inferred from the data, so if a variable takes only 0, 1, and 2, then the resulting CF will take only one of these 3 values and nothing else. Try it out and let me know if you get something else.
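
As a hedged follow-up sketch (continuing the assumed setup above; the query instance and parameter values are illustrative, and method signatures may vary by dice_ml version), generating CFs for one query instance should therefore keep 'workclass' integer-valued:

# Sketch only: pick one row as the query instance and request a few CFs.
query_instance = X.iloc[[0]]  # a single row, kept as a DataFrame
dice_exp = exp.generate_counterfactuals(query_instance,
                                        total_CFs=4,
                                        desired_class='opposite')
dice_exp.visualize_as_dataframe()
# Since 'workclass' only takes the integer values 0-3 in the training data,
# the generated CFs should show values like 0, 1, 2, or 3, not 1.5.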

@raam93
Collaborator

raam93 commented Aug 17, 2020

Yes, currently DiCE supports only 2D tabular data, with rows as observations and columns as features. We have not tested on image and related datasets yet, but it will be an interesting experiment (both conceptually and empirically), and we are looking forward to trying it at some point in the future!

raam93 closed this as completed Aug 17, 2020