
Difference between PD profile and CP profile when we manually specify the instances #444

Closed
Nehagupta90 opened this issue Jul 28, 2021 · 8 comments


@Nehagupta90

Hi

If I have 100 observations/instances and want to create the PD profile of a few important features, what is the difference between the PD profile of these instances computed with

variab <- c("var1", "var2", "var3")
pdp <- model_profile(explainer = explainer, variables = variab)
plot(pdp)

and the aggregated CP profiles when we manually specify the instances, like:

new_observation <- data[c(2, 7, 8, 11, 15, 16, 21, 24, 25, 26, 30, 45, 46), ]
cp <- ceteris_paribus(explainer, new_observation)
cp_agg <- aggregate_profiles(cp, variables = variab)
plot(cp_agg)

@hbaniecki
Member

Hi, the difference is that in the second case you create the PD profile from only a subset of the data rows (I count 13 observations), while in the first case you use N = 100 observations to estimate the PD profile.
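To make the difference concrete, here is a minimal base-R sketch that does not call DALEX. It assumes a synthetic data set and an lm model (both hypothetical, standing in for the real 100 observations) and builds the PD profile by hand as the average of the CP profiles. Averaging over all rows versus over the 13 listed rows generally gives different curves: shifted when the subset's other feature values differ on average, and differently shaped when the model has interactions with the profiled variable. DALEX's own implementation (for example, how model_profile samples its N rows) may differ in details.

```r
set.seed(42)
n <- 100
# Hypothetical data and model standing in for the real 100 observations
data <- data.frame(var1 = runif(n), var2 = runif(n), var3 = runif(n))
data$y <- 2 * data$var1^2 + data$var1 * data$var2 + data$var3 + rnorm(n, sd = 0.1)
model <- lm(y ~ var1 + I(var1^2) + var1:var2 + var3, data = data)

# CP profiles: one row per observation, one column per grid value of `variable`;
# every other column of each observation is held fixed
cp_profiles <- function(model, rows, variable, grid) {
  vapply(grid, function(g) {
    rows[[variable]] <- g
    predict(model, newdata = rows)
  }, numeric(nrow(rows)))
}

grid <- seq(0, 1, length.out = 11)

# PD profile = average of CP profiles over ALL 100 rows
pd_all <- colMeans(cp_profiles(model, data, "var1", grid))

# Same averaging, but over the 13 manually chosen rows only
subset_idx <- c(2, 7, 8, 11, 15, 16, 21, 24, 25, 26, 30, 45, 46)
pd_subset <- colMeans(cp_profiles(model, data[subset_idx, ], "var1", grid))

round(cbind(var1 = grid, pd_all, pd_subset), 3)
```

Swapping in your own model and data should show how far the 13-row estimate drifts from the 100-row one.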

@Nehagupta90
Author

Nehagupta90 commented Jul 29, 2021 via email

@hbaniecki
Member

It depends on the explanation that you want to achieve. model_profile has the groups parameter (group profiles/instances by a variable) and the k parameter (cluster profiles/instances), which allow for customization. We discuss the context of creating explanations in the paper https://arxiv.org/abs/2105.13787, and specifically the context of profiles in https://arxiv.org/abs/2105.12837. Hope it helps.
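Conceptually, the groups parameter produces one averaged profile per level of a categorical variable. The base-R sketch below does not call DALEX; it uses hypothetical data and a hypothetical model with an assumed factor `grp`, and averages the CP profiles of `var1` separately within each level. In DALEX itself the corresponding call would be along the lines of `model_profile(explainer, variables = "var1", groups = "grp")`; the k (clustering) option is not reproduced here.

```r
set.seed(1)
n <- 100
# Hypothetical data: `grp` is an assumed categorical variable that changes
# the effect of var1 (slope +3 in group "a", -1 in group "b")
data <- data.frame(var1 = runif(n), var2 = runif(n),
                   grp = factor(sample(c("a", "b"), n, replace = TRUE)))
data$y <- ifelse(data$grp == "a", 3, -1) * data$var1 + data$var2 + rnorm(n, sd = 0.1)
model <- lm(y ~ var1 * grp + var2, data = data)

grid <- seq(0, 1, length.out = 5)

# CP matrix: one row per observation, one column per grid value of var1
cp <- vapply(grid, function(g) {
  d <- data
  d$var1 <- g
  predict(model, newdata = d)
}, numeric(nrow(data)))

# Grouped PD: average the CP rows separately within each level of grp
pd_by_group <- sapply(levels(data$grp), function(l)
  colMeans(cp[data$grp == l, , drop = FALSE]))

round(cbind(var1 = grid, pd_by_group), 2)
```

A single pooled PD curve would average these opposite slopes away, which is why grouping can reveal structure that one profile hides.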

Kind regards

@Nehagupta90
Author

Nehagupta90 commented Jul 30, 2021 via email

@hbaniecki
Member

It is all in the documentation https://modeloriented.github.io/DALEX/reference/model_profile.html. Yes, you might want to group observations by a categorical variable.

@Nehagupta90
Author

Nehagupta90 commented Jul 30, 2021 via email

@hbaniecki
Member

Well, the colour is a group determined by a category. You could probably dichotomize a continuous variable. Any specific use case can be customized/programmed, and so can the groups in aggregate_profiles.
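A sketch of the dichotomization idea, again in base R with hypothetical data and no DALEX calls: `var2` is split at its median with `cut()` into "low"/"high" (the median threshold is an arbitrary assumption), and the CP profiles of `var1` are averaged within each level. The grouping factor is kept outside the data passed to `predict()`, so the model never sees it. With DALEX or ingredients one would presumably add such a factor as a column of the data and pass its name to groups; that workflow is an assumption and is not tested here.

```r
set.seed(7)
n <- 100
# Hypothetical data: the effect of var1 flips sign depending on var2
data <- data.frame(var1 = runif(n), var2 = runif(n))
data$y <- data$var1 * (2 * data$var2 - 1) + rnorm(n, sd = 0.05)
model <- lm(y ~ var1 * var2, data = data)

# Dichotomize var2 at its median into a two-level factor (threshold is an assumption)
var2_group <- cut(data$var2,
                  breaks = c(-Inf, median(data$var2), Inf),
                  labels = c("low", "high"))

grid <- seq(0, 1, length.out = 5)

# CP matrix: one row per observation, one column per grid value of var1
cp <- vapply(grid, function(g) {
  d <- data
  d$var1 <- g
  predict(model, newdata = d)
}, numeric(nrow(data)))

# One PD curve per dichotomized level of var2
pd_by_group <- sapply(levels(var2_group), function(l)
  colMeans(cp[var2_group == l, , drop = FALSE]))

round(cbind(var1 = grid, pd_by_group), 3)
```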

@Nehagupta90
Author

Nehagupta90 commented Jul 30, 2021 via email
