Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor plot_dependence and implement fast version of pdp #85

Merged
merged 1 commit into from
Jun 21, 2023

Conversation

aloctavodia
Copy link
Member

This implements a faster version of the partial dependence plots (~15x faster). I also took the opportunity to clean the code in utils, plot_dependence is deprecated in favor of two separated functions plot_pdp and plot_ice.

pymc_bart/tree.py Outdated Show resolved Hide resolved
pymc_bart/tree.py Outdated Show resolved Hide resolved
@juanitorduz
Copy link
Contributor

I just added some comments regarding the mypy errors. Let me know if you want support on fixing the other ones :)

@aloctavodia
Copy link
Member Author

Thanks!!! I will try to fix the mypy errors this week. If you think you can do it go ahead. Or if you have more tips to share they are going to be more than welcome.

@juanitorduz
Copy link
Contributor

juanitorduz commented Jun 19, 2023

Ok! Here is an initial feedback from a user perspective. I tried the changes in https://www.pymc.io/projects/examples/en/latest/case_studies/BART_introduction.html

For the constant response, everything looks nice:

image

The ICE plots also work but take much longer (3 min in my Mac intel), but maybe as expected ...

image

For the linear response, things look weird:

image

I tried adding more threes to see if it would capture more complexity, but it did not work, although the results look a bit better as the ranges are more reasonable.

image

So I think the plots are working as expected. The linear response issue might be related to the model predictions in the linear response case, right?

@aloctavodia
Copy link
Member Author

aloctavodia commented Jun 19, 2023

Right, ICE are expected to be much slower. They requiere a lot of predictions. I think the only way to accelerate it is to improve the inner working of the tree methods. And probably that requiere rewriting them in cython, RUST or something similar.

There is something funny going on with the linear response predictions. Still not sure what exactly. If we take the mean of the values stored at the leaf nodes, the plots looks reasonable. So yes, there is something with the predictions that we need to fix

p_d += weight * node.value
else:
# this produce nonsensical results
p_d += weight * ((params[0] + params[1] * x[node.idx_split_variable]) / m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One quick (naive) question...when fitting the linear model, we use as covariates X[idx_data_point, selected_predictor] (see https://github.com/pymc-devs/pymc-bart/blob/main/pymc_bart/pgbart.py#L433) ... how does this map to x[node.idx_split_variable] aren't we missing the selected predictor specification?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I ask is because when using p_d += weight * node.value.mean() I do get a nice pdp
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what we call selected_predictor here X[idx_data_point, selected_predictor] is equivalent to what we call idx_split_variable] here x[node.idx_split_variable], in the first case X is a matrix and has many idx_data_points in the second case x is a vector or a single data_point.

By doing node.value.mean() we are using the mean of the stored/fitted values when doing p_d += weight * ((params[0] + params[1] * x[node.idx_split_variable]) / m) we are generating new predictions. This seems to indicate the fitting is ok, but the predictions are not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks! ☺️

@juanitorduz
Copy link
Contributor

@aloctavodia I was just looking into the ICE plots above... thy seem shifted on the y-axis as they have the same shape but can be negative ... I do not think this is expected right? Their mean should be the pdp right?

@aloctavodia
Copy link
Member Author

The scale will be off unless you specify the argument func=np.exp (in plot_pdp/ice)

@aloctavodia aloctavodia merged commit 77658b2 into main Jun 21, 2023
4 checks passed
@aloctavodia aloctavodia deleted the fast_pdp branch June 21, 2023 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants