refactor plot_dependence and implement fast version of pdp #85
Conversation
I just added some comments.
Thanks! I will try to fix the mypy errors this week. If you think you can do it, go ahead. And if you have more tips to share, they are more than welcome.
Ok! Here is some initial feedback from a user perspective. I tried the changes in https://www.pymc.io/projects/examples/en/latest/case_studies/BART_introduction.html The ICE plots also work but take much longer (3 min on my Intel Mac), though maybe that is expected. I tried adding more trees to see if it would capture more complexity; it did not, although the results look a bit better as the ranges are more reasonable. So I think the plots are working as expected. The linear response issue might be related to the model predictions in the linear response case, right?
Right, ICE plots are expected to be much slower: they require a lot of predictions. I think the only way to accelerate them is to improve the inner workings of the tree methods, and that probably requires rewriting them in Cython, Rust, or something similar. There is something funny going on with the linear response predictions; I am still not sure what exactly. If we take the mean of the values stored at the leaf nodes, the plots look reasonable. So yes, there is something with the predictions that we need to fix.
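To make the cost difference concrete, here is a minimal sketch of why ICE is so much slower than PDP: ICE needs one full prediction pass per grid value over every observation, and PDP is just the average of those curves. The `predict` callable and function names below are illustrative, not the actual pymc-bart internals.

```python
import numpy as np

def ice_curves(predict, X, var_idx, grid):
    """Individual conditional expectation curves.

    For each grid value we replace column `var_idx` of X and call `predict`,
    so the total cost is n_obs * n_grid model predictions.
    `predict` is a hypothetical callable mapping an (n, d) array to (n,) values.
    """
    curves = np.empty((X.shape[0], len(grid)))
    for j, g in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, var_idx] = g          # intervene on one predictor
        curves[:, j] = predict(X_mod)  # one full prediction pass per grid value
    return curves

def pdp_curve(predict, X, var_idx, grid):
    """The partial dependence curve is the mean of the ICE curves."""
    return ice_curves(predict, X, var_idx, grid).mean(axis=0)
```

With a linear predictor, the PDP along variable `j` recovers the slope of that variable plus a constant offset from the averaged remaining covariates, which is a handy sanity check.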
    p_d += weight * node.value
else:
    # this produces nonsensical results
    p_d += weight * ((params[0] + params[1] * x[node.idx_split_variable]) / m)
One quick (naive) question... when fitting the linear model, we use as covariates `X[idx_data_point, selected_predictor]` (see https://github.com/pymc-devs/pymc-bart/blob/main/pymc_bart/pgbart.py#L433) ... how does this map to `x[node.idx_split_variable]`? Aren't we missing the selected predictor specification?
What we call `selected_predictor` here, in `X[idx_data_point, selected_predictor]`, is equivalent to what we call `idx_split_variable` here, in `x[node.idx_split_variable]`. In the first case `X` is a matrix and has many `idx_data_point`s; in the second case `x` is a vector, i.e. a single data point.
By doing `node.value.mean()` we are using the mean of the stored/fitted values; when doing `p_d += weight * ((params[0] + params[1] * x[node.idx_split_variable]) / m)` we are generating new predictions. This seems to indicate the fitting is OK, but the predictions are not.
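The two prediction modes being contrasted can be sketched as follows. This is a simplified stand-in for the leaf-node logic, not the actual pymc-bart `Node` class: the attribute and parameter names (`value`, `params`, `idx_split_variable`, `m`) mirror the snippet above but the class itself is hypothetical.

```python
import numpy as np

class Leaf:
    """Toy leaf node holding the fitted values from sampling and the
    (intercept, slope) parameters of the per-leaf linear fit."""
    def __init__(self, values, params, idx_split_variable):
        self.value = np.asarray(values)
        self.params = params  # (intercept, slope)
        self.idx_split_variable = idx_split_variable

def predict_mean(leaf):
    """Mode 1: use the mean of the values stored at the leaf
    (this is the version that produces reasonable plots)."""
    return leaf.value.mean()

def predict_linear(leaf, x, m):
    """Mode 2: evaluate the fitted linear response at a new point x,
    scaled by the number of trees m as in the sum-of-trees model
    (this is the version that currently misbehaves)."""
    intercept, slope = leaf.params
    return (intercept + slope * x[leaf.idx_split_variable]) / m
```

Comparing the two functions on the same leaf is a quick way to see how far the linear extrapolation drifts from the stored values.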
Got it! Thanks!
@aloctavodia I was just looking into the ICE plots above... they seem shifted on the y-axis, as they have the same shape but can be negative ... I do not think this is expected, right? Their mean should be the pdp, right?
The scale will be off unless you specify the argument
This implements a faster version of the partial dependence plots (~15x faster). I also took the opportunity to clean up the code in utils. `plot_dependence` is deprecated in favor of two separate functions, `plot_pdp` and `plot_ice`.