-
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linear nodes attempt (v2) #79
Linear nodes attempt (v2) #79
Conversation
I think we are (hopefully) ready to add the chances into the if response == "mix":
response = "linear" if np.random.random() >= 0.5 else "constant"
left_node_value, left_node_linear_params = draw_leaf_value(
sum_trees_output[left_node_idx_data_points],
X[left_node_idx_data_points, selected_predictor],
mean,
linear_fit,
m,
normal,
mu_std,
response,
)
right_node_value, right_node_linear_params = draw_leaf_value(
sum_trees_output[right_node_idx_data_points],
X[right_node_idx_data_points, selected_predictor],
mean,
linear_fit,
m,
normal,
mu_std,
response,
) where should we add this? Or is there a recommended way of improving this? |
improve code logic add m param add m param to tests add signature in pbbart to add linear nodes
I squashed the past commits during the rebase 🤦 (sorry!) |
Something like this? for idx in range(2):
idx_data_point = new_idx_data_points[idx]
node_value = draw_leaf_value(
X[idx_data_point, selected_predictor],
sum_trees[:, idx_data_point],
m,
normal.rvs() * kfactor,
shape,
) |
Co-authored-by: Osvaldo A Martin <aloctavodia@gmail.com>
Thank you @aloctavodia for the quick feedback! I think pymc.sampling.parallel.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 122, in run
self._start_loop()
File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 174, in _start_loop
point, stats = self._step_method.step(self._point)
File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/step_methods/compound.py", line 231, in step
point, sts = method.step(point)
File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 100, in step
apoint, stats = self.astep(q)
File "/Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py", line 208, in astep
if p.sample_tree(
File "/Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py", line 68, in sample_tree
idx_new_nodes = grow_tree(
File "/Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py", line 426, in grow_tree
node_value, linear_parms = draw_leaf_value(
File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
error_rewrite(e, 'typing')
File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot unify float64 and array(float64, 1d, C) for 'mu_mean.4', defined at /Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py (484)
File "pymc_bart/pgbart.py", line 484:
def draw_leaf_value(y_mu_pred, x_mu, m, norm, shape, response):
<source elided>
return norm + mu_mean, linear_params Which I think it boils down to a numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot unify float64 and array(float64, 1d, C) for 'mu_mean.4', defined at /Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py (484) probably something with the |
@aloctavodia In 796c9a6 I fixed the numba shapes 🙌 ! The CI passes (current tests and mypy). |
Now the base implementation with linear response seems to work. The issue now is the vectorization for shape parameters 🥲 |
Co-authored-by: Osvaldo A Martin <aloctavodia@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copy method is missing one value linear_params
def copy(self) -> "Tree":
tree: Dict[int, Node] = {
k: Node(v.value, v.idx_data_points, v.idx_split_variable, v.linear_params)
for k, v in self.tree_structure.items()
}
idx_leaf_nodes = self.idx_leaf_nodes.copy() if self.idx_leaf_nodes is not None else None
return Tree(tree_structure=tree, idx_leaf_nodes=idx_leaf_nodes, output=self.output)
ok! I think the code looks much nice now and I believe we are getting closer bit-by-bit 😅 ! The Click me to see a reproducible example and the error traceimport pymc_bart as pmb
import pymc as pm
import numpy as np
X = np.random.normal(0, 1, size=(250, 3))
Y = np.random.normal(0, 1, size=250)
X[:, 0] = np.random.normal(Y, 0.1)
with pm.Model() as model:
mu = pmb.BART("mu", X, Y, m=10, response="linear")
sigma = pm.HalfNormal("sigma", 1)
y = pm.Normal("y", mu, sigma, observed=Y)
idata = pm.sample(random_seed=3415, chains=2) yields ValueError Traceback (most recent call last)
File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/link/vm.py:414](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/link/vm.py:414), in Loop.__call__(self)
411 for thunk, node, old_storage in zip_longest(
412 self.thunks, self.nodes, self.post_thunk_clear, fillvalue=()
413 ):
--> 414 thunk()
415 for old_s in old_storage:
File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/graph/op.py:543](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/graph/op.py:543), in Op.make_py_thunk..rval(p, i, o, n, params)
539 @is_thunk_type
540 def rval(
541 p=p, i=node_input_storage, o=node_output_storage, n=node, params=None
542 ):
--> 543 r = p(n, [x[0] for x in i], o)
544 for o in node.outputs:
File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/tensor/random/op.py:378](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/tensor/random/op.py:378), in RandomVariable.perform(self, node, inputs, outputs)
376 rng_var_out[0] = rng
--> 378 smpl_val = self.rng_fn(rng, *(args + [size]))
380 if (
381 not isinstance(smpl_val, np.ndarray)
382 or str(smpl_val.dtype) != out_var.type.dtype
383 ):
File [~/Documents/pymc-bart/pymc_bart/bart.py:55](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/Documents/pymc-bart/pymc_bart/bart.py:55), in BARTRV.rng_fn(cls, rng, X, Y, m, alpha, split_prior, size)
54 else:
---> 55 return _sample_posterior(cls.all_trees, cls.X, cls.m, rng=rng).squeeze().T
File [~/Documents/pymc-bart/pymc_bart/utils.py:69](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/Documents/pymc-bart/pymc_bart/utils.py:69), in _sample_posterior(all_trees, X, m, rng, size, excluded)
68 for tree in stacked_trees[idx[ind]]:
---> 69 p += np.vstack([tree.predict(x=x, m=m, excluded=excluded) for x in X])
70 pred.reshape((*size_iter, shape, -1))
File <__array_function__ internals>:200, in vstack(*args, **kwargs)
File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/numpy/core/shape_base.py:296](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/numpy/core/shape_base.py:296), in vstack(tup, dtype, casting)
295 arrs = [arrs]
--> 296 return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
File <__array_function__ internals>:200, in concatenate(*args, **kwargs)
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 104 and the array at index 1 has size 143
During handling of the above exception, another exception occurred:
...
Inputs values: [Generator(PCG64) at 0x7FA8D903A7A0, array([], dtype=int64), array(11), 'not shown', 'not shown', array(10, dtype=int8), array(0.25, dtype=float32), array([], dtype=float64)]
Outputs clients: [['output'], [Elemwise{second,no_inplace}(mu, TensorConstant{(1,) of -0..5211646611})]] which is a shape error 😢 . Any tips are welcome! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We forgot to add "v.linear_params" to the trim function. Something like this
k: Node(v.value, None, v.idx_split_variable, v.linear_params) for k, v in self.tree_structure.items()
The trim function is used to delete some attributes we no longer need once we have sampled a tree, we do this just to save memory.
Thanks! Fixed in 6d46784 |
@aloctavodia thanks for the suggestion! In c2de9fb I added your code. The tests for Therefore I think we still have two problems for the linear response:
|
The vectorization has been resolved 🥳 ! The only edge case missing is |
I think is finally ready for review 😆 ! |
thanks a lot for this @juanitorduz!!! |
Attempts to address #51
From the very initial attempt (#74)
For this iteration I suggest I add some code, add questions and incorporate feedback bit-by-bit.