Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear nodes attempt (v2) #79

Merged
merged 29 commits into from
May 30, 2023
Merged

Linear nodes attempt (v2) #79

merged 29 commits into from
May 30, 2023

Conversation

juanitorduz
Copy link
Contributor

Attempts to address #51

From the very initial attempt (#74)

I really want to have the linear nodes hehe. So this is a draft PR to try to port the changes added in pymc-devs/pymc#5044 . The code-base has changed a bit so I need to catch up in understanding the code more. If someone could give me some guidance it would be much appreciated. I feel this is a relatively "low-hanging-fruit" but as always, the devil is in the details.

For this iteration I suggest I add some code, add questions and incorporate feedback bit-by-bit.

pymc_bart/tree.py Outdated Show resolved Hide resolved
@juanitorduz juanitorduz marked this pull request as draft April 3, 2023 13:36
@juanitorduz
Copy link
Contributor Author

I think we are (hopefully) ready to add the chances into the pgbart.by module. In 7d3edc7 I started with the basic signatures. From the initial PR it looks we need to add something like:

    if response == "mix":
        response = "linear" if np.random.random() >= 0.5 else "constant"

    left_node_value, left_node_linear_params = draw_leaf_value(
        sum_trees_output[left_node_idx_data_points],
        X[left_node_idx_data_points, selected_predictor],
        mean,
        linear_fit,
        m,
        normal,
        mu_std,
        response,
    )

    right_node_value, right_node_linear_params = draw_leaf_value(
        sum_trees_output[right_node_idx_data_points],
        X[right_node_idx_data_points, selected_predictor],
        mean,
        linear_fit,
        m,
        normal,
        mu_std,
        response,
    )

where should we add this? Or is there a recommended way of improving this?

improve code logic

add m param

add m param to tests

add signature in pbbart to add linear nodes
@juanitorduz
Copy link
Contributor Author

I squashed the past commits during the rebase 🤦 (sorry!)

pymc_bart/pgbart.py Outdated Show resolved Hide resolved
pymc_bart/pgbart.py Outdated Show resolved Hide resolved
@aloctavodia
Copy link
Member

I think we are (hopefully) ready to add the chances into the pgbart.by module. In 7d3edc7 I started with the basic signatures. From the initial PR it looks we need to add something like:

    if response == "mix":
        response = "linear" if np.random.random() >= 0.5 else "constant"

    left_node_value, left_node_linear_params = draw_leaf_value(
        sum_trees_output[left_node_idx_data_points],
        X[left_node_idx_data_points, selected_predictor],
        mean,
        linear_fit,
        m,
        normal,
        mu_std,
        response,
    )

    right_node_value, right_node_linear_params = draw_leaf_value(
        sum_trees_output[right_node_idx_data_points],
        X[right_node_idx_data_points, selected_predictor],
        mean,
        linear_fit,
        m,
        normal,
        mu_std,
        response,
    )

where should we add this? Or is there a recommended way of improving this?

Something like this?

    for idx in range(2):
        idx_data_point = new_idx_data_points[idx]
        node_value = draw_leaf_value(
            X[idx_data_point, selected_predictor],
            sum_trees[:, idx_data_point],
            m,
            normal.rvs() * kfactor,
            shape,
        )

juanitorduz and others added 2 commits May 11, 2023 20:11
Co-authored-by: Osvaldo A Martin <aloctavodia@gmail.com>
@juanitorduz
Copy link
Contributor Author

juanitorduz commented May 11, 2023

Thank you @aloctavodia for the quick feedback! I think mypy is happy now (all matches up!) However, I am getting weird errors running the tests

pymc.sampling.parallel.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 122, in run
    self._start_loop()
  File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/sampling/parallel.py", line 174, in _start_loop
    point, stats = self._step_method.step(self._point)
  File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/step_methods/compound.py", line 231, in step
    point, sts = method.step(point)
  File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 100, in step
    apoint, stats = self.astep(q)
  File "/Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py", line 208, in astep
    if p.sample_tree(
  File "/Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py", line 68, in sample_tree
    idx_new_nodes = grow_tree(
  File "/Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py", line 426, in grow_tree
    node_value, linear_parms = draw_leaf_value(
  File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/Users/juanitorduz/.pyenv/versions/3.10.10/lib/python3.10/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot unify float64 and array(float64, 1d, C) for 'mu_mean.4', defined at /Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py (484)

File "pymc_bart/pgbart.py", line 484:
def draw_leaf_value(y_mu_pred, x_mu, m, norm, shape, response):
    <source elided>

    return norm + mu_mean, linear_params

Which I think it boils down to a numba error:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot unify float64 and array(float64, 1d, C) for 'mu_mean.4', defined at /Users/juanitorduz/Documents/pymc-bart/pymc_bart/pgbart.py (484)

probably something with the fast_linear_fit function? Any tip would be welcome!

@juanitorduz
Copy link
Contributor Author

@aloctavodia In 796c9a6 I fixed the numba shapes 🙌 ! The CI passes (current tests and mypy).
In f4e7074 I added a test to test the linear response and it is currently failing since I think there is a problem with the vectorization of the linear model.

pymc_bart/pgbart.py Outdated Show resolved Hide resolved
@juanitorduz
Copy link
Contributor Author

@aloctavodia In 796c9a6 I fixed the numba shapes 🙌 ! The CI passes (current tests and mypy).

In f4e7074 I added a test to test the linear response and it is currently failing since I think there is a problem with the vectorization of the linear model.

Now the base implementation with linear response seems to work. The issue now is the vectorization for shape parameters 🥲

pymc_bart/pgbart.py Outdated Show resolved Hide resolved
Co-authored-by: Osvaldo A Martin <aloctavodia@gmail.com>
Copy link
Member

@aloctavodia aloctavodia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copy method is missing one value linear_params

def copy(self) -> "Tree":
    tree: Dict[int, Node] = {
        k: Node(v.value, v.idx_data_points, v.idx_split_variable, v.linear_params)
        for k, v in self.tree_structure.items()
    }
    idx_leaf_nodes = self.idx_leaf_nodes.copy() if self.idx_leaf_nodes is not None else None
    return Tree(tree_structure=tree, idx_leaf_nodes=idx_leaf_nodes, output=self.output)

pymc_bart/bart.py Outdated Show resolved Hide resolved
pymc_bart/pgbart.py Outdated Show resolved Hide resolved
pymc_bart/utils.py Show resolved Hide resolved
@juanitorduz
Copy link
Contributor Author

juanitorduz commented May 17, 2023

ok! I think the code looks much nice now and I believe we are getting closer bit-by-bit 😅 ! The draw_leaf_value is clean and compiles to numba via njit. However, we are getting a weird shape error which I have not been able to fix after hours of debugging.

Click me to see a reproducible example and the error trace
import pymc_bart as pmb
import pymc as pm
import numpy as np

X = np.random.normal(0, 1, size=(250, 3))
Y = np.random.normal(0, 1, size=250)
X[:, 0] = np.random.normal(Y, 0.1)

with pm.Model() as model:
    mu = pmb.BART("mu", X, Y, m=10, response="linear")
    sigma = pm.HalfNormal("sigma", 1)
    y = pm.Normal("y", mu, sigma, observed=Y)
    idata = pm.sample(random_seed=3415, chains=2)

yields

ValueError                                Traceback (most recent call last)
File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/link/vm.py:414](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/link/vm.py:414), in Loop.__call__(self)
    411 for thunk, node, old_storage in zip_longest(
    412     self.thunks, self.nodes, self.post_thunk_clear, fillvalue=()
    413 ):
--> 414     thunk()
    415     for old_s in old_storage:

File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/graph/op.py:543](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/graph/op.py:543), in Op.make_py_thunk..rval(p, i, o, n, params)
    539 @is_thunk_type
    540 def rval(
    541     p=p, i=node_input_storage, o=node_output_storage, n=node, params=None
    542 ):
--> 543     r = p(n, [x[0] for x in i], o)
    544     for o in node.outputs:

File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/tensor/random/op.py:378](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/pytensor/tensor/random/op.py:378), in RandomVariable.perform(self, node, inputs, outputs)
    376 rng_var_out[0] = rng
--> 378 smpl_val = self.rng_fn(rng, *(args + [size]))
    380 if (
    381     not isinstance(smpl_val, np.ndarray)
    382     or str(smpl_val.dtype) != out_var.type.dtype
    383 ):

File [~/Documents/pymc-bart/pymc_bart/bart.py:55](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/Documents/pymc-bart/pymc_bart/bart.py:55), in BARTRV.rng_fn(cls, rng, X, Y, m, alpha, split_prior, size)
     54 else:
---> 55     return _sample_posterior(cls.all_trees, cls.X, cls.m, rng=rng).squeeze().T

File [~/Documents/pymc-bart/pymc_bart/utils.py:69](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/Documents/pymc-bart/pymc_bart/utils.py:69), in _sample_posterior(all_trees, X, m, rng, size, excluded)
     68     for tree in stacked_trees[idx[ind]]:
---> 69         p += np.vstack([tree.predict(x=x, m=m, excluded=excluded) for x in X])
     70 pred.reshape((*size_iter, shape, -1))

File <__array_function__ internals>:200, in vstack(*args, **kwargs)

File [~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/numpy/core/shape_base.py:296](https://file+.vscode-resource.vscode-cdn.net/Users/juanitorduz/Documents/pymc-bart/~/opt/anaconda3/envs/pymc-bart-env/lib/python3.10/site-packages/numpy/core/shape_base.py:296), in vstack(tup, dtype, casting)
    295     arrs = [arrs]
--> 296 return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)

File <__array_function__ internals>:200, in concatenate(*args, **kwargs)

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 104 and the array at index 1 has size 143

During handling of the above exception, another exception occurred:
...
Inputs values: [Generator(PCG64) at 0x7FA8D903A7A0, array([], dtype=int64), array(11), 'not shown', 'not shown', array(10, dtype=int8), array(0.25, dtype=float32), array([], dtype=float64)]
Outputs clients: [['output'], [Elemwise{second,no_inplace}(mu, TensorConstant{(1,) of -0..5211646611})]]

which is a shape error 😢 .

Any tips are welcome!

Copy link
Member

@aloctavodia aloctavodia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We forgot to add "v.linear_params" to the trim function. Something like this

        k: Node(v.value, None, v.idx_split_variable, v.linear_params) for k, v in self.tree_structure.items()

The trim function is used to delete some attributes we no longer need once we have sampled a tree, we do this just to save memory.

@juanitorduz
Copy link
Contributor Author

Thanks! Fixed in 6d46784

@juanitorduz
Copy link
Contributor Author

@aloctavodia thanks for the suggestion! In c2de9fb I added your code. The tests for tests/test_bart.py::test_shape keep failing. Surprisingly, the simple unit tests tests/test_pgbart.py::test_fast_linear_fit started falling. I looked into it and apparently, numba does not like the if ... else condition. Hence, I took I simplified it in 8c6963d and these tests pass now (so maybe the b = np.zeros(1) was not necessary?) .

Therefore I think we still have two problems for the linear response:

  • tests/test_bart.py::test_shape
  • tests/test_bart.py::test_missing_data sometimes fail and sometimes passes (see tests of 8c6963d for different python versions)

@juanitorduz
Copy link
Contributor Author

The vectorization has been resolved 🥳 ! The only edge case missing is tests/test_bart.py::test_missing_data[linear_response]

@juanitorduz juanitorduz marked this pull request as ready for review May 29, 2023 21:09
@juanitorduz
Copy link
Contributor Author

I think is finally ready for review 😆 !

@aloctavodia aloctavodia merged commit 144e048 into pymc-devs:main May 30, 2023
4 checks passed
@aloctavodia
Copy link
Member

thanks a lot for this @juanitorduz!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants