Developing Branch: GPU Support Broken #138

Closed
crimsonmote opened this issue Sep 24, 2021 · 2 comments
Labels
bug Something isn't working

@crimsonmote

Description

CUDA-related error when training NOTEARS on GPU: it seems some tensors are not placed on the GPU properly.

Context

Training NOTEARS

Steps to Reproduce

Train NOTEARS on a GPU by calling from_pandas(..., use_gpu=True); a minimal sketch follows.
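
A minimal reproduction sketch (assumptions: a CUDA-capable machine and synthetic data; the failing call in the traceback below used a real DataFrame `sdf` together with a `dist_type_schema`):

```python
import numpy as np
import pandas as pd

from causalnex.structure.pytorch import from_pandas

# Synthetic stand-in for the real data; any numeric DataFrame exercises the same path.
df = pd.DataFrame(np.random.randn(500, 5), columns=list("abcde"))

# With use_gpu=True the fit fails with the device-mismatch error shown below.
sm = from_pandas(df, lasso_beta=1e-5, w_threshold=0.0, use_bias=True, use_gpu=True)
```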

Expected Result

The training should proceed.

Actual Result

Error

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument mat1 in method wrapper_addmm)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-b16165c83709> in <module>
      8 from causalnex.structure.pytorch import from_pandas
      9 fcols = list(set(num + cat + ordi))
---> 10 sm = from_pandas(sdf[fcols], dist_type_schema=schema, lasso_beta=1e-5, w_threshold=0.0, use_bias=True, use_gpu=True)
     11 sm.threshold_till_dag()

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/notears.py in from_pandas(X, dist_type_schema, lasso_beta, ridge_beta, use_bias, hidden_layer_units, max_iter, w_threshold, tabu_edges, tabu_parent_nodes, tabu_child_nodes, use_gpu, **kwargs)
    342         tabu_child_nodes=tabu_child_nodes,
    343         use_gpu=use_gpu,
--> 344         **kwargs,
    345     )
    346 

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/notears.py in from_numpy(X, dist_type_schema, lasso_beta, ridge_beta, use_bias, hidden_layer_units, w_threshold, max_iter, tabu_edges, tabu_parent_nodes, tabu_child_nodes, use_gpu, **kwargs)
    185         **kwargs,
    186     )
--> 187     model.fit(X, max_iter=max_iter)
    188     sm = StructureModel(model.adj)
    189 

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in fit(self, x, max_iter, h_tol, rho_max)
    257 
    258         for n_iter in range(max_iter):
--> 259             rho, alpha, h = self._dual_ascent_step(X_torch, rho, alpha, h, rho_max)
    260             if h <= h_tol or rho >= rho_max:
    261                 break

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in _dual_ascent_step(self, X, rho, alpha, h, rho_max)
    414                 method="L-BFGS-B",
    415                 jac=True,
--> 416                 bounds=bounds,
    417             )
    418             _update_params_from_flat(params, sol.x)

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    616     elif meth == 'l-bfgs-b':
    617         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 618                                 callback=callback, **options)
    619     elif meth == 'tnc':
    620         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, finite_diff_rel_step, **unknown_options)
    306     sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
    307                                   bounds=new_bounds,
--> 308                                   finite_diff_rel_step=finite_diff_rel_step)
    309 
    310     func_and_grad = sf.fun_and_grad

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/optimize.py in _prepare_scalar_function(fun, x0, jac, args, bounds, epsilon, finite_diff_rel_step, hess)
    260     # calculation reduces overall function evaluations.
    261     sf = ScalarFunction(fun, x0, args, grad, hess,
--> 262                         finite_diff_rel_step, bounds, epsilon=epsilon)
    263 
    264     return sf

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in __init__(self, fun, x0, args, grad, hess, finite_diff_rel_step, finite_diff_bounds, epsilon)
     74 
     75         self._update_fun_impl = update_fun
---> 76         self._update_fun()
     77 
     78         # Gradient evaluation

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in _update_fun(self)
    164     def _update_fun(self):
    165         if not self.f_updated:
--> 166             self._update_fun_impl()
    167             self.f_updated = True
    168 

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in update_fun()
     71 
     72         def update_fun():
---> 73             self.f = fun_wrapped(self.x)
     74 
     75         self._update_fun_impl = update_fun

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in fun_wrapped(x)
     68         def fun_wrapped(x):
     69             self.nfev += 1
---> 70             return fun(x, *args)
     71 
     72         def update_fun():

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/optimize.py in __call__(self, x, *args)
     72     def __call__(self, x, *args):
     73         """ returns the the function value """
---> 74         self._compute_if_needed(x, *args)
     75         return self._value
     76 

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/optimize.py in _compute_if_needed(self, x, *args)
     66         if not np.all(x == self.x) or self._value is None or self.jac is None:
     67             self.x = np.asarray(x).copy()
---> 68             fg = self.fun(x, *args)
     69             self.jac = fg[1]
     70             self._value = fg[0]

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in _func(flat_params)
    381 
    382             n_features = X.shape[1]
--> 383             X_hat = self(X)
    384             h_val = self._h_func()
    385             loss = 0.0

~/.env/spending/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in forward(self, x)
    182             output tensor from the model
    183         """
--> 184         x = self.dag_layer(x)  # [n, d * m1]
    185         x = x.view(-1, self.dims[0], self.dims[1])  # [n, d, m1]
    186 

~/.env/spending/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/.env/spending/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     94 
     95     def forward(self, input: Tensor) -> Tensor:
---> 96         return F.linear(input, self.weight, self.bias)
     97 
     98     def extra_repr(self) -> str:

~/.env/spending/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1845     if has_torch_function_variadic(input, weight):
   1846         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1847     return torch._C._nn.linear(input, weight, bias)
   1848 
   1849 

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument mat1 in method wrapper_addmm)
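
For context, the same class of error can be reproduced with plain PyTorch (an illustrative sketch only, not the CausalNex code path): a module whose parameters live on cuda:0 is given an input tensor that was never moved off the CPU, so F.linear / addmm sees mixed devices. The usual remedy is to move the input onto the module's device before the forward pass.

```python
import torch
import torch.nn as nn

layer = nn.Linear(5, 3).cuda()   # weight and bias on cuda:0
x = torch.randn(8, 5)            # input tensor left on the CPU

try:
    layer(x)                     # F.linear -> addmm with mixed devices
except RuntimeError as err:
    print(err)                   # "Expected all tensors to be on the same device ..."

# Fix: place the input on the same device as the module's parameters.
out = layer(x.to(next(layer.parameters()).device))
print(out.device)                # cuda:0
```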

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • CausalNex version used (pip show causalnex): developing branch cloned on Sep 24 (supposedly 0.11.0)
  • Python version used (python -V): 3.6
  • Operating system and version: Ubuntu LTS 18.04

oentaryorj (Contributor) commented Sep 26, 2021

Hi @crimsonmote, thanks for raising this. There is another PR currently in progress: #135

You may want to use the quick fixes provided in that PR if you need them urgently. We are still working on the appropriate test cases, etc., for these changes.

oentaryorj (Contributor) commented Aug 29, 2022

Merged #135 for now. Separate formal testing would still be required for GPU execution, but I think we can close this for now?
