Developing Branch: GPU Support Broken #138

Closed
crimsonmote opened this issue Sep 24, 2021 · 2 comments
Labels
bug Something isn't working

@crimsonmote

Description

CUDA-related error when training NOTEARS on GPU: it seems some tensors are not placed on the GPU properly.

Context

Training NOTEARS

Steps to Reproduce

Train NOTEARS on a GPU by calling from_pandas(..., use_gpu=True); a minimal sketch follows.
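
A minimal reproduction sketch (assumptions: a CUDA-capable machine and synthetic data; the failing call in the traceback below used a real DataFrame `sdf` together with a `dist_type_schema`):

```python
import numpy as np
import pandas as pd

from causalnex.structure.pytorch import from_pandas

# Synthetic stand-in for the real data; any numeric DataFrame exercises the same path.
df = pd.DataFrame(np.random.randn(500, 5), columns=list("abcde"))

# With use_gpu=True the fit fails with the device-mismatch error shown below.
sm = from_pandas(df, lasso_beta=1e-5, w_threshold=0.0, use_bias=True, use_gpu=True)
```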

Expected Result

The training should proceed.

Actual Result

Error

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument mat1 in method wrapper_addmm)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-b16165c83709> in <module>
      8 from causalnex.structure.pytorch import from_pandas
      9 fcols = list(set(num + cat + ordi))
---> 10 sm = from_pandas(sdf[fcols], dist_type_schema=schema, lasso_beta=1e-5, w_threshold=0.0, use_bias=True, use_gpu=True)
     11 sm.threshold_till_dag()

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/notears.py in from_pandas(X, dist_type_schema, lasso_beta, ridge_beta, use_bias, hidden_layer_units, max_iter, w_threshold, tabu_edges, tabu_parent_nodes, tabu_child_nodes, use_gpu, **kwargs)
    342         tabu_child_nodes=tabu_child_nodes,
    343         use_gpu=use_gpu,
--> 344         **kwargs,
    345     )
    346 

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/notears.py in from_numpy(X, dist_type_schema, lasso_beta, ridge_beta, use_bias, hidden_layer_units, w_threshold, max_iter, tabu_edges, tabu_parent_nodes, tabu_child_nodes, use_gpu, **kwargs)
    185         **kwargs,
    186     )
--> 187     model.fit(X, max_iter=max_iter)
    188     sm = StructureModel(model.adj)
    189 

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in fit(self, x, max_iter, h_tol, rho_max)
    257 
    258         for n_iter in range(max_iter):
--> 259             rho, alpha, h = self._dual_ascent_step(X_torch, rho, alpha, h, rho_max)
    260             if h <= h_tol or rho >= rho_max:
    261                 break

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in _dual_ascent_step(self, X, rho, alpha, h, rho_max)
    414                 method="L-BFGS-B",
    415                 jac=True,
--> 416                 bounds=bounds,
    417             )
    418             _update_params_from_flat(params, sol.x)

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    616     elif meth == 'l-bfgs-b':
    617         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 618                                 callback=callback, **options)
    619     elif meth == 'tnc':
    620         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, finite_diff_rel_step, **unknown_options)
    306     sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
    307                                   bounds=new_bounds,
--> 308                                   finite_diff_rel_step=finite_diff_rel_step)
    309 
    310     func_and_grad = sf.fun_and_grad

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/optimize.py in _prepare_scalar_function(fun, x0, jac, args, bounds, epsilon, finite_diff_rel_step, hess)
    260     # calculation reduces overall function evaluations.
    261     sf = ScalarFunction(fun, x0, args, grad, hess,
--> 262                         finite_diff_rel_step, bounds, epsilon=epsilon)
    263 
    264     return sf

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in __init__(self, fun, x0, args, grad, hess, finite_diff_rel_step, finite_diff_bounds, epsilon)
     74 
     75         self._update_fun_impl = update_fun
---> 76         self._update_fun()
     77 
     78         # Gradient evaluation

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in _update_fun(self)
    164     def _update_fun(self):
    165         if not self.f_updated:
--> 166             self._update_fun_impl()
    167             self.f_updated = True
    168 

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in update_fun()
     71 
     72         def update_fun():
---> 73             self.f = fun_wrapped(self.x)
     74 
     75         self._update_fun_impl = update_fun

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in fun_wrapped(x)
     68         def fun_wrapped(x):
     69             self.nfev += 1
---> 70             return fun(x, *args)
     71 
     72         def update_fun():

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/optimize.py in __call__(self, x, *args)
     72     def __call__(self, x, *args):
     73         """ returns the the function value """
---> 74         self._compute_if_needed(x, *args)
     75         return self._value
     76 

~/.env/spending/lib/python3.6/site-packages/scipy/optimize/optimize.py in _compute_if_needed(self, x, *args)
     66         if not np.all(x == self.x) or self._value is None or self.jac is None:
     67             self.x = np.asarray(x).copy()
---> 68             fg = self.fun(x, *args)
     69             self.jac = fg[1]
     70             self._value = fg[0]

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in _func(flat_params)
    381 
    382             n_features = X.shape[1]
--> 383             X_hat = self(X)
    384             h_val = self._h_func()
    385             loss = 0.0

~/.env/spending/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/.env/spending/lib/python3.6/site-packages/causalnex-0.11.0-py3.6.egg/causalnex/structure/pytorch/core.py in forward(self, x)
    182             output tensor from the model
    183         """
--> 184         x = self.dag_layer(x)  # [n, d * m1]
    185         x = x.view(-1, self.dims[0], self.dims[1])  # [n, d, m1]
    186 

~/.env/spending/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/.env/spending/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     94 
     95     def forward(self, input: Tensor) -> Tensor:
---> 96         return F.linear(input, self.weight, self.bias)
     97 
     98     def extra_repr(self) -> str:

~/.env/spending/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1845     if has_torch_function_variadic(input, weight):
   1846         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1847     return torch._C._nn.linear(input, weight, bias)
   1848 
   1849 

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument mat1 in method wrapper_addmm)
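
For context, the same class of error can be reproduced with plain PyTorch (an illustrative sketch only, not the CausalNex code path): a module whose parameters live on cuda:0 is given an input tensor that was never moved off the CPU, so F.linear / addmm sees mixed devices. The usual remedy is to move the input onto the module's device before the forward pass.

```python
import torch
import torch.nn as nn

layer = nn.Linear(5, 3).cuda()   # weight and bias on cuda:0
x = torch.randn(8, 5)            # input tensor left on the CPU

try:
    layer(x)                     # F.linear -> addmm with mixed devices
except RuntimeError as err:
    print(err)                   # "Expected all tensors to be on the same device ..."

# Fix: place the input on the same device as the module's parameters.
out = layer(x.to(next(layer.parameters()).device))
print(out.device)                # cuda:0
```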

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • CausalNex version used (pip show causalnex): developing branch cloned on Sep 24 (supposedly 0.11.0)
  • Python version used (python -V): 3.6
  • Operating system and version: Ubuntu LTS 18.04

oentaryorj (Contributor) commented Sep 26, 2021

Hi @crimsonmote, thanks for raising this. There is another PR currently in progress: #135

You may want to use the quick fixes provided in that PR if you need them urgently. We are still working on the appropriate test cases, etc., for these changes.

oentaryorj (Contributor) commented Aug 29, 2022

Merged #135 for now. Separate formal testing would still be required for GPU execution, but I think we can close this for now?
