Google Colab: AssertionError: libcuda.so cannot found! #14
Comments
I'm not familiar with Google Colab, and I'm not sure how well Triton works with the V100. It seems there's also an issue in the Triton repo tracking this.
I tried the command from the open issue in the Triton repo and it worked. Thank you!
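For reference, the workaround most commonly cited in that Triton issue for Colab is to refresh the dynamic linker's cache so it can find the driver library. This is a sketch, not the confirmed command from that thread; the `/usr/lib64-nvidia` path is an assumption based on typical Colab images, so verify where `libcuda.so*` actually lives on your instance.

```shell
# Triton's compile step asserts because libcuda.so is not visible to the
# dynamic linker. On typical Colab images the driver library sits in
# /usr/lib64-nvidia (assumed path; check your instance), so registering
# that directory in the linker cache usually clears the assertion.
if [ -d /usr/lib64-nvidia ]; then
  ldconfig /usr/lib64-nvidia
else
  echo "/usr/lib64-nvidia not found; locate libcuda.so.* first"
fi
```

In a notebook cell this would be run with a leading `!` (e.g. `!ldconfig /usr/lib64-nvidia`); if Triton already failed in the current session, restarting the runtime afterwards may also be needed.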
Hello, I have encountered the same problem. Have you solved it?
For this problem, I found
I am using Google Colab Pro+ with a V100 GPU. I followed your example but couldn't get any output because of this error:
AssertionError: libcuda.so cannot found!
It seems that the Triton backend is causing the problem:
/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layernorm.py in _layer_norm_fwd(x, weight, bias, eps, residual, out_dtype, residual_dtype, is_rms_norm)
153 # heuristics for number of warps
154 with torch.cuda.device(x.device.index):
--> 155 _layer_norm_fwd_1pass_kernel[(M,)](
156 x,
157 y,
/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in run(self, *args, **kwargs)
98 pruned_configs = self.prune_configs(kwargs)
99 bench_start = time.time()
--> 100 timings = {config: self._bench(*args, config=config, **kwargs)
101 for config in pruned_configs}
102 bench_end = time.time()
/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in (.0)
98 pruned_configs = self.prune_configs(kwargs)
99 bench_start = time.time()
--> 100 timings = {config: self._bench(*args, config=config, **kwargs)
101 for config in pruned_configs}
102 bench_end = time.time()
/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in _bench(self, config, *args, **meta)
81 self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
82 try:
---> 83 return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))
84 except OutOfResources:
85 return [float('inf'), float('inf'), float('inf')]
/usr/local/lib/python3.10/dist-packages/triton/testing.py in do_bench(fn, warmup, rep, grad_to_none, quantiles, fast_flush, return_mode)
102 """
103
--> 104 fn()
105 torch.cuda.synchronize()
106
/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in kernel_call()
79 config.pre_hook(full_nargs)
80 self.hook(args)
---> 81 self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
82 try:
83 return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))
in _layer_norm_fwd_1pass_kernel(X, Y, W, B, RESIDUAL, RESIDUAL_OUT, Mean, Rstd, stride_x_row, stride_y_row, stride_res_row, stride_res_out_row, N, eps, IS_RMS_NORM, BLOCK_N, HAS_RESIDUAL, STORE_RESIDUAL_OUT, HAS_BIAS, grid, num_warps, num_stages, extern_libs, stream, warmup, device, device_type)
/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py in compile(fn, **kwargs)
423 # cache manager
424 if is_cuda or is_hip:
--> 425 so_path = make_stub(name, signature, constants)
426 else:
427 so_path = _device_backend.make_launcher_stub(name, signature, constants)
/usr/local/lib/python3.10/dist-packages/triton/compiler/make_launcher.py in make_stub(name, signature, constants)
37 with open(src_path, "w") as f:
38 f.write(src)
---> 39 so = _build(name, src_path, tmpdir)
40 with open(so, "rb") as f:
41 return so_cache_manager.put(f.read(), so_name, binary=True)
/usr/local/lib/python3.10/dist-packages/triton/common/build.py in _build(name, src, srcdir)
59 hip_include_dir = os.path.join(rocm_path_dir(), "include")
60 else:
---> 61 cuda_lib_dirs = libcuda_dirs()
62 cu_include_dir = cuda_include_dir()
63 suffix = sysconfig.get_config_var('EXT_SUFFIX')
/usr/local/lib/python3.10/dist-packages/triton/common/build.py in libcuda_dirs()
28 msg += 'Possible files are located at %s.' % str(locs)
29 msg += 'Please create a symlink of libcuda.so to any of the file.'
---> 30 assert any(os.path.exists(os.path.join(path, 'libcuda.so')) for path in dirs), msg
31 return dirs
32
AssertionError: libcuda.so cannot found!
How can I solve this in the Google Colab environment?
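The bottom of the traceback shows the assertion comes from Triton's `libcuda_dirs()` in `triton/common/build.py`, which checks whether any known library directory actually contains `libcuda.so`. As a rough diagnostic (a sketch, not Triton's exact code), you can query the linker cache yourself; an empty result corresponds to the failing assertion:

```python
import subprocess

def libcuda_locations():
    """Roughly mirror the check behind the AssertionError: ask the
    dynamic linker's cache (ldconfig -p) which paths, if any, provide
    libcuda.so. An empty list means Triton's stub build will fail."""
    try:
        cache = subprocess.check_output(["ldconfig", "-p"], text=True)
    except (OSError, subprocess.CalledProcessError):
        return []  # ldconfig unavailable or failed: nothing to report
    return [line.split("=>")[-1].strip()
            for line in cache.splitlines()
            if "libcuda.so" in line]

paths = libcuda_locations()
print(paths if paths else "libcuda.so not in linker cache -> Triton will assert")
```

If this prints no paths on Colab even though the GPU driver works, the library is installed but not registered with the linker, which is why the symlink/`ldconfig` workarounds mentioned above are the usual fix.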