
Google Colab: AssertionError: libcuda.so cannot found! #14

Closed
erenirmak opened this issue Dec 6, 2023 · 5 comments

@erenirmak

I am using Google Colab Pro+ with a V100 GPU. I have followed your example but couldn't get any output because of the error:
AssertionError: libcuda.so cannot found!
It seems that the Triton backend is causing the problem:

/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layernorm.py in _layer_norm_fwd(x, weight, bias, eps, residual, out_dtype, residual_dtype, is_rms_norm)
153 # heuristics for number of warps
154 with torch.cuda.device(x.device.index):
--> 155 _layer_norm_fwd_1pass_kernel[(M,)](
156 x,
157 y,

/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in run(self, *args, **kwargs)
98 pruned_configs = self.prune_configs(kwargs)
99 bench_start = time.time()
--> 100 timings = {config: self._bench(*args, config=config, **kwargs)
101 for config in pruned_configs}
102 bench_end = time.time()

/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in (.0)
98 pruned_configs = self.prune_configs(kwargs)
99 bench_start = time.time()
--> 100 timings = {config: self._bench(*args, config=config, **kwargs)
101 for config in pruned_configs}
102 bench_end = time.time()

/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in _bench(self, config, *args, **meta)
81 self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
82 try:
---> 83 return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))
84 except OutOfResources:
85 return [float('inf'), float('inf'), float('inf')]

/usr/local/lib/python3.10/dist-packages/triton/testing.py in do_bench(fn, warmup, rep, grad_to_none, quantiles, fast_flush, return_mode)
102 """
103
--> 104 fn()
105 torch.cuda.synchronize()
106

/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in kernel_call()
79 config.pre_hook(full_nargs)
80 self.hook(args)
---> 81 self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
82 try:
83 return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))

in _layer_norm_fwd_1pass_kernel(X, Y, W, B, RESIDUAL, RESIDUAL_OUT, Mean, Rstd, stride_x_row, stride_y_row, stride_res_row, stride_res_out_row, N, eps, IS_RMS_NORM, BLOCK_N, HAS_RESIDUAL, STORE_RESIDUAL_OUT, HAS_BIAS, grid, num_warps, num_stages, extern_libs, stream, warmup, device, device_type)

/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py in compile(fn, **kwargs)
423 # cache manager
424 if is_cuda or is_hip:
--> 425 so_path = make_stub(name, signature, constants)
426 else:
427 so_path = _device_backend.make_launcher_stub(name, signature, constants)

/usr/local/lib/python3.10/dist-packages/triton/compiler/make_launcher.py in make_stub(name, signature, constants)
37 with open(src_path, "w") as f:
38 f.write(src)
---> 39 so = _build(name, src_path, tmpdir)
40 with open(so, "rb") as f:
41 return so_cache_manager.put(f.read(), so_name, binary=True)

/usr/local/lib/python3.10/dist-packages/triton/common/build.py in _build(name, src, srcdir)
59 hip_include_dir = os.path.join(rocm_path_dir(), "include")
60 else:
---> 61 cuda_lib_dirs = libcuda_dirs()
62 cu_include_dir = cuda_include_dir()
63 suffix = sysconfig.get_config_var('EXT_SUFFIX')

/usr/local/lib/python3.10/dist-packages/triton/common/build.py in libcuda_dirs()
28 msg += 'Possible files are located at %s.' % str(locs)
29 msg += 'Please create a symlink of libcuda.so to any of the file.'
---> 30 assert any(os.path.exists(os.path.join(path, 'libcuda.so')) for path in dirs), msg
31 return dirs
32

AssertionError: libcuda.so cannot found!

How can I solve this in the Google Colab environment?

@tridao
Collaborator

tridao commented Dec 6, 2023

I'm not familiar with Google Colab, and I'm not sure how well Triton works with V100. Seems like there's also an issue in the triton repo tracking this.
You can try !ldconfig.
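
As a quick sanity check (just a sketch, not something from this thread), you can look for the driver library from a Colab cell before and after running ldconfig; the two candidate paths below are only guesses at where Colab keeps it.

import ctypes.util, glob
# Where the dynamic loader currently resolves the CUDA driver (None means Triton's check will fail):
print("find_library('cuda') ->", ctypes.util.find_library("cuda"))
# Typical driver locations on Colab runtimes (both paths are assumptions; adjust for your machine):
for pattern in ("/usr/lib64-nvidia/libcuda.so*", "/usr/lib/x86_64-linux-gnu/libcuda.so*"):
    print(pattern, "->", glob.glob(pattern))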

@erenirmak
Author

I tried the command from the open issue in the Triton repo and it worked:
!echo /usr/lib64-nvidia/ >/etc/ld.so.conf.d/libcuda.conf; ldconfig

thank you
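
For anyone who prefers a plain-Python cell, a rough equivalent of that fix might look like the sketch below; the /usr/lib64-nvidia/ directory is the one that worked above, but verify it matches your runtime before writing it into the loader config.

import subprocess
# Point the dynamic loader at the directory holding Colab's libcuda.so, then rebuild
# the loader cache so Triton's libcuda_dirs() check can find the library.
with open("/etc/ld.so.conf.d/libcuda.conf", "w") as f:
    f.write("/usr/lib64-nvidia/\n")
subprocess.run(["ldconfig"], check=True)
# Optional: confirm libcuda.so now appears in the loader cache.
print("libcuda.so" in subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout)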

@xianfeizhu


Hello, I have encountered the same problem. Have you solved it?

@xianfeizhu

I tried the command from the open issue in the Triton repo and it worked: !echo /usr/lib64-nvidia/ >/etc/ld.so.conf.d/libcuda.conf; ldconfig

Hello, I have encountered the same problem. Have you solved it?

@lan-creator

For this problem, the command !echo /usr/lib64-nvidia/ >/etc/ld.so.conf.d/libcuda.conf; ldconfig did not work for me, so I went into the triton package and changed it directly here:
/data/App/anaconda3/envs/mamba/lib/python3.10/site-packages/triton/common/build.py, line 25
Add this code: dirs.append("/usr/local/cuda/cuda/lib64/stubs") (the path depends on your own stubs location). That fixed the problem for me.
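
For context, that edit amounts to adding a single line inside libcuda_dirs() in triton/common/build.py, before the assert shown in the traceback above. The path below is only an illustration; point it at whatever stubs directory actually exists on your machine.

# triton/common/build.py, inside libcuda_dirs(), before the
# "assert any(os.path.exists(os.path.join(path, 'libcuda.so')) for path in dirs)" check:
dirs.append("/usr/local/cuda/lib64/stubs")  # example path; use your own CUDA stubs directory

Keep in mind that the libcuda.so in a stubs directory is a link-time stub rather than the real driver, so this mainly gets Triton past its build step; the actual driver still has to be available on the machine at run time.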
