RuntimeError: Triton Error [CUDA]: invalid argument when running run_dnabert2.sh #36
Comments
Here are my installed packages.
(dnabert2) shiro@GTUNE:~/DNABERT_2/finetune$ pip list
absl-py 1.0.0
Here is my GPU setup.
(dnabert2) shiro@GTUNE:~$ nvidia-smi -L
(dnabert2) shiro@GTUNE:~$ /usr/local/cuda/bin/nvcc --version
(dnabert2) shiro@GTUNE:~$ dpkg -l | grep cudnn
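Since the command output above got truncated in this thread, here is a minimal sketch that gathers the same information (visible GPUs, CUDA build, compute capability) through PyTorch; it assumes only that torch imports correctly inside the dnabert2 environment:

```python
# gpu_check.py - minimal GPU/CUDA diagnostic via PyTorch (assumes torch is installed)
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA (torch build):", torch.version.cuda)
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)  # e.g. (8, 6) for an RTX 30xx card
        print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{cap[0]}{cap[1]})")
```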
I have the same error. Did you solve it? Thanks! |
I just gave up..... Sorry. |
What was the problem, and what was the solution?
The provided data_path is /home/shiro/DNABERT_2/finetune
2023-08-31 17:57:18.856636: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/cuda/lib64:
2023-08-31 17:57:18.856685: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
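As the warning itself says, the variable has to be set before tokenizers is used anywhere in the process. In a training script that means putting it at the very top, before any transformers/tokenizers import (a sketch; `export TOKENIZERS_PARALLELISM=false` in the shell is equivalent):

```python
# must run before tokenizers is imported/used anywhere in the process
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```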
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias']
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.weight', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
Num examples = 36,496
Num Epochs = 5
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 4
Total optimization steps = 5,700
Number of trainable parameters = 117,070,851
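As a sanity check, the reported step count is consistent with the other logged numbers, assuming a single GPU and that the last partial gradient-accumulation window of each epoch is dropped:

```python
# reproduce the trainer's "Total optimization steps" from the logged settings
examples, per_device_bs, grad_accum, epochs = 36_496, 8, 4, 5
batches_per_epoch = examples // per_device_bs       # 4562 micro-batches per epoch
steps_per_epoch = batches_per_epoch // grad_accum   # 1140 optimizer steps per epoch
print(steps_per_epoch * epochs)                     # 5700
```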
0%| | 0/5700 [00:00<?, ?it/s]
Traceback (most recent call last):
File "", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0-1e8410f206c822547fb50e2ea86e45a6-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-42648570729a4835b21c1c18cebedbfe-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, torch.float32, torch.float16, torch.float32, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, False, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_modified.py", line 332, in
train()
File "train_modified.py", line 314, in train
trainer.train()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 2745, in training_step
self.scaler.scale(loss).backward()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 1041, in backward
_flash_attn_backward(do,
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 949, in _flash_attn_backward
_bwd_kernel[grid]( # type: ignore
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument
0%| | 0/5700 [00:00<?, ?it/s]
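The failure happens inside Triton's autotuner while it benchmarks launch configurations for the flash-attention backward kernel (_bwd_kernel), and CUDA rejects one of those launches with "invalid argument". That usually points to a kernel configuration (e.g., num_warps or shared-memory size) the GPU cannot actually run, or to a Triton version different from the one flash_attn_triton.py was written against. A first diagnostic step is to record the exact versions and compute capability in play; the sm_80 cutoff below is an assumption based on these Triton FlashAttention kernels having been developed primarily for Ampere-class GPUs, not something confirmed in this thread:

```python
# collect the version/hardware facts relevant to this Triton failure
import torch
import triton

print("torch :", torch.__version__, "| CUDA build:", torch.version.cuda)
print("triton:", triton.__version__)
cap = torch.cuda.get_device_capability(0)
print("GPU   :", torch.cuda.get_device_name(0), f"(sm_{cap[0]}{cap[1]})")
# Assumption: the Triton flash-attention kernel targets Ampere (sm_80) or newer.
if cap < (8, 0):
    print("compute capability below sm_80 - the Triton flash-attention "
          "backward kernel may simply not be supported on this GPU")
```

If the versions or the GPU turn out to be the mismatch, pinning the Triton version that the repository's requirements specify, or running the model without the Triton flash-attention path, are the usual directions to try.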