an error while training #159

Closed · ChrisXULC opened this issue May 18, 2023 · 5 comments

@ChrisXULC

File "/root/anaconda3/envs/llava/lib/python3.10/multiprocessing/shared_memory.py", line 104, in init
self._fd = _posixshmem.shm_open(
FileExistsError: [Errno 17] File exists: '/000000_barrier'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/homeLMFlow/cn_llama/packages/new/llm-foundry/scripts/train/train.py", line 256, in
main(cfg)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/scripts/train/train.py", line 151, in main
train_loader = build_dataloader(
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/scripts/train/train.py", line 73, in build_dataloader
return build_text_dataloader(
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/scripts/train/llmfoundry/data/text_data.py", line 253, in build_text_dataloader
dataset = StreamingTextDataset(
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/scripts/train/llmfoundry/data/text_data.py", line 110, in init
super().init(
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/streaming/base/dataset.py", line 331, in init
self._worker_barrier = SharedBarrier(worker_barrier_filelock_path, worker_barrier_shm_path)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/streaming/base/shared.py", line 51, in init
shared_barrier_shm = CreateSharedMemory(name=shm_path, size=size)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/streaming/base/shared.py", line 215, in init
shm = SharedMemory(name, False, size)
File "/root/anaconda3/envs/llava/lib/python3.10/multiprocessing/shared_memory.py", line 104, in init
self._fd = _posixshmem.shm_open(
FileNotFoundError: [Errno 2] No such file or directory: '/000000_barrier'

----------End global rank 1 STDERR----------
ERROR:composer.cli.launcher:Global rank 0 (PID 30509) exited with code 1
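
A hedged workaround sketch for the barrier errors above, assuming the leftover /000000_barrier segment in /dev/shm was left behind by an earlier crashed run (the segment name is taken from the traceback; adjust it if yours differs): unlink the stale segment before relaunching.

```python
# Hedged sketch: remove a stale shared-memory barrier segment left behind by
# a crashed run, so the streaming library can recreate it on the next launch.
# The segment name "000000_barrier" comes from the traceback above (the
# leading slash is added automatically by SharedMemory on POSIX systems).
from multiprocessing import shared_memory


def unlink_stale_segment(name: str) -> None:
    try:
        # create=False attaches to an existing segment without resizing it.
        shm = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return  # nothing stale to clean up
    shm.close()
    shm.unlink()  # removes the /dev/shm entry backing this name


if __name__ == "__main__":
    unlink_stale_segment("000000_barrier")
```

Note that unlinking only removes the name from /dev/shm; if another rank is still actively using the barrier, skip this and instead restart all ranks together.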

@ChrisXULC (Author)

Starting training...

----------End global rank 1 STDOUT----------
----------Begin global rank 1 STDERR----------
/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/composer/callbacks/speed_monitor.py:120: UserWarning: gpu_flop count not found for None with precision: amp_fp16; MFU cannot be calculated and reported. gpu_flops_available can be manually overridden by setting gpu_flops_available in SpeedMonitor.
warnings.warn(
Traceback (most recent call last):
File "", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0-5ef8f334a15fe35aaf5db62d90ceef62-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, None, torch.float16, torch.float32, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('none', True, 64, False, True, True, True, 128, 128), (True, True, True, (False,), True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/homeLMFlow/cn_llama/packages/new/llm-foundry/scripts/train/train.py", line 257, in
main(cfg)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/scripts/train/train.py", line 245, in main
trainer.fit()
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/composer/trainer/trainer.py", line 1766, in fit
self._train_loop()
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/composer/trainer/trainer.py", line 1940, in _train_loop
total_loss_dict = self._train_batch(use_grad_scaling)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/composer/trainer/trainer.py", line 2118, in _train_batch
self._train_microbatches(microbatches, total_loss_dict)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/composer/trainer/trainer.py", line 2213, in _train_microbatches
microbatch_loss_dict = self._train_microbatch(use_grad_scaling, current_batch_size, is_final_microbatch)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/composer/trainer/trainer.py", line 2340, in _train_microbatch
microbatch_loss.backward(create_graph=self._backwards_create_graph)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/flash_attn/flash_attn_triton.py", line 827, in backward
_flash_attn_backward(do, q, k, v, o, lse, dq, dk, dv,
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/flash_attn/flash_attn_triton.py", line 694, in _flash_attn_backward
_bwd_kernel[grid](
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/homeLMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/LMFlow/cn_llama/packages/new/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument

----------End global rank 1 STDERR----------
ERROR:composer.cli.launcher:Global rank 0 (PID 86132) exited with code 1

@singhalshikha518

I am also facing the same error.

@hanlint (Collaborator) commented May 18, 2023

Hello @singhalshikha518 and @ChrisXULC, thanks for raising these! It's possible that RuntimeError: Triton Error [CUDA]: invalid argument arises because of (1) running on older hardware that the Triton kernel does not support, or (2) some incompatibility in CUDA versions, etc. For example, see microsoft/DeepSpeed#3382 or microsoft/DeepSpeed-MII#170 (comment).

Could you provide some system specs to help us debug? (GPU type, OS version, etc.). Our recommended setup can be found here: https://github.com/mosaicml/llm-foundry#prerequisites
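
In case it helps, here is a small sketch for collecting those specs in one place. The package names passed to importlib.metadata ("triton" and "flash-attn") are assumptions about how the wheels are published in your environment.

```python
# Gather GPU, CUDA, and library version info for debugging reports.
import platform
from importlib.metadata import PackageNotFoundError, version

import torch


def pkg_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


print("OS:", platform.platform())
print("Python:", platform.python_version())
print("Torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
print("Triton:", pkg_version("triton"))
print("Flash-attn:", pkg_version("flash-attn"))
```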

As a fallback, you can also turn off Triton by setting attn_impl: torch. This is slower and uses more memory, but it may work if the Triton kernels are difficult to set up properly in your environment.
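
For reference, a minimal sketch of that override, assuming the model.attn_config.attn_impl key path used by the MPT YAMLs (older configs may expose model.attn_impl directly) and an example YAML path; the same dotted key can usually be passed as a command-line override if your train.py merges CLI arguments into the config.

```python
# Sketch of the attn_impl fallback described above. Assumptions: the key path
# model.attn_config.attn_impl and the YAML path below are examples only;
# check both against your llm-foundry checkout.
from omegaconf import OmegaConf

cfg = OmegaConf.load("scripts/train/yamls/mpt/7b.yaml")  # example path
override = OmegaConf.from_dotlist(["model.attn_config.attn_impl=torch"])
cfg = OmegaConf.merge(cfg, override)

# Confirm the attention implementation before handing cfg to the trainer.
print(OmegaConf.to_yaml(cfg.model))
```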

hanlint self-assigned this May 18, 2023
@hanlint (Collaborator) commented May 26, 2023

Closing this as stale -- please reopen if using torch does not work!

hanlint closed this as completed May 26, 2023
@mikeybellissimo

Hi, I just want to start by thanking you and your company for doing such great work in making AI so accessible to the public.

I am also having this same error training MPT-7B with Triton. I had the same error when I tried going through the Docker image available on the LLM Foundry GitHub page, but my current specs are:
OS: Ubuntu on Windows WSL2
GPU: 3090 (which rules out the error being limited to GPUs that predate Ampere)
Triton: 2.0.0.dev20221202
Flash-attn: 1.0.3.post0
Python: 3.10.9
Torch: Currently 2.0.1 (I've tried with 1.13.1 as well)
CUDA: 11.7

Torch as the attn_impl does work just fine, but it is highly limiting on sequence length given my memory capacity, so it would be really great to be able to use Triton.
Thanks!
Michael
