fine tuning mpt7b using local dataset #143

singhalshikha518 · 2023-05-16T12:42:08Z

I tried fine tuning mpt7b using dolly dataset. Using below command:

composer train.py yamls/finetune/mpt-7b_dolly_sft.yaml

yaml file: https://github.com/mosaicml/llm-foundry/blob/main/scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml

Before strating training i am getting below error:

[Eval batch=321/321] Eval on eval data:
Eval metrics/eval/LanguageCrossEntropy: 9.1594
Eval metrics/eval/LanguagePerplexity: 9503.6523
/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Traceback (most recent call last):
File "", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0-842f0fbd42a6607893f7134cdd9d16f2-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-f24b6aa9b101a518b6a4a6bddded372e-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.float32, torch.bfloat16, torch.bfloat16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('vector', True, 128, False, True, True, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/stsingha/LLM/llm-foundry/scripts/train/train.py", line 254, in
main(cfg)
File "/home/stsingha/LLM/llm-foundry/scripts/train/train.py", line 243, in main
trainer.fit()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1766, in fit
self._train_loop()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1940, in _train_loop
total_loss_dict = self._train_batch(use_grad_scaling)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2115, in _train_batch
optimizer.step(closure=lambda **kwargs: self._train_microbatches(
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/optim/decoupled_weight_decay.py", line 288, in step
loss = closure()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2115, in
optimizer.step(closure=lambda **kwargs: self._train_microbatches(
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2213, in _train_microbatches
microbatch_loss_dict = self._train_microbatch(use_grad_scaling, current_batch_size, is_final_microbatch)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/composer/trainer/trainer.py", line 2340, in _train_microbatch
microbatch_loss.backward(create_graph=self._backwards_create_graph)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/flash_attn/flash_attn_triton.py", line 827, in backward
_flash_attn_backward(do, q, k, v, o, lse, dq, dk, dv,
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/flash_attn/flash_attn_triton.py", line 694, in _flash_attn_backward
_bwd_kernel[grid](
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/stsingha/LLM/llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument

Paladiamors · 2023-05-17T07:01:16Z

I'm also getting the same problem too, using triton 2.0.0.dev20221202 as recommended in the setup script.

fbiere · 2023-05-17T14:34:49Z

I am also seeing this problem.

alextrott16 · 2023-05-17T19:00:50Z

What kind of hardware are you using? And have you tried starting from our recommended docker image mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04?

Any other details about your environments would be helpful to know.

singhalshikha518 · 2023-05-18T05:49:36Z

@alextrott16
I am using slurm cluster with 4 A10 GPUs.
Cuda Version : 11.6
nvcc version : Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0

GCC version : gcc (GCC) 7.3.1 20180303

torch : 1.13.1+cu116

Along with the above error when i am trying multinode i am getting nccl error

singhalshikha518 · 2023-05-18T07:29:01Z

Also i am getting below error with 'attn_impl: torch' :
File "llm-foundry/llmfoundry-venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b/d8304854d4877849c3c0a78f3469512a84419e84/modeling_mpt.py", line 142, in forward
raise NotImplementedError('MPT does not support training with left padding.')

singhalshikha518 · 2023-05-18T15:30:06Z

What kind of hardware are you using? And have you tried starting from our recommended docker image mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04?

Any other details about your environments would be helpful to know.

Looks like the error is with torch1.13.1+cu116 version.
and with torch2.0.1 getting error of saving checkpoint.
Which torch version should be used for cuda 11.6?

vchiley · 2023-05-18T16:49:33Z

KeyError: ('2-.-0-.-0-842f0fbd42a6607893f7134cdd9d16f2-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-f24b6aa9b101a518b6a4a6bddded372e-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.float32, torch.bfloat16, torch.bfloat16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('vector', True, 128, False, True, True, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False)))

Is an error you see if you try to use torch>=2.0.0 (torch>=2.0.0 requires a version of triton that has issues)
We are working on a workaround.

vchiley · 2023-05-18T16:51:29Z

This error tells you the issue.
Your dataset is outputting data with left padding. MPT does not support training with left padding.
This is a dataset issue.

Louis-y-nlp · 2023-05-19T05:53:38Z

This question may sound a bit silly, but why is right padding used during training while left padding is chosen during inference?

jwatte · 2023-05-24T19:50:23Z

I'm getting the same kernel crash / key error:

[Eval batch=1/13] Eval on eval data
[Eval batch=2/13] Eval on eval data
[Eval batch=3/13] Eval on eval data
[Eval batch=5/13] Eval on eval data
[Eval batch=6/13] Eval on eval data
[Eval batch=7/13] Eval on eval data
[Eval batch=8/13] Eval on eval data
[Eval batch=9/13] Eval on eval data
[Eval batch=11/13] Eval on eval data
[Eval batch=12/13] Eval on eval data
[Eval batch=13/13] Eval on eval data:
         Eval metrics/eval/LanguageCrossEntropy: 10.0889
         Eval metrics/eval/LanguagePerplexity: 24073.9668
Traceback (most recent call last):
  File "<string>", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0--2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.float32, torch.bfloat16, torch.bfloat16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('vector', True, 128, False, True, True, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False)))

....

  File "/home/ubuntu/.local/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 200, in run
    return self.fn.run(*args, **kwargs)
  File "<string>", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument

Using a g5.24xlarge instance with 4xA10G GPUs on EC2.
It's using Torch 1.13.1 already:

ubuntu@ip-172-31-12-71:/opt/mpt-7b/llm-foundry/scripts/train$ pip uninstall torch
Found existing installation: torch 1.13.1

And triton 2.0.0.dev20221202:

ubuntu@ip-172-31-12-71:/opt/mpt-7b/llm-foundry/scripts/train$ pip uninstall triton
Found existing installation: triton 2.0.0.dev20221202

abhi-mosaic · 2023-05-31T01:09:27Z

Hi @jwatte , could you try installing this fork of triton we have setup? It uses a pre-MLIR tag that should work with torch 1.13.1:

llm-foundry/setup.py

Line 63 in 3c66b1c

    
           'triton-pre-mlir@git+https://github.com/vchiley/triton.git@triton_pre_mlir#subdirectory=python',

In general we have not tried training with A10s so it's a bit of uncharted territory. I hope we can get more internally so we can start adding it to our support matrix, but it's unlikely to happen in the next few weeks.

This question may sound a bit silly, but why is right padding used during training while left padding is chosen during inference?

I think the choice at training time is a bit arbitrary, but at inference time, left padding is used so that the ends of sequences line up, since you generate 1 token at a time, you want to make sure the new tokens are "lined up".

* Tweak example config dict + clarify running tests in subdirectories

abhi-mosaic · 2023-06-13T16:38:39Z

Closing this issue as it's gone a bit stale, but I just want to note that we are actively testing A10 support now and will update the support matrix on the top README once we have confirmed that it works.

singhalshikha518 mentioned this issue May 18, 2023

Error while saving checkpoint #163

Closed

vchiley mentioned this issue May 18, 2023

Finetune MPT models with local dataset #94

Closed

jacobfulano mentioned this issue Jun 1, 2023

ERROR:composer.cli.launcher:Rank 2 crashed with exit code -7 #258

Closed

abhi-mosaic self-assigned this Jun 2, 2023

bmosaicml pushed a commit that referenced this issue Jun 6, 2023

Top-level readme fixes (#143)

0118641

* Tweak example config dict + clarify running tests in subdirectories

abhi-mosaic closed this as completed Jun 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fine tuning mpt7b using local dataset #143

fine tuning mpt7b using local dataset #143

singhalshikha518 commented May 16, 2023

Paladiamors commented May 17, 2023 •

edited

fbiere commented May 17, 2023

alextrott16 commented May 17, 2023

singhalshikha518 commented May 18, 2023 •

edited

singhalshikha518 commented May 18, 2023 •

edited

singhalshikha518 commented May 18, 2023

vchiley commented May 18, 2023

vchiley commented May 18, 2023

Louis-y-nlp commented May 19, 2023

jwatte commented May 24, 2023

abhi-mosaic commented May 31, 2023

abhi-mosaic commented Jun 13, 2023

fine tuning mpt7b using local dataset #143

fine tuning mpt7b using local dataset #143

Comments

singhalshikha518 commented May 16, 2023

Paladiamors commented May 17, 2023 • edited

fbiere commented May 17, 2023

alextrott16 commented May 17, 2023

singhalshikha518 commented May 18, 2023 • edited

singhalshikha518 commented May 18, 2023 • edited

singhalshikha518 commented May 18, 2023

vchiley commented May 18, 2023

vchiley commented May 18, 2023

Louis-y-nlp commented May 19, 2023

jwatte commented May 24, 2023

abhi-mosaic commented May 31, 2023

abhi-mosaic commented Jun 13, 2023

Paladiamors commented May 17, 2023 •

edited

singhalshikha518 commented May 18, 2023 •

edited

singhalshikha518 commented May 18, 2023 •

edited