
[FT][ERROR] CUDA runtime error: operation not supported #20

Open
karlind opened this issue Aug 12, 2022 · 12 comments

@karlind

karlind commented Aug 12, 2022

I am using a T4 GPU; the host machine's CUDA version is 11.0 and the driver is 450.102.04. When running launch.sh, I get the following error.
Detailed log:

fauxpilot-triton-1         | W0812 03:06:40.864778 92 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
fauxpilot-triton-1         | W0812 03:06:40.864782 92 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
fauxpilot-triton-1         | [FT][WARNING] Custom All Reduce only supports 8 Ranks currently. Using NCCL as Comm.
fauxpilot-triton-1         | I0812 03:06:41.156692 92 libfastertransformer.cc:307] Before Loading Model:
fauxpilot-triton-1         | after allocation, free 6.56 GB total 8.00 GB
fauxpilot-triton-1         | [WARNING] gemm_config.in is not found; using default GEMM algo
fauxpilot-triton-1         | terminate called after throwing an instance of 'std::runtime_error'
fauxpilot-triton-1         |   what():  [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:181 
fauxpilot-triton-1         | 
fauxpilot-triton-1         | [5f61fab36b85:00092] *** Process received signal ***
fauxpilot-triton-1         | [5f61fab36b85:00092] Signal: Aborted (6)
fauxpilot-triton-1         | [5f61fab36b85:00092] Signal code:  (-6)
fauxpilot-triton-1         | [5f61fab36b85:00092] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f3a7ef7e420]
@karlind karlind changed the title [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:181 [FT][ERROR] CUDA runtime error: operation not supported Aug 12, 2022
@moyix
Collaborator

moyix commented Aug 13, 2022

This looks similar to #14 – the T4 is not that old (although you will have trouble running some of the models with only 8GB of RAM), and the compute capability is listed as 7.5, so I'm surprised it doesn't work. Could you try upgrading the version of CUDA to something more recent?

@karlind
Author

karlind commented Aug 15, 2022

> This looks similar to #14 – the T4 is not that old (although you will have trouble running some of the models with only 8GB of RAM), and the compute capability is listed as 7.5, so I'm surprised it doesn't work. Could you try upgrading the version of CUDA to something more recent?

I have already tried CUDA 10.0, 11.0, 11.2, and 11.7 on the host machine. None of them works.

@moyix
Collaborator

moyix commented Aug 15, 2022

Did that include upgrading the NVIDIA driver? 450.102.04 seems to be fairly old (it came out in January 2021).

@leemgs
Contributor

leemgs commented Aug 18, 2022

> terminate called after throwing an instance of 'std::runtime_error'
> fauxpilot-triton-1 | what(): [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:181
> fauxpilot-triton-1 |

The source contents of ./utils/allocator.h:181 are as follows.

    virtual ~Allocator()
    {
        FT_LOG_DEBUG(__PRETTY_FUNCTION__);
        while (!pointer_mapping_->empty()) {
            free((void**)(&pointer_mapping_->begin()->second.first));
        }
        delete pointer_mapping_;
    }

In other words, the error seems to occur because the GPU's memory capacity is insufficient for the selected model.

Therefore, I interpret this issue as follows:

You had 6.56 GB free out of 8 GB total before running the model of your choice. However, while the CUDA runtime was allocating the memory required to execute CUDA's GEMM (General Matrix Multiply) algorithm, the free memory was exhausted, and the process was killed on receiving the "aborted" signal.
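
For reference, the "after allocation, free 6.56 GB total 8.00 GB" line reports the CUDA runtime's free/total memory counters. A minimal standalone sketch that reads the same counters (an illustration, not FauxPilot/FasterTransformer code):

// mem_info.cu -- print free/total GPU memory, as in the Triton log line.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    const double GB = 1024.0 * 1024.0 * 1024.0;
    printf("free %.2f GB total %.2f GB\n", free_bytes / GB, total_bytes / GB);
    return 0;
}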

@luanshaotong
Contributor

The same issue.

Driver Version: 515.65.01, CUDA Version: 11.7.
MODEL=codegen-350M-multi

I converted the model on my machine.
I am using the old Tesla M60. Maybe the operation is not supported by the card?

@luanshaotong
Contributor

luanshaotong commented Sep 30, 2022

> terminate called after throwing an instance of 'std::runtime_error'
> fauxpilot-triton-1 | what(): [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:181
> fauxpilot-triton-1 |
>
> The source contents of ./utils/allocator.h:181 are as follows.
>
>     virtual ~Allocator()
>     {
>         FT_LOG_DEBUG(__PRETTY_FUNCTION__);
>         while (!pointer_mapping_->empty()) {
>             free((void**)(&pointer_mapping_->begin()->second.first));
>         }
>         delete pointer_mapping_;
>     }
>
> In other words, the error seems to occur because the GPU's memory capacity is insufficient for the selected model.
>
> Therefore, I interpret this issue as follows:
>
> You had 6.56 GB free out of 8 GB total before running the model of your choice. However, while the CUDA runtime was allocating the memory required to execute CUDA's GEMM (General Matrix Multiply) algorithm, the free memory was exhausted, and the process was killed on receiving the "aborted" signal.

Indeed, I think the code that causes this issue is https://github.com/NVIDIA/FasterTransformer/blob/a44c38134cefe17a81c269b6ec23d91cfe4e7216/src/fastertransformer/utils/allocator.h#L181

I guess some old GPUs, like the Tesla M60, don't support async cudaMalloc/cudaFree even with a CUDA version higher than 11.2. Unfortunately, FasterTransformer doesn't detect this. The evidence is that ./launch.sh does not show a log line like https://github.com/NVIDIA/FasterTransformer/blob/a44c38134cefe17a81c269b6ec23d91cfe4e7216/src/fastertransformer/utils/allocator.h#L126 or https://github.com/NVIDIA/FasterTransformer/blob/f73a2cf66fb6bb4595277d0d029ac27601dd664c/src/fastertransformer/utils/allocator.h#L149. A standalone check for this is sketched below.
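
A minimal standalone sketch of such a check (an illustration of the CUDA runtime API, not FasterTransformer code). cudaDevAttrMemoryPoolsSupported is what gates the stream-ordered allocator; on GPUs without it, cudaMallocAsync fails with cudaErrorNotSupported ("operation not supported"):

// check_async_alloc.cu -- does this device support cudaMallocAsync?
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;
    cudaGetDevice(&device);

    // 1 if the device supports the stream-ordered (async) allocator.
    int pools_supported = 0;
    cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, device);
    printf("cudaDevAttrMemoryPoolsSupported = %d\n", pools_supported);

    // Try a small async allocation to confirm.
    void* ptr = nullptr;
    cudaError_t err = cudaMallocAsync(&ptr, 1 << 20, /*stream=*/0);
    printf("cudaMallocAsync: %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) {
        cudaFreeAsync(ptr, 0);
        cudaStreamSynchronize(0);
    }
    return 0;
}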

As for @karlind: you changed the CUDA version to 11.0 on the host machine, but inside the triton_with_ft container it is still CUDA 11.7.

So I think maybe we could downgrade the CUDA version in the triton_with_ft Docker image to 11.1 to avoid this issue, and wait for FasterTransformer to fix it.

For now I don't know how to rebuild the triton_with_ft image, so I can't test this. Can anyone help me and give some suggestions? @moyix @leemgs

@luanshaotong
Contributor

luanshaotong commented Oct 1, 2022

I rebuilt the triton_with_ft image by changing the FT code as in NVIDIA/FasterTransformer#263 (comment).
Now it works. You can just use luanshaotong/triton_with_ft:22.06 instead of moyix/triton_with_ft:22.09.

However, when I send requests, the server always returns errors.

[screenshot of the error response]

Details about the OpenAI API:

[screenshot of the OpenAI API details]

And the same error occurs with the VS Code Copilot extension.

Even when the Triton container is stopped, the Copilot proxy still returns error code 422, so it seems like a different issue.

Already solved.

@luanshaotong
Contributor

luanshaotong commented Oct 1, 2022

Sorry, I forgot to check the GPU compute capability. It's impossible to run FT on the M60, which has a compute capability of 5.2.

Still, I think my method will work for the T4.

BTW, can I run fauxpilot on a P40?

@leemgs
Contributor

leemgs commented Nov 9, 2022

> BTW, can I run fauxpilot on a P40?

Could you tell me the result of running deviceQuery (e.g., the CC, Compute Capability)?

Also, please refer to https://github.com/moyix/fauxpilot/wiki/GPU-Support-Matrix. A small sketch for reading the compute capability without deviceQuery follows below.
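
If deviceQuery is not at hand, the compute capability can also be read via the CUDA runtime API; a minimal sketch (an illustration only, not part of FauxPilot):

// compute_cap.cu -- print each GPU's compute capability, as deviceQuery does.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}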

@luanshaotong
Contributor

@leemgs I tested fauxpilot on my Tesla P40 using the main branch, and it works well (a little slow). And I'm happy that I can use the 6B model on it.
The compute capability of the P40 is 6.1. I'll test deviceQuery later.

@leemgs
Contributor

leemgs commented Nov 14, 2022

> And I'm happy that I can use the 6B model on it.

Congrats. I think a major reason it works is the video memory capacity of your GPU, given that the virtual ~Allocator() issue is memory related.

@haisongzhang

> I am using a T4 GPU; the host machine's CUDA version is 11.0 and the driver is 450.102.04. When running launch.sh, I get the following error. Detailed log:
>
> fauxpilot-triton-1         | W0812 03:06:40.864778 92 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
> fauxpilot-triton-1         | W0812 03:06:40.864782 92 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
> fauxpilot-triton-1         | [FT][WARNING] Custom All Reduce only supports 8 Ranks currently. Using NCCL as Comm.
> fauxpilot-triton-1         | I0812 03:06:41.156692 92 libfastertransformer.cc:307] Before Loading Model:
> fauxpilot-triton-1         | after allocation, free 6.56 GB total 8.00 GB
> fauxpilot-triton-1         | [WARNING] gemm_config.in is not found; using default GEMM algo
> fauxpilot-triton-1         | terminate called after throwing an instance of 'std::runtime_error'
> fauxpilot-triton-1         |   what():  [FT][ERROR] CUDA runtime error: operation not supported /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/allocator.h:181
> fauxpilot-triton-1         |
> fauxpilot-triton-1         | [5f61fab36b85:00092] *** Process received signal ***
> fauxpilot-triton-1         | [5f61fab36b85:00092] Signal: Aborted (6)
> fauxpilot-triton-1         | [5f61fab36b85:00092] Signal code:  (-6)
> fauxpilot-triton-1         | [5f61fab36b85:00092] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f3a7ef7e420]

Have you solved this problem? If so, how did you solve it in the end? Thanks. My GPU is a T4-8C; can I load the codegen-350M-mono model and build a server?
