Q: how to run the CLI via CUDA in a Docker container #229

Closed
dengzheng-cloud opened this issue May 25, 2023 · 3 comments
Labels: question (Question about the usage)

Comments

@dengzheng-cloud

I'm using a Docker container (Ubuntu 22.04, CUDA 11.8, without Vulkan). Graphics card: A100, driver: 470.42, CUDA: 11.4.
python3 build.py --model path/to/vicuna-v1-7b --quantization q3f16_0 --max-seq-len 768
It only outputs vicuna-v1-7b-q3f16_0-cpu.so.
I tried to install the prebuilt vicuna-v1-7b-q3f16_0-vulkan.so from https://github.com/mlc-ai/binary-mlc-llm-libs.
The error log is:
[screenshot of the error log]
Is there a way to use CUDA directly without changing the driver inside the container, or could you provide a Dockerfile?

junrushao added the question (Question about the usage) label on May 26, 2023
@junrushao
Member

junrushao commented May 26, 2023

I'm trying to figure out what you are actually looking for. Are you trying to build or run a model? Those are two different requests.

The screenshot indicates that you are trying to run a prebuilt model with Vulkan, but as far as I can infer from your question ("a way to use cuda directly without exchanging driver inside"), there is no Vulkan or CUDA driver inside your Docker image, and it's impossible to run CUDA/Vulkan code without the driver, right?
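
(For reference, one common way to make the host NVIDIA driver visible inside a container, without installing a driver in the image, is the NVIDIA Container Toolkit. The sketch below is only an illustration of that setup, not something from this thread; it assumes the toolkit is installed on the host, and the image tag is an example.)

# hypothetical check: expose the host GPU/driver to the container via the NVIDIA Container Toolkit
# (requires Docker 19.03+ and nvidia-container-toolkit on the host; image tag is illustrative)
docker run --rm --gpus all nvidia/cuda:11.8.0-devel-ubuntu22.04 nvidia-smi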

If instead you are trying to build the model, that is definitely possible in MLC LLM: as long as you have the compiler toolchain (e.g. nvcc), you can build a model library that can then be run elsewhere.
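
For example, a CUDA-targeted build might look like this (a minimal sketch that reuses the model path, quantization, and flags appearing later in this thread):

# build the model library for CUDA; the model path is a placeholder
python3 build.py --model path/to/vicuna-v1-7b --quantization q4f16_0 --target cuda --max-seq-len 768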

@dengzheng-cloud
Author

@junrushao Yeah, I found the mistakes in what I described: I didn't specify the target when building the model, and I didn't set USE_CUDA=ON when compiling mlc_llm.

@dengzheng-cloud
Author

dengzheng-cloud commented May 26, 2023

For all CUDA users, the content below is how I compiled mlc_llm.

# compile TVM (the mlc-ai/relax fork)
git clone https://github.com/mlc-ai/relax.git --recursive
cd relax
mkdir build
cp cmake/config.cmake build
# edit build/config.cmake: set USE_CUDA ON, USE_CUDNN ON, USE_CUBLAS ON
cd build
cmake ..
make -j
cd ..
export TVM_HOME=/path/to/relax
export PYTHONPATH=$PYTHONPATH:$TVM_HOME/python
# here I use my local vicuna-v1-7b

git clone https://github.com/mlc-ai/mlc-llm.git --recursive
cd mlc-llm
# the model library build is done here
python3 build.py --model path/to/vicuna-v1-7b --quantization q4f16_0 --target cuda --max-seq-len 768

# compile mlc-llm
mkdir build && cd build
cmake .. -DUSE_CUDA=ON
make
cd ..

Then run mlc_chat_cli with --device-name cuda and it will work (see the sketch below).
I have not verified these steps again; if something goes wrong, please comment, and I will reply if I know the answer.
#119 did help a lot.
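
For reference, the final invocation might look like this (a minimal sketch; the binary location and any flag other than --device-name are assumptions and may differ between mlc_chat_cli versions):

# hypothetical invocation; only --device-name cuda is confirmed above
./build/mlc_chat_cli --model vicuna-v1-7b --device-name cuda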
