
Error in launch using a docker image #4242

Closed · hzhaoy opened this issue Jun 12, 2024 · 1 comment · Fixed by #4461
Labels: solved (This problem has been already solved)

Comments

hzhaoy (Contributor) commented Jun 12, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

System: Ubuntu 20.04.2 LTS
GPU: NVIDIA A100-SXM4-80GB
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.8.2.dev0

Reproduction

Dockerfile: https://github.com/hiyouga/LLaMA-Factory/blob/557891debb8a64b73eea012f99780a7b76424cd5/Dockerfile

Build Command:

docker build -f ./Dockerfile \
    --build-arg INSTALL_BNB=true \
    --build-arg INSTALL_VLLM=true \
    --build-arg INSTALL_DEEPSPEED=true \
    --build-arg PIP_INDEX=https://pypi.tuna.tsinghua.edu.cn/simple \
    -t llamafactory:latest .

docker-compose.yml:

name: llm-fct

services:
  webui:
    image: llamafactory:latest
    command: ["llamafactory-cli", "webui"]
    volumes:
      - /models:/models
      - ./hf_cache:/root/.cache/huggingface/
      - ./data:/app/data
      - ./output:/app/output
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    security_opt:
      - seccomp:unconfined
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: "all"
            capabilities: [gpu]
    restart: unless-stopped

Startup Command:
docker compose -f docker-compose.yml up -d
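
As a sanity check, something like the following (assuming the webui service name from the compose file above) confirms GPU visibility inside the image and reproduces the failure shown below without going through the web UI:

# Confirm the NVIDIA driver is visible from inside the image
docker compose -f docker-compose.yml run --rm webui nvidia-smi

# Importing flash_attn directly triggers the same ImportError as below,
# which makes for a faster repro than waiting on the web UI
docker compose -f docker-compose.yml run --rm webui python -c "import flash_attn"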

Error:
llm-fct-webui-1 | Traceback (most recent call last):
llm-fct-webui-1 |   File "/usr/local/bin/llamafactory-cli", line 5, in <module>
llm-fct-webui-1 |     from llamafactory.cli import main
llm-fct-webui-1 |   File "/app/src/llamafactory/__init__.py", line 3, in <module>
llm-fct-webui-1 |     from .cli import VERSION
llm-fct-webui-1 |   File "/app/src/llamafactory/cli.py", line 7, in <module>
llm-fct-webui-1 |     from . import launcher
llm-fct-webui-1 |   File "/app/src/llamafactory/launcher.py", line 1, in <module>
llm-fct-webui-1 |     from llamafactory.train.tuner import run_exp
llm-fct-webui-1 |   File "/app/src/llamafactory/train/tuner.py", line 10, in <module>
llm-fct-webui-1 |     from ..model import load_model, load_tokenizer
llm-fct-webui-1 |   File "/app/src/llamafactory/model/__init__.py", line 1, in <module>
llm-fct-webui-1 |     from .loader import load_config, load_model, load_tokenizer
llm-fct-webui-1 |   File "/app/src/llamafactory/model/loader.py", line 13, in <module>
llm-fct-webui-1 |     from .patcher import patch_config, patch_model, patch_tokenizer, patch_valuehead_model
llm-fct-webui-1 |   File "/app/src/llamafactory/model/patcher.py", line 16, in <module>
llm-fct-webui-1 |     from .model_utils.longlora import configure_longlora
llm-fct-webui-1 |   File "/app/src/llamafactory/model/model_utils/longlora.py", line 6, in <module>
llm-fct-webui-1 |     from transformers.models.llama.modeling_llama import (
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 54, in <module>
llm-fct-webui-1 |     from flash_attn import flash_attn_func, flash_attn_varlen_func
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
llm-fct-webui-1 |     from flash_attn.flash_attn_interface import (
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
llm-fct-webui-1 |     import flash_attn_2_cuda as flash_attn_cuda
llm-fct-webui-1 | ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

Expected behavior

The container starts successfully.

Others

There may be some relevant solutions in oobabooga/text-generation-webui#4182.
I also found that everything works fine when using nvcr.io/nvidia/pytorch:24.01-py3 as the base image instead of nvcr.io/nvidia/pytorch:24.02-py3.
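
For what it's worth, the missing symbol demangles to a function in libtorch's c10 CUDA layer, which points to the prebuilt flash-attn wheel having been compiled against a different torch than the one in the 24.02 base image. Assuming that mismatch is the cause, a rough workaround is to rebuild flash-attn from source against the installed torch:

# The undefined symbol belongs to libtorch (c10), so flash-attn and the
# installed torch disagree on the torch/CUDA ABI
c++filt _ZN3c104cuda9SetDeviceEi
# -> c10::cuda::SetDevice(int)

# Rebuild flash-attn against the torch already present in the image
# (requires nvcc in the image; the compile can take a long time)
pip uninstall -y flash-attn
pip install flash-attn --no-build-isolation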

github-actions bot added the pending (This problem is yet to be addressed) label on Jun 12, 2024
hiyouga (Owner) commented Jun 12, 2024

Please try again with the latest Dockerfile.
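
Concretely, that retry amounts to rebuilding the image from the current Dockerfile with the same build arguments as above and recreating the container, roughly:

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
docker build -f ./Dockerfile \
    --build-arg INSTALL_BNB=true \
    --build-arg INSTALL_VLLM=true \
    --build-arg INSTALL_DEEPSPEED=true \
    --build-arg PIP_INDEX=https://pypi.tuna.tsinghua.edu.cn/simple \
    -t llamafactory:latest .
docker compose -f docker-compose.yml up -d --force-recreate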

hiyouga added the solved (This problem has been already solved) label and removed the pending (This problem is yet to be addressed) label on Jun 12, 2024