
TGI does not seem to support other position encodings, like baichuan2-7B (ALiBi position). #1786

Closed · 2 of 4 tasks
Night-Quiet opened this issue Apr 22, 2024 · 1 comment

Comments


Night-Quiet commented Apr 22, 2024

System Info

2024-04-22T02:18:37.204227Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 2d0a7173d4891e7cd5f9b77f8e0987b82a339e51
Docker label: N/A
nvidia-smi:
Mon Apr 22 10:18:37 2024       
   +---------------------------------------------------------------------------------------+
   | NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
   |-----------------------------------------+----------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
   |                                         |                      |               MIG M. |
   |=========================================+======================+======================|
   |   0  NVIDIA GeForce RTX 4090        On  | 00000000:1C:00.0 Off |                  Off |
   | 30%   29C    P8              30W / 450W |      2MiB / 24564MiB |      0%      Default |
   |                                         |                      |                  N/A |
   +-----------------------------------------+----------------------+----------------------+
                                                                                            
   +---------------------------------------------------------------------------------------+
   | Processes:                                                                            |
   |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
   |        ID   ID                                                             Usage      |
   |=======================================================================================|
   |  No running processes found                                                           |
   +---------------------------------------------------------------------------------------+

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

text-generation-launcher --model-id /root/autodl-tmp/baichuan --trust-remote-code

2024-04-22T02:11:00.927702Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-04-22T02:11:00.928097Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-04-22T02:11:03.592505Z ERROR text_generation_launcher: exllamav2_kernels not installed.

2024-04-22T02:11:03.623388Z  WARN text_generation_launcher: Could not import Flash Attention enabled models: cannot import name 'FastLayerNorm' from 'text_generation_server.utils.layers' (/root/text-generation-inference/server/text_generation_server/utils/layers.py)

2024-04-22T02:11:03.623918Z  WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'

2024-04-22T02:11:04.132360Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/root/miniconda3/envs/tgi/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
             ^^^^^

  File "/root/text-generation-inference/server/text_generation_server/cli.py", line 71, in serve
    from text_generation_server import server

  File "/root/text-generation-inference/server/text_generation_server/server.py", line 16, in <module>
    from text_generation_server.models.vlm_causal_lm import VlmCausalLMBatch

  File "/root/text-generation-inference/server/text_generation_server/models/vlm_causal_lm.py", line 14, in <module>
    from text_generation_server.models.flash_mistral import (

  File "/root/text-generation-inference/server/text_generation_server/models/flash_mistral.py", line 18, in <module>
    from text_generation_server.models.custom_modeling.flash_mistral_modeling import (

  File "/root/text-generation-inference/server/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 30, in <module>
    from text_generation_server.utils.layers import (

ImportError: cannot import name 'PositionRotaryEmbedding' from 'text_generation_server.utils.layers' (/root/text-generation-inference/server/text_generation_server/utils/layers.py)
 rank=0
2024-04-22T02:11:04.230211Z ERROR text_generation_launcher: Shard 0 failed to start
2024-04-22T02:11:04.230247Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
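
For what it's worth, the warnings above show that `FastLayerNorm` also failed to import from the same module, so the missing `PositionRotaryEmbedding` likely shares the same root cause (the flash-attention-dependent layers not being available in this build). A small diagnostic sketch, assuming the same Python environment as the traceback, to confirm which symbols resolve:

```python
import importlib

# Diagnostic sketch (not an official TGI tool): run in the same Python
# environment as text-generation-server to check which of the layer
# symbols named in the traceback actually resolve.
layers = importlib.import_module("text_generation_server.utils.layers")
for name in ("FastLayerNorm", "PositionRotaryEmbedding"):
    status = "OK" if hasattr(layers, name) else "missing"
    print(f"{name}: {status}")
```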

Expected behavior

Please let me know whether baichuan-inc/Baichuan2-7B-Chat is supported, or whether this error was caused by a mistake in my setup. Thank you.
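
For background, Baichuan2-7B uses ALiBi instead of rotary position embeddings: a per-head linear bias is added to the pre-softmax attention scores, and the query/key vectors are not rotated at all. A minimal sketch of that bias, following the ALiBi paper (illustrative only; not TGI's or Baichuan's implementation):

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Per-head geometric slopes; this closed form assumes num_heads is a
    # power of two, as in the original ALiBi paper.
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Bias added to attention scores: each head penalizes distant key
    # positions linearly. Entries above the diagonal are positive but are
    # removed by the causal mask in practice.
    slopes = alibi_slopes(num_heads)                      # (H,)
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]                # j - i, (S, S)
    return slopes[:, None, None] * distance[None, :, :]   # (H, S, S)
```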

Night-Quiet (Author)

I fixed this by referring to the following link:
#1778
